Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add support to read from Google Cloud Storage #5069

Closed
wants to merge 6 commits into from
Closed

Add support to read from Google Cloud Storage #5069

wants to merge 6 commits into from

Conversation

tigrux
Copy link
Contributor

@tigrux tigrux commented May 30, 2023

This change adds support to read from GCS (Google Cloud Storage).
The provided filesystem expects uris with the protocol gs:// following the
convention gs://bucket/object.

The dependencies of this connector are already installed via
setup-adapters.sh. The main dependency is the Google Cloud SDK for C++.

The build system is only slighly increased.
Assuming that the dependencies have already been installed, then only 5s are added to the build time:

Build time without the GCS connector: 3m50
Build time with the GCS connector: 3m55

The connector is able to open uris and read from them
However, it cannot rename, create directories nor remove directories.

The support to write to gcs buckets is going to be provided later.

@netlify
Copy link

netlify bot commented May 30, 2023

Deploy Preview for meta-velox canceled.

Name Link
🔨 Latest commit c2d6d69
🔍 Latest deploy log https://app.netlify.com/sites/meta-velox/deploys/64ac9abc8e71f3000972040d

@facebook-github-bot facebook-github-bot added the CLA Signed This label is managed by the Facebook bot. Authors need to sign the CLA before a PR can be reviewed. label May 30, 2023
@tigrux tigrux changed the title WIP: GCS Read support Add support to read from Google Cloud Storage Jun 12, 2023
@tigrux tigrux marked this pull request as ready for review June 12, 2023 14:54
@tigrux
Copy link
Contributor Author

tigrux commented Jun 14, 2023

Hello @majetideepak, here is the second PR of the GCS Support as agreed. It cover only the read functionality. Could you please review it? Thank you in advance.

Copy link
Contributor

@kgpai kgpai left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please have this reviewed by @majetideepak .

@@ -417,6 +417,12 @@ jobs:
- run:
name: "Run Unit Tests"
command: |
conda init bash
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Apologies, I had to revert this file and change because it broke scheduled runs. Can you rebase and make this change again in .circleci/config.yml

velox/connectors/hive/storage_adapters/gcs/GCSUtil.h Outdated Show resolved Hide resolved
@pedroerp
Copy link
Contributor

@majetideepak @akashsha1 you are the ones most familiar with the cloud storage connectors code and interface. Could you help review this one?

@majetideepak
Copy link
Collaborator

I have reviewed this before as part of the larger PR. I will make a final pass by EOD. Sorry for the delay.
@tigrux Can you rebase with main when you get a chance? There is conflict.

@tigrux
Copy link
Contributor Author

tigrux commented Jun 26, 2023

I have reviewed this before as part of the larger PR. I will make a final pass by EOD. Sorry for the delay. @tigrux Can you rebase with main when you get a chance? There is conflict.

Thank you from the bottom of my heart!

@tigrux
Copy link
Contributor Author

tigrux commented Jun 26, 2023

@majetideepak Finally, a build succeeded with all the indicators passing.

Copy link
Collaborator

@majetideepak majetideepak left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@tigrux PR is clean and looks nice! I have some minor comments. Thanks!

@tigrux
Copy link
Contributor Author

tigrux commented Jul 5, 2023

Hello @majetideepak. I updated the PR. Could you take a look again? Thanks.

@tigrux tigrux requested review from majetideepak and kgpai July 5, 2023 21:29
Copy link
Collaborator

@majetideepak majetideepak left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @tigrux

@majetideepak
Copy link
Collaborator

@pedroerp, @kgpai can you please help merge this PR?

Copy link
Contributor

@paul-amonson paul-amonson left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oops. Ignore this, wrong tab in my browser.

This change adds support to read from GCS (Google Cloud Storage).
The provided filesystem expects uris with the protocol gs:// following the
convention gs://bucket/object.

The dependencies of this connector are already installed via
setup-adapters.sh. The main dependency is the Google Cloud SDK for C++.

The build system is only slighly increased.
Assuming that the dependencies have already been installed, then only 5s are added to the build time:

Build time without the GCS connector: 3m50
Build time with the GCS connector:    3m55

The connector is able to open uris and read from them.
However, it cannot rename, create directories nor remove directories.

The support to write to gcs buckets is going to be provided later.
This change addresses comments.
A couple of changes may be needed later:
1. Take into account of the memory usage the blocks allocated by GCS.
2. Allocate resulting strings using a memory pool.
Fix wording of some comments.
Move example and tests to their respective directories.
Move configuration to HiveConfig.
- Factorize redundant error string.
- Describe the memory usage.
@tigrux tigrux requested a review from akashsha1 July 11, 2023 12:12
@tigrux
Copy link
Contributor Author

tigrux commented Jul 11, 2023

Hello @majetideepak. There are 3 approvals. Could this be merged? Thank you.

@akashsha1
Copy link
Contributor

Hello @majetideepak. There are 3 approvals. Could this be merged? Thank you.

@pedroerp / @kgpai - as well if you can help with PR merge?

@majetideepak
Copy link
Collaborator

@tigrux I don't have the merge rights. @pedroerp or @kgpai should be able to help merge.

@facebook-github-bot
Copy link
Contributor

@kgpai has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

@facebook-github-bot
Copy link
Contributor

@kgpai merged this pull request in 5273425.

@tigrux tigrux deleted the gcs-support-read branch September 20, 2023 18:57
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
CLA Signed This label is managed by the Facebook bot. Authors need to sign the CLA before a PR can be reviewed. Merged
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

7 participants