Add support to read from Google Cloud Storage #5069
Hello @majetideepak, here is the second PR of the GCS support, as agreed. It covers only the read functionality. Could you please review it? Thank you in advance.
Please have this reviewed by @majetideepak .
.circleci/dist_compile.yml
@@ -417,6 +417,12 @@ jobs:
- run:
    name: "Run Unit Tests"
    command: |
      conda init bash
Apologies, I had to revert this file and the change because it broke scheduled runs. Can you rebase and make this change again in .circleci/config.yml?
@majetideepak @akashsha1 you are the ones most familiar with the cloud storage connectors code and interface. Could you help review this one?
I have reviewed this before as part of the larger PR. I will make a final pass by EOD. Sorry for the delay.
Thank you from the bottom of my heart!
@majetideepak Finally, a build succeeded with all the indicators passing.
@tigrux PR is clean and looks nice! I have some minor comments. Thanks!
Hello @majetideepak. I updated the PR. Could you take a look again? Thanks.
Thanks @tigrux
Oops. Ignore this, wrong tab in my browser.
This change adds support to read from GCS (Google Cloud Storage).
The provided filesystem expects URIs with the gs:// protocol, following the convention gs://bucket/object.
The dependencies of this connector are already installed via setup-adapters.sh. The main dependency is the Google Cloud SDK for C++.
The build time increases only slightly. Assuming that the dependencies have already been installed, only 5s are added to the build time:
Build time without the GCS connector: 3m50
Build time with the GCS connector: 3m55
The connector can open URIs and read from them. However, it cannot rename, create directories, or remove directories.
Support for writing to GCS buckets will be provided later.
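To illustrate the gs://bucket/object convention described above, here is a minimal sketch of how such a URI could be split into its bucket and object parts. It is not the code from this PR: the names `GcsPath` and `parseGcsUri` are hypothetical, and the sketch assumes that everything after the first slash following the bucket belongs to the object name.

```cpp
#include <optional>
#include <string>
#include <string_view>

// Hypothetical helper type (not part of Velox): the two components of a
// gs://bucket/object URI.
struct GcsPath {
  std::string bucket;
  std::string object;
};

// Splits "gs://bucket/object" into its bucket and object names.
// Returns std::nullopt when the scheme is not gs:// or when either
// component is empty. Assumption: the object name is everything after
// the first '/' that follows the bucket, so it may itself contain '/'.
inline std::optional<GcsPath> parseGcsUri(std::string_view uri) {
  constexpr std::string_view kScheme = "gs://";
  if (uri.substr(0, kScheme.size()) != kScheme) {
    return std::nullopt;
  }
  uri.remove_prefix(kScheme.size());

  const auto slash = uri.find('/');
  const bool missingPart = slash == std::string_view::npos || slash == 0 ||
      slash + 1 == uri.size();
  if (missingPart) {
    return std::nullopt;
  }
  return GcsPath{
      std::string(uri.substr(0, slash)), std::string(uri.substr(slash + 1))};
}
```

Under these assumptions, `parseGcsUri("gs://bucket/dir/file.orc")` yields the bucket `bucket` and the object `dir/file.orc`, while a URI with another scheme such as `s3://bucket/object` is rejected.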
This change addresses review comments. A couple of changes may be needed later:
1. Account for the memory used by the blocks allocated by GCS.
2. Allocate the resulting strings using a memory pool.
Fix wording of some comments. Move example and tests to their respective directories. Move configuration to HiveConfig.
- Factor out the redundant error string.
- Describe the memory usage.
Hello @majetideepak. There are 3 approvals. Could this be merged? Thank you.
@kgpai has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.