[KED-928] Full Google Cloud Platform Support #11
Comments
Hi @ulli-snyman, thank you for contributing to Kedro by adding a feature request and potentially adding support for new data sets. Adding support for GCS is something we would love to have as part of
Hey @idanov, I'm getting ready for my first PR towards this issue, covering the CSVDataSet method. The common approach for testing with GCP services is to actually read/write to the service. I've set the tests to take GCP configuration from environment variables; this is the best way I can see this working out. Would you be fine with this, or do you have any other ideas as to how we could test this?
Hi @ulli-snyman, thank you for your interest in contributing to Kedro! I'm the QA on the Kedro team. You should be able to mock out calls to the GCP library. An example is shown here. You can also see how the developers test their client code here. If you need any further assistance, please let us know.
I've updated the title with our internal ticket number to keep track of this more easily. :)
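The mocking approach suggested above can be sketched as follows. This is a hypothetical illustration, not Kedro's actual test suite: `save_csv` and the client interface are assumptions, with `unittest.mock.MagicMock` standing in for `google.cloud.storage.Client` so no network or credentials are needed.

```python
# Sketch: testing GCS upload code against a mocked client instead of a
# live bucket. `save_csv` is a hypothetical helper, not Kedro's API.
from unittest.mock import MagicMock


def save_csv(client, bucket_name: str, blob_name: str, data: str) -> None:
    """Upload a CSV string to a GCS blob via the given client."""
    bucket = client.bucket(bucket_name)
    blob = bucket.blob(blob_name)
    blob.upload_from_string(data, content_type="text/csv")


def test_save_csv_uploads_to_expected_blob():
    client = MagicMock()  # stands in for google.cloud.storage.Client()
    save_csv(client, "my-bucket", "data/iris.csv", "a,b\n1,2\n")
    # Verify the code addressed the right bucket and blob.
    client.bucket.assert_called_once_with("my-bucket")
    client.bucket.return_value.blob.assert_called_once_with("data/iris.csv")
    blob = client.bucket.return_value.blob.return_value
    blob.upload_from_string.assert_called_once_with(
        "a,b\n1,2\n", content_type="text/csv"
    )


test_save_csv_uploads_to_expected_blob()
```

Compared to reading configuration from environment variables and hitting the real service, this keeps the tests fast, deterministic, and runnable in CI without GCP credentials.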
Hey @lorenabalan, |
Totally fine, just wanted to check in and make sure that you're not stuck on something from our end. :)
Hey there, |
Hi @plauto! We would love the help! But it might be a good idea to just sync with @ulli-snyman, as he mentioned that he has started working on a PR. Let's give him until the end of the week to reply about how far he's gotten and whether or not he needs help. If there's no status update then it's all yours.
Sounds good to me! Thanks @yetudada
Hey! If that’s ok, can I start working on it this week?
Looks like there's still no reply, and sure, go for it! @plauto
@plauto How's the development coming along? If you would like our early feedback/comments, feel free to open a draft PR so we can see if you are on the right track :)
@921kiyo It’s going well, but these days have been a bit hectic and I had no time to finish working on the unit tests. I’ll push something on Monday so that you guys can take a look at it :)
@921kiyo I am going to push a draft PR. Sorry for being a bit late on this, but I could only find some time to work on it at the end of last week. There are still a couple of things to finish (e.g. unit tests for the versioned dataset, which have a bit of complexity due to the way I have structured the unit tests). I look forward to getting feedback from you when you have some time. After that it shouldn't take long to finish up the rest!
@ulli-snyman and everyone who has been watching this issue: we're excited to announce that in an upcoming release of Kedro, we will have:
I'll close this issue when we have finished full support of GCS.
@ulli-snyman This issue has been addressed, and full Google Cloud support will be available in the next release. The datasets are already available in the They all use
Description
In the Kedro docs, GCP is mentioned, but I cannot find any references to data connectors.
Add data connectors for Google Cloud Storage and Google BigQuery.
Context
Allow the use of Google Cloud products for me and my team of data scientists. GCP is a large cloud provider and a very popular IaaS used by many people. GCS is the Google equivalent of AWS S3, and BigQuery is Google's hosted data warehouse.
Possible Implementation
Using the Google Python client library, create the following:
kedro.io.gcs_csv
kedro.io.gcs_parquet
kedro.io.gcs_hdfs
kedro.io.gcs_pickle
kedro.io.gcs_json
kedro.io.gbq [load df from GBQ]
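A minimal sketch of what one of the proposed connectors (e.g. `kedro.io.gcs_csv`) might look like. `CSVGCSDataSet` is a hypothetical name; a real Kedro dataset would subclass `AbstractDataSet` and add versioning, which is omitted here. The storage client is injected so the class can be exercised with a fake in tests, per the testing discussion above.

```python
# Hypothetical GCS-backed CSV dataset sketch. `client` is expected to
# behave like google.cloud.storage.Client; injecting it keeps the class
# testable without network access.
import io

import pandas as pd


class CSVGCSDataSet:
    def __init__(self, bucket_name: str, filepath: str, client) -> None:
        self._bucket = client.bucket(bucket_name)
        self._filepath = filepath

    def load(self) -> pd.DataFrame:
        # Download the blob's bytes and parse them as CSV.
        raw = self._bucket.blob(self._filepath).download_as_bytes()
        return pd.read_csv(io.BytesIO(raw))

    def save(self, df: pd.DataFrame) -> None:
        # Serialize the frame and upload it as a text/csv blob.
        self._bucket.blob(self._filepath).upload_from_string(
            df.to_csv(index=False), content_type="text/csv"
        )
```

The BigQuery connector (`kedro.io.gbq`) could follow the same shape, with `load` delegating to a query client instead of a blob download.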
Possible Alternatives
In the S3 implementations, replace s3fs with boto3 to allow access to both S3 and GCS with the same code. See the GCP simple-migration method outlined here; however, this does not allow full access to the GCS product (i.e. service accounts etc.) and could break some functions in the S3 implementations.