GCP Integration #22

Closed
manesioz opened this issue Dec 5, 2019 · 9 comments
Labels
enhancement New feature or request

Comments

@manesioz

manesioz commented Dec 5, 2019

First of all, thank you for open-sourcing this excellent tool!

My team uses GCP, not AWS, so it would be great if Metaflow could be integrated with it. I'm sure it's on your roadmap, but I'm just putting it out there :)

@savingoyal
Collaborator

@manesioz Curious, what does your tech stack look like on GCP? Kubernetes + GCS + Airflow?

@savingoyal added the enhancement label on Dec 5, 2019
@manesioz
Author

manesioz commented Dec 5, 2019

We actually run Airflow on Cloud Composer, and our data lake is in BigQuery. We're currently considering migrating to Kubernetes.

@savingoyal
Collaborator

Got it. We have a similar issue open at #16.

@barrywhart

At Mailchimp, we also use Cloud Dataflow.

We could potentially contribute to the effort to support GCP. In particular, we have a battle-tested @retry decorator that retries according to Google Cloud's documented policy: https://cloud.google.com/apis/design/errors.

We would be happy to share this code for inclusion in Metaflow. Our decorator incorporates a fork of the Apache 2.0-licensed retrying package, which appears to be unmaintained at this point. The fork was necessary because on GCP there is a case where the wait period between retries depends on the type of error, which retrying did not support:

For 429 RESOURCE_EXHAUSTED errors, the client may retry at the higher level with minimum 30s delay. Such retries are only useful for long running background jobs.
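
To make the requirement concrete, here is a minimal standard-library sketch of a retry decorator whose wait period depends on the error type. It is not the Mailchimp decorator or the retrying fork described above. The exception classes, `delay_for`, the backoff values for non-429 errors, and the `sleep` parameter are all hypothetical names and assumptions chosen for illustration; only the 30-second minimum for 429 RESOURCE_EXHAUSTED comes from the quoted policy.

```python
import functools
import time


class ResourceExhausted(Exception):
    """Hypothetical stand-in for a 429 RESOURCE_EXHAUSTED error."""


class Unavailable(Exception):
    """Hypothetical stand-in for a 503 UNAVAILABLE error."""


def delay_for(exc, attempt):
    """Return the seconds to wait before the next try, chosen by error type."""
    if isinstance(exc, ResourceExhausted):
        # Quoted policy: retry 429 RESOURCE_EXHAUSTED with a minimum 30s delay.
        return max(30.0, 2.0 ** attempt)
    # Assumption: exponential backoff starting at 1s, capped at 60s, for
    # every other retryable error.
    return min(2.0 ** attempt, 60.0)


def retry(max_attempts=3, retryable=(ResourceExhausted, Unavailable),
          sleep=time.sleep):
    """Retry the wrapped function, waiting a per-error-type delay between tries."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            for attempt in range(max_attempts):
                try:
                    return func(*args, **kwargs)
                except retryable as exc:
                    if attempt == max_attempts - 1:
                        raise  # out of attempts: surface the last error
                    sleep(delay_for(exc, attempt))
        return wrapper
    return decorator
```

Passing `sleep` as a parameter is a design choice for testability: a caller can swap in a recorder or a no-op instead of actually waiting 30 seconds.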

@savingoyal
Collaborator

@barrywhart We would be happy to engage on a POC. @jaychia already has a PR out for GCS integration.

@barrywhart

@savingoyal: We are not currently using Metaflow, but I see some potential for using it in some cases as an alternative to Airflow (complex!) and bash scripts (may not always be powerful enough for our needs). So I want to help, but I also need to time-box my involvement for now.

Can you point me to the GCS PR? Any thoughts on how the package might accommodate multiple @retry implementations? Could it literally just be a different decorator in a different module, or is there a need for a single, "polymorphic" @retry decorator?

@jaychia

jaychia commented Mar 9, 2020

> @savingoyal: We are not currently using Metaflow, but I see some potential for using it in some cases as an alternative to Airflow (complex!) and bash scripts (may not always be powerful enough for our needs). So I want to help, but I also need to time-box my involvement for now.
>
> Can you point me to the GCS PR? Any thoughts on how the package might accommodate multiple @retry implementations? Could it literally just be a different decorator in a different module, or is there a need for a single, "polymorphic" @retry decorator?

#153 - please feel free to contribute or comment

The Metaflow S3 datastore does its own error handling internally for storage-client retries: it retries N times when an error that isn't Metaflow-related is thrown. I replicated that logic for the GCS datastore. See:
https://github.com/Netflix/metaflow/pull/153/files#diff-88a07e3f313e3d7fec566c156ed68baeR28-R51
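
For readers who don't want to open the diff, the pattern described above can be sketched roughly as follows. This is not the code from the PR or from the S3 datastore. `MetaflowException` here is a local stand-in class, and `call_with_retries`, `max_retries`, the backoff schedule, and the final error message are hypothetical choices made for this example.

```python
import time


class MetaflowException(Exception):
    """Local stand-in for a Metaflow-related error (assumption for this sketch)."""


def call_with_retries(operation, max_retries=3, sleep=time.sleep):
    """Run a storage operation, retrying only errors that are not Metaflow-related."""
    last_exc = None
    for attempt in range(max_retries):
        try:
            return operation()
        except MetaflowException:
            # Metaflow-related errors are treated as non-transient: no retry.
            raise
        except Exception as exc:
            # Anything else is assumed to be a transient storage-client error.
            last_exc = exc
            if attempt < max_retries - 1:
                sleep(2 ** attempt)  # assumption: simple exponential backoff
    raise MetaflowException(
        "storage operation failed after %d attempts" % max_retries
    ) from last_exc
```

A caller would wrap each storage-client call in a zero-argument callable, for example `call_with_retries(lambda: read_blob(path))`, where `read_blob` is whatever client function is in use.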

Also, tenacity is a great retrying package that should be able to do the custom retry logic that you mentioned (wait period depends on type of error).

@candalfigomoro

Is PR #153 still relevant?

sappier pushed a commit to sappier/metaflow that referenced this issue Dec 2, 2021
@jinnovation

As of today, Metaflow now appears to support GCP. 🔥


6 participants