Authentication questions #4
(Sorry I didn't see this until now.) Have you tried running `gcloud auth login` to make sure you have a valid credential? If yes, your default cloud project might be different from the one where you want to run Dockerflow; you can change it by running `gcloud init`. One of those ought to fix it. If not, maybe Dockerflow can't write to the bucket you're using as your workspace.
I am running into a similar authentication error, but I was under the assumption that I could just use the default service account for authentication since that's what I've been doing with all of my other pipeline requests thus far. Here's the full output that I got after attempting to run a particular workflow, with some paths to (potentially) sensitive data redacted:
It looks like maybe the service account info is not making it into the individual pipeline requests? I see that the info is in the `pipelineArgs` section of the workflow but not for the individual steps...
This looks like a different error from Sean's, since you're able to start the pipeline successfully. That's progress! I think I've seen the error you got (401, Unauthenticated) with `DirectPipelineRunner` when my laptop went to sleep in the middle of a pipeline run and local Dataflow lost the internet access it needs to make web service calls. Have you tried running Dataflow itself in the cloud with the default runner (either omit the `--runner` option or use `--runner=DataflowPipelineRunner`)? If yes, can you share the command-line call, or email me privately with enough info that I can try reproducing? Thanks!
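For reference, here is a rough sketch of what the runner choice means in the pre-Beam Dataflow SDK for Java 1.x that Dockerflow builds on. The `RunnerCheck` class is hypothetical and only for illustration; it is not Dockerflow's actual wiring.

```java
import com.google.cloud.dataflow.sdk.options.DataflowPipelineOptions;
import com.google.cloud.dataflow.sdk.options.PipelineOptionsFactory;
import com.google.cloud.dataflow.sdk.runners.DataflowPipelineRunner;

public class RunnerCheck {
  public static void main(String[] args) {
    // Parse command-line flags such as --project, --stagingLocation, --runner.
    DataflowPipelineOptions options =
        PipelineOptionsFactory.fromArgs(args).as(DataflowPipelineOptions.class);

    // Run on the Dataflow service instead of locally; this is equivalent to
    // passing --runner=DataflowPipelineRunner on the command line.
    // DirectPipelineRunner, by contrast, executes in the local JVM and needs
    // working network access for every API call it makes.
    options.setRunner(DataflowPipelineRunner.class);

    System.out.println("Runner: " + options.getRunner().getSimpleName());
  }
}
```

With `DataflowPipelineRunner` the job itself executes on the Dataflow service, so a local network hiccup mainly affects submission and monitoring rather than the running workers.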
And FYI, Dockerflow uses an access token for API calls and GCS access. The code is this:

`String token = com.google.api.client.googleapis.auth.oauth2.GoogleCredential.getApplicationDefault().getAccessToken();`

You can test that it works with a GCS bucket you have access to, like this:

`curl https://storage.googleapis.com/MY-BUCKET/MY-PATH?access_token=MY-TOKEN`

where MY-BUCKET/MY-PATH is the GCS path (without the gs:// prefix) and MY-TOKEN is obtained with the code above. If this ends up being a common enough problem, I can create a super-simple command-line tool that checks only the access token.
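If it helps, here is a minimal, self-contained version of that check. The class name `AccessTokenCheck` and the `createScoped` fallback are illustrative additions, not part of Dockerflow.

```java
import com.google.api.client.googleapis.auth.oauth2.GoogleCredential;
import java.io.IOException;
import java.util.Collections;

public class AccessTokenCheck {
  public static void main(String[] args) throws IOException {
    // Load the Application Default Credentials, e.g. from
    // `gcloud auth application-default login` or GOOGLE_APPLICATION_CREDENTIALS.
    GoogleCredential credential = GoogleCredential.getApplicationDefault();

    // A service-account key file has no scopes attached, so add a
    // read-only Cloud Storage scope before requesting a token.
    if (credential.createScopedRequired()) {
      credential = credential.createScoped(
          Collections.singleton("https://www.googleapis.com/auth/devstorage.read_only"));
    }

    // getAccessToken() can return null until a token has actually been
    // fetched, so force a refresh first.
    credential.refreshToken();
    System.out.println(credential.getAccessToken());
  }
}
```

Paste the printed token into the `curl` command above as MY-TOKEN to confirm that the credential can actually read the bucket.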
Cool, thanks for the info about the access token. I was able to get it working after running …
Great, glad that it's working! The cool execution graph is the whole reason I wrote Dockerflow :-) |
Another question: should Dockerflow work for a workflow with on the order of 10k tasks?
In theory it should work. Things to know if you're running O(10k) concurrent tasks: the Pipelines API will queue your work if you don't have enough core quota; it's recommended to provide more zones, like "us-*", so you can spread the work out more; and Dockerflow will abort by default if any of the individual 10k tasks fails to complete. You can pass the flag `--abort=false` to turn that last behavior off (I'll add it to the `--help` message; I just realized it's not documented. It's also not tested yet, so let me know if it doesn't work right). Otherwise, I'm looking forward to hearing how it works for you!
These are great points, @jbingham. The 10k tasks will not necessarily be concurrent, due to dependencies, but hundreds may be running simultaneously. Is there a place where I can look at the various quotas that might impact a Dockerflow run, particularly with respect to cores, disk, and memory?
The main quotas are for Compute Engine (cores, disk, IP addresses). You can check and increase them here: |
Perfect. |
Thanks for the new project! This looks quite interesting. I wanted to give it a quick test and ran into the following problem. I have activated the Cloud Dataflow API (and the others), and I thought that would allow me to run Dockerflow workflows. What did I miss?