tf serving deployer component needs to be able to associate access credentials with launched pods #3037

Closed
amygdala opened this issue Feb 11, 2020 · 7 comments
Assignees
Labels
lifecycle/stale The issue / pull request is stale, any activities remove this label.

Comments

@amygdala
Contributor

This component: https://github.com/kubeflow/pipelines/blob/master/components/kubeflow/deployer/component.yaml
does not work on a KF install, because the deployed pods don't have access to the user-gcp-sa credentials. (I confirmed this by running the pipeline on a cluster whose nodes have all-cloud-platform scope, where it worked.) This issue breaks the mnist pipelines example in the docs.

For example, when I rolled my own version of this solution, I wrote the YAML file for the tf-serving deployment so that it mounts the secret holding the user-gcp-sa credentials.
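A minimal sketch of that workaround, written as a Python function that patches a deployment manifest held as a plain dict. The mount path, key file name, volume name, and the example image are assumptions made for illustration; only the secret name user-gcp-sa comes from this thread, and this is not the deployer component's actual code.

```python
import copy

SECRET_NAME = "user-gcp-sa"              # secret name mentioned in this thread
MOUNT_PATH = "/secret/gcp-credentials"   # assumed mount path
KEY_FILE = "user-gcp-sa.json"            # assumed key file name inside the secret
VOLUME_NAME = "gcp-credentials"          # hypothetical volume name


def add_gcp_secret(deployment, secret_name=SECRET_NAME,
                   mount_path=MOUNT_PATH, key_file=KEY_FILE):
    """Return a copy of a Deployment manifest with the secret mounted."""
    patched = copy.deepcopy(deployment)
    pod_spec = patched["spec"]["template"]["spec"]

    # Expose the secret to the pod as a volume.
    pod_spec.setdefault("volumes", []).append(
        {"name": VOLUME_NAME, "secret": {"secretName": secret_name}}
    )

    for container in pod_spec["containers"]:
        # Mount the volume read-only in every container.
        container.setdefault("volumeMounts", []).append(
            {"name": VOLUME_NAME, "mountPath": mount_path, "readOnly": True}
        )
        # Point Google client libraries at the mounted key file.
        container.setdefault("env", []).append(
            {"name": "GOOGLE_APPLICATION_CREDENTIALS",
             "value": f"{mount_path}/{key_file}"}
        )
    return patched


# Hypothetical tf-serving deployment, reduced to the fields used above.
example = {
    "apiVersion": "apps/v1",
    "kind": "Deployment",
    "metadata": {"name": "mnist-serving"},
    "spec": {"template": {"spec": {"containers": [
        {"name": "serving", "image": "tensorflow/serving"}
    ]}}},
}
patched = add_gcp_secret(example)
```

The function copies the manifest rather than mutating it, so the same base deployment can be reused for clusters that do not need the secret.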

What's the correct, more general solution for this component? Pass in an arg that indicates how to set up the credentials for the launched pods?

cc @Ark-kun @IronPan

@amygdala
Contributor Author

Also... can you think of any quick fix that would let us fix the pipelines example featured in the KF docs, which uses this component?

@amygdala
Contributor Author

I discovered another issue with this same component:
If there is already a serving deployment in the cluster, a second deployment won't come up successfully. The component detects the pod from the previous deployment as well and seems to wait for both pods to come up (?), looping for a long time before finally terminating with an error. (Sorry, I've deleted the logs, but it's easy to repro.)

If you delete the first deployment before running this component again, then things work okay. But I assume that is not WAI.
LMK if I should file a separate bug for this.
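The cause of this second problem is not diagnosed in this thread. If the wait loop selects pods by a label that every serving deployment shares (a guess, not something confirmed here), then scoping the readiness check to a per-deployment label would keep it from waiting on pods of an older deployment. A minimal sketch over plain dicts; the pod structure, the `app` label key, and the deployment names are all hypothetical.

```python
def pods_for_deployment(pods, deployment_name, label_key="app"):
    """Keep only the pods whose label matches this deployment's name."""
    return [
        pod for pod in pods
        if pod.get("labels", {}).get(label_key) == deployment_name
    ]


def all_ready(pods):
    """True only if there is at least one pod and every pod is ready."""
    return bool(pods) and all(pod.get("ready", False) for pod in pods)


# Hypothetical cluster state: an old deployment's pod that is not ready,
# and the new deployment's pod that is.
pods = [
    {"name": "mnist-v1-abc", "labels": {"app": "mnist-v1"}, "ready": False},
    {"name": "mnist-v2-def", "labels": {"app": "mnist-v2"}, "ready": True},
]

unscoped_ready = all_ready(pods)
scoped_ready = all_ready(pods_for_deployment(pods, "mnist-v2"))
```

With these inputs the unscoped check never succeeds while the old pod stays unready, whereas the scoped check looks only at the new deployment's pod.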

@Ark-kun
Contributor

Ark-kun commented Feb 11, 2020

@IronPan and @numerology might have some info about the secrets and about Workload Identity, which can make them unnecessary on GKE.
I also see the new community-created serving component: https://github.com/kubeflow/pipelines/tree/master/components/kubeflow/kfserving

@amygdala
Contributor Author

amygdala commented Feb 11, 2020

I believe that Workload Identity is coming to both KF core and 'standalone' KFP, but until it does, we should clearly mark the components that don't work on KF. Are there others like this? (In this case, I think this component used to work on KF a few releases ago, when KF nodes used a more permissive SA, but it broke once that got tightened down.)

(Btw, I view the Hosted KFP setup that gives cluster nodes all-platform scope as a temporary hack that we shouldn't design for :)

@numerology

Indeed, this is definitely a pain, and we don't have a very elegant or long-term solution during the transition from GCP SA to WI. However, a temporary solution could be for the component to mount the GCP SA secret only when one exists, which can be detected with kubectl, and to provide a better error message if there is no GCP SA secret and the cluster is not full-scope.
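The temporary solution described above could look roughly like the sketch below. It assumes `kubectl` is on the PATH and relies on `kubectl get secret` exiting non-zero when the secret is missing. The function names, the returned mode strings, and the error wording are hypothetical, and how the component would learn whether the nodes are full-scope is left to the caller.

```python
import subprocess


def secret_exists(name, namespace, run=subprocess.run):
    """Return True if `kubectl get secret` succeeds for this secret.

    `run` is injectable so the check can be exercised without a cluster.
    """
    result = run(
        ["kubectl", "get", "secret", name, "--namespace", namespace],
        capture_output=True,
    )
    return result.returncode == 0


def choose_credentials(secret_found, nodes_full_scope):
    """Pick how the launched pods should obtain GCP credentials."""
    if secret_found:
        return "mount-secret"   # mount the GCP SA secret into the pods
    if nodes_full_scope:
        return "node-scope"     # rely on the node's own scopes
    raise RuntimeError(
        "No GCP SA secret was found and the cluster nodes are not "
        "full-scope, so the serving pods cannot access GCP resources."
    )


# Example call against a live cluster (not executed here):
# mode = choose_credentials(secret_exists("user-gcp-sa", "kubeflow"), False)
```

Failing early with an explicit message replaces the long wait loop with an error that names the missing credentials.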

@stale

stale bot commented Jun 24, 2020

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

@stale stale bot added the lifecycle/stale label Jun 24, 2020
@stale

stale bot commented Jul 1, 2020

This issue has been automatically closed because it has not had recent activity. Please comment "/reopen" to reopen it.

@stale stale bot closed this as completed Jul 1, 2020

5 participants