permission error when trying to run pipeline on kubeflow #1479

Closed
ran-haim opened this issue Nov 8, 2021 · 20 comments

@ran-haim

ran-haim commented Nov 8, 2021

Hi,
I am trying to run the demo notebook sklearn-project on a local kubernetes.
I have installed kubeflow.

I get this error when trying to send the pipeline to the api server:
400 Client Error: Bad Request for url: http://mlrun-api:8080/api/projects/sk-project/pipelines?namespace=mlrun&experiment=sk-project-main: details: {'reason': 'MLRunBadRequestError("Failed creating pipeline: HTTPConnectionPool(host='ml-pipeline.mlrun.svc.cluster.local', port=8888): Max retries exceeded with url: /apis/v1beta1/experiments (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0x7f9f94278ed0>: Failed to establish a new connection: [Errno -2] Name or service not known'))")'}

I noticed it tried to call the ml-pipeline on the wrong namespace (it uses the one mlrun is installed on).
I changed the namespace to "kubeflow" and now I get this error:
400 Client Error: Bad Request for url: http://mlrun-api:8080/api/projects/sk-project/pipelines?namespace=kubeflow&experiment=sk-project-main: details: {'reason': 'MLRunBadRequestError('Failed creating pipeline: (400)\nReason: Bad Request\nHTTP response headers: HTTPHeaderDict({\'content-type\': \'application/json\', \'date\': \'Mon, 08 Nov 2021 13:40:04 GMT\', \'content-length\': \'708\', \'x-envoy-upstream-service-time\': \'1\', \'server\': \'istio-envoy\', \'x-envoy-decorator-operation\': \'ml-pipeline.kubeflow.svc.cluster.local:8888/*\'})\nHTTP response body: {"error":"Validate experiment request failed.: Invalid input error: Invalid resource references for experiment. Expect one namespace type with owner relationship. Got: []","code":3,"message":"Validate experiment request failed.: Invalid input error: Invalid resource references for experiment. Expect one namespace type with owner relationship. Got: []","details":[{"@type":"type.googleapis.com/api.Error","error_message":"Invalid resource references for experiment. Expect one namespace type with owner relationship. Got: []","error_details":"Validate experiment request failed.: Invalid input error: Invalid resource references for experiment. Expect one namespace type with owner relationship. Got: []"}]}\n')'}

Also keep in mind that the ml-pipeline service is installed in the kubeflow namespace, but I probably need to create the experiment in the kubeflow-user-example-com namespace (the default example user namespace created when installing Kubeflow).

In any case - what am I doing wrong?
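
For readers following along, the two failing hosts above differ only in the namespace segment of the Service DNS name. Here is a minimal sketch of how an in-cluster address is composed from a Service name and a namespace. The helper name `service_host` is made up for illustration and is not an MLRun or Kubeflow function; the service name, namespaces, and port are taken from the error messages above.

```python
def service_host(service: str, namespace: str, port: int) -> str:
    """Compose the in-cluster DNS address of a Kubernetes Service.

    Hypothetical helper for illustration only.
    """
    return f"{service}.{namespace}.svc.cluster.local:{port}"


# Namespace MLRun is installed in: the first error shows this name does not resolve.
print(service_host("ml-pipeline", "mlrun", 8888))

# Namespace where the ml-pipeline service is actually installed.
print(service_host("ml-pipeline", "kubeflow", 8888))
```

This is why changing the namespace argument changes which host is contacted: the same value ends up in the DNS name of the ml-pipeline service.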

@Hedingber
Contributor

Hi @ran-haim
Thanks for reaching out!
Which version of Kubeflow Pipelines are you using?

@Hedingber Hedingber self-assigned this Nov 8, 2021
@ran-haim
Author

ran-haim commented Nov 8, 2021

I installed 1.4 using this guide:
https://github.com/kubeflow/manifests/tree/v1.4.0

@Hedingber
Contributor

I see. We use 1.0.4. Semver-wise 1.4.0 should be compatible, but it seems something changed in the schema there. I suggest you downgrade to 1.0.4.

@Hedingber
Contributor

In the meantime I'll try to bring up 1.4.0 myself and see if this is an easy fix.

@ran-haim
Author

ran-haim commented Nov 8, 2021

The Kubeflow version is 1.4, but it seems that the Pipelines version is actually 1.7.
Should I try it with Kubeflow 1.0.4?

I thought this was due to some permissions problem. Shouldn't I give mlrun permissions to the "kubeflow-user-example-com" namespace?

@Hedingber
Contributor

Oh yeah, it might be. Now I get what's behind the "Expect one namespace type with owner relationship" error. Yeah, it's worth trying to give the permissions.

@ran-haim
Author

ran-haim commented Nov 8, 2021

I am new to Kubeflow and MLRun - how do I do that?
Also, can I tell the MLRun API server to go to ml-pipeline.kubeflow.svc.cluster.local, but use the "kubeflow-user-example-com" namespace for the pipeline?

@ran-haim
Author

ran-haim commented Nov 9, 2021

Still not able to get it to work. Were you able to do it?
I also still don't get how to specify one namespace for the ml-pipeline service and a different namespace in which the experiment should run.

Is there a guide for kubeflow integration I am missing?

@Hedingber
Contributor

Hedingber commented Nov 9, 2021

Try this in order to add permissions to the mlrun-api service:

  1. kubectl -n mlrun get role mlrun-api -o yaml > cluster-role.yaml
  2. Edit the cluster-role.yaml file - Change the kind: Role to kind: ClusterRole
  3. Execute kubectl apply -f cluster-role.yaml
  4. kubectl -n mlrun get rolebinding mlrun-api -o yaml > cluster-role-binding.yaml
  5. Edit the cluster-role-binding.yaml file - change kind: RoleBinding to kind: ClusterRoleBinding, change kind: Role (in the roleRef) to kind: ClusterRole, and add namespace: mlrun to the mlrun-api subject
  6. Execute kubectl apply -f cluster-role-binding.yaml
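
The manual edits in steps 2 and 5 can be sketched as plain dictionary transformations. This is only an illustration under stated assumptions: the manifests are assumed to be already parsed into Python dicts (YAML loading is left out), the `rules` content is a placeholder rather than MLRun's real role, and the function names are hypothetical. Stripping `namespace` and the server-populated metadata fields is not part of the steps above; it is added here because ClusterRole and ClusterRoleBinding are cluster-scoped objects.

```python
import copy

# Metadata tied to the namespaced original (assumed list, adjust to your export).
STALE_METADATA = ("namespace", "resourceVersion", "uid", "creationTimestamp")


def _clean_metadata(obj: dict) -> None:
    """Drop metadata fields that should not be re-applied."""
    for field in STALE_METADATA:
        obj["metadata"].pop(field, None)


def to_cluster_role(role: dict) -> dict:
    """Step 2: return a ClusterRole manifest built from a Role manifest."""
    out = copy.deepcopy(role)
    out["kind"] = "ClusterRole"
    _clean_metadata(out)
    return out


def to_cluster_role_binding(binding: dict, subject_namespace: str) -> dict:
    """Step 5: return a ClusterRoleBinding built from a RoleBinding manifest."""
    out = copy.deepcopy(binding)
    out["kind"] = "ClusterRoleBinding"
    _clean_metadata(out)
    out["roleRef"]["kind"] = "ClusterRole"
    # Subjects in a cluster-wide binding must name their namespace explicitly.
    for subject in out.get("subjects", []):
        if subject.get("kind") == "ServiceAccount":
            subject["namespace"] = subject_namespace
    return out


# Placeholder manifests shaped like the output of `kubectl get ... -o yaml`.
role = {
    "apiVersion": "rbac.authorization.k8s.io/v1",
    "kind": "Role",
    "metadata": {"name": "mlrun-api", "namespace": "mlrun"},
    "rules": [{"apiGroups": [""], "resources": ["pods"], "verbs": ["get", "list"]}],
}
binding = {
    "apiVersion": "rbac.authorization.k8s.io/v1",
    "kind": "RoleBinding",
    "metadata": {"name": "mlrun-api", "namespace": "mlrun"},
    "roleRef": {
        "apiGroup": "rbac.authorization.k8s.io",
        "kind": "Role",
        "name": "mlrun-api",
    },
    "subjects": [{"kind": "ServiceAccount", "name": "mlrun-api"}],
}

cluster_role = to_cluster_role(role)
cluster_role_binding = to_cluster_role_binding(binding, "mlrun")
print(cluster_role["kind"], cluster_role_binding["kind"])
```

Note that a ClusterRoleBinding grants the role's permissions in every namespace, which is broader than what this issue strictly needs.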

@ran-haim
Author

ran-haim commented Nov 9, 2021

nope.
same error.

@ran-haim
Author

ran-haim commented Nov 9, 2021

I think it has something to do with the same namespace being used both to locate the ml-pipeline service and to run the experiment.
Don't know how to change it though...

@Hedingber
Contributor

I see. When you call skproj.run() (to run the workflow) you can pass it a namespace. Can you try that?

@ran-haim
Author

Yes, I know - but as I said, if I use namespace="kubeflow" in skproj.run(), it does find the ml-pipeline service, but I think it is also trying to create the pipeline in that namespace, when it should be "kubeflow-user-example-com".

@Hedingber
Contributor

OK, and what if you put kubeflow-user-example-com there?

@ran-haim
Author

ran-haim commented Nov 10, 2021

Then we go back to the original problem, where it tries to call ml-pipeline and cannot find it because of the domain name...

@Hedingber
Contributor

OK, I see.
Indeed, it looks like we currently don't support setting different namespaces for the two. As you can see here, we create the kfp Client with the namespace you give, which should be the namespace of the ml-pipeline service, so kubeflow.
But then, when we create the experiment, we don't pass a namespace, as you can see here. This is OK for a single-user deployment, which is what we use in the enterprise version of MLRun. You're probably installing it in multi-user mode, meaning the namespace for the experiment should be different, which is currently not supported.
I suggest trying the Kubeflow Pipelines standalone deployment.
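
To make the two roles of "namespace" concrete, here is a rough sketch of an experiment request that would carry the owner reference the error message asks for. It is not MLRun's code. The field names follow my reading of the Kubeflow Pipelines v1beta1 API and should be checked against your deployed version, and the function name is hypothetical. A multi-user deployment may also enforce authentication on the request, which this sketch does not cover.

```python
import json

# Namespace hosting the ml-pipeline service (used only to build the host name).
SERVICE_NAMESPACE = "kubeflow"
# Namespace that should own the experiment in a multi-user deployment.
EXPERIMENT_NAMESPACE = "kubeflow-user-example-com"


def experiment_request(name: str, experiment_namespace: str) -> dict:
    """Build an experiment body with one NAMESPACE reference marked as OWNER.

    Hypothetical helper; field names are assumed from the v1beta1 API.
    """
    return {
        "name": name,
        "resource_references": [
            {
                "key": {"type": "NAMESPACE", "id": experiment_namespace},
                "relationship": "OWNER",
            }
        ],
    }


host = f"ml-pipeline.{SERVICE_NAMESPACE}.svc.cluster.local:8888"
body = experiment_request("sk-project-main", EXPERIMENT_NAMESPACE)

# Only prints what would be sent; nothing is posted to a cluster here.
print(f"POST http://{host}/apis/v1beta1/experiments")
print(json.dumps(body, indent=2))
```

As far as I know, the kfp 1.x SDK exposes the same idea through a namespace argument on Client.create_experiment, but verify that against the SDK version you have installed.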

@ran-haim
Author

OK, that makes sense given the error I got.
I will test it in single-user mode.

@Hedingber
Contributor

Hi @ran-haim, any updates?

@Hedingber
Contributor

No response, closing.
Feel free to reopen if you need more help.

@ran-haim
Author

Sorry for not responding.
Basically running in single user mode isn't relevant for us, so I just abandoned it.
