-
Notifications
You must be signed in to change notification settings - Fork 1.6k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Update kubeflow/manifests to ship correct version of KFP in 0.7? 1.31 #2239
Comments
P0 because blocking 0.7. |
@jingzhang36, should we include version API to .7 release? |
@jlewi @jessiezcc We are working on a new release 0.1.32. If it can be completed by end of this month, let's use it. Otherwise, we can use 0.1.31. I'll update soon after I get more information on 0.1.32. |
@jlewi @jessiezcc Update: (1) we can't make version API to kubeflow 0.7 unfortunately. (2) synced with Yuan on his license related change, which doesn't need to be included in kubeflow 0.7. So in short, I believe kubeflow 0.7 could just use kfp 0.1.31. Thanks! |
Thanks @jingzhang36. @jessiezcc @jingzhang36 @jessiezcc Can someone from KFP update the KFP package in kubeflow/manifests to deploy 0.1.31? |
@jlewi @jessiezcc I did a quick estimation of the workload and it turns out that this time we'll have to include much more than just kfp server image change. Here are two lists of changes inside KFP that we need to sync to Kubeflow https://github.com/kubeflow/pipelines/commits/master/manifests/kustomize and https://github.com/kubeflow/pipelines/commits/master?after=7735a1469478af0d3f5f24a5d0de908031175337+34&path%5B%5D=manifests. Given the number of items that are needed to be synced, I'll have to put a very generous time estimation on this sync, especially when this is the first time I am doing it. Moreover, since this sync would be my second priority item (the first is CI and version API), it is unlikely that I can finish all the changes in 4 days before the end of Sept. And from Sept 30, I'll be on an 8 days OOO until Oct 8. @jlewi , do you have any concern on the timeline? |
@jingzhang36 You might have to run make generate in kubeflow/manifests to regenerate the kustomize tests to pick up the updated image. I don't think we have any specific pipelines tests. |
If the update happens after Oct 8 it will likely miss the 0.7 release. Which means we would just continue to ship 0.1.23. |
0.1.31 appears to be broken because of #2101 |
I have a draft PR to upgrade to 0.1.31 It looks like the UI is unable to contact the metadata service for artifacts and executions. Javascript console shows errors like
In 0.1.31 is the pipelines UI using gRPC to contact the metadata service? Do the manifests need to be updated to create a new virtual service for this endpoint? Are we going to run into issues trying to communicate from the client to the backend using gRPC? i.e. what if the loadbalancer and ingress don't support gRPC? @dushyanthsc @neuromage @rileyjbauer @zhenghuiwang can anyone shed some light here? |
Yes, KFP has switched to using the MLMD gRPC server.
@dushyanthsc has been working with @zhenghuiwang to update the manifests in KF Metadata so it follows the same setup as well.
Can you clarify this? Which loadbalance and ingress point are you referring to? Thanks. |
I'd like to confirm what we want to do in KF. Will we still deploy a metadata-envoy server that proxies from grpc-web to MLMD gRPC server? If my understanding is right, there won't be UI work for these porting tasks. Is that correct? |
With this change - kubeflow/manifests#370 we are deploying - mlmd-grpc (( K8-deployment & service) and envoy(deployment & service) along with the kf-metadata server which means we currently have two mlmd servers running (gRPC & kf-metadata). |
@neuromage the error I'm seeing is that the pipelines UI I unable to display artifacts and executions. The call to POST /ml_metadata.MetadataStoreService Is failing. I believe this request is being issued by client side Javascript. This means the metadata service needs to be exposed through the cluster ingress and ISTIO gateway; i.e. we probably need a virtual service adding an appropriate mapping through the gateway. What is the backend service this request is supposed to be routed to? Is this request using gRPC or regular http? If it's using gRPC does that create a risk that the UI will break if people's networking setup doesn't support gRPC. For example I think GCP L7 load balancer might have added gRPC support only recently. |
|
@jlewi Let me try to explain how requests are routed. (Please correct me if there's something wrong/inaccurate) EDIT: the following is about KFP lite deployment, take it as a reference, but Kubeflow can use a different set up. Does ISTIO gateway only need to expose those accessed from public? If my understanding is correct, there's no risk. Browser still only talks to the frontend node server. All the extra routing happens in the cluster. |
Maybe I'm missing something, why do we still need kf-metadata? |
Sorry, but I'm not sure I understand. The KFP UI is proxying those requests to the gRPC backend via Envoy. By virtual service, do you mean the Envoy proxy config? If so, this is already set up.
It goes to the gRPC MLMD server.
It uses gRPC-Web
Sorry, I don't understand this. Whose UI set up would we break? |
@jlewi what client is sending this request? This setup is intended for the KFP UI. |
Realized difference between Kubeflow and KFP lite deployment. In KFP lite, we don't have a reverse proxy, so the frontend node server is used as a reverse proxy to route different API requests to their correct services. In Kubeflow, looks like we can set up a global reverse proxy. Then we can also choose to let the request directly go to metadata-envoy-service too. (Both would work, I think we can choose either way. We need to add a route for either metadata-envoy-service or ml-pipeline-ui.) Just adding my understanding here, I'm on vacation, so won't follow up further. |
@neuromage Is coming from the artifacts tab in the KFP UI. |
@neuromage @Bobgy Do you remember what was the last KFP version where the UI worked in both lite and one-click deployments? |
@Ark-kun The problem described here (metadata UI, EDIT, artifact and execution list UI in KFP) is newly implemented in KFP UI for KFP lite. It never worked before in Kubeflow. |
@Bobgy just to clarify I'm talking about the KFP UI. Here are two screenshots Clicking the artifacts tab in pipelines UI shows error code 5 Looking at the JS network I see a 404 to The developer console at the JS network shows a 404 to Which service are these requests supposed to be routed to? It looks like these are grpc-web requests? My guess is that these should be directed to the metadata-grpc-service. Since its using gRPC-web we need to leverage envoy to do http to gRPC proxying. Since Kubeflow is using ISTIO we should be able to configure ISTIO to do the http to gRPC proxying for metadata rather then explicitly installing a separate envoy proxy. I found this blog post. The standalone metadata UI looks fine. My conjecture is that the standalone (separate from KFP) metadata UI is still using http to contact the metadata service. I'm guessing the latest release of KFP was updated to use gRPC-web in its javascript client. The corresponding ISTIO changes to properly proxy web to gRPC however are missing in Kubeflow and that's why the Artifact and Executions UIs in KFP aren't working. |
To reproduce this follow these instructions to deploy a kubeflow cluster with kubeflow/manifests#392
If you make changes to kubeflow/manifests you can update ${KFAPP}/kustomize by doing
|
Sorry for mentioning metadata UI, it's a confusing term in this context. I meant exactly artifact and execution tabs in KFP UI. I haven't worked on metadata UI before, so I don't know much about it. @jlewi, I think you are right with other parts. and using ISTIO for grpcweb proxy sounds like a reasonable solution to me too. |
/cc @quanjielin |
As discussed offline, we may want to route |
Sent out kubeflow/manifests#424 to hopefully fix this. |
Keeping this open until we've verified the fix with Jeremy's changes as well. |
Update:
|
* Update the images to 0.1.31 * We need to include the pipeline-visualization-service Related to kubeflow/pipelines#2239 Use a config map to specify the config for the ml-pipeline backend server * Currently it is hardcoded to a config file baked into the docker image. * Hardcoding a config into the dockerfile generally doesn't seem like a good idea since its not easy to configure. * Per kubeflow/pipelines#2101 the hard coded config is actually broken it is not specifying the hostname and port of the visualization server.
JS developers console shows a 503 for the following URL |
Looks like it was an issue with the port; kubeflow/manifests#430 should be updated. |
* Update the images to 0.1.31 * We need to include the pipeline-visualization-service Related to kubeflow/pipelines#2239 Use a config map to specify the config for the ml-pipeline backend server * Currently it is hardcoded to a config file baked into the docker image. * Hardcoding a config into the dockerfile generally doesn't seem like a good idea since its not easy to configure. * Per kubeflow/pipelines#2101 the hard coded config is actually broken it is not specifying the hostname and port of the visualization server.
* Update the images to 0.1.31 * We need to include the pipeline-visualization-service Related to kubeflow/pipelines#2239 Use a config map to specify the config for the ml-pipeline backend server * Currently it is hardcoded to a config file baked into the docker image. * Hardcoding a config into the dockerfile generally doesn't seem like a good idea since its not easy to configure. * Per kubeflow/pipelines#2101 the hard coded config is actually broken it is not specifying the hostname and port of the visualization server.
Manifests have been merged. |
Is there a 0.7 branch? In the master considered to be the 0.7? |
We are trying to finalize Kubeflow 0.7 by end of month.
Which version of KFP should be shipped in 0.7.
We are currently shipping KFP 0.1.23.
It looks like this is about 1 month old.
It looks like there was a fairly recent release
0.1.31
https://github.com/kubeflow/pipelines/releases
@IronPan @jessiezcc Should we ship 0.1.31 in 0.7? Are there additional improvements that we would like to ship in 0.7? If so do we have an ETA for when they will land?
The text was updated successfully, but these errors were encountered: