Update kubeflow/manifests to ship correct version of KFP in 0.7? 1.31 #2239

Closed
jlewi opened this issue Sep 25, 2019 · 39 comments · Fixed by kubeflow/manifests#424

Comments
@jlewi
Contributor

jlewi commented Sep 25, 2019

We are trying to finalize Kubeflow 0.7 by end of month.

Which version of KFP should be shipped in 0.7?

We are currently shipping KFP 0.1.23, which looks to be about a month old. It looks like there is a fairly recent release, 0.1.31: https://github.com/kubeflow/pipelines/releases

@IronPan @jessiezcc Should we ship 0.1.31 in 0.7? Are there additional improvements that we would like to ship in 0.7? If so do we have an ETA for when they will land?

@jlewi
Contributor Author

jlewi commented Sep 25, 2019

P0 because this is blocking 0.7.

@jessiezcc
Contributor

@jingzhang36, should we include the version API in the 0.7 release?

@jingzhang36
Contributor

@jlewi @jessiezcc We are working on a new release, 0.1.32. If it can be completed by the end of this month, let's use it. Otherwise, we can use 0.1.31. I'll update here once I have more information on 0.1.32.

@jingzhang36
Contributor

@jlewi @jessiezcc Update: (1) unfortunately we can't get the version API into Kubeflow 0.7. (2) I synced with Yuan on his license-related change, which doesn't need to be included in Kubeflow 0.7.

So in short, I believe Kubeflow 0.7 can just use KFP 0.1.31.

Thanks!

@jlewi
Contributor Author

jlewi commented Sep 26, 2019

Thanks @jingzhang36.

@jessiezcc @jingzhang36 Can someone from KFP update the KFP package in kubeflow/manifests to deploy 0.1.31?

@jingzhang36
Contributor

@jlewi two things: (1) does the Kubeflow side have any extra tests we need to run before bumping the KFP version? (2) I'll do the version change after getting final confirmation from Yang @IronPan.

@jingzhang36
Contributor

@jlewi @jessiezcc I did a quick estimate of the workload, and it turns out that this time we'll have to include much more than just the KFP server image change. Here are two lists of changes inside KFP that we need to sync to Kubeflow: https://github.com/kubeflow/pipelines/commits/master/manifests/kustomize and https://github.com/kubeflow/pipelines/commits/master?after=7735a1469478af0d3f5f24a5d0de908031175337+34&path%5B%5D=manifests.

Given the number of items that need to be synced, I'll have to put a very generous time estimate on this sync, especially since this is the first time I am doing it. Moreover, since this sync would be my second-priority item (the first is CI and the version API), it is unlikely that I can finish all the changes in the 4 days before the end of September. And from Sept 30 I'll be OOO for 8 days, until Oct 8.

@jlewi, do you have any concerns about this timeline?

@jlewi
Contributor Author

jlewi commented Sep 26, 2019

@jingzhang36 You might have to run make generate in kubeflow/manifests to regenerate the kustomize tests to pick up the updated image. I don't think we have any specific pipelines tests.

@jlewi
Contributor Author

jlewi commented Sep 26, 2019

If the update happens after Oct 8, it will likely miss the 0.7 release, which means we would just continue to ship 0.1.23.

@jlewi
Contributor Author

jlewi commented Sep 27, 2019

0.1.31 appears to be broken because of #2101

@jlewi
Contributor Author

jlewi commented Sep 27, 2019

I have a draft PR to upgrade to 0.1.31
kubeflow/manifests#392

It looks like the UI is unable to contact the metadata service for artifacts and executions. The JavaScript console shows errors like:

Cannot POST /ml_metadata.MetadataStoreService/GetArtifacts

In 0.1.31 is the pipelines UI using gRPC to contact the metadata service?

Do the manifests need to be updated to create a new virtual service for this endpoint?

Are we going to run into issues trying to communicate from the client to the backend using gRPC? I.e., what if the load balancer and ingress don't support gRPC?

@dushyanthsc @neuromage @rileyjbauer @zhenghuiwang can anyone shed some light here?

@dushyanthsc
Contributor

My understanding was that the UI should still be talking to kf-metadata, and the work to port the changes from pipelines is not done yet. Adding @rmgogogo and @Bobgy, who are also working on the metadata UI.

@neuromage
Contributor

In 0.1.31 is the pipelines UI using gRPC to contact the metadata service?

Yes, KFP has switched to using the MLMD gRPC server.

Do the manifests need to be updated to create a new virtual service for this endpoint?

@dushyanthsc has been working with @zhenghuiwang to update the manifests in KF Metadata so it follows the same setup as well.

Are we going to run into issues trying to communicate from the client to the backend using gRPC? i.e. what if the loadbalancer and ingress don't support gRPC?

Can you clarify this? Which load balancer and ingress are you referring to? Thanks.

@Bobgy
Contributor

Bobgy commented Sep 27, 2019

I'd like to confirm what we want to do in KF.

Will we still deploy a metadata-envoy server that proxies grpc-web requests to the MLMD gRPC server?

If my understanding is right, there won't be UI work for these porting tasks. Is that correct?

@dushyanthsc
Contributor

With this change, kubeflow/manifests#370, we are deploying mlmd-grpc (K8s Deployment & Service) and envoy (Deployment & Service) along with the kf-metadata server, which means we currently have two MLMD servers running (gRPC & kf-metadata).

@jlewi
Contributor Author

jlewi commented Sep 27, 2019

@neuromage the error I'm seeing is that the pipelines UI is unable to display artifacts and executions. The call to

POST /ml_metadata.MetadataStoreService

is failing.

I believe this request is being issued by client side Javascript.

This means the metadata service needs to be exposed through the cluster ingress and the ISTIO gateway; i.e., we probably need a virtual service that adds an appropriate mapping through the gateway.

What is the backend service this request is supposed to be routed to?

Is this request using gRPC or regular http?

If it's using gRPC, does that create a risk that the UI will break if people's networking setup doesn't support gRPC?

For example, I think the GCP L7 load balancer might have added gRPC support only recently.

@jessiezcc
Contributor

jessiezcc commented Sep 27, 2019

  • @Ark-kun is taking over the 0.7 release work on the pipelines side and will work on the manifest integration.

@Bobgy
Contributor

Bobgy commented Sep 28, 2019

@jlewi Let me try to explain how requests are routed. (Please correct me if there's something wrong/inaccurate)

EDIT: the following describes the KFP lite deployment; take it as a reference, but Kubeflow can use a different setup.

  1. The browser sends HTTP requests that encode gRPC data using the grpc-web library to the frontend node server.
  2. The node server proxies these requests to metadata-envoy (still HTTP).
  3. metadata-envoy converts the requests to native gRPC and proxies them to metadata-grpc-server.

Does the ISTIO gateway only need to expose services accessed from the public internet? If my understanding is correct, there's no risk: the browser still only talks to the frontend node server, and all the extra routing happens inside the cluster.
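
For orientation, here is a rough sketch of the two in-cluster Services this chain relies on. The service names follow what the KFP manifests use, but the ports and selectors below are assumptions for illustration, not the actual manifests:

    # Sketch only: ports and selectors are assumed for illustration.
    apiVersion: v1
    kind: Service
    metadata:
      name: metadata-envoy-service   # terminates grpc-web and proxies to native gRPC
    spec:
      selector:
        component: metadata-envoy
      ports:
        - name: md-envoy
          port: 9090
          targetPort: 9090
    ---
    apiVersion: v1
    kind: Service
    metadata:
      name: metadata-grpc-service    # the MLMD gRPC server itself
    spec:
      selector:
        component: metadata-grpc-server
      ports:
        - name: grpc-api
          port: 8080
          targetPort: 8080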

@Bobgy
Contributor

Bobgy commented Sep 28, 2019

With this change, kubeflow/manifests#370, we are deploying mlmd-grpc (K8s Deployment & Service) and envoy (Deployment & Service) along with the kf-metadata server, which means we currently have two MLMD servers running (gRPC & kf-metadata).

Maybe I'm missing something, but why do we still need kf-metadata?

@neuromage
Contributor

neuromage commented Sep 28, 2019

@jlewi

This means the metadata service needs to be exposed through the cluster ingress and ISTIO gateway; i.e. we probably need a virtual service adding an appropriate mapping through the gateway.

Sorry, but I'm not sure I understand. The KFP UI is proxying those requests to the gRPC backend via Envoy. By virtual service, do you mean the Envoy proxy config? If so, this is already set up.

What is the backend service this request is supposed to be routed to?

It goes to the gRPC MLMD server.

Is this request using gRPC or regular http?

It uses gRPC-Web

If it's using gRPC does that create a risk that the UI will break if people's networking setup doesn't support gRPC.

Sorry, I don't understand this. Whose UI setup would we break?

@neuromage
Contributor

@jlewi what client is sending this request? This setup is intended for the KFP UI.

@Bobgy
Contributor

Bobgy commented Sep 29, 2019

I realized there's a difference between the Kubeflow and KFP lite deployments. In KFP lite, there is no separate reverse proxy, so the frontend node server is used as a reverse proxy to route the different API requests to their correct services.

In Kubeflow, it looks like we can set up a global reverse proxy. Then we could also choose to let the requests go directly to metadata-envoy-service. (Both would work; I think we can choose either way. We need to add a route for either metadata-envoy-service or ml-pipeline-ui.)

Just adding my understanding here, I'm on vacation, so won't follow up further.

@jlewi
Contributor Author

jlewi commented Sep 29, 2019

@neuromage It's coming from the artifacts tab in the KFP UI.

@Ark-kun
Contributor

Ark-kun commented Sep 30, 2019

@neuromage @Bobgy Do you remember the last KFP version where the UI worked in both the lite and one-click deployments?

@Bobgy
Contributor

Bobgy commented Sep 30, 2019

@Ark-kun The UI in question here (the metadata UI; EDIT: the artifact and execution list UI in KFP) is newly implemented in the KFP UI for KFP lite. It never worked before in Kubeflow.

@jlewi
Contributor Author

jlewi commented Sep 30, 2019

@Bobgy just to clarify, I'm talking about the KFP UI. Here are two screenshots:

Clicking the artifacts tab in the pipelines UI shows error code 5:

[screenshot: pipelines_artifacts]

Looking at the JS network tab, I see a 404 to
https://jlewi-0926-002.endpoints.jlewi-dev.cloud.goog/ml_metadata.MetadataStoreService/GetArtifacts

Here's the executions tab:

[screenshot: pipelines_executions]

The developer console's network tab shows a 404 to
https://jlewi-0926-002.endpoints.jlewi-dev.cloud.goog/ml_metadata.MetadataStoreService/GetExecutions

Which service are these requests supposed to be routed to?

It looks like these are grpc-web requests?

My guess is that these should be directed to metadata-grpc-service. Since it's using gRPC-Web, we need to leverage Envoy to do HTTP-to-gRPC proxying.

Since Kubeflow is using ISTIO, we should be able to configure ISTIO to do the HTTP-to-gRPC proxying for metadata rather than explicitly installing a separate Envoy proxy.

I found this blog post.
https://venilnoronha.io/seamless-cloud-native-apps-with-grpc-web-and-istio

The standalone metadata UI looks fine.

My conjecture is that the standalone (separate from KFP) metadata UI is still using HTTP to contact the metadata service.

I'm guessing the latest release of KFP was updated to use gRPC-Web in its JavaScript client. However, the corresponding ISTIO changes to properly proxy gRPC-Web to gRPC are missing in Kubeflow, and that's why the Artifacts and Executions UIs in KFP aren't working.

@jlewi
Contributor Author

jlewi commented Sep 30, 2019

To reproduce this, follow these instructions to deploy a Kubeflow cluster with kubeflow/manifests#392:

  1. Clone kubeflow/manifests#392 (Update pipelines manifest to ship 0.1.31)
  2. Edit the kfctl_gcp_iap.yaml definition
  3. Use kfctl to deploy that kfctl_gcp_iap.yaml spec

If you make changes to kubeflow/manifests, you can update ${KFAPP}/kustomize by doing the following:

  1. Delete the cache

    rm -rf ${KFAPP}/.cache
    
  2. Rerun generate

    kfctl generate all -V
    

@Bobgy
Contributor

Bobgy commented Sep 30, 2019

Sorry for mentioning "metadata UI"; it's a confusing term in this context. I meant exactly the artifact and execution tabs in the KFP UI.

I haven't worked on metadata UI before, so I don't know much about it.

@jlewi, I think you are right about the other parts.

Using ISTIO as the gRPC-Web proxy sounds like a reasonable solution to me too.

@quanjielin

/cc @quanjielin

@zhenghuiwang

zhenghuiwang commented Sep 30, 2019

As discussed offline, we may want to route ml_metadata.MetadataStoreService to metadata-envoy-service via a virtual service. Similar to
https://github.com/kubeflow/manifests/blob/master/metadata/overlays/istio/virtual-service.yaml
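
A minimal sketch of what such a VirtualService might look like; the path prefix comes from the failing requests above, while the gateway, namespace, and port number are assumptions for illustration rather than the actual change in kubeflow/manifests#424:

    # Sketch only: gateway, namespace, and port number are assumed.
    apiVersion: networking.istio.io/v1alpha3
    kind: VirtualService
    metadata:
      name: metadata-grpc
      namespace: kubeflow
    spec:
      gateways:
        - kubeflow-gateway
      hosts:
        - '*'
      http:
        - match:
            - uri:
                prefix: /ml_metadata.MetadataStoreService/
          route:
            - destination:
                host: metadata-envoy-service.kubeflow.svc.cluster.local
                port:
                  number: 9090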

@neuromage
Contributor

Sent out kubeflow/manifests#424 to hopefully fix this.

@neuromage
Contributor

Keeping this open until we've verified the fix with Jeremy's changes as well.

@jlewi
Contributor Author

jlewi commented Oct 4, 2019

jlewi pushed a commit to jlewi/manifests that referenced this issue Oct 4, 2019
* Update the images to 0.1.31

* We need to include the pipeline-visualization-service

Related to kubeflow/pipelines#2239

Use a config map to specify the config for the ml-pipeline backend server
  * Currently it is hardcoded to a config file baked into the Docker image.
  * Hardcoding a config into the Dockerfile generally doesn't seem like a good
    idea since it's not easy to configure.
  * Per kubeflow/pipelines#2101 the hard-coded config is actually broken: it
    is not specifying the hostname and port of the visualization server.
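
As a rough illustration of the config-map approach described in that commit message (the key names and values below are assumptions for illustration, not the real config schema):

    # Sketch only: key names and values are illustrative assumptions.
    apiVersion: v1
    kind: ConfigMap
    metadata:
      name: ml-pipeline-config
      namespace: kubeflow
    data:
      config.json: |
        {
          "ML_PIPELINE_VISUALIZATIONSERVER_SERVICE_HOST": "ml-pipeline-visualizationserver",
          "ML_PIPELINE_VISUALIZATIONSERVER_SERVICE_PORT": "8888"
        }

The idea is to mount something like this into the ml-pipeline API server pod in place of the config file baked into the image, so the visualization server's hostname and port can be changed without rebuilding the image.
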
@jlewi
Contributor Author

jlewi commented Oct 4, 2019

  • I hit one issue with kubeflow/manifests#424 (Metadata fix): the newly defined metadata-grpc virtual service wasn't included in the kustomization.yaml file (see the sketch at the end of this comment).

  • Even after making that change, it's still not working for me.

The JS developer console shows a 503 for the following URL:
https://jlewi-1003-kfp-001.endpoints.jlewi-dev.cloud.goog/ml_metadata.MetadataStoreService/GetArtifacts
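
If the missing kustomization entry was indeed the problem, the fix would roughly amount to listing the new virtual service in the package's kustomization.yaml, along these lines (the file names are assumptions):

    # Sketch only: file names are assumed.
    apiVersion: kustomize.config.k8s.io/v1beta1
    kind: Kustomization
    resources:
      - metadata-grpc-virtual-service.yaml   # the newly defined VirtualService
      - virtual-service.yaml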

@jlewi
Contributor Author

jlewi commented Oct 5, 2019

Looks like it was an issue with the port; kubeflow/manifests#430 should be updated.

jlewi pushed a commit to jlewi/manifests that referenced this issue Oct 6, 2019 (same commit message as above)

k8s-ci-robot pushed a commit to kubeflow/manifests that referenced this issue Oct 7, 2019 (same commit message as above)

@jlewi
Contributor Author

jlewi commented Oct 8, 2019

Manifests have been merged.

@Ark-kun
Contributor

Ark-kun commented Oct 10, 2019

Is there a 0.7 branch? Or is master considered to be 0.7?
