Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[feature] generic viewer operator for managing user webapps in Kubeflow #5681

Closed
Bobgy opened this issue May 19, 2021 · 14 comments
Closed

[feature] generic viewer operator for managing user webapps in Kubeflow #5681

Bobgy opened this issue May 19, 2021 · 14 comments
Labels
area/backend kind/feature lifecycle/stale The issue / pull request is stale, any activities remove this label. needs more info

Comments

@Bobgy
Copy link
Contributor

Bobgy commented May 19, 2021

Feature Area

/area backend

What feature would you like to see?

Besides tensorboard, KFP viewer controller supports generic viewers.

A viewer is a long running container that exposes a webapp through a certain port. (along with required setup to expose it through ingress, e.g. virtualservice in istio)
It can help visualize outputs of a pipeline component, but it can also be used outside of KFP like #5651.

There are a few different use-cases we are currently getting:

All of them fit into this category, that makes it seem like a generic viewer operator that only abstracts the part of setting up ingress and lifecycle control seems like a good fit. The specific configuration for each different type of service we want to expose can be configured by users of viewer CRD.

Strawman Proposal

A generic viewer CRD like the following:

apiVersion: pipelines.kubeflow.org/v1beta2
kind: Viewer
spec:
  ingress:
    type: istio.virtualservice # maybe we can have more type supports
  containers:
  - name: main
    image: tensorflow:2.3
    command: ['python3', '-m']
    arguments: ['tensorboard', '--port', '8080', '--bind-all']
    envs:
    - name: AWS_SECRET
      valueFrom:
      - xxxx
    port: 8080

This custom resource will be used to setup the webapp for external access with:

  • deployment
  • service
  • virtualservice
  • authorizationpolicy

The major value coming from the generic viewer operator is to unify the resources needed to make this webapp available to users securely. Also, when creating/deleting this custom resource, operator will make sure the group of resources are created/deleted/updated.

I think the major controversial things to discuss is whether the viewer should encode domain knowledge about each type of service to start up. With the number of different use-cases we have seen, sounds to me that we'd better leave those domain knowledge to a different layer of abstraction. Curious about how others think about that.

What is the use case or pain point?

This also helps mitigate the problem that Kubeflow community has two operators to support these features: kubeflow/kubeflow#5921.


Love this idea? Give it a 👍. We prioritize fulfilling features with the most 👍.

@Bobgy Bobgy changed the title [feature] generic viewer support [feature] generic viewer operator for managing user webapps in Kubeflow May 19, 2021
@davidspek
Copy link
Contributor

I think this would be a great step forward in terms of making it possible to easy run any type of web app with Kubeflow. The example resource from above would need some additional values of course, such as resource limits and requests, etc. Another important value that needs to be added is a type classification, so that the various web apps can filter through the different types of viewer (such as the Jupyter Web App and the Tensorboards Web App).

Another possibility is to adopt the newer (I think) CRD naming scheme, resulting on something similar to notebook.webapps.kubefow.org, tensorboard.webapps.kubeflow.org and filebrowser.webapps.kubeflow.org. Although I'm not sure if the same reconciliation code can be used for each type with this setup.

If going for the first option with a type that is specified in the resource, it would be a great feature to have this configurable for the cluster admin. This way, if they have some custom web app that they would like to deploy, the existing controller can be used. Another option would be to have the type not actually affect anything in the reconciliation loop, allowing any arbitrary value to be used.

@kimwnasptd
Copy link
Member

Thank you for starting this discussion @Bobgy! This is a feature that we had also been discussing for a long time kubeflow/kubeflow#3578 (comment), for Notebooks WG. Let my add some insights we had from running Notebooks, which could help us evaluate such a proposal.

The first thing I'd like to point out is we should consider exposing a PodTemplateSpec instead of ContainerSpec, in the CR's spec, since it will allow us to configure at least the following things:

  1. A PVC to be used with the app's pod
  2. The ServiceAccount
  3. Affinities and Tolerations

With the number of different use-cases we have seen, sounds to me that we'd better leave those domain knowledge to a different layer of abstraction.

I totally agree with that approach. If we'd like to include different apps then we should be thinking of only the common APIs that someone would need to configure for exposing such apps.

Then it will be up to another abstraction, for example a managing web app, tο craft CRs for specific applications [Jupyter, TensorBoard, Captum etc], as you mentioned.

@kimwnasptd
Copy link
Member

I'd also like to expose some necessary configurations we've seen for making the Notebook servers run under a prefix. Some apps will be expecting the requrests under the prefix path while some others will always expect requests under /.

The least necessary configuration for this will be the spec.http[i].rewrite.uri field in the VirtualService.

Also, we've seen applications, like RStudio, that require the prefix to be included in a custom header in each request.

The most flexible solution here would be to allow users to modify the entire spec of a VirtualService. But this will make it more involved to create such CRs, since the user will need to provide the Service name, gateways etc. in the spec.

@kimwnasptd
Copy link
Member

kimwnasptd commented May 25, 2021

Lastly, I'd also like to point out that an interesting feature would be to allow users to configure the replicas of the underlying Deployment.

This will essentially allow users to start/stop the underlying Pods, while still maintaining the CR.

So by taking all the above into consideration I'd propose the following iteration:

Strawman proposal v2

apiVersion: pipelines.kubeflow.org/v1beta2
kind: Viewer
spec:
  ingress:
    type: istio.virtualservice # maybe we can have more types
    pathRewrite: /
    httpHeaders:
    - name: X-Forwarded-Prefix
      value: /tensorboard/kubeflow/tb-instance
  replicas: 1
  template:
    spec:
      containers:
      - name: main
        image: tensorflow:2.3
        command: ['python3', '-m']
        arguments: ['tensorboard', '--port', '8080', '--bind-all']
        envs:
        - name: AWS_SECRET
          valueFrom:
          - xxxx
        port: 8080

Would really like to hear your feedback. Also I believe another useful thing to discuss is how to handle the ports the container exposes and the underlying Service. Should we take for granted that the Service will only be sending traffic to Pod's 8080 port?

@Bobgy
Copy link
Contributor Author

Bobgy commented May 27, 2021

That already looks great!
I think we can also use ports field instead, because Istio often require named ports to start with its protocol name.

Shall we put this into a design doc now?

@davidspek
Copy link
Contributor

I was just going to say the same thing about the ports. This will also be useful for pods that expose services on multiple ports (such as metrics for example).

@kimwnasptd
Copy link
Member

@Bobgy glad to hear!

I will start working on a first iteration of a design doc so that we can further iterate and evaluate some edge cases. Do you have a template for design docs in pipelines I should follow?

@Bobgy
Copy link
Contributor Author

Bobgy commented Jun 7, 2021

@kimwnasptd Great! I am not opinionated about a template, you can use whatever template you prefer.

@ca-scribner
Copy link
Contributor

Related to this, would it be helpful to think about enabling generic dashboards through this same method (I'm thinking a dash or flask dashbord, or maybe even r-shiny?). Don't want to overload it, but general dashboards are the other use case that feel similar to this. In some ways I guess it is a generic version of the tensorboard use case

@davidspek
Copy link
Contributor

@ca-scribner The viewer CRD would be the confidante for these dashboard applications as well. Creating a web app that allows you to launch dashboards is something I’ve discussed with multiple people recently and something I would like to add in the future. Some thought will need to go into the authorization policies, so that non-Kubeflow users can access the dashboards as well (probably by using an OIDC group). A similar functionality will probably be desired for KFServing endpoints as well.

@ca-scribner
Copy link
Contributor

ca-scribner commented Jun 14, 2021 via email

@davidspek
Copy link
Contributor

One thing that will need some careful consideration with the generic viewer is how to deal with RBAC permissions. For example, if you would want to allow a user to create tensorboards, but not a file browser instance. To support this I think it will be necessary to define multiple Kinds for the different viewers, but have them share (most of) the reconciliation loop. This then also allows for some domain specific implementations as well. Adding a layer of abstraction above this controller would probably require another controller, partially defeating the purpose of a single unified controller. The different specs would look similar to the following:

apiVersion: viewer.kubeflow.org/v1beta2
kind: Tensorboard
....
apiVersion: viewer.kubeflow.org/v1beta2
kind: Filebrowser
....

@stale
Copy link

stale bot commented Oct 2, 2021

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

@stale stale bot added the lifecycle/stale The issue / pull request is stale, any activities remove this label. label Oct 2, 2021
@stale
Copy link

stale bot commented Mar 3, 2022

This issue has been automatically closed because it has not had recent activity. Please comment "/reopen" to reopen it.

@stale stale bot closed this as completed Mar 3, 2022
KFP Runtime Triage automation moved this from Needs More Info to Closed Mar 3, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area/backend kind/feature lifecycle/stale The issue / pull request is stale, any activities remove this label. needs more info
Projects
Development

No branches or pull requests

6 participants