Re-Introducing the Volumes Viewer #6876

Merged
merged 47 commits into from Jun 14, 2023

Conversation

TobiasGoerke
Contributor

@TobiasGoerke TobiasGoerke commented Jan 5, 2023

Volumes Viewer

About two years ago, @davidspek proposed a pvcviewer. We've been using and enjoying this feature quite a lot since then and I'm convinced this feature needs to find its way to Kubeflow's core.
Unfortunately, the PR went stale and was never merged, even though it sparked serious interest in the community.

This is my attempt to move forward with this feature, making it available for all users.

PVCViewer

Pre-Existing Work

@davidspek provided a functional prototype of the volumes viewer. His work comprises a pvcviewer resource definition, a resource controller and changes to the volumes UI.
His changes were already partially merged into the volumes UI but are only available in the Rok distribution.

The initial controller was based off the tensorboard controller and was WIP in many regards.

My Work

I've re-implemented the volumes viewer, and made it compatible with the current volumes UI's master.

Also, the controller has been re-written from scratch and is well-tested. It now comprises new features, such as restarting pods on RWO-Nodename changes.
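
As an illustration of the RWO handling mentioned here, the following is a minimal sketch of the idea only, not the controller's actual code (the controller is a separate implementation and its internals are not shown in this thread). It assumes pods are available as plain dicts shaped like Kubernetes Pod objects, that the caller passes only pods other than the viewer itself, and that a preferred (rather than required) node affinity on the well-known `kubernetes.io/hostname` label is used. All function names are hypothetical.

```python
from typing import List, Optional, Set

HOSTNAME_LABEL = "kubernetes.io/hostname"  # well-known Kubernetes node label


def nodes_using_claim(pods: List[dict], claim_name: str) -> Set[str]:
    """Return the nodes running (or about to run) pods that mount the PVC.

    Assumption: `pods` excludes the viewer's own pod.
    """
    nodes: Set[str] = set()
    for pod in pods:
        node = pod.get("spec", {}).get("nodeName")
        phase = pod.get("status", {}).get("phase")
        if not node or phase not in ("Running", "Pending"):
            continue
        for volume in pod.get("spec", {}).get("volumes", []):
            pvc = volume.get("persistentVolumeClaim")
            if pvc and pvc.get("claimName") == claim_name:
                nodes.add(node)
    return nodes


def preferred_affinity(nodes: Set[str]) -> Optional[dict]:
    """Build an affinity stanza that steers the viewer toward the given nodes."""
    if not nodes:
        return None  # volume is not mounted anywhere: no constraint needed
    return {
        "nodeAffinity": {
            "preferredDuringSchedulingIgnoredDuringExecution": [
                {
                    "weight": 100,
                    "preference": {
                        "matchExpressions": [
                            {
                                "key": HOSTNAME_LABEL,
                                "operator": "In",
                                "values": sorted(nodes),
                            }
                        ]
                    },
                }
            ]
        }
    }


def needs_restart(viewer_node: Optional[str], nodes: Set[str], restart: bool) -> bool:
    """Restart only if enabled and the volume is in use on a different node."""
    return restart and bool(nodes) and viewer_node not in nodes
```

With this shape, a viewer running on `node-b` while another pod mounts the same RWO volume on `node-a` would be flagged for a restart, so that the affinity can be recomputed and the viewer can move.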

API

I've included an example of a viewer object. The API is similar to what @kimwnasptd described in another thread.
Creating a VolumesViewer currently looks like this (comments included explaining its functionality).

apiVersion: kubeflow.org/v1alpha1
kind: VolumesViewer
metadata:
  name: volumesviewer-sample
  namespace: kubeflow-user-example-com
spec:
  # The podTemplate is applied to the deployment.Spec.Template.Spec
  # and thus, represents the core viewer's application
  podTemplate:
    #...
  service:
    # Specifies the application's target port used by the Service
    targetPort: 8080
    # If defined, an istio VirtualService is created, pointing to the Service
    virtualService:
      # The base prefix is suffixed by '/namespace/name' to create the
      # VirtualService's prefix and a unique URL for each started viewer
      basePrefix: "/volumesviewer"
      # You may specify the VirtualService's rewrite.
      # If not set, the prefix's value is used
      rewrite: "/"
      # By default, no timeout is set
      # timeout: 30s
  rwoScheduling:
    # If set to true, the controller detects RWO-Volumes referred to by the
    # podTemplate and uses affinities to schedule the viewer to nodes
    # where the volume is currently mounted. This enables the viewer to
    # access RWO-Volumes, even though they might already be mounted.
    enabled: true
    # Using the rwoScheduling feature, the viewer might block other applications
    # from (re-)starting. Setting restart to true instructs the controller to
    # re-compute the affinity in case Pods start using the viewer's RWO-Volumes.
    # Thus, the viewer might restart on another node without blocking new Pods.
    restart: true
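
To make the `basePrefix` and `rewrite` comments in the example concrete, here is a small sketch of how the per-viewer URL prefix could be derived. It follows only what the comments state (prefix = base prefix + '/namespace/name', rewrite defaults to the prefix). The function name is hypothetical, and whether the real controller appends a trailing slash is not specified here.

```python
from typing import Optional, Tuple


def virtual_service_paths(
    base_prefix: str, namespace: str, name: str, rewrite: Optional[str] = None
) -> Tuple[str, str]:
    """Return (prefix, rewrite) for a viewer's VirtualService.

    The prefix is unique per viewer because it embeds namespace and name.
    If no rewrite is given, the prefix's value is used, as the comments say.
    """
    prefix = f"{base_prefix.rstrip('/')}/{namespace}/{name}"
    return prefix, (rewrite if rewrite is not None else prefix)


prefix, rewrite = virtual_service_paths(
    "/volumesviewer", "kubeflow-user-example-com", "volumesviewer-sample", rewrite="/"
)
print(prefix)   # /volumesviewer/kubeflow-user-example-com/volumesviewer-sample
print(rewrite)  # /
```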

How to install

Using the current kubeflow master:

  1. Install the kustomize applications
    • kustomize build components/crud-web-apps/volumes/manifests/overlays/istio | kubectl apply -f -
    • kustomize build components/volumes-viewer/config/overlays/kubeflow | kubectl apply -f -
  2. Build the container images and set them accordingly (alternatively, use my prebuilt images):
    • kubectl -n kubeflow set image deploy/volumes-web-app-deployment volumes-web-app=tobiasgoerke/kubeflow-volumes-web-app:test-v3
    • kubectl -n kubeflow set image deploy/volumes-viewer-controller-manager manager=tobiasgoerke/kubeflow-volumes-viewer:test-v3

Outlook and Future Work

  • There is a generic viewer controller managed by the pipelines WG. The volumes viewer would integrate into this controller nicely. However, attempts to integrate the viewer have failed, and it seems the viewer itself is not moving forward either.
    Thus, I've decided to create a new controller (like @davidspek did in his PR) and call the integration into the pipelines WG optional and out-of-scope. In case this PR gets merged, I'll rename the VolumesViewer controller and push for it to be accepted as a generic viewer. This would enable the volumes viewer, notebooks, tensorboard etc. to use a common implementation and controller. For this PR, the CRD and controller could be dropped then, leaving us with only changes to the volumes UI. I've designed this PR with this possible step in mind so that a generic viewer would only require one file to change.
  • I've created a PR in the filebrowser project which enables Filebrowser to support the tus.io protocol for resumable and chunked uploads. This comes in very handy with big uploads that may get disrupted or proxies that block large requests. This PR is currently in review.

@TobiasGoerke TobiasGoerke force-pushed the feature/volumes-viewer branch 2 times, most recently from db4f0cc to b976d4e Compare January 6, 2023 09:57
@TobiasGoerke TobiasGoerke marked this pull request as ready for review January 6, 2023 10:24
@TobiasGoerke TobiasGoerke changed the title [WIP] Re-Introducing the Volumes Viewer Re-Introducing the Volumes Viewer Jan 6, 2023
@kimwnasptd
Member

@TobiasGoerke thanks for continuing to push this!

I'd propose we break down the next steps like this:

  1. We finalize the review and merge this PR
  2. We work on a follow up PR that creates a GH Action for publishing the image to DockerHub, like https://github.com/kubeflow/kubeflow/blob/master/.github/workflows/nb_controller_docker_publish.yaml
    • We can use the name kubeflownotebookswg/pvcviewer-controller
  3. We create a follow up PR that updates the manifests with that image
  4. We create a follow up PR that tests the manifests are OK https://github.com/kubeflow/kubeflow/blob/master/.github/workflows/nb_controller_intergration_test.yaml
  5. Create a PR for running the unit-tests of the Controller https://github.com/kubeflow/kubeflow/blob/master/.github/workflows/notebook_controller_unit_test.yaml

Member

@kimwnasptd kimwnasptd left a comment

Made a first pass. The code looks very good and it's great to see test coverage! I've left some small comments.

Would there be an "easy" way to generate some certs (Makefile rule?) so that we can also make run the controller locally?

Lastly, once we finish with the tasks on manifests/gh-actions, to ensure this component is integrated with the repo, I think the next items to tackle will be:

  1. Integrating with the volumes web app. I can help there since we had some code for this in the old Arrikto Rok flavor of the app (in this repo)
  2. Work on the culling mechanism, which would be very similar to feat: istio metrics based notebook culling (for VSCode and RStudio) #6927

Resolved review threads (all outdated):
components/pvc-viewer/OWNERS
components/pvc-viewer/Makefile (4 threads)
components/pvc-viewer/config/default/kustomization.yaml (2 threads)
components/pvc-viewer/config/manager/manager.yaml

@TobiasGoerke
Contributor Author

Thank you for the review - it's much appreciated! I've tried to integrate all of your remarks.

  • We work on a follow up PR that creates a GH Action for publishing the image to DockerHub
  • We create a follow up PR that tests the manifests are OK
  • Create a PR for running the unit-tests of the Controller

I've prepared a new branch with the three untested GH actions, from which we can create a PR once this PR is merged. See here.

Would there be an "easy" way to generate some certs (Makefile rule?) so that we can also make run the controller locally?

make run now executes openssl to generate certs and thus, the controller starts successfully (given you've applied the CRDs). Is this what you've meant?
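
For anyone who wants to reproduce the certificate step outside the Makefile, here is a minimal sketch. It is not the Makefile rule from this PR. It assumes a self-signed certificate is sufficient for a local run, that the webhook server reads `tls.crt` from `/tmp/k8s-webhook-server/serving-certs` (the path that appears in the error output in this thread), and that the matching key file is named `tls.key`. The helper names and the `/CN=localhost` subject are hypothetical choices.

```python
import subprocess
from pathlib import Path
from typing import List

# Directory taken from the error message in this thread.
DEFAULT_CERT_DIR = "/tmp/k8s-webhook-server/serving-certs"


def openssl_command(cert_dir: str = DEFAULT_CERT_DIR, days: int = 365) -> List[str]:
    """Build the openssl invocation for a self-signed development certificate."""
    directory = Path(cert_dir)
    return [
        "openssl", "req", "-x509",
        "-newkey", "rsa:2048",
        "-nodes",                                # do not encrypt the private key
        "-keyout", str(directory / "tls.key"),   # assumed key filename
        "-out", str(directory / "tls.crt"),
        "-days", str(days),
        "-subj", "/CN=localhost",                # assumed subject for local use
    ]


def generate_dev_certs(cert_dir: str = DEFAULT_CERT_DIR) -> None:
    """Create the directory and run openssl; raises if openssl fails."""
    Path(cert_dir).mkdir(parents=True, exist_ok=True)
    subprocess.run(openssl_command(cert_dir), check=True)


if __name__ == "__main__":
    generate_dev_certs()
```

Building the command separately from running it keeps the invocation easy to inspect or print before anything is written to disk.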

Regarding the volumes web app:
I've created a separate branch some time ago that contains the frontend changes. We can use it as a base and create a PR off of it, once the controller is merged. You can then review/rework it if that's okay for you.

About culling: would it make sense to move this functionality to a common place so that both the notebook and pvcviewer controller can use it?

@kimwnasptd
Member

make run now executes openssl to generate certs and thus, the controller starts successfully (given you've applied the CRDs). Is this what you've meant?

Hmm are you sure about this? Could you point me to this part on the Makefile? For me when I make run I get the following output:

1.6866788816545787e+09	ERROR	setup	problem running manager	{"error": "open /tmp/k8s-webhook-server/serving-certs/tls.crt: no such file or directory"}

About culling: would it make sense to move this functionality to a common place so that both the notebook and pvcviewer controller can use it?

Yes! And we have a place for common controller code in https://github.com/kubeflow/kubeflow/tree/master/components/common

I've created a separate branch some time ago that contains the frontend changes. We can use it as a base and create a PR off of it, once the controller is merged. You can then review/rework it if that's okay for you.

ACK!

@TobiasGoerke
Contributor Author

TobiasGoerke commented Jun 14, 2023

make run now executes openssl to generate certs and thus, the controller starts successfully (given you've applied the CRDs). Is this what you've meant?

Hmm are you sure about this? Could you point me to this part on the Makefile? For me when I make run I get the following output:

1.6866788816545787e+09	ERROR	setup	problem running manager	{"error": "open /tmp/k8s-webhook-server/serving-certs/tls.crt: no such file or directory"}

Sorry, forgot to push that commit. It should look like this now:
[screenshot not preserved]

About culling: would it make sense to move this functionality to a common place so that both the notebook and pvcviewer controller can use it?

Yes! And we have a place for common controller code in https://github.com/kubeflow/kubeflow/tree/master/components/common

This would involve refactoring the notebooks controller and moving some code to the common dir. I'd suggest implementing culling in a separate PR to make the changes more manageable. WDYT?

Also: do we wait for #6927 to get merged?

I've created a separate branch some time ago that contains the frontend changes. We can use it as a base and create a PR off of it, once the controller is merged. You can then review/rework it if that's okay for you.

ACK!

@kimwnasptd
Member

Now everything seems to work as expected, thanks!

This would involve refactoring the notebooks controller and moving some code to the common dir. I'd suggest implementing culling in a separate PR to make the changes more manageable. WDYT?

For sure, let's not increase the scope of this PR any further. We'll most probably need to create a design doc for #6927, since it has quite a lot of technical context and decisions in it.

@kimwnasptd
Member

So at this point everything looks good so we can merge. We also do have a plan forward for the rest of the items (images, gh actions etc) so looking forward to next steps!

Thank you very much for your time and persistence on this @TobiasGoerke

/lgtm
/approve

@google-oss-prow

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: kimwnasptd

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@google-oss-prow google-oss-prow bot merged commit a15edce into kubeflow:master Jun 14, 2023
2 checks passed
cd $$TMP_DIR ;\
go mod init tmp ;\
echo "Downloading $(2)" ;\
GOBIN=$(PROJECT_DIR)/bin go get $(2) ;\
Member

Mentioning it now, to just keep it in our cache. When I initially tried to get the controller-gen bin with Golang 1.20 it failed to create the binary.

After looking around I found out that go get is deprecated for installing binaries in favor of go install. Install worked in my case, but I didn't look too much into it (see grafana/k6-operator#104 (comment)).

Let's keep it in our cache for the future

Contributor Author

Yeah, I've had the same issue but didn't want to change the Makefile, as the other components, such as the notebook controller, have the same issue.

@TobiasGoerke
Contributor Author

Great to see this merged, thank you for your support!
I'll create some follow-up PRs over the next week with GH actions.
Also, I'll continue working on the frontend integration.

tzstoyanov pushed a commit to tzstoyanov/kubeflow that referenced this pull request Jun 14, 2023
* Integrating volumes viewer into volumes ui backend

* Integrating volumes viewer into volumes ui frontend

* Modified the volume viewer's manifests to be in accordance with the volumes viewer changes

* Bootstrapping/creating VolumesViewer Controller

* Changed/reverted image definitions

* Fixed code style issues

* Run prettier on index-default.component.ts

* Reverted accidental method call change back to getSelectedNamespace2

* Reverted package-lock.json

* Now using the VOLUME_VIEWER_IMAGE env variable in the viewer's podTemplate

* Set readinessProbe.initialDelaySeconds=2 for new viewers

* Removed downward api references in favor of Python variable expansion

* Revised crd schema

- Now includes a status.URL field
- Reverted changes to status.py to minimize diff

* Providing NAME as a possible var for var expansion

* Return and use the VolumesViewer.Status.URL

* Reconcile status while deletion ongoing

* Restored empty line to get file off diff

* Reducing diff on get.py

* Run prettier

* Updated OWNERS/README

* Changes to schema comments / renaming

* Improved test performance and reliability
By cleaning up resources created by tests in afterEach()
Also: Pod watch now only triggers for non-terminating RUNNING/PENDING pods, reducing the number of reconciliation calls

* Renaming VolumesViewer -> PVCViewer as discussed in community meeting

* Moving changes to volumes frontend to another PR as discussed in community meeting

* Renaming file names

* Renaming PVCViewerSpec.PodTemplate -> PodSpec

* Renaming PVCViewerSpec.Service -> Networking

* Adding the Spec.PVC field and validating/defaulting webhooks

* Adding the option to load a default podSpec from file

* Introduced PVCViewer.Status.Conditions

* Validator requires the PVC to be used in podSpec

* Added tests for the validating webhook

* Removed debug log message

* Updating manifests to work with new webhooks

* Updating documentation

* Refactored manager according to specs

* Modifying pvcviewer OWNERS

* Makefile & renaming comp -> pvcviewer-controller

* Changing nameprefix to pvcviewer-

* Setting imagePullPolicy: IfNotPresent

* Adding a base directory

* Generating TLS certs for make run

* Adding a log time encoder
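
One item in the commit summary above says the Pod watch now only triggers for non-terminating RUNNING/PENDING pods. As a rough illustration of such a filter, and not the controller's actual code, the sketch below assumes pods are plain dicts shaped like Kubernetes Pod objects and that "terminating" means `metadata.deletionTimestamp` is set. The function name is hypothetical.

```python
def should_trigger_reconcile(pod: dict) -> bool:
    """Return True only for pods that are not terminating and are Running or Pending."""
    # A pod that is being deleted carries a deletionTimestamp in its metadata.
    if pod.get("metadata", {}).get("deletionTimestamp"):
        return False
    return pod.get("status", {}).get("phase") in ("Running", "Pending")
```

Filtering events this early means completed, failed, or terminating pods never enqueue a reconciliation, which matches the stated goal of reducing the number of reconciliation calls.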