Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add support for Tolerations and Affinity in Notebooks #5237

Merged
merged 2 commits into from Aug 31, 2020

Conversation

thesuperzapper
Copy link
Member

This PR adds support for toleration and affinity configs in the Notebook Spawner UI.

Resolves: #4433

Options presented to the user are specified inside spawner_ui_config.yaml. This example config allows users to ask for exclusive access to a node within node-pool called notebook-n1-standard-2:

spawnerFormDefaults:
  ...
  affinityConfig:
    # The default `configKey` from the options list
    # If readonly, the default value will be the only option
    value: "none"
    # The list of available affinity configs
    options:
      - configKey: "none"
        displayName: "None"
        affinity: {}
      # (DESC) Pod gets an exclusive "n1-standard-2" Node
      # (TIP) set PreferNoSchedule taint on this node-pool
      # (TIP) enable cluster-autoscaler on this node-pool
      # (TIP) dont let users request more CPU/MEMORY than the size of this node
      - configKey: "exclusive__n1-standard-2"
        displayName: "Exclusive: n1-standard-2"
        affinity:
          # (Require) Node having label: `node_pool=notebook-n1-standard-2`
          nodeAffinity:
            requiredDuringSchedulingIgnoredDuringExecution:
              nodeSelectorTerms:
                - matchExpressions:
                    - key: "node_pool"
                      operator: "In"
                      values:
                        - "notebook-n1-standard-2"
          # (Require) Node WITHOUT existing Pod having label: `notebook-name`
          podAntiAffinity:
            requiredDuringSchedulingIgnoredDuringExecution:
              - labelSelector:
                  matchExpressions:
                    - key: "notebook-name"
                      operator: "Exists"
                namespaces: []
                topologyKey: "kubernetes.io/hostname"
    readOnly: false
  tolerationGroup:
    # The default `groupKey` from the options list
    # If readonly, the default value will be the only option
    value: "none"
    # The list of available tolerationGroup configs
    options:
      - groupKey: "none"
        displayName: "None"
        tolerations: []
      - groupKey: "group_1"
        displayName: "Group 1: description"
        tolerations:
          - key: "key1"
            operator: "Equal"
            value: "value1"
            effect: "NoSchedule"
          - key: "key2"
            operator: "Equal"
            value: "value2"
            effect: "NoSchedule"
    readOnly: false

Here is a picture:
new_jupyter_form

@google-cla google-cla bot added the cla: yes label Aug 25, 2020
@kubeflow-bot
Copy link
Contributor

This change is Reviewable

@k8s-ci-robot
Copy link
Contributor

Hi @thesuperzapper. Thanks for your PR.

I'm waiting for a kubeflow member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@thesuperzapper
Copy link
Member Author

I have a significant cleanup of the notebook UI which I will create PR for after this gets merged.

@thesuperzapper
Copy link
Member Author

@vkoukis or @prodonjs can you please review?

@thesuperzapper
Copy link
Member Author

or perhaps @kimwnasptd

@kimwnasptd
Copy link
Member

@thesuperzapper I'll take a stab at this shortly.

One first quick comment is that from now on, I think, it would make sense to see how we could refactor the UI and not have the form grow indefinitely in height.

@kimwnasptd
Copy link
Member

With a first look at the code it looks good, nice work @thesuperzapper!

I'll also be deploying the web app locally to make further testing and will report back

@thesuperzapper
Copy link
Member Author

/ok-to-test

@thesuperzapper
Copy link
Member Author

@kimwnasptd can we get this merged, I have a massive re-write of the UI which depends on these commits.

@kimwnasptd
Copy link
Member

Tested the code and it works as expected. Thanks for the PR @thesuperzapper, this is a functionality that a lot of user will use.

I have a massive re-write of the UI which depends on these commits.

We will have to coordinate on this, since I'm also planning on sending PRs for a rewrite I've done on the entire app in order to utilize the common code #5164 #5252 between the other web apps, for Tensorboards #3578 and a future Volumes web app #4758.

I'll also be submitting a PR for creating a Notebooks WG so we can discuss this in this new work group's calls. If you'd like to start this discussion sooner then lets create an issue for the rewrite of the app and discuss there.

@kimwnasptd
Copy link
Member

/lgtm
/approve

@k8s-ci-robot
Copy link
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: kimwnasptd

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@kimwnasptd
Copy link
Member

Also @thesuperzapper could you create another PR in the manifests to update the config.yaml there as well?

@k8s-ci-robot k8s-ci-robot merged commit e445630 into kubeflow:master Aug 31, 2020
2 of 3 checks passed
@supertetelman
Copy link

I'm just deploying v1.2 using the istio manifest and using the Dex manifest, but I do not see this UI change.. Should I expect to see a UI similar to the one depicted here or has that been punted to v1.3 given all the other related UI rewrites?

@fakeburst
Copy link

@supertetelman It seems to be intended as there is a line in the official blog post telling that The artifacts for the updated Notebooks web app will be available in 1.2.1 or later

You can still make use of this feature by bumping jupyter-web-app image version to gcr.io/kubeflow-images-public/jupyter-web-app:vmaster-ge4456300. We've done it in our cluster the other day, so far so good

@thesuperzapper
Copy link
Member Author

@supertetelman @fakeburst you are correct that you need to use gcr.io/kubeflow-images-public/jupyter-web-app:vmaster-ge4456300 for these features to work.

However, that image was intended for Kubeflow 1.2 (so I will try and get that fixed).

For now, you can edit the kustomization.yaml in the kubeflow-apps folder generated by kfctl build ... to fix this:

apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
resources:
  - XXXXX
# ----------------
# ↓ our changes ↓
# ----------------
images:
  - name: gcr.io/kubeflow-images-public/jupyter-web-app
    newName: gcr.io/kubeflow-images-public/jupyter-web-app
    newTag: vmaster-ge4456300

@zoltan-fedor
Copy link

zoltan-fedor commented Dec 21, 2020

Hi @thesuperzapper,
Even after pulling image vmaster-ge4456300, I still am not seeing the UI changes - see screens below.
Any ideas?

The new image has been pulled (it says already exist because I have redeployed that pod a few times to see if it makes any difference - it didn't):
image

Still no annotations or tolerations options in the UI:
image

@zoltan-fedor
Copy link

Please ignore, I needed to clean the browser cache to see the changed UI.
Thanks!

saffaalvi pushed a commit to StatCan/kubeflow that referenced this pull request Feb 11, 2021
* add tolerationGroup configs to notebook form

* add affinity configs to notebook form
saffaalvi pushed a commit to StatCan/kubeflow that referenced this pull request Feb 12, 2021
* add tolerationGroup configs to notebook form

* add affinity configs to notebook form
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

NodeSelector spec to jupyter notebooks servers
7 participants