
Run nvidia-gpu device-plugin daemonset as an addon on GCE nodes that have nvidia GPUs attached #54826

Merged — 4 commits, Nov 13, 2017
11 changes: 6 additions & 5 deletions cluster/addons/addon-manager/README.md
```diff
@@ -1,26 +1,27 @@
 ### Addon-manager
 
-addon-manager manages two classes of addons with given template files.
+addon-manager manages two classes of addons with given template files in
+`$ADDON_PATH` (default `/etc/kubernetes/addons/`).
 - Addons with label `addonmanager.kubernetes.io/mode=Reconcile` will be periodically
 reconciled. Direct manipulation to these addons through apiserver is discouraged because
 addon-manager will bring them back to the original state. In particular:
   - Addon will be re-created if it is deleted.
   - Addon will be reconfigured to the state given by the supplied fields in the template
   file periodically.
-  - Addon will be deleted when its manifest file is deleted.
+  - Addon will be deleted when its manifest file is deleted from the `$ADDON_PATH`.
 - Addons with label `addonmanager.kubernetes.io/mode=EnsureExists` will be checked for
 existence only. Users can edit these addons as they want. In particular:
   - Addon will only be created/re-created with the given template file when there is no
   instance of the resource with that name.
-  - Addon will not be deleted when the manifest file is deleted.
+  - Addon will not be deleted when the manifest file is deleted from the `$ADDON_PATH`.
 
 Notes:
 - Label `kubernetes.io/cluster-service=true` is deprecated (only for Addon Manager).
 In future release (after one year), Addon Manager may not respect it anymore. Addons
 have this label but without `addonmanager.kubernetes.io/mode=EnsureExists` will be
 treated as "reconcile class addons" for now.
-- Resources under $ADDON_PATH (default `/etc/kubernetes/addons/`) needs to have either one
-of these two labels. Meanwhile namespaced resources need to be in `kube-system` namespace.
+- Resources under `$ADDON_PATH` need to have either one of these two labels.
+Meanwhile namespaced resources need to be in `kube-system` namespace.
 Otherwise it will be omitted.
 - The above label and namespace rule does not stand for `/opt/namespace.yaml` and
 resources under `/etc/kubernetes/admission-controls/`. addon-manager will attempt to
```
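To make the two modes concrete, here is a hypothetical addon manifest of the kind that would live under `$ADDON_PATH`; the ConfigMap name and data are invented for illustration and are not part of this PR:

```yaml
# Hypothetical addon manifest placed under $ADDON_PATH (default /etc/kubernetes/addons/).
# With mode=Reconcile, addon-manager periodically reverts manual edits and re-creates
# the object if it is deleted. Switching the label value to EnsureExists would make
# addon-manager create the object only when missing and otherwise leave edits alone.
apiVersion: v1
kind: ConfigMap
metadata:
  name: example-addon-config      # illustrative name, not a real addon
  namespace: kube-system          # namespaced addon resources must be in kube-system
  labels:
    addonmanager.kubernetes.io/mode: Reconcile
data:
  example.conf: |
    key=value
```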
3 changes: 0 additions & 3 deletions cluster/addons/addon-manager/kube-addons.sh
```diff
@@ -26,9 +26,6 @@
 # 3. Kubectl prints the output to stderr (the output should be captured and then
 # logged)
 
-# The business logic for whether a given object should be created
-# was already enforced by salt, and /etc/kubernetes/addons is the
-# managed result is of that. Start everything below that directory.
 KUBECTL=${KUBECTL_BIN:-/usr/local/bin/kubectl}
 KUBECTL_OPTS=${KUBECTL_OPTS:-}
```
45 changes: 45 additions & 0 deletions cluster/addons/device-plugins/nvidia-gpu/daemonset.yaml
@@ -0,0 +1,45 @@ (new file)

```yaml
apiVersion: extensions/v1beta1
kind: DaemonSet
metadata:
  name: nvidia-gpu-device-plugin
  namespace: kube-system
  labels:
    k8s-app: nvidia-gpu-device-plugin
    addonmanager.kubernetes.io/mode: Reconcile
spec:
  template:
    metadata:
      labels:
        k8s-app: nvidia-gpu-device-plugin
    spec:
      affinity:
        nodeAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
            nodeSelectorTerms:
            - matchExpressions:
              - key: cloud.google.com/gke-accelerator
                operator: Exists
      hostNetwork: true
      hostPID: true
      volumes:
      - name: device-plugin
        hostPath:
          path: /var/lib/kubelet/device-plugins
      - name: dev
        hostPath:
          path: /dev
      containers:
      - image: "gcr.io/google-containers/nvidia-gpu-device-plugin@sha256:943a62949cd80c26e7371d4e123dac61b4cc7281390721aaa95f265171094842"
        command: ["/usr/bin/nvidia-gpu-device-plugin", "-logtostderr"]
        name: nvidia-gpu-device-plugin
        resources:
          requests:
            cpu: 10m
            memory: 10Mi
        securityContext:
          privileged: true
        volumeMounts:
        - name: device-plugin
          mountPath: /device-plugin
        - name: dev
          mountPath: /dev
```

Review thread on `key: cloud.google.com/gke-accelerator`:

**Member:** Curious, how is this different from using the `nodeSelector` field?

**Member (Author):** `nodeSelector` can only do key=value checks. I wanted a key-exists check, because I want this to run on nodes that have `nvidia-tesla-k80` as the value, or `nvidia-tesla-p100`, or any later value we may add.

**Member:** Thanks, good to know :)
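A minimal sketch of the distinction discussed in the review thread, with illustrative selector values: `nodeSelector` pins one exact label value, while a `matchExpressions` entry with `operator: Exists` matches any value of the key.

```yaml
# nodeSelector: schedules ONLY onto nodes labeled with exactly this value.
nodeSelector:
  cloud.google.com/gke-accelerator: nvidia-tesla-k80

# nodeAffinity with Exists: schedules onto any node that carries the label key,
# whatever its value (k80, p100, or any accelerator type added later).
affinity:
  nodeAffinity:
    requiredDuringSchedulingIgnoredDuringExecution:
      nodeSelectorTerms:
      - matchExpressions:
        - key: cloud.google.com/gke-accelerator
          operator: Exists
```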
5 changes: 5 additions & 0 deletions cluster/common.sh
```diff
@@ -873,6 +873,11 @@ EOF
   if [ -n "${CLUSTER_SIGNING_DURATION:-}" ]; then
     cat >>$file <<EOF
 CLUSTER_SIGNING_DURATION: $(yaml-quote ${CLUSTER_SIGNING_DURATION})
 EOF
   fi
+  if [[ "${NODE_ACCELERATORS:-}" == *"type=nvidia"* ]]; then
+    cat >>$file <<EOF
+ENABLE_NVIDIA_GPU_DEVICE_PLUGIN: $(yaml-quote "true")
+EOF
+  fi
 
```
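The glob match added to `cluster/common.sh` can be exercised standalone; the sample `NODE_ACCELERATORS` value below is illustrative, matching the format the GCE scripts use (`type=TYPE,count=COUNT`):

```shell
# Standalone sketch of the substring match that decides whether
# ENABLE_NVIDIA_GPU_DEVICE_PLUGIN is propagated to the node env file.
NODE_ACCELERATORS="type=nvidia-tesla-k80,count=2"   # sample value
ENABLE_NVIDIA_GPU_DEVICE_PLUGIN="false"
if [[ "${NODE_ACCELERATORS:-}" == *"type=nvidia"* ]]; then
  ENABLE_NVIDIA_GPU_DEVICE_PLUGIN="true"
fi
echo "${ENABLE_NVIDIA_GPU_DEVICE_PLUGIN}"   # prints "true" for this sample value
```

Any accelerator type whose name starts with `nvidia` (k80, p100, and future types) triggers the flag, which is why a substring glob is used rather than an exact comparison.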
5 changes: 4 additions & 1 deletion cluster/gce/config-default.sh
```diff
@@ -182,7 +182,10 @@ RUNTIME_CONFIG="${KUBE_RUNTIME_CONFIG:-}"
 FEATURE_GATES="${KUBE_FEATURE_GATES:-ExperimentalCriticalPodAnnotation=true}"
 
 if [[ ! -z "${NODE_ACCELERATORS}" ]]; then
-  FEATURE_GATES="${FEATURE_GATES},Accelerators=true"
+  FEATURE_GATES="${FEATURE_GATES},DevicePlugins=true"
+  if [[ "${NODE_ACCELERATORS}" =~ .*type=([a-zA-Z0-9-]+).* ]]; then
+    NODE_LABELS="${NODE_LABELS},cloud.google.com/gke-accelerator=${BASH_REMATCH[1]}"
+  fi
 fi
 
 # Optional: Install cluster DNS.
```

Review thread on the `DevicePlugins=true` feature gate:

**Contributor:** How does switching this work across upgrades? If it doesn't, that should be mentioned as part of the release note for this PR.

**Member (Author):** Accelerators is an alpha feature. DevicePlugins is also an alpha feature (and the replacement for Accelerators). I had added the following release note to the PR:

> GCE nodes with NVIDIA GPUs attached now expose `nvidia.com/gpu` as a resource instead of `alpha.kubernetes.io/nvidia-gpu`.

which captures the difference between clusters created using the old script and clusters created using the new script. But I haven't actually tried upgrading a cluster that had the old flag to a cluster with the new flag. In GKE we don't have to worry about this because we don't allow alpha cluster upgrades. What should the right release note be here? cc @vishh

**Contributor:** Upgrades are not really supported when alpha features are turned on, so I don't see much value in thinking about the upgrade/downgrade scenario. We recommend users stick to specific versions, and our workflow has changed considerably over releases while in alpha.

**Contributor:** I think the release note as it is LGTM.

**Contributor:** That makes sense; I wasn't sure where this was on the alpha -> GA slider. Even with alpha features it's nice to have a release note so that people using them will know how we've changed things, and I agree that the release note looks good. It would be pretty easy to run `./cluster/upgrade.sh` to upgrade a GCE cluster and see what happens to the nodes w.r.t. labels. I'm guessing they would change to the new label, but I'm not entirely sure.

**Member (Author):** @roberthbailey, I tried the following:

```shell
# Last commit from master on this branch
git checkout 55e216f56eac0082acc6be655d9ae09cf9ba38a8

go run hack/e2e.go -- -v --build

export NODE_ACCELERATORS=type=nvidia-tesla-k80,count=2; export KUBE_NODE_OS_DISTRIBUTION=gci; export KUBE_GCE_ZONE=us-west1-b; export KUBE_GCE_NODE_IMAGE=gke-1-8-2-gke-0-cos-stable-60-9592-90-0-v171103-pre-nvda-gpu; export KUBE_GCE_NODE_PROJECT=gke-node-images;
cluster/kube-up.sh

# Latest commit on this branch
git checkout cf292754ba423aa6782564ea83fe48cc1ed677d4

go run hack/e2e.go -- -v --build

export NODE_ACCELERATORS=type=nvidia-tesla-k80,count=2; export KUBE_NODE_OS_DISTRIBUTION=gci; export KUBE_GCE_ZONE=us-west1-b; export KUBE_GCE_NODE_IMAGE=gke-1-8-2-gke-0-cos-stable-60-9592-90-0-v171103-pre-nvda-gpu; export KUBE_GCE_NODE_PROJECT=gke-node-images;
cluster/gce/upgrade.sh -l
```

The master got the new label `cloud.google.com/gke-accelerator=nvidia-tesla-k80` and had the correct feature gate `DevicePlugins=true` set. However, I'm not sure how to check node upgrade, because upgrade.sh doesn't support upgrading nodes to local binaries.

Review thread on the `NODE_LABELS` line:

**Contributor:** What are the cases where we want to enable device plugins but not set node labels?

**Member (Author):** Eventually we would enable device plugins by default (there would be no feature gate), so line 185 would go away. Ideally, each node that has a special device would also get a node label. Lines 186-188 do that: they see a special device (an accelerator) and add a label to the node for it. As long as the GCE APIs follow the convention of specifying accelerators as `type=TYPE,count=COUNT`, this line will continue to work. Once GCE adds devices that are not accelerators, we would have to add more logic here.

**Contributor:** That didn't really answer my question, but looking at the code again, is the reason for the conditional here to capture the device type?

**Member (Author):** Yes.

**Contributor:** Is BASH_REMATCH portable across at least Linux and Mac (and ideally Cygwin)? This runs client-side, so it needs to work everywhere we run kube-up.sh.

**Member (Author):** According to http://git.savannah.gnu.org/cgit/bash.git/tree/CHANGES?h=bash-4.4#n4839, BASH_REMATCH was added in bash-3.0, which was released in 2004. I tested it on my Mac and it works there. I don't have access to a Windows machine but will ask someone with Windows to test it on cygwin/mingw. Thanks for pointing this out.

**Member (Author):** Okay, I ran a Windows VM on GCP, installed Cygwin, and tested this `if` block. It works as expected. Fun experience! :D
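The `BASH_REMATCH` extraction discussed in the portability thread can be checked standalone with any bash >= 3.0; the sample `NODE_ACCELERATORS` and the starting `NODE_LABELS` value below are illustrative, not taken from the PR:

```shell
# Sketch of the type= extraction added in config-default.sh. The [a-zA-Z0-9-]+
# character class stops at the comma, so only the accelerator type is captured.
NODE_ACCELERATORS="type=nvidia-tesla-k80,count=2"       # sample value
NODE_LABELS="example.io/existing-label=true"            # illustrative starting labels
if [[ "${NODE_ACCELERATORS}" =~ .*type=([a-zA-Z0-9-]+).* ]]; then
  # BASH_REMATCH[1] holds the first capture group, e.g. "nvidia-tesla-k80".
  NODE_LABELS="${NODE_LABELS},cloud.google.com/gke-accelerator=${BASH_REMATCH[1]}"
fi
echo "${NODE_LABELS}"
# → example.io/existing-label=true,cloud.google.com/gke-accelerator=nvidia-tesla-k80
```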
11 changes: 7 additions & 4 deletions cluster/gce/config-test.sh
```diff
@@ -108,10 +108,6 @@ RUNTIME_CONFIG="${KUBE_RUNTIME_CONFIG:-}"
 # Optional: set feature gates
 FEATURE_GATES="${KUBE_FEATURE_GATES:-ExperimentalCriticalPodAnnotation=true}"
 
-if [[ ! -z "${NODE_ACCELERATORS}" ]]; then
-  FEATURE_GATES="${FEATURE_GATES},Accelerators=true"
-fi
-
 TERMINATED_POD_GC_THRESHOLD=${TERMINATED_POD_GC_THRESHOLD:-100}
 
 # Extra docker options for nodes.
@@ -228,6 +224,13 @@ if [[ ${KUBE_ENABLE_INSECURE_REGISTRY:-false} == "true" ]]; then
   EXTRA_DOCKER_OPTS="${EXTRA_DOCKER_OPTS} --insecure-registry 10.0.0.0/8"
 fi
 
+if [[ ! -z "${NODE_ACCELERATORS}" ]]; then
+  FEATURE_GATES="${FEATURE_GATES},DevicePlugins=true"
+  if [[ "${NODE_ACCELERATORS}" =~ .*type=([a-zA-Z0-9-]+).* ]]; then
+    NODE_LABELS="${NODE_LABELS},cloud.google.com/gke-accelerator=${BASH_REMATCH[1]}"
+  fi
+fi
+
 # Optional: Install cluster DNS.
 ENABLE_CLUSTER_DNS="${KUBE_ENABLE_CLUSTER_DNS:-true}"
 DNS_SERVER_IP="10.0.0.10"
```
3 changes: 3 additions & 0 deletions cluster/gce/gci/configure-helper.sh
```diff
@@ -1796,6 +1796,9 @@ function start-kube-addons {
   if [[ "${ENABLE_METRICS_SERVER:-}" == "true" ]]; then
     setup-addon-manifests "addons" "metrics-server"
   fi
+  if [[ "${ENABLE_NVIDIA_GPU_DEVICE_PLUGIN:-}" == "true" ]]; then
+    setup-addon-manifests "addons" "device-plugins/nvidia-gpu"
+  fi
   if [[ "${ENABLE_CLUSTER_DNS:-}" == "true" ]]; then
     setup-addon-manifests "addons" "dns"
     local -r kubedns_file="${dst_dir}/dns/kube-dns.yaml"
```
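The `start-kube-addons` change follows the same gating pattern as the other addons: each addon directory is staged only when its env flag is `"true"`. A minimal sketch, with `setup-addon-manifests` replaced by a stand-in stub (the real helper in `configure-helper.sh` copies manifests into the addon-manager watch directory):

```shell
# Stand-in for the real setup-addon-manifests; records which addon would be staged.
setup-addon-manifests() {
  STAGED_ADDONS="${STAGED_ADDONS:-}${STAGED_ADDONS:+,}$2"
}

STAGED_ADDONS=""
ENABLE_NVIDIA_GPU_DEVICE_PLUGIN="true"   # set by cluster/common.sh when GPUs are attached
if [[ "${ENABLE_NVIDIA_GPU_DEVICE_PLUGIN:-}" == "true" ]]; then
  setup-addon-manifests "addons" "device-plugins/nvidia-gpu"
fi
echo "${STAGED_ADDONS}"   # prints "device-plugins/nvidia-gpu"
```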