Mount kubelet and container runtime rootdir on LSSD #93305
Conversation
cluster/gce/gci/configure-helper.sh
Outdated
if [ -e "${ssd}" ]; then | ||
# This workaround to find if the NVMe device is a disk is required because | ||
# the existing Google images does not expose NVMe devices in /dev/disk/by-id | ||
if [[ `udevadm info --query=property --name=${ssd} | grep DEVTYPE | sed "s/DEVTYPE=//"` == "disk" ]]; then |
Does this handle the case where PD could be nvme?
It was copied over from ensure-local-ssds
I think the existing logic doesn't handle PD as nvme and assumes all nvme devices are local SSDs. @mattcary do you want to restrict this new ephemeral option to only scsi PD boot disk for now, and update this logic to handle nvme PD boot disk later?
We want nvme support. Is there any doc on how to detect nvme PDs?
using ID_MODEL=nvme_card
Updated to lsblk
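For context, a minimal sketch of what an lsblk-based check might look like (an assumption for illustration, not the exact code from the updated revision); it relies on the hint above that local SSDs report a model of nvme_card, so NVMe PDs with a different model get skipped:

```sh
# Illustrative sketch only: collect NVMe local SSDs, skipping NVMe PDs.
# Assumes local SSDs report MODEL "nvme_card", per the comment above.
ephemeral_devices=()
for ssd in /dev/nvme*n*; do              # hypothetical glob over NVMe namespaces
  if [ -e "${ssd}" ]; then
    model="$(lsblk -dno MODEL "${ssd}" | xargs)"   # -d: no partitions, -n: no header
    if [[ "${model}" == "nvme_card" ]]; then
      ephemeral_devices+=("${ssd}")
    fi
  fi
done
```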
cluster/gce/gci/configure-helper.sh
Outdated
md_device="/dev/md/0" | ||
echo "y" | mdadm --create "${md_device}" --level=0 --raid-devices=${#devices[@]} ${devices[@]} | ||
fi | ||
local ephemeral_mountpoint="/mnt/disks/kube-ephemeral-ssd" |
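For orientation, a hedged sketch of the overall flow this hunk is part of: assemble the local SSDs into one RAID 0 device, then format and mount it. The mkfs/mount details are assumptions for illustration; a real implementation would only format an unformatted device, and the mountpoint shown is this revision's path, which moved after the review below.

```sh
# Sketch of the surrounding flow; names follow the diff above.
if [ "${#devices[@]}" -gt 1 ]; then
  md_device="/dev/md/0"
  echo "y" | mdadm --create "${md_device}" --level=0 \
    --raid-devices=${#devices[@]} "${devices[@]}"
else
  md_device="${devices[0]}"              # a single SSD: no array is created
fi
ephemeral_mountpoint="/mnt/disks/kube-ephemeral-ssd"   # this revision's path
mkfs.ext4 -F "${md_device}"              # assumption: real code formats only when needed
mkdir -p "${ephemeral_mountpoint}"
mount "${md_device}" "${ephemeral_mountpoint}"
```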
Can we mount this somewhere other than /mnt/disks? This is currently used as the default discovery directory for local PVs, so if we ever want to support both at the same time, this will conflict.
The rest of /mnt is read-only. I see this for /mnt/disks:
tmpfs /mnt/disks tmpfs rw,relatime,size=256k,mode=755 0 0
One possibility would be /mnt/stateful_partition/ephemeral_storage, but then we are putting it inside the boot disk.
Done
Force-pushed becefd1 to 17ca5f1
Force-pushed 17ca5f1 to e632ad7
/lgtm
/lgtm
/retest
Friendly ping @cheftako Given the freeze, we are not looking to merge it now, but just to get early feedback
Force-pushed e632ad7 to a7cc28b
cluster/gce/util.sh
Outdated
@@ -1215,6 +1215,7 @@ CONTAINER_RUNTIME_TEST_HANDLER: $(yaml-quote ${CONTAINER_RUNTIME_TEST_HANDLER:-})
UBUNTU_INSTALL_CONTAINERD_VERSION: $(yaml-quote ${UBUNTU_INSTALL_CONTAINERD_VERSION:-})
UBUNTU_INSTALL_RUNC_VERSION: $(yaml-quote ${UBUNTU_INSTALL_RUNC_VERSION:-})
NODE_LOCAL_SSDS_EXT: $(yaml-quote ${NODE_LOCAL_SSDS_EXT:-})
NODE_LOCAL_SSDS_EPHEMERAL: $(yaml-quote ${NODE_LOCAL_SSDS_EPHEMERAL:-})
Thanks :D
/lgtm
/retest
# Move the container runtime's directory to the new location to preserve
# preloaded images.
if [ ! -d "${ephemeral_mountpoint}/${container_runtime}" ]; then
  mv "/var/lib/${container_runtime}" "${ephemeral_mountpoint}/${container_runtime}"
We unmounted "/var/lib/${container_runtime}" above. Is there any data here to move?
They are rw remounts. I have yet to verify in Ubuntu, though. I'll get back to this.
Sorry, I don't follow. Do you mean that the cases where there are preloaded images, /var/lib/docker isn't mounted so the unmount is a no-op?
I rechecked. The OS images that mount it are the containerd ones, and it's just a RW remount on the same disk. We need to unmount prior to moving the folder.
This is independent of whether there are preloaded images.
I'm still not following. Why do we need to unmount? If we're concerned about someone writing to it during the move, the umount isn't enough because it will silently fail if the dir is already in use (ie there's a race).
We need to unmount to be able to move the data. Otherwise we would get a "resource busy" error. Nothing is writing to it, as the container runtime is stopped.
finally I understand, bind-mounted dirs make that resource busy error. thx
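To summarize the exchange, a hedged sketch of the stop / unmount / move / bind-mount sequence (the ordering and the helper-free form are assumptions; the actual function in the PR may differ):

```sh
# Sketch: relocate the runtime's state onto the SSD mountpoint while the
# runtime is stopped, then bind-mount it back so the usual path still works.
container_runtime="${CONTAINER_RUNTIME:-docker}"
systemctl stop "${container_runtime}"
# Some OS images RW-remount /var/lib/<runtime>; unmount it first, otherwise
# mv fails with "resource busy" on the (bind-)mounted directory.
if mountpoint -q "/var/lib/${container_runtime}"; then
  umount "/var/lib/${container_runtime}"
fi
# Move once, preserving preloaded images already under /var/lib/<runtime>.
if [ ! -d "${ephemeral_mountpoint}/${container_runtime}" ]; then
  mv "/var/lib/${container_runtime}" "${ephemeral_mountpoint}/${container_runtime}"
fi
mkdir -p "/var/lib/${container_runtime}"
mount --bind "${ephemeral_mountpoint}/${container_runtime}" "/var/lib/${container_runtime}"
systemctl start "${container_runtime}"
```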
Force-pushed f7aa584 to 0ea9665
There are new shellcheck requirements. PTAL at the last 2 commits
seen_arrays=(/dev/md/*)
device=${seen_arrays[0]}
echo "Setting RAID array with local SSDs on device ${device}"
if [ ! -e "$device" ]; then
Why can we assume an existing raid array has to be our local SSDs?
Answer: we have to, because it's too complicated to figure out where an existing RAID came from when the node is restarted.
There's no other mechanism that creates RAID arrays in the startup script.
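A short sketch of the reuse-on-restart behaviour being discussed (the shape is assumed; the real code carries the device list from the detection step above):

```sh
# Sketch: if a RAID array already exists under /dev/md (e.g. after a node
# restart), reuse it; otherwise create one from the detected local SSDs.
# This relies on nothing else in the startup script creating arrays.
seen_arrays=(/dev/md/*)
device="${seen_arrays[0]}"               # literal "/dev/md/*" when no array exists
if [ ! -e "${device}" ]; then
  device="/dev/md/0"
  echo "y" | mdadm --create "${device}" --level=0 \
    --raid-devices=${#devices[@]} "${devices[@]}"
fi
echo "Setting RAID array with local SSDs on device ${device}"
```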
cluster/gce/gci/configure-helper.sh
Outdated
# mount container runtime root dir on SSD
local container_runtime="${CONTAINER_RUNTIME:-docker}"
systemctl stop "$container_runtime"
# Some images mount the container runtime root dir.
Would it be more precise to say "remount" here? That gives a better hint as to what's going on IMHO.
Done
Force-pushed 0ea9665 to add0519
/lgtm fwiw
Force-pushed add0519 to 072a544
Rebased and squashed
/retest
Still
When environment variable NODE_LOCAL_SSDS_EPHEMERAL=true, create a RAID 0 array on all attached SSDs to mount:
- kubelet root dir
- container runtime root dir
- pod logs dir
Those directories account for all ephemeral storage. An array is not created when there is only one SSD.
Change-Id: I22137f1d83fc19e9ef58a556d7461da43e4ab9bd
Signed-off-by: Aldo Culquicondor <acondor@google.com>
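A hedged sketch of how the other two directories in that list could end up on the array (the paths are the kubelet defaults, /var/lib/kubelet and /var/log/pods, used here as assumptions rather than quotes from the diff):

```sh
# Sketch: keep kubelet state and pod logs on the SSD array via bind mounts.
for dir in /var/lib/kubelet /var/log/pods; do
  target="${ephemeral_mountpoint}${dir}"   # e.g. ${ephemeral_mountpoint}/var/lib/kubelet
  mkdir -p "${target}" "${dir}"
  mount --bind "${target}" "${dir}"
done
```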
Force-pushed 072a544 to 2ae4eeb
ping @cheftako
/retest
/lgtm
[APPROVALNOTIFIER] This PR is APPROVED
This pull-request has been approved by: alculquicondor, cheftako
The full list of commands accepted by this bot can be found here. The pull request process is described here.
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing /approve in a comment.
When environment variable NODE_LOCAL_SSDS_EPHEMERAL=true, create a RAID 0 array on all attached SSDs to mount:
- kubelet root dir
- container runtime root dir
- pod logs dir
Those directories account for all ephemeral storage. An array is not created when there is only one SSD.
OSS: kubernetes#93305
Signed-off-by: Aldo Culquicondor <acondor@google.com>
Change-Id: Ib15524d6e6fab7a5fadda7bc1a64765f1364327f
What type of PR is this?
/kind feature
What this PR does / why we need it:
When environment variable NODE_LOCAL_SSDS_EPHEMERAL=true, create a RAID 0 array on all attached Local SSDs on NVMe interfaces to mount:
- kubelet root dir
- container runtime root dir
- pod logs dir
Those directories account for all ephemeral storage.
Does this PR introduce a user-facing change?:
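For completeness, a hypothetical way to exercise the new flag with the GCE scripts (the invocation and the NODE_LOCAL_SSDS_EXT value are assumptions, not usage documented in this PR):

```sh
# Hypothetical kube-up invocation; assumes NVMe local SSDs are requested via
# the pre-existing NODE_LOCAL_SSDS_EXT mechanism shown in the util.sh diff.
export KUBERNETES_PROVIDER=gce
export NODE_LOCAL_SSDS_EXT="2,nvme,block"    # assumed format: count,interface,format
export NODE_LOCAL_SSDS_EPHEMERAL=true        # the flag introduced by this PR
cluster/kube-up.sh
```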