Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

test: support storage tests with non-standard kubelet root directory #108253

Merged
merged 1 commit into from
Mar 15, 2022

Conversation

pohly
Copy link
Contributor

@pohly pohly commented Feb 21, 2022

What type of PR is this?

/kind cleanup

What this PR does / why we need it:

Some storage tests deploy DaemonSets which hard-code /var/lib/kubelet as root
directory for kubelet registration and pod directory. To enable those tests in
clusters with some other directory, a new global parameter gets added and the
value defined there gets patched into the relevant places of a CSI driver's
YAML files.

Which issue(s) this PR fixes:

Fixes #92664

Special notes for your reviewer:

A global parameter is used because other tests might also need the same
information.

There are currently no Prow jobs which use a non-standard path. I verified manually that the new flag has the intended behavior:

# Without flags.
$ kubectl get pods --all-namespaces -o yaml | grep /var/lib/kubelet
                k:{"mountPath":"/var/lib/kubelet/plugins"}:
                k:{"mountPath":"/var/lib/kubelet/pods"}:
      - mountPath: /var/lib/kubelet/pods
      - mountPath: /var/lib/kubelet/plugins
      - --kubelet-registration-path=/var/lib/kubelet/plugins/csi-hostpath-provisioning-8787/csi.sock
        path: /var/lib/kubelet/plugins/csi-hostpath-provisioning-8787
        path: /var/lib/kubelet/pods
        path: /var/lib/kubelet/plugins_registry
        path: /var/lib/kubelet/plugins

# With -kubelet-root=/var/lib/kubeletxxx
$ kubectl get pods --all-namespaces -o yaml | grep /var/lib/kubelet
                k:{"mountPath":"/var/lib/kubeletxxx/plugins"}:
                k:{"mountPath":"/var/lib/kubeletxxx/pods"}:
      - mountPath: /var/lib/kubeletxxx/pods
      - mountPath: /var/lib/kubeletxxx/plugins
      - --kubelet-registration-path=/var/lib/kubeletxxx/plugins/csi-hostpath-provisioning-4758/csi.sock
        path: /var/lib/kubeletxxx/plugins/csi-hostpath-provisioning-4758
        path: /var/lib/kubeletxxx/pods
        path: /var/lib/kubeletxxx/plugins_registry
        path: /var/lib/kubeletxxx/plugins

Does this PR introduce a user-facing change?

The e2e.test binary supports a new `--kubelet-root` parameter to override the default `/var/lib/kubelet` path. CSI storage tests use this.

@k8s-ci-robot k8s-ci-robot added release-note Denotes a PR that will be considered when it comes time to generate release notes. kind/cleanup Categorizes issue or PR as related to cleaning up code, process, or technical debt. size/S Denotes a PR that changes 10-29 lines, ignoring generated files. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. do-not-merge/needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. needs-priority Indicates a PR lacks a `priority/foo` label and requires one. area/e2e-test-framework Issues or PRs related to refactoring the kubernetes e2e test framework area/test sig/storage Categorizes an issue or PR as relevant to SIG Storage. sig/testing Categorizes an issue or PR as relevant to SIG Testing. and removed do-not-merge/needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. labels Feb 21, 2022
@pohly
Copy link
Contributor Author

pohly commented Feb 21, 2022

/cc @jonesbr17

@k8s-ci-robot
Copy link
Contributor

@pohly: GitHub didn't allow me to request PR reviews from the following users: jonesbr17.

Note that only kubernetes members and repo collaborators can review this PR, and authors cannot review their own PRs.

In response to this:

/cc @jonesbr17

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@k8s-ci-robot k8s-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Feb 21, 2022
@pohly
Copy link
Contributor Author

pohly commented Feb 21, 2022

/retest

/hold

I'm working on a Prow job with non-standard kubelet root. Let's see whether I can get that working before merging this PR.

@k8s-ci-robot k8s-ci-robot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Feb 21, 2022
@pohly
Copy link
Contributor Author

pohly commented Feb 23, 2022

The idea with using a unique path in pull-kubernetes-e2e-kind and then using that path to find leaked loop devices didn't work out. Therefore running pull-kubernetes-e2e-kind with a non-standard path is probably not worth it. But that is debatable: on the one hand it ensures that tests don't make assumptions about the kubelet path, on the other hand it creates additional friction for the upstream maintainers.

This PR currently also doesn't change where csi-driver-host-path puts its persistent state. That is still /var/lib/csi-data-dir. @jonesbr17: do you know whether that might be a problem in the cluster where you encountered issue #92664?

For future reference, here's a patch which also changes csi-data-dir:

diff --git a/test/e2e/storage/drivers/csi.go b/test/e2e/storage/drivers/csi.go
index 10af124fc4c..9ca76bb0cfa 100644
--- a/test/e2e/storage/drivers/csi.go
+++ b/test/e2e/storage/drivers/csi.go
@@ -45,6 +45,7 @@ import (
 	"strings"
 	"sync"
 	"time"
+	"path"
 
 	"github.com/onsi/ginkgo"
 	spb "google.golang.org/genproto/googleapis/rpc/status"
@@ -73,6 +74,7 @@ import (
 	"k8s.io/kubernetes/test/e2e/storage/drivers/proxy"
 	storageframework "k8s.io/kubernetes/test/e2e/storage/framework"
 	"k8s.io/kubernetes/test/e2e/storage/utils"
+	e2eframework "k8s.io/kubernetes/test/e2e/framework"
 
 	"google.golang.org/grpc"
 )
@@ -227,6 +229,20 @@ func (h *hostpathCSIDriver) PrepareTest(f *framework.Framework) (*storageframewo
 		DriverNamespace:     driverNamespace,
 	}
 
+	// We put the state directory alongside the kubelet state directory:
+	// /var/lib/kubelet -> /var/lib/csi-hostpath-data.
+	//
+	// If kubelet is told to use something like /var/lib/<test specific ID>/kubelet,
+	// then our directory also automatically uses that test specific directory.
+	// We also make that available inside the container under the same path as on
+	// the host, in contrast to the default deployment file, which uses /csi-data-dir.
+	//
+	// This is relevant for loop device cleanup under kind Prow jobs: when a loop
+	// device shows up with /var/lib/<test specific ID> after the cluster has been
+	// torn down, https://github.com/kubernetes-sigs/kind/blob/main/hack/ci/e2e-k8s.sh
+	// knows that it can delete the loop device.
+	statedir := path.Join(path.Dir(strings.TrimRight(e2eframework.TestContext.KubeletRoot, "/")), "csi-hostpath-data")
+
 	o := utils.PatchCSIOptions{
 		OldDriverName:       h.driverInfo.Name,
 		NewDriverName:       config.GetUniqueDriverName(),
@@ -239,6 +255,8 @@ func (h *hostpathCSIDriver) PrepareTest(f *framework.Framework) (*storageframewo
 			// Enable volume lifecycle checks, to report failure if
 			// the volume is not unpublished / unstaged correctly.
 			"--check-volume-lifecycle=true",
+
+			"--statedir", statedir,
 		},
 		ProvisionerContainerName: "csi-provisioner",
 		SnapshotterContainerName: "csi-snapshotter",
@@ -272,12 +290,28 @@ func (h *hostpathCSIDriver) PrepareTest(f *framework.Framework) (*storageframewo
 				switch container.Name {
 				case "csi-external-health-monitor-agent", "csi-external-health-monitor-controller":
 					// Remove these containers.
+				case "hostpath":
+					for i, volmount := range container.VolumeMounts {
+						if volmount.Name == "csi-data-dir" {
+							volmount.MountPath = statedir
+							container.VolumeMounts[i] = volmount
+							break
+						}
+					}
+					containers = append(containers, container)
 				default:
 					// Keep the others.
 					containers = append(containers, container)
 				}
 			}
 			item.Spec.Template.Spec.Containers = containers
+			for i, volume := range item.Spec.Template.Spec.Volumes {
+				if volume.Name == "csi-data-dir" {
+					volume.HostPath.Path = statedir
+					item.Spec.Template.Spec.Volumes[i] = volume
+					break
+				}
+			}
 		}
 		return nil
 	}, h.manifests...)

I tested this PR here locally with a modified version of https://github.com/kubernetes-sigs/kind/blob/main/hack/ci/e2e-k8s.sh:

diff --git a/hack/ci/e2e-k8s.sh b/hack/ci/e2e-k8s.sh
index 9d91b21c..c42bbd3c 100755
--- a/hack/ci/e2e-k8s.sh
+++ b/hack/ci/e2e-k8s.sh
@@ -26,6 +26,30 @@ set -o errexit -o nounset -o xtrace
 #          false - (default) APIs and features left at defaults
 # 
 
+# This unique ID will be used to run kubelet with a --root-dir that is specific
+# to this test run. Loop devices created for raw block volumes will have a path
+# that is inside that directory and will show up on the host with that path (no
+# namespacing for loop devices!). While cleaning up we can delete those.
+#
+# Only some Kubernetes test/e2e binaries have support for running with a
+# non-standard kubelet root directory (added in
+# https://github.com/kubernetes/kubernetes/pull/108253). We leave this empty
+# if not available and skip loop device cleanup.
+TEST_UUID=
+TEST_ROOT_DIR=
+KUBELET_ROOT_DIR=
+if grep -q kubelet-root test/e2e/framework/test_context.go; then
+    TEST_UUID=$(dd if=/dev/random count=16 bs=1 status=none | base64)
+    TEST_ROOT_DIR="/var/lib/$TEST_UUID"
+    KUBELET_ROOT_DIR="${TEST_ROOT_DIR}/kubelet"
+fi
+
+# Root privileges are required for cleaning up loop devices.
+SUDO=
+if [ $(id -u) -ne 0 ]; then
+    SUDO=sudo
+fi
+
 # cleanup logic for cleanup on exit
 CLEANED_UP=false
 cleanup() {
@@ -42,6 +66,17 @@ cleanup() {
   if [ -n "${TMP_DIR:-}" ]; then
     rm -rf "${TMP_DIR:?}"
   fi
+
+  # Clean up leaked loop devices?
+  if [ "$TEST_UUID" ]; then
+    ${SUDO} losetup --noheadings --output NAME,BACK-FILE | while read -r dev path; do
+      if echo "$path" | grep -q "^${TEST_ROOT_DIR}/"; then
+        echo "Cleaning up leaked ${dev} for ${path}."
+        ${SUDO} losetup -d "${dev}"
+      fi
+    done
+  fi
+
   CLEANED_UP=true
 }
 
@@ -86,6 +121,10 @@ create_cluster() {
     esac
   fi
 
+  # NOTE: the indendation on the next line is meaningful!
+  kubelet_extra_args="${kubelet_extra_args}
+      \"root-dir\": \"${KUBELET_ROOT_DIR}\""
+
   # JSON map injected into featureGates config
   feature_gates="{}"
   # --runtime-config argument value passed to the API server
@@ -229,6 +268,7 @@ run_tests() {
   # running a builtin like `wait`, saving the PID also allows us to forward the
   # interrupt
   ./hack/ginkgo-e2e.sh \
+    $(if [ "${KUBELET_ROOT_DIR}" ]; then echo "--kubelet-root=${KUBELET_ROOT_DIR}"; fi) \
     '--provider=skeleton' "--num-nodes=${NUM_NODES}" \
     "--ginkgo.focus=${FOCUS}" "--ginkgo.skip=${SKIP}" \
     "--report-dir=${ARTIFACTS}" '--disable-log-dump=true' &

Tests passed, just the loop device cleanup didn't work out as intended.

/hold cancel

@k8s-ci-robot k8s-ci-robot removed the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Feb 23, 2022
@pacoxu
Copy link
Member

pacoxu commented Mar 1, 2022

It is a little dup with KubeVolumeDir.

flags.StringVar(&TestContext.KubeVolumeDir, "volume-dir", "/var/lib/kubelet", "Path to the directory containing the kubelet volumes.")

EDITED: rename it to KubeRootDir and use for both scenarios. (But it may affect some volume test cases.) Or can we reuse it directly?

@pohly
Copy link
Contributor Author

pohly commented Mar 1, 2022

It is a little dup with KubeVolumeDir.

Good catch, I hadn't seen that one.

rename it to KubeRootDir and use for both scenarios. (But it may affect some volume test cases.) Or can we reuse it directly?

You want me to rename KubeletRoot to KubeRootDir, right? KubeletRootDir makes sense to me, but not so much KubeRootDir - what would Kube stand for, i.e. which component of Kubernetes?

I'm not sure what you mean with Or can we reuse it directly? - what is it here? KubeVolumeDir? Using that and going up one level feels hacky to me.

@pacoxu
Copy link
Member

pacoxu commented Mar 1, 2022

I mean can we reuse KubeVolumeDir? If the name is not valid for this purpose, we may rename it.

@pohly
Copy link
Contributor Author

pohly commented Mar 1, 2022

KubeVolumeDir is too specific. We can of course assume that KubeletRootDir = KubeVolumeDir/.. but that looks backwards to me. I prefer to specify KubeletRootDir and then derived the other directories from that.

@pacoxu
Copy link
Member

pacoxu commented Mar 2, 2022

I prefer to specify KubeletRootDir and then derived the other directories from that.

I agree.

@pohly
Copy link
Contributor Author

pohly commented Mar 3, 2022

I must have been confused by the KubeVolumeDir name: that parameter already is the kubelet root dir, not the pods/*/volumes directory underneath that.

I think the parameter was badly named when it was introduced. I'm now torn between renaming the parameter (may cause problems for anyone already using it) and just updating the parameter's description (less intrusive, my still harder to discover).

@pacoxu : any preference?

@pacoxu
Copy link
Member

pacoxu commented Mar 3, 2022

@pacoxu : any preference?

😂 either is ok for me.

Some storage tests deploy DaemonSets which hard-code /var/lib/kubelet as root
directory for kubelet registration and pod directory. There was already a
parameter which allowed specifying the root directory, just with a very
confusing name ("--volume-dir") and matching field name. A --kubelet-root-dir
parameters gets added because this may make it easier to find the parameter,
with the old name preserved as an alias for the same field for backwards
compatibility.
@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Mar 5, 2022
@pacoxu
Copy link
Member

pacoxu commented Mar 5, 2022

/assign @SergeyKanzhelev @ehashman

@pohly
Copy link
Contributor Author

pohly commented Mar 5, 2022

@SergeyKanzhelev @ehashman: there are some comments in node tests about "framework.TestContext.KubeVolumeDir is currently not populated for node e2e", for example here:

// TODO: remove hardcoded kubelet volume directory path
// framework.TestContext.KubeVolumeDir is currently not populated for node e2e
hostLogFile = "/var/lib/kubelet/pods/" + string(pod.UID) + "/volumes/kubernetes.io~empty-dir" + logFile

I don't know what is meant with "not populated" and therefore didn't touch those parts.

The default /var/lib/kubelet should always be set, so using it would be better than hard-coding (in that example) hostLogFile = "/var/lib/kubelet/pods/". Let me know whether you want me update also those code lines.

@pohly pohly closed this Mar 5, 2022
@pohly pohly reopened this Mar 5, 2022
@pohly
Copy link
Contributor Author

pohly commented Mar 6, 2022

/retest

@klueska
Copy link
Contributor

klueska commented Mar 14, 2022

/approve

Copy link
Member

@ehashman ehashman left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

/approve

for node test

Comment on lines +73 to +74
// Inject non-standard kubelet path.
*p = substKubeletRootDir(*p)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This seems a little weird to me. But since it looks like @pohly is a storage test approver I trust you to do the right thing here...

volumeHostPath := fmt.Sprintf("%s/pods/%s/volumes/kubernetes.io~empty-dir/%s", framework.TestContext.KubeVolumeDir, foundPod.UID, volumeName)
ginkgo.By(fmt.Sprintf("confirming a container with the same label can read the file under --volume-dir=%s", framework.TestContext.KubeVolumeDir))
volumeHostPath := fmt.Sprintf("%s/pods/%s/volumes/kubernetes.io~empty-dir/%s", framework.TestContext.KubeletRootDir, foundPod.UID, volumeName)
ginkgo.By(fmt.Sprintf("confirming a container with the same label can read the file under --kubelet-root-dir=%s", framework.TestContext.KubeletRootDir))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This seems fine.

@ehashman
Copy link
Member

/retest

@k8s-ci-robot
Copy link
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: ehashman, klueska, pacoxu, pohly

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Mar 14, 2022
@ehashman
Copy link
Member

/triage accepted
/priority important-longterm

@k8s-ci-robot k8s-ci-robot added triage/accepted Indicates an issue or PR is ready to be actively worked on. priority/important-longterm Important over the long term, but may not be staffed and/or may need multiple releases to complete. and removed needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. needs-priority Indicates a PR lacks a `priority/foo` label and requires one. labels Mar 14, 2022
@pohly
Copy link
Contributor Author

pohly commented Mar 14, 2022

/retest

1 similar comment
@pohly
Copy link
Contributor Author

pohly commented Mar 14, 2022

/retest

@pacoxu
Copy link
Member

pacoxu commented Mar 15, 2022

/test pull-kubernetes-integration

@k8s-triage-robot
Copy link

The Kubernetes project has merge-blocking tests that are currently too flaky to consistently pass.

This bot retests PRs for certain kubernetes repos according to the following rules:

  • The PR does have any do-not-merge/* labels
  • The PR does not have the needs-ok-to-test label
  • The PR is mergeable (does not have a needs-rebase label)
  • The PR is approved (has cncf-cla: yes, lgtm, approved labels)
  • The PR is failing tests required for merge

You can:

/retest

@pacoxu pacoxu moved this from Triage to Done in SIG Node CI/Test Board Mar 15, 2022
@pacoxu pacoxu moved this from Triage to Done in SIG Node PR Triage Mar 15, 2022
@k8s-ci-robot k8s-ci-robot merged commit 7cc4fcd into kubernetes:master Mar 15, 2022
@k8s-ci-robot k8s-ci-robot added this to the v1.24 milestone Mar 15, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. area/e2e-test-framework Issues or PRs related to refactoring the kubernetes e2e test framework area/test cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. kind/cleanup Categorizes issue or PR as related to cleaning up code, process, or technical debt. lgtm "Looks good to me", indicates that a PR is ready to be merged. priority/important-longterm Important over the long term, but may not be staffed and/or may need multiple releases to complete. release-note Denotes a PR that will be considered when it comes time to generate release notes. sig/node Categorizes an issue or PR as relevant to SIG Node. sig/storage Categorizes an issue or PR as relevant to SIG Storage. sig/testing Categorizes an issue or PR as relevant to SIG Testing. size/S Denotes a PR that changes 10-29 lines, ignoring generated files. triage/accepted Indicates an issue or PR is ready to be actively worked on.
Projects
Archived in project
Archived in project
Development

Successfully merging this pull request may close these issues.

CSI Volume tests should work with non-default kubelet root directories
7 participants