-
Notifications
You must be signed in to change notification settings - Fork 38.7k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
test: support storage tests with non-standard kubelet root directory #108253
Conversation
/cc @jonesbr17 |
@pohly: GitHub didn't allow me to request PR reviews from the following users: jonesbr17. Note that only kubernetes members and repo collaborators can review this PR, and authors cannot review their own PRs. In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. |
/retest /hold I'm working on a Prow job with non-standard kubelet root. Let's see whether I can get that working before merging this PR. |
The idea with using a unique path in pull-kubernetes-e2e-kind and then using that path to find leaked loop devices didn't work out. Therefore running pull-kubernetes-e2e-kind with a non-standard path is probably not worth it. But that is debatable: on the one hand it ensures that tests don't make assumptions about the kubelet path, on the other hand it creates additional friction for the upstream maintainers. This PR currently also doesn't change where csi-driver-host-path puts its persistent state. That is still /var/lib/csi-data-dir. @jonesbr17: do you know whether that might be a problem in the cluster where you encountered issue #92664? For future reference, here's a patch which also changes csi-data-dir: diff --git a/test/e2e/storage/drivers/csi.go b/test/e2e/storage/drivers/csi.go
index 10af124fc4c..9ca76bb0cfa 100644
--- a/test/e2e/storage/drivers/csi.go
+++ b/test/e2e/storage/drivers/csi.go
@@ -45,6 +45,7 @@ import (
"strings"
"sync"
"time"
+ "path"
"github.com/onsi/ginkgo"
spb "google.golang.org/genproto/googleapis/rpc/status"
@@ -73,6 +74,7 @@ import (
"k8s.io/kubernetes/test/e2e/storage/drivers/proxy"
storageframework "k8s.io/kubernetes/test/e2e/storage/framework"
"k8s.io/kubernetes/test/e2e/storage/utils"
+ e2eframework "k8s.io/kubernetes/test/e2e/framework"
"google.golang.org/grpc"
)
@@ -227,6 +229,20 @@ func (h *hostpathCSIDriver) PrepareTest(f *framework.Framework) (*storageframewo
DriverNamespace: driverNamespace,
}
+ // We put the state directory alongside the kubelet state directory:
+ // /var/lib/kubelet -> /var/lib/csi-hostpath-data.
+ //
+ // If kubelet is told to use something like /var/lib/<test specific ID>/kubelet,
+ // then our directory also automatically uses that test specific directory.
+ // We also make that available inside the container under the same path as on
+ // the host, in contrast to the default deployment file, which uses /csi-data-dir.
+ //
+ // This is relevant for loop device cleanup under kind Prow jobs: when a loop
+ // device shows up with /var/lib/<test specific ID> after the cluster has been
+ // torn down, https://github.com/kubernetes-sigs/kind/blob/main/hack/ci/e2e-k8s.sh
+ // knows that it can delete the loop device.
+ statedir := path.Join(path.Dir(strings.TrimRight(e2eframework.TestContext.KubeletRoot, "/")), "csi-hostpath-data")
+
o := utils.PatchCSIOptions{
OldDriverName: h.driverInfo.Name,
NewDriverName: config.GetUniqueDriverName(),
@@ -239,6 +255,8 @@ func (h *hostpathCSIDriver) PrepareTest(f *framework.Framework) (*storageframewo
// Enable volume lifecycle checks, to report failure if
// the volume is not unpublished / unstaged correctly.
"--check-volume-lifecycle=true",
+
+ "--statedir", statedir,
},
ProvisionerContainerName: "csi-provisioner",
SnapshotterContainerName: "csi-snapshotter",
@@ -272,12 +290,28 @@ func (h *hostpathCSIDriver) PrepareTest(f *framework.Framework) (*storageframewo
switch container.Name {
case "csi-external-health-monitor-agent", "csi-external-health-monitor-controller":
// Remove these containers.
+ case "hostpath":
+ for i, volmount := range container.VolumeMounts {
+ if volmount.Name == "csi-data-dir" {
+ volmount.MountPath = statedir
+ container.VolumeMounts[i] = volmount
+ break
+ }
+ }
+ containers = append(containers, container)
default:
// Keep the others.
containers = append(containers, container)
}
}
item.Spec.Template.Spec.Containers = containers
+ for i, volume := range item.Spec.Template.Spec.Volumes {
+ if volume.Name == "csi-data-dir" {
+ volume.HostPath.Path = statedir
+ item.Spec.Template.Spec.Volumes[i] = volume
+ break
+ }
+ }
}
return nil
}, h.manifests...) I tested this PR here locally with a modified version of https://github.com/kubernetes-sigs/kind/blob/main/hack/ci/e2e-k8s.sh: diff --git a/hack/ci/e2e-k8s.sh b/hack/ci/e2e-k8s.sh
index 9d91b21c..c42bbd3c 100755
--- a/hack/ci/e2e-k8s.sh
+++ b/hack/ci/e2e-k8s.sh
@@ -26,6 +26,30 @@ set -o errexit -o nounset -o xtrace
# false - (default) APIs and features left at defaults
#
+# This unique ID will be used to run kubelet with a --root-dir that is specific
+# to this test run. Loop devices created for raw block volumes will have a path
+# that is inside that directory and will show up on the host with that path (no
+# namespacing for loop devices!). While cleaning up we can delete those.
+#
+# Only some Kubernetes test/e2e binaries have support for running with a
+# non-standard kubelet root directory (added in
+# https://github.com/kubernetes/kubernetes/pull/108253). We leave this empty
+# if not available and skip loop device cleanup.
+TEST_UUID=
+TEST_ROOT_DIR=
+KUBELET_ROOT_DIR=
+if grep -q kubelet-root test/e2e/framework/test_context.go; then
+ TEST_UUID=$(dd if=/dev/random count=16 bs=1 status=none | base64)
+ TEST_ROOT_DIR="/var/lib/$TEST_UUID"
+ KUBELET_ROOT_DIR="${TEST_ROOT_DIR}/kubelet"
+fi
+
+# Root privileges are required for cleaning up loop devices.
+SUDO=
+if [ $(id -u) -ne 0 ]; then
+ SUDO=sudo
+fi
+
# cleanup logic for cleanup on exit
CLEANED_UP=false
cleanup() {
@@ -42,6 +66,17 @@ cleanup() {
if [ -n "${TMP_DIR:-}" ]; then
rm -rf "${TMP_DIR:?}"
fi
+
+ # Clean up leaked loop devices?
+ if [ "$TEST_UUID" ]; then
+ ${SUDO} losetup --noheadings --output NAME,BACK-FILE | while read -r dev path; do
+ if echo "$path" | grep -q "^${TEST_ROOT_DIR}/"; then
+ echo "Cleaning up leaked ${dev} for ${path}."
+ ${SUDO} losetup -d "${dev}"
+ fi
+ done
+ fi
+
CLEANED_UP=true
}
@@ -86,6 +121,10 @@ create_cluster() {
esac
fi
+ # NOTE: the indendation on the next line is meaningful!
+ kubelet_extra_args="${kubelet_extra_args}
+ \"root-dir\": \"${KUBELET_ROOT_DIR}\""
+
# JSON map injected into featureGates config
feature_gates="{}"
# --runtime-config argument value passed to the API server
@@ -229,6 +268,7 @@ run_tests() {
# running a builtin like `wait`, saving the PID also allows us to forward the
# interrupt
./hack/ginkgo-e2e.sh \
+ $(if [ "${KUBELET_ROOT_DIR}" ]; then echo "--kubelet-root=${KUBELET_ROOT_DIR}"; fi) \
'--provider=skeleton' "--num-nodes=${NUM_NODES}" \
"--ginkgo.focus=${FOCUS}" "--ginkgo.skip=${SKIP}" \
"--report-dir=${ARTIFACTS}" '--disable-log-dump=true' & Tests passed, just the loop device cleanup didn't work out as intended. /hold cancel |
It is a little dup with kubernetes/test/e2e/framework/test_context.go Line 340 in 77f6476
EDITED: rename it to |
Good catch, I hadn't seen that one.
You want me to rename I'm not sure what you mean with |
I mean can we reuse |
|
I agree. |
I must have been confused by the I think the parameter was badly named when it was introduced. I'm now torn between renaming the parameter (may cause problems for anyone already using it) and just updating the parameter's description (less intrusive, my still harder to discover). @pacoxu : any preference? |
😂 either is ok for me. |
Some storage tests deploy DaemonSets which hard-code /var/lib/kubelet as root directory for kubelet registration and pod directory. There was already a parameter which allowed specifying the root directory, just with a very confusing name ("--volume-dir") and matching field name. A --kubelet-root-dir parameters gets added because this may make it easier to find the parameter, with the old name preserved as an alias for the same field for backwards compatibility.
/assign @SergeyKanzhelev @ehashman |
@SergeyKanzhelev @ehashman: there are some comments in node tests about "framework.TestContext.KubeVolumeDir is currently not populated for node e2e", for example here: kubernetes/test/e2e_node/node_problem_detector_linux.go Lines 297 to 299 in 5b9910a
I don't know what is meant with "not populated" and therefore didn't touch those parts. The default |
/retest |
/approve |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
/approve
for node test
// Inject non-standard kubelet path. | ||
*p = substKubeletRootDir(*p) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This seems a little weird to me. But since it looks like @pohly is a storage test approver I trust you to do the right thing here...
volumeHostPath := fmt.Sprintf("%s/pods/%s/volumes/kubernetes.io~empty-dir/%s", framework.TestContext.KubeVolumeDir, foundPod.UID, volumeName) | ||
ginkgo.By(fmt.Sprintf("confirming a container with the same label can read the file under --volume-dir=%s", framework.TestContext.KubeVolumeDir)) | ||
volumeHostPath := fmt.Sprintf("%s/pods/%s/volumes/kubernetes.io~empty-dir/%s", framework.TestContext.KubeletRootDir, foundPod.UID, volumeName) | ||
ginkgo.By(fmt.Sprintf("confirming a container with the same label can read the file under --kubelet-root-dir=%s", framework.TestContext.KubeletRootDir)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This seems fine.
/retest |
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: ehashman, klueska, pacoxu, pohly The full list of commands accepted by this bot can be found here. The pull request process is described here
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
/triage accepted |
/retest |
1 similar comment
/retest |
/test pull-kubernetes-integration |
The Kubernetes project has merge-blocking tests that are currently too flaky to consistently pass. This bot retests PRs for certain kubernetes repos according to the following rules:
You can:
/retest |
What type of PR is this?
/kind cleanup
What this PR does / why we need it:
Some storage tests deploy DaemonSets which hard-code /var/lib/kubelet as root
directory for kubelet registration and pod directory. To enable those tests in
clusters with some other directory, a new global parameter gets added and the
value defined there gets patched into the relevant places of a CSI driver's
YAML files.
Which issue(s) this PR fixes:
Fixes #92664
Special notes for your reviewer:
A global parameter is used because other tests might also need the same
information.
There are currently no Prow jobs which use a non-standard path. I verified manually that the new flag has the intended behavior:
Does this PR introduce a user-facing change?