
Use pod + nsenter instead of SSH in mount propagation tests #82424

Merged


Member

@jsafrane jsafrane commented Sep 6, 2019

Not all Kubernetes clusters allow ssh to nodes during test.

What type of PR is this?
/kind cleanup

Does this PR introduce a user-facing change?:

NONE

Additional documentation e.g., KEPs (Kubernetes Enhancement Proposals), usage docs, etc.:


/sig storage
/sig node

cc @msau42 @derekwaynecarr, PTAL

@k8s-ci-robot k8s-ci-robot added release-note-none Denotes a PR that doesn't merit a release note. kind/cleanup Categorizes issue or PR as related to cleaning up code, process, or technical debt. sig/storage Categorizes an issue or PR as relevant to SIG Storage. size/S Denotes a PR that changes 10-29 lines, ignoring generated files. sig/node Categorizes an issue or PR as relevant to SIG Node. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. needs-priority Indicates a PR lacks a `priority/foo` label and requires one. labels Sep 6, 2019
@k8s-ci-robot k8s-ci-robot added area/test sig/testing Categorizes an issue or PR as relevant to SIG Testing. labels Sep 6, 2019
Contributor

@mattjmcnaughton mattjmcnaughton left a comment


/assign @tallclair
/test pull-kubernetes-integration

/lgtm

This looks good to me, but would love an additional pair of eyes from the approver.

```diff
@@ -102,8 +105,8 @@ var _ = SIGDescribe("Mount propagation", func() {
 // running in parallel.
 hostDir := "/var/lib/kubelet/" + f.Namespace.Name
 defer func() {
-cleanCmd := fmt.Sprintf("sudo rm -rf %q", hostDir)
-e2essh.IssueSSHCommand(cleanCmd, framework.TestContext.Provider, node)
+cleanCmd := fmt.Sprintf("rm -rf %q", hostDir)
```
Contributor


For my own understanding, can you share why we no longer need to use sudo?

Member


we're definitely root now (the pod is root)
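A minimal sketch of how the cleanup command from the diff is built once it runs in the hostexec pod: no `sudo` prefix, because the pod is already root. `cleanupCommand` is a hypothetical helper name, not code from the PR. Note that Go's `%q` verb produces Go string quoting rather than shell quoting; that is adequate here only because Kubernetes namespace names are DNS labels (lowercase alphanumerics and `-`).

```go
package main

import "fmt"

// cleanupCommand (hypothetical helper) builds the shell command that removes
// the per-namespace directory under /var/lib/kubelet. There is no "sudo"
// because the command runs inside a pod that already runs as root.
// %q wraps the path in double quotes using Go syntax; this is safe only
// because namespace names contain just [a-z0-9-].
func cleanupCommand(namespace string) string {
	hostDir := "/var/lib/kubelet/" + namespace
	return fmt.Sprintf("rm -rf %q", hostDir)
}

func main() {
	fmt.Println(cleanupCommand("e2e-tests-mount-propagation-abc12"))
}
```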

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Sep 6, 2019
Member

aojea commented Sep 6, 2019

/cc

Member

@tallclair tallclair left a comment


A couple of questions about generalizing this, but otherwise LGTM.

```diff
@@ -85,6 +85,9 @@ var _ = SIGDescribe("Mount propagation", func() {
 // tmpfs to a subdirectory there. We check that these mounts are
 // propagated to the right places.
 
+hostExec := utils.NewHostExec(f)
```
Member


Just an observation for generalizing this - I think the hostexec pod should set hostNetwork, hostIPC, and hostPID to true

Member Author


It depends on how far we want to go.

For storage, we're fine if the hostexec pod runs nsenter into the host mount namespace, so we can check mounts and the contents of /var/lib/kubelet without potential mount propagation hiccups.

Do we have tests that actually need the host PID or network namespace? Would it be better to add them to the nsenter command line instead?

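```go
args := []string{
	"exec",
	fmt.Sprintf("--namespace=%v", pod.Namespace),
	pod.Name,
	"--",
	// Enter the mount namespace of the host's PID 1, assuming the host
	// root filesystem is mounted in the pod at /rootfs.
	"nsenter",
	"--mount=/rootfs/proc/1/ns/mnt",
	"--",
	"sh",
	"-c",
	cmd,
}
```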

Member


I think we can leave generalizing for a follow-up?

```diff
@@ -102,8 +105,8 @@ var _ = SIGDescribe("Mount propagation", func() {
 // running in parallel.
 hostDir := "/var/lib/kubelet/" + f.Namespace.Name
 defer func() {
-cleanCmd := fmt.Sprintf("sudo rm -rf %q", hostDir)
-e2essh.IssueSSHCommand(cleanCmd, framework.TestContext.Provider, node)
```
Member


Should we remove e2essh.IssueSSHCommand, or just have it call hostexec under the hood?

Member Author


I think that most tests that use SSH should move to hostexec; however, there may be some that need real SSH.

What does kubectl exec <hostexec pod> nsenter [...] systemctl restart kubelet do when kubelet is restarted in the middle of the kubectl exec? Our disruptive tests do that. systemctl restart (in the host mount namespace) should work: it writes to a socket and reaches the real systemd on the host. I am not sure about the kubectl exec part and the interrupted connection to kubelet.

Member


I'd assume that, just like with real SSH, we will need to handle reconnecting?

Member Author


Yeah, there are solutions, but they are far riskier than good old SSH. I'd prefer a gradual move to hostexec.

@BenTheElder
Member

/priority important-longterm
needs rebase

thanks for doing this

@k8s-ci-robot k8s-ci-robot added priority/important-longterm Important over the long term, but may not be staffed and/or may need multiple releases to complete. and removed needs-priority Indicates a PR lacks a `priority/foo` label and requires one. labels Sep 13, 2019
@k8s-ci-robot k8s-ci-robot removed the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Sep 13, 2019
@jsafrane
Member Author

rebased

@BenTheElder
Member

/lgtm

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Sep 16, 2019
@BenTheElder
Member

/approve
/hold

@k8s-ci-robot k8s-ci-robot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Sep 18, 2019
@k8s-ci-robot
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: BenTheElder, jsafrane

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Sep 18, 2019
@jsafrane
Member Author

@BenTheElder @tallclair, can we move forward with this PR? Or do you insist on generalization?

@BenTheElder
Member

I would like to move forward and follow on with generalization later. If this pattern works well for some tests it would be great to move everything over but we shouldn't block on that.

@BenTheElder
Member

/hold cancel
/label needs-rebase

@k8s-ci-robot
Contributor

@BenTheElder: The label(s) /label needs-rebase cannot be applied. These labels are supported: api-review, community/discussion, community/maintenance, community/question, cuj/build-train-deploy, cuj/multi-user, platform/aws, platform/azure, platform/gcp, platform/minikube, platform/other

In response to this:

/hold cancel
/label needs-rebase

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@k8s-ci-robot k8s-ci-robot removed the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Sep 30, 2019
@fejta-bot

/retest
This bot automatically retries jobs that failed/flaked on approved PRs (send feedback to fejta).

Review the full test history for this PR.

Silence the bot with an /lgtm cancel or /hold comment for consistent failures.

1 similar comment
@jsafrane jsafrane force-pushed the mount-propagation-remove-ssh branch from d659e11 to 5dba3cb Compare October 1, 2019 08:21
@k8s-ci-robot k8s-ci-robot removed the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Oct 1, 2019
@jsafrane
Member Author

jsafrane commented Oct 1, 2019

rebased

@jsafrane jsafrane force-pushed the mount-propagation-remove-ssh branch from 5dba3cb to 0c3293a Compare October 1, 2019 08:31
@BenTheElder
Member

/lgtm

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Oct 1, 2019
@k8s-ci-robot k8s-ci-robot merged commit 867fef5 into kubernetes:master Oct 1, 2019
@k8s-ci-robot k8s-ci-robot added this to the v1.17 milestone Oct 1, 2019
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. area/test cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. kind/cleanup Categorizes issue or PR as related to cleaning up code, process, or technical debt. lgtm "Looks good to me", indicates that a PR is ready to be merged. priority/important-longterm Important over the long term, but may not be staffed and/or may need multiple releases to complete. release-note-none Denotes a PR that doesn't merit a release note. sig/node Categorizes an issue or PR as relevant to SIG Node. sig/storage Categorizes an issue or PR as relevant to SIG Storage. sig/testing Categorizes an issue or PR as relevant to SIG Testing. size/S Denotes a PR that changes 10-29 lines, ignoring generated files.

7 participants