fix mirror pod nfs test failure due to differing NFS versions #119765

Merged

Conversation

tzneal
Contributor

@tzneal tzneal commented Aug 4, 2023

What type of PR is this?

/kind failing-test

What this PR does / why we need it:

/exports *(rw,fsid=0,insecure,no_root_squash)

can be mounted as /exports using NFSv3 and / using NFSv4

Use the '/' form of the mount which is common across other NFS tests.

Which issue(s) this PR fixes:

Fixes #

Special notes for your reviewer:

Does this PR introduce a user-facing change?

NONE

Additional documentation e.g., KEPs (Kubernetes Enhancement Proposals), usage docs, etc.:


@k8s-ci-robot k8s-ci-robot added release-note-none Denotes a PR that doesn't merit a release note. size/S Denotes a PR that changes 10-29 lines, ignoring generated files. kind/failing-test Categorizes issue or PR as related to a consistently or frequently failing test. labels Aug 4, 2023
@k8s-ci-robot
Contributor

Please note that we're already in Test Freeze for the release-1.28 branch. This means every merged PR will be automatically fast-forwarded via the periodic ci-fast-forward job to the release branch of the upcoming v1.28.0 release.

Fast forwards are scheduled to happen every 6 hours, whereas the most recent run was: Fri Aug 4 16:28:53 UTC 2023.

@k8s-ci-robot k8s-ci-robot added cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. do-not-merge/needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. needs-priority Indicates a PR lacks a `priority/foo` label and requires one. labels Aug 4, 2023
@tzneal
Contributor Author

tzneal commented Aug 4, 2023

/test

@k8s-ci-robot
Contributor

@tzneal: The /test command needs one or more targets.
The following commands are available to trigger required jobs:

  • /test pull-cadvisor-e2e-kubernetes
  • /test pull-kubernetes-conformance-kind-ga-only-parallel
  • /test pull-kubernetes-coverage-unit
  • /test pull-kubernetes-dependencies
  • /test pull-kubernetes-dependencies-go-canary
  • /test pull-kubernetes-e2e-gce
  • /test pull-kubernetes-e2e-gce-100-performance
  • /test pull-kubernetes-e2e-gce-big-performance
  • /test pull-kubernetes-e2e-gce-canary
  • /test pull-kubernetes-e2e-gce-cos
  • /test pull-kubernetes-e2e-gce-cos-canary
  • /test pull-kubernetes-e2e-gce-cos-no-stage
  • /test pull-kubernetes-e2e-gce-network-proxy-http-connect
  • /test pull-kubernetes-e2e-gce-scale-performance-manual
  • /test pull-kubernetes-e2e-kind
  • /test pull-kubernetes-e2e-kind-ipv6
  • /test pull-kubernetes-integration
  • /test pull-kubernetes-integration-go-canary
  • /test pull-kubernetes-kubemark-e2e-gce-scale
  • /test pull-kubernetes-node-e2e-containerd
  • /test pull-kubernetes-typecheck
  • /test pull-kubernetes-unit
  • /test pull-kubernetes-unit-go-canary
  • /test pull-kubernetes-update
  • /test pull-kubernetes-verify
  • /test pull-kubernetes-verify-go-canary

The following commands are available to trigger optional jobs:

  • /test check-dependency-stats
  • /test pull-ci-kubernetes-unit-windows
  • /test pull-e2e-gce-cloud-provider-disabled
  • /test pull-kubernetes-conformance-image-test
  • /test pull-kubernetes-conformance-kind-ga-only
  • /test pull-kubernetes-conformance-kind-ipv6-parallel
  • /test pull-kubernetes-cos-cgroupv1-containerd-node-e2e
  • /test pull-kubernetes-cos-cgroupv1-containerd-node-e2e-features
  • /test pull-kubernetes-cos-cgroupv2-containerd-node-e2e
  • /test pull-kubernetes-cos-cgroupv2-containerd-node-e2e-eviction
  • /test pull-kubernetes-cos-cgroupv2-containerd-node-e2e-features
  • /test pull-kubernetes-cos-cgroupv2-containerd-node-e2e-serial
  • /test pull-kubernetes-cross
  • /test pull-kubernetes-e2e-autoscaling-hpa-cm
  • /test pull-kubernetes-e2e-autoscaling-hpa-cpu
  • /test pull-kubernetes-e2e-capz-azure-disk
  • /test pull-kubernetes-e2e-capz-azure-disk-vmss
  • /test pull-kubernetes-e2e-capz-azure-file
  • /test pull-kubernetes-e2e-capz-azure-file-vmss
  • /test pull-kubernetes-e2e-capz-conformance
  • /test pull-kubernetes-e2e-capz-windows
  • /test pull-kubernetes-e2e-capz-windows-alpha-feature-vpa
  • /test pull-kubernetes-e2e-capz-windows-alpha-features
  • /test pull-kubernetes-e2e-capz-windows-serial-slow-hpa
  • /test pull-kubernetes-e2e-containerd-gce
  • /test pull-kubernetes-e2e-gce-correctness
  • /test pull-kubernetes-e2e-gce-cos-alpha-features
  • /test pull-kubernetes-e2e-gce-cos-kubetest2
  • /test pull-kubernetes-e2e-gce-csi-serial
  • /test pull-kubernetes-e2e-gce-device-plugin-gpu
  • /test pull-kubernetes-e2e-gce-kubelet-credential-provider
  • /test pull-kubernetes-e2e-gce-network-proxy-grpc
  • /test pull-kubernetes-e2e-gce-serial
  • /test pull-kubernetes-e2e-gce-storage-disruptive
  • /test pull-kubernetes-e2e-gce-storage-slow
  • /test pull-kubernetes-e2e-gce-storage-snapshot
  • /test pull-kubernetes-e2e-gci-gce-autoscaling
  • /test pull-kubernetes-e2e-gci-gce-ingress
  • /test pull-kubernetes-e2e-gci-gce-ipvs
  • /test pull-kubernetes-e2e-inplace-pod-resize-containerd-main-v2
  • /test pull-kubernetes-e2e-kind-alpha-features
  • /test pull-kubernetes-e2e-kind-canary
  • /test pull-kubernetes-e2e-kind-dual-canary
  • /test pull-kubernetes-e2e-kind-ipv6-canary
  • /test pull-kubernetes-e2e-kind-ipvs-dual-canary
  • /test pull-kubernetes-e2e-kind-kms
  • /test pull-kubernetes-e2e-kind-multizone
  • /test pull-kubernetes-e2e-kops-aws
  • /test pull-kubernetes-e2e-storage-kind-disruptive
  • /test pull-kubernetes-e2e-ubuntu-gce-network-policies
  • /test pull-kubernetes-integration-eks
  • /test pull-kubernetes-kind-dra
  • /test pull-kubernetes-kind-json-logging
  • /test pull-kubernetes-kind-text-logging
  • /test pull-kubernetes-kubemark-e2e-gce-big
  • /test pull-kubernetes-local-e2e
  • /test pull-kubernetes-node-arm64-e2e-containerd-ec2
  • /test pull-kubernetes-node-arm64-e2e-containerd-serial-ec2
  • /test pull-kubernetes-node-arm64-ubuntu-serial-gce
  • /test pull-kubernetes-node-crio-cgrpv1-evented-pleg-e2e
  • /test pull-kubernetes-node-crio-cgrpv2-e2e
  • /test pull-kubernetes-node-crio-cgrpv2-e2e-kubetest2
  • /test pull-kubernetes-node-crio-e2e
  • /test pull-kubernetes-node-crio-e2e-kubetest2
  • /test pull-kubernetes-node-e2e-containerd-alpha-features
  • /test pull-kubernetes-node-e2e-containerd-ec2
  • /test pull-kubernetes-node-e2e-containerd-features
  • /test pull-kubernetes-node-e2e-containerd-features-kubetest2
  • /test pull-kubernetes-node-e2e-containerd-kubetest2
  • /test pull-kubernetes-node-e2e-containerd-serial-ec2
  • /test pull-kubernetes-node-e2e-containerd-sidecar-containers
  • /test pull-kubernetes-node-e2e-containerd-standalone-mode
  • /test pull-kubernetes-node-e2e-containerd-standalone-mode-all-alpha
  • /test pull-kubernetes-node-e2e-crio-dra
  • /test pull-kubernetes-node-kubelet-credential-provider
  • /test pull-kubernetes-node-kubelet-serial-containerd
  • /test pull-kubernetes-node-kubelet-serial-containerd-kubetest2
  • /test pull-kubernetes-node-kubelet-serial-cpu-manager
  • /test pull-kubernetes-node-kubelet-serial-cpu-manager-kubetest2
  • /test pull-kubernetes-node-kubelet-serial-crio-cgroupv1
  • /test pull-kubernetes-node-kubelet-serial-crio-cgroupv2
  • /test pull-kubernetes-node-kubelet-serial-hugepages
  • /test pull-kubernetes-node-kubelet-serial-memory-manager
  • /test pull-kubernetes-node-kubelet-serial-pod-disruption-conditions
  • /test pull-kubernetes-node-kubelet-serial-topology-manager
  • /test pull-kubernetes-node-kubelet-serial-topology-manager-kubetest2
  • /test pull-kubernetes-node-memoryqos-cgrpv2
  • /test pull-kubernetes-node-swap-fedora
  • /test pull-kubernetes-node-swap-fedora-serial
  • /test pull-kubernetes-node-swap-ubuntu-serial
  • /test pull-kubernetes-unit-experimental
  • /test pull-kubernetes-verify-strict-lint
  • /test pull-publishing-bot-validate

Use /test all to run the following jobs that were automatically triggered:

  • pull-kubernetes-conformance-kind-ga-only-parallel
  • pull-kubernetes-conformance-kind-ipv6-parallel
  • pull-kubernetes-dependencies
  • pull-kubernetes-e2e-gce
  • pull-kubernetes-e2e-kind
  • pull-kubernetes-e2e-kind-ipv6
  • pull-kubernetes-integration
  • pull-kubernetes-node-e2e-containerd
  • pull-kubernetes-typecheck
  • pull-kubernetes-unit
  • pull-kubernetes-verify

In response to this:

/test

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@k8s-ci-robot k8s-ci-robot added area/test sig/node Categorizes an issue or PR as relevant to SIG Node. sig/testing Categorizes an issue or PR as relevant to SIG Testing. and removed do-not-merge/needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. labels Aug 4, 2023
@tzneal
Contributor Author

tzneal commented Aug 4, 2023

/test pull-kubernetes-node-e2e-containerd-serial-ec2
/test pull-kubernetes-cos-cgroupv2-containerd-node-e2e-serial
/test pull-kubernetes-e2e-gce-serial

@bart0sh
Contributor

bart0sh commented Aug 6, 2023

@tzneal

/exports *(rw,fsid=0,insecure,no_root_squash) can be mounted as /exports using NFSv3 and / using NFSv4

Can you explain the reason?

test/e2e_node/mirror_pod_test.go Outdated Show resolved Hide resolved
test/e2e_node/mirror_pod_test.go Outdated Show resolved Hide resolved
@bart0sh bart0sh moved this from Triage to Needs Reviewer in SIG Node PR Triage Aug 6, 2023
@bart0sh
Contributor

bart0sh commented Aug 6, 2023

/triage accepted
/priority important-longterm

@k8s-ci-robot k8s-ci-robot added triage/accepted Indicates an issue or PR is ready to be actively worked on. priority/important-longterm Important over the long term, but may not be staffed and/or may need multiple releases to complete. and removed needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. needs-priority Indicates a PR lacks a `priority/foo` label and requires one. labels Aug 6, 2023
@pacoxu
Member

pacoxu commented Aug 7, 2023

for i in "$@"; do
    # fsid=0: needed for NFSv4
    echo "$i *(rw,fsid=0,insecure,no_root_squash)" >> /etc/exports
    if [ -v gid ] ; then
        chmod 070 "$i"
        chgrp "$gid" "$i"
    fi
    # move index.html to here
    /bin/cp /tmp/index.html "$i/"
    chmod 644 "$i/index.html"
    echo "Serving $i"
done

  Aug  7 02:23:08.062: INFO: At 2023-08-07 02:20:58 +0000 UTC - event for static-pod-nfs-test-pod19db961e-7df7-4ec0-bce0-2d7a25b7fc27-i-031fd0c2185b13817: {kubelet i-031fd0c2185b13817} FailedMount: MountVolume.SetUp failed for volume "nfs-vol" : mount failed: exit status 32
  Mounting command: mount
  Mounting arguments: -t nfs 10.22.0.132:/exports /var/lib/kubelet/pods/d0cdf0466fdf2b3b7fb7b61845f2242e/volumes/kubernetes.io~nfs/nfs-vol
  Output: Created symlink /run/systemd/system/remote-fs.target.wants/rpc-statd.service → /usr/lib/systemd/system/rpc-statd.service.
  mount.nfs: mounting 10.22.0.132:/exports failed, reason given by server: No such file or directory

  Aug  7 02:23:08.062: INFO: At 2023-08-07 02:20:59 +0000 UTC - event for static-pod-nfs-test-pod19db961e-7df7-4ec0-bce0-2d7a25b7fc27-i-031fd0c2185b13817: {kubelet i-031fd0c2185b13817} FailedMount: MountVolume.SetUp failed for volume "nfs-vol" : mount failed: exit status 32
  Mounting command: mount
  Mounting arguments: -t nfs 10.22.0.132:/exports /var/lib/kubelet/pods/d0cdf0466fdf2b3b7fb7b61845f2242e/volumes/kubernetes.io~nfs/nfs-vol
  Output: mount.nfs: mounting 10.22.0.132:/exports failed, reason given by server: No such file or directory

  Aug  7 02:23:08.062: INFO: At 2023-08-07 02:23:00 +0000 UTC - event for nfs-server: {kubelet i-031fd0c2185b13817} Killing: Stopping container nfs-server
  Aug  7 02:23:08.063: INFO: POD  NODE  PHASE  GRACE  CONDITIONS
  Aug  7 02:23:08.063: INFO: 
  Aug  7 02:23:08.064: INFO: 

Log from https://prow.k8s.io/view/gs/kubernetes-jenkins/logs/ci-cgroupv2-containerd-node-al2023-e2e-serial-ec2-eks/1688357029514579968.

The fix overall LGTM.

@aojea
Member

aojea commented Aug 7, 2023

@tzneal

/exports *(rw,fsid=0,insecure,no_root_squash) can be mounted as /exports using NFSv3 and / using NFSv4

Can you explain the reason?

Why is the path different? I assume you can serve both protocols and use the same paths.

@tzneal
Contributor Author

tzneal commented Aug 7, 2023

@tzneal

/exports *(rw,fsid=0,insecure,no_root_squash) can be mounted as /exports using NFSv3 and / using NFSv4

Can you explain the reason?

Why is the path different? I assume you can serve both protocols and use the same paths.

From https://linux.die.net/man/5/exports,

For NFSv4, there is a distinguished filesystem which is the root of all exported filesystem. This is specified with fsid=root or fsid=0 both of which mean exactly the same thing.

In testing, this pseudo-root filesystem is called '/', so the mount path differs between v3 and v4. There may be another way to build an exports file that works for both, but the NFS test image sets fsid=0 for the mounts. I don't think this is correct in the presence of multiple mounts anyway, but it's been that way for 8 years.
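
To make the v3/v4 difference concrete, here is a minimal Go sketch of the path a client would have to request for a given export. It is a simplified model built only from the man page excerpt above, not code from this PR: the function name `clientMountPath` is hypothetical, and it assumes a single export that is either the fsid=0 root or not.

```go
package main

import (
	"fmt"
	"path"
	"strings"
)

// clientMountPath models which path a client requests for a directory
// exported by the server (hypothetical helper, simplified model).
//   - NFSv3: the client uses the server-side directory as-is.
//   - NFSv4: an export marked fsid=0 (or fsid=root) is the root of the
//     exported namespace, so the client sees it as "/".
func clientMountPath(exportDir string, exportOpts []string, nfsMajor int) string {
	if nfsMajor < 4 {
		return path.Clean(exportDir)
	}
	for _, opt := range exportOpts {
		if opt == "fsid=0" || opt == "fsid=root" {
			return "/"
		}
	}
	// Assumption: without an fsid=0 root, the v4 client uses the full path too.
	return path.Clean(exportDir)
}

func main() {
	opts := strings.Split("rw,fsid=0,insecure,no_root_squash", ",")
	for _, v := range []int{3, 4} {
		fmt.Printf("NFSv%d: %s\n", v, clientMountPath("/exports", opts, v))
	}
}
```

Under this model, an NFSv4 client asking for `/exports` is asking for a path below the fsid=0 root that does not exist, which would be consistent with the "No such file or directory" error in the log above.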

@tzneal
Contributor Author

tzneal commented Aug 7, 2023

/retest

@aojea
Member

aojea commented Aug 7, 2023

/sig storage

@k8s-ci-robot k8s-ci-robot added the sig/storage Categorizes an issue or PR as relevant to SIG Storage. label Aug 7, 2023
// In NFSv3 this root filesystem is mounted as '/'. This function is an attempt to try to gloss over those differences
// and make this test reliable for both versions.
func mountPath() string {
cmd := exec.Command("lsmod")
Contributor

Even if nfsv4 module is loaded it doesn't mean that it's used. We may want to look at mtab/run mount, etc instead of making assumptions like this.

Contributor Author

I don't think it will display in mtab (unless there is some other NFS filesystem that is already mounted). My logic was that the current code assumes NFSv3; this assumption may still be wrong in some cases, but it is less wrong in that the tests pass :)
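
For reference, the mount-table approach suggested in this thread could look roughly like the Go sketch below. It is not code from this PR: `nfsVersionAt` is a hypothetical helper, the sample mount lines are made up for illustration, and it assumes the usual /proc/mounts layout (device, mount point, filesystem type, options) with a `vers=` option on NFS mounts. As noted above, it can only report a version once an NFS filesystem is actually mounted.

```go
package main

import (
	"bufio"
	"fmt"
	"strings"
)

// nfsVersionAt scans text in /proc/mounts format and returns the value of
// the "vers=" option for the NFS filesystem mounted at mountPoint.
// The second return value is false if no such mount or option is found.
func nfsVersionAt(mounts, mountPoint string) (string, bool) {
	sc := bufio.NewScanner(strings.NewReader(mounts))
	for sc.Scan() {
		// Fields: device, mount point, fstype, options, dump, pass.
		f := strings.Fields(sc.Text())
		if len(f) < 4 || f[1] != mountPoint {
			continue
		}
		if f[2] != "nfs" && f[2] != "nfs4" {
			continue
		}
		for _, opt := range strings.Split(f[3], ",") {
			if strings.HasPrefix(opt, "vers=") {
				return strings.TrimPrefix(opt, "vers="), true
			}
		}
	}
	return "", false
}

func main() {
	// Illustrative sample only; a real caller would read /proc/mounts.
	sample := "server:/ /mnt/a nfs4 rw,relatime,vers=4.2,proto=tcp 0 0\n" +
		"server:/exports /mnt/b nfs rw,relatime,vers=3,proto=tcp 0 0\n"
	for _, mp := range []string{"/mnt/a", "/mnt/b", "/mnt/c"} {
		v, ok := nfsVersionAt(sample, mp)
		fmt.Printf("%s: version=%q found=%v\n", mp, v, ok)
	}
}
```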

@jsafrane
Member

This is odd. We have tests that always use Path: "/" and nobody has been complaining:

return &v1.VolumeSource{
	NFS: &v1.NFSVolumeSource{
		Server:   nv.serverHost,
		Path:     "/",
		ReadOnly: readOnly,
	},
}

Look for [Driver: nfs] here: https://testgrid.k8s.io/sig-storage-kubernetes#gce-slow

An alternative solution would be to always use an NFS PV + PVC in the test; then you can use mountOptions: ["vers=4.1"] in the PV. Mount options are not available in in-line volumes.

/exports *(rw,fsid=0,insecure,no_root_squash)

can be mounted as `/exports` using NFSv3 and `/` using NFSv4

Mount as '/', since clients that support both can try both.
@tzneal tzneal force-pushed the detect-nfsv3-and-change-mount-path branch from 8bdb482 to 717c149 Compare August 11, 2023 12:27
@k8s-ci-robot k8s-ci-robot added size/XS Denotes a PR that changes 0-9 lines, ignoring generated files. and removed size/S Denotes a PR that changes 10-29 lines, ignoring generated files. labels Aug 11, 2023
@tzneal
Contributor Author

tzneal commented Aug 11, 2023

This is odd. We have tests that always use Path: "/" and nobody has been complaining:

Interesting, I'll change that here. I know clients can try both, maybe just defaulting to the NFSv4 path is the most straightforward fix.

@tzneal
Contributor Author

tzneal commented Aug 11, 2023

/test pull-kubernetes-node-e2e-containerd-serial-ec2
/test pull-kubernetes-cos-cgroupv2-containerd-node-e2e-serial
/test pull-kubernetes-e2e-gce-serial

@jsafrane
Member

Do people run [NodeConformance] tests on weird distros / kernels? Because we run gce-slow basically only on Google's COS.

@tzneal
Contributor Author

tzneal commented Aug 11, 2023

Do people run [NodeConformance] tests on weird distros / kernels? Because we run gce-slow basically only on Google's COS.

Yes, @upodroid, @dims and I have been working on getting tests to run on a variety of images including both Ubuntu and AL2. IMO, there shouldn't be anything in NodeConformance that can't run on any community OS that provides the right facilities.

@upodroid
Member

Do people run [NodeConformance] tests on weird distros / kernels? Because we run gce-slow basically only on Google's COS.

Historically we tested everything against COS, but it isn't the correct baseline for defining NodeConformance. NodeConformance tests should run with zero modifications on Ubuntu and CentOS (a bit complicated today, but represented by AL2 for now; it would have been CentOS 7 before all the madness).

COS runs only on GCP and it is specifically designed for running containers and Google has made some design choices to support that objective. There is a good reason GKE also provides an alternate node OS that is based on a "mainstream" OS.

@tzneal
Contributor Author

tzneal commented Aug 11, 2023

/retest

@tzneal
Contributor Author

tzneal commented Aug 11, 2023

Looks like the root mount path of '/' works everywhere and is also used in other tests:

return &v1.VolumeSource{
	NFS: &v1.NFSVolumeSource{
		Server:   nv.serverHost,
		Path:     "/",
		ReadOnly: readOnly,
	},
}

@jsafrane
Member

Looks like the root mount path of '/' works everywhere and is also used in other tests:

I think that's the right approach, at least from what I know about NFS.

/lgtm

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Aug 15, 2023
@k8s-ci-robot
Contributor

LGTM label has been added.

Git tree hash: b921cb91c125ae7f5ed5f16de9d723fc236494ad

@bart0sh bart0sh moved this from Needs Reviewer to Needs Approver in SIG Node PR Triage Aug 15, 2023
Member

@SergeyKanzhelev SergeyKanzhelev left a comment

/lgtm
/approve

since we have approval from sig storage, this lgtm

@k8s-ci-robot
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: SergeyKanzhelev, tzneal

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Aug 15, 2023
@k8s-ci-robot k8s-ci-robot merged commit 061ae8a into kubernetes:master Aug 16, 2023
16 checks passed
SIG Node CI/Test Board automation moved this from Triage to Done Aug 16, 2023
SIG Node PR Triage automation moved this from Needs Approver to Done Aug 16, 2023
@k8s-ci-robot k8s-ci-robot added this to the v1.29 milestone Aug 16, 2023
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. area/test cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. kind/failing-test Categorizes issue or PR as related to a consistently or frequently failing test. lgtm "Looks good to me", indicates that a PR is ready to be merged. priority/important-longterm Important over the long term, but may not be staffed and/or may need multiple releases to complete. release-note-none Denotes a PR that doesn't merit a release note. sig/node Categorizes an issue or PR as relevant to SIG Node. sig/storage Categorizes an issue or PR as relevant to SIG Storage. sig/testing Categorizes an issue or PR as relevant to SIG Testing. size/XS Denotes a PR that changes 0-9 lines, ignoring generated files. triage/accepted Indicates an issue or PR is ready to be actively worked on.
Projects
Archived in project
Development

Successfully merging this pull request may close these issues.

None yet

8 participants