Kubelet errors when rendering updated ConfigMaps to a pod's filesystem #112081

Closed
pacoxu opened this issue Aug 27, 2022 · 11 comments · Fixed by #112624
Assignees
Labels
kind/regression Categorizes issue or PR as related to a regression from a prior release. needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. priority/important-soon Must be staffed and worked on either currently, or very soon, ideally in time for the next release. sig/node Categorizes an issue or PR as relevant to SIG Node.

Comments

@pacoxu
Member

pacoxu commented Aug 27, 2022

See #107329

OpenShift found issues rendering updated ConfigMaps to pods. When ConfigMaps get updated within the API, they do not get rendered to the resulting pod's filesystem by the Kubelet with the following error:

Aug 26 20:22:30 test1-lkz4t-master-1 kubenswrapper[1474]: E0826 20:22:30.751336    1474 nestedpendingoperations.go:335] Operation for "{volumeName:kubernetes.io/configmap/d381fba5-463d-420a-8ef2-ba4d9a94846d-trusted-ca-bundle podName:d381fba5-463d-420a-8ef2-ba4d9a94846d nodeName:}" failed. No retries permitted until 2022-08-26 20:24:32.751309823 +0000 UTC m=+2312.350897767 (durationBeforeRetry 2m2s). Error: MountVolume.SetUp failed for volume "trusted-ca-bundle" (UniqueName: "kubernetes.io/configmap/d381fba5-463d-420a-8ef2-ba4d9a94846d-trusted-ca-bundle") pod "insights-operator-5655ffbd97-x2gqw" (UID: "d381fba5-463d-420a-8ef2-ba4d9a94846d") : requesting quota on existing directory /var/lib/kubelet/pods/d381fba5-463d-420a-8ef2-ba4d9a94846d/volumes/kubernetes.io~configmap/trusted-ca-bundle but different pod be50da94-ebf1-4cb0-9e0b-2949fd2bed7b f98fa631-4940-4217-9b82-4e5ad6720238

We will need to backport to 1.25.

A PR is open to move LocalStorageCapacityIsolationFSQuotaMonitoring back to alpha.
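
On a release where the gate is still enabled by default, a cluster could opt out explicitly. The sketch below is one hedged way to generate such a setting, assuming the kubelet is started with a KubeletConfiguration file whose `featureGates` map is honored; the struct and the helpers `disableGate` and `render` are hypothetical, and whether disabling the gate fully avoids the error is inferred from this thread rather than verified here. The JSON it prints is also valid YAML.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// kubeletConfigFragment models only the fields needed for this sketch; a real
// KubeletConfiguration has many more.
type kubeletConfigFragment struct {
	APIVersion   string          `json:"apiVersion"`
	Kind         string          `json:"kind"`
	FeatureGates map[string]bool `json:"featureGates"`
}

// disableGate returns a fragment that turns the named feature gate off.
func disableGate(name string) kubeletConfigFragment {
	return kubeletConfigFragment{
		APIVersion:   "kubelet.config.k8s.io/v1beta1",
		Kind:         "KubeletConfiguration",
		FeatureGates: map[string]bool{name: false},
	}
}

// render serializes the fragment as indented JSON.
func render(c kubeletConfigFragment) (string, error) {
	b, err := json.MarshalIndent(c, "", "  ")
	if err != nil {
		return "", err
	}
	return string(b), nil
}

func main() {
	out, err := render(disableGate("LocalStorageCapacityIsolationFSQuotaMonitoring"))
	if err != nil {
		panic(err)
	}
	fmt.Println(out)
}
```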

/sig node
/kind regression
/priority important-soon

Originally posted by @rphillips in #112076

@rphillips Could you link to the OpenShift issue, or share more evidence or reproduction steps?

@k8s-ci-robot k8s-ci-robot added sig/node Categorizes an issue or PR as relevant to SIG Node. kind/regression Categorizes issue or PR as related to a regression from a prior release. priority/important-soon Must be staffed and worked on either currently, or very soon, ideally in time for the next release. needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. labels Aug 27, 2022
@dghubble
Contributor

Going through conformance testing for Typhoon v1.25.0, I see the same break.

Aug 27 02:02:43.168: INFO: At 2022-08-27 02:00:45 +0000 UTC - event for pod-configmaps-6ca4b12d-71de-4a0a-844a-c0b75151a18e: {kubelet ip-10-0-32-215} FailedMount: MountVolume.SetUp failed for volume "configmap-volume" : requesting quota on existing directory /var/lib/kubelet/pods/f09fae17-ff16-4a05-aab3-7b897cb5b732/volumes/kubernetes.io~configmap/configmap-volume but different pod 673ad247-abf0-434e-99eb-1c3f57d7fdaa a4568e94-2b2d-438f-a4bd-c9edc814e478
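
When triaging events like this one, it can help to pull the three IDs out of the message and compare them. The following is a small sketch of such a helper; `parseQuotaConflict` and the `quotaConflict` type are hypothetical, and the regular expressions assume the message layout seen in this thread (with the line-wrap whitespace removed).

```go
package main

import (
	"fmt"
	"regexp"
)

var (
	// Matches the tail of the FailedMount message quoted in this thread.
	quotaErrRe = regexp.MustCompile(`requesting quota on existing directory (\S+) but different pod (\S+) (\S+)`)
	// Extracts the pod UID segment from a kubelet pod volume path.
	podDirRe = regexp.MustCompile(`/pods/([^/]+)/volumes/`)
)

// quotaConflict holds the fields recovered from one error message.
type quotaConflict struct {
	Dir        string // volume directory named in the error
	PathPodUID string // pod UID embedded in Dir, if any
	FirstID    string // first ID printed after "different pod"
	SecondID   string // second ID printed after "different pod"
}

// parseQuotaConflict reports false when msg does not contain the error.
func parseQuotaConflict(msg string) (quotaConflict, bool) {
	m := quotaErrRe.FindStringSubmatch(msg)
	if m == nil {
		return quotaConflict{}, false
	}
	c := quotaConflict{Dir: m[1], FirstID: m[2], SecondID: m[3]}
	if p := podDirRe.FindStringSubmatch(c.Dir); p != nil {
		c.PathPodUID = p[1]
	}
	return c, true
}

func main() {
	msg := `MountVolume.SetUp failed for volume "configmap-volume" : requesting quota on existing directory /var/lib/kubelet/pods/f09fae17-ff16-4a05-aab3-7b897cb5b732/volumes/kubernetes.io~configmap/configmap-volume but different pod 673ad247-abf0-434e-99eb-1c3f57d7fdaa a4568e94-2b2d-438f-a4bd-c9edc814e478`
	c, ok := parseQuotaConflict(msg)
	if !ok {
		fmt.Println("no quota conflict found")
		return
	}
	fmt.Printf("path pod UID: %s\nfirst ID:     %s\nsecond ID:    %s\n", c.PathPodUID, c.FirstID, c.SecondID)
	fmt.Println("either ID matches path pod UID:", c.FirstID == c.PathPodUID || c.SecondID == c.PathPodUID)
}
```

For the event above, neither reported ID equals the pod UID in the path, which is the same pattern as in the OpenShift log.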

@pacoxu
Member Author

pacoxu commented Aug 30, 2022

It was happening in our CI, so I don't have a test reproducer... https://kubernetes.slack.com/archives/C0BP8PW9G/p1659618168634919

@pacoxu
Member Author

pacoxu commented Sep 15, 2022

/triage accepted
/assign

@k8s-ci-robot k8s-ci-robot added triage/accepted Indicates an issue or PR is ready to be actively worked on. and removed needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. labels Sep 15, 2022
@pacoxu
Member Author

pacoxu commented Sep 16, 2022

https://prow.k8s.io/view/gs/kubernetes-jenkins/logs/ci-crio-cgroupv1-node-e2e-conformance/1563396196611919872

/*
	Release: v1.9
	Testname: ConfigMap Volume, update
	Description: The ConfigMap that is created MUST be accessible to read from the newly created Pod using the volume mount that is mapped to custom path in the Pod. When the ConfigMap is updated the change to the config map MUST be verified by reading the content from the mounted file in the Pod.
*/
framework.ConformanceIt("updates should be reflected in volume [NodeConformance]", func() {
	podLogTimeout := e2epod.GetPodSecretUpdateTimeout(f.ClientSet)
	containerTimeoutArg := fmt.Sprintf("--retry_time=%v", int(podLogTimeout.Seconds()))
	name := "configmap-test-upd-" + string(uuid.NewUUID())
	volumeName := "configmap-volume"
	volumeMountPath := "/etc/configmap-volume"
	configMap := &v1.ConfigMap{
		ObjectMeta: metav1.ObjectMeta{
			Namespace: f.Namespace.Name,
			Name:      name,
		},
		Data: map[string]string{
			"data-1": "value-1",
		},
	}
	ginkgo.By(fmt.Sprintf("Creating configMap with name %s", configMap.Name))
	var err error
	if configMap, err = f.ClientSet.CoreV1().ConfigMaps(f.Namespace.Name).Create(context.TODO(), configMap, metav1.CreateOptions{}); err != nil {
		framework.Failf("unable to create test configMap %s: %v", configMap.Name, err)
	}
	pod := createConfigMapVolumeMounttestPod(f.Namespace.Name, volumeName, name, volumeMountPath,
		"--break_on_expected_content=false", containerTimeoutArg, "--file_content_in_loop=/etc/configmap-volume/data-1")
	ginkgo.By("Creating the pod")
	f.PodClient().CreateSync(pod)
	pollLogs := func() (string, error) {
		return e2epod.GetPodLogs(f.ClientSet, f.Namespace.Name, pod.Name, pod.Spec.Containers[0].Name)
	}
	gomega.Eventually(pollLogs, podLogTimeout, framework.Poll).Should(gomega.ContainSubstring("value-1"))
	ginkgo.By(fmt.Sprintf("Updating configmap %v", configMap.Name))
	configMap.ResourceVersion = "" // to force update
	configMap.Data["data-1"] = "value-2"
	_, err = f.ClientSet.CoreV1().ConfigMaps(f.Namespace.Name).Update(context.TODO(), configMap, metav1.UpdateOptions{})
	framework.ExpectNoError(err, "Failed to update configmap %q in namespace %q", configMap.Name, f.Namespace.Name)
	ginkgo.By("waiting to observe update in volume")
	gomega.Eventually(pollLogs, podLogTimeout, framework.Poll).Should(gomega.ContainSubstring("value-2"))
})

The test is simple:

  • create a ConfigMap
  • mount it in a pod
  • edit the ConfigMap
  • check that the mounted file eventually reflects the new value

dghubble added a commit to poseidon/typhoon that referenced this issue Sep 19, 2022
* LocalStorageCapacityIsolationFSQuotaMonitoring was reverted back to
alpha in v1.25.1, so we don't need to explicitly disable it anymore

Rel: kubernetes/kubernetes#112081
dghubble-robot pushed a commit to poseidon/terraform-onprem-kubernetes that referenced this issue Sep 20, 2022
* LocalStorageCapacityIsolationFSQuotaMonitoring was reverted back to
alpha in v1.25.1, so we don't need to explicitly disable it anymore

Rel: kubernetes/kubernetes#112081
dghubble-robot pushed a commit to poseidon/terraform-azure-kubernetes that referenced this issue Sep 20, 2022
* LocalStorageCapacityIsolationFSQuotaMonitoring was reverted back to
alpha in v1.25.1, so we don't need to explicitly disable it anymore

Rel: kubernetes/kubernetes#112081
dghubble-robot pushed a commit to poseidon/terraform-digitalocean-kubernetes that referenced this issue Sep 20, 2022
* LocalStorageCapacityIsolationFSQuotaMonitoring was reverted back to
alpha in v1.25.1, so we don't need to explicitly disable it anymore

Rel: kubernetes/kubernetes#112081
dghubble-robot pushed a commit to poseidon/terraform-aws-kubernetes that referenced this issue Sep 20, 2022
* LocalStorageCapacityIsolationFSQuotaMonitoring was reverted back to
alpha in v1.25.1, so we don't need to explicitly disable it anymore

Rel: kubernetes/kubernetes#112081
dghubble-robot pushed a commit to poseidon/terraform-google-kubernetes that referenced this issue Sep 20, 2022
* LocalStorageCapacityIsolationFSQuotaMonitoring was reverted back to
alpha in v1.25.1, so we don't need to explicitly disable it anymore

Rel: kubernetes/kubernetes#112081
@k8s-triage-robot

The Kubernetes project currently lacks enough contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue or PR as fresh with /remove-lifecycle stale
  • Mark this issue or PR as rotten with /lifecycle rotten
  • Close this issue or PR with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Dec 15, 2022
@pacoxu
Member Author

pacoxu commented Dec 27, 2022

/remove-lifecycle stale

@k8s-ci-robot k8s-ci-robot removed the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Dec 27, 2022
@pacoxu
Member Author

pacoxu commented Feb 3, 2023

/reopen

@k8s-ci-robot k8s-ci-robot reopened this Feb 3, 2023
@k8s-ci-robot
Contributor

@pacoxu: Reopened this issue.

In response to this:

/reopen

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

Snaipe pushed a commit to aristanetworks/monsoon that referenced this issue Apr 13, 2023
* LocalStorageCapacityIsolationFSQuotaMonitoring was reverted back to
alpha in v1.25.1, so we don't need to explicitly disable it anymore

Rel: kubernetes/kubernetes#112081
@k8s-triage-robot

This issue is labeled with priority/important-soon but has not been updated in over 90 days, and should be re-triaged.
Important-soon issues must be staffed and worked on either currently, or very soon, ideally in time for the next release.

You can:

  • Confirm that this issue is still relevant with /triage accepted (org members only)
  • Deprioritize it with /priority important-longterm or /priority backlog
  • Close this issue with /close

For more details on the triage process, see https://www.kubernetes.dev/docs/guide/issue-triage/

/remove-triage accepted

@k8s-ci-robot k8s-ci-robot added needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. and removed triage/accepted Indicates an issue or PR is ready to be actively worked on. labels May 4, 2023
@pacoxu
Member Author

pacoxu commented Jul 24, 2023

This should be fixed by #115314.
/close

@k8s-ci-robot
Contributor

@pacoxu: Closing this issue.

In response to this:

This should be fixed by #115314.
/close

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.


4 participants