Seeing flakey testing where default namespace doesn't exist #2626

Closed
jonathan-innis opened this issue Dec 19, 2023 · 6 comments · Fixed by #2668
Labels
kind/support Categorizes issue or PR as a support question.

Comments

@jonathan-innis
Member

Currently, we are running envtest in Karpenter, where we start up all of the binaries and then apply pods to the default namespace. Every now and then, we see failures in our pod metric testing (seen here: https://github.com/kubernetes-sigs/karpenter/actions/runs/7251475446/job/19753909305?pr=885). When this happens, the failure always contains the same error:

Pod Metrics [It] should update the pod state metrics
/home/runner/work/karpenter/karpenter/pkg/controllers/metrics/pod/suite_test.go:56

  [FAILED] Expected success, but got an error:
      <*errors.StatusError | 0xc000134aa0>: 
      namespaces "default" not found
      {
          ErrStatus: {
              TypeMeta: {Kind: "", APIVersion: ""},
              ListMeta: {
                  SelfLink: "",
                  ResourceVersion: "",
                  Continue: "",
                  RemainingItemCount: nil,
              },
              Status: "Failure",
              Message: "namespaces \"default\" not found",
              Reason: "NotFound",
              Details: {Name: "default", Group: "", Kind: "namespaces", UID: "", Causes: nil, RetryAfterSeconds: 0},
              Code: 404,
          },
      }
  In [It] at: /home/runner/work/karpenter/karpenter/pkg/controllers/metrics/pod/suite_test.go:58 @ 12/18/23 17:16:52.719

This error indicates that the default namespace doesn't exist at the apiserver when we are applying the object; however, I would generally expect this namespace to exist once the apiserver starts up.

It's worth noting that we don't see this error in other testing where we are running envtest.Environment with the CRDs option set. I suspect that this is due to the fact that this gives the setup a little more time to populate this namespace at the apiserver before it returns.

Is there a general recommendation here? Obviously we could wait for the default namespace to exist in our own code, but it would be nice if we could be assured that it would always exist when we start testing from upstream.

@troy0820
Member

/kind support

@k8s-ci-robot added the kind/support label Dec 19, 2023
@sbueringer
Member

> It's worth noting that we don't see this error in other testing where we are running envtest.Environment with the CRDs option set. I suspect that this is due to the fact that this gives the setup a little more time to populate this namespace at the apiserver before it returns.

Probably as envtest first deploys CRDs and then waits for them to be available.

Which client call gives you this error?

@jonathan-innis
Member Author

> Probably as envtest first deploys CRDs and then waits for them to be available

Yeah, this is what I was suspecting. There must be enough time in there for it to bring up the default namespace.

> Which client call gives you this error

It's a call to create a pod with no namespace specified.

@sbueringer
Member

sbueringer commented Jan 3, 2024

@jonathan-innis I'm fine with adding a "waitForDefaultNamespace" call here: https://github.com/kubernetes-sigs/controller-runtime/blob/21779fbe6419e03ae1cd86a69b35d2725c6f7558/pkg/envtest/server.go#L273C1-L273C1

@jonathan-innis
Member Author

jonathan-innis commented Jan 3, 2024

Perfect, I'll try to put up a PR to add this in shortly 🎉 Thanks for the help!

@jonathan-innis
Member Author

@sbueringer Opened a PR with the wait mechanism. Frustratingly, I've been trying to repro this locally but no dice, so it's hard for me to tell if this is actually going to resolve our issue. I'd suspect that it will, but it's hard to say without being able to prove it.

Let me know if you have any thoughts on ways I could force this race.

4 participants