
[Bug][Operator][Leader election?] Operator failure and restart, logs attached #601

Closed · DmitriGekhtman opened this issue Sep 28, 2022 · 17 comments · Fixed by #946

Labels: bug (Something isn't working) · operator · P1 (Issue that should be fixed within a few weeks)

Comments

@DmitriGekhtman
Collaborator

Search before asking

  • I searched the issues and found no similar issues.

KubeRay Component

ray-operator

What happened + What you expected to happen

I was running the nightly KubeRay operator for development purposes and observed a failure with some strange error messages logged.

The logs had roughly 1000 lines of Read request instance not found error with names of pods unrelated to KubeRay,
followed by some complaints about leases and leader election,
followed by a crash and restart.

Ideally, there shouldn't be any issues related to leader election, since we don't support leader election right now.

Only the last few lines of Read request instance not found error! are pasted below.

2022-09-28T18:52:13.868Z	INFO	controllers.RayCluster	Read request instance not found error!	{"name": "kube-system/konnectivity-agent-795c8dddf.171917992063a820"}
2022-09-28T18:52:13.869Z	INFO	controllers.RayCluster	Read request instance not found error!	{"name": "kube-system/konnectivity-agent-795c8dddf.171917a01bf956aa"}
2022-09-28T18:52:13.869Z	INFO	controllers.RayCluster	Read request instance not found error!	{"name": "kube-system/konnectivity-agent.171913fcc9e51685"}
2022-09-28T18:52:13.869Z	INFO	controllers.RayCluster	Read request instance not found error!	{"name": "kube-system/konnectivity-agent.1719140d14d73e54"}
2022-09-28T18:52:23.872Z	INFO	controllers.RayCluster	Read request instance not found error!	{"name": "kube-system/konnectivity-agent-795c8dddf-cqw2g.1719179920d0d07b"}
2022-09-28T18:52:53.884Z	INFO	controllers.RayCluster	Read request instance not found error!	{"name": "kube-system/konnectivity-agent-795c8dddf-pp5jv.171917a01c50dde1"}
2022-09-28T19:59:07.446Z	INFO	controllers.RayCluster	Read request instance not found error!	{"name": "default/gk3-autopilot-cluster-1-nap-1l3c0g4h-554d0488-ltwl.1718f0a38651b857"}
2022-09-28T19:59:07.447Z	INFO	controllers.RayCluster	Read request instance not found error!	{"name": "default/gk3-autopilot-cluster-1-nap-1l3c0g4h-83672b81-t6rs.1718f0a3d8493393"}
E0928 21:27:20.873442       1 leaderelection.go:330] error retrieving resource lock ray-system/ray-operator-leader: Get "https://10.8.56.1:443/api/v1/namespaces/ray-system/configmaps/ray-operator-leader": context deadline exceeded
2022-09-28T21:27:20.874Z	DEBUG	events	Normal	{"object": {"kind":"ConfigMap","apiVersion":"v1"}, "reason": "LeaderElection", "message": "kuberay-operator-56749d657d-x5f65_085aa91b-20ef-4011-b7f4-4edff573f5c7 stopped leading"}
2022-09-28T21:27:20.874Z	DEBUG	events	Normal	{"object": {"kind":"Lease","namespace":"ray-system","name":"ray-operator-leader","uid":"e6a67c4c-c3a9-4b83-a0c6-7a9a3f02a1a2","apiVersion":"coordination.k8s.io/v1","resourceVersion":"31046663"}, "reason": "LeaderElection", "message": "kuberay-operator-56749d657d-x5f65_085aa91b-20ef-4011-b7f4-4edff573f5c7 stopped leading"}
I0928 21:27:20.874714       1 leaderelection.go:283] failed to renew lease ray-system/ray-operator-leader: timed out waiting for the condition
2022-09-28T21:27:20.903Z	INFO	Stopping and waiting for non leader election runnables
2022-09-28T21:27:20.903Z	INFO	Stopping and waiting for leader election runnables
2022-09-28T21:27:20.903Z	INFO	Stopping and waiting for caches
2022-09-28T21:27:20.903Z	INFO	Stopping and waiting for webhooks
2022-09-28T21:27:20.903Z	INFO	Wait completed, proceeding to shutdown the manager
2022-09-28T21:27:20.903Z	ERROR	setup	problem running manager	{"error": "leader election lost"}

Reproduction script

I don't know how to reproduce this yet.

Anything else

I don't know yet.

Are you willing to submit a PR?

  • Yes I am willing to submit a PR!
DmitriGekhtman added the bug (Something isn't working), operator, and P1 (Issue that should be fixed within a few weeks) labels on Sep 28, 2022
@jeevb

jeevb commented Mar 2, 2023

We are observing this as well with kuberay/operator:v0.4.0. We have also confirmed that our crash loops are due to the operator being OOMKilled, even with a memory limit of 2Gi, and even when we have no RayCluster objects.

It seems likely that the issue is a result of this line:

Watches(&source.Kind{Type: &corev1.Event{}}, &handler.EnqueueRequestForObject{}).

We are now running a test with this line omitted, and can confirm that the spurious logs of unrelated resources are gone, and memory usage is lower.

This seems to suggest that the line will watch every event (and consequently trigger a reconcile on each one): https://github.com/kubernetes-sigs/controller-runtime/blob/f6f37e6cc1ec7b7d18a266a6614f86df211b1a0a/pkg/handler/enqueue.go#L35

To watch events related to child objects only, this seems more appropriate: https://github.com/kubernetes-sigs/controller-runtime/blob/f6f37e6cc1ec7b7d18a266a6614f86df211b1a0a/pkg/handler/enqueue_owner.go#L42

I believe .Owns already captures create/update/delete events for Pods and Services owned by RayCluster objects, so the separate .Watches on Events might be extraneous.
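
For reference, here is a minimal sketch of the controller wiring being described, written against the controller-runtime v0.11 API that appears in the logs in this thread. The rayv1alpha1 import path and the reconciler stub are assumptions for illustration, not KubeRay's actual setup code:

```go
// Minimal sketch, not KubeRay's actual code. Assumptions: the rayv1alpha1
// import path and the reconciler stub. Written against controller-runtime v0.11.
package main

import (
	"context"

	corev1 "k8s.io/api/core/v1"
	ctrl "sigs.k8s.io/controller-runtime"
	"sigs.k8s.io/controller-runtime/pkg/handler"
	"sigs.k8s.io/controller-runtime/pkg/source"

	rayv1alpha1 "github.com/ray-project/kuberay/ray-operator/apis/ray/v1alpha1" // assumed path
)

// rayClusterReconciler stands in for the real RayClusterReconciler.
type rayClusterReconciler struct{}

func (r *rayClusterReconciler) Reconcile(ctx context.Context, req ctrl.Request) (ctrl.Result, error) {
	return ctrl.Result{}, nil
}

func setupWithManager(mgr ctrl.Manager, r *rayClusterReconciler) error {
	return ctrl.NewControllerManagedBy(mgr).
		For(&rayv1alpha1.RayCluster{}).
		// Owns() already enqueues the parent RayCluster whenever one of its
		// child Pods or Services changes.
		Owns(&corev1.Pod{}).
		Owns(&corev1.Service{}).
		// The problematic watch: EnqueueRequestForObject enqueues a request for
		// every corev1.Event in the cluster, keyed by the Event's own name,
		// which is where the "Read request instance not found" names come from.
		Watches(&source.Kind{Type: &corev1.Event{}}, &handler.EnqueueRequestForObject{}).
		Complete(r)
}
```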

@DmitriGekhtman
Collaborator Author

OK, watching events was not a good idea, cc @kevin85421 @Jeffwan @wilsonwang371 -- I'd recommend simplifying the implementation of the fault tolerance feature so that it doesn't do this.

@wilsonwang371
Collaborator

Sounds reasonable. I will take a look at this and discuss it with @Jeffwan.

@jeevb

jeevb commented Mar 2, 2023

This PR seems to suggest that the events watching was meant to support recovering from readiness/liveness probe failures. My understanding is that this is already covered by watching the child pods. I'm curious whether I'm missing something and whether I should be looking out for anything in particular while we run a test with events watching disabled.

kevin85421 added this to the v0.5.0 release milestone on Mar 2, 2023
@wilsonwang371
Collaborator

We are filtering out the events that we do not care about here:

if event.InvolvedObject.Kind != "Pod" || event.Type != "Warning" || event.Reason != "Unhealthy" ||

So I think we may need to change this part so that it only processes the pod events it cares about (a rough sketch follows after the quote below).

> This PR seems to suggest that the events watching was meant to support recovering from readiness/liveness probe failures. My understanding is that this is already covered by watching the child pods. I'm curious whether I'm missing something and whether I should be looking out for anything in particular while we run a test with events watching disabled.
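
As a rough illustration of the idea above (not the merged fix), the same condition could be applied as a watch-level predicate so that irrelevant events never enqueue a reconcile at all. The function name here is hypothetical:

```go
// Rough sketch, not the merged fix: move the in-reconciler filter up into the
// watch itself so events that are not Pod "Unhealthy" warnings are dropped early.
package main

import (
	corev1 "k8s.io/api/core/v1"
	"sigs.k8s.io/controller-runtime/pkg/client"
	"sigs.k8s.io/controller-runtime/pkg/predicate"
)

// unhealthyPodEventPredicate (hypothetical name) mirrors the condition quoted
// above: only Warning events with reason "Unhealthy" on Pods pass through.
func unhealthyPodEventPredicate() predicate.Predicate {
	return predicate.NewPredicateFuncs(func(obj client.Object) bool {
		event, ok := obj.(*corev1.Event)
		if !ok {
			return false
		}
		return event.InvolvedObject.Kind == "Pod" &&
			event.Type == corev1.EventTypeWarning &&
			event.Reason == "Unhealthy"
	})
}

// Hypothetical usage on the existing Event watch:
//
//	Watches(&source.Kind{Type: &corev1.Event{}},
//	    &handler.EnqueueRequestForObject{},
//	    builder.WithPredicates(unhealthyPodEventPredicate()))
```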

@Jeffwan
Collaborator

Jeffwan commented Mar 2, 2023

Let's try to filter the events by owner type = RayCluster. That should be enough to bring down the memory usage.
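
A hedged sketch of what owner-based filtering could look like: map each Event to the RayCluster that owns the involved Pod, so events on unrelated pods produce no reconcile requests. This assumes Ray pods carry an owner reference to their RayCluster; the helper name is invented for illustration and this is not the actual fix:

```go
// Hedged sketch, not the actual fix: map an Event to the RayCluster owning the
// involved Pod. Events on pods without a RayCluster owner map to nothing.
package main

import (
	"context"

	corev1 "k8s.io/api/core/v1"
	"k8s.io/apimachinery/pkg/types"
	"sigs.k8s.io/controller-runtime/pkg/client"
	"sigs.k8s.io/controller-runtime/pkg/handler"
	"sigs.k8s.io/controller-runtime/pkg/reconcile"
)

// eventToOwningRayCluster returns a MapFunc for use with
// handler.EnqueueRequestsFromMapFunc on the Event watch.
func eventToOwningRayCluster(c client.Client) handler.MapFunc {
	return func(obj client.Object) []reconcile.Request {
		event, ok := obj.(*corev1.Event)
		if !ok || event.InvolvedObject.Kind != "Pod" {
			return nil
		}
		pod := &corev1.Pod{}
		key := types.NamespacedName{
			Namespace: event.InvolvedObject.Namespace,
			Name:      event.InvolvedObject.Name,
		}
		if err := c.Get(context.Background(), key, pod); err != nil {
			return nil // pod gone or not cached; nothing to reconcile
		}
		// Assumes Ray pods are owned by their RayCluster (true for pods the operator creates).
		for _, ref := range pod.OwnerReferences {
			if ref.Kind == "RayCluster" {
				return []reconcile.Request{{NamespacedName: types.NamespacedName{
					Namespace: pod.Namespace,
					Name:      ref.Name,
				}}}
			}
		}
		return nil
	}
}
```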

@kevin85421
Member

kevin85421 commented Mar 7, 2023

We can have a hotfix PR for this issue in the short term, but eventually we need to stop watching events altogether, because operator operations should be idempotent and stateless. Events are time-sensitive, and deleting Pods based on events is not idempotent.

wilsonwang371 linked a pull request on Mar 7, 2023 that will close this issue
@wilsonwang371
Collaborator

> This PR seems to suggest that the events watching was meant to support recovering from readiness/liveness probe failures. My understanding is that this is already covered by watching the child pods. I'm curious whether I'm missing something and whether I should be looking out for anything in particular while we run a test with events watching disabled.

Hi Jeev,

Are you able to try our patched version later and help us verify this?

@jeevb

jeevb commented Mar 16, 2023

> Are you able to try our patched version later and help us verify this?

Yes, I'll deploy from the PR branch tomorrow and let it run for a while to collect metrics. Thanks for working on this so promptly! :)

@jeevb

jeevb commented Mar 16, 2023

Getting lots of spurious logs related to events from unrelated objects (on df56e5c52309f85eb9a72a2081afc17bbe88f9c8):

2023-03-16T20:12:25.884Z	INFO	controllers.RayCluster	reconcile RayCluster Event	{"event name": "gke-metadata-server-nmtxj.174cfcdb11e529a1"}
2023-03-16T20:12:25.884Z	ERROR	controller.raycluster-controller	Reconciler error	{"reconciler group": "ray.io", "reconciler kind": "RayCluster", "name": "gke-metadata-server-nmtxj.174cfcdb11e529a1", "namespace": "kube-system", "error": "Index with name field:metadata.name does not exist"}
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem
	/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.11.1/pkg/internal/controller/controller.go:266
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func2.2
	/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.11.1/pkg/internal/controller/controller.go:227
2023-03-16T20:12:25.985Z	INFO	controllers.RayCluster	reconcile RayCluster Event	{"event name": "gke-metadata-server-bltns.174cfcddbcea53f2"}
2023-03-16T20:12:25.985Z	ERROR	controller.raycluster-controller	Reconciler error	{"reconciler group": "ray.io", "reconciler kind": "RayCluster", "name": "gke-metadata-server-bltns.174cfcddbcea53f2", "namespace": "kube-system", "error": "Index with name field:metadata.name does not exist"}
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem
	/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.11.1/pkg/internal/controller/controller.go:266
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func2.2
	/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.11.1/pkg/internal/controller/controller.go:227
2023-03-16T20:12:26.084Z	INFO	controllers.RayCluster	reconcile RayCluster Event	{"event name": "gke-metadata-server-q78nk.174cfcde1063988c"}
2023-03-16T20:12:26.084Z	ERROR	controller.raycluster-controller	Reconciler error	{"reconciler group": "ray.io", "reconciler kind": "RayCluster", "name": "gke-metadata-server-q78nk.174cfcde1063988c", "namespace": "kube-system", "error": "Index with name field:metadata.name does not exist"}
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem
	/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.11.1/pkg/internal/controller/controller.go:266
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func2.2
	/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.11.1/pkg/internal/controller/controller.go:227

@wilsonwang371
Collaborator

> Getting lots of spurious logs related to events from unrelated objects (on df56e5c52309f85eb9a72a2081afc17bbe88f9c8) […]

Thanks, let me take a look.

@wilsonwang371
Collaborator

Hi @jeevb,

I manually tested the latest code on my machine and it is working as expected now. The issue you were seeing was caused by debug messages that I forgot to disable.

You can try the latest patch and see the result.

@wilsonwang371
Collaborator

Hi @jeevb, can you take a look at this again and confirm that there are no more extra logs?

@jeevb

jeevb commented Mar 27, 2023

Yes, will test today and report back!

@jeevb

jeevb commented Mar 28, 2023

Not seeing the message spam anymore, but seeing these ~10 log messages associated with unrelated pods at startup:

2023-03-28T03:24:55.557Z	INFO	controllers.RayCluster	no ray node pod found for event	{"event": "&Event{ObjectMeta:{gke-metadata-server-kf45m.175075bb8d2a4d81  kube-system  c3768cad-377d-43c6-9aef-c648170a4dd7 500053417 0 2023-03-28 02:55:09 +0000 UTC <nil> <nil> map[] map[] [] []  [{kubelet Update v1 2023-03-28 02:55:09 +0000 UTC FieldsV1 {\"f:count\":{},\"f:firstTimestamp\":{},\"f:involvedObject\":{},\"f:lastTimestamp\":{},\"f:message\":{},\"f:reason\":{},\"f:source\":{\"f:component\":{},\"f:host\":{}},\"f:type\":{}} }]},InvolvedObject:ObjectReference{Kind:Pod,Namespace:kube-system,Name:gke-metadata-server-kf45m,UID:4c109758-c5db-42a4-afaa-f759dd984ce3,APIVersion:v1,ResourceVersion:4099672145,FieldPath:spec.containers{gke-metadata-server},},Reason:Unhealthy,Message:Readiness probe failed: Get \"http://127.0.0.1:989/healthz\": dial tcp 127.0.0.1:989: connect: connection refused,Source:EventSource{Component:kubelet,Host:gke-cr-west1-n1-32-208-preempt-202303-c73cb1c6-rw5j,},FirstTimestamp:2023-03-28 02:55:09 +0000 UTC,LastTimestamp:2023-03-28 02:55:09 +0000 UTC,Count:1,Type:Warning,EventTime:0001-01-01 00:00:00 +0000 UTC,Series:nil,Action:,Related:nil,ReportingController:,ReportingInstance:,}"}
2023-03-28T03:24:55.557Z	INFO	controllers.RayCluster	no ray node pod found for event	{"event": "&Event{ObjectMeta:{gke-metadata-server-qcd6c.175075bb7adbf927  kube-system  a705c3d8-2121-43f5-b8ce-49492d8f3f87 500053397 0 2023-03-28 02:55:09 +0000 UTC <nil> <nil> map[] map[] [] []  [{kubelet Update v1 2023-03-28 02:55:09 +0000 UTC FieldsV1 {\"f:count\":{},\"f:firstTimestamp\":{},\"f:involvedObject\":{},\"f:lastTimestamp\":{},\"f:message\":{},\"f:reason\":{},\"f:source\":{\"f:component\":{},\"f:host\":{}},\"f:type\":{}} }]},InvolvedObject:ObjectReference{Kind:Pod,Namespace:kube-system,Name:gke-metadata-server-qcd6c,UID:c607a02f-f2ae-4b41-a6af-c7f4f26421be,APIVersion:v1,ResourceVersion:4099672288,FieldPath:spec.containers{gke-metadata-server},},Reason:Unhealthy,Message:Readiness probe failed: Get \"http://127.0.0.1:989/healthz\": dial tcp 127.0.0.1:989: connect: connection refused,Source:EventSource{Component:kubelet,Host:gke-cr-west1-n1-32-208-preempt-202303-4964322f-69mr,},FirstTimestamp:2023-03-28 02:55:09 +0000 UTC,LastTimestamp:2023-03-28 02:55:09 +0000 UTC,Count:1,Type:Warning,EventTime:0001-01-01 00:00:00 +0000 UTC,Series:nil,Action:,Related:nil,ReportingController:,ReportingInstance:,}"}
2023-03-28T03:24:55.557Z	INFO	controllers.RayCluster	no ray node pod found for event	{"event": "&Event{ObjectMeta:{gke-metadata-server-cmx5x.175075c006fe45fd  kube-system  630d5f76-84e2-4453-a86e-de454edb9fce 500053961 0 2023-03-28 02:55:28 +0000 UTC <nil> <nil> map[] map[] [] []  [{kubelet Update v1 2023-03-28 02:55:28 +0000 UTC FieldsV1 {\"f:count\":{},\"f:firstTimestamp\":{},\"f:involvedObject\":{},\"f:lastTimestamp\":{},\"f:message\":{},\"f:reason\":{},\"f:source\":{\"f:component\":{},\"f:host\":{}},\"f:type\":{}} }]},InvolvedObject:ObjectReference{Kind:Pod,Namespace:kube-system,Name:gke-metadata-server-cmx5x,UID:ebd44c55-0353-4673-8fd5-74c9ffef055c,APIVersion:v1,ResourceVersion:4099672960,FieldPath:spec.containers{gke-metadata-server},},Reason:Unhealthy,Message:Readiness probe failed: Get \"http://127.0.0.1:989/healthz\": dial tcp 127.0.0.1:989: connect: connection refused,Source:EventSource{Component:kubelet,Host:gke-cr-west1-n1-32-208-preempt-202303-4964322f-7l6p,},FirstTimestamp:2023-03-28 02:55:28 +0000 UTC,LastTimestamp:2023-03-28 02:55:29 +0000 UTC,Count:2,Type:Warning,EventTime:0001-01-01 00:00:00 +0000 UTC,Series:nil,Action:,Related:nil,ReportingController:,ReportingInstance:,}"}

@wilsonwang371
Collaborator

> Not seeing the message spam anymore, but seeing these ~10 log messages associated with unrelated pods at startup […]

This is generally OK, since this is the case where we see an unhealthy pod but are not going to act on it. If this is also not something we want, we can remove the log later.

@jeevb

jeevb commented Mar 28, 2023

Everything looks good so far. Anything in particular I should test?
