Foreground deletion of OpenTelemetryCollector object causes objects to be created and deleted in an endless loop #2364

huzhekun · 2023-11-17T02:30:56Z

Component(s)

No response

What happened?

Description

Deleting an OpenTelemetryCollector object with the command kubectl delete opentelemetrycollector cluster -n observability-metrics --cascade=foreground does not delete the object. Instead, it gets stuck in a cycle in which its dependent objects are repeatedly deleted and recreated.

Steps to Reproduce

Delete an OpenTelemetryCollector object with --cascade=foreground

Expected Result

The object's dependent resources are deleted cleanly, then the collector is deleted.

Actual Result

The collector is not deleted. Its underlying resources, such as Deployments and DaemonSets, are stuck in a cycle of being deleted and recreated multiple times per second.

Example of watching the collector's underlying deployment (after running the command to delete the collector object):

$ kubectl get deployment -n observability-metrics -w
NAME                READY   UP-TO-DATE   AVAILABLE   AGE
cluster-collector   2/2     2            2           33s
cluster-collector   2/2     2            2           33s
cluster-collector   1/2     1            1           34s
cluster-collector   0/2     0            0           34s
cluster-collector   0/2     0            0           35s
cluster-collector   0/2     0            0           36s
cluster-collector   0/2     0            0           0s
cluster-collector   0/2     0            0           0s
cluster-collector   0/2     0            0           0s
cluster-collector   0/2     2            0           0s
cluster-collector   0/2     2            0           1s
cluster-collector   0/2     2            0           1s
cluster-collector   0/2     1            0           2s
cluster-collector   0/2     0            0           2s
cluster-collector   0/2     0            0           5s
cluster-collector   0/2     0            0           5s
cluster-collector   0/2     0            0           0s
cluster-collector   0/2     0            0           0s
cluster-collector   0/2     0            0           0s
cluster-collector   0/2     2            0           0s
cluster-collector   0/2     2            0           1s
cluster-collector   0/2     2            0           1s
cluster-collector   1/2     1            1           2s
cluster-collector   0/2     0            0           2s
cluster-collector   0/2     0            0           6s
cluster-collector   0/2     0            0           6s
cluster-collector   0/2     0            0           0s
cluster-collector   0/2     0            0           0s
cluster-collector   0/2     0            0           0s
cluster-collector   0/2     2            0           0s
...

Kubernetes Version

1.25

Operator version

0.83.0

Collector version

0.83.0

Environment information

Environment

OS: (e.g., "Amazon Linux 2")

Log output

No response

Additional context

No response

jaronoff97 · 2023-11-17T20:35:29Z

Hey, this is something I haven't tested, but I can look into it. We should probably write a test for this as well. My bet is that the reconciler isn't treating a deletion timestamp as a blocker to reconciliation.

jaronoff97 · 2023-11-21T22:36:44Z

Okay, I was easily able to repro this. I think the problem has to do with the deletion timestamp and finalizers. The fix should be simple: check for a deletion timestamp on the custom resource we get. I'm not positive what to do about finalizers. I don't think we need to do anything special for them, but I'm going to check with @pavolloffay on that one.

jaronoff97 · 2023-11-21T23:00:56Z

Yep, checking for the deletion timestamp was enough. I also found a fix for a pervasive operator issue, which I'm going to solve the way CockroachDB does, by retrying on conflict.

jaronoff97 · 2023-11-27T16:07:38Z

Should be all set in the next release. I wrote a unit test to catch this and also tested manually on a kind cluster. Please let me know if you see any further issues after upgrading. Thank you!

huzhekun added the bug and needs triage labels Nov 17, 2023

jaronoff97 self-assigned this Nov 17, 2023

jaronoff97 added the area:collector label and removed the needs triage label Nov 17, 2023

jaronoff97 mentioned this issue Nov 22, 2023

Fix cascading delete, retry on conflict #2383

Merged

jaronoff97 closed this as completed in #2383 Nov 27, 2023
