
Clouddriver with Mysql unable to invalidate stale cache #5958

Closed
piyushGoyal2 opened this issue Aug 5, 2020 · 12 comments

@piyushGoyal2

Issue Summary:

Clouddriver with MySQL is unable to invalidate stale cache entries.

Cloud Provider(s):

AWS, EKS, Kubernetes

Environment:

Spinnaker 1.19.x on EKS; Clouddriver backed by Aurora MySQL 5.7.12

Feature Area:

Kubernetes CloudDriver, CloudDriver MySQL

Description:

My Clouddriver is backed by Aurora MySQL 5.7.12, and it seems the stale cache is never actually invalidated. On the infrastructure page, under the Clusters tab, even after a Kubernetes resource (deployment, replicaset, or pod) is deleted, whether through the Clusters tab, kubectl, or a Delete (Manifest) stage, it stays listed indefinitely. Checking the MySQL tables, the cats_pod table still contains the old pod entries with ttlseconds=-1.
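
For reference, a quick way to eyeball this (illustrative sketch only; cats_pod is the table from our setup described above, and last_updated is an assumed column name that may differ between Clouddriver versions):

-- Count cached pod entries; with this bug the count never decreases after deletes.
SELECT COUNT(*) FROM cats_pod;

-- Assumed column: last_updated (epoch millis). List the oldest entries to spot
-- rows that should have been evicted.
SELECT id, last_updated FROM cats_pod ORDER BY last_updated ASC LIMIT 10;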

Steps to Reproduce:

1. Deploy a Spinnaker-managed Kubernetes resource.
2. Delete it from the Clusters tab on the infrastructure page.
3. The resource is still listed on the Clusters tab.

Additional Details:

@ajordens
Contributor

ajordens commented Aug 6, 2020

Assigning to the Kubernetes SIG as it seems like it might be more related to how the cloud provider itself is caching.

@piyushGoyal2
Author

@ajordens - Thanks, that makes sense. We're waiting on the Kubernetes SIG to debug this and will work with them on it. This is currently looking like a blocker for us in moving Clouddriver to MySQL.

@karlskewes
Contributor

karlskewes commented Aug 6, 2020

We have both EC2 and Kubernetes accounts configured.
Only EC2 accounts appear to be targeted for cleanup.
We removed a couple of Kubernetes accounts weeks ago, but the items remain in the Infrastructure view. Clicking on them just shows a white side pane with no details.

Looking through our Clouddriver logs, we can only see cleanup running for EC2 accounts, per the logs below:

$ kubectl logs clouddriver-caching-54cbc7b8cd-455n8 | grep -i clean
2020-08-06 22:19:34.780  INFO 1 --- [ionAction-50992] .n.s.c.a.a.CleanupDetachedInstancesAgent : Looking for instances pending termination in <ec2-account-1>:ap-southeast-2
2020-08-06 22:19:34.829  INFO 1 --- [ionAction-50992] .n.s.c.a.a.CleanupDetachedInstancesAgent : Looking for instances pending termination in <ec2-account-2>:ap-southeast-2
2020-08-06 22:19:34.924  INFO 1 --- [ionAction-50992] .n.s.c.a.a.CleanupDetachedInstancesAgent : Looking for instances pending termination in <ec2-account-2>:ap-southeast-1
2020-08-06 22:19:35.066  INFO 1 --- [ionAction-50992] .n.s.c.a.a.CleanupDetachedInstancesAgent : Looking for instances pending termination in <ec2-account-2>:us-east-2
2020-08-06 22:19:35.297  INFO 1 --- [ionAction-50992] .n.s.c.a.a.CleanupDetachedInstancesAgent : Looking for instances pending termination in <ec2-account-3>:ap-southeast-2
2020-08-06 22:19:35.347  INFO 1 --- [ionAction-50992] .n.s.c.a.a.CleanupDetachedInstancesAgent : Looking for instances pending termination in <ec2-account-4>:ap-southeast-2
2020-08-06 22:21:15.773  INFO 1 --- [ionAction-50975] c.n.s.c.sql.SqlTaskCleanupAgent          : Cleaning up 3 completed tasks (82 states, 3 result objects)
2020-08-06 22:36:17.189  INFO 1 --- [ionAction-51001] c.n.s.c.sql.SqlTaskCleanupAgent          : Cleaning up 2 completed tasks (49 states, 2 result objects)
$ kubectl logs clouddriver-rw-68b9c8dd96-k497v | grep -i clean
2020-07-20 01:45:28.745  INFO 1 --- [           main] c.n.spinnaker.cats.sql.cache.SqlCache    : Configured for com.netflix.spinnaker.clouddriver.aws.provider.AwsCleanupProvider

Looking at the original issue #4803 and the corresponding PR spinnaker/clouddriver#4232 that added the cleanup agent, I see there is a flag that was noted in the PR:

This functionality defaults to disabled. It can be enabled with sql.unknown-agent-cleanup-agent.enabled=true

https://github.com/spinnaker/clouddriver/blob/master/cats/cats-sql/src/main/kotlin/com/netflix/spinnaker/config/SqlCacheConfiguration.kt#L168

Do we need to enable this for Kubernetes? Is it an 'unknown agent'?
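
If we do need it, I assume it would be set in a Clouddriver profile roughly like the sketch below. The property name comes from the PR comment above; the clouddriver-local.yml location assumes a standard Halyard install, so adjust for your deployment method.

# ~/.hal/default/profiles/clouddriver-local.yml (assumed location)
sql:
  unknown-agent-cleanup-agent:
    enabled: true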

ezimanyi self-assigned this Aug 7, 2020
@piyushGoyal2
Author

piyushGoyal2 commented Aug 24, 2020

@ezimanyi : Hi Eric, any updates on this? I'd love to hear your thoughts on it.

@karlskewes
Contributor

karlskewes commented Sep 10, 2020

Per RZ's guidance in Slack, we enabled sql.unknown-agent-cleanup-agent.enabled=true for clouddriver-caching, and all of the old Kubernetes application replicas (Spinnaker clusters) and accounts were cleaned up. 🎉

As always with databases, I suggest taking a snapshot first. Here's a validating query (I think).

mysql> SELECT COUNT(*) FROM cats_v1_clusters;
+----------+
| COUNT(*) |
+----------+
|     500 |  << was ~1000
+----------+
1 row in set (0.00 sec)

I'll work on a PR to the Clouddriver SQL docs, but ideally, as suggested, this could be enabled automatically when using SQL with Kubernetes.

@ezimanyi
Contributor

@robzienert : Any thoughts on whether it would be safe to set sql.unknown-agent-cleanup-agent.enabled=true by default? I remember discussing disabling it by default when you added it in spinnaker/clouddriver#4232, but I forget whether that was mostly a safety measure for the rollout of the change or whether it was intended to stay off by default permanently.

I think it should be enabled by default for Kubernetes users (as suggested by @kskewes above), but it feels like a strange coupling to have the default here depend on which cloud providers are enabled, so it might be worth just enabling it by default for everyone.

@piyushGoyal2
Author

@ezimanyi and @kskewes - I believe that, apart from cleaning up old accounts, the main reason for opening this issue was that when the cluster state changes, the infrastructure tab doesn't update in real time, leaving a stale view of the cluster.

Let me verify the configuration @kskewes suggested and I'll update the issue.

ezimanyi removed their assignment Sep 22, 2020
@ezimanyi
Contributor

@piyushGoyal2 : If you're seeing the cluster tab fail to update to account for changes, the root cause is likely that your caching cycles are not completing quickly enough. There is ongoing performance improvement work to make this less likely, and a discussion of workarounds on the closed issue #5611.

@spinnakerbot

This issue hasn't been updated in 45 days, so we are tagging it as 'stale'. If you want to remove this label, comment:

@spinnakerbot remove-label stale

@spinnakerbot

This issue is tagged as 'stale' and hasn't been updated in 45 days, so we are tagging it as 'to-be-closed'. It will be closed in 45 days unless updates are made. If you want to remove this label, comment:

@spinnakerbot remove-label to-be-closed

@spinnakerbot

This issue is tagged as 'stale' and hasn't been updated in 45 days, so we are tagging it as 'to-be-closed'. It will be closed in 45 days unless updates are made. If you want to remove this label, comment:

@spinnakerbot remove-label to-be-closed

@spinnakerbot

This issue is tagged as 'to-be-closed' and hasn't been updated in 45 days, so we are closing it. You can always reopen this issue if needed.
