fleet-agent cleanup continually deleting releases #523

thedadams · 2021-09-08T00:36:13Z

The cleanup loop in fleet-agent doesn't correctly match releases to their bundles. For a given release, it looks for the bundle deployment associated with that release. If the release has a different name than fleet-agent expects (based on the bundle deployment), fleet-agent deletes the release.

The problem is that a release with the exact same name is then recreated, so the next time the cleanup loop runs, fleet-agent deletes it again.

For example, a bundle deployment with name mcc-anupamafinalrcrke2-managed-system-upgrade creates a release with name mcc-anupamafinalrcrke2-managed-system-upgrade-c-407d2.

It seems that this function should look at status.release as well when trying to determine the name of the release from a bundle deployment.
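
The suggestion above can be sketched roughly as follows. This is a minimal illustration in Python, not Fleet's actual code (Fleet is written in Go). `BundleDeployment`, `status_release`, `known_release_names`, and `releases_to_delete` are hypothetical names, and the sketch assumes status.release holds the plain release name, which may not match the real field format.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional, Set


@dataclass
class BundleDeployment:
    # Hypothetical, simplified stand-in for a Fleet bundle deployment.
    name: str
    # Assumption: the release name recorded in status.release, if any.
    status_release: Optional[str] = None


def known_release_names(deployments: Iterable[BundleDeployment]) -> Set[str]:
    """Collect every release name that belongs to some bundle deployment."""
    names: Set[str] = set()
    for bd in deployments:
        # Name derived from the bundle deployment itself.
        names.add(bd.name)
        # Also trust the name recorded in the status, when present.
        if bd.status_release:
            names.add(bd.status_release)
    return names


def releases_to_delete(releases: Iterable[str],
                       deployments: Iterable[BundleDeployment]) -> List[str]:
    """Return only the releases that no bundle deployment accounts for."""
    known = known_release_names(deployments)
    return [r for r in releases if r not in known]


bd = BundleDeployment(
    name="mcc-anupamafinalrcrke2-managed-system-upgrade",
    status_release="mcc-anupamafinalrcrke2-managed-system-upgrade-c-407d2",
)
orphans = releases_to_delete(
    ["mcc-anupamafinalrcrke2-managed-system-upgrade-c-407d2"], [bd]
)
print(orphans)
```

With the status checked as well, the release from the example above is no longer treated as unknown. If only the bundle deployment name is compared, the same release would be flagged for deletion on every pass.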


cbron · 2021-09-17T18:48:51Z

Haven't been able to repro; if anyone else sees this, let us know.

0x4c6565 · 2022-06-15T14:03:08Z

For now, it would seem a workaround is possible with the method described in #795 (comment)

papanito · 2022-06-15T14:38:13Z

@0x4c6565 as far as it concerns Helm charts. For manifests, there is no such thing as release names, as I understand it.

0x4c6565 · 2022-06-15T14:44:51Z

@papanito it does appear to also work for non-helm bundles (this results in a generated helm chart for the manifests)

lee:~$ helm -n cert-manager list
NAME                            NAMESPACE       REVISION        UPDATED                                 STATUS          CHART                                                                           APP VERSION
cert-manager                    cert-manager    2               2022-06-15 13:43:22.663655261 +0000 UTC deployed        cert-manager-v1.8.0                                                             v1.8.0   
cert-manager-crds               cert-manager    2               2022-06-15 13:43:12.378313113 +0000 UTC deployed        longredactedpath-cert-manager-crds-v0.0.0+git-5d4301a7b565

cert-manager-crds is just a directory of CRD manifests, however Fleet actually generates a Helm chart to deploy these (it would appear Fleet just uses Helm for everything to manage the lifecycle of resources)

.
├── crds
│   └── cert-manager.crds.yaml
└── fleet.yaml

fleet.yaml:

helm:
  releaseName: cert-manager-crds

erSitzt · 2022-07-11T21:01:51Z

This should really be fixed, as it causes more problems down the line due to other controllers/operators starting to do stuff each time the resources get recreated... :(

snasovich · 2022-09-20T22:39:59Z

@mattfarina @manno , given that the fix made it into 0.4.0 release candidate, should the milestone be changed to 2.6.9 that includes Fleet 0.4.0?

mattfarina · 2022-09-21T16:29:26Z

@snasovich correct. I made the change.

MSpencer87 · 2022-10-12T23:41:46Z

Verified on Rancher v2.6.9-rc3 with fleet:100.1.0+up0.4.0-rc3

Reproduced on Rancher v2.6.3 with Fleet v0.3.8

Steps to reproduce:

Fork the fleet-examples repo
Bring up HA Rancher on v2.6.3, ensure the local cluster has Fleet v0.3.8
Create two downstream clusters (following Fleet p0)
Inside the forked fleet-examples repo, navigate to single-cluster and copy the helm folder, renaming it with a very long name (54+ characters)
Remove releaseName: guestbook from line 10 of fleet.yaml
Push the changes to the forked fleet-examples repo and navigate to Continuous Delivery inside Rancher's main menu
Select Git Repos tab, select Add Repository
Enter a simple name for this test repo and enter the URL and branch of the forked fleet examples repo/branch
Enter the path of the copied/renamed folder inside the fleet-examples repo (ex: single-cluster/ms-example-name-verylong-longest-name-ever-example-test-bundle-with-excessive-characters-over-fiftyfour-to-trigger-failure-on-naming)
Select Create and ensure the bundle is created with a trimmed and appended version of the long folder name
View the fleet-agent logs for the deletion of the bundle (24-60min): level=info msg="Deleting unknown bundle ID..."
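
To illustrate why the long folder name in the steps above matters: Helm limits release names to 53 characters, so a name generated from a longer path has to be trimmed and given a suffix. The sketch below is a hypothetical illustration of that kind of trimming, not Fleet's actual naming scheme. `trimmed_name`, the hash function, and the suffix length are all assumptions.

```python
import hashlib

# Helm rejects release names longer than 53 characters.
HELM_MAX_NAME_LEN = 53


def trimmed_name(name: str, limit: int = HELM_MAX_NAME_LEN, hash_len: int = 5) -> str:
    """Shorten an over-long name and append a short hash of the full name.

    Hypothetical scheme: names within the limit are returned unchanged.
    """
    if len(name) <= limit:
        return name
    # Hash the full name so different long names keep distinct suffixes.
    digest = hashlib.sha256(name.encode("utf-8")).hexdigest()[:hash_len]
    # Reserve room for the "-" separator and the hash suffix.
    keep = limit - hash_len - 1
    return f"{name[:keep]}-{digest}"


long_name = (
    "ms-example-name-verylong-longest-name-ever-example-test-bundle-"
    "with-excessive-characters-over-fiftyfour-to-trigger-failure-on-naming"
)
short = trimmed_name(long_name)
print(len(long_name), len(short))
```

Once a name is transformed like this, any code that derives the expected release name from the untrimmed bundle deployment name will no longer find a match, which is consistent with the "Deleting unknown bundle ID" log message.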

Steps for validation:

Using the same fork of the fleet-examples repo as above, bring up HA Rancher on v2.6.9-rc, ensure the local cluster has Fleet v0.4.0
Create two downstream clusters (following Fleet p0)
Navigate to Continuous Delivery inside Rancher's main menu
Select Git Repos tab, select Add Repository
Enter a simple name for this test repo and enter the URL and branch of the forked fleet examples repo/branch
Enter the path of the copied/renamed folder inside the fleet-examples repo (ex: single-cluster/ms-example-name-verylong-longest-name-ever-example-test-bundle-with-excessive-characters-over-fiftyfour-to-trigger-failure-on-naming)
Select Next and then Create, ensure the bundle is created with a trimmed and appended version of the long folder name
Verify the fleet-agent logs do not reflect any deletion of bundles

snasovich added [zube]: To Triage labels Sep 8, 2021

snasovich added this to the v2.6.1 milestone Sep 8, 2021

snasovich assigned kinarashah Sep 8, 2021

snasovich added [zube]: Next Up and removed [zube]: To Triage labels Sep 8, 2021

anupama2501 mentioned this issue Sep 8, 2021

AWS rke2 node driver multi-nodes are stuck in provisioning in cluster management page with only etcd and control plane active in the explore cluster page rancher/rancher#34565

Closed

cbron added [zube]: Working and removed [zube]: Next Up labels Sep 16, 2021

cbron modified the milestones: v2.6.1, v2.6.2 Sep 17, 2021

kinarashah added [zube]: Next Up and removed [zube]: Working labels Sep 20, 2021

nickgerace modified the milestones: v2.6.2, v2.6.3 Oct 19, 2021

nickgerace removed the area/fleet label Nov 1, 2021

cbron removed the [zube]: Next Up label Nov 2, 2021

nickgerace added the [zube]: Release Candidates label Nov 10, 2021

nickgerace modified the milestones: v2.6.3, v2.6.4 Nov 11, 2021

nickgerace assigned StrongMonkey and nickgerace Nov 11, 2021

nickgerace added the free-for-all label Nov 11, 2021

nickgerace added kind/good-first-issue kind/bug and removed free-for-all labels Nov 22, 2021

deniseschannon added the team/area3 label Nov 25, 2021

nickgerace removed their assignment Nov 30, 2021

nickgerace removed the kind/good-first-issue label Dec 1, 2021

Jono-SUSE-Rancher modified the milestones: v2.6.x, v2.6.5 Apr 6, 2022

zube bot added [zube]: Team Area 3 and removed [zube]: Next Up labels Apr 7, 2022

MKlimuszka modified the milestones: v2.6.5, v2.6.6 Apr 7, 2022

zube bot modified the milestones: v2.6.6, v2.7 May 24, 2022

0x4c6565 mentioned this issue Jun 15, 2022

Fleet bundles get deleted if their name is generated #795

Open

zube bot removed the team/area3 label Jul 5, 2022

erSitzt mentioned this issue Jul 11, 2022

nginx ingress tcp-services frequent disconnects rancher/rke2#3144

Closed

olblak assigned manno and unassigned prachidamle Sep 7, 2022

manno mentioned this issue Sep 9, 2022

Fix long bundle names #964

Merged

manno closed this as completed in #964 Sep 14, 2022

zube bot added [zube]: Done and removed [zube]: Team Area 3 labels Sep 14, 2022

mattfarina modified the milestones: v2.6.9, v2.7.0 Sep 14, 2022

mattfarina modified the milestones: v2.7.0, v2.6.9 Sep 21, 2022

zube bot removed the [zube]: Done label Dec 14, 2022

manno mentioned this issue Jan 24, 2023

Long bundle names will still be cleaned up continuously #1272

Closed
