fleet-agent cleanup continually deleting releases #523

Closed
thedadams opened this issue Sep 8, 2021 · 18 comments · Fixed by #964

@thedadams

The cleanup loop in fleet-agent cannot reliably match releases to bundles. For a given release, it looks for the bundle deployment associated with that release. If the release name differs from the name fleet-agent expects (derived from the bundle deployment), fleet-agent deletes the release.

The problem is that a release with the exact same name is then recreated, so the next time the cleanup loop runs, fleet-agent deletes the release again.

For example, a bundle deployment with name mcc-anupamafinalrcrke2-managed-system-upgrade creates a release with name mcc-anupamafinalrcrke2-managed-system-upgrade-c-407d2.

It seems that this function should look at status.release as well when trying to determine the name of the release from a bundle deployment.
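
To illustrate the mismatch, here is a minimal Python sketch of the behavior described above. It is not Fleet's code: the `BundleDeployment` class, the `status_release` field (a plain-string stand-in for status.release, whose real format is not shown in this thread) and both helper functions are hypothetical names chosen for the example.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional, Set


@dataclass
class BundleDeployment:
    name: str
    # Hypothetical stand-in for status.release: the name of the release
    # that was actually deployed, if one has been recorded.
    status_release: Optional[str] = None


def known_release_names(bd: BundleDeployment, use_status: bool) -> Set[str]:
    """Release names the cleanup loop should treat as belonging to bd."""
    names = {bd.name}
    if use_status and bd.status_release:
        names.add(bd.status_release)
    return names


def releases_to_delete(
    releases: Iterable[str],
    bundle_deployments: Iterable[BundleDeployment],
    use_status: bool,
) -> List[str]:
    """Return the releases that match no bundle deployment."""
    known: Set[str] = set()
    for bd in bundle_deployments:
        known |= known_release_names(bd, use_status)
    return [r for r in releases if r not in known]


bd = BundleDeployment(
    name="mcc-anupamafinalrcrke2-managed-system-upgrade",
    status_release="mcc-anupamafinalrcrke2-managed-system-upgrade-c-407d2",
)
releases = ["mcc-anupamafinalrcrke2-managed-system-upgrade-c-407d2"]

# Name-only matching flags the release for deletion on every pass.
deleted_by_name_only = releases_to_delete(releases, [bd], use_status=False)
# Consulting the recorded release name keeps it.
deleted_with_status = releases_to_delete(releases, [bd], use_status=True)
```

With name-only matching the release is selected for deletion on every run, which is the loop reported here. Once the recorded release name is taken into account, nothing is selected.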

@cbron
Contributor

cbron commented Sep 17, 2021

Haven't been able to repro. If anyone else sees this, let us know.

@0x4c6565

0x4c6565 commented Jun 15, 2022

For now, it would seem a workaround is possible with the method described in #795 (comment)

@papanito
Contributor

@0x4c6565 as far as Helm charts are concerned. For manifests, there is no such thing as a release name, as I understand it.

@0x4c6565

0x4c6565 commented Jun 15, 2022

@papanito it does appear to also work for non-helm bundles (this results in a generated helm chart for the manifests)

lee:~$ helm -n cert-manager list
NAME                            NAMESPACE       REVISION        UPDATED                                 STATUS          CHART                                                                           APP VERSION
cert-manager                    cert-manager    2               2022-06-15 13:43:22.663655261 +0000 UTC deployed        cert-manager-v1.8.0                                                             v1.8.0   
cert-manager-crds               cert-manager    2               2022-06-15 13:43:12.378313113 +0000 UTC deployed        longredactedpath-cert-manager-crds-v0.0.0+git-5d4301a7b565

cert-manager-crds is just a directory of CRD manifests; however, Fleet actually generates a Helm chart to deploy these (it appears Fleet uses Helm for everything to manage the lifecycle of resources).

.
├── crds
│   └── cert-manager.crds.yaml
└── fleet.yaml

fleet.yaml:

helm:
  releaseName: cert-manager-crds

@erSitzt

erSitzt commented Jul 11, 2022

This should really be fixed, as it causes more problems down the line due to other controllers/operators starting to do stuff each time the resources get recreated... :(

@olblak olblak assigned manno and unassigned prachidamle Sep 7, 2022
@mattfarina mattfarina modified the milestones: v2.6.9, v2.7.0 Sep 14, 2022
@snasovich

@mattfarina @manno, given that the fix made it into the 0.4.0 release candidate, should the milestone be changed to 2.6.9, which includes Fleet 0.4.0?

@mattfarina mattfarina modified the milestones: v2.7.0, v2.6.9 Sep 21, 2022
@mattfarina
Collaborator

@snasovich correct. I made the change.

@MSpencer87

Verified on Rancher v2.6.9-rc3 with fleet:100.1.0+up0.4.0-rc3

Reproduced on Rancher v2.6.3 with Fleet v0.3.8

Steps to reproduce:

  1. Fork the fleet-examples repo
  2. Bring up HA Rancher on v2.6.3, ensure the local cluster has Fleet v0.3.8
  3. Create two downstream clusters (following Fleet p0)
  4. Inside the forked fleet-examples repo, navigate to single-cluster and copy the helm folder, renaming it with a very long name (54+ characters)
  5. Remove releaseName: guestbook from line 10 of fleet.yaml
  6. Push the changes to the forked fleet-examples repo and navigate to Continuous Delivery inside Rancher's main menu
  7. Select Git Repos tab, select Add Repository
  8. Enter a simple name for this test repo and enter the URL and branch of the forked fleet examples repo/branch
  9. Enter the path of the copied/renamed folder inside the fleet-examples repo (ex: single-cluster/ms-example-name-verylong-longest-name-ever-example-test-bundle-with-excessive-characters-over-fiftyfour-to-trigger-failure-on-naming)
  10. Select Create and ensure the bundle is created with a trimmed and appended version of the long folder name
  11. View the fleet-agent logs for the deletion of the bundle (24-60min): level=info msg="Deleting unknown bundle ID..."
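
The "trimmed and appended" name in step 10 can be pictured with the following Python sketch. It is only an illustration of the general trim-and-suffix technique and is not Fleet's actual naming algorithm: the function name, the SHA-256 hash and the 5-character suffix length are assumptions. The 53-character limit is Helm's maximum release name length.

```python
import hashlib

# Helm rejects release names longer than 53 characters.
HELM_MAX_RELEASE_NAME = 53


def trimmed_name(name: str, limit: int = HELM_MAX_RELEASE_NAME, hash_len: int = 5) -> str:
    """Shorten name to at most limit characters, appending a short hash suffix.

    Hypothetical helper: names that already fit are returned unchanged.
    """
    if len(name) <= limit:
        return name
    # Derive a stable suffix from the full name so the result is deterministic.
    suffix = hashlib.sha256(name.encode("utf-8")).hexdigest()[:hash_len]
    # Leave room for the separator and the suffix.
    prefix = name[: limit - hash_len - 1].rstrip("-")
    return f"{prefix}-{suffix}"


long_name = (
    "ms-example-name-verylong-longest-name-ever-example-test-bundle-"
    "with-excessive-characters-over-fiftyfour-to-trigger-failure-on-naming"
)
short = trimmed_name(long_name)
```

Whatever the real scheme is, the point is the same: once a name is shortened and given a suffix, code that tries to recompute it from the bundle deployment name must reproduce the scheme exactly, or the two names stop matching.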

Steps for validation:

  1. Using the same fork of the fleet-examples repo as above, bring up HA Rancher on v2.6.9-rc and ensure the local cluster has Fleet v0.4.0
  2. Create two downstream clusters (following Fleet p0)
  3. Navigate to Continuous Delivery inside Rancher's main menu
  4. Select Git Repos tab, select Add Repository
  5. Enter a simple name for this test repo and enter the URL and branch of the forked fleet examples repo/branch
  6. Enter the path of the copied/renamed folder inside the fleet-examples repo (ex: single-cluster/ms-example-name-verylong-longest-name-ever-example-test-bundle-with-excessive-characters-over-fiftyfour-to-trigger-failure-on-naming)
  7. Select Next and then Create, ensure the bundle is created with a trimmed and appended version of the long folder name
  8. Verify the fleet-agent logs do not reflect any deletion of bundles
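
The log checks in both procedures can be scripted. The sketch below filters log lines for the message quoted in the reproduction steps. The function name and the sample lines are hypothetical, and the message text is matched only up to the point where it is quoted above.

```python
import re
from typing import Iterable, List

# Matches the start of the message quoted in the reproduction steps.
DELETE_PATTERN = re.compile(r'msg="Deleting unknown bundle ID')


def find_unknown_bundle_deletions(log_lines: Iterable[str]) -> List[str]:
    """Return every log line that reports deletion of an unknown bundle."""
    return [line.rstrip("\n") for line in log_lines if DELETE_PATTERN.search(line)]


# Hypothetical sample input; real fleet-agent log lines carry more fields.
sample_logs = [
    'level=info msg="Deleting unknown bundle ID..."\n',
    'level=info msg="some unrelated message"\n',
]
hits = find_unknown_bundle_deletions(sample_logs)
```

On an affected version, `hits` is expected to be non-empty after the cleanup interval has elapsed. On a fixed version it should stay empty for the same log window.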
