Add auto-reconciliation of missing/deleted resources #1551

Closed
4 tasks done
manno opened this issue May 23, 2023 · 6 comments

Comments

@manno
Member

manno commented May 23, 2023

Fleet restores deleted resources and resets changed fields to their intended state. This happens without reaching out to the upstream cluster.

  • add tests around modification detection/correction
  • add a backoff
  • implement PoC code in fleet-agent
  • status field should show the bundledeployment is reconciling (re-use existing status)

implements #163

@manno
Member Author

manno commented May 23, 2023

discussed with UI/UX

@manno manno changed the title [Epic] Add auto-reconciliation of missing/deleted resources Add auto-reconciliation of missing/deleted resources Jun 5, 2023
@raulcabello raulcabello self-assigned this Jun 6, 2023
@raulcabello
Contributor

QA Template

Problem

We want to undo any changes made outside of Fleet to resources that Fleet manages.

Solution

Do a Helm rollback when drift is detected. By default, the rollback uses a three-way strategic merge. Setting force to true changes this so that all resources are overwritten.
If a rollback fails, it is removed from the Helm history unless keepFailHistory is set to true.
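
For reference, the options described above might be set on a GitRepo roughly as follows. This is a minimal sketch, assuming the three options sit together under spec.correctDrift; the resource name, namespace, repository URL, and path are placeholders and do not come from this issue.

```yaml
apiVersion: fleet.cattle.io/v1alpha1
kind: GitRepo
metadata:
  name: drift-example            # placeholder name
  namespace: fleet-local         # placeholder namespace
spec:
  repo: https://github.com/rancher/fleet-examples   # placeholder repository
  paths:
    - simple                     # placeholder path
  correctDrift:
    enabled: true                # roll back when drift is detected
    force: false                 # true: overwrite resources instead of the three-way merge
    keepFailHistory: false       # true: keep failed rollbacks in the Helm history
```

Leaving the correctDrift block out, or setting enabled to false, should leave drift uncorrected, which is the first case in the testing list.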

Testing

  • Verify drift is not corrected if correctDrift.enabled is false or if it is not present.
  • Verify drift is corrected if enabled. Manually modify a resource deployed by Fleet; the changes should be rolled back. Check the Helm history, which should show a rollback.
  • Some changes cannot be rolled back without force, because the rollback uses the three-way strategic merge. Deploy a Service and manually change a port in the ports array (spec.ports[0].port). The rollback fails without force and succeeds with force.
  • Rollback failures should not appear in the Helm history unless correctDrift.keepFailHistory is set to true.
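
A Service along these lines could be used for the port test above. It is only an illustrative sketch: the name, selector, and port values are hypothetical and are not taken from the manifests actually used in testing.

```yaml
apiVersion: v1
kind: Service
metadata:
  name: drift-test-svc        # hypothetical name
spec:
  selector:
    app: drift-test           # hypothetical label
  ports:
    - name: http
      port: 80                # spec.ports[0].port: edit this by hand (e.g. to 8081) to create drift
      targetPort: 8080
```

After the Service is deployed through Fleet, changing spec.ports[0].port manually on the cluster produces the drift that, per the test description, is only corrected when force is set to true.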

Additional info

Try this with different Kubernetes resources. Some scenarios involving immutable resources will not work; this is a Helm limitation. For example, modifying a container entry in a Pod fails without force because of the three-way strategic merge, and it also fails with force because the Pod is immutable, so the PUT fails too. We don't expect users to create Pods directly, since Pods are usually managed by another controller (e.g. Deployment, StatefulSet, Job).

@sbulage

sbulage commented Jun 23, 2023

Per the tasks in the description, is there any work still remaining for this card?

@raulcabello
Contributor

@sbulage there is no work remaining. It is ready for QA review

@sbulage

sbulage commented Aug 30, 2023

QA TEST PLAN

Scenarios

| Scenario | Test Case |
|----------|-----------|
| 1 | Test that resources created via a GitRepo are auto-reconciled after being altered manually. |
| 2 | Test that existing resources are auto-reconciled after correctDrift.enabled is set to true in an existing GitRepo. |
| 3 | Test that, after a Rancher upgrade, resources added via a GitRepo are auto-reconciled. |
| 4 | Test that immutable Kubernetes resources are not auto-reconciled when correctDrift.enabled is set to true. |
| 5 | Test that a Service modification is not auto-reconciled until the GitRepo is force-updated. |

@sbulage

sbulage commented Aug 30, 2023

TEST RESULT

Scenarios

| Scenario | Test Case | Result |
|----------|-----------|--------|
| 1 | Test that resources created via a GitRepo are auto-reconciled after being altered manually. | |
| 2 | Test that existing resources are auto-reconciled after correctDrift.enabled is set to true in an existing GitRepo. | |
| 3 | Test that, after a Rancher upgrade, resources added via a GitRepo are auto-reconciled. | |
| 4 | Test that immutable Kubernetes resources are not auto-reconciled when correctDrift.enabled is set to true. | |
| 5 | Test that a Service modification is not auto-reconciled until the GitRepo is force-updated. | |

REPRO STEPS

Scenario 1

  1. Created a GitRepo with Enable Self-Healing checked in the UI.
  2. Checked that the GitRepo and its resources are created.
  3. Go to the Cluster --> Workloads --> Deployments.
  4. Edit the replica count of the Deployment from 1 to 5.
  5. Check that the pod count increases from 1 to 5 and later drops back to 1.
  6. Verified that resources are auto-reconciled, i.e. restored to their original values after a manual update, without force-updating the GitRepo.

Scenario 2

  1. Created a GitRepo without enabling self-healing.
  2. Checked that the GitRepo and its resources are created.
  3. Update the created GitRepo: enable self-healing via YAML by setting correctDrift.enabled to true.
  4. Go to the Cluster --> Workloads --> Deployments.
  5. Edit the replica count of the Deployment from 1 to 5.
  6. Check that the pod count increases from 1 to 5 and later drops back to 1.
  7. Verified that resources are auto-reconciled, i.e. restored to their original values after a manual update, without force-updating the GitRepo.

Scenario 3

  1. Install the latest version of Rancher.
  2. Created a GitRepo and checked that resources are created.
  3. Upgraded Rancher from 2.7.5 to 2.7.7-rc3.
  4. Update the created GitRepo: enable self-healing via YAML by setting correctDrift.enabled to true.
  5. Go to the Cluster --> Workloads --> Deployments.
  6. Edit the replica count of the Deployment from 1 to 5.
  7. Check that the pod count increases from 1 to 5 and later drops back to 1.
  8. Verified that resources are auto-reconciled, i.e. restored to their original values after a manual update, without force-updating the GitRepo.

Scenario 4

  1. Created a GitRepo with Enable Self-Healing checked in the UI.
  2. Checked that the GitRepo and its resources are created.
  3. Go to the Cluster --> Storage --> Config Maps or Secrets.
  4. Add or edit the data in a ConfigMap, a Secret, or another immutable resource.
  5. After some time, the change had not been reverted.
  6. Verified that some immutable Kubernetes resources are not auto-reconciled without force-updating the GitRepo.

Scenario 5

  1. Created a GitRepo with Enable Self-Healing checked in the UI.
  2. Checked that the GitRepo and its resources are created.
  3. Go to the Cluster --> Service Discovery.
  4. Edit the image version in the service and wait for the change to be reflected.
  5. After some time, it had not reverted to the old image.
  6. Verified that the service is not auto-reconciled without force-updating the GitRepo.
