
PDB Reaper #52

Merged (10 commits) on Jul 16, 2021

Conversation

@eytan-avisror (Collaborator) commented on Jul 13, 2021:

TBD:

  • Documentation
  • Event Publication

@eytan-avisror eytan-avisror requested a review from a team as a code owner July 13, 2021 18:37
ReapMisconfigured bool
ReapMultiple bool
ReapCrashLoop bool
ReapablePodDisruptionBudgets []policyv1beta1.PodDisruptionBudget

You're copying a lot of the same data into multiple maps/slices. It would be more efficient if you stored here just the namespace & name in this ReapablePodDisruptionBudgets slice as that's all that's required for the Delete call or use pointers *policyv1beta1.PodDisruptionBudget
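
For illustration, a minimal sketch of the two alternatives suggested here; the type and field names are assumptions and are not taken from the PR:

```go
// Sketch of the reviewer's suggestion (type and field names are assumptions):
// keep only what the Delete call needs, or keep pointers, instead of copying
// full PDB objects by value into several maps/slices.
package pdbreaper

import (
	policyv1beta1 "k8s.io/api/policy/v1beta1"
	"k8s.io/apimachinery/pkg/types"
)

type ReaperContext struct {
	ReapMisconfigured bool
	ReapMultiple      bool
	ReapCrashLoop     bool

	// Option A: store only namespace/name keys; that is all Delete requires.
	ReapablePDBKeys []types.NamespacedName

	// Option B: store pointers so the full object is not copied by value
	// every time it moves between a map and this slice.
	ReapablePodDisruptionBudgets []*policyv1beta1.PodDisruptionBudget
}
```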

@eytan-avisror (Collaborator, Author) replied:

Can you elaborate?
The data is copied a single time, when a PDB is detected and added to the slice of PDBs.

It appears you store the full policyv1beta1.PodDisruptionBudget object in multiple places: for example, a map holds the full object and it is then appended to the ReapablePodDisruptionBudgets slice. Instead of copying by value in multiple places, you could use references, or keep a slice of namespace/name pairs, since that slice is only used for the Delete call, which needs just the namespace and name.
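
For reference, a hedged sketch of the corresponding Delete call when only namespace/name keys are stored; the helper name is hypothetical, and this assumes a client-go version (v0.18+) whose Delete method takes a context:

```go
// Illustrative only: deleting reapable PDBs when only namespace/name were stored.
package pdbreaper

import (
	"context"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/apimachinery/pkg/types"
	"k8s.io/client-go/kubernetes"
)

func deleteReapablePDBs(ctx context.Context, client kubernetes.Interface, keys []types.NamespacedName) error {
	for _, key := range keys {
		// Namespace and name are the only pieces of the PDB needed here.
		err := client.PolicyV1beta1().
			PodDisruptionBudgets(key.Namespace).
			Delete(ctx, key.Name, metav1.DeleteOptions{})
		if err != nil {
			return err
		}
	}
	return nil
}
```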

Signed-off-by: Eytan Avisror <eytan_avisror@intuit.com>
@@ -14,6 +14,10 @@ Two common problems observed in large Kubernetes clusters are:
**node-reaper** provides the capability for worker nodes to be force terminated so that replacement ones come up.
**pod-reaper** does a force termination of pods stuck in Terminating state for a certain amount of time.

In some cases, on multi-tenant platforms, users can own PDBs which block node rotation/drain if they are misconfigured, pods are in crashloop backoff, or if multiple PDBs are targeting the same pods.
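
As a purely illustrative example (not part of this PR), a PDB that blocks drains because it allows zero disruptions might look like the following when expressed with the Go API types; the names and labels are made up:

```go
// Illustrative only: a PDB with maxUnavailable: 0 permits no voluntary
// evictions, so node drains that need to evict its pods will hang.
package example

import (
	policyv1beta1 "k8s.io/api/policy/v1beta1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/apimachinery/pkg/util/intstr"
)

func blockingPDB() policyv1beta1.PodDisruptionBudget {
	zero := intstr.FromInt(0)
	return policyv1beta1.PodDisruptionBudget{
		ObjectMeta: metav1.ObjectMeta{Name: "tenant-pdb", Namespace: "tenant-a"},
		Spec: policyv1beta1.PodDisruptionBudgetSpec{
			// No pod matching this selector may ever be evicted voluntarily.
			MaxUnavailable: &zero,
			Selector: &metav1.LabelSelector{
				MatchLabels: map[string]string{"app": "example"},
			},
		},
	}
}
```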
A Collaborator commented:

Misbehaving worker nodes are okay to terminate because various autoscaling mechanisms bring them back. Misbehaving pods are okay to terminate because higher-level objects (Deployments, ReplicaSets, etc.) recreate them. Nodes and pods can be considered ephemeral in that sense.

However, this is not true for PDBs. If PDBs are deleted, users will have to find a way to bring them back. That might be okay if they use GitOps, but in non-GitOps environments, objects getting deleted could be problematic.

I'm not sure what a good answer is for preventing misbehaving PDBs from stopping node drains, though. If there were a way to temporarily disable these PDBs during upgrades, that would be ideal.

@eytan-avisror (Collaborator, Author) replied on Jul 13, 2021:

Yes, I added a note about this in the individual README:

Since pdb-reaper will not recreate the PDBs it deletes, deletion is particularly useful in cases where GitOps is used, which can re-create the PDBs at a later time.

@eytan-avisror (Collaborator, Author) added on Jul 13, 2021:

Users who just want visibility can run with --dry-run to avoid deletion; events will still be published.
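
A hypothetical sketch of how such a dry-run guard might look; the --dry-run flag is real, but the function and parameter names below are illustrative only:

```go
// Hypothetical dry-run guard: report (and let the caller publish an event)
// without actually deleting the PDB.
package pdbreaper

import (
	"context"
	"log"

	policyv1beta1 "k8s.io/api/policy/v1beta1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
)

func maybeDelete(ctx context.Context, client kubernetes.Interface, dryRun bool, pdb policyv1beta1.PodDisruptionBudget) error {
	if dryRun {
		log.Printf("dry-run: would delete PDB %s/%s", pdb.Namespace, pdb.Name)
		return nil
	}
	return client.PolicyV1beta1().
		PodDisruptionBudgets(pdb.Namespace).
		Delete(ctx, pdb.Name, metav1.DeleteOptions{})
}
```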

A Member commented:

Maybe a future enhancement would be to save the PDB and then restore it later, presumably after the node has been successfully drained? But then pdbreaper would need to be triggered when a node is about to be drained so the 'bad' PDBs can be temporarily deleted, perhaps triggered by upgrade-manager or lifecycle-manager. This is essentially what is recommended by this GCP Anthos link.
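
A rough, hypothetical sketch of that save-then-restore idea (not implemented in this PR; the helper name is made up). The hard part, as noted in the reply below, is knowing when it is safe to call the returned restore function.

```go
// Hypothetical only: delete a blocking PDB and return a closure that can
// recreate it once the node drain has completed.
package pdbreaper

import (
	"context"

	policyv1beta1 "k8s.io/api/policy/v1beta1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
)

func reapAndRemember(ctx context.Context, client kubernetes.Interface, pdb policyv1beta1.PodDisruptionBudget) (func() error, error) {
	saved := pdb.DeepCopy()
	// Strip server-populated fields so the object can be recreated later.
	saved.ResourceVersion = ""
	saved.UID = ""

	err := client.PolicyV1beta1().
		PodDisruptionBudgets(pdb.Namespace).
		Delete(ctx, pdb.Name, metav1.DeleteOptions{})
	if err != nil {
		return nil, err
	}

	restore := func() error {
		_, err := client.PolicyV1beta1().
			PodDisruptionBudgets(saved.Namespace).
			Create(ctx, saved, metav1.CreateOptions{})
		return err
	}
	return restore, nil
}
```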

@eytan-avisror (Collaborator, Author) replied:

This adds a lot of complexity: you would have to track node drains and figure out the right time to recreate the PDB. That is easy to do manually (as in the doc you linked), but doing it programmatically is much more complex, and that complexity is something we might not want to take on.

@tekenstam (Member) left a review comment:

Looks good. We might consider an enhancement to only delete "misconfigured" PDBs after some number of runs of pdbreaper.

pkg/reaper/pdbreaper/pdbreaper.go (resolved review thread)
eytan-avisror and others added 2 commits July 13, 2021 14:16
Signed-off-by: Eytan Avisror <eytan_avisror@intuit.com>
@codecov (bot) commented on Jul 13, 2021:

Codecov Report

Merging #52 (4722fd6) into master (0b5d58b) will increase coverage by 2.39%.
The diff coverage is 67.80%.

@@            Coverage Diff             @@
##           master      #52      +/-   ##
==========================================
+ Coverage   55.42%   57.81%   +2.39%     
==========================================
  Files           4        7       +3     
  Lines         821     1183     +362     
==========================================
+ Hits          455      684     +229     
- Misses        317      416      +99     
- Partials       49       83      +34     
Impacted Files Coverage Δ
pkg/reaper/nodereaper/types.go 100.00% <ø> (ø)
pkg/reaper/pdbreaper/types.go 27.58% <27.58%> (ø)
pkg/reaper/nodereaper/nodereaper.go 45.49% <59.67%> (-1.31%) ⬇️
pkg/reaper/pdbreaper/pdbreaper.go 71.14% <71.14%> (ø)
pkg/reaper/nodereaper/helpers.go 60.88% <78.94%> (+2.34%) ⬆️
pkg/reaper/podreaper/podreaper.go 75.59% <80.00%> (-1.07%) ⬇️
pkg/reaper/podreaper/types.go 100.00% <100.00%> (ø)
... and 4 more

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update f09bec8...4722fd6.

Eytan Avisror added 4 commits July 13, 2021 17:07
Signed-off-by: Eytan Avisror <eytan_avisror@intuit.com>
Signed-off-by: Eytan Avisror <eytan_avisror@intuit.com>
Signed-off-by: Eytan Avisror <eytan_avisror@intuit.com>
Signed-off-by: Eytan Avisror <eytan_avisror@intuit.com>
@eytan-avisror eytan-avisror merged commit 740c606 into keikoproj:master Jul 16, 2021
shaoxt pushed a commit to shaoxt/governor that referenced this pull request Dec 21, 2021
* initial impl.

* add unit tests

* Documentation

* event publication

Signed-off-by: Eytan Avisror <eytan_avisror@intuit.com>

* Update unit-test.yaml

Signed-off-by: Eytan Avisror <eytan_avisror@intuit.com>

* Add disclaimer for production + extra flags

Signed-off-by: Eytan Avisror <eytan_avisror@intuit.com>

* minor fixes

Signed-off-by: Eytan Avisror <eytan_avisror@intuit.com>

* Update pdbreaper.go

Signed-off-by: Eytan Avisror <eytan_avisror@intuit.com>

* exclude namespaces

Signed-off-by: Eytan Avisror <eytan_avisror@intuit.com>
Signed-off-by: shaoxt <shaoxt@gmail.com>