feat(kubernetes): Pruning #131

Open · sh0rez wants to merge 11 commits into master from prune

Conversation

sh0rez (Member) commented Dec 11, 2019

This PR is another take on implementing garbage collection for resources that are no longer present in the Jsonnet code.

@malcolmholmes already experimented with using a label and removing resources (#117) and kubectl apply --prune (#123). Both attempts had their shortcomings (performance, reliability).

Just out of interest, I tried my own approach, and so far I think it is robust enough to be used in production:

  1. We label each resource with tanka.dev/environment: <relative-path-from-project-root> (see the sketch below this list)
  2. On apply / diff we fetch all resources of the most common kinds (the same list kubectl apply --prune uses) and check whether any of them carry our label but are no longer known to the Jsonnet code
  3. These are marked as deleted in diff
  4. On apply, they are deleted.
  5. This behavior defaults to enabled, as it ensures that the Jsonnet stays declarative
  6. It can be disabled by using --prune=false; labeling can be disabled using spec.injectLabels.environment.
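
For illustration, a rendered manifest then carries the label roughly like this (a minimal sketch: the Deployment, its name and the environment path are made up; only the tanka.dev/environment label key comes from this PR):

{
  "apiVersion": "apps/v1",
  "kind": "Deployment",
  "metadata": {
    "name": "grafana",
    "labels": {
      // injected by Tanka; the value is the environment's relative path from the project root
      "tanka.dev/environment": "environments/default"
    }
  }
  // spec omitted for brevity
}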

Can I get some opinions on this @rfratto @gouthamve @tomwilkie?

Co-authored-by: Malcolm Holmes mdh@odoko.co.uk

@sh0rez sh0rez requested a review from rfratto Dec 11, 2019
@sh0rez sh0rez self-assigned this Dec 11, 2019
@sh0rez sh0rez changed the title from feat(kubernetes): Pruning to WIP feat(kubernetes): Pruning on Dec 16, 2019
sh0rez (Member, Author) commented Dec 16, 2019

Todo

  • Make it respect --target: currently pruning always happens; there is no way to restrict what gets pruned
  • Make it opt-in: when enabled by default, it will cause trouble for users who work on branches that are not frequently rebased onto master. The better workflow is to run tk apply --prune once in a while and you are good to go.
@sh0rez sh0rez added the keepalive label Jan 2, 2020
sh0rez added 7 commits Dec 5, 2019
  • Before, the name of the internally used Environment type differed depending on the argument passed on the command line, even though it resulted in the same environment (relative vs absolute paths, etc.). This is fixed by using the `jpath` package to compute baseDir and rootDir and taking the relative path from rootDir to baseDir as the name, which yields stable names regardless of the directory representation used. The environment's name (relative path from rootDir) is now added to each created resource as a label, to easily identify objects by their origin. Furthermore, the Kubernetes type now includes the entire Environment (previously only the spec), so the env's name is accessible in that package as well.
  • Allows querying all known apiResources from the server.
  • Adds support for removing orphaned objects from the cluster. They can be viewed in diff and removed in apply.
  • Pruning defaults to enabled, but can be opted out of using `--prune=false`.
  • Broken in upstream, nothing we can do about it.
@sh0rez sh0rez force-pushed the prune branch from 57040f6 to bcc8991 Jan 6, 2020
sh0rez added 4 commits Dec 11, 2019
  • All of our current commands take environments; only the import analysis works on plain files. This has no benefit and actually makes the implementation harder, as we cannot use stable paths there. Switches to environments as the target.
  • Making pruning opt-out can lead to dangerous moments when the local sources are not recent enough. As this could kill entire clusters, we are going to make it opt-in.
@sh0rez sh0rez changed the title from WIP feat(kubernetes): Pruning to feat(kubernetes): Pruning on Jan 6, 2020
sh0rez (Member, Author) commented Jan 6, 2020

@rfratto @malcolmholmes I guess this is ready for another round of review.

  • Made it opt in
  • Made it respect targets

I think it is safe for production usage now. WDYT?

@sh0rez sh0rez mentioned this pull request Jan 11, 2020
anguslees commented Jan 11, 2020

How will CRDs work in this?
Does it work with manifests that include multiple namespaces? (I didn't read deeply enough)

(Basically: Why not just duplicate the logic used in kubecfg and descendants (ksonnet, qbec, etc)?)

malcolmholmes (Collaborator) commented Jan 12, 2020

I don't think CRDs will work with this particularly well. The Kubernetes API does not provide a "list all resources" API (well, it does, but the 'all' it lists is just a subset of resources). There is kubectl api-resources, but that gives you a HUGE list, and you would have to check each and every one, one at a time, which isn't very efficient. Maybe it is possible to support CRDs, and maybe @sh0rez's latest code will handle it fine, but you are right to ask your question - it is a tough thing.

sh0rez (Member, Author) commented Jan 12, 2020

I don't think CRDs will work with this particularly well. The Kubernetes API does not provide a "list all resources" API (well, it does, but the 'all' it lists is just a subset of resources). There is kubectl api-resources, but that gives you a HUGE list, and you would have to check each and every one, one at a time, which isn't very efficient. Maybe it is possible to support CRDs, and maybe @sh0rez's latest code will handle it fine, but you are right to ask your question - it is a tough thing.

Kind of. When a similar feature was implemented in kubectl, the fact that you cannot check all kinds at once became an obvious limitation ... this is a design property of the API and cannot easily be fixed. Instead, kubectl uses a whitelist of the most common kinds – which obviously won't include CRDs.

Tanka uses the same list, but also supports checking the entire set of api-resources, though that is much slower.

Of course we could add a flag that allows you to extend the whitelist on the CLI.

So @anguslees, to answer the question:

  • it handles CRDs just fine, as long as they are in the whitelist
  • they are not in the whitelist by default, but you will be able to add them on the CLI (or we could even make this possible from Jsonnet directly)
malcolmholmes (Collaborator) commented Jan 12, 2020

Presumably you could also add the whitelist in spec.json? Then, if there are specific resource types that aren't in the default list, you know they will ALWAYS be checked for the relevant environments?
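
For concreteness, that would look roughly like this (a hypothetical sketch only: the pruneKinds field does not exist, and the surrounding fields are just an approximation of Tanka's spec.json format):

{
  "apiVersion": "tanka.dev/v1alpha1",
  "kind": "Environment",
  "spec": {
    "apiServer": "https://127.0.0.1:6443",
    "namespace": "default",
    // hypothetical field, shown only to illustrate the idea discussed above
    "pruneKinds": ["Certificate", "Prometheus"]
  }
}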

sh0rez (Member, Author) commented Jan 12, 2020

Presumably you could also add the whitelist in spec.json? Then, if there are specific resource types that aren't in the default list, you know they will ALWAYS be checked for the relevant environments?

I think that is the wrong place, because such a setting applies per library (libraries introduce CRDs after all) and not per environment. Otherwise you would need to keep the whitelist in sync across the spec.json files of all environments, which leads to config drift, the very thing Tanka aims to prevent.

That said, we could instead introduce a special tanka.dev object in the Jsonnet output, which everyone can write to:

{
  "tanka.dev": {
    "additionalPruneKinds": [ ... ],
  }
}

This also means the library that introduces a CRD can specify it directly; users won't even notice it.
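
A library could then bundle the hint right next to its CRD, roughly like this (a sketch of the proposal only: additionalPruneKinds is not an implemented field, and the cert-manager CRD is just an illustrative example):

{
  "certificateCRD": {
    "apiVersion": "apiextensions.k8s.io/v1beta1",
    "kind": "CustomResourceDefinition",
    "metadata": { "name": "certificates.cert-manager.io" }
    // spec omitted for brevity
  },

  "tanka.dev"+: {
    "additionalPruneKinds"+: ["Certificate"]
  }
}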

malcolmholmes (Collaborator) commented Jan 12, 2020

That could work too. It would be ignored when processing Kubernetes resources, which is good.

I would suggest:

{
  "tanka.dev": {
    "prune": {
      "additionalKinds": [ ... ],
    },
  }
}

Or better still:

{
  "tanka.dev": {
    "prune": {
      "kinds"+: [ ... ],
    },
  }
}

"additional" implied by jsonnet construct.

sh0rez (Member, Author) commented Jan 12, 2020

That could work too. It would be ignored when processing Kubernetes resources, which is good.

I would suggest:

{
  "tanka.dev": {
    "prune": {
      "additionalKinds": [ ... ],
    },
  }
}

Or better still:

{
  "tanka.dev": {
    "prune": {
      "kinds"+: [ ... ],
    },
  }
}

"additional" implied by jsonnet construct.

Like that! It depends on whether we can pre-seed the initial Jsonnet object though ... @sbarzowski, is this possible? And are there any drawbacks?

anguslees commented Jan 19, 2020

(Basically: Why not just duplicate the logic used in kubecfg and descendants (ksonnet, qbec, etc)?)

Repeating ^. I think the authors/reviewers of this PR should be aware that kubecfg/ksonnet/qbec follow a different algorithm than kubectl/helm, one that I think addresses a number of weaknesses in the kubectl-prune approach (disclaimer: I wrote the initial version in kubecfg). In particular, they walk all the GroupKinds (using api-versions introspection) and work just fine with CRDs, since they contain no hard-coded list.

Happy to swap notes further if it's useful - mostly just wanted to ensure you're choosing the proposed whitelist-approach with knowledge of alternatives.

sh0rez (Member, Author) commented Jan 20, 2020

@anguslees To be honest, we did not consider other approaches in the ecosystem when designing this ... we took kubectl as a reference.

Could you please share some details on the approach you suggest?

  • Does it avoid having to maintain a whitelist?
  • How does it look performance-wise? How long does it take to compute the pruned objects?
  • What about edge cases? For example, the current approach does not handle API groups (apps/v1 vs extensions/v1beta1) very well.
  • How well does it fit the Tanka codebase? For example, we are not using client-go (we shell out to kubectl) and we are not using any of the apimachinery (though we can change that if it is beneficial).

We are not fixed on what is done in this PR; if there are better ways to address the issue, I am happy to abandon this one and use something else :D
