
Make the backup jobs request and limits configurable #143

Closed
megian opened this issue Nov 12, 2020 · 15 comments · Fixed by #175
Assignees: susana-garcia
Labels: enhancement (New feature or request), good first issue (Good for newcomers)

Comments

@megian
Contributor

megian commented Nov 12, 2020

Summary

As K8up admin
I want to override the default resource request and limits of Pods generated by K8up
So that I can optimize resource usage or comply with cluster or namespace resource policies.

Context

On clusters with a default Pod CPU/memory limit, backup jobs are currently constrained to these default values, because there is no way to override or remove them from K8up.

Resource Limits
 Type                      Resource                 Min  Max  Default Request  Default Limit  Max Limit/Request Ratio
 ----                      --------                 ---  ---  ---------------  -------------  -----------------------
 Container                 cpu                      1m   -    100m             250m           -
 Container                 memory                   1Mi  -    128Mi            256Mi          -

This heavily limits the use cases on such clusters.

Out of Scope

  • Actually defining global default resource limits or requests. Introducing defaults in a non-major upgrade of K8up could stop existing backups from working (e.g. crashes with OOM).


Acceptance criteria

Given a K8up Schedule object with per-schedule resources specified
When K8up schedules Jobs
Then the containers in the Pods are scheduled with the configured resource requests and limits

Given a K8up Schedule object outside of the cluster admin's responsibility
When K8up schedules Jobs
Then the containers in the Pods are scheduled with the configured global default resource requests and limits
(in a multi-tenant cluster, customers can create schedules, while cluster admins can provide global defaults in case a customer doesn't define them)

Implementation Ideas

  • Global environment variables for the defaults
  • The actual resource usage depends heavily on the number of files that are backed up, so we should be able to set the limits per job and per schedule, with values overridable like for the S3 endpoints (see the sketch after this list)
    • The order of precedence could be: global defaults < schedule defaults < job type specifics (right overrides left)
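A rough sketch of that precedence in Go (K8up's language); the package and function names here are illustrative, not an actual K8up API:

    // Sketch of the proposed precedence: global defaults < schedule
    // defaults < job-type specifics, i.e. the most specific non-empty
    // setting wins. All names are made up for illustration.
    package cfg

    import (
        corev1 "k8s.io/api/core/v1"
    )

    // resolveResources walks from the most specific level to the most
    // general one and returns the first that sets any request or limit.
    func resolveResources(global, schedule, job corev1.ResourceRequirements) corev1.ResourceRequirements {
        for _, r := range []corev1.ResourceRequirements{job, schedule, global} {
            if len(r.Requests) > 0 || len(r.Limits) > 0 {
                return r
            }
        }
        return global // nothing set anywhere: return the (empty) global value
    }

The global defaults themselves could be read from environment variables at operator startup.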
@Kidswiss added the enhancement and good first issue labels Nov 27, 2020
@susana-garcia
Contributor

I'll take this one

@susana-garcia self-assigned this Dec 1, 2020
@ccremer
Contributor

ccremer commented Dec 1, 2020

I have refined this issue. @megian, please review and add corrections or more acceptance criteria.

@megian
Contributor Author

megian commented Dec 1, 2020

Most of it is fine by me. I would just reword "I can optimize resource usage or comply with cluster resource policies" to "I can override the default requests and limits of the namespace".

@ccremer
Contributor

ccremer commented Dec 1, 2020

I have updated the summary. Basically, the "so that" clause should specify what we get out of it from a business perspective rather than the technical reasons.

@megian
Contributor Author

megian commented Dec 1, 2020

Well, in this case this is very theoretical, because it's just a technical issue: K8up is simply not usable under these cluster conditions without it. We could write something like "enable K8up to run on multi-tenant clusters and reach a wider audience", but as long as I can override the resource requests and limits, I'm happy with the current wording as well.

@Kidswiss
Contributor

Kidswiss commented Dec 2, 2020

Under Implementation Ideas I see "add more global environment variables". I don't agree with that; this should be configurable per job/schedule IMHO, because restic's memory usage depends heavily on how many files it backs up. A project with few files to back up will need drastically fewer resources than a project with millions of files.

@ccremer
Contributor

ccremer commented Dec 2, 2020

Yeah, I wasn't sure about this. Please feel free to add your own ideas; you know K8up best :)

@Kidswiss
Contributor

Kidswiss commented Dec 2, 2020

I have amended the implementation ideas with mine.

@susana-garcia
Contributor

susana-garcia commented Dec 2, 2020

@Kidswiss I have a question about this sentence: “we should have the ability to set the limits by job and schedule, where values can be overwritten like for the S3 endpoints”. Do you mean we need an environment variable for the backup job and another one for the schedule (also per request/limit)?

@Kidswiss
Contributor

Kidswiss commented Dec 2, 2020

I would do the defaults with environment variables.

I wouldn't do the overrides with them, though, but rather in the CRDs: create a new type like k8upResourceLimits and include it in all the jobs and the schedule, like the Backend type. That one exists for each job type as well as for the schedule, which is how we can specify backends for schedules as well as for single jobs. That's what I mean by that sentence :)
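A minimal sketch of what I mean, assuming a shared type next to the existing Backend reference; the type and field names are made up, not the final API:

    // Sketch only: a resource-limits type that every job spec and the
    // schedule embed, analogous to how the Backend type is reused.
    package v1alpha1

    import (
        corev1 "k8s.io/api/core/v1"
    )

    // Backend stands in for K8up's existing backend configuration type.
    type Backend struct{}

    // K8upResourceLimits wraps the standard Kubernetes resource
    // requirements so each job type and the schedule can carry them.
    type K8upResourceLimits struct {
        Resources corev1.ResourceRequirements `json:"resources,omitempty"`
    }

    // BackupSpec shows how a single job type could embed it, next to
    // the backend reference it already has.
    type BackupSpec struct {
        Backend            *Backend `json:"backend,omitempty"`
        K8upResourceLimits `json:",inline"`
    }

A value set on a single job would then override whatever the schedule defines, and the environment-variable defaults would only kick in when neither is set.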

@susana-garcia
Contributor

@Kidswiss ok, all clear now, thank you :)

@ccremer
Contributor

ccremer commented Dec 7, 2020

@megian The PR is now merged and should work. The new feature is currently available in the quay.io/vshn/k8up:master image tag.

@megian
Contributor Author

megian commented Dec 7, 2020

@ccremer When is a tagged release planned? As this is a production cluster, we don't want to point the deployment at a master or latest tag.

@ccremer
Contributor

ccremer commented Dec 7, 2020

I don't have a clear date. There's a milestone for feature parity with the code that is currently released: https://github.com/vshn/k8up/milestone/1
However, I don't think we can finish the milestone in the next two weeks as-is; if #129 is removed from that milestone, we might make it.
@tobru WDYT?

@tobru
Contributor

tobru commented Dec 8, 2020

I don't want to cut a release before we've done more testing, so this most probably won't happen this year.
