Deduplicate Check and Prunes for the same backup repository #214
Comments
Some ideas to spin:
This ain't exactly easy. I started implementing, but soon discovered that
The graphic is in the PR for editing, though it's not intended to stay there in that form.
One thought I just had: if we do this, should we provide stable deduplication for this? Example: A new … To guarantee the same interval regardless of operator restarts, it would need a way to know which schedule it should prefer.
Could we sort them by date of creation? It could work something like this:
Now, if a
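A rough sketch of the sort-by-creation-date idea (the Schedule stand-in type and helper below are illustrative, not k8up code): the oldest Schedule wins, so the same one is preferred after every operator restart.

```go
package main

import (
	"fmt"
	"sort"
	"time"
)

// schedule is a minimal stand-in for the real Schedule resource.
type schedule struct {
	Name      string
	CreatedAt time.Time // would be metadata.creationTimestamp on the real object
}

// preferredSchedule picks the schedule that should own the deduplicated
// prune/check job: the one created first. Assumes at least one schedule.
func preferredSchedule(schedules []schedule) schedule {
	sort.Slice(schedules, func(i, j int) bool {
		return schedules[i].CreatedAt.Before(schedules[j].CreatedAt)
	})
	return schedules[0]
}

func main() {
	winner := preferredSchedule([]schedule{
		{Name: "ns-b/schedule", CreatedAt: time.Date(2021, 3, 2, 0, 0, 0, 0, time.UTC)},
		{Name: "ns-a/schedule", CreatedAt: time.Date(2021, 1, 5, 0, 0, 0, 0, time.UTC)},
	})
	fmt.Println("deduplicated job stays with", winner.Name) // ns-a/schedule
}
```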
I had another idea over the weekend: we could hash the repository string and the type and use that as the randomness seed (https://golang.org/pkg/math/rand/#Seed). So each type and repo combination will generate the same "random" time. This way we only have to track whether at least one of the jobs is registered for a given type/repo combination. By hashing the values before using them as the seed, it should generate enough spread for the schedules.
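A minimal sketch of that idea, assuming a plain FNV hash and math/rand (the job type and repository strings are made up, and this is not k8up's actual code):

```go
package main

import (
	"fmt"
	"hash/fnv"
	"math/rand"
)

// deterministicDailyTime derives a stable "random" time of day for a given
// job type + repository combination. The same inputs always produce the same
// seed, so every operator restart computes the same schedule.
func deterministicDailyTime(jobType, repository string) (hour, minute int) {
	h := fnv.New64a()
	h.Write([]byte(jobType + "|" + repository))
	r := rand.New(rand.NewSource(int64(h.Sum64())))
	return r.Intn(24), r.Intn(60)
}

func main() {
	hh, mm := deterministicDailyTime("prune", "s3:https://objects.example.com/backups")
	fmt.Printf("@daily-random for this repo resolves to %02d:%02d\n", hh, mm)
}
```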
It sounds like rand should not be used then, but rather a number should be deduced directly from the hash. For one, because there's the underlying assumption that the implementation of

My main concern with this solution: it's anything but obvious to understand. I.e., it's a very implicit solution, and in my experience implicit solutions are hard to understand right away for the next developer.
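For comparison, a sketch of deriving the time directly from the hash, without involving math/rand at all (again illustrative only, not k8up code):

```go
package main

import (
	"fmt"
	"hash/fnv"
)

// minuteOfDay maps a job type + repository combination directly to a minute
// of the day (0..1439), with no dependency on rand's implementation details.
func minuteOfDay(jobType, repository string) (hour, minute int) {
	h := fnv.New64a()
	h.Write([]byte(jobType + "|" + repository))
	m := int(h.Sum64() % (24 * 60))
	return m / 60, m % 60
}

func main() {
	hh, mm := minuteOfDay("check", "s3:https://objects.example.com/backups")
	fmt.Printf("check for this repo would always run at %02d:%02d\n", hh, mm)
}
```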
Sure, we can use something else to generate the times, but I feel like we'd have to get the randomness down for the same types and repos. Your suggestion could still lead to garbled execution times if there are a lot of namespace changes on a cluster.
Unpopular opinion: the more we try to solve these "stable across restarts" problems, the more I'm convinced we should get rid of any internal state altogether, e.g. replace the cron library with K8s CronJobs etc. In a private project/operator I'm facing exactly the same problem: handling scheduling and restarts. I have found a working solution; we can discuss it if you're interested. At the moment I'm a bit hesitant to come up with complicated "solutions" that solve deduplication across restarts when using internal state. Maybe we should limit the deduplication feature to
If we implement the deduplication logic for

I agree that switching to k8s native CronJobs could help with some things, but it may make other things more complicated. I also agree that off-loading as much state as possible to k8s is desirable, but there are cases where I think having a small in-memory state could make sense, for example to reduce the amount of API queries. I'm interested in hearing your solution for that issue.
With the switch to Operator SDK, or rather controller-runtime, the client has a built-in read cache by default. Each GETted object lands in the cache and is automatically watched for changes. Repeated GETs for already retrieved objects don't even reach the API server anymore. It's actually harder to ignore the cache for certain objects. So, as far as performance goes, I think it's worse when we try to maintain our own, barely tested cache ;)
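For illustration, standard controller-runtime usage of that cached client (not k8up's actual code; the namespace name is arbitrary):

```go
package main

import (
	"context"

	corev1 "k8s.io/api/core/v1"
	ctrl "sigs.k8s.io/controller-runtime"
	"sigs.k8s.io/controller-runtime/pkg/client"
)

func main() {
	mgr, err := ctrl.NewManager(ctrl.GetConfigOrDie(), ctrl.Options{})
	if err != nil {
		panic(err)
	}

	// The manager's default client reads through an informer-backed cache.
	c := mgr.GetClient()

	go func() {
		ns := &corev1.Namespace{}
		// Once the caches have synced, repeated Gets like these are answered
		// from the cache; the object is watched and kept up to date for free.
		_ = c.Get(context.Background(), client.ObjectKey{Name: "default"}, ns)
		_ = c.Get(context.Background(), client.ObjectKey{Name: "default"}, ns)
	}()

	// Starting the manager populates the cache and keeps it in sync.
	if err := mgr.Start(ctrl.SetupSignalHandler()); err != nil {
		panic(err)
	}
}
```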
It depends on whether we also implement stable deduplication across restarts. If we decide to make it stable, we are accepting added complexity and reduced maintainability, whereas with ephemeral deduplication we can simplify the deduplication at the cost of missing schedules, as you described.
My personal opinion is that missing schedules are something the k8up operator should avoid as much as possible. Nobody wants a backup solution that may or may not trigger a job.
Hi, what's the current status of this issue in the latest version of k8up? Does k8up ensure that a
Hi @smlx, K8up doesn't yet deduplicate jobs to the same repository. However, there are already mechanisms in place that prevent two exclusive jobs (like prune and check) from running at the same time.
Summary
As a K8up user
I want to deduplicate Jobs that target the same repository
So that exclusive Jobs are not run excessively
Context
Check and Prune are Restic Jobs that need exclusive access to the backend repository: only one job can effectively run at the same time. However, multiple backups can target the same Restic repository.

The operator should deduplicate prune jobs that are managed by a smart schedule. For example, if there are multiple schedules with @daily-random prunes to the same S3 endpoint, the scheduler should only register one of them. But if the prunes have explicit cron patterns like "5 4 * * *" and "5 5 * * *", they should NOT be deduplicated. This ensures maximum flexibility if, for some reason, a user explicitly wants multiple prune runs.
Out of Scope

Further links
Acceptance criteria

- Given Schedules with either a Check or a Prune job, when the same randomized predefined cron syntax is specified and targets the same backup repository, then ignore the duplicated schedule of the same job type that also has the same schedule and backend.
- Given Schedules with jobs that are already deduplicated, when changing the cron schedule of one of the jobs, then remove the deduplication and schedule both jobs separately.
- Given Schedules with jobs that are already deduplicated, when changing the backend of a Schedule, then remove the deduplication and schedule the jobs separately.

Implementation Ideas