Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We鈥檒l occasionally send you account related emails.

Already on GitHub? Sign in to your account

Clean histories for successful and failed jobs separately #462

Merged
merged 6 commits into from
Jul 28, 2021

Conversation

bastjan
Copy link
Contributor

@bastjan bastjan commented Jul 19, 2021

Summary

This PR allows cleaning histories for successful and failed jobs separately.

It introduces two new fields to the job CRDs .spec.successfulJobsHistoryLimit and .spec.failedJobsHistoryLimit. Those fields override the now deprecated .spec.KeepJobs.

It fixes a bug where most job cleanups would result in always cleaning up the last object, due to the misuse of a pointer in a for range loop:

https://github.com/vshn/k8up/blob/df7435b28e9292ef88c582bce9aa64bb69185168/executor/check.go#L115-L117

Fixes #322.

Checklist

  • Keep pull requests small so they can be easily reviewed. Sadly no 馃檲
  • Categorize the PR by setting a good title and adding one of the labels:
    bug, enhancement, documentation, change, breaking,
    as they show up in the changelog
  • Update the documentation.
  • Update tests.
  • Link this PR to related issues.

@bastjan bastjan force-pushed the improve-monitoring branch 14 times, most recently from e3670ac to c6daaf4 Compare July 22, 2021 06:18
@bastjan bastjan added the enhancement New feature or request label Jul 22, 2021
@bastjan bastjan force-pushed the improve-monitoring branch 2 times, most recently from f6eb673 to b75fa12 Compare July 22, 2021 15:13
@bastjan bastjan changed the title Monitoring improvements Clean histories for successful and failed jobs separately Jul 22, 2021
@bastjan bastjan marked this pull request as ready for review July 22, 2021 16:06
@bastjan bastjan requested review from ccremer and cimnine July 22, 2021 16:07
Required on arm machines with emulation capabilities (M1) to override arch.
Copy link
Contributor

@ccremer ccremer left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a first review and I'm not through yet. Overall good, but I don't see much changes in the documentation. Could you double check whether the new variables and behaviour is updated in the docs?

api/v1alpha1/backup_types.go Outdated Show resolved Hide resolved
api/v1alpha1/history_limits_test.go Outdated Show resolved Hide resolved
api/v1alpha1/history_limits_test.go Outdated Show resolved Hide resolved
api/v1alpha1/history_limits_test.go Outdated Show resolved Hide resolved
api/v1alpha1/object_list_test.go Outdated Show resolved Hide resolved
executor/cleaner/cleaner_test.go Outdated Show resolved Hide resolved
executor/generic.go Outdated Show resolved Hide resolved
@bastjan
Copy link
Contributor Author

bastjan commented Jul 26, 2021

This is a first review and I'm not through yet. Overall good, but I don't see much changes in the documentation. Could you double check whether the new variables and behaviour is updated in the docs?

Thanks for the review 馃槉

The documentation is auto generated here: docs/modules/ROOT/pages/references/api-reference.adoc. Any other place needed? 馃槉

@ccremer
Copy link
Contributor

ccremer commented Jul 26, 2021

The documentation is auto generated here: docs/modules/ROOT/pages/references/api-reference.adoc. Any other place needed? blush

Not everything is automated :)
Please do a local preview and go through all pages where documenting the new behaviour could make sense. I recall there's a config reference page where the env variables are documented. There definitely BACKUP_KEEPJOBS needs to be more thoroughly documented/deprecated. You might also find code examples and add the new variables there and then.

api/v1alpha1/history_limits_test.go Outdated Show resolved Hide resolved
cfg/config.go Outdated Show resolved Hide resolved
cfg/config.go Outdated Show resolved Hide resolved
cfg/config.go Outdated Show resolved Hide resolved
cfg/config_test.go Show resolved Hide resolved
job/status.go Outdated Show resolved Hide resolved
api/v1alpha1/archive_types.go Outdated Show resolved Hide resolved
api/v1alpha1/backup_types.go Outdated Show resolved Hide resolved
api/v1alpha1/backup_types.go Outdated Show resolved Hide resolved
api/v1alpha1/check_types.go Outdated Show resolved Hide resolved
api/v1alpha1/check_types.go Outdated Show resolved Hide resolved
api/v1alpha1/object.go Outdated Show resolved Hide resolved
api/v1alpha1/schedule_types.go Outdated Show resolved Hide resolved
executor/cleaner/cleaner.go Outdated Show resolved Hide resolved
job/status.go Outdated Show resolved Hide resolved
@Kidswiss
Copy link
Contributor

Kidswiss commented Jul 26, 2021

@bastjan

Sorry to barge in like that. But this caught my eye:

It fixes a bug where most job cleanups would result in always cleaning up the last object, due to the misuse of a pointer in a for range loop:

It would fix this case, too: #465, right?

Just opened that one a few hours ago :D

@bastjan
Copy link
Contributor Author

bastjan commented Jul 26, 2021

@bastjan

It would fix this case, too: #465, right?

Just opened that one a few hours ago :D

I can't yet prove it, but I do think so yes 馃槉

api/v1alpha1/backup_types.go Outdated Show resolved Hide resolved
api/v1alpha1/check_types.go Outdated Show resolved Hide resolved
executor/cleaner/cleaner.go Outdated Show resolved Hide resolved
executor/generic.go Outdated Show resolved Hide resolved
@ccremer
Copy link
Contributor

ccremer commented Jul 26, 2021

I ran the integration tests locally and get some error output here. It doesn't fail the tests (exit code 0) but I'm not sure if that's something as expected:

=== CONT  Test_Check/TestJobCleanup
    check_integration_test.go:105: 2 deleted Checks found
=== RUN   Test_Check/TestReconciliation
    envsuite_test.go:147: creating namespace 'x7vs977v'
    envsuite_test.go:154: creating resource 'x7vs977v/check-integration-test0'
=== CONT  Test_Check
    logger.go:130: 2021-07-26T18:53:20.936+0200	DEBUG	adding job to the queue	{"check": "x7vs977v/check-integration-test0"}
    logger.go:130: 2021-07-26T18:53:20.948+0200	DEBUG	adding job to the queue	{"check": "x7vs977v/check-integration-test0"}
=== CONT  Test_Check/TestReconciliation
    check_integration_test.go:169: 1 Jobs found
=== CONT  Test_Check
    logger.go:130: 2021-07-26T18:53:22.349+0200	ERROR	could not patch status conditions	{"check": "x7vs977v/check-integration-test0", "error": "Patch \"http://127.0.0.1:36099/apis/backup.appuio.ch/v1alpha1/namespaces/x7vs977v/checks/check-integration-test0/status\": dial tcp 127.0.0.1:36099: connect: connection refused"}
    logger.go:130: 2021-07-26T18:53:22.349+0200	ERROR	cannot create job	{"check": "x7vs977v/check-integration-test0", "repository": "s3:/", "error": "Post \"http://127.0.0.1:36099/apis/batch/v1/namespaces/x7vs977v/jobs\": dial tcp 127.0.0.1:36099: connect: connection refused"}
--- PASS: Test_Check (14.69s)
    --- PASS: Test_Check/TestJobCleanup (7.17s)
    --- PASS: Test_Check/TestReconciliation (1.03s)

@ccremer
Copy link
Contributor

ccremer commented Jul 26, 2021

Please have another look at the documentation

Basically, a grep -r -i "keepjobs" . and see where it needs updating

@bastjan
Copy link
Contributor Author

bastjan commented Jul 26, 2021

Please have another look at the documentation

Basically, a grep -r -i "keepjobs" . and see where it needs updating

Thanks 馃槉 I'm still working on the documentation馃憤

@bastjan
Copy link
Contributor Author

bastjan commented Jul 26, 2021

I ran the integration tests locally and get some error output here. It doesn't fail the tests (exit code 0) but I'm not sure if that's something as expected:

I have the same issues on master. Did not yet take a look at the root cause.

@bastjan bastjan requested a review from corvus-ch July 26, 2021 21:31
cfg/config.go Outdated Show resolved Hide resolved
docs/modules/ROOT/pages/references/config-reference.adoc Outdated Show resolved Hide resolved
@bastjan bastjan requested a review from ccremer July 27, 2021 09:22
executor/cleaner/cleaner_test.go Outdated Show resolved Hide resolved
executor/cleaner/cleaner_test.go Outdated Show resolved Hide resolved
Copy link
Contributor

@corvus-ch corvus-ch left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looking good now. I left some remarks. Those are general thoughts and I do not see a need for a change.

cfg/config_test.go Outdated Show resolved Hide resolved
@bastjan bastjan merged commit f746611 into master Jul 28, 2021
@bastjan bastjan deleted the improve-monitoring branch July 28, 2021 09:09
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Keep the Job Pods of Failed Jobs
4 participants