Jobs get rescheduled when cronjobber pod (with updatetz) is recreated #24

Closed
fethiozdol opened this issue May 15, 2020 · 0 comments · Fixed by #25
fethiozdol commented May 15, 2020

Hi,

Thank you for this controller, great work! Unfortunately, I hit this odd issue recently and I have not been able to replicate it in a test environment, so I don't know the root cause. I think this is a critical bug, because it is an intermittently occurring, unexpected behaviour: a job for every TZCronJob gets scheduled immediately, regardless of the schedule defined in the TZCronJob, when the cronjobber pod is recreated.

Platform: AWS EKS 1.14

Scenario
1- cronjobber 0.2.0 was installed with updatetz

2- A TZCronJob is created with schedule "45 14 * * 2" and timezone "Europe/London", no startingDeadlineSeconds set

3- A job is scheduled and completed successfully at 1:45 pm UTC on 12 May, as expected:

NAME                COMPLETIONS   DURATION   AGE
somejob-1589291100  1/1           7s         2d

4- The cronjobber pod gets recreated at "Wed, 13 May 2020 15:13:52 +0000":

cronjobber container logs

{"level":"info","ts":"2020-05-13T15:14:24.116Z","caller":"cronjobber/main.go:75","msg":"Starting Cronjobber version 0.2.0 revision f860bc912c395c58fa72741d8d34e8bf4b1a2c00"}
{"level":"info","ts":"2020-05-13T15:14:24.117Z","caller":"cronjobber/controller.go:106","msg":"Starting TZCronJob Manager"}

updatetz container logs

2020-05-13T15:14:29+0000 Local Time Zone database updated to version 2020a on /tmp/zoneinfo

5- Although its last run was on 12 May at 1:45 pm UTC and it shouldn't run again until the following week, another job for this TZCronJob is started at "Wed, 13 May 2020 15:14:24 +0000", which matches cronjobber's start time to the second.

NAME                COMPLETIONS   DURATION   AGE
somejob-1589291100  1/1           7s         2d
somejob-1589294700  1/1           17s        23h

Now, this happened for all of our TZCronJobs; every one of them got rescheduled immediately after the cronjobber pod was recreated 😰

Observations
1- updatetz updated the local tz database in the pod for the first time about 5 seconds after the cronjobber controller started (15:14:29 vs 15:14:24).

2- I've noticed that the unix timestamps in the Job names are a clue. 1589291100 (the expected behaviour) is 12 May 1:45 pm UTC, which is exactly "45 14 * * 2" in Europe/London (BST, i.e. UTC+1 in May).
It gets interesting for 1589294700: this is 12 May 2:45 pm UTC, i.e. the same schedule interpreted in UTC instead of Europe/London. So it looks like cronjobber could not read the timezone data from the mounted volume when it started up and misinterpreted the existing schedules. It then synced the existing cronjobs against the wrong offset, concluded that the last schedules had been missed (although they hadn't), and rescheduled all the jobs.
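
The one-hour shift can be reproduced with a small Go snippet that evaluates the same wall-clock time in both locations. This is only an illustration of the arithmetic, not cronjobber's code, and it assumes a machine with a working tz database:

package main

import (
    "fmt"
    "time"
)

func main() {
    // Needs a usable tz database; if the zoneinfo volume were missing,
    // this lookup would fail and a naive fallback would be plain UTC.
    london, err := time.LoadLocation("Europe/London")
    if err != nil {
        panic(err)
    }

    // Tuesday 12 May 2020, 14:45 ("45 14 * * 2") in each interpretation.
    inLondon := time.Date(2020, time.May, 12, 14, 45, 0, 0, london)
    inUTC := time.Date(2020, time.May, 12, 14, 45, 0, 0, time.UTC)

    fmt.Println(inLondon.Unix()) // 1589291100, the suffix of somejob-1589291100
    fmt.Println(inUTC.Unix())    // 1589294700, the suffix of somejob-1589294700
}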

Potential workarounds/suggestions
Since I couldn't replicate the scenario, these are only blind guesses:
1- We could set startingDeadlineSeconds, but that would not solve the issue permanently. The issue can still reoccur whenever [misinterpreted t] + startingDeadlineSeconds > x, where t is the TZCronJob's last successful execution timestamp and x is the time the cronjobber pod is recreated. In this incident, for example, x minus the misinterpreted t was roughly 24.5 hours, so only a startingDeadlineSeconds shorter than that would have prevented the reschedule.

2- I haven't written a single line of Go (yet), so this may be gibberish, but I think there may be some missing or wrong variable initialisation. For example, if there is a default value, based on the current timestamp, that is used when cronjobber is not able to read the tzdata, then that magic number should err towards a maximum value instead of a minimum one. In other words, it's better to do nothing (skip the sync) than to do something wrong (reschedule everything). There is a rough sketch of what I mean after this list.

3- Cronjobber should not start until updatetz has done its job at least once. Maybe the docker-entrypoint in updatetz could touch a file on another volume, and an init container within the same pod could check that this file exists on that volume. Since the cronjobber container will not start until the init container completes successfully, the timezone database would be in place before the controller begins syncing.
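
For suggestion 2, here is a rough, hypothetical sketch in Go of the "don't sync if the timezone can't be resolved" idea. This is not cronjobber's actual code; the function name, signature and the robfig/cron usage are assumptions made up for illustration:

package main

import (
    "fmt"
    "time"

    "github.com/robfig/cron/v3"
)

// nextScheduleTime is illustrative only: if the timezone cannot be loaded
// (e.g. the zoneinfo volume has not been populated by updatetz yet), return
// an error so the caller skips this sync instead of falling back to UTC.
func nextScheduleTime(tz, schedule string, last time.Time) (time.Time, error) {
    loc, err := time.LoadLocation(tz)
    if err != nil {
        return time.Time{}, fmt.Errorf("timezone %q not loadable, skipping sync: %w", tz, err)
    }
    sched, err := cron.ParseStandard(schedule)
    if err != nil {
        return time.Time{}, err
    }
    return sched.Next(last.In(loc)), nil
}

func main() {
    last := time.Date(2020, time.May, 12, 13, 45, 0, 0, time.UTC) // 14:45 BST
    next, err := nextScheduleTime("Europe/London", "45 14 * * 2", last)
    fmt.Println(next, err) // with a working tz database: the following Tuesday, 19 May 14:45 BST
}

That way a pod that starts before the zoneinfo volume is ready would simply retry later rather than treat every schedule as missed.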
