New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Update schedule logic to properly calculate missed schedules #118724
Conversation
@soltysh: GitHub didn't allow me to assign the following users: kmala. Note that only kubernetes members with read permissions, repo collaborators and people who have commented on this issue/PR can be assigned. Additionally, issues/PRs can only have 10 assignees at the same time. In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. |
/sig apps |
/retest |
/lgtm |
@kmala: changing LGTM is restricted to collaborators In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. |
a66efe5
to
c3060e8
Compare
// in time one more time unit back and let the cron library calculate | ||
// a proper schedule, for case where the schedule is not consistent, | ||
// for example something like 30 6-16/4 * * 1-5 | ||
potentialEarliest := t1.Add(time.Duration((numberOfMissedSchedules-1-1)*timeBetweenTwoSchedules) * time.Second) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Upon second thought, this will not work in few corner scenarios.If the schedule is such that if the difference between now and the actual most recent time is greater than twice the timeBetweenTwoSchedules, then mostRecentTime would be returned as empty. An example test case which would fail would be
{
name: "complex schedule",
cj: &batchv1.CronJob{
ObjectMeta: metav1.ObjectMeta{
CreationTimestamp: metav1TopOfTheHour,
},
Spec: batchv1.CronJobSpec{
Schedule: "30 10,11,12 * * 1-5",
},
Status: batchv1.CronJobStatus{
LastScheduleTime: &metav1HalfPastTheHour,
},
},
now: *deltaTimeAfterTopOfTheHour(30*time.Hour + 30*time.Minute),
expectedRecentTime: deltaTimeAfterTopOfTheHour(26*time.Hour + 30*time.Minute),
expectedEarliestTime: *deltaTimeAfterTopOfTheHour(30 * time.Minute),
expectedNumberOfMisses: 2,
},
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we expect the number of misses to be five (5) here?
- 2016-05-19 T 10:30 (expectedEarliestTime)
- 2016-05-19 T 11:30
- 2016-05-19 T 12:30
- 2016-05-20 T 10:30
- 2016-05-20 T 11:30
- 2016-05-20 T 12:30 (expectedRecentTime)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@helayoty yes that's correct, the expectedNumberOfMisses should be 5 but since numberOfMissedSchedules is an approximate, i was concentrating more on the failure with respective expectedRecentTime
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
See my other comment below about missed schedules.
now: *deltaTimeAfterTopOfTheHour(24*time.Hour + 31*time.Minute), | ||
expectedRecentTime: deltaTimeAfterTopOfTheHour(24*time.Hour + 30*time.Minute), | ||
expectedEarliestTime: *topOfTheHour(), | ||
expectedNumberOfMisses: 7, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The expectedNumberOfMisses
should be four(4) only; (10:30, 14:30, 6:30, and 10:30)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm aware, the error comes from calculating the diff between schedules, which will depend on when it's calculated, so I wouldn't take much care into that value, I'm even inclined to drop that number entirely, if it's confusing.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This was discussed on slack, in followup we'll get rid of exposing that information from this method entirely.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
#118940 is the promised followup dropping that number
Before this change we've assumed a constant time between schedule runs, which is not true for cases like "30 6-16/4 * * 1-5". The fix is to calculate the potential next run using the fixed schedule as the baseline, and then go back one schedule back and allow the cron library to calculate the correct time. This approach saves us from iterating multiple times between last schedule time and now, if the cronjob for any reason wasn't running for significant amount of time.
c3060e8
to
af1c9e4
Compare
Unrelated failure |
/approve consensus on slack was to merge and iterate! |
LGTM label has been added. Git tree hash: 459d25375f05bbfefd96979b94752e191154e9cc
|
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: dims, soltysh The full list of commands accepted by this bot can be found here. The pull request process is described here
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
What type of PR is this?
/kind bug
What this PR does / why we need it:
This is alternative approach to fix the problem described in #117166
Before this change we've assumed a constant time between schedule runs, which is not true for cases like "30 6-16/4 * * 1-5". The fix is to calculate the potential next run using the fixed schedule as the baseline, and then go back one schedule back and allow the cron library to calculate the correct time.
This approach saves us from iterating multiple times between last schedule time and now, if the cronjob for any reason wasn't running for significant amount of time.
Which issue(s) this PR fixes:
None
Special notes for your reviewer:
/assign @kmala @atiratree
Does this PR introduce a user-facing change?