-
-
Notifications
You must be signed in to change notification settings - Fork 4k
Fix infinite loop in calendarspec calculation when timezone has negative DST save value #19075
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
We have a bug where we seem to enter an infinite loop when running in the Europe/Dublin timezone. The timezone is "special" because it has negative SAVE values. The handling of this should obviously be fixed, but let's use a belt-and-suspenders approach, and gracefully fail if we fail to find an answer within a specific number of attempts. The code in this function is rather complex, and it's hard to rule out another bug in the future.
|
The first and last patches are the most interesting. I'd be grateful for quick review of at least those. |
cbbf8c1 to
fbdabad
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not a timezone experts, but the changes themselves look good to me.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
lgtm, minor stuff only.
I wonder how we never noticed this earlier?
| * aborting here is problematic and hard to diagnose for users. */ | ||
| _cleanup_free_ char *s = NULL; | ||
| (void) calendar_spec_to_string(spec, &s); | ||
| return log_warning_errno(SYNTHETIC_ERRNO(EDEADLK), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think LOG_DEBUG should be enough here. I mean, the error is propagated up here which should cause a logged about failure anyway, and the error is pretty descriptive even. And debug logging should give then enough hints
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My idea was that this shouldn't happen. I.e. if we get here it means that we have an error in our logic. Hence the high log level.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
But then an assert() should be good too?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please see the big comment above: assert would lead to boot failure and would be very unpleasant to users. An error here shouldn't happen, but it has already happened a few times in the past, so I think it's reasonable to look for a solution that makes it easy for users to diagnose the issue and report to us when it happens again. Part of the problem is that we depend on the details of mktime() here, and even if our code works fine, glibc may be updated and something strange might happen in the future. Also, people are creative with timezones, so maybe some future combination of dst and leap second changes will trip us again. A warning instead of a failure is much more user friendly.
| CMP(t.tm_mday, tm->tm_mday) ?: | ||
| CMP(t.tm_hour, tm->tm_hour) ?: | ||
| CMP(t.tm_min, tm->tm_min) ?: | ||
| CMP(t.tm_sec, tm->tm_sec); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
doesn't matter, but I'd split this out into a helper tm_cmp(), sounds generally useful
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, but it doesn't compare all fields... So I think this would be a potential pitfall to have tm_cmp that returns 0 when the structures are different.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm, if my grep-foo is correct, that's the only place where we do this kind of comparison on struct tm. So I'll leave it as is for now.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, but it doesn't compare all fields... So I think this would be a potential pitfall to have tm_cmp that returns 0 when the structures are different.
The other fields are redundant though...
…ariable The scope of start & stop is narrowed down, and they are assigned only once. No functional change, but I think the code is easier to read this way. Also add a comment to make the code easier to read.
The output is rather long at this makes it easier to jump to the right place. Also use normal output routines and set_unset_env() to make things more compact.
I *think* it doesn't actually make any difference, because ":" will be ignored. 437f48a added prefixing with ":", but didn't take into account the fact that we also use "" with a different meaning than NULL here. But let's restore the original behaviour of specifying the empty string.
fbdabad to
97a4f05
Compare
|
Thank you all. I updated the PR with the suggested changes, except when commented above. |
When trying to calculate the next firing of 'Sun *-*-* 01:00:00', we'd fall
into an infinite loop, because mktime() moves us "backwards":
Before this patch:
tm_within_bounds: good=0 2021-03-29 01:00:00 → 2021-03-29 00:00:00
tm_within_bounds: good=0 2021-03-29 01:00:00 → 2021-03-29 00:00:00
tm_within_bounds: good=0 2021-03-29 01:00:00 → 2021-03-29 00:00:00
...
We rely on mktime() normalizing the time. The man page does not say that it'll
move the time forward, but our algorithm relies on this. So let's catch this
case explicitly.
With this patch:
$ TZ=Europe/Dublin faketime 2021-03-21 build/systemd-analyze calendar --iterations=5 'Sun *-*-* 01:00:00'
Normalized form: Sun *-*-* 01:00:00
Next elapse: Sun 2021-03-21 01:00:00 GMT
(in UTC): Sun 2021-03-21 01:00:00 UTC
From now: 59min left
Iter. #2: Sun 2021-04-04 01:00:00 IST
(in UTC): Sun 2021-04-04 00:00:00 UTC
From now: 1 weeks 6 days left <---- note the 2 week jump here
Iter. #3: Sun 2021-04-11 01:00:00 IST
(in UTC): Sun 2021-04-11 00:00:00 UTC
From now: 2 weeks 6 days left
Iter. #4: Sun 2021-04-18 01:00:00 IST
(in UTC): Sun 2021-04-18 00:00:00 UTC
From now: 3 weeks 6 days left
Iter. #5: Sun 2021-04-25 01:00:00 IST
(in UTC): Sun 2021-04-25 00:00:00 UTC
From now: 1 months 4 days left
Fixes https://bugzilla.redhat.com/show_bug.cgi?id=1941335.
97a4f05 to
129cb6e
Compare
No description provided.