Skip to content

Fix infinite loop in calendarspec calculation when timezone has negative DST save value #19075

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 5 commits into from
Mar 23, 2021

Conversation

keszybz
Copy link
Member

@keszybz keszybz commented Mar 22, 2021

No description provided.

We have a bug where we seem to enter an infinite loop when running in the
Europe/Dublin timezone. The timezone is "special" because it has negative SAVE
values. The handling of this should obviously be fixed, but let's use a
belt-and-suspenders approach, and gracefully fail if we fail to find an answer
within a specific number of attempts. The code in this function is rather
complex, and it's hard to rule out another bug in the future.
@keszybz keszybz added pid1 priority Stuff that should enter master quickly, since it fixes a major bug, unbreaks CI or stalls other work labels Mar 22, 2021
@keszybz keszybz added this to the v248 milestone Mar 22, 2021
@keszybz
Copy link
Member Author

keszybz commented Mar 22, 2021

The first and last patches are the most interesting. I'd be grateful for quick review of at least those.

@keszybz keszybz force-pushed the calendarspec-loop branch from cbbf8c1 to fbdabad Compare March 22, 2021 12:58
Copy link
Member

@bluca bluca left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not a timezone experts, but the changes themselves look good to me.

Copy link
Member

@poettering poettering left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm, minor stuff only.

I wonder how we never noticed this earlier?

* aborting here is problematic and hard to diagnose for users. */
_cleanup_free_ char *s = NULL;
(void) calendar_spec_to_string(spec, &s);
return log_warning_errno(SYNTHETIC_ERRNO(EDEADLK),
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think LOG_DEBUG should be enough here. I mean, the error is propagated up here which should cause a logged about failure anyway, and the error is pretty descriptive even. And debug logging should give then enough hints

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My idea was that this shouldn't happen. I.e. if we get here it means that we have an error in our logic. Hence the high log level.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But then an assert() should be good too?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please see the big comment above: assert would lead to boot failure and would be very unpleasant to users. An error here shouldn't happen, but it has already happened a few times in the past, so I think it's reasonable to look for a solution that makes it easy for users to diagnose the issue and report to us when it happens again. Part of the problem is that we depend on the details of mktime() here, and even if our code works fine, glibc may be updated and something strange might happen in the future. Also, people are creative with timezones, so maybe some future combination of dst and leap second changes will trip us again. A warning instead of a failure is much more user friendly.

CMP(t.tm_mday, tm->tm_mday) ?:
CMP(t.tm_hour, tm->tm_hour) ?:
CMP(t.tm_min, tm->tm_min) ?:
CMP(t.tm_sec, tm->tm_sec);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

doesn't matter, but I'd split this out into a helper tm_cmp(), sounds generally useful

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, but it doesn't compare all fields... So I think this would be a potential pitfall to have tm_cmp that returns 0 when the structures are different.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm, if my grep-foo is correct, that's the only place where we do this kind of comparison on struct tm. So I'll leave it as is for now.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, but it doesn't compare all fields... So I think this would be a potential pitfall to have tm_cmp that returns 0 when the structures are different.

The other fields are redundant though...

keszybz added 3 commits March 22, 2021 20:08
…ariable

The scope of start & stop is narrowed down, and they are assigned only once.
No functional change, but I think the code is easier to read this way.
Also add a comment to make the code easier to read.
The output is rather long at this makes it easier to jump to the right place.
Also use normal output routines and set_unset_env() to make things more
compact.
I *think* it doesn't actually make any difference, because ":" will be ignored.
437f48a added prefixing with ":", but didn't
take into account the fact that we also use "" with a different meaning than
NULL here. But let's restore the original behaviour of specifying the empty
string.
@keszybz keszybz force-pushed the calendarspec-loop branch from fbdabad to 97a4f05 Compare March 22, 2021 19:11
@keszybz
Copy link
Member Author

keszybz commented Mar 22, 2021

Thank you all. I updated the PR with the suggested changes, except when commented above.

@keszybz keszybz added good-to-merge/waiting-for-ci 👍 PR is good to merge, but CI hasn't passed at time of review. Please merge if you see CI has passed and removed good-to-merge/with-minor-suggestions labels Mar 22, 2021
@bluca bluca added ci-fails/needs-rework 🔥 Please rework this, the CI noticed an issue with the PR and removed good-to-merge/waiting-for-ci 👍 PR is good to merge, but CI hasn't passed at time of review. Please merge if you see CI has passed labels Mar 22, 2021
When trying to calculate the next firing of 'Sun *-*-* 01:00:00', we'd fall
into an infinite loop, because mktime() moves us "backwards":

Before this patch:
tm_within_bounds: good=0 2021-03-29 01:00:00 → 2021-03-29 00:00:00
tm_within_bounds: good=0 2021-03-29 01:00:00 → 2021-03-29 00:00:00
tm_within_bounds: good=0 2021-03-29 01:00:00 → 2021-03-29 00:00:00
...

We rely on mktime() normalizing the time. The man page does not say that it'll
move the time forward, but our algorithm relies on this. So let's catch this
case explicitly.

With this patch:
$ TZ=Europe/Dublin faketime 2021-03-21 build/systemd-analyze calendar --iterations=5 'Sun *-*-* 01:00:00'
Normalized form: Sun *-*-* 01:00:00
    Next elapse: Sun 2021-03-21 01:00:00 GMT
       (in UTC): Sun 2021-03-21 01:00:00 UTC
       From now: 59min left
       Iter. #2: Sun 2021-04-04 01:00:00 IST
       (in UTC): Sun 2021-04-04 00:00:00 UTC
       From now: 1 weeks 6 days left           <---- note the 2 week jump here
       Iter. #3: Sun 2021-04-11 01:00:00 IST
       (in UTC): Sun 2021-04-11 00:00:00 UTC
       From now: 2 weeks 6 days left
       Iter. #4: Sun 2021-04-18 01:00:00 IST
       (in UTC): Sun 2021-04-18 00:00:00 UTC
       From now: 3 weeks 6 days left
       Iter. #5: Sun 2021-04-25 01:00:00 IST
       (in UTC): Sun 2021-04-25 00:00:00 UTC
       From now: 1 months 4 days left

Fixes https://bugzilla.redhat.com/show_bug.cgi?id=1941335.
@keszybz keszybz force-pushed the calendarspec-loop branch from 97a4f05 to 129cb6e Compare March 22, 2021 23:36
@keszybz keszybz added good-to-merge/waiting-for-ci 👍 PR is good to merge, but CI hasn't passed at time of review. Please merge if you see CI has passed and removed ci-fails/needs-rework 🔥 Please rework this, the CI noticed an issue with the PR labels Mar 22, 2021
@keszybz keszybz merged commit ca83c7f into systemd:main Mar 23, 2021
@keszybz keszybz removed the good-to-merge/waiting-for-ci 👍 PR is good to merge, but CI hasn't passed at time of review. Please merge if you see CI has passed label Mar 23, 2021
@keszybz keszybz deleted the calendarspec-loop branch March 23, 2021 09:37
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
pid1 priority Stuff that should enter master quickly, since it fixes a major bug, unbreaks CI or stalls other work
Development

Successfully merging this pull request may close these issues.

5 participants