gh-152847: Reject POSIX TZ rule with non-digit day-of-year in _zoneinfo.py #152848
… POSIX TZ rules
The J and n day-of-year branches of _parse_dst_start_end() fell through to a bare int(date), accepting input the C accelerator rejects (for example J1_0, which int() reads as day 10, silently building a different zone). Guard the branch with an re.ASCII digit match mirroring the C parser's parse_digits(1, 3), so both implementations agree.
Just for the record, quoting RFC 9636:
This is what the C accelerator currently complies with.
StanFromIreland left a comment
LGTM, one little nit.
_zoneinfo.py
Thanks @tonghuaroot for the PR, and @StanFromIreland for merging it 🌮🎉.. I'm working now to backport this PR to: 3.13.
Thanks @tonghuaroot for the PR, and @StanFromIreland for merging it 🌮🎉.. I'm working now to backport this PR to: 3.14.
Thanks @tonghuaroot for the PR, and @StanFromIreland for merging it 🌮🎉.. I'm working now to backport this PR to: 3.15.
GH-152908 is a backport of this pull request to the 3.13 branch.
GH-152909 is a backport of this pull request to the 3.14 branch.
GH-152910 is a backport of this pull request to the 3.15 branch.
Merged, thanks.
The pure-Python zoneinfo parser validated the Mm.w.d transition rule strictly, but the Jn (Julian) and n (0-based) day-of-year branches of _parse_dst_start_end() fell through to a bare int(date) with no format check. int() accepts input the C accelerator rejects, so the two implementations disagreed on the same POSIX TZ string.
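To make the gap concrete, the standalone snippet below feeds a few day-of-year strings to the built-in int(). Every one is accepted, yet none is a plain run of 1 to 3 ASCII digits. It uses only int() and is not code from the PR; the inputs are illustrative picks that mirror the case classes listed further down.

```python
# Strings a bare int() accepts for the day-of-year field, even though
# none of them is a plain run of 1 to 3 ASCII digits.
lenient_inputs = {
    "1_0": 10,       # underscore digit separator (PEP 515) is read as 10
    "+1": 1,         # explicit sign
    " 1": 1,         # surrounding whitespace is stripped
    "0001": 1,       # four digits; leading zeros are fine for int()
    "\u0663": 3,     # ARABIC-INDIC DIGIT THREE, a non-ASCII Unicode digit
}

for text, expected in lenient_inputs.items():
    value = int(text)
    assert value == expected
    print(f"int({text!r}) -> {value}")
```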
The C accelerator reads the day-of-year field with parse_digits(&ptr, 1, 3, &day) in Modules/_zoneinfo.c, consuming 1 to 3 ASCII digits (Py_ISDIGIT) and nothing else. This PR adds the matching guard before int(), so the pure parser now matches the C accelerator exactly for this field: not stricter, not looser.
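A minimal sketch of such a guard, under simplifying assumptions: parse_day_of_year and _DAY_OF_YEAR_RE are hypothetical names, the "/time" suffix (as in J300/2) is assumed to be stripped already, and the range check only models the [julian, 365] bound described for _DayOffset. It is an illustration of the approach, not the code merged into _zoneinfo.py.

```python
import re

# Hypothetical guard: exactly 1 to 3 ASCII digits, mirroring what
# parse_digits(&ptr, 1, 3, &day) consumes in the C accelerator.
# re.ASCII limits \d to [0-9]; fullmatch rejects any trailing characters.
_DAY_OF_YEAR_RE = re.compile(r"\d{1,3}", re.ASCII)


def parse_day_of_year(field: str) -> tuple[bool, int]:
    """Parse a 'Jn' or 'n' day-of-year field whose '/time' suffix was removed.

    Returns (julian, day). Hypothetical stand-in for the guarded branch of
    _parse_dst_start_end(); the real function may be structured differently.
    """
    julian = field.startswith("J")
    digits = field[1:] if julian else field

    # 1. Format check: the step the bare int() was missing.
    if not _DAY_OF_YEAR_RE.fullmatch(digits):
        raise ValueError(f"invalid day-of-year field: {field!r}")
    day = int(digits)

    # 2. Range check, modelled on the [julian, 365] bound (J is 1-based).
    min_day = int(julian)
    if not min_day <= day <= 365:
        raise ValueError(f"day must be in [{min_day}, 365], not: {day}")
    return julian, day


if __name__ == "__main__":
    for good in ("J1", "J01", "J001", "0", "001", "J365"):
        print(good, "->", parse_day_of_year(good))
    for bad in ("J1_0", "1_0", "J+1", "J 1", "J0001", "0001", "J\u0663"):
        try:
            parse_day_of_year(bad)
        except ValueError as exc:
            print(repr(bad), "rejected:", exc)
    # Why not \d+? It would let the four-digit J0001 through.
    assert re.fullmatch(r"\d+", "0001", re.ASCII) is not None
```

The sketch uses fullmatch rather than match with a trailing $, because $ also matches just before a final newline. re.ASCII matters because, in str patterns, \d otherwise matches any Unicode decimal digit such as U+0663.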
The most notable case is a silent miscompile rather than a crash: int('1_0') is 10, so AAA4BBB,J1_0,J300/2 previously built a valid but different zone (DST on day 10) in pure Python, while the C accelerator raised ValueError.
The fix also aligns the J+1, leading-space, 4+ digit width (J0001), and non-ASCII-digit cases. The 1-to-3-digit bound is deliberate: the C parser consumes at most three digits, so \d+ would make the pure parser accept J0001, which C rejects. Leading zeros within three digits (J01, J001) are still accepted by both. The existing _DayOffset range check ([julian, 365]) is untouched, so no numeric-range behaviour changes.
Verified with a C-vs-pure differential (10 divergent inputs before, 0 after), a zero-regression pass over all 499 bundled IANA zones (byte-identical through both implementations), and the full test_zoneinfo suite. Added invalid-TZ cases (J1_0, 1_0, J+1, J 1, 1, J0001, 0001) and leading-zero valid controls (J001, 001); these run against both TZStrTest and CTZStrTest.
This covers a distinct field in the same POSIX TZ parity audit as gh-152212 (std offset), gh-152246 (Mm.w.d separator), and gh-152248 (abbreviation regex).
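The C-vs-pure differential can be pictured in miniature for this one field. In the sketch below, c_model is a hypothetical Python re-implementation of the C rule as described (consume at most three ASCII digits, require at least one, allow nothing after them), not the accelerator itself; old_pure and new_pure are hypothetical models of the pure parser before and after the fix. The case list is illustrative, so its counts are unrelated to the PR's 10 divergent inputs.

```python
import re

# Miniature C-vs-pure differential over the day-of-year digits only.
# All three functions are hypothetical models; they return the parsed
# day, or None when the field is rejected.


def c_model(digits):
    """Model of the C rule: 1 to 3 ASCII digits, then end of field."""
    i = 0
    value = 0
    while i < len(digits) and i < 3 and "0" <= digits[i] <= "9":
        value = value * 10 + (ord(digits[i]) - ord("0"))
        i += 1
    # Require at least one digit and that the whole field was consumed.
    if i == 0 or i != len(digits):
        return None
    return value


def old_pure(digits):
    """Pre-fix behaviour: bare int() with no format check."""
    try:
        return int(digits)
    except ValueError:
        return None


_GUARD = re.compile(r"\d{1,3}", re.ASCII)


def new_pure(digits):
    """Post-fix behaviour: ASCII 1-to-3-digit guard before int()."""
    return int(digits) if _GUARD.fullmatch(digits) else None


# Digit portions of test fields (any leading 'J' already stripped).
CASES = ["1_0", "+1", " 1", "0001", "\u0663", "1", "01", "001", "365", ""]


def divergences(impl):
    """Inputs on which impl disagrees with the C model."""
    return [case for case in CASES if impl(case) != c_model(case)]


print("old vs C:", divergences(old_pure))
print("new vs C:", divergences(new_pure))
```

With these cases the old path disagrees with the model on all five lenient inputs, and the guarded path agrees on every case.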
Jn/n day-of-year field accepts non-digit input via int() (C rejects) #152847