proposal: time: POSIX style TZ strings on Unix and timezone handling optimisations
Dear Gophers,
This proposal is about local timezone initialisation on Unix and other improvements to timezone handling. I have already implemented most of the proposed features, but wanted to discuss them before submitting patches.
tzcode is the code part of Zoneinfo, dealing with Zoneinfo files and timezone conversions. It's used in glibc and other Unix libc implementations.
A compiled Zoneinfo file contains zero or more static transitions and a TZ string that applies after the last static transition. The TZ string describes either a static zone or a pair of rules describing yearly transition times and target zones.
Introduction: TZ environment variable on Unix (libc/tzcode)
On Unix, the time package reads the local timezone information from a Zoneinfo file according to the value of the TZ environment variable: if it's unset, from /etc/localtime; if it's <file> or :<file>, from <file>. In case of any failure, UTC is used.
libc behaves similarly, but if the named file cannot be read and the value does not start with ":", the value is parsed as a POSIX style TZ string. E.g., TZ=JST-9 date will display the date in a timezone named "JST" at UTC+9, and TZ=CET-1CEST,M3.5.0,M10.5.0/3 date in CET (UTC+1) or CEST (UTC+2, DST), the latter between the last Sunday of March, 02:00 CET, and the last Sunday of October, 03:00 CEST.
POSIX style TZ strings
It would be nice to add support for such TZ settings to Go, to bring it in line with the rest of the system. The time package already has a parser for such strings, as they are used in compiled Zoneinfo files for timestamps after the last static transition.
The implementation requires a new error type for unknown timezones to be returned from loadLocation, so that initLocal can check the error and call tzset only when the zone is not found, and not on other errors.
Questions:
The parser in Go tzset is strict, failing on any syntax error. The tzcode parser is best-effort, accepting as many fields as it can parse and discarding the rest. Should the Go parser be changed accordingly?
Argument against: Garbage in, garbage out.
Argument for: Compatibility with the rest of the system. Also, currently LoadLocationFromTZData fails on any error except TZ string errors, even when it calls tzset (to populate the cache).
Is it relevant to other OSes?
Should time.LoadLocation be changed to accept POSIX style TZ strings in addition to timezone names? If yes, only on Unix or on other OSes as well?
Or should another API function be added?
FWIW, there's a comment near LoadLocation:
// NOTE(rsc): Eventually we will need to accept the POSIX TZ environment
// syntax too, but I don't feel like implementing it today.
TZ string: limits
tzcode allows absolute UTC offsets less than 25 hours (up to 24:59:59), and time in rules less than 168 hours (7 days). The former is a POSIX requirement, the latter a Zoneinfo extension. Go currently allows <168 hours for both. I propose limiting allowed UTC offsets to match those of tzcode.
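The two limits could be expressed as follows. This is only a sketch; the constant and function names are made up for illustration, and the limits are applied to absolute values as described above:

```go
package main

import "fmt"

const (
	maxOffsetSeconds   = 25*3600 - 1  // 24:59:59, the POSIX limit on UTC offsets
	maxRuleTimeSeconds = 168*3600 - 1 // 167:59:59, the Zoneinfo extension for rule times
)

func abs(n int) int {
	if n < 0 {
		return -n
	}
	return n
}

// validOffset reports whether a UTC offset is within the tzcode limit.
func validOffset(secs int) bool { return abs(secs) <= maxOffsetSeconds }

// validRuleTime reports whether a rule's time of day is within the limit.
func validRuleTime(secs int) bool { return abs(secs) <= maxRuleTimeSeconds }

func main() {
	fmt.Println(validOffset(9*3600), validOffset(26*3600))
	fmt.Println(validRuleTime(26*3600), validRuleTime(168*3600))
}
```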
Optimisation: rules
Rationale: The current caching approach is based on the assumption that most timezone lookups will be for timestamps around the present. In all but two zoneinfo timezones the TZ string applies in the present (late 2023). Most suggestions here are either pure optimisation or moving calculations from lookup time to be done once at load time.
TZ string parsing
After loading a zoneinfo file, the TZ string is kept in the Location struct and is parsed on every non-cached lookup after the last static transition, whether it describes rules or a static zone. Currently, TZ strings in over 2/3 of all unique Zoneinfo locations, including the two most populated ones ("Asia/Shanghai" and "Asia/Kolkata"), specify static zones.
My proposal is:
Parse the TZ string at load time.
If it describes transition rules, store []rule in the Location structure. Add a *zone pointing to the transition target to the rule structure.
If it specifies a static zone, discard it. The last static transition specifies the same zone.
When parsing the TZ environment variable, use it to create a fixed location.
Detect Zoneinfo version 3 permanent DST zones and treat them as static zones.
Day of week calculation
The only rule kind used in practice is the "M" rule, containing month, week and day of week of the transition. These are used to compute the day of year.
Simplify the calculation of day of year by treating week 5 as starting 7 days before the next month instead of looping.
Calculate day of year first and add it to the day of week of 1 January in that year instead of using Zeller's congruence.
Fix handling of negative years and use simplified Tomohiko Sakamoto's algorithm to calculate the day of week of 1 Jan. Better yet, use absolute day as shown below.
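As an illustration of the second and third points, the sketch below computes the weekday of 1 January with Sakamoto's formula specialised to that date, then moves a year day forward to the wanted weekday. Negative years are handled here by reducing the year modulo 400, which works because the Gregorian calendar repeats every 400 years (146097 days, a whole number of weeks). The function names are hypothetical, and this is not the absolute-day approach used in the proposal's own code.

```go
package main

import "fmt"

// jan1Weekday returns the weekday (0 = Sunday) of 1 January in the
// proleptic Gregorian calendar, for any year including negative ones.
func jan1Weekday(year int) int {
	y := (year - 1) % 400
	if y < 0 {
		y += 400 // Go's % truncates toward zero
	}
	// Sakamoto's formula for month 1, day 1; the y/400 term is always 0 here.
	return (1 + y + y/4 - y/100) % 7
}

// firstOnOrAfter returns the first zero-based year day >= yday that falls
// on weekday dow, given the weekday of 1 January.
func firstOnOrAfter(yday, jan1, dow int) int {
	wd := (jan1 + yday) % 7
	return yday + (dow-wd+7)%7
}

func main() {
	// M3.5.0 in 2023: the last week of March starts on year day 83 (25 March).
	fmt.Println(firstOnOrAfter(83, jan1Weekday(2023), 0))
}
```

For 2023 this yields year day 84, i.e. 26 March, the last Sunday of that month.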
Simplifying the rule structure
Remove month and week. At load time, convert month and week to day of year. Add a separate day of week field.
Remove rule kind. Use a sentinel day of week value (-1) or a flag for other rule kinds.
Add a flag to indicate whether a day should be added during leap years; this needs to be explicit to distinguish between "Sunday in week 4 of February" and "last Sunday of February", and between "J" and DOY rule kinds for day>=59.
Convert the time of day to UTC, to avoid subtracting the offset each time.
Reorder the rules if DST ends earlier in a year than it begins.
Rule normalisation
Normalise rules so that time of day is always non-negative and less than secondsPerDay, and day is always non-negative and, if possible, less than 365.
The latter is not possible with DOY rules whose day, after adjustment, is >=365.
Additionally, with "M" rules whose adjusted day is 26 to 31 December, the transition will sometimes happen in the next year.
Rules resulting in transitions in another year do not occur in Zoneinfo, and other implementations, including tzcode, don't handle them correctly, but we can (see below).
With normalised rules the transition happens between year days day and day + 7, inclusive (adding 0-1 days for leap years and 0-6 days for day of week). Without it, between day - 14 and day + 21 (also adding -14 to 14 days for UTC offset and transition time).
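The time-of-day part of the normalisation can be sketched as below. The rule type is a cut-down hypothetical version. Shifting the day of week together with the day is my assumption about what the carry requires; wrapping a day that ends up below 0 or above 364 into another year is not attempted here.

```go
package main

import "fmt"

const secondsPerDay = 86400

// rule is a hypothetical cut-down rule: year day, weekday (-1 if none)
// and time of day in seconds relative to 00:00 UTC.
type rule struct {
	day  int
	dow  int
	time int
}

// normalise moves whole days out of the time of day so that
// 0 <= time < secondsPerDay, carrying them into day and dow.
func normalise(r rule) rule {
	carry := r.time / secondsPerDay
	r.time %= secondsPerDay
	if r.time < 0 {
		r.time += secondsPerDay
		carry--
	}
	r.day += carry
	if r.dow >= 0 {
		r.dow = ((r.dow+carry)%7 + 7) % 7
	}
	return r
}

func main() {
	// 02:00 local time at UTC+12 is 14:00 UTC on the previous day,
	// so a Sunday rule becomes a Saturday rule one day earlier.
	fmt.Printf("%+v\n", normalise(rule{day: 266, dow: 0, time: 2*3600 - 12*3600}))
}
```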
Code
After implementing all of the above, and changing tzruleTime to accept the return values of dayOfEpoch(year) and isLeap(year) instead of year and return Unix time, it looks like this (with comments stripped):
func tzruleTime(yearStartDay uint64, r rule, leapYear bool) int64 {
	d := int(yearStartDay) + r.day
	if leapYear && r.addLeapDay {
		d++
	}
	if r.dow >= 0 {
		delta := (d - r.dow) % 7
		d += 6 - delta
	}
	return int64(d*secondsPerDay+r.time) + absoluteToInternal + internalToUnix
}
Zone boundaries
lookup returns the timespan when the zone applies (start and end), used:
to populate the lookup cache while creating a Location;
in Date, to avoid the second lookup in most cases;
in Time.ZoneBounds, essentially as return values.
Currently, if the zone spans a new year, tzset returns the new year instead of one of the values, to limit the number of transition time calculations to two. This only affects efficiency in the first two cases, but in the last case it affects correctness.
If the optimisations above are applied, the following algorithm results in two transition time computations, except when the second transition in the previous year occurs past the end of the year and past the target time, in which case (that never happens in Zoneinfo) it's three computations:
Use the yday result from the call to absDate (the year day of sec, the target time). If it's before the day of the second rule, compute the time of the first transition, otherwise of the second.
If sec is before the result, compute the time of the previous transition. Repeat while sec is before the result (i.e., possibly once more).
Otherwise, compute the time of the next transition.
Optimisation: lookup
Most lookups are for times after the last static transition. Check it before searching.
For locations without static transitions a fake transition is created at the beginning of time. Do it for all locations to eliminate a rarely occurring special case during lookup. Call (*Location).lookupFirstZone from LoadLocationFromTZData to determine the transition target.
Alternatively: do it only for locations with static transitions. Fully static locations (like "Etc/GMT-1") will have the only zone cached anyway, for others use rules.
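A sketch of the lookup order under the first variant, assuming every location has a fake transition at the beginning of time. The zoneTrans type and the sample data are hypothetical, not the package's internal representation:

```go
package main

import (
	"fmt"
	"math"
	"sort"
)

// zoneTrans is a hypothetical transition record: from when on, which zone.
type zoneTrans struct {
	when int64
	name string
}

// demoTx starts with the fake transition at the beginning of time.
var demoTx = []zoneTrans{
	{math.MinInt64, "LMT"},
	{-1000, "CET"},
	{0, "CEST"},
	{1000, "CET"},
}

// lookup checks the last transition first, then falls back to binary
// search. tx must be non-empty, sorted, and start at math.MinInt64, so
// the search result is never 0 and no "before first transition" case exists.
func lookup(tx []zoneTrans, sec int64) (index int, name string) {
	n := len(tx)
	if sec >= tx[n-1].when {
		return n - 1, tx[n-1].name // fast path: after the last transition
	}
	i := sort.Search(n, func(i int) bool { return tx[i].when > sec })
	return i - 1, tx[i-1].name
}

func main() {
	for _, sec := range []int64{-5000, 500, 5000} {
		i, name := lookup(demoTx, sec)
		fmt.Println(sec, i, name)
	}
}
```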
Avoid code duplication
Unify the code in LoadLocationFromTZData that fills the cache with lookup.
Possibly: change lookup to return *zone instead of name, offset and isDST. This would not make sense with the existing TZ string handling code, but does with proposed changes.
Limitations
The proposed implementations of tzruleTime and lookup may return incorrect results in the following cases:
Calculation may overflow in the last year before Unix time math.MaxInt64 (existing limitation).
Wrong results may be returned for years below absoluteZeroYear (existing limitation).
Result may be one week off in absoluteZeroYear for "M" rules whose adjusted day is before 7 January (does not occur in Zoneinfo).
Results will be unpredictable if the transitions occur in different order in different years or simultaneously, e.g., 4 April 2:00 UTC and first Sunday of April 2:00 UTC (existing limitation but different failures; does not occur in Zoneinfo).
Resulting speed-up
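I wrote benchmarks that load testdata/2020b_Europe_Berlin, create a Time value and run Hour in a loop. The Time is one of:
2020-10-29 15:30 (cache miss, TZ string / rules)
1980-10-29 15:30 (cache miss, searching 60 static transitions)
Now (cache hit)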
With optimisations above applied to master (commit 505dff4), the results are:
Lookup using rules is 4 to 4.5 times faster than in master.
It is over 2 times slower than searching static transitions, and about 9 times slower than hitting the cache.
Other kinds of lookups stayed about as fast as in master.
The benchmarks were run in an uncontrolled environment, so I can't give you more precise results.
Timezone abbreviations allocation
Change LoadLocationFromTZData and abbrevChars to allocate one string for all the chars except trailing NUL and cut abbrevs from it, instead of many strings of 3 to 6 bytes. Especially useful with locations having several zones with the same name (e.g., Europe/Dublin has three zones named "IST") and America/Adak that has "HST" encoded as a substring of "AHST".
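The idea can be sketched as follows: convert the abbreviation bytes to one string and return substrings of it, which in Go share the backing storage. The helper name abbrevAt and the sample data are hypothetical.

```go
package main

import (
	"fmt"
	"strings"
)

// demoAbbrevs models the abbreviation block of a zoneinfo file with the
// trailing NUL already removed.
var demoAbbrevs = "LMT\x00AHST\x00HDT"

// abbrevAt returns the NUL-terminated abbreviation starting at idx as a
// substring of all, without allocating a new string.
func abbrevAt(all string, idx int) (string, bool) {
	if idx < 0 || idx >= len(all) {
		return "", false
	}
	if end := strings.IndexByte(all[idx:], 0); end >= 0 {
		return all[idx : idx+end], true
	}
	return all[idx:], true // last abbreviation: its NUL was trimmed
}

func main() {
	for _, idx := range []int{0, 4, 5, 9} {
		name, _ := abbrevAt(demoAbbrevs, idx)
		fmt.Println(idx, name)
	}
}
```

Index 5 points into the middle of "AHST" and yields "HST", the overlapping encoding mentioned for America/Adak.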
ZONEINFO environment variable
If ZONEINFO is set, LoadLocation tries to load the named zoneinfo file from the path specified by it. This should probably be added to initLocal in src/time/zoneinfo_unix.go for consistency. tzcode does not use this variable.
(Tentative) Optimisation: caching
When a location is loaded, the zone valid now is cached in the Location structure to be used in subsequent lookups. This is good for most uses, but in long running processes (such as servers) lookups will slow down after the next transition.
Tentative proposal:
Cache the last 1 or 2 lookup results as well. Alternatively, only cache last lookup results. Caching 2 last lookups is useful for conversions to UTC (e.g., in Date) around a transition; caching more will add too much overhead.
Downside: this will require a sync.RWMutex and taking a read lock on every lookup that misses the "now" cache. A compromise would be calling TryRLock and only writing back the result if locking succeeded.
This is useful in particular scenarios, such as a mail server serialising "now" for "Received:" headers, but detrimental in others.
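One way to read the compromise is sketched below: readers give up immediately if the lock is contended, and a result is written back only if the write lock can be taken without waiting. The lastLookup type is hypothetical and holds a single entry; sync.RWMutex has had TryLock and TryRLock since Go 1.18.

```go
package main

import (
	"fmt"
	"sync"
)

// lastLookup is a hypothetical one-entry cache of the most recent result.
type lastLookup struct {
	mu         sync.RWMutex
	start, end int64 // the zone applies for start <= sec < end
	name       string
	valid      bool
}

// get returns the cached zone for sec. It never blocks: on contention it
// reports a miss and the caller performs a normal lookup.
func (c *lastLookup) get(sec int64) (string, bool) {
	if !c.mu.TryRLock() {
		return "", false
	}
	defer c.mu.RUnlock()
	if c.valid && c.start <= sec && sec < c.end {
		return c.name, true
	}
	return "", false
}

// put stores a lookup result, silently dropping it on contention.
func (c *lastLookup) put(start, end int64, name string) {
	if !c.mu.TryLock() {
		return
	}
	c.start, c.end, c.name, c.valid = start, end, name, true
	c.mu.Unlock()
}

// newDemoCache returns a cache primed with one made-up entry.
func newDemoCache() *lastLookup {
	c := &lastLookup{}
	c.put(100, 200, "CEST")
	return c
}

func main() {
	c := newDemoCache()
	fmt.Println(c.get(150))
	fmt.Println(c.get(250))
}
```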
unixdj changed the title from "proposal: import/path: proposal title" to "proposal: time: POSIX style TZ strings on Unix and timezone handling optimisations" on Dec 11, 2023.
This commit implements some of the changes outlined in proposal golang#64659.
To optimise subsequent rule time computations, the rule structure is
converted to: year day, week day, time of day and a flag indicating
whether to add a day during leap years.
At load time:
- In "M" rules, month and week are converted to year day and leap day
flag.
- Transition time is converted to UTC.
- Rule is normalised so that time of day is non-negative and less than
secondsPerDay, and year day is non-negative and, if possible, less
than 365.
- Rules in the Location structure are reordered in the order they occur
in a year.
The internal API of tzruleTime is changed to speed up week day
calculation.
Additionally, tzrule (and thus Time.ZoneBounds) now returns correct zone
bounds for zones spanning a new year.