UTC or Local timestamp: Timezone Provenance #945
Comments
I consider this approach harmful. While it might make sense for a specific file format to support an option to specify the time zone its fields should be interpreted in, no sane API should use local time. It can and should be considered a purely representational issue.
To be clear: I don't think libarchive should convert any timestamps to local time (although, if there's any that can be converted to UTC, that would be nice). I do think libarchive should provide as much information as it can about the timezone of the timestamps that it returns.
The libarchive API should be concerned with timestamps in UTC only. Like pretty much any sensible API. I don't think "this timestamp was CEST" is useful data. If you look around, most systems will agree. PostgreSQL will store UTC internally as well for "timestamp with timezone" and only create a representation in local time when necessary.
The libarchive API is not concerned with timestamps in UTC only. libarchive takes timestamps which may be UTC or may be "local time" and shoves them into the same field. A timestamp without even an implicit timezone is not meaningful, as it could refer to any of many instants across a 26-hour period. If a timestamp was originally UTC, that's something I'd like to know. If a timestamp was in any other timezone, that's also something I'd like to know. It's dangerous to assume that the current timezone is the same timezone as the one on the machine that created the archive we're inspecting. ZIP files, for instance, store the modified time as a local timestamp. The timezone of the machine that created an archive is simply not known, and it's wrong to assume that it was UTC. The very least libarchive can do is tell us when it knows the timezone (which, AFAIK, is always UTC), and when it doesn't.
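To make the 26-hour figure concrete, here is a minimal sketch, not libarchive code. It assumes `time_t` counts seconds (as on POSIX systems) and that civil UTC offsets span UTC-12 to UTC+14. The type `instant_range` and the function `possible_instants` are hypothetical names invented for this illustration.

```c
#include <time.h>

/* Hypothetical helper type: the span of real instants that a
 * zone-less wall-clock timestamp could denote. */
struct instant_range {
    time_t earliest;
    time_t latest;
};

/* `naive_utc` is the time_t you get by reading the stored wall-clock
 * fields as if they were UTC. Assumes time_t is in seconds and that
 * real-world offsets run from UTC-12 to UTC+14. */
static struct instant_range
possible_instants(time_t naive_utc)
{
    struct instant_range r;
    r.earliest = naive_utc - 14 * 3600L; /* writer was in a UTC+14 zone */
    r.latest   = naive_utc + 12 * 3600L; /* writer was in a UTC-12 zone */
    return r;
}
```

The width `latest - earliest` is always 26 hours, which is the uncertainty a consumer inherits whenever the zone is not recorded.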
That's wrong. time_t is UTC. Whether some archive format doesn't do correct timezone conversion is, as I said, a completely separate issue and specific to that format.
Correct timezone conversion is impossible when ambiguous timezones aren't even accepted as a possibility. ZIP files, for instance, will only give you the correct time when the current timezone is the same timezone as on the machine that created the ZIP file. Libarchive's behavior is itself fine - it's well-defined and consistent. The issue is I can't distinguish those timestamps from the timestamps that are correct unconditionally.
Offhand, I think this might be true, but I'm not entirely certain. Definitely tar, cpio, and ISO9660 use UTC; I believe Zip always stores local time. In particular, I suspect that all four timestamps are the same in all cases. To help me understand: What do you expect to do differently with bogus timestamps? Your request to somehow mark them is probably pretty simple, but our API surface is pretty big already, so I try to make sure I understand people's use cases before adding more to it. (FWIW, this is very similar to our problem with filename encodings. Sometimes we know the character set for the filename (usually because it's UTF-8) and sometimes we only have a bunch of bytes, and it's a real pain point for some people. Unfortunately, I've not come up with a good way to resolve this without breaking the current API.)
Great, that should keep things pretty simple. I certainly don't need an API any more fine-grained than necessary. Basically, I want to be able to differentiate bogus timestamps for end-user reporting. UTC timestamps don't require any special attention, but I want to be able to put an asterisk by all of the bogus timestamps to indicate that the exact time is unknown. The bogus timestamps require special attention for date filtering and the like. Sometimes it's desirable to report archive entry timestamps in a user-specified timezone (possibly different from what is believed to be the timezone of the machine that created the archive). The UTC timestamps can be localized, but the bogus ones really shouldn't be, lest the original data become even more mangled.
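For illustration, the reporting side could look roughly like this sketch. `format_entry_time` and its `zone_is_utc` flag are hypothetical; the flag stands in for whatever the requested API would return. For the unknown-zone case the sketch prints the fields via `localtime`, on the assumption (not confirmed in this thread) that the reader built the `time_t` from the stored wall-clock fields using the current local zone, so `localtime` would give back the original digits.

```c
#include <stdio.h>
#include <time.h>

/* Hypothetical formatter: writes "YYYY-MM-DD HH:MM:SSZ" for timestamps
 * known to be UTC, and "YYYY-MM-DD HH:MM:SS *" for timestamps whose
 * zone is unknown. Returns 0 on success, -1 on failure or truncation. */
static int
format_entry_time(time_t t, int zone_is_utc, char *buf, size_t len)
{
    /* Assumption: for zone-less timestamps, localtime() recovers the
     * wall-clock fields as they were stored in the archive. */
    const struct tm *tm = zone_is_utc ? gmtime(&t) : localtime(&t);
    char stamp[32];

    if (tm == NULL ||
        strftime(stamp, sizeof stamp, "%Y-%m-%d %H:%M:%S", tm) == 0)
        return -1;

    int n = snprintf(buf, len, "%s%s", stamp, zone_is_utc ? "Z" : " *");
    return (n < 0 || (size_t)n >= len) ? -1 : 0;
}
```

A caller would pass `archive_entry_mtime(entry)` as `t`; only the UTC branch is safe to convert into a user-specified zone, while the asterisked branch is shown as stored.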
libarchive provides the `archive_entry_time` family of functions for inspecting the various timestamps associated with archive entries. There is an inherent ambiguity in archive timestamps: some archives store timestamps in UTC, while others store them as local timestamps.
To resolve this ambiguity, I'd like to see a new API to check whether a timestamp is local or UTC. This ought to be deducible from the archive type.
Is it the case that all timestamps for a given archive type will all be UTC or all be local? If so, I believe a single `bool archive_supports_utc(struct archive *)` function would be sufficient. Of course, `archive_entry_supports_utc` or even `archive_entry_ctime_is_utc` would allow for more fine-grained handling of time zones.
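As a rough sketch of how a per-format answer might be derived, the following encodes only what was stated in this thread (tar, cpio, and ISO9660 use UTC; Zip is believed to store local time). Everything here is hypothetical: the `sketch_*` enums and functions are not part of libarchive, and the format enum is a stand-in so the example compiles without libarchive headers. In real code the format would presumably come from `archive_format()`.

```c
#include <stdbool.h>

/* Stand-in for libarchive's format codes (hypothetical). */
enum sketch_format {
    SKETCH_TAR, SKETCH_CPIO, SKETCH_ISO9660, SKETCH_ZIP, SKETCH_OTHER
};

/* What is known about the zone of a format's timestamps. */
enum sketch_zone {
    SKETCH_ZONE_UTC,           /* timestamps are UTC */
    SKETCH_ZONE_LOCAL_UNKNOWN, /* local time of an unknown zone */
    SKETCH_ZONE_UNVERIFIED     /* not discussed in this thread */
};

static enum sketch_zone
sketch_timestamp_zone(enum sketch_format f)
{
    switch (f) {
    case SKETCH_TAR:
    case SKETCH_CPIO:
    case SKETCH_ISO9660:
        return SKETCH_ZONE_UTC;
    case SKETCH_ZIP:
        return SKETCH_ZONE_LOCAL_UNKNOWN;
    default:
        return SKETCH_ZONE_UNVERIFIED;
    }
}

/* Mirrors the shape of the proposed archive_supports_utc(), keyed on
 * the stand-in format code instead of a struct archive pointer. */
static bool
sketch_supports_utc(enum sketch_format f)
{
    return sketch_timestamp_zone(f) == SKETCH_ZONE_UTC;
}
```

A three-valued result is used internally so that formats nobody has checked are not silently reported as UTC; the boolean wrapper treats them as "not known to be UTC".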