UTC or Local timestamp: Timezone Provenance #945

Open
zweger opened this issue Aug 28, 2017 · 8 comments

Comments

@zweger
Contributor

zweger commented Aug 28, 2017

libarchive provides a family of archive_entry time accessors (archive_entry_atime, archive_entry_ctime, archive_entry_mtime, archive_entry_birthtime) for inspecting the various timestamps associated with archive entries. There is an inherent ambiguity in archive timestamps: some archive formats store timestamps in UTC, while others store them as local time.

To resolve this ambiguity, I'd like to see a new API to check whether a timestamp is local or UTC. This ought to be deducible from the archive type.

Is it the case that all timestamps for a given archive type will all be UTC or all be local? If so, I believe a single bool archive_supports_utc(struct archive *) function would be sufficient. Of course, archive_entry_supports_utc or even archive_entry_ctime_is_utc would allow for more fine-grained handling of time zones.
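To make the proposal concrete, here is a minimal sketch in Python rather than libarchive's C API. It assumes the answer really can be deduced from the archive format alone. The names `TimeBasis`, `FORMAT_TIME_BASIS`, `time_basis` and `supports_utc` are hypothetical, and the table entries are illustrative assumptions, not a statement of what libarchive does.

```python
from enum import Enum


class TimeBasis(Enum):
    """How the timestamps of an archive format should be interpreted."""
    UTC = "utc"                # zone is known: UTC
    LOCAL = "local"            # wall-clock time, zone of the creator unknown
    UNKNOWN = "unknown"        # format not in the table


# Hypothetical per-format table; entries are assumptions for illustration.
FORMAT_TIME_BASIS = {
    "tar": TimeBasis.UTC,
    "cpio": TimeBasis.UTC,
    "zip": TimeBasis.LOCAL,
}


def time_basis(format_name: str) -> TimeBasis:
    """Look up the time basis for a format name, case-insensitively."""
    return FORMAT_TIME_BASIS.get(format_name.lower(), TimeBasis.UNKNOWN)


def supports_utc(format_name: str) -> bool:
    """Rough analogue of the proposed bool archive_supports_utc()."""
    return time_basis(format_name) is TimeBasis.UTC
```

A three-state result is used instead of a plain bool so that "known to be local" and "not known at all" stay distinguishable; a bool-only API would have to fold them together.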

@jsonn
Contributor

jsonn commented Aug 28, 2017

I consider this approach harmful. While it might make sense for a specific file format to support an option specifying which time zone its fields should be interpreted in, no sane API should use local time. It can and should be considered a purely representational issue.

@zweger
Contributor Author

zweger commented Aug 28, 2017

To be clear: I don't think libarchive should convert any timestamps to local time (although, if there's any that can be converted to UTC, that would be nice). I do think libarchive should provide as much information as it can about the timezone of the timestamps that it returns.

@jsonn
Contributor

jsonn commented Aug 28, 2017

The libarchive API should be concerned with timestamps in UTC only, like pretty much any sensible API. I don't think "this timestamp was CEST" is useful data. If you look around, most systems will agree. PostgreSQL also stores UTC internally for "timestamp with time zone" and only creates a local-time representation when necessary.

@zweger
Contributor Author

zweger commented Aug 28, 2017

The libarchive API is not concerned with timestamps in UTC only. libarchive takes timestamps which may be UTC or may be "local time" and shoves them into the same field. A timestamp without even an implicit timezone is not meaningful, as it could refer to many times across a 26-hour period.
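The 26-hour figure can be checked with a little arithmetic. The sketch below assumes civil UTC offsets range from UTC-12 to UTC+14; the helper name `possible_instants` and the sample wall-clock value are made up for illustration.

```python
from datetime import datetime, timedelta, timezone


def possible_instants(naive, min_offset_hours=-12, max_offset_hours=14):
    """Return (earliest, latest) UTC instants a zone-less time could denote."""
    # The same wall-clock reading happens first in the easternmost zone
    # (largest positive offset) and last in the westernmost zone.
    earliest = naive.replace(tzinfo=timezone(timedelta(hours=max_offset_hours)))
    latest = naive.replace(tzinfo=timezone(timedelta(hours=min_offset_hours)))
    return earliest.astimezone(timezone.utc), latest.astimezone(timezone.utc)


wall_clock = datetime(2017, 8, 28, 12, 0, 0)   # no timezone attached
earliest, latest = possible_instants(wall_clock)
print(earliest.isoformat())   # 2017-08-27T22:00:00+00:00
print(latest.isoformat())     # 2017-08-29T00:00:00+00:00
print(latest - earliest)      # 1 day, 2:00:00
```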

If a timestamp was originally UTC, that's something I'd like to know. If a timestamp was in any other timezone, that's also something I'd like to know. It's dangerous to assume that the current timezone is the same timezone as the one on the machine that created the archive we're inspecting.

ZIP files, for instance, store the modified time as a local timestamp. The timezone of the machine that created an archive is simply not known, and it's wrong to assume that it was UTC. The very least libarchive can do is tell us when it knows the timezone (which, AFAIK, is always UTC), and when it doesn't.

@jsonn
Contributor

jsonn commented Aug 28, 2017

That's wrong. time_t is UTC. Whether some archive format doesn't do correct timezone conversion is, as I said, a completely separate issue and specific to that format.

@zweger
Contributor Author

zweger commented Aug 29, 2017

Correct timezone conversion is impossible when ambiguous timezones aren't even accepted as a possibility.

ZIP files, for instance, will only give you the correct time when the current timezone is the same timezone as on the machine that created the ZIP file. libarchive's behavior is itself fine: it's well-defined and consistent. The issue is that I can't distinguish those timestamps from the timestamps that are correct unconditionally.
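The ZIP case is easy to reproduce outside libarchive. The following sketch uses Python's standard `zipfile` module (not libarchive) to show that the classic ZIP modification time is a bare set of wall-clock fields, so the instant it denotes depends on which zone the reader assumes. The two offsets used (UTC+2 and UTC-7) are arbitrary examples.

```python
import io
import zipfile
from datetime import datetime, timedelta, timezone

# Write a one-entry archive in memory with an explicit wall-clock time.
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as zf:
    info = zipfile.ZipInfo("hello.txt", date_time=(2017, 8, 28, 12, 0, 0))
    zf.writestr(info, "hi")

# Read it back: date_time is a plain tuple, with no zone information.
with zipfile.ZipFile(buf) as zf:
    stored = zf.getinfo("hello.txt").date_time

naive = datetime(*stored)

# Two readers assuming different zones get two different instants.
as_utc_plus_2 = naive.replace(tzinfo=timezone(timedelta(hours=2)))
as_utc_minus_7 = naive.replace(tzinfo=timezone(timedelta(hours=-7)))
gap = as_utc_minus_7 - as_utc_plus_2

print(stored)   # (2017, 8, 28, 12, 0, 0)
print(gap)      # 9:00:00
```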

@kientzle
Contributor

> Is it the case that all timestamps for a given archive type will all be UTC or all be local?

Offhand, I think this might be true, but I'm not entirely certain. Definitely tar, cpio, and ISO9660 use UTC; I believe Zip always stores local time. In particular, I suspect that all four timestamps are the same in all cases.

To help me understand: What do you expect to do differently with bogus timestamps? Your request to somehow mark them is probably pretty simple, but our API surface is pretty big already, so I try to make sure I understand people's use cases before adding more to it.

(FWIW, this is very similar to our problem with filename encodings. Sometimes we know the character set for the filename (usually because it's UTF-8) and sometimes we only have a bunch of bytes, and it's a real pain point for some people. Unfortunately, I've not come up with a good way to resolve this without breaking the current API.)

@zweger
Contributor Author

zweger commented Aug 31, 2017

> Offhand, I think this might be true, but I'm not entirely certain. Definitely tar, cpio, and ISO9660 use UTC; I believe Zip always stores local time. In particular, I suspect that all four timestamps are the same in all cases.

Great, that should keep things pretty simple. I certainly don't need an API any more fine-grained than necessary.

Basically, I want to be able to differentiate bogus timestamps for end-user reporting. UTC timestamps don't require any special attention, but I want to be able to put an asterisk by all of the bogus timestamps to indicate that the exact time is unknown. The bogus timestamps require special attention for date filtering and the like.

Sometimes it's desirable to report archive entry timestamps in a user-specified timezone (possibly different from what is believed to be the timezone of the machine that created the archive). The UTC timestamps can be localized, but the bogus ones really shouldn't be, lest the original data become even more mangled.
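As a rough illustration of the reporting behavior described above, here is a sketch in Python. It models a UTC timestamp as a timezone-aware `datetime` and a bogus one as a naive `datetime`; that modelling, the function name `format_entry_time`, and the output format are all assumptions made for the example.

```python
from datetime import datetime, timedelta, timezone


def format_entry_time(stamp: datetime, display_tz: timezone) -> str:
    """Format an entry timestamp for an end-user report."""
    if stamp.tzinfo is not None:
        # Zone is known: safe to convert into the requested display zone.
        return stamp.astimezone(display_tz).strftime("%Y-%m-%d %H:%M:%S %z")
    # Zone is unknown: print the stored wall-clock fields untouched and
    # flag them with an asterisk instead of converting them.
    return stamp.strftime("%Y-%m-%d %H:%M:%S") + " *"


display = timezone(timedelta(hours=-7))   # user-specified reporting zone

utc_stamp = datetime(2017, 8, 28, 12, 0, 0, tzinfo=timezone.utc)
bogus_stamp = datetime(2017, 8, 28, 12, 0, 0)   # e.g. read from a ZIP entry

print(format_entry_time(utc_stamp, display))     # 2017-08-28 05:00:00 -0700
print(format_entry_time(bogus_stamp, display))   # 2017-08-28 12:00:00 *
```

Date filtering could branch on the same condition, for example by widening the comparison window for the flagged entries.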
