Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Don't lie about time #570

Open
davispuh opened this issue Feb 28, 2024 · 10 comments
Open

Don't lie about time #570

davispuh opened this issue Feb 28, 2024 · 10 comments
Assignees
Labels

Comments

@davispuh
Copy link

davispuh commented Feb 28, 2024

Hi,

Currently if you work with zip file that doesn't use UniversalTime/NTFS but DOSTime then you don't actually know correct timestamp due to missing timezone but currently rubyzip would use local timezone which is wrong. I would rather have a Time type without any timezone specified as then I can convert to some better guess at application layer and take into account other data points (eg. which user uploaded zip and so on).

The main issue with current implementation is that some zip files do contain good timestamp (UniversalTime/NTFS) so you can't always assume it's wrong.

I think Time APIs should either return Ruby's Time (when it's correct) or DOSTime when it's not known.

Currently I have worked around this issue like this but it seems quite hacky...

Zip::File.open_buffer(buffer) do |zip|
  zip.each do |entry|
    if entry.extra.key?('UniversalTime') || entry.extra.key?('NTFS')
      # Keep timezone because it's correct
      time = entry.mtime.strftime('%FT%T%:z')
    else
      # Remove timezone because it's wrong to assume host's
      time = entry.mtime.strftime('%FT%T')
    end
    # ...
  end
end
@hainesr
Copy link
Member

hainesr commented Mar 3, 2024

Hi,

I don't think there's any such thing as a Time without a timezone; certainly I haven't been able to create one with standard Ruby classes. If you use Time.new and either set timezone to nil or leave it out, then it still sets a timezone. For me it's UTC, but because I'm in UTC anyway I don't know if it always chooses UTC, or if it chooses whatever is the local timezone. If there is a way to create an instance of Time without a timezone, I have missed it.

I'm not sure it's 'wrong' to assume the host's timezone if there's no timezone supplied, given there has to be a timezone. Remember, if you've got a ZIP archive, you are probably going to be copying the files out of it into your local filesystem, which will be using your local timezone anyway. If coordinating timestamps across different timezones is important, then using the UniversalTime, NTFS or one of the Unix extra fields is required. (Also remember ZIP dates from the pre-internet 16bit floppy disk era of computing, so it's understandable that timezones aren't part of the core standard.)

Even then UniversalTime and other extra fields don't really have any timezone information in them, it's just that they store times that have been (well, should have been) normalized to UTC first. So I think I would expect your code above to always output UTC (+/-00:00)?

I do think the DOSTime class is a distraction, and I think I'm probably going to remove it in a future version of Rubyzip. All it really does is account for the fact that MS-DOS timestamps have 2 second resolution when comparing times, and this breaks when you're comparing against a normal Time object anyway.

I'm sorry that this probably doesn't address your issue above, but I think we currently strike a reasonable balance given the constraints we have.

@davispuh
Copy link
Author

davispuh commented Mar 4, 2024

Ruby doesn't have any built-in class that has Time without timezone and Time.new will use system's timezone which is wrong here since most of time it won't match with whoever created Zip. I meant it more like having custom Time class. It could have method to_time to convert to Ruby's time class for convenience for those who don't really care about this.

Remember, if you've got a ZIP archive, you are probably going to be copying the files out of it into your local filesystem, which will be using your local timezone anyway.

Yeah but it will be converted wrongly to local time. I don't really care about timezone itself, but without that information you can't determine which file is newer or older when comparing 2 zip files. For example user in California creates zip file with local time 2024-03-04 10:00:00 and another user from Australia creates at his local time 2024-03-04 20:00:00. Now if you want to know which zip file contains newer data you need timezone because without it there's no way to tell. In current implementation you won't even know that you're comparing them wrongly so it will show newest file came from Australia which is wrong.

So I think I would expect your code above to always output UTC (+/-00:00)?

It would return system's UTC offset but that doesn't matter, the goal is to distinguish between correct time (with timezone, can be easily converted to UTC or whichever) and time without one which can only be based on guesses.

This issue is not present if everyone's Zip files contains UniversalTime or NTFS but not all Zip files have those hence this issue.

With DOSTime I just meant a way to separate correct timestamp vs one which is unknown due to being in local time without known timezone.

@hainesr
Copy link
Member

hainesr commented Mar 5, 2024

OK, so I think you just need a way to know if the timezone within a DOSTime class can be relied on or not for a given Zip::Entry?

Perhaps we could add a has_timezone? method or similar to Zip::Entry?

@hainesr
Copy link
Member

hainesr commented Mar 5, 2024

Funnily enough, now I look at the code that decodes times out of the extra fields I think there's a bug. It's not reading the times in UTC but assuming local timezone... Oh dear, I better check and fix that for v3.0.

@davispuh
Copy link
Author

davispuh commented Mar 6, 2024

OK, so I think you just need a way to know if the timezone within a DOSTime class can be relied on or not for a given Zip::Entry?

Perhaps we could add a has_timezone? method or similar to Zip::Entry?

Yeah exactly, but I'm not sure if has_timezone? would be correct name, because it's not really about timezone but about exact time point/instant. Maybe better would be absolute_time? - as opposite to relative_time since for that you need a reference point (timezone/location) to get absolute time from it. I think that would make sense so you can convert this Zip relative time to absolute time if you know in which timezone/country/etc it was created.

Funnily enough, now I look at the code that decodes times out of the extra fields I think there's a bug. It's not reading the times in UTC but assuming local timezone... Oh dear, I better check and fix that for v3.0.

Before submitting this issue I actually looked into that as it seemed so but actually the time itself is correct because you use Time.at() (see universal_time.rb) which expects unix timestamp, so timezone is used only for presentation/display purposes but you can easily use time.utc to get that or any timezone library to convert to whichever timezone you're interested in.

@hainesr
Copy link
Member

hainesr commented Mar 6, 2024

Before submitting this issue I actually looked into that as it seemed so but actually the time itself is correct because you use Time.at() (see universal_time.rb) which expects unix timestamp, so timezone is used only for presentation/display purposes but you can easily use time.utc to get that or any timezone library to convert to whichever timezone you're interested in.

Ah yes, Unix timestamps are always in UTC. That's a relief...

@hainesr
Copy link
Member

hainesr commented Mar 6, 2024

So, would Entry#absolute_time? be a useful addition to the API? Where true indicates the presence of accurate timezone information?

@hainesr hainesr self-assigned this Mar 6, 2024
@davispuh
Copy link
Author

davispuh commented Mar 7, 2024

Yes that would work, because then you can do something like

time = entry.mtime
if !entry.absolute_time?
   time = ... # fix mtime based on some logic
end

But my thinking was that it should belong to Zip Time class so entry.mtime.absolute_time? and there could be method like to_time that converts it to Ruby's Time class using system's local timezone (if it didn't came with one), this would make it more clear that it might be wrong timestamp if you convert them and in some cases you could work directly with this Zip's Time class without need to convert it to Ruby's time.

@hainesr
Copy link
Member

hainesr commented Mar 7, 2024

Yes, I was wondering about having DOSTime#absolute_time? as well.

I think it's possible to argue that it's a property of both the time itself (was this time object created with accurate timezone information?) and the Entry (does this Entry contain the right fields to include timezone information?).

I'll look at this.

@hainesr
Copy link
Member

hainesr commented Mar 9, 2024

Hi @davispuh, if you are able to test something, then I've added Entry#absolute_time? and DOSTime#absolute_time? to this branch: https://github.com/hainesr/rubyzip/tree/absolute_time

In short:

  • Entry#absolute_time? simply checks for the presence of a UniversalTime or NTFS extra field.
  • DOSTime sets an internal field if it has been parsed from binary DOS date/time fields and #absolute_time? uses that. So it will be true in all other cases.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

2 participants