Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Feature: Extended Time data #76

Merged
merged 7 commits into from Mar 20, 2022
Merged

Conversation

andrebrait
Copy link
Contributor

@andrebrait andrebrait commented Mar 3, 2022

Add parsing of extended time data
- Detects and parses the EXT_TIME part of the file header
- java.util.Date fields added or improved:
- atime, ctime and arctime are now parsed, precision is milliseconds
- mtime now has a maximum precision of milliseconds instead of seconds, when EXT_TIME is present
- Corresponding FileTime fields added
- lastModifiedTime, lastAccessTime and creationTime fields added
- precision for those is 100ns intervals

@andrebrait
Copy link
Contributor Author

Instead of having both Date and FileTime fields, I think I'm gonna keep only FileTime and make the getters and setters for the Date fields just interact get/set to the FileTime fields.

WDYT?

@gotson
Copy link
Member

gotson commented Mar 4, 2022

Hi, could you add a description of what the PR is about? It would be good to understand what this is without having to read the code. Thanks

@codecov
Copy link

codecov bot commented Mar 4, 2022

Codecov Report

Merging #76 (5e0a97b) into master (16161e6) will increase coverage by 22.39%.
The diff coverage is 49.12%.

@@              Coverage Diff              @@
##             master      #76       +/-   ##
=============================================
+ Coverage     31.74%   54.14%   +22.39%     
- Complexity      503      870      +367     
=============================================
  Files            90       90               
  Lines          4914     4972       +58     
  Branches        791      800        +9     
=============================================
+ Hits           1560     2692     +1132     
+ Misses         3153     1954     -1199     
- Partials        201      326      +125     
Impacted Files Coverage Δ
...n/java/com/github/junrar/LocalFolderExtractor.java 81.63% <0.00%> (ø)
.../main/java/com/github/junrar/rarfile/EAHeader.java 0.00% <0.00%> (ø)
.../java/com/github/junrar/rarfile/MacInfoHeader.java 0.00% <0.00%> (ø)
...ain/java/com/github/junrar/rarfile/MainHeader.java 26.31% <0.00%> (+1.99%) ⬆️
...ain/java/com/github/junrar/rarfile/MarkHeader.java 46.66% <0.00%> (-1.61%) ⬇️
...va/com/github/junrar/rarfile/UnixOwnersHeader.java 0.00% <0.00%> (ø)
.../com/github/junrar/unpack/ppm/AnalyzeHeapDump.java 0.00% <0.00%> (ø)
...in/java/com/github/junrar/rarfile/BlockHeader.java 83.33% <50.00%> (-5.56%) ⬇️
...ain/java/com/github/junrar/rarfile/FileHeader.java 55.98% <53.84%> (+7.26%) ⬆️
...java/com/github/junrar/rarfile/SubBlockHeader.java 66.66% <66.66%> (-3.93%) ⬇️
... and 32 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update d276f93...5e0a97b. Read the comment docs.

@andrebrait
Copy link
Contributor Author

Hi, could you add a description of what the PR is about? It would be good to understand what this is without having to read the code. Thanks

My bad. I thought I had written it clearly enough, but re-reading the description, I see it's a bit confusing. I'll be editing it.

@andrebrait
Copy link
Contributor Author

Hi, could you add a description of what the PR is about? It would be good to understand what this is without having to read the code. Thanks

Done. Please check it out.


{
//noinspection PointlessBitwiseExpression
int flag = extTimeFlags >>> 0;
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@gotson This is here just we I can reuse the block below without changing anything. I figured out how to make it into a function and I'm gonna refactor it so it'll be a lot cleaner.

@@ -67,12 +73,20 @@

private final byte[] salt = new byte[SALT_SIZE];

private FileTime lastModifiedTime;

private Date mTime;
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@gotson given the FileTime fields above are available and contain the most accurate precision which reflects the extended time format better (and it also makes this more like ZipEntry), I'm thinking it we can get rid of the java.util.Date fields and just get them from the FileTime equivalents when calling the getters (documenting the change accordingly, of course).

Ex.:

public Date getMTime() {
    return toDate(lastModifiedTime);
}

@gotson
Copy link
Member

gotson commented Mar 4, 2022

I've never heard of the EXT_TIME header. Would you be able to share some context about what this is, how it can be used (outside this project), and what your proposed change can help do that couldn't be done before?

@andrebrait
Copy link
Contributor Author

andrebrait commented Mar 4, 2022

@gotson I think I mistakenly called it EXT_TIME header but what I actually meant was EXT_TIME section of the header.

I think WinRAR calls it "High precision time format".

Time in RAR 4 and below is stored as an MS-DOS date/time 32-bit integer, which JUnrar decodes into a (local) java.util.Date (the mTime field).
This format can only store up to the seconds of a given time, and it can only store even values for seconds, not odd values (so 2022-03-04T10:00:55 is stored as 2022-03-04T10:00:54, for example).

The EXT_TIME flag in the header signals that the header contains additional data about file times.

If the File Header contains this flag, it has a section which contains:

  • additional data* for the mTime field
  • creation time (in MS-DOS format) and additional data* for it
  • last access time (in MS-DOS format) and additional data* for it

Each additional data section comprises:

  • the fractions of a second for that time, in 100ns increments
  • a flag for correcting the MS-DOS date/time format if the second was actually odd

Parsing this section of the file header enables it to parse dates correctly using all information available in the RAR file.
So, basically, it enables us to increase the precision of the data it already parsed (mTime) from seconds to seconds + 100ns increments, and it enables us to also read the creation and access times for the entry.

For example, the following entry (output from the UnRAR CLI utility):

        Name: files\test\lorem-ipsum.txt
    Modified: 2022-02-23 18:25:55,206205200
     Created: 2022-02-23 18:19:22,854708200
    Accessed: 2022-03-02 18:45:18,695090800

before this patch, is parsed as (notice the seconds which went from 55 to 54 because of the MS-DOS time/date format):

       Name: files\test\lorem-ipsum.txt
   Modified: 2022-02-23 18:25:54
    Created: 
   Accessed: 

And after this patch, it's parsed correctly down to 100ns units.

@andrebrait
Copy link
Contributor Author

andrebrait commented Mar 4, 2022

There was a // TODO rartime -> extended comment in there, so I assumed you knew about this, which is why I didn't go for a more detailed explanation right away. My bad.

@andrebrait
Copy link
Contributor Author

And this is where I took most of the things here, including the pseudo-code:

https://codedread.github.io/bitjs/docs/unrar.html

See the ExtTime structure section.

@andrebrait
Copy link
Contributor Author

@gotson I just pushed some improvements, including now it being fully compatible with 100ns intervals.

I think I ended up pushing some small optimization around array copies 😅
It's just stuff addressing the difference between C++ and Java strings and not needing to copy arrays just to make strings in Java, but it was intended for another branch.

@andrebrait
Copy link
Contributor Author

I reverted most things, leaving only the functionality itself here. That way it's easier to review.

@gotson
Copy link
Member

gotson commented Mar 7, 2022

There was a // TODO rartime -> extended comment in there, so I assumed you knew about this, which is why I didn't go for a more detailed explanation right away. My bad.

I am just the maintainer of the project, the code is more than 15 years old :)

I reverted most things, leaving only the functionality itself here. That way it's easier to review.

Thanks, i was going to ask for splitting up the various changes in multiple PRs, so it's easier to review and merge while keeping a nice commit history.


Thanks a lot for the PR, i will review it when i have more time.

@andrebrait andrebrait changed the title Feature: Extended Time data + small improvements Feature: Extended Time data Mar 9, 2022
@andrebrait
Copy link
Contributor Author

@gotson all PRs were updated. It seems you need to approve the workflows so they can run again against the latest changes.

@gotson
Copy link
Member

gotson commented Mar 11, 2022

Instead of having both Date and FileTime fields, I think I'm gonna keep only FileTime and make the getters and setters for the Date fields just interact get/set to the FileTime fields.

WDYT?

I think FileTime would make more sense.

That would be a breaking change though, make sure you add BREAKING CHANGE: in the commit description, it would be picked up by semantic release and bump the major version accordingly (example).

@andrebrait
Copy link
Contributor Author

I think FileTime would make more sense.

That would be a breaking change though, make sure you add BREAKING CHANGE: in the commit description, it would be picked up by semantic release and bump the major version accordingly (example).

I actually already did that. There's no breakage because the getters and setters are kept, but they get/set the FileTime fields in the appropriate ways.

@gotson
Copy link
Member

gotson commented Mar 11, 2022

I think FileTime would make more sense.
That would be a breaking change though, make sure you add BREAKING CHANGE: in the commit description, it would be picked up by semantic release and bump the major version accordingly (example).

I actually already did that. There's no breakage because the getters and setters are kept, but they get/set the FileTime fields in the appropriate ways.

But you don't expose it externally though. Should it be?

@andrebrait
Copy link
Contributor Author

But you don't expose it externally though. Should it be?

What do you mean? There are getter/setter methods for the FileTime fields that only get/set the objects themselves without any conversion.

FileTimes are immutable, so that's safe to do.

@gotson
Copy link
Member

gotson commented Mar 11, 2022

But you don't expose it externally though. Should it be?

What do you mean? There are getter/setter methods for the FileTime fields that only get/set the objects themselves without any conversion.

FileTimes are immutable, so that's safe to do.

Ah! I didn't see you added new getter/setter and also kept the original.

@gotson
Copy link
Member

gotson commented Mar 11, 2022

But you don't expose it externally though. Should it be?

What do you mean? There are getter/setter methods for the FileTime fields that only get/set the objects themselves without any conversion.
FileTimes are immutable, so that's safe to do.

Ah! I didn't see you added new getter/setter and also kept the original.

I'm wondering if the old getter/setter should be marked as deprecated. Thoughts?

@andrebrait
Copy link
Contributor Author

I'm wondering if the old getter/setter should be marked as deprecated. Thoughts?

I thought about that, but they technically work for what they are for (e.g. save the user from doing some conversions) so I don't feel the need for that. It's up to you and how you'd prefer it.

Copy link
Member

@gotson gotson left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@gotson
Copy link
Member

gotson commented Mar 18, 2022

@theobisproject i would be happy to have your updated review before merging :)

@theobisproject
Copy link
Contributor

LGTM

@gotson gotson merged commit d5cc784 into junrar:master Mar 20, 2022
@github-actions
Copy link
Contributor

🎉 This PR is included in version 7.5.0 🎉

The release is available on:

Your semantic-release bot 📦🚀

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants