Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Excessive disk read: Fix location of central directory #1514

Merged
merged 3 commits into from Mar 27, 2021

Conversation

pimaster
Copy link
Contributor

In archive_read_support_format_zip.c, it is noted that the slurping of the central directory starts from where it has been told it should be but just reads forward until it finds it.

I have worked on a fix that passes the tests that calculates the actual location of the CentralDirectory and stores it as a separate member of Zip. I was hoping to store the actual location in the existing member but I discovered that I was going to have to rework how it calculated the offset of each individual file as well.

This was discussed in this issue but I should have created a pull request to begin with.

…tral Directory in EOCD and where the OECD was found.

This prevents large reads when a zip archive is preceded by other data.
Copy link
Contributor

@kientzle kientzle left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks nice! I have a couple of suggestions for clearer naming and comments, but nothing serious.

libarchive/archive_read_support_format_zip.c Outdated Show resolved Hide resolved
libarchive/archive_read_support_format_zip.c Outdated Show resolved Hide resolved
libarchive/archive_read_support_format_zip.c Outdated Show resolved Hide resolved
Use the size of the Central Directory and the offset of the EOCD to
calculate the real position.

This trick doesn't work for Zip64 as easily as we are not scanning
backwards to find the PK\x06\x06 entry.
Interestingly, it is never checked so it could be trying to parse
bad files.

Updated based on review
@pimaster
Copy link
Contributor Author

Thanks for the feedback. Taken it on board and updated.
I must have been reading the + i that calls read_eocd as + 1 originally. Good pickup.

As noted, I can't just fix zip64 formats as I'd have to implement the scanning backwards like a standard EOCD is found but for zip64. If we assume a small comment it should be doable but it's really outside what I want to do because I should be getting back to the project that is using libarchive.

@pimaster pimaster requested a review from kientzle March 25, 2021 01:45
Copy link
Contributor

@kientzle kientzle left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks great!

@@ -3484,6 +3492,8 @@ read_zip64_eocd(struct archive_read *a, struct zip *zip, const char *p)

/* Save the central directory offset for later use. */
zip->central_directory_offset = archive_le64dec(p + 48);
/* TODO: Needs scanning backwards to find the eocd64 instead of assuming */
zip->central_directory_offset_adjusted = zip->central_directory_offset;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for introducing those variables; this is a lot easier to read now.

@kientzle kientzle merged commit ce2aebd into libarchive:master Mar 27, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants