Add support for opaque binary sections #1

Closed
GoogleCodeExporter opened this issue Apr 13, 2015 · 1 comment
Comments

@GoogleCodeExporter

Background: to read a zip-like archive, code generally starts by seeking 
backwards from the end-of-file looking for the End Of Central Directory (EOCD) 
marker bytes. Once this is located the central directory can be read in, 
yielding a listing of all the file entries in the zip. At this point most 
engines will begin processing one or more entries in the zip file.
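
As an illustration of the backwards search described above, here is a minimal Python sketch. It is not the library's code; the function names (`find_eocd`, `central_directory_span`, `make_sample_zip`) are made up for this example, and it assumes a single-disk archive without ZIP64 records. The signature bytes and field offsets are the standard EOCD layout.

```python
import io
import struct
import zipfile

EOCD_SIGNATURE = b"PK\x05\x06"   # End Of Central Directory marker bytes
EOCD_MIN_SIZE = 22               # fixed-size part of the EOCD record
MAX_COMMENT = 0xFFFF             # the comment length field is 16 bits


def find_eocd(data: bytes) -> int:
    """Return the offset of the EOCD record, or -1 if none is found."""
    window_start = max(0, len(data) - (EOCD_MIN_SIZE + MAX_COMMENT))
    pos = data.rfind(EOCD_SIGNATURE, window_start)
    while pos != -1:
        if pos + EOCD_MIN_SIZE <= len(data):
            (comment_len,) = struct.unpack_from("<H", data, pos + 20)
            # "<=" rather than "==": bytes may follow the record.
            if pos + EOCD_MIN_SIZE + comment_len <= len(data):
                return pos
        # Keep looking for an earlier candidate.
        pos = data.rfind(EOCD_SIGNATURE, window_start, pos)
    return -1


def central_directory_span(data: bytes, eocd_offset: int) -> tuple:
    """Return (offset, size) of the central directory as stated by the EOCD."""
    cd_size, cd_offset = struct.unpack_from("<II", data, eocd_offset + 12)
    return cd_offset, cd_size


def make_sample_zip() -> bytes:
    """Build a tiny one-entry archive in memory for experimenting."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        zf.writestr("hello.txt", "hello")
    return buf.getvalue()
```

Note that the search accepts an EOCD record that is followed by extra bytes, which is exactly the trailing case discussed below.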

There is no standard that defines what comes before the start of the first 
"Local File" entry in the zip, nor what comes after the final byte of the End 
Of Central Directory record. These regions can contain arbitrary data that 
some tools may need, although they seem rare in practice.

Empirically, none of the ZIP and APK files that I have run the tool against so 
far have had any such bits. This is known because the patching structure does 
not copy them today, so applying a patch to a file containing such "dark bits" 
would produce incorrect results.

There is mention of such files "in the wild", e.g. executable JARs:
http://mesosphere.io/2013/12/07/executable-jars/

... and in older PKZIP-created stuff, there is apparently always a prefix of 
the ASCII chars 'PK', potentially followed by a bunch of stuff specific to 
whatever tool is intended to interpret it, e.g. PKLITE, PKSFX, and so on:
http://www.garykessler.net/library/file_sigs.html
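
To make the prefix case concrete, here is a small Python sketch that prepends a shell stub to a zip (roughly what an executable JAR looks like) and measures the leading opaque region. The helper names and the stub contents are invented for this example. It leans on the fact that, as far as I know, CPython's `zipfile` compensates for prepended data when reading, so `ZipInfo.header_offset` reflects the real position in the file.

```python
import io
import zipfile

# Hypothetical launcher stub, similar in spirit to an executable JAR prefix.
STUB = b'#!/bin/sh\nexec java -jar "$0" "$@"\nexit 1\n'


def build_sample_zip() -> bytes:
    """Build a tiny one-entry archive in memory."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        zf.writestr("hello.txt", "hello")
    return buf.getvalue()


def leading_opaque_length(archive_bytes: bytes) -> int:
    """Return the number of bytes before the first local file entry.

    An archive with no entries is not handled meaningfully here; it
    simply reports 0.
    """
    with zipfile.ZipFile(io.BytesIO(archive_bytes)) as zf:
        offsets = [info.header_offset for info in zf.infolist()]
    return min(offsets, default=0)


plain = build_sample_zip()
prefixed = STUB + plain          # same archive with opaque bytes in front
```

Here `leading_opaque_length(plain)` should be zero and `leading_opaque_length(prefixed)` should equal `len(STUB)`; those leading bytes are what a patch currently fails to carry along.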


This implies that the library needs a few modifications:
1. A new "OpaqueBits" (or similar) subclass of Part
2. The ability for such OpaqueBits to be present at the start and end of an 
archive.
3. The ability to send these parts along in a patch.

Since such bits are by definition opaque, there's probably nothing special we 
can do for them; running them through the configured delta provider seems the 
only sensible thing to do.

Extending this thought further, it may also be the case that some archives 
contain interstitial bits between entries. Again, this is undefined behavior. 
Even if the spec declared that there should be no such bytes, it is almost 
certain that every nontrivial ZIP implementation uses the central directory to 
find the offsets of all the entries, meaning that it should be possible to 
inject extra bits between entries with no ill effects in most cases.

The fix for this latter case is to generalize the problem and identify any and 
all gaps:
1. Gap between start of file and first local file entry.
2. Gap between the end of a file entry and start of the next file entry.
3. Gap between the final file entry and the start of the central directory.
4. Gap between the final bytes in the central directory and the first byte of 
the End Of Central Directory record.
5. Gap between the final byte of the End Of Central Directory record and the 
end of the file.

There's a hidden bonus to doing this, which is that it will automagically 
enhance the library to support ZIP records for which it has no specific 
support, since any such records would take the form of opaque bits by this 
definition. These would correctly be included in the patch.

This should be a fairly straightforward change; all that is required is to 
generate an offset-based linear ordering of all the entries and find their 
gaps. Since the opaque bits have no discernible structure, they are just binary 
blobs from the perspective of the library.
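
A rough Python sketch of that linear ordering follows. It is only an illustration under assumptions: `Span` and `find_gaps` are hypothetical names, not part of the library, and the offsets in the usage example are made-up numbers rather than values from a real archive.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Span:
    """A contiguous byte range in the archive (hypothetical helper type)."""
    offset: int
    length: int
    label: str

    @property
    def end(self) -> int:
        return self.offset + self.length


def find_gaps(known: List[Span], file_size: int) -> List[Span]:
    """Return the byte ranges of the file not covered by any known span."""
    gaps = []
    cursor = 0  # first byte not yet accounted for
    for span in sorted(known, key=lambda s: s.offset):
        if span.offset > cursor:
            gaps.append(Span(cursor, span.offset - cursor, "opaque"))
        cursor = max(cursor, span.end)
    if cursor < file_size:
        # Bytes after the last known record, e.g. after the EOCD.
        gaps.append(Span(cursor, file_size - cursor, "opaque"))
    return gaps


# Made-up layout: a prefix, a gap before the central directory, a suffix.
known_records = [
    Span(4, 10, "entry-a"),
    Span(14, 6, "entry-b"),
    Span(25, 8, "central-directory"),
    Span(33, 22, "eocd"),
]
gaps = find_gaps(known_records, file_size=60)
```

Because the walk only looks at offsets and lengths, all five kinds of gap listed above fall out of the same loop, as would any record type the library does not understand.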

Original issue reported on code.google.com by andrewha...@google.com on 31 Jul 2014 at 12:08

@andrewhayden
Contributor

This issue is no longer relevant. The new patch generation and application logic doesn't have to do anything special to handle these opaque chunks; they just get delta-encoded along with the rest of the stuff that isn't in a proper zip entry.
