Background: to read a zip-like archive, code generally starts by seeking
backwards from the end-of-file looking for the End Of Central Directory (EOCD)
marker bytes. Once this is located the central directory can be read in,
yielding a listing of all the file entries in the zip. At this point most
engines will begin processing one or more entries in the zip file.
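The backwards search described above can be sketched roughly as follows. This is a minimal illustration, not the library's actual code: the names `find_eocd`, `read_eocd` and `Eocd` are made up for this example, the whole file is assumed to fit in memory, and ZIP64 archives are not handled.

```python
import struct
from typing import NamedTuple

EOCD_SIGNATURE = b"PK\x05\x06"  # End Of Central Directory marker bytes
EOCD_FIXED_SIZE = 22            # fixed part of the record, before the comment
MAX_COMMENT = 0xFFFF            # the comment length field is 16 bits wide


class Eocd(NamedTuple):
    offset: int         # where the EOCD record starts in the file
    total_entries: int  # number of central directory entries
    cd_size: int        # size of the central directory in bytes
    cd_offset: int      # offset of the first central directory header


def find_eocd(data: bytes) -> int:
    """Scan backwards from end-of-file for a plausible EOCD record."""
    # Simplification: only the last 22 + 65535 bytes are searched, so a very
    # large blob of trailing data after the EOCD would defeat this sketch.
    lowest = max(0, len(data) - EOCD_FIXED_SIZE - MAX_COMMENT)
    pos = data.rfind(EOCD_SIGNATURE, lowest)
    while pos != -1:
        if pos + EOCD_FIXED_SIZE <= len(data):
            (comment_len,) = struct.unpack_from("<H", data, pos + 20)
            # Accept the candidate if its comment fits inside the file.
            if pos + EOCD_FIXED_SIZE + comment_len <= len(data):
                return pos
        # Otherwise keep looking for an earlier occurrence of the marker.
        pos = data.rfind(EOCD_SIGNATURE, lowest, pos + 3)
    raise ValueError("no End Of Central Directory record found")


def read_eocd(data: bytes) -> Eocd:
    """Decode the fields needed to locate the central directory."""
    pos = find_eocd(data)
    total, cd_size, cd_offset = struct.unpack_from("<HII", data, pos + 10)
    return Eocd(pos, total, cd_size, cd_offset)
```

With the `cd_offset` and `cd_size` fields in hand, a reader can load the central directory and from there find every entry's local header offset.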
No standard defines what may come before the start of the first "Local File"
entry in the zip, nor what may come after the final byte of the End Of Central
Directory record. These regions can contain arbitrary data that some tools may
need, although it seems rare to encounter such data in practice.
Empirically, no ZIP or APK file that I have run the tool against so far has
contained any such bits. This is known because applying a patch to such a file
would produce incorrect results if any such "dark bits" were present, since
the patching structure does not currently copy them.
There is mention of such files "in the wild", e.g. executable JARs:
http://mesosphere.io/2013/12/07/executable-jars/
... and in older PKZIP-created stuff, there is apparently always a prefix of
the ASCII chars 'PK', potentially followed by a bunch of stuff specific to
whatever tool is intended to interpret it, e.g. PKLITE, PKSFX, and so on:
http://www.garykessler.net/library/file_sigs.html
This implies that the library needs a few modifications:
1. A new "OpaqueBits" (or similar) subclass of Part
2. The ability for such OpaqueBits to be present at the start and end of an
archive.
3. The ability to send these parts along in a patch.
Since such bits are by definition opaque, there's probably nothing special we
can do for them; running them through the configured delta provider seems the
only sensible thing to do.
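As a rough picture of what that could look like, here is a small Python stand-in. Only the names `Part` and `OpaqueBits` come from the proposal above; the fields, the `DeltaProvider` signature, `delta_for_opaque` and `verbatim_delta` are all hypothetical and are not the library's real API.

```python
from dataclasses import dataclass
from typing import Callable, Optional

# Hypothetical signature: (old bytes, new bytes) -> patch bytes.
DeltaProvider = Callable[[bytes, bytes], bytes]


@dataclass(frozen=True)
class Part:
    """Hypothetical base type: a contiguous region of the archive."""
    offset: int
    length: int


@dataclass(frozen=True)
class OpaqueBits(Part):
    """A region with no structure the library understands."""
    data: bytes = b""


def delta_for_opaque(old: Optional[OpaqueBits], new: OpaqueBits,
                     provider: DeltaProvider) -> bytes:
    """Run an opaque region through whatever delta provider is configured."""
    # If the old archive had no matching region, diff against nothing.
    old_bytes = old.data if old is not None else b""
    return provider(old_bytes, new.data)


def verbatim_delta(old: bytes, new: bytes) -> bytes:
    """Placeholder provider for illustration: ships the new bytes as-is."""
    return new
```

The point of the sketch is only that an opaque region needs no special handling beyond carrying its bytes: any byte-oriented delta provider can be applied to it unchanged.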
Extending this thought further, it may also be the case that some archives
contain interstitial bits between entries. Again, this is undefined behavior;
even if the spec declared that there should be no such bytes, it is an
almost-certainty that every nontrivial ZIP implementation uses the central
directory to find all the offsets for all the entries, meaning that it should
be possible to inject extra bits between entries with no ill effects in most
cases.
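One way to check this claim against a single reader (Python's standard `zipfile` module, chosen here purely for convenience) is to splice some bytes between two entries and fix up the recorded offsets. The helper names `make_sample_zip` and `inject_between_entries` are invented for this sketch, which assumes a small, non-ZIP64 archive with at least two entries.

```python
import io
import struct
import zipfile


def make_sample_zip() -> bytes:
    """Build a tiny two-entry archive in memory."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        zf.writestr("a.txt", "hello")
        zf.writestr("b.txt", "world")
    return buf.getvalue()


def inject_between_entries(data: bytes, junk: bytes) -> bytes:
    """Insert junk just before the second local entry, fixing up offsets."""
    out = bytearray(data)
    # Simplification: trust the last occurrence of the EOCD marker.
    eocd = out.rfind(b"PK\x05\x06")
    if eocd < 0:
        raise ValueError("no End Of Central Directory record found")
    total, _cd_size, cd_offset = struct.unpack_from("<HII", out, eocd + 10)

    # Walk the central directory, noting where each local offset is stored.
    headers = []
    pos = cd_offset
    for _ in range(total):
        name_len, extra_len, comment_len = struct.unpack_from("<HHH", out, pos + 28)
        (local_offset,) = struct.unpack_from("<I", out, pos + 42)
        headers.append((pos, local_offset))
        pos += 46 + name_len + extra_len + comment_len
    if len(headers) < 2:
        raise ValueError("need at least two entries")

    insert_at = sorted(off for _, off in headers)[1]
    # Shift every offset that points at or past the insertion point.
    for header_pos, local_offset in headers:
        if local_offset >= insert_at:
            struct.pack_into("<I", out, header_pos + 42, local_offset + len(junk))
    struct.pack_into("<I", out, eocd + 16, cd_offset + len(junk))
    out[insert_at:insert_at] = junk
    return bytes(out)
```

Opening the result with `zipfile.ZipFile(io.BytesIO(patched))` and reading both entries back is then a quick sanity check that this particular reader relies on the central directory offsets rather than on entries being contiguous.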
The fix for this latter problem is to generalize it and identify any and all
gaps:
1. Gap between start of file and first local file entry.
2. Gap between the end of a file entry and start of the next file entry.
3. Gap between the final file entry and the start of the central directory.
4. Gap between the final byte of the central directory and the first byte of
the End Of Central Directory record.
5. Gap between the final byte of the End Of Central Directory record and the
end of the file.
There's a hidden bonus to doing this, which is that it will automagically
enhance the library to support ZIP records for which it has no specific
support, since any such records would take the form of opaque bits by this
definition. These would correctly be included in the patch.
This should be a fairly straightforward change; all that is required is to
generate an offset-based linear ordering of all the entries and find their
gaps. Since the opaque bits have no discernible structure, they are just binary
blobs from the perspective of the library.
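The offset-based ordering and gap search can be sketched in a few lines. This is only an illustration of the idea, assuming each recognized record is already known as an `(offset, length)` pair; the name `find_gaps` and the `Span` alias are made up for the example.

```python
from typing import List, Tuple

Span = Tuple[int, int]  # (offset, length)


def find_gaps(known: List[Span], file_size: int) -> List[Span]:
    """Return the regions of the file not covered by any known record."""
    gaps: List[Span] = []
    cursor = 0  # first byte not yet accounted for
    for offset, length in sorted(known):
        if offset > cursor:
            # Bytes between the previous record and this one are opaque.
            gaps.append((cursor, offset - cursor))
        # max() keeps the cursor monotonic if records happen to overlap.
        cursor = max(cursor, offset + length)
    if cursor < file_size:
        # Trailing bytes after the last known record.
        gaps.append((cursor, file_size - cursor))
    return gaps
```

For example, `find_gaps([(4, 10), (20, 5), (25, 5), (32, 8)], 45)` returns `[(0, 4), (14, 6), (30, 2), (40, 5)]`: a prefix gap, two interior gaps, and a trailing gap. All five gap kinds listed above fall out of the same loop once the local entries, the central directory and the End Of Central Directory record are all included in `known`.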
Original issue reported on code.google.com by andrewha...@google.com on 31 Jul 2014 at 12:08
This issue is no longer relevant. The new patch generation and application logic doesn't have to do anything special to handle these opaque chunks; they just get delta-encoded along with everything else that isn't in a proper zip entry.