Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Sort ZIP Entries again #28

Closed
wants to merge 1 commit into from
Closed

Sort ZIP Entries again #28

wants to merge 1 commit into from

Conversation

dlitz
Copy link

@dlitz dlitz commented Feb 21, 2012

This ensures that the entries in the zip file's central directory are grouped
together by folder, with each folder's directory entry coming before the
folder's contents, rather than being in the random order dictated by Ruby's
Hash implementation.

Before:

  $ unzip -l before.zip
  Archive:  before.zip
    Length      Date    Time    Name
  ---------  ---------- -----   ----
          7  03-11-2011 14:14   fruit/apple
          8  03-11-2011 14:14   fruit/orange
          6  03-11-2011 14:14   fruit/kiwi
          0  03-11-2011 14:14   uuid/5D8976B4-C3B5-4FC6-A9BA-E307173F0064
          0  03-11-2011 14:14   uuid/
          0  03-11-2011 14:14   uuid/B1D55A18-860D-4BE1-BCB3-7DFBCD7C8BC4
          0  03-11-2011 14:14   uuid/E5E1F8F6-60A3-4733-9648-86711E7CD02A
          0  03-11-2011 14:14   fruit/
          8  03-11-2011 14:14   fruit/mango
          0  03-11-2011 14:14   Vegetable/
          0  03-11-2011 14:14   uuid/47FD714B-1A38-46AB-B9EF-0E8B2098D3A5
          0  03-11-2011 14:14   uuid/8B6FD77A-9A4C-4D74-B1E0-5B0EBBF0C3A9
          7  03-11-2011 14:14   Vegetable/celery
          8  03-11-2011 14:14   Vegetable/carrot
          6  03-11-2011 14:14   Vegetable/bean
  ---------                     -------
         50                     15 files

After:

  $ unzip -l after.zip
  Archive:  after.zip
    Length      Date    Time    Name
  ---------  ---------- -----   ----
          0  03-11-2011 14:15   Vegetable/
          6  03-11-2011 14:15   Vegetable/bean
          8  03-11-2011 14:15   Vegetable/carrot
          7  03-11-2011 14:15   Vegetable/celery
          0  03-11-2011 14:15   fruit/
          7  03-11-2011 14:15   fruit/apple
          6  03-11-2011 14:15   fruit/kiwi
          8  03-11-2011 14:15   fruit/mango
          8  03-11-2011 14:15   fruit/orange
          0  03-11-2011 14:15   uuid/
          0  03-11-2011 14:15   uuid/47FD714B-1A38-46AB-B9EF-0E8B2098D3A5
          0  03-11-2011 14:15   uuid/5D8976B4-C3B5-4FC6-A9BA-E307173F0064
          0  03-11-2011 14:15   uuid/8B6FD77A-9A4C-4D74-B1E0-5B0EBBF0C3A9
          0  03-11-2011 14:15   uuid/B1D55A18-860D-4BE1-BCB3-7DFBCD7C8BC4
          0  03-11-2011 14:15   uuid/E5E1F8F6-60A3-4733-9648-86711E7CD02A
  ---------                     -------
         50                     15 files

Similar functionality has been committed before (see commit 74e4512),
but was reverted in commit 963f23c because it
breaks things (like EPUB) that require entries to appear in a particular order.

This re-enables sorting by default, but still allows custom ZipEntrySet
implementations to be used when a custom sort order is required.

This ensures that the entries in the zip file's central directory are grouped
together by folder, with each folder's directory entry coming before the
folder's contents, rather than being in the random order dictated by Ruby's
Hash implementation.

Before:

  $ unzip -l before.zip
  Archive:  before.zip
    Length      Date    Time    Name
  ---------  ---------- -----   ----
          7  03-11-2011 14:14   fruit/apple
          8  03-11-2011 14:14   fruit/orange
          6  03-11-2011 14:14   fruit/kiwi
          0  03-11-2011 14:14   uuid/5D8976B4-C3B5-4FC6-A9BA-E307173F0064
          0  03-11-2011 14:14   uuid/
          0  03-11-2011 14:14   uuid/B1D55A18-860D-4BE1-BCB3-7DFBCD7C8BC4
          0  03-11-2011 14:14   uuid/E5E1F8F6-60A3-4733-9648-86711E7CD02A
          0  03-11-2011 14:14   fruit/
          8  03-11-2011 14:14   fruit/mango
          0  03-11-2011 14:14   Vegetable/
          0  03-11-2011 14:14   uuid/47FD714B-1A38-46AB-B9EF-0E8B2098D3A5
          0  03-11-2011 14:14   uuid/8B6FD77A-9A4C-4D74-B1E0-5B0EBBF0C3A9
          7  03-11-2011 14:14   Vegetable/celery
          8  03-11-2011 14:14   Vegetable/carrot
          6  03-11-2011 14:14   Vegetable/bean
  ---------                     -------
         50                     15 files

After:

  $ unzip -l after.zip
  Archive:  after.zip
    Length      Date    Time    Name
  ---------  ---------- -----   ----
          0  03-11-2011 14:15   Vegetable/
          6  03-11-2011 14:15   Vegetable/bean
          8  03-11-2011 14:15   Vegetable/carrot
          7  03-11-2011 14:15   Vegetable/celery
          0  03-11-2011 14:15   fruit/
          7  03-11-2011 14:15   fruit/apple
          6  03-11-2011 14:15   fruit/kiwi
          8  03-11-2011 14:15   fruit/mango
          8  03-11-2011 14:15   fruit/orange
          0  03-11-2011 14:15   uuid/
          0  03-11-2011 14:15   uuid/47FD714B-1A38-46AB-B9EF-0E8B2098D3A5
          0  03-11-2011 14:15   uuid/5D8976B4-C3B5-4FC6-A9BA-E307173F0064
          0  03-11-2011 14:15   uuid/8B6FD77A-9A4C-4D74-B1E0-5B0EBBF0C3A9
          0  03-11-2011 14:15   uuid/B1D55A18-860D-4BE1-BCB3-7DFBCD7C8BC4
          0  03-11-2011 14:15   uuid/E5E1F8F6-60A3-4733-9648-86711E7CD02A
  ---------                     -------
         50                     15 files

Similar functionality has been committed before (see commit 74e4512),
but was reverted in commit 963f23c because it
breaks things (like EPUB) that require entries to appear in a particular order.

This re-enables sorting by default, but still allows custom ZipEntrySet
implementations to be used when a custom sort order is required.
@simonoff
Copy link
Member

Sorry but I think better way when you will sort entries before include them into zip archive.
In this case EPUB files will be work and unzip will work too.
ZipEntrySet is not a Hash, it class with Enumeration mixin.

@dlitz
Copy link
Author

dlitz commented Feb 22, 2012

Alex, please take a closer look at the code:

https://github.com/aussiegeek/rubyzip/blob/156c0c04b54857ab07df845c6722148e1b821683/lib/zip/zip_entry_set.rb:

    def initialize(anEnumerable = [])
      super()
      @entrySet = {}
      anEnumerable.each { |o| push(o) }
    end

    # ...

    def each(&aProc)
      @entrySet.values.each(&aProc)
    end

It's definitely returning entries in arbitrary Hash order.

@simonoff
Copy link
Member

Hm.. ok. In any case I will look deep on it tomorrow for better understanding how to fix EPUP files and sorting.

@dlitz
Copy link
Author

dlitz commented Apr 11, 2012

Ping?

@marcuserronius
Copy link
Contributor

I second not sorting by filename, for epub related reasons. Under Ruby 1.9, ordered hashes currently help with this, but it's just a side effect; and it would be nice to have the option of defining the exact order of the files in the archive. I'm looking into how much work would be required for this.

@marcuserronius
Copy link
Contributor

Just issued a pull request for changes to ZipEntrySet that preserve the original ordering, and insertion ordering. If desired, one could sort the entries and reinsert them to get whatever ordering one wants :)

@simonoff
Copy link
Member

@dlitz sorry for long delay. Can we continue conversation to choose proper way of fix this bug?

@simonoff simonoff closed this in cb6e628 Oct 20, 2013
@dlitz
Copy link
Author

dlitz commented Oct 21, 2013

@simonoff Sorry I didn't respond (I no longer do much Ruby work). I would have made :sort_entries=>true the default (most users don't need a specific ordering for .zip files, so it makes sense to just do the right thing for them), but otherwise it looks good to me.

Thanks!

@simonoff
Copy link
Member

@dlitz I think i will release this feature and will look what others thinks. May be you will be right and I will set sort_entries to true by default.

PS. To where you are moving? Node? Clojure? Erlang?

@dlitz
Copy link
Author

dlitz commented Oct 22, 2013

I went back to C and Python; I'm working at Dropbox now.

Cheers!
Dwayne

Alexander Simonov notifications@github.com wrote:

@dlitz I think i will release this feature and will look what others
thinks. May be you will be right and I will set sort_entries to true by
default.

PS. To where you are moving? Node? Clojure? Erlang?


Reply to this email directly or view it on GitHub:
#28 (comment)

Sent from my Android device with K-9 Mail. Please excuse my brevity.

githubmo pushed a commit to githubmo/rubyzip that referenced this pull request Jan 6, 2014
…ypt-perf

* 'master' of https://github.com/rubyzip/rubyzip:
  Explicitly add the released 2.1.0 Ruby version Remove branch restriction
  Fix Rubinius by adding newly required gems, updating label in .travis.yml
  Update README.md
  Update README.md
  Make File.open_buffer support Tempfiles
  Version bump
  Update Changelog with Ruby 1.9 requirement
  Update README to reflect 1.9 requirement
  Fix rubyzip#106 Set options about restoring ownerships, permissions and times. restore permissions enabled by default.
  fix jRuby Building rubyzip#104
  Fix rubyzip#28 and rubyzip#103
  disable jRuby for a while
  Fix rubyzip#102 recover file permissions if zip file was exist
  Add missing Zip::Entry arguments to Zip::File#get_output_stream. Fixes rubyzip#100
  fix string encoding of zip64 header ids for ruby 2.0
  Add read/write support for zip64 extensions

Conflicts:
	lib/zip/extra_field.rb
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants