Speedup gunzipping #33

Closed
wants to merge 2 commits into from

@pjump pjump commented May 11, 2015

Gunzipping the specs.4.8.gz file on my PC takes about 16 seconds with Gem.gunzip.
It's virtually instantaneous (about 80 ms) with zcat/gunzip (i.e., it is roughly
200 times faster).

This patch uses zcat instead of Gem.gunzip, if it is available. (The
availability check could probably be done better, but I don't know how
to do it properly.)

Just a suggestion.
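
One possible way to do the availability check is to look for a zcat executable in the PATH directories and fall back to Ruby's zlib bindings when none is found or when the command fails. The sketch below is not the code from this patch; `find_executable` and `fast_gunzip` are hypothetical names introduced only for illustration.

```ruby
require 'zlib'
require 'open3'

# Hypothetical helper: return the full path of +name+ if an executable with
# that name exists in one of the PATH directories, otherwise nil.
def find_executable(name, path_env = ENV.fetch('PATH', ''))
  path_env.split(File::PATH_SEPARATOR).each do |dir|
    next if dir.empty?

    candidate = File.join(dir, name)
    return candidate if File.file?(candidate) && File.executable?(candidate)
  end
  nil
end

# Hypothetical wrapper: decompress +path+ with zcat when it is available and
# exits successfully, falling back to Zlib::GzipReader otherwise.
def fast_gunzip(path, zcat = find_executable('zcat'))
  if zcat
    data, status = Open3.capture2(zcat, path, binmode: true)
    return data if status.success?
  end
  Zlib::GzipReader.open(path, &:read)
end
```

Passing the arguments to `Open3.capture2` separately avoids going through a shell, so file names with spaces or special characters need no quoting.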

Collaborator

raggi commented May 12, 2015

This probably just needs a streaming implementation rather than a slurp. Both implementations use zlib, so there shouldn't be such a huge difference once the extra allocations are gone.

Collaborator

raggi commented May 12, 2015

General concept:

require 'zlib'
Zlib::GzipReader.open(input) { |gz| File.open(output, 'wb') { |out| begin; loop { out << gz.readpartial(65536) }; rescue EOFError; end } } # readpartial raises EOFError at end of stream

Author

pjump commented May 12, 2015

I've now looked at it in a little more detail, and the slowness is in the Gem.gunzip method itself.
File.write('specs.4.8', Zlib::GzipReader.open('specs.4.8.gz').read)
is virtually instantaneous too. (It doesn't even use streaming, although streaming might be more memory-friendly, depending on how GzipReader works internally.)
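
To compare the slurping and streaming approaches on the same file, something like the following could be used. This is only a sketch: `gunzip_slurp`, `gunzip_stream`, and `time_gunzip` are hypothetical names, and the 64 KiB chunk size is an arbitrary assumption.

```ruby
require 'zlib'
require 'benchmark'

# Read the whole decompressed payload into memory, then write it out at once.
def gunzip_slurp(src, dst)
  File.binwrite(dst, Zlib::GzipReader.open(src, &:read))
end

# Copy the decompressed data in fixed-size chunks, so only one chunk is held
# in memory at a time. GzipReader#read(len) returns nil at end of stream.
def gunzip_stream(src, dst, chunk_size = 65_536)
  Zlib::GzipReader.open(src) do |gz|
    File.open(dst, 'wb') do |out|
      while (chunk = gz.read(chunk_size))
        out.write(chunk)
      end
    end
  end
end

# Hypothetical timing helper: wall-clock seconds taken by each approach.
def time_gunzip(src, dst)
  {
    slurp: Benchmark.realtime { gunzip_slurp(src, dst) },
    stream: Benchmark.realtime { gunzip_stream(src, dst) }
  }
end
```

For example, `p time_gunzip('specs.4.8.gz', 'specs.4.8')` prints both timings; the numbers will depend on the machine and the Ruby version.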

Author

pjump commented May 13, 2015

Thanks for your attention. This pull request, https://github.com/rubygems/rubygems/pulls, should fix it upstream if they accept it. Basically, at some point somebody switched Gem::gunzip from StringIO to a pure-Ruby implementation that is about 300 times slower, which is why unzipping the specs takes 16 seconds.
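
For reference, decompressing a gzip-compressed String in memory through StringIO and the zlib bindings can look like the sketch below. It is not the RubyGems code, past or present; `gunzip_string` is a hypothetical name used only for illustration.

```ruby
require 'zlib'
require 'stringio'

# Hypothetical helper: decompress a gzip-compressed String held in memory.
# StringIO gives GzipReader an IO-like object to read from, and the actual
# inflation is done by the zlib C extension.
def gunzip_string(data)
  Zlib::GzipReader.wrap(StringIO.new(data), &:read)
end
```

`Zlib::GzipReader.wrap` closes the reader once the block returns, so nothing is left open after the call.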

@pjump pjump closed this May 13, 2015

3 participants