Skip to content

Don't evaluate every single gemspec when rubygems loads #376

Closed
wants to merge 6 commits into from

4 participants

@charliesome

This patch adds gemspec caching to RubyGems. At the moment, every time RubyGems loads, it loads every single .gemspec and evaluates it. This is a huge performance issue.

This patch caches these Gem::Specification objects as marshalled data in the same directory as the gemspecs. In my testing, this significantly reduces the time taken for RubyGems to load.

I've been using this script to test load times:

require "rubygems"
require "yajl/version"

puts Yajl::VERSION

Without caching, this script takes about 280 milliseconds to execute. With this patch, the time taken drops down to about 210 milliseconds.

Cheers

@luislavena luislavena and 1 other commented on an outdated diff Sep 17, 2012
lib/rubygems/specification.rb
@@ -900,9 +900,17 @@ def self.load file
code.untaint
begin
- spec = eval code, binding, file
+ if File.exist? "#{file}.cache"
+ spec = Marshal.load File.read "#{file}.cache"
@luislavena
RubyGems member
luislavena added a note Sep 17, 2012

This is wrong, don't use File.read, use Gem.read_binary, see other places where reading of binary contents is performed.

@charliesome
charliesome added a note Sep 17, 2012

Fixed in 953d966

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
@luislavena luislavena commented on the diff Sep 17, 2012
lib/rubygems/requirement.rb
@@ -143,7 +143,7 @@ def hash # :nodoc:
end
def marshal_dump # :nodoc:
- fix_syck_default_key_in_requirements
+ fix_syck_default_key_in_requirements if defined? YAML
@luislavena
RubyGems member
luislavena added a note Sep 17, 2012

YAML will be defined if Psych is also used. Why are you conditionally checking that here and not inside the method?

@charliesome
charliesome added a note Sep 17, 2012

This was causing a stack overflow before I added this conditional because I'm marshalling inside Gem::Specification.load which is called when something is required. As it happens, fix_syck_default_key_in_requirements calls Gem.load_yaml and kicks off the infinite recursion.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
@luislavena
RubyGems member

@charliesome there is another detail that I missed: Marshalled files needs to be versioned, similar to gem quick specs that are grabbed from rubygems.org

As for this comment I'm not still comfortable with the solution.

But before you work harder on this, perhaps @drbrain and @evanphx can chime in and provide some feedback.

It will be great we also have a better benchmark (more quantitative, number of installed gems, etc) to help evaluate the benefits of this change.

Thank you

@evanphx evanphx commented on an outdated diff Sep 17, 2012
lib/rubygems/specification.rb
if Gem::Specification === spec
+ if needs_cache_write
+ File.open("#{file}.cache", "wb") { |f| f.write Marshal.dump spec } rescue nil
@evanphx
RubyGems member
evanphx added a note Sep 17, 2012

I won't merge any code with "rescue nil" in it.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
@evanphx
RubyGems member
evanphx commented Sep 17, 2012

This doesn't deal with the gemspec being newer (and thusly more valid) than the cache. This happens when people reinstall the same version of a gem as it's changing (for instance, in development).

That has to be taken into account.

@drbrain
RubyGems member
drbrain commented Sep 18, 2012

I don't see any tests for this new behavior, either. Since this is in the critical path, tests are absolutely necessary. Sure, we can add the tests for this pull request, but it will delay inclusion.

As @luislavena said, we need a reproducible benchmark script to verify the performance improvement.

@charliesome

@drbrain: I don't have much experience with RubyGem's tests so I'm not sure how I'd go about testing this. The existing tests will make sure that this doesn't break anything, but it'd be nice to test that it actually caches the gemspecs. Have you got any pointers?

@evanphx
RubyGems member
evanphx commented Oct 6, 2012

I admire the approach, and I've gone down this route too. But I think that such a small gain it creates a huge headache of extra files, extra checks, etc. I wonder if some good profiling of the current startup could identify some places where the existing code to be tweaked to get the same gains.

That being said, I think that we should explore bigger ways to speed up startup than this.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Something went wrong with that request. Please try again.