Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

5x speed anyone ? #396

Closed
wants to merge 1 commit into from
Closed

5x speed anyone ? #396

wants to merge 1 commit into from

Conversation

grosser
Copy link
Contributor

@grosser grosser commented Nov 11, 2012

startup speed for time ruby -r active_record/version -e ''
and would equally affect any other simple ruby program that needs to load only a few other gems,
and should be neutral to bigger apps.

before:

real    0m0.808s
user    0m0.375s
sys 0m0.067s

after:

real    0m0.167s
user    0m0.132s
sys 0m0.033s

It's definitely not perfect, e.g. 1.2.3-rc2 would not be supported, but e.g. tweaking the filenames of stored gems to replace - in versions with -- and maybe adding the platform could solve those edge-cases.
Also making the spec itself aware of being a stub and replacing itself when more info is needed could make it less disrupting for other users of specifications. But for now just a small idea to see if you like it / want to hack on it :)

@raggi
Copy link
Contributor

raggi commented Nov 11, 2012

This is really dangerous, as it's hard to ensure that unstubbing occurs in all the right places.

I've looked at this problem plenty in the past, and the conclusion I have come to is that we would do well to have an index that is much more lightweight than the specs. The index would contain something much more slim, like a [name, version, *dependency_requirements] tuple - alarmingly similar, as it happens, to what bundler needs for its index.

This would avoid having to unmarshal all the other metadata, turning it into larger objects, etc.

@grosser
Copy link
Contributor Author

grosser commented Nov 11, 2012

I think rubygems would also benefit from an index, but picking the 1 of 1311 gems I have installed is not a problem bundler has. Making this as fast as possible (while not impacting bigger apps) would make ruby feasible for small scripts/tools that depend on a fast startup time e.g. ruco/hub gem.

For the stability point I think just adding ensure_fully_loaded to every Specification method that is not name/version should solve the problem.

@evanphx
Copy link
Member

evanphx commented Nov 12, 2012

The idea of delaying loading the details until they are needed is solid, but the implementation is extremely brittle and would make Specification a giant mess because no more attr_* could be used since it would have to constantly check if things were loaded.

A better approach is to put a add a new class that simply lazily loads the real specification. It doesn't have to do anything clever, just a #specification method that loads the object if it's not loaded. Since it's new, doesn't need to use any method_missing tricks or anything either.

@grosser
Copy link
Contributor Author

grosser commented Nov 12, 2012

so .all_specs_as_stubs -> [Gem::StubSpec] -> .select{...} -> .map(&:spec)

On Sun, Nov 11, 2012 at 8:23 PM, Evan Phoenix notifications@github.comwrote:

The idea of delaying loading the details until they are needed is solid,
but the implementation is extremely brittle and would make Specification a
giant mess because no more attr_* could be used since it would have to
constantly check if things were loaded.

A better approach is to put a add a new class that simply lazily loads the
real specification. It doesn't have to do anything clever, just a
#specification method that loads the object if it's not loaded. Since it's
new, doesn't need to use any method_missing tricks or anything either.


Reply to this email directly or view it on GitHubhttps://github.com//pull/396#issuecomment-10277126.

@evanphx
Copy link
Member

evanphx commented Nov 12, 2012

I'm not sure what you mean.

@grosser
Copy link
Contributor Author

grosser commented Nov 12, 2012

make a new .all_spec_as_stubs method that returns stubs, then use this to
do the basic selection of gems (via name + version), then convert to real
specs and do platform
and everything else

On Mon, Nov 12, 2012 at 11:06 AM, Evan Phoenix notifications@github.comwrote:

I'm not sure what you mean.


Reply to this email directly or view it on GitHubhttps://github.com//pull/396#issuecomment-10299878.

@evanphx
Copy link
Member

evanphx commented Nov 12, 2012

Yep, that's correct.

@grosser
Copy link
Contributor Author

grosser commented Nov 13, 2012

implemented it, how do you like it ? :)

matches = Gem::Specification.stubs.map { |name, version, file|
next unless self.name == name and requirement.satisfied_by? version
Gem::Specification.load(file)
}.compact
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please don't use .map + .compact, just use .each and append to a local. The array in question could be very big and then we'd just turn around and make it smaller again. Better to just do it right the first time.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

On Tue, Nov 13, 2012 at 8:18 AM, Evan Phoenix notifications@github.comwrote:

In lib/rubygems/dependency.rb:

@@ -245,10 +245,10 @@ def merge other

DOC: this method needs either documented or :nodoc'd

def matching_specs platform_only = false

  • matches = Gem::Specification.find_all { |spec|
  •  self.name === spec.name and # TODO: == instead of ===
    
  •    requirement.satisfied_by? spec.version
    
  • }
  • matches = Gem::Specification.stubs.map { |name, version, file|
  •  next unless self.name == name and requirement.satisfied_by? version
    
  •  Gem::Specification.load(file)
    
  • }.compact

Please don't use .map + .compact, just use .each and append to a local.
The array in question could be very big and then we'd just turn around and
make it smaller again. Better to just do it right the first time.


Reply to this email directly or view it on GitHubhttps://github.com//pull/396/files#r2111612.

@grosser
Copy link
Contributor Author

grosser commented Nov 13, 2012

@evanphx if you think it's a solid approach/merge-worthy I'll set out and try to fix up all the failing tests, seems like a lot of them use gemspecs with only a name and no version.

@evanphx
Copy link
Member

evanphx commented Nov 13, 2012

This approach seems valid, now we just need to get the tests passing!

@evanphx
Copy link
Member

evanphx commented Nov 13, 2012

@drbrain Do you think that depending on the format of the gemspec filename in the specs dir is ok?

@evanphx
Copy link
Member

evanphx commented Nov 13, 2012

After talking with @drbrain we think depending on the filename as sacred data is not a good idea. The filename is actually ambiguous with regard to the gem name and the version and also doesn't include the platform.

I do love this idea though. One way is to, when a gem is installed, write out another file with just the name, version, and platform that we can read in.

@evanphx
Copy link
Member

evanphx commented Nov 13, 2012

Just thought of another idea that I like the best:

When we write the .gemspec file out, add an extra line to the top, something like: # compact: foo 1.0.0 ruby

Then we need to make the stubs, we can read the file and just look at the first line. If we see this new header, we don't need to eval the whole file. If we don't (because they're generated by old rubygems or whatever) we just eval the whole thing to get the info, in which case the stub is actively loaded instead of lazily loaded.

@luislavena
Copy link
Member

@evanphx stole my idea, I was just typing something quite similar when his new comment showed up 😉

Not sure about opening the files and just reading one line will compare with the performance of directory globing and string splitting, but gemspec for platform specific gems already include the platform when installed:

C:\Users\Luis>gem list gem

gem-exefy (1.2.0 universal-mingw32)
irb(main):001:0> spec = Gem::Specification.find_by_name("gem-exefy")
=> #<Gem::Specification name=gem-exefy version=1.2.0>
irb(main):002:0> spec.spec_file
=> "C:/Users/Luis/Tools/Ruby/ruby-1.9.3-p327-i386-mingw32/lib/ruby/gems/1.9.1/specifications/gem-exefy-1.2.0-universal-mingw32.gemspec"
irb(main):003:0> File.basename(spec.spec_file)
=> "gem-exefy-1.2.0-universal-mingw32.gemspec"

@grosser
Copy link
Contributor Author

grosser commented Nov 14, 2012

on an ssd it would still be kind of fast
500 gemspecs:

  • glob: 0.002
  • reading first lines: 0.02
    so 5000 = 0.1

so a step in the right direction, but getting rid of this lookup all
together would be nice e.g. caching the list of all gem names + versions +
platform in 1 file cache and then expiring it on gem install or maybe even
check mtime of newest gemspec > mtime cache -> write new cache

On Tue, Nov 13, 2012 at 3:48 PM, Luis Lavena notifications@github.comwrote:

@evanphx https://github.com/evanphx stole my idea, I was just typing
something quite similar when his new comment showed up [image: 😉]

Not sure about opening the files and just reading one line will compare
with the performance of directory globing and string splitting, but gemspec
for platform specific gems already include the platform when installed:

C:\Users\Luis>gem list gem

gem-exefy (1.2.0 universal-mingw32)

irb(main):001:0> spec = Gem::Specification.find_by_name("gem-exefy")
=> #<Gem::Specification name=gem-exefy version=1.2.0>
irb(main):002:0> spec.spec_file
=> "C:/Users/Luis/Tools/Ruby/ruby-1.9.3-p327-i386-mingw32/lib/ruby/gems/1.9.1/specifications/gem-exefy-1.2.0-universal-mingw32.gemspec"
irb(main):003:0> File.basename(spec.spec_file)
=> "gem-exefy-1.2.0-universal-mingw32.gemspec"


Reply to this email directly or view it on GitHubhttps://github.com//pull/396#issuecomment-10349251.

@grosser
Copy link
Contributor Author

grosser commented Nov 14, 2012

maybe make md5 of list of all gemspec file names -> use it as cache

On Tue, Nov 13, 2012 at 8:35 PM, Michael Grosser
grosser.michael@gmail.comwrote:

on an ssd it would still be kind of fast
500 gemspecs:

  • glob: 0.002
  • reading first lines: 0.02
    so 5000 = 0.1

so a step in the right direction, but getting rid of this lookup all
together would be nice e.g. caching the list of all gem names + versions +
platform in 1 file cache and then expiring it on gem install or maybe even
check mtime of newest gemspec > mtime cache -> write new cache

On Tue, Nov 13, 2012 at 3:48 PM, Luis Lavena notifications@github.comwrote:

@evanphx https://github.com/evanphx stole my idea, I was just typing
something quite similar when his new comment showed up [image: 😉]

Not sure about opening the files and just reading one line will compare
with the performance of directory globing and string splitting, but gemspec
for platform specific gems already include the platform when installed:

C:\Users\Luis>gem list gem

gem-exefy (1.2.0 universal-mingw32)

irb(main):001:0> spec = Gem::Specification.find_by_name("gem-exefy")
=> #<Gem::Specification name=gem-exefy version=1.2.0>
irb(main):002:0> spec.spec_file
=> "C:/Users/Luis/Tools/Ruby/ruby-1.9.3-p327-i386-mingw32/lib/ruby/gems/1.9.1/specifications/gem-exefy-1.2.0-universal-mingw32.gemspec"
irb(main):003:0> File.basename(spec.spec_file)
=> "gem-exefy-1.2.0-universal-mingw32.gemspec"


Reply to this email directly or view it on GitHubhttps://github.com//pull/396#issuecomment-10349251.

@evanphx
Copy link
Member

evanphx commented Nov 14, 2012

The main issue with collecting all the info about the gems in a gemdir is the inherent concurrency problem of maintaining that index. It's trivial (and happens all the time) for a directory to be installed into at exactly the same time by 2 different users. We'd have to add a bunch of locking code, etc to deal with this case.

For that reason, exploring an index-less approach is easier to start with. If we decide that an indexed approach gets us enough additional performance to justify, we can discuss how we want to go about making that rock solid.

@grosser
Copy link
Contributor Author

grosser commented Nov 14, 2012

not sure why you would need locking when we simply e.g. grab the newest
.gemspec and use that timestamp for caching, but either way we can go with
the 'grab simple info from file' approach first

On Wed, Nov 14, 2012 at 7:39 AM, Evan Phoenix notifications@github.comwrote:

The main issue with collecting all the info about the gems in a gemdir is
the inherent concurrency problem of maintaining that index. It's trivial
(and happens all the time) for a directory to be installed into at exactly
the same time by 2 different users. We'd have to add a bunch of locking
code, etc to deal with this case.

For that reason, exploring an index-less approach is easier to start with.
If we decide that an indexed approach gets us enough additional performance
to justify, we can discuss how we want to go about making that rock solid.


Reply to this email directly or view it on GitHubhttps://github.com//pull/396#issuecomment-10370578.

@evanphx
Copy link
Member

evanphx commented Nov 14, 2012

Sounds good! Are you going to start coding it up? (I'm eager as well so I can help)

@grosser
Copy link
Contributor Author

grosser commented Nov 14, 2012

I'm 'at work' atm, so I can start this evening, if you got some time on
your hands and are eager to do it go ahead :)

On Wed, Nov 14, 2012 at 8:04 AM, Evan Phoenix notifications@github.comwrote:

Sounds good! Are you going to start coding it up? (I'm eager as well so I
can help)


Reply to this email directly or view it on GitHubhttps://github.com//pull/396#issuecomment-10371607.

@evanphx
Copy link
Member

evanphx commented Nov 14, 2012

Ok! If I find some time today I'll hack on it.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants