Skip to content

Loading…

improve performance of find_files_from_load_path #971

Open
wants to merge 1 commit into from

5 participants

@tenderlove

In order to support case-insensitive file systems, Dir#[] is slower in
trunk Ruby:

https://bugs.ruby-lang.org/issues/10015

This patch speeds up find_files_from_load_path a little on 2.1.2
(about 10%) and a lot on Ruby trunk (10x faster). Here is my benchmark:

require 'benchmark/ips'

class Hello
  def self.old_find_files_from_load_path glob # :nodoc:
    $LOAD_PATH.map { |load_path|
      Dir["#{File.expand_path glob, load_path}#{Gem.suffix_pattern}"]
    }.flatten.select { |file| File.file? file.untaint }
  end

  def self.new_find_files_from_load_path glob # :nodoc:
    search = "#{glob}#{Gem.suffix_pattern}"
    $LOAD_PATH.map { |load_path|
      Dir.chdir(load_path) {
        Dir[search].map! { |f| File.join load_path, f }
      }
    }.flatten.select { |file| File.file? file.untaint }
  end
end

search = "minitest/*_plugin.rb"

Benchmark.ips do |x|
  x.report('old') do
    Hello.old_find_files_from_load_path search
  end

  x.report('new') do
    Hello.new_find_files_from_load_path search
  end
end

Results on 2.1.2:

[aaron@higgins rubygems (master)]$ ruby -v -I lib flp.rb
ruby 2.1.2p95 (2014-05-08 revision 45877) [x86_64-darwin13.0]
Calculating -------------------------------------
                 old       283 i/100ms
                 new       304 i/100ms
-------------------------------------------------
                 old     2903.8 (±4.3%) i/s -      14716 in   5.078291s
                 new     3180.5 (±4.7%) i/s -      16112 in   5.078193s

Results against trunk:

[aaron@higgins rubygems (master)]$ ruby -v -I lib flp.rb
ruby 2.2.0dev (2014-07-09 trunk 46759) [x86_64-darwin13]
Calculating -------------------------------------
                 old         8 i/100ms
                 new        85 i/100ms
-------------------------------------------------
                 old       84.6 (±7.1%) i/s -        424 in   5.046520s
                 new      858.7 (±3.3%) i/s -       4335 in   5.053763s
@tenderlove tenderlove improve performance of find_files_from_load_path
In order to support case-insensitive file systems, Dir#[] is slower in
trunk Ruby:

  https://bugs.ruby-lang.org/issues/10015

This patch speeds up `find_files_from_load_path` a little on 2.1.2
(about 10%) and a lot on Ruby trunk (10x faster).  Here is my benchmark:

```ruby
require 'benchmark/ips'

class Hello
  def self.old_find_files_from_load_path glob # :nodoc:
    $LOAD_PATH.map { |load_path|
      Dir["#{File.expand_path glob, load_path}#{Gem.suffix_pattern}"]
    }.flatten.select { |file| File.file? file.untaint }
  end

  def self.new_find_files_from_load_path glob # :nodoc:
    search = "#{glob}#{Gem.suffix_pattern}"
    $LOAD_PATH.map { |load_path|
      Dir.chdir(load_path) {
        Dir[search].map! { |f| File.join load_path, f }
      }
    }.flatten.select { |file| File.file? file.untaint }
  end
end

search = "minitest/*_plugin.rb"

Benchmark.ips do |x|
  x.report('old') do
    Hello.old_find_files_from_load_path search
  end

  x.report('new') do
    Hello.new_find_files_from_load_path search
  end
end
```

Results on 2.1.2:

```
[aaron@higgins rubygems (master)]$ ruby -v -I lib flp.rb
ruby 2.1.2p95 (2014-05-08 revision 45877) [x86_64-darwin13.0]
Calculating -------------------------------------
                 old       283 i/100ms
                 new       304 i/100ms
-------------------------------------------------
                 old     2903.8 (±4.3%) i/s -      14716 in   5.078291s
                 new     3180.5 (±4.7%) i/s -      16112 in   5.078193s
```

Results against trunk:

```
[aaron@higgins rubygems (master)]$ ruby -v -I lib flp.rb
ruby 2.2.0dev (2014-07-09 trunk 46759) [x86_64-darwin13]
Calculating -------------------------------------
                 old         8 i/100ms
                 new        85 i/100ms
-------------------------------------------------
                 old       84.6 (±7.1%) i/s -        424 in   5.046520s
                 new      858.7 (±3.3%) i/s -       4335 in   5.053763s
```
7830c83
@evanphx
RubyGems member

We can't use Dir.chdir because it's process wide and thusly extremely rude to use in library code.

@tenderlove

Without the chdir, I'm unsure how to do this and make it faster. Any ideas? Here's a new benchmark:

require 'benchmark/ips'
require 'find'

class Hello
  def self.old_find_files_from_load_path glob # :nodoc:
    $LOAD_PATH.map { |load_path|
      Dir["#{File.expand_path glob, load_path}#{Gem.suffix_pattern}"]
    }.flatten.select { |file| File.file? file.untaint }
  end

  def self.chdir_find_files_from_load_path glob # :nodoc:
    search = glob + Gem.suffix_pattern
    $LOAD_PATH.map { |load_path|
      Dir.chdir(load_path) {
        Dir[search].map! { |f| File.join load_path, f }
      }
    }.flatten.select { |file| File.file? file.untaint }
  end

  def self.finder_find_files_from_load_path glob # :nodoc:
    files = []
    $LOAD_PATH.each { |load_path|
      pattern = "#{File.expand_path glob, load_path}#{Gem.suffix_pattern}"
      Find.find(load_path) do |file|
        files << file.untaint if File.fnmatch(pattern, file, File::FNM_EXTGLOB)
                                 File.file?(file)
      end
    }
    files
  end
end

search = "minitest/*_plugin.rb"

Benchmark.ips do |x|
  x.report('old')    { Hello.old_find_files_from_load_path search }
  x.report('chdir')  { Hello.chdir_find_files_from_load_path search }
  x.report('finder') { Hello.finder_find_files_from_load_path search }
end

Result:

[aaron@higgins rubygems (master)]$ ruby -v -I lib flp.rb 
ruby 2.2.0dev (2014-07-09 trunk 46759) [x86_64-darwin13]
Calculating -------------------------------------
                 old         8 i/100ms
               chdir        84 i/100ms
              finder         1 i/100ms
-------------------------------------------------
                 old       87.4 (±4.6%) i/s -        440 in   5.044883s
               chdir      842.8 (±5.6%) i/s -       4200 in   5.001708s
              finder       17.9 (±11.2%) i/s -         89 in   5.044909s
[aaron@higgins rubygems (master)]$ ruby -v -I lib flp.rb
ruby 2.1.2p95 (2014-05-08 revision 45877) [x86_64-darwin13.0]
Calculating -------------------------------------
                 old       280 i/100ms
               chdir       313 i/100ms
              finder         3 i/100ms
-------------------------------------------------
                 old     2999.0 (±4.9%) i/s -      15120 in   5.055591s
               chdir     3328.2 (±5.1%) i/s -      16589 in   5.000112s
              finder       36.9 (±5.4%) i/s -        186 in   5.061180s

I was thinking, since this method is just for finding files in installed gems, the Gem specs should have a list of the files for each gem. Could we just loop through those and run fnmatch against them? Then we don't have to do this FS traversal?

@evanphx
RubyGems member

find_files_from_load_path is used by our custom require, right? That code path (and that code path alone) might be ok to do a chdir in because the assumption is that there are no other threads to make a mess of. MRI already does this, using optimization for single thread cases by using rb_thread_alone(). Should we do the same by using Thread.list.size == 1, and in that case, use chdir?

@tenderlove

I wasn't actually seeing the find_files_from_load_path being called from the custom require. My profiling was pointing at Gem.find_files that minitest uses for finding plugins.

@drbrain
RubyGems member

The installed specs no longer contain the list of files because they take up memory that is otherwise useless to most users.

In the 1.8.7 (1.9.1?) timeframe I tried looking up files from the in-memory arrays instead of using the filesystem and the filesystem was significantly faster, at least for activation.

Now that RubyGems uses StubSpecifications which are tiny in comparison to the file list-free Specifications maybe the file list can be restored without impacting memory for activation purposes. It's less work to use the finder, though.

@tenderlove

@drbrain maybe we could keep the files list in a different place form the .gemspec and just load them when we need them? I'll have to test, but I suspect that a marshal load for each gemspec plus searching through that list would be faster than the glob in trunk.

One problem though, the original gemspec files list won't contain extensions, and this method will return extensions. How could we handle that? I suppose we could write out the files list + extensions to a cache after install (but I'm just tossing out ideas).

@drbrain
RubyGems member

Storing files separately would be fine along with creating the list post-build to have a complete cache.

@drbrain drbrain modified the milestone: 2.4, 2.5
@drbrain
RubyGems member

I implemented searching through a cached set of requirable files here:

https://gist.github.com/drbrain/3ea0befdadc187cf88dc

This is half as fast as the old Dir.glob method. I chose saving the files as a newline-separated list because loading it was slightly faster than Marshal and much faster than saving them ruby source code to send to eval.

It seems modifying this pull request to take a lock before Dir.chdir is the best way.

@drbrain drbrain modified the milestone: 2.5, 2.6
@krainboltgreene krainboltgreene added bug fix and removed bug report labels
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Commits on Jul 11, 2014
  1. @tenderlove

    improve performance of find_files_from_load_path

    tenderlove committed
    In order to support case-insensitive file systems, Dir#[] is slower in
    trunk Ruby:
    
      https://bugs.ruby-lang.org/issues/10015
    
    This patch speeds up `find_files_from_load_path` a little on 2.1.2
    (about 10%) and a lot on Ruby trunk (10x faster).  Here is my benchmark:
    
    ```ruby
    require 'benchmark/ips'
    
    class Hello
      def self.old_find_files_from_load_path glob # :nodoc:
        $LOAD_PATH.map { |load_path|
          Dir["#{File.expand_path glob, load_path}#{Gem.suffix_pattern}"]
        }.flatten.select { |file| File.file? file.untaint }
      end
    
      def self.new_find_files_from_load_path glob # :nodoc:
        search = "#{glob}#{Gem.suffix_pattern}"
        $LOAD_PATH.map { |load_path|
          Dir.chdir(load_path) {
            Dir[search].map! { |f| File.join load_path, f }
          }
        }.flatten.select { |file| File.file? file.untaint }
      end
    end
    
    search = "minitest/*_plugin.rb"
    
    Benchmark.ips do |x|
      x.report('old') do
        Hello.old_find_files_from_load_path search
      end
    
      x.report('new') do
        Hello.new_find_files_from_load_path search
      end
    end
    ```
    
    Results on 2.1.2:
    
    ```
    [aaron@higgins rubygems (master)]$ ruby -v -I lib flp.rb
    ruby 2.1.2p95 (2014-05-08 revision 45877) [x86_64-darwin13.0]
    Calculating -------------------------------------
                     old       283 i/100ms
                     new       304 i/100ms
    -------------------------------------------------
                     old     2903.8 (±4.3%) i/s -      14716 in   5.078291s
                     new     3180.5 (±4.7%) i/s -      16112 in   5.078193s
    ```
    
    Results against trunk:
    
    ```
    [aaron@higgins rubygems (master)]$ ruby -v -I lib flp.rb
    ruby 2.2.0dev (2014-07-09 trunk 46759) [x86_64-darwin13]
    Calculating -------------------------------------
                     old         8 i/100ms
                     new        85 i/100ms
    -------------------------------------------------
                     old       84.6 (±7.1%) i/s -        424 in   5.046520s
                     new      858.7 (±3.3%) i/s -       4335 in   5.053763s
    ```
Showing with 4 additions and 1 deletion.
  1. +4 −1 lib/rubygems.rb
View
5 lib/rubygems.rb
@@ -439,8 +439,11 @@ def self.find_files(glob, check_load_path=true)
end
def self.find_files_from_load_path glob # :nodoc:
+ search = glob + Gem.suffix_pattern
$LOAD_PATH.map { |load_path|
- Dir["#{File.expand_path glob, load_path}#{Gem.suffix_pattern}"]
+ Dir.chdir(load_path) {
+ Dir[search].map! { |f| File.join load_path, f }
+ }
}.flatten.select { |file| File.file? file.untaint }
end
Something went wrong with that request. Please try again.