Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Performance issues with require_relative #5618

Open
headius opened this Issue Feb 14, 2019 · 10 comments

Comments

Projects
None yet
3 participants
@headius
Copy link
Member

headius commented Feb 14, 2019

Kernel#require_relative has become more and more popular since its introduction, so we need to take another look at our impl and reduce some overhead.

  • Currently implemented in Ruby. This is probably not a large part of its overhead.
  • Uses Kernel#caller to get access to the caller's frame, from which it gets the caller's filename. This is an expensive operation.
  • Uses File.realpath to determine the path of the caller's frame's file, which also appears to be a bottleneck

I will be preparing a PR with various fixes, primarily to eliminate the use of caller.

@headius headius added this to the JRuby 9.2.7.0 milestone Feb 14, 2019

@headius

This comment has been minimized.

Copy link
Member Author

headius commented Feb 15, 2019

The caller issue can be remedied by moving require_relative into a Java method that reads=SCOPE, and then get the filename from the caller's scope. I have a patch in progress.

The other bottleneck, File.realpath, appears to be a more complicated issue. We started using realpath where MRI uses rb_file_absolute_path because we needed to be able to support URLs like classloader resources. The overhead of this logic appears to be where much of the asciidoctor issues lie.

@headius

This comment has been minimized.

Copy link
Member Author

headius commented Feb 15, 2019

@mkristian @kares Hey can you give me some help on the realpath issue? I'm merging a branch to master that resolves the caller situation, but the logic for realpath seems to be extremely heavy right now. I think in the asciidoctor/asciidoctor-diagram#214 case they probably do need the classloader URL expansion to work, so we may need to find ways to make that cheaper.

@mojavelinux @Mogztter You can try a master build of JRuby after my branch lands. Is that repo mentioned in asciidoctor/asciidoctor-diagram#214 all we would need to do to profile this locally?

@headius

This comment has been minimized.

Copy link
Member Author

headius commented Feb 15, 2019

Branch merged as of 6a62520.

require_relative no longer depends on caller, but will force the caller to push a heap scope.

The realpath issues will need more investigation.

@Mogztter

This comment has been minimized.

Copy link
Contributor

Mogztter commented Feb 15, 2019

Thanks @headius I will give it a try this afternoon.

Is that repo mentioned in asciidoctor/asciidoctor-diagram#214 all we would need to do to profile this locally?

Yes, you can use https://github.com/SevereOverfl0w/asciidoctor-diagram-slow-repro to profile locally 👍

@headius

This comment has been minimized.

Copy link
Member Author

headius commented Feb 26, 2019

I believe the remaining issues are simple to explain and tricky to fix...

The basic problem is that we have no good cache from a require path to the actual filesystem path that it required, so a very heavy part of the require_relative logic (the File.realpath call, which is especially heavy in jruby-complete.jar) is repeated for every require_relative, regardless of whether it has already been loaded.

The next big project may have to be a refit of the load logic, so we can handle these cases more efficiently.

@headius

This comment has been minimized.

Copy link
Member Author

headius commented Mar 7, 2019

I've made a truly shocking discovery.

Here's a few benchmarks comparing require_relative to require.

This first one loads the file in question using require_relative before the benchmark, and then it unshifts the current dir into the load path. Surprisingly, require is slower by 30% or so.

Warming up --------------------------------------
    require_relative     7.701k i/100ms
             require     5.477k i/100ms
Calculating -------------------------------------
    require_relative     98.306k (± 2.9%) i/s -    492.864k in   5.018031s
             require     63.069k (± 2.9%) i/s -    317.666k in   5.041468s

This second benchmark instead appends the current dir to load path. By moving the current dir to the end of the load path, require is now slower by almost a factor of 4.

Warming up --------------------------------------
    require_relative     7.723k i/100ms
             require     2.595k i/100ms
Calculating -------------------------------------
    require_relative     95.835k (± 1.8%) i/s -    478.826k in   4.998045s
             require     27.795k (± 3.5%) i/s -    140.130k in   5.048528s

Now we flip the script, and instead use require to load the initial file, first with the current dir unshifted:

Warming up --------------------------------------
    require_relative     3.510k i/100ms
             require    58.437k i/100ms
Calculating -------------------------------------
    require_relative     41.394k (± 2.0%) i/s -    207.090k in   5.005056s
             require    936.902k (± 4.5%) i/s -      4.675M in   5.001518s

And finally, with the current dir appended to load path:

Warming up --------------------------------------
    require_relative     3.309k i/100ms
             require    55.180k i/100ms
Calculating -------------------------------------
    require_relative     39.489k (± 2.0%) i/s -    198.540k in   5.029843s
             require    903.996k (± 4.2%) i/s -      4.525M in   5.015362s

I have learned a few things from this experiment:

  • require_relative and require of the same file do not work well together. If the former comes first, the latter is slower, and vice-versa. In the case of require_relative comes first, require of the same file is massively impaired. I don't know if mixing these is a common pattern.
  • The best performance for require_relative is orders of magnitude slower than the best performance for require.
  • A sampled profile of the benchmarks showed that a large percentage of the samples occurred during a downcall to java.io.UnixFileSystem.getBooleanAttributes0, which is a native downcall that basically does a stat of a given path. When require_relative comes first, this is over 80% of the samples. When require comes first, it's nearly 50% of the samples.

This confirms a few things for me:

  • require_relative has poor performance characteristics regardless of the situation.
  • require has reasonably good performance when standalone, but degrades heavily if require_relative is used with the same files.
  • The remaining overhead in these methods appears to be accessing the filesystem, probably because...
  • ...loaded feature and load path caching are not working, or at least not working very well.

Keep in mind that I ran these on my MBP, which has an SSD. On a system with slower filesystem access (like, say EVERY VM and container system out there) this will be even more extreme. This is likely a major component of our slow startup for large applications.

These results mean I'll be looking into a refit of the load logic sooner rather than later.

Here's MRI for comparison. It doesn't matter which version of the benchmark it is, because MRI does not continue searching the file system for a loaded feature. require_relative is likely slower because they still need to manipulate the path captured from the caller's file, which would require at least one stat, while require loaded feature and load path caching means it's just a couple hash lookups.

Warming up --------------------------------------
    require_relative    34.149k i/100ms
             require    77.995k i/100ms
Calculating -------------------------------------
    require_relative    387.837k (± 2.4%) i/s -      1.946M in   5.021826s
             require      1.065M (± 1.8%) i/s -      5.382M in   5.053533s

The benchmark in question, with require_relative and unshifted load path:

require 'benchmark/ips'

require_relative 'file_for_require_bench'

$:.unshift(Dir.pwd)

Benchmark.ips do |bm|
  bm.warmup = 10
  bm.report("require_relative") do |n|
    while n > 0
      require_relative 'file_for_require_bench'
      n-=1
    end
  end
  bm.report("require") do |n|
    while n > 0
      require 'file_for_require_bench'
      n-=1
    end
  end
end

cc @enebo

@headius

This comment has been minimized.

Copy link
Member Author

headius commented Mar 7, 2019

Note that one big change in Ruby 2.6 stdlib is to use require_relative for all requires that can be done that way.

Also keep in mind that the above results would only affect an application that's lazily doing require or require_relative along a hot path. That appears to be the case for at least asciidoctor.

AND keep in mind that this will be far slower when done against classloader resources, as in asciidoctorj, since classloader resource searching and URI juggling is much more overhead than simply accessing files on disk.

@headius

This comment has been minimized.

Copy link
Member Author

headius commented Mar 7, 2019

Here's results for require_relative when all the files are inside a jar.

    require_relative     1.278k i/100ms
Calculating -------------------------------------
    require_relative     14.529k (± 4.3%) i/s -     72.846k in   5.024407s

So loading from the classloader reduces require_relative performance to something like 1/6 of what it is for direct filesystem (on SSD).

@mojavelinux

This comment has been minimized.

Copy link

mojavelinux commented Mar 14, 2019

That appears to be the case for at least asciidoctor.

Was the case for Asciidoctor Diagram. Asciidoctor core (and many of the libraries) use guards (checking if the symbol is already available).

@Mogztter

This comment has been minimized.

Copy link
Contributor

Mogztter commented Mar 14, 2019

In Asciidoctor Diagram, we were using require_relative in a clojure (and all the files are inside a jar).

Asciidoctor::Extensions.register do
  require_relative 'plantuml/extension'
end

As mentioned by Charles, a very heavy part of the require_relative logic (the File.realpath call, which is especially heavy in jruby-complete.jar) is repeated for every require_relative.
Since the register block is called for every conversion, it hits the performance of Asciidoctor Diagram.

@headius headius modified the milestones: JRuby 9.2.7.0, JRuby 9.2.8.0 Apr 8, 2019

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
You can’t perform that action at this time.