New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Centralize require statements #6910

Merged
merged 1 commit into from Apr 10, 2018

Conversation

Projects
None yet
7 participants
@ashmaroli
Member

ashmaroli commented Apr 6, 2018

With every call to require, Ruby has to check through the $LOADED_FEATURES array to see if the requested file has already been loaded beforehand. Therefore, multiple require calls to the same file have an overall negative effect on performance.

This patch removes the multiple require calls (esp. from within frequently called instance methods) and adds them to lib/jekyll.rb

Also includes a benchmark script to compare require calls per file-load and per method-call.

@ashmaroli ashmaroli requested review from pathawks and parkr Apr 6, 2018

@ashmaroli ashmaroli requested a review from jekyll/stability Apr 6, 2018

@@ -1,9 +1,5 @@
# frozen_string_literal: true
require "addressable/uri"
require "json"
require "liquid"

This comment has been minimized.

@pathawks

pathawks Apr 6, 2018

Member

I can see how removing require from a frequently called method could lead to performance improvements, but I can’t see how moving these static requires to a different file could result in any meaningful difference (even considering deduplication). I worry that it obfuscates why we are requiring some things if the require is no where near where we are using it.

Can you explain a bit more of the motivation behind this change?

This comment has been minimized.

@ashmaroli

ashmaroli Apr 6, 2018

Member

Understandably, Jekyll::Filters is tightly coupled with lib/jekyll.rb — one cannot simply require 'jekyll/filters' without require 'jekyll' first. i.e. lib/jekyll.rb has to be loaded first.

Loading lib/jekyll.rb will require among others:

  • liquid (already does so in master),
  • addressable/uri (a change made in this PR itself — this module is necessary in both Jekyll::Filters and Jekyll::URL) and
  • json (another change made in this PR — it is necessary in Jekyll::Filters, serve/live_reload_reactor.rb, and in jekyll/drops)

So, as part of removing duplicate require calls in this PR, I removed these static requires.

@ashmaroli ashmaroli added this to the v3.8.0 milestone Apr 9, 2018

@parkr

This comment has been minimized.

Member

parkr commented Apr 9, 2018

Can you explain how this speeds things up? require is cached heavily so these calls are quite cheap once they have happened once. Do you have a flame graph or profile which shows how much time this saves on a standard site? My guess, as Pat mentioned, is that unless it’s in a method called a whole awful lot, we won’t see any improvement from this.

@ashmaroli

This comment has been minimized.

Member

ashmaroli commented Apr 10, 2018

@parkr This "optimization" is not about speeding up the build process.
It's about optimization in the general sense.

The rationale behind this PR is simple:

  • Every call to require is going to tell the interpreter to iterate through the global $LOADED_FEATURES array (which is simply going to increase in size with every successful call to a file, and especially when its a 3rd-party dependency library), check if the said file is listed in the array, and then proceed to "load the file" or "return false" as per the case.
  • Yes, Array#include? is optimized at C-level and is fast for small-medium arrays, its still an O(n) call and this is an "improvement" we can make without any unwanted side-effects since lib/jekyll.rb loads every other *.rb file sooner or later.
@oe

This comment has been minimized.

Member

oe commented Apr 10, 2018

I agree that it might be helpful to centralize require calls at the top-level, I'm just worried that we might be ripping some call that was put at its respective level intentionally out of its context by doing this. @ashmaroli, have you blamed some of these calls to find out their context?

@oe

oe approved these changes Apr 10, 2018

That being said, my concerns shouldn't stop this from landing!

@ashmaroli

This comment has been minimized.

Member

ashmaroli commented Apr 10, 2018

worried that we might be ripping some call that was put at its respective level intentionally

Fortunately, all of the files in the diff use at least one call to the Jekyll module either directly or indirectly via Jekyll's classes.. so the interpreter is going to error-out if one tries to load (or require) any of these files without requiring lib/jekyll.rb first..
..and simply requiring the main library file first is going to satisfy the external dependencies used in the files in the diff here.

have you blamed some of these calls..?

No, I haven't. But I'm guessing, they were probably added for historical reasons when those files were created and tested lexically.. (e.g. ruby test/test_filters.rb -r ./lib/jekyll/filters.rb) where the test/helper.b would load ./lib/jekyll.rb before the test-file is loaded into memory.

@DirtyF

This comment has been minimized.

Member

DirtyF commented Apr 10, 2018

This "optimization" is not about speeding up the build process. It's about optimization in the general sense.

Is it a common practice in Ruby? I'm more concerned about the developer experience here.
It's easier to understand what libraries are used in a specific file, no?

@ashmaroli

This comment has been minimized.

Member

ashmaroli commented Apr 10, 2018

Is it a common practice in Ruby? I'm more concerned about the developer experience here.

Its tough to answer that because there is no definite best-practices in Ruby. Every developer chooses a style they prefer.. Even Rubocop doesn't seem to care about it..

Looking into our own backyard, the main entry file ./lib/jekyll.rb (on master) loads external libs even if they're not used within the ./lib/jekyll.rb file.

jekyll/lib/jekyll.rb

Lines 20 to 35 in a27d8fa

# stdlib
require "pathutil"
require "forwardable"
require "fileutils"
require "time"
require "English"
require "pathname"
require "logger"
require "set"
# 3rd party
require "safe_yaml/load"
require "liquid"
require "kramdown"
require "colorator"
require "i18n"

Liquid, Kramdown, I18n, etc come into play much later after the Jekyll::Site has been initialized.
The argument that Liquid is our sole Template library and Kramdown is our default Converter library wouldn't have strength. The more reasonable argument would be "..so that those libraries can be used elsewhere in Jekyllverse easily..!"

That is all this PR is secondarily doing. JSON, Addressable::URI, are used in multiple methods in multiple files (files that're central to a site's build process).

It's easier to understand what libraries are used in a specific file, no?

I don't see value in that, since the developer would not be able to load the file individually (without loading ./lib/jekyll.rb first)

@pathawks pathawks changed the title from Optimize loading library into memory to Centralize require statements Apr 10, 2018

@pathawks

This comment has been minimized.

Member

pathawks commented Apr 10, 2018

@jekyllbot: merge +dev

@jekyllbot jekyllbot merged commit 36fbcaa into jekyll:master Apr 10, 2018

3 checks passed

WIP ready for review
Details
continuous-integration/appveyor/pr AppVeyor build succeeded
Details
continuous-integration/travis-ci/pr The Travis CI build passed
Details

@ashmaroli ashmaroli deleted the ashmaroli:reduce-requires branch Apr 10, 2018

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment