Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Major performance regression 3.7.4 -> 3.7.5 (18 seconds -> 1342 seconds) #864

Closed
benjie opened this Issue Jun 8, 2016 · 30 comments

Comments

Projects
None yet
3 participants
@benjie
Copy link

benjie commented Jun 8, 2016

On 3.7.1, 3.7.2, 3.7.4 our site generates in under 20 seconds:

Site compiled in 17.92s.

On 3.7.5:

Site compiled in 1341.51s.

(Most of the delay is before the first line of output after Compiling site... is output - I didn't watch for long enough to see it complete but it didn't output anything for at least 5 minutes.)

The only difference otherwise in the output is that 3.7.5 outputs a warning about halfway through logging the outputs; but that's much later than the delay seems to occur.

kramdown warning: No link definition for link ID 'due to change in future' found on line 9

How can I help debug this further?

@benjie

This comment has been minimized.

Copy link
Author

benjie commented Jun 8, 2016

(To be clear, I suspect the kramdown warning is a red herring)

@ddfreyne

This comment has been minimized.

Copy link
Member

ddfreyne commented Jun 8, 2016

Can you reproduce the problem by only changing the Nanoc version (not the version of other dependencies)?

Does this problem occur for an incremental compilation, or also for a from-scratch compilation (by executing rm -rf tmp before compiling)?

Does the issue also occur on Nanoc 3.8?

@ddfreyne

This comment has been minimized.

Copy link
Member

ddfreyne commented Jun 8, 2016

Can I access the source code for the site somewhere?

(I’ll be out for the next week, so I won’t be able to reply so quickly.)

@benjie

This comment has been minimized.

Copy link
Author

benjie commented Jun 9, 2016

Can you reproduce the problem by only changing the Nanoc version (not the version of other dependencies)?

Yes; this was the only thing I changed (was just changing the last digit in gem 'nanoc', "= 3.7.4" in my Gemfile and re-running bundle install between each test).

Does this problem occur for an incremental compilation, or also for a from-scratch compilation (by executing rm -rf tmp before compiling)?

3.7.4

From scratch: Site compiled in 18.42s.

Immediate rebuild (incremental): Site compiled in 17.31s.

3.7.5

From scratch: Site compiled in 1382.87s.

Immediate rebuild: got bored after 5 minutes and killed it. I expect it'll be the same as yesterdays rebuild.

Interestingly in the from-scratch build the create lines were output within a minute of starting but it hung at the end before confirming; whereas in the incremental one I did the other day it hung before the update lines (possibly and likely afterwards as well).

Does the issue also occur on Nanoc 3.8?

3.8.0

Errors for me, because it doesn't pull in the blank? helper from Rails:

Loading site data… done
Compiling site…
      create  [0.24s]  output/authentication/index.html

Captain! We’ve been hit!

Message:

NoMethodError: undefined method `blank?' for "collection":String

After fixing this by adding this to my Rules file:

require 'active_support'
require 'active_support/core_ext/object/blank'

we get the same as 3.7.5

From scratch: Site compiled in 1302.61s.

Immediate rebuild: got bored after a couple minutes

Can I access the source code for the site somewhere?

Alas, no. If I get a chance I'll try and produce a minimal test case that I can share though. It's likely to be a while though.

@benjie

This comment has been minimized.

Copy link
Author

benjie commented Jun 9, 2016

Update: turned out I hadn't killed the 3.8.0 immediate rebuild. Here's the full output:

$ bundle exec nanoc
Loading site data… done
Compiling site…
kramdown warning: No link definition for link ID 'due to change in future' found on line 9

Site compiled in 1285.06s.
@ddfreyne

This comment has been minimized.

Copy link
Member

ddfreyne commented Jun 19, 2016

Would you be able to run nanoc though a profiler, like stackprof, and send me the output? I’m a bit at a loss where to look for this slowdown.

@ddfreyne

This comment has been minimized.

Copy link
Member

ddfreyne commented Jun 28, 2016

Potential duplicate: #878

@ddfreyne

This comment has been minimized.

Copy link
Member

ddfreyne commented Jul 1, 2016

I’ve looked at the diff for 3.7.4 to 3.7.5 yet I’m not able to pinpoint a potential reason for why this is slower.

@ddfreyne

This comment has been minimized.

Copy link
Member

ddfreyne commented Jul 1, 2016

I’d still be interested in getting a stackprof profile, so that I can get a better understanding of where the slowness is coming from. I’ve written down a how-to guide below; it should work out of the box. Note that this approach will only work with Ruby 2.1 or higher.

Clone Nanoc and check out version 3.7.5:

% git clone git@github.com:nanoc/nanoc.git        
% cd nanoc
% git checkout 3.7.5

Edit bin/nanoc and wrap Nanoc::CLI.run(ARGV) in a Stackprof block. For example:

require 'stackprof'

StackProf.run(mode: :cpu, out: 'stackprof.dump') do
  Nanoc::CLI.run(ARGV)
end

In your Nanoc site, edit the Gemfile to point Nanoc to your locally modified copy, using the path: argument. Additionally, add stackprof to your Gemfile. For example:

gem 'nanoc', path: '/Users/denis/Downloads/nanoc'
gem 'stackprof', github: 'tmm1/stackprof'

In your Nanoc site, run bundle install, and bundle exec nanoc. If successful, a stackprof.dump file will have appeared in the working directory after compilation.

Run stackprof stackprof.dump --text --limit 50, and send me the output of that command.

@benjie

This comment has been minimized.

Copy link
Author

benjie commented Jul 1, 2016

On it. (Sorry; have been away 🇬🇧 ✈️ 🍁 )

@benjie

This comment has been minimized.

Copy link
Author

benjie commented Jul 1, 2016

Site compiled in 1138.05s.

Here's the output of stackprof stackprof.dump --text --limit 50:

==================================
  Mode: cpu(1000)
  Samples: 345159 (0.02% miss rate)
  GC: 1689 (0.49%)
==================================
     TOTAL    (pct)     SAMPLES    (pct)     FRAME
    327616  (94.9%)      327078  (94.8%)     Set#initialize_dup
      7274   (2.1%)        7274   (2.1%)     Set#add
     14493   (4.2%)         942   (0.3%)     Psych::Visitors::YAMLTree#accept
       824   (0.2%)         824   (0.2%)     Set#include?
   4210398 (1219.8%)         567   (0.2%)     Nanoc::Checksummer.update
       541   (0.2%)         541   (0.2%)     Nanoc::Item#hash
      1660   (0.5%)         380   (0.1%)     Haml::Buffer#attributes
       363   (0.1%)         363   (0.1%)     Psych::Nodes::Scalar#initialize
       363   (0.1%)         361   (0.1%)     Psych::ScalarScanner#tokenize
       285   (0.1%)         285   (0.1%)     Psych::Visitors::Emitter#visit_Psych_Nodes_Scalar
       254   (0.1%)         254   (0.1%)     Psych::Visitors::YAMLTree#binary?
       381   (0.1%)         241   (0.1%)     Haml::Util#inspect_obj
       941   (0.3%)         206   (0.1%)     Psych::Visitors::YAMLTree#visit_String
       534   (0.2%)         177   (0.1%)     Haml::Compiler.build_attributes
       168   (0.0%)         168   (0.0%)     #<Class:0x007f83c4574aa0>#children=
       388   (0.1%)         167   (0.0%)     Haml::Parser#parse_tag
       150   (0.0%)         150   (0.0%)     Haml::Options#[]
       155   (0.0%)         150   (0.0%)     Haml::Buffer.merge_attrs
      6447   (1.9%)         137   (0.0%)     Psych::Visitors::Visitor#visit
     14726   (4.3%)         137   (0.0%)     Set#merge
       134   (0.0%)         134   (0.0%)     Nanoc::DataSources::Filesystem#read
      6813   (2.0%)         132   (0.0%)     Haml::Engine#render
       177   (0.1%)         117   (0.0%)     Haml::Parser::Line#tabs
      1990   (0.6%)         115   (0.0%)     Nanoc::DataSources::Filesystem#load_objects
      2638   (0.8%)         107   (0.0%)     block (4 levels) in <main>
       101   (0.0%)         101   (0.0%)     Psych::Visitors::YAMLTree#find_ivars
        99   (0.0%)          99   (0.0%)     Nanoc::NotificationCenter.initialize_if_necessary
        88   (0.0%)          88   (0.0%)     Psych::Nodes::Node#initialize
       440   (0.1%)          84   (0.0%)     ActiveSupport::Dependencies::Loadable#require
      4131   (1.2%)          81   (0.0%)     Psych::Visitors::ToRuby#revive_hash
        73   (0.0%)          73   (0.0%)     Psych::Visitors::YAMLTree::Registrar#register
      3300   (1.0%)          71   (0.0%)     Haml::Compiler#compile_tag
        71   (0.0%)          71   (0.0%)     Haml::Util#balance
        71   (0.0%)          71   (0.0%)     Psych::Visitors::YAMLTree::Registrar#key?
      2748   (0.8%)          61   (0.0%)     block (3 levels) in <main>
        60   (0.0%)          60   (0.0%)     Haml::Helpers#html_escape
        75   (0.0%)          58   (0.0%)     Haml::Parser.parse_class_and_id
        64   (0.0%)          56   (0.0%)     Haml::Parser#next_line
        51   (0.0%)          51   (0.0%)     Haml::Helpers#preserve
        99   (0.0%)          50   (0.0%)     FileUtils::Entry_#copy_file
        49   (0.0%)          49   (0.0%)     block (4 levels) in <main>
       553   (0.2%)          48   (0.0%)     Haml::Compiler#flush_merged_text
        46   (0.0%)          46   (0.0%)     Haml::Parser#block_keyword
       409   (0.1%)          46   (0.0%)     Psych::TreeBuilder#scalar
       780   (0.2%)          44   (0.0%)     Psych::Visitors::Emitter#visit_Psych_Nodes_Document
        43   (0.0%)          43   (0.0%)     Psych::ClassLoader#load
        75   (0.0%)          43   (0.0%)     Haml::Compiler.filter_and_join
      7315   (2.1%)          40   (0.0%)     Set#do_with_enum
        37   (0.0%)          37   (0.0%)     Haml::Parser#initialize
       417   (0.1%)          36   (0.0%)     Nanoc::HashExtensions#freeze_recursively

@benjie

This comment has been minimized.

Copy link
Author

benjie commented Jul 1, 2016

3.7.4...3.7.5diff-13aaa7d235f22bd91fa101a8340767ff < Differences between 3.7.4 and 3.7.5 focussing on checksummer

@denis-soundcloud

This comment has been minimized.

Copy link
Contributor

denis-soundcloud commented Jul 1, 2016

Interesting! The Set#initialize_dup is unexpected. Could you share stackprof.dump so that I can do more detailed analysis on where this is coming from? Alternatively, can you send me the output of stackprof stackprof.dump --method 'Set#initialize_dup'?

@benjie

This comment has been minimized.

Copy link
Author

benjie commented Jul 1, 2016

Set#initialize_dup (/Users/benjiegillam/.rbenv/versions/2.3.1/lib/ruby/2.3.0/set.rb:106)
  samples:  327078 self (94.8%)  /   327616 total (94.9%)
  callers:
    327616  (  100.0%)  Set#|
  callees (538 total):
     538  (  100.0%)  Nanoc::Item#hash
  code:
                                  |   106  |   def initialize_dup(orig)
  378    (0.1%) /   378   (0.1%)  |   107  |     super
 327238   (94.8%) /  326700  (94.7%)  |   108  |     @hash = orig.instance_variable_get(:@hash).dup
                                  |   109  |   end
@denis-soundcloud

This comment has been minimized.

Copy link
Contributor

denis-soundcloud commented Jul 1, 2016

It looks like this originates from Set#+, which is used quite often in the checksummer. Perhaps using Hamster::Set would make this problem go away, as it’s a far more efficient way of dealing with immutable data.

@benjie

This comment has been minimized.

Copy link
Author

benjie commented Jul 1, 2016

I'm happy sticking to 3.7.4; so don't feel you need to fix it on my account 😃

@benjie

This comment has been minimized.

Copy link
Author

benjie commented Jul 1, 2016

As an aside, I'm running ruby 2.3.1 now, which seems to have sped this up about 15% over ruby 2.1.5 - from 1341.51s to 1138.05s.

@denis-soundcloud

This comment has been minimized.

Copy link
Contributor

denis-soundcloud commented Jul 1, 2016

Can you try adding hamster to your Gemfile, and replacing

def update(obj, digest, visited = Set.new)

with

def update(obj, digest, visited = Hamster::Set.new)

along with require 'hamster' at the top of the file? That might fix the slowdown.

@benjie

This comment has been minimized.

Copy link
Author

benjie commented Jul 1, 2016

Site compiled in 50.72s.

Significant improvement

@ddfreyne

This comment has been minimized.

Copy link
Member

ddfreyne commented Jul 1, 2016

Alright, then #518 is indeed the cause of the slowdown. I’m unsure how to improve the implementation, apart from using Hamster::Set (which makes much more sense).

Can you run the stackprof again, using the modified Nanoc this time? (50s isn’t quite the original 18s, so it’s worth investigating more closely.)

In the mean time, I’ll work on a PR that uses Hamster::Set (and thus adds Hamster as a dependency, but I believe that’s fine).

@ddfreyne

This comment has been minimized.

Copy link
Member

ddfreyne commented Jul 1, 2016

FYI, I’m going to apply the patch to Nanoc 4.x only, as the problem also exists there, and Nanoc 4.x is the recommended version to use, and the Nanoc 4 upgrade guide should be easy to follow.

@ddfreyne

This comment has been minimized.

Copy link
Member

ddfreyne commented Jul 1, 2016

Fix in #887.

@benjie

This comment has been minimized.

Copy link
Author

benjie commented Jul 2, 2016

==================================
  Mode: cpu(1000)
  Samples: 16527 (0.33% miss rate)
  GC: 1904 (11.52%)
==================================
     TOTAL    (pct)     SAMPLES    (pct)     FRAME
      5961  (36.1%)        5961  (36.1%)     Hamster::Trie#index_for
     15075  (91.2%)         948   (5.7%)     Psych::Visitors::YAMLTree#accept
      6873  (41.6%)         606   (3.7%)     Hamster::Set#union
       468   (2.8%)         468   (2.8%)     Psych::Nodes::Scalar#initialize
     11947  (72.3%)         437   (2.6%)     Hamster::Trie#bulk_put
       416   (2.5%)         412   (2.5%)     Psych::ScalarScanner#tokenize
      1455   (8.8%)         337   (2.0%)     Haml::Buffer#attributes
       289   (1.7%)         289   (1.7%)     Psych::Visitors::Emitter#visit_Psych_Nodes_Scalar
       289   (1.7%)         289   (1.7%)     Psych::Visitors::YAMLTree#binary?
     54169 (327.8%)         232   (1.4%)     Nanoc::Checksummer.update
      1002   (6.1%)         214   (1.3%)     Psych::Visitors::YAMLTree#visit_String
       327   (2.0%)         204   (1.2%)     Haml::Util#inspect_obj
      8425  (51.0%)         194   (1.2%)     Psych::Visitors::Visitor#visit
       468   (2.8%)         182   (1.1%)     Haml::Compiler.build_attributes
      6420  (38.8%)         162   (1.0%)     Psych::Visitors::ToRuby#revive_hash
       241   (1.5%)         162   (1.0%)     Haml::Parser::Line#tabs
       337   (2.0%)         158   (1.0%)     Haml::Parser#parse_tag
       133   (0.8%)         133   (0.8%)     Haml::Options#[]
       128   (0.8%)         128   (0.8%)     #<Class:0x007feba21f13d8>#children=
       130   (0.8%)         128   (0.8%)     Haml::Buffer.merge_attrs
      5897  (35.7%)         127   (0.8%)     Haml::Engine#render
       112   (0.7%)         112   (0.7%)     Nanoc::DataSources::Filesystem#read
      2659  (16.1%)         104   (0.6%)     Nanoc::DataSources::Filesystem#load_objects
      2742  (16.6%)         104   (0.6%)     block (4 levels) in <main>
       103   (0.6%)         103   (0.6%)     Psych::Visitors::YAMLTree#find_ivars
       472   (2.9%)          96   (0.6%)     ActiveSupport::Dependencies::Loadable#require
        92   (0.6%)          92   (0.6%)     Hamster::Trie#initialize
      2875  (17.4%)          87   (0.5%)     block (3 levels) in <main>
       552   (3.3%)          84   (0.5%)     Psych::TreeBuilder#scalar
        82   (0.5%)          82   (0.5%)     Psych::Nodes::Node#initialize
        81   (0.5%)          81   (0.5%)     block (2 levels) in <main>
        78   (0.5%)          78   (0.5%)     Psych::Visitors::YAMLTree::Registrar#register
        69   (0.4%)          69   (0.4%)     Psych::ClassLoader#load
      2872  (17.4%)          69   (0.4%)     Haml::Compiler#compile_tag
        69   (0.4%)          69   (0.4%)     Nanoc::NotificationCenter.initialize_if_necessary
       131   (0.8%)          67   (0.4%)     Hamster::Set.alloc
        66   (0.4%)          66   (0.4%)     Psych::Visitors::YAMLTree::Registrar#key?
       623   (3.8%)          59   (0.4%)     Nanoc::HashExtensions#symbolize_keys_recursively
        53   (0.3%)          53   (0.3%)     Haml::Util#balance
        63   (0.4%)          53   (0.3%)     Haml::Parser#next_line
       166   (1.0%)          49   (0.3%)     Nanoc::Context#initialize
        46   (0.3%)          46   (0.3%)     block (4 levels) in <main>
        41   (0.2%)          41   (0.2%)     Haml::Parser#block_keyword
        37   (0.2%)          37   (0.2%)     Haml::Helpers#preserve
        37   (0.2%)          37   (0.2%)     Haml::Helpers#html_escape
        47   (0.3%)          36   (0.2%)     Haml::Parser.parse_class_and_id
        59   (0.4%)          34   (0.2%)     Haml::Compiler.filter_and_join
        34   (0.2%)          34   (0.2%)     singleton class
        33   (0.2%)          33   (0.2%)     Psych::TreeBuilder#push
        96   (0.6%)          33   (0.2%)     Psych::Nodes::Mapping#initialize
@benjie

This comment has been minimized.

Copy link
Author

benjie commented Jul 2, 2016

In case it's any use: stackprof stackprof.dump --method 'Hamster::Trie#index_for':

Hamster::Trie#index_for (/Users/benjiegillam/.rbenv/versions/2.3.1/lib/ruby/gems/2.3.0/gems/hamster-3.0.0/lib/hamster/trie.rb:327)
  samples:  5961 self (36.1%)  /   5961 total (36.1%)
  callers:
    5127  (   86.0%)  Hamster::Trie#bulk_put
     420  (    7.0%)  Hamster::Trie#get
     384  (    6.4%)  Hamster::Trie#put!
      30  (    0.5%)  Hamster::Trie#put
  code:
                                  |   327  |     def index_for(key)
 5961   (36.1%) /  5961  (36.1%)  |   328  |       (key.hash.abs >> @significant_bits) & 31
                                  |   329  |     end
@ddfreyne

This comment has been minimized.

Copy link
Member

ddfreyne commented Jul 2, 2016

Looks like a significant amount of time is still spent in the checksummer. Does your site have items with lots of attributes (e.g. large YAML files)?

@ddfreyne

This comment has been minimized.

Copy link
Member

ddfreyne commented Jul 2, 2016

This will be faster in 4.2+, because the checksum is based off the unparsed item/layout attributes, and a checksum of a string is far faster than the checksum of a nested hash.

@ddfreyne

This comment has been minimized.

Copy link
Member

ddfreyne commented Jul 2, 2016

(FWIW, I don’t understand the percentages in stackprof’s output. 327.8% doesn’t seem possible, but at least the relative %s seem to be correct.)

Edit: tmm1/stackprof#9 explains the cause.

@benjie

This comment has been minimized.

Copy link
Author

benjie commented Jul 2, 2016

It's basically just large YAML files; good call :)

@benjie

This comment has been minimized.

Copy link
Author

benjie commented Jul 2, 2016

Upgraded to 4.2.2, just had to add string_pattern_type: legacy and identifier_type: legacy from the upgrade guide and the active support things referenced above; and...

Site compiled in 16.11s

@ddfreyne

This comment has been minimized.

Copy link
Member

ddfreyne commented Jul 2, 2016

Excellent—seems like a net win in the end! I’ll close this issue now.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
You can’t perform that action at this time.