Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Performance issue with large-ish records #20

Open
jkeck opened this issue Nov 6, 2015 · 4 comments
Open

Performance issue with large-ish records #20

jkeck opened this issue Nov 6, 2015 · 4 comments
Labels

Comments

@jkeck
Copy link
Contributor

jkeck commented Nov 6, 2015

(this may be specific to records w/ many subjects, which is the case in the example I'm using)

In looking into performance issues reported in METADOR-56 I have started doing a little profiling in the Mods/StanfordMods/ModsDisplay stack.

I think my profiling has indicated a pretty serious performance issue in the Mods gem when instantiating new records from large-ish record.

Example XML: https://purl.stanford.edu/pj169kw1971.mods (43.7 KB)
(This record has about 280 subject elements in it)

Simply instantiating a new Mods::Record via from_str takes more than 1 minute. Admittedly 43.7 KB is not that large of an XML file so I'm inclined to believe that something else is going awry somewhere here or in one of this gem's dependencies.

Here is my profiling run:

require 'ruby-prof'
require 'mods'
require 'faraday'

purl = 'https://purl.stanford.edu/pj169kw1971.mods'
mods_file = Faraday.get(purl).body

RubyProf.start
Mods::Record.new.from_str(mods_file)
result = RubyProf.stop

"Mods Version: #{Mods::VERSION}"
RubyProf::FlatPrinter.new(result).print(STDOUT)

Result

Mods Version: 2.0.3
Measure Mode: wall_time
Thread ID: 70336691692040
Fiber ID: 70336696331600
Total: 99.675561
Sort by: self_time

 %self      total      self      wait     child     calls  name
 65.64     71.831    65.424     0.000     6.407   440596   Nokogiri::XML::XPathContext#evaluate
  4.44      4.430     4.430     0.000     0.000   440596   <Class::Nokogiri::XML::XPathContext>#new
  2.31      7.090     2.302     0.000     4.788   440596   Nokogiri::XML::Searchable#extract_params
  1.60      1.598     1.598     0.000     0.000   479930   Module#extend_object
  1.50     91.875     1.497     0.000    90.378   440596   Nokogiri::XML::Searchable#xpath
  1.29     94.422     1.289     0.000    93.133   440596   Nom::XML::Term#nodes
  1.23      1.231     1.231     0.000     0.000   440596   String#gsub
  0.64      0.638     0.638     0.000     0.000   440596   Nokogiri::XML::XPathContext#register_ns
  0.64      3.263     0.638     0.000     2.626   440596   Enumerable#find
  0.62     95.248     0.617     0.000    94.631    17320   Hash#select
  0.60      0.601     0.601     0.000     0.000  2012169   Kernel#is_a?
  0.47      0.469     0.469     0.000     0.000   440624   Array#pop
  0.46      0.461     0.461     0.000     0.000   881390  *Nom::XML::Term#terminology
  0.44      0.437     0.437     0.000     0.000  1468075   Array#last
  0.37      0.923     0.372     0.000     0.550   440596   <Module::Nokogiri>#uses_libxml?
  0.33      0.327     0.326     0.000     0.000   568492  *Nokogiri::XML::Document#root
  0.30      0.299     0.299     0.000     0.000   881196   Array#include?
  0.29      0.292     0.292     0.000     0.000   440596   Nokogiri::VersionInfo#libxml2?
  0.29      0.286     0.286     0.000     0.000   440596   Nom::XML::Terminology#namespaces
  0.27      4.517     0.272     0.000     4.245   440596   Nokogiri::XML::XPathContext#register_namespaces
  0.27      0.266     0.266     0.000     0.000   440596   Array#reverse
  0.26      0.258     0.258     0.000     0.000   440596   <Class::Nokogiri::VersionInfo>#instance
  0.25      0.253     0.253     0.000     0.000   568493   Nokogiri::XML::Node#document
  0.23      0.233     0.229     0.000     0.003   440794  *Nom::XML::Term#xpath
  0.23      0.226     0.226     0.000     0.000   881271   Module#===
  0.23      0.225     0.225     0.000     0.000   885954   Kernel#class
  0.22     99.005     0.218     0.000    98.788  1080204  *Hash#each
  0.21      0.209     0.209     0.000     0.000   440596   Nokogiri::XML::NodeSet#include?
  0.20      0.200     0.200     0.000     0.000   475567   Module#extended
  0.20      0.411     0.195     0.000     0.216   127896   Nokogiri::XML::Node#==
  0.19      1.413     0.185     0.000     1.228    32905   Nokogiri::XML::Node#ancestors
  0.17      0.170     0.170     0.000     0.000   440645   Kernel#nil?
  0.14      0.144     0.144     0.000     0.000   440596   Array#first
  0.13      1.038     0.131     0.000     0.907   251429  *Nom::XML::Decorators::Terminology#respond_to?
  0.12      0.124     0.124     0.000     0.000   440596   String#to_s
  0.12      0.127     0.115     0.000     0.012   261932   Kernel#respond_to?
  0.10      0.594     0.100     0.000     0.494    32905   Nokogiri::XML::NodeSet#initialize
  0.10      0.209     0.095     0.000     0.114    98718   Nom::XML::Decorators::Terminology#root_terms
  0.09      0.093     0.093     0.000     0.000   255792   Nokogiri::XML::Node#pointer_id
  0.09      0.686     0.087     0.000     0.598    33318  *Class#new
  0.07      0.069     0.069     0.000     0.000    98718   Nom::XML::NokogiriExtension#terminology
  0.07      0.068     0.068     0.000     0.000   131623   Nokogiri::XML::Document#document
  0.04      0.038     0.038     0.000     0.000   122501   Nokogiri::XML::Node#parent
  0.04     96.794     0.035     0.000    96.758    29178  *Nom::XML::Decorators::Terminology#terms
  0.04      0.035     0.035     0.000     0.000    89596   Nokogiri::XML::NodeSet#<<
  0.03     99.156     0.032     0.000    99.124    29178  *Nom::XML::Decorators::Terminology#child_terms
  0.03     98.835     0.030     0.000    98.805   982662  *Array#each
  0.03     48.769     0.027     0.000    48.742     2701   Array#reject
  0.03     98.761     0.027     0.000    98.734   479930  *Kernel#extend
  0.03      0.053     0.026     0.000     0.027     3971   Array#flatten
  0.02     98.846     0.023     0.000    98.824   479295  *Nokogiri::XML::Document#decorate
  0.02      0.021     0.021     0.000     0.000     4762   Kernel#sprintf
  0.02     99.199     0.018     0.000    99.181   127896  *Nom::XML::Decorators::Terminology#term_accessors
  0.02      0.018     0.018     0.000     0.000     6025   String#sub
  0.02     80.508     0.017     0.000    80.491    93722  *Nokogiri::XML::NodeSet#[]
  0.02      0.038     0.016     0.000     0.022     2062   Nokogiri::XML::PP::CharacterData#inspect
  0.02      0.015     0.015     0.000     0.000    36003   Nokogiri::XML::NodeSet#length
  0.02      0.017     0.015     0.000     0.002    12307   Global#respond_to_without_terms?
  0.01      0.027     0.015     0.000     0.012     4068   Nom::XML::Term#respond_to?
  0.01      0.014     0.014     0.000     0.000    48986   Kernel#respond_to_missing?
  0.01      0.014     0.014     0.000     0.000    33296   Kernel#block_given?
  0.01      0.013     0.013     0.000     0.000    33937   Symbol#to_sym
  0.01      0.012     0.012     0.000     0.000        2   IO#write
  0.01      0.012     0.012     0.000     0.000     3978   Array#join
  0.01      0.012     0.012     0.000     0.000     5398   String#inspect
  0.01      0.040     0.010     0.000     0.030     2065   Nokogiri::XML::Node#children
  0.01      0.011     0.008     0.000     0.003     3727   Array#uniq
  0.01     98.686     0.008     0.000    98.678     4363  *Nom::XML::Decorators::Terminology#add_terminology_method_overrides!
  0.01      0.008     0.008     0.000     0.000     6025   Symbol#to_s
  0.01      0.039     0.007     0.000     0.032      625   Array#inspect
  0.01     98.692     0.007     0.000    98.686     4363  *<Module::Nom::XML::Decorators::Terminology>#extended
  0.01      0.006     0.006     0.000     0.000     4762   Module#name
  0.01      0.005     0.005     0.000     0.000     3332   Nokogiri::XML::Node#name
  0.00     30.441     0.004     0.000    30.437     1033   Enumerable#all?
  0.00      0.004     0.004     0.000     0.000     3971   Array#compact
  0.00     30.446     0.003     0.000    30.442     1033   Nom::XML::Decorators::NodeSet#respond_to?
  0.00      0.003     0.003     0.000     0.000     2062   Nokogiri::XML::Node#text
  0.00      0.005     0.003     0.000     0.002     4068   Nom::XML::Term#key?
  0.00      0.003     0.003     0.000     0.000     4604   Kernel#hash
  0.00     18.266     0.003     0.000    18.263     1657   Nokogiri::XML::Node#attribute_nodes
  0.00      0.003     0.003     0.000     0.000        1   <Class::Nokogiri::XML::Document>#read_memory
  0.00      0.002     0.002     0.000     0.000     1032   Nokogiri::XML::Node#inspect_attributes
  0.00      0.002     0.002     0.000     0.000     4457   Nom::XML::Term#in_edit_context?
  0.00      0.002     0.002     0.000     0.000     2066   Nokogiri::XML::Namespace#href
  0.00      0.002     0.002     0.000     0.000     2699   Nokogiri::XML::Node#namespace
  0.00      0.002     0.002     0.000     0.000     1033   Nokogiri::XML::Namespace#inspect_attributes
  0.00      0.002     0.002     0.000     0.000     1268   Nokogiri::XML::Node#value
  0.00      0.002     0.002     0.000     0.000     4068   Hash#key?
  0.00      0.001     0.001     0.000     0.000     4762   Kernel#object_id
  0.00      0.002     0.001     0.000     0.001     1033   Nokogiri::XML::NodeSet#empty?
  0.00      0.001     0.001     0.000     0.000      634   Nokogiri::XML::Attr#inspect_attributes
  0.00      0.002     0.001     0.000     0.001      244   Nom::XML::Term#local_xpath
  0.00      0.001     0.001     0.000     0.000        2   Readline#readline
  0.00      0.000     0.000     0.000     0.000     1034   Nokogiri::XML::Namespace#prefix
  0.00     99.651     0.000     0.000    99.650     1032  *Nokogiri::XML::NodeSet#inspect
  0.00      0.013     0.000     0.000     0.012        2   Kernel#printf
  0.00     99.651     0.000     0.000    99.651   443296  *Array#map
  0.00      0.001     0.000     0.000     0.000      737  *Nom::XML::Term#xmlns
  0.00      0.000     0.000     0.000     0.000       19   Array#&
  0.00      0.000     0.000     0.000     0.000      227   Kernel#lambda
  0.00     99.652     0.000     0.000    99.652        2   Kernel#inspect
  0.00      0.001     0.000     0.000     0.001       87   RubyLex#getc
  0.00      0.001     0.000     0.000     0.001        8   RubyLex#identify_identifier
  0.00      0.000     0.000     0.000     0.000      244   Nom::XML::Term#parent_xpath
  0.00     99.652     0.000     0.000    99.652     2700  *Nokogiri::XML::PP::Node#inspect
  0.00     99.655     0.000     0.000    99.654       29   Proc#call
  0.00      0.000     0.000     0.000     0.000        6   String#gsub!
  0.00      0.004     0.000     0.000     0.004      390  *Nom::XML::Term#in_edit_context
  0.00      0.007     0.000     0.000     0.007        4   Kernel#eval
  0.00      0.003     0.000     0.000     0.003       31  *IRB::SLex::Node#match_io
  0.00      0.000     0.000     0.000     0.000      244   NilClass#blank?
  0.00      0.001     0.000     0.000     0.001        2   IRB::ReadlineInputMethod#gets
  0.00      0.004     0.000     0.000     0.004      389  *Nom::XML::Term#initialize
  0.00      0.000     0.000     0.000     0.000       27   RubyLex#ungetc
  0.00      0.003     0.000     0.000     0.003       19   RubyLex#token
  0.00      0.001     0.000     0.000     0.001       21  *RubyToken#Token
  0.00      0.004     0.000     0.000     0.004      389  *Nom::XML::Term#method_missing
  0.00      0.004     0.000     0.000     0.004      389  *Nom::XML::Term#add_term
  0.00      0.000     0.000     0.000     0.000       88   Array#push
  0.00      0.000     0.000     0.000     0.000       96   Kernel#kind_of?
  0.00      0.000     0.000     0.000     0.000       87   Array#shift
  0.00      0.003     0.000     0.000     0.003       19   IRB::SLex#match
  0.00      0.000     0.000     0.000     0.000       70   <Class::RubyLex>#debug?
  0.00      0.000     0.000     0.000     0.000       50   String#concat
  0.00     99.675     0.000     0.000    99.675        1   RubyLex#each_top_level_statement
  0.00      0.000     0.000     0.000     0.000       19   RubyToken::Token#initialize
  0.00      0.000     0.000     0.000     0.000       19   Module#ancestors
  0.00      0.000     0.000     0.000     0.000       19   IRB::Notifier::LeveledNotifier#notify?
  0.00      0.000     0.000     0.000     0.000        8   IRB::Context#verbose?
  0.00      0.000     0.000     0.000     0.000       24   Regexp#===
  0.00     99.673     0.000     0.000    99.673        4   IRB::Irb#signal_status
  0.00      0.000     0.000     0.000     0.000        2   <Module::Readline>#input=
  0.00      0.000     0.000     0.000     0.000       27   RubyLex#peek
  0.00      0.000     0.000     0.000     0.000        6   RubyLex#prompt
  0.00      0.000     0.000     0.000     0.000       19   IRB::Notifier::LeveledNotifier#<=>
  0.00      0.000     0.000     0.000     0.000        6   Kernel#format
  0.00      0.003     0.000     0.000     0.003        2   RubyLex#lex
  0.00      0.000     0.000     0.000     0.000       19   String#==
  0.00     99.650     0.000     0.000    99.650    34970  *Integer#upto
  0.00      0.000     0.000     0.000     0.000        2   <Object::Object>#push
  0.00      0.000     0.000     0.000     0.000       46   Nom::XML::Terminology#terminology
  0.00      0.000     0.000     0.000     0.000       19   IRB::Notifier::AbstractNotifier#printf
  0.00      0.000     0.000     0.000     0.000        1   Nom::XML::NokogiriExtension#nom!
  0.00      0.003     0.000     0.000     0.003        1   <Class::Nokogiri::XML::Document>#parse
  0.00      0.000     0.000     0.000     0.000        6   IRB::Context#prompting?
  0.00      0.000     0.000     0.000     0.000        6   IRB::Irb#prompt
  0.00      0.000     0.000     0.000     0.000        8   RubyToken::TkId#initialize
  0.00      0.000     0.000     0.000     0.000        2   String#chars
  0.00      0.000     0.000     0.000     0.000       27   Array#unshift
  0.00      0.000     0.000     0.000     0.000        6   Kernel#dup
  0.00      0.000     0.000     0.000     0.000       12   RubyLex#getc_of_rests
  0.00      0.000     0.000     0.000     0.000        8   Nokogiri::XML::Document#decorators
  0.00      0.000     0.000     0.000     0.000        2   RubyLex#get_readed
  0.00      0.000     0.000     0.000     0.000        6   IO#tty?
  0.00      0.000     0.000     0.000     0.000        2   IRB::Context#set_last_value
  0.00      0.000     0.000     0.000     0.000        4   Array#rindex
  0.00      0.000     0.000     0.000     0.000       46   Nom::XML::Terminology#xpath
  0.00      0.000     0.000     0.000     0.000       29   NilClass#nil?
  0.00      0.000     0.000     0.000     0.000        2   <Module::Readline>#output=
  0.00      0.003     0.000     0.000     0.003        1   Mods::Reader#from_str
  0.00      0.004     0.000     0.000     0.004        1   Mods::Record#set_terminology_ns
  0.00      0.000     0.000     0.000     0.000        1   Hash#each_pair
  0.00      0.000     0.000     0.000     0.000       19   Comparable#>=
  0.00      0.001     0.000     0.000     0.001        2   RubyLex#buf_input
  0.00      0.000     0.000     0.000     0.000        2   IRB::ReadlineInputMethod#encoding
  0.00      0.000     0.000     0.000     0.000       13   String#[]
  0.00      0.004     0.000     0.000     0.004        1   Nom::XML::NokogiriExtension#set_terminology
  0.00     99.665     0.000     0.000    99.665        2   IRB::Irb#output_value
  0.00      0.000     0.000     0.000     0.000       19   Fixnum#<=>
  0.00     99.650     0.000     0.000    99.650    34970  *Nokogiri::XML::NodeSet#each
  0.00      0.007     0.000     0.000     0.007        2   IRB::Context#evaluate
  0.00      0.000     0.000     0.000     0.000        6   IRB::Context#main
  0.00      0.007     0.000     0.000     0.007        4   IRB::WorkSpace#evaluate
  0.00      0.007     0.000     0.000     0.007        1   Mods::Record#from_str
  0.00      0.000     0.000     0.000     0.000        6   <Object::Object>#to_s
  0.00      0.000     0.000     0.000     0.000        6   Kernel#initialize_dup
  0.00      0.000     0.000     0.000     0.000        2   Array#concat
  0.00     99.650     0.000     0.000    99.650    17716  *Enumerable#map
  0.00      0.000     0.000     0.000     0.000        1   Nokogiri::XML::Document#initialize
  0.00      0.000     0.000     0.000     0.000        6   String#initialize_copy
  0.00      0.004     0.000     0.000     0.004        1   Nom::XML::Terminology#initialize
  0.00      0.000     0.000     0.000     0.000        6   Fixnum#to_s
  0.00      0.000     0.000     0.000     0.000        1   Hash#initialize
  0.00      0.000     0.000     0.000     0.000        1   IRB::Irb#eval_input
  0.00      0.000     0.000     0.000     0.000        1   Nokogiri::XML::ParseOptions#strict?
  0.00      0.000     0.000     0.000     0.000        1   Mods::Record#initialize
  0.00      0.000     0.000     0.000     0.000        2   <Module::IRB>#conf
  0.00      0.000     0.000     0.000     0.000        1   Nokogiri::XML::ParseOptions#xinclude?
  0.00      0.000     0.000     0.000     0.000        2   <Module::IRB>#CurrentContext
  0.00      0.003     0.000     0.000     0.003        1   <Module::Nokogiri>#XML
  0.00      0.000     0.000     0.000     0.000        1   Nokogiri::XML::Node#decorate!
  0.00     99.652     0.000     0.000    99.652        2   IRB::Context#inspect_last_value
  0.00      0.000     0.000     0.000     0.000        8   Symbol#===
  0.00      0.000     0.000     0.000     0.000        2   String#force_encoding
  0.00      0.000     0.000     0.000     0.000        2   IO#external_encoding
  0.00      0.000     0.000     0.000     0.000        2   Kernel#untaint
  0.00      0.000     0.000     0.000     0.000        1   Nokogiri::XML::ParseOptions#initialize
  0.00      0.000     0.000     0.000     0.000        1   Mods::Reader#initialize
  0.00     99.652     0.000     0.000    99.652        2   IRB::Inspector#inspect_value
  0.00      0.000     0.000     0.000     0.000        2   Array#to_a
  0.00      0.000     0.000     0.000     0.000        1   Mods::Reader#normalize_mods
  0.00      0.000     0.000     0.000     0.000        2   Nokogiri::XML::Document#name
  0.00      0.000     0.000     0.000     0.000        2   Hash#[]
  0.00      0.000     0.000     0.000     0.000        1   String#encoding
  0.00      0.000     0.000     0.000     0.000        2   Fixnum#&
  0.00      0.000     0.000     0.000     0.000        1   Nokogiri::XML::Document#inspect_attributes
  0.00      0.000     0.000     0.000     0.000        1   <Class::Nokogiri::XML::Document>#empty_doc?
  0.00      0.000     0.000     0.000     0.000        1   Encoding#to_s
@jkeck jkeck added the bug label Nov 6, 2015
@cbeer
Copy link
Member

cbeer commented Nov 9, 2015

I think the profiling may be misleading -- it looks like the expensive part is calling #inspect on the record and spitting out the result. Can you try to refine the profiling to something that better represents real-world use?

require 'ruby-prof'
require 'mods'
require 'faraday'

purl = 'https://purl.stanford.edu/pj169kw1971.mods'
mods_file = Faraday.get(purl).body

result = RubyProf.profile do
  record = Mods::Record.new.from_str(mods_file)
  record.inspect; 1
end

"Mods Version: #{Mods::VERSION}"
RubyProf::FlatPrinter.new(result).print(STDOUT)

@jkeck
Copy link
Contributor Author

jkeck commented Nov 9, 2015

Perhaps. To be totally honest, I'm not a profiling expert. Maybe I should have just left that data out (because it is possibly misleading) but I figure somebody who is more facile with profiling could take this and run with it. I guess more than anything I just wanted to point out that this issue existed (I'll leave the data out next time).

I don't quite understand how calling Mods::Record.new isn't real-world use. It's what is described in the usage section on the README of the gem, but maybe I'm missing something.

That being said, simply calling Mods::Record.new with that MODS takes a very long time (> 1 minute). I thought that the fact that it took so long to instantiate the class was indicative of a problem (and adding other method calls to fetch data out of the MODS could cloud up the results of the original problem). I'll add a more basic example with simple timing.

require 'mods'
require 'faraday'

purl = 'https://purl.stanford.edu/pj169kw1971.mods'
mods_file = Faraday.get(purl).body

start_time = Time.now
  record = Mods::Record.new.from_str(mods_file)
  puts record.sort_title.inspect
end_time = Time.now
puts "We took #{end_time - start_time} seconds to get the sort_title"
We took 84.352694 seconds to get the sort_title

So in this case I want to get the sort title of this record. In order to do so it took more than 84 seconds.

I'll leave what to profile up to somebody who may be more knowledgable in that area (because clearly I am not).

@cbeer
Copy link
Member

cbeer commented Nov 9, 2015

Sorry -- what I mean is, Mods::Record.new takes almost no time at all (here, I'm seeing it take about 0.3s to run your code if we don't spit it out to the screen). It's where you're outputting the result for IRB that you're taking a very long time, and I'm guess /that's/ not how it's used in the real world. Instead, we'd be extracting many little bits of data.

@jkeck
Copy link
Contributor Author

jkeck commented Nov 9, 2015

So when you run my code example in #20 (comment) you're getting the sort_title output in 0.3s? I think that's a typical use case, store the record to a variable, call a method on that object to extract a piece of data (the sort_title). That data isn't particularly large so it should not be expensive to spit it out to IRB.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

2 participants