explore: optimize Node#at_css and #at_xpath #2213

Closed
flavorjones opened this issue Mar 29, 2021 · 3 comments

@flavorjones
Member

Currently, #at_css and #at_xpath execute the entire XPath query, collect every match, create the NodeSet, and wrap each result as a Ruby object before discarding all but the first result.

It should be possible to optimize this, both at the XPath layer and while marshalling results.

At the XPath layer, let's play with variations of (original-query)[1]

At the marshalling layer, let's discard the NodeSet and just return the single Ruby object.

@flavorjones
Member Author

Holy cow, I just discovered that libxml2 will automatically try to optimize any expression of the form (...)[1]:

#! /usr/bin/env ruby

require "bundler/inline"

gemfile do
  source "https://rubygems.org"
  gem "nokogiri", path: "."
  gem "benchmark-ips"
end

xml = "<root>" +
      (1..2000).map { |i| "<item>#{i}</item>" }.join +
      "</root>"
doc = Nokogiri::XML(xml)

Benchmark.ips do |x|
  x.report("optimized xpath") do
    result = doc.xpath("(//item)[1]")
    raise "unexpected result" unless result.size == 1
  end

  x.report("unoptimized xpath") do
    result = doc.xpath("//item[1]")
    raise "unexpected result" unless result.size == 1
  end

  x.compare!
end

reports

Warming up --------------------------------------
     optimized xpath     1.808k i/100ms
   unoptimized xpath     1.425k i/100ms
Calculating -------------------------------------
     optimized xpath     18.001k (± 1.6%) i/s -     90.400k in   5.023099s
   unoptimized xpath     13.702k (± 5.4%) i/s -     68.400k in   5.008044s

Comparison:
     optimized xpath:    18001.3 i/s
   unoptimized xpath:    13702.3 i/s - 1.31x  (± 0.00) slower

Seems like, other than having to deal with the crappy positional arguments for all the methods in Searchable, this should be an easy way to speed up at_css and at_xpath.

@flavorjones
Member Author

Hmm, actually this may not be as big a win as I thought, for some reason. Comparing the wrapped query against at_xpath directly:

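#! /usr/bin/env ruby

# Same bundler/inline setup as the first script.
require "bundler/inline"

gemfile do
  source "https://rubygems.org"
  gem "nokogiri", path: "."
  gem "benchmark-ips"
end

xml = "<root>" +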
      (1..2000).map { |i| "<item>#{i}</item>" }.join +
      "</root>"
doc = Nokogiri::XML(xml)

Benchmark.ips do |x|
  x.report("optimized xpath") do
    doc.xpath("(//item)[1]")
  end

  x.report("unoptimized xpath") do
    doc.at_xpath("//item")
  end

  x.compare!
end

reports

Warming up --------------------------------------
     optimized xpath     1.547k i/100ms
   unoptimized xpath     1.556k i/100ms
Calculating -------------------------------------
     optimized xpath     16.996k (± 1.9%) i/s -     85.085k in   5.008089s
   unoptimized xpath     16.038k (± 1.7%) i/s -     80.912k in   5.046467s

Comparison:
     optimized xpath:    16995.7 i/s
   unoptimized xpath:    16037.9 i/s - 1.06x  (± 0.00) slower

Weird.

@flavorjones
Member Author

OK, this has something to do with how expensive the node's context position is to calculate. If I replace //item with /root/item, the benchmark shows a slightly larger speedup:

Warming up --------------------------------------
     optimized xpath     2.396k i/100ms
   unoptimized xpath     2.192k i/100ms
Calculating -------------------------------------
     optimized xpath     23.673k (± 2.0%) i/s -    119.800k in   5.062453s
   unoptimized xpath     21.616k (± 3.5%) i/s -    109.600k in   5.076759s

Comparison:
     optimized xpath:    23673.3 i/s
   unoptimized xpath:    21616.3 i/s - 1.10x  (± 0.00) slower

Shrug, I'm not sure it's worth making this part of at_css and at_xpath ... wrapping the query in (...)[1] doesn't seem like an automatic win.
