Replace xmlhash with Nokogiri #10085
On top of parsing XML, we sometimes need to convert an XML document to a hash. To do this, I see two possible solutions, which need to be benchmarked for performance:
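Option 1, ActiveSupport's Nokogiri backend:

```ruby
# Setup in an initializer under `src/api/config/initializers`
ActiveSupport::XmlMini.backend = 'Nokogiri'

# Usage example (ActiveSupport's Nokogiri backend adds `to_hash` to Nokogiri documents)
my_hash = Nokogiri::XML('<some><thing></thing></some>').to_hash
```

Option 2, ActiveSupport's LibXML backend (needs the `libxml-ruby` gem):

```ruby
# Setup in an initializer under `src/api/config/initializers`
ActiveSupport::XmlMini.backend = 'LibXML'

# Usage example
my_hash = Hash.from_xml('<some><thing></thing></some>')
```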
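To make the target hash shape concrete, here is a simplified sketch of such a conversion built on REXML (ActiveSupport's default XmlMini backend), so it needs no extra gems. It is loosely modeled on the conventions of ActiveSupport's Nokogiri backend: child elements become nested hashes keyed by tag name, repeated tags are collected into an Array, text goes under `"__content__"`, and attributes become plain keys. `xml_to_hash` and `element_to_hash` are hypothetical names; this is not ActiveSupport's actual code.

```ruby
require 'rexml/document'

# Simplified sketch (not ActiveSupport's implementation) of an XmlMini-style
# XML-to-hash conversion. Adds `element` to `parent` and returns `parent`.
def element_to_hash(element, parent = {})
  node = {}
  # A second tag with the same name turns the entry into an Array.
  case parent[element.name]
  when Array then parent[element.name] << node
  when Hash  then parent[element.name] = [parent[element.name], node]
  else            parent[element.name] = node
  end

  element.each_child do |child|
    if child.is_a?(REXML::Element)
      element_to_hash(child, node)
    elsif child.is_a?(REXML::Text) # CDATA is a Text subclass, so it lands here too
      (node['__content__'] ||= +'') << child.value
    end
  end

  # Drop whitespace-only text when the node also has child elements.
  node.delete('__content__') if node.length > 1 && node['__content__'].to_s.strip.empty?

  # Attributes become plain keys next to the children.
  element.attributes.each { |name, value| node[name] = value }
  parent
end

def xml_to_hash(xml)
  element_to_hash(REXML::Document.new(xml).root)
end

p xml_to_hash('<some><thing></thing></some>')
p xml_to_hash('<a x="1"><b>hi</b><b>yo</b></a>')
```

For the example document used above, the sketch yields `{"some" => {"thing" => {}}}`. `Hash.from_xml` post-processes the backend's hash further (key normalization, type casting), so its result can differ from what the backend itself returns.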
Here's a benchmark:

```ruby
# 1. Add `libxml-ruby` and `ox` to the Gemfile, then run `bundle install`
# 2. osc api /source/openSUSE:Factory:Staging/dashboard/_history > src/api/history.xml
# 3. Put this file under `src/api/` and run the benchmarks with `ruby src/api/this_file.rb`
require 'benchmark'
require_relative './config/environment'
require 'ox'

# Resolve history.xml relative to this script, not the current working directory
xml = File.read(File.join(__dir__, 'history.xml'))

# Each XmlMini backend is selected before its report, so loading the backend is not timed,
# and all four are measured through ActiveSupport::XmlMini.parse. Nokogiri::XML(xml).to_hash
# always uses the Nokogiri DOM parser whatever backend is selected, and Hash.from_xml adds
# ActiveSupport's key normalization and type casting on top of the backend.
Benchmark.bm do |benchmark|
  ActiveSupport::XmlMini.backend = 'Nokogiri'
  benchmark.report('Nokogiri') do
    10.times do
      ActiveSupport::XmlMini.parse(xml)
    end
  end

  ActiveSupport::XmlMini.backend = 'NokogiriSAX'
  benchmark.report('NokogiriSAX') do
    10.times do
      ActiveSupport::XmlMini.parse(xml)
    end
  end

  ActiveSupport::XmlMini.backend = 'LibXML'
  benchmark.report('LibXML') do
    10.times do
      ActiveSupport::XmlMini.parse(xml)
    end
  end

  ActiveSupport::XmlMini.backend = 'LibXMLSAX'
  benchmark.report('LibXMLSAX') do
    10.times do
      ActiveSupport::XmlMini.parse(xml)
    end
  end

  benchmark.report('Xmlhash') do
    10.times do
      Xmlhash.parse(xml)
    end
  end

  benchmark.report('Ox') do
    10.times do
      Ox.load(xml, mode: :hash)
    end
  end

  # :hash_no_attrs converts to a Hash of core class objects only, without capturing attributes.
  # It might be interesting in specific cases where we don't need the attributes.
  benchmark.report('Ox without attributes') do
    10.times do
      Ox.load(xml, mode: :hash_no_attrs)
    end
  end
end
```

The results:
If you already have Nokogiri around, using it instead of the default REXML backend is surely wise, but for completeness: I get different results for your XML, and there are two more xml_mini backends to try:
But let's not compare artificial XML; let's benchmark the parsing of /source/openSUSE:Factory:Staging/dashboard/_history (10 times):
So for both measures, XmlMini is four to five times slower. Btw: your
Would you be so kind as to share the benchmarks themselves, i.e. the actual Ruby script? Numbers by themselves don't mean much.
I just extended yours.
And for the history parsing,
BTW, my interest in the dashboard _history parsing: visiting https://build.opensuse.org/package/show/openSUSE:Factory:Staging/dashboard loads and parses it for whatever reason.
Ox's returned hash is puzzling (as we experimented further with xmlhash, I wanted to look at how others handle it):
With xmlhash you do
With Ox's hash, if you iterate, you have to cope with the nils. So adapting the xmlhash usage to Ox would be quite involved, I fear.
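The nil handling mentioned here can at least be confined to one place. A minimal sketch, assuming (not verified here) that a hash-mode parser such as Ox in `mode: :hash` returns `nil` for a missing child, a single Hash for one child and an Array for repeated children; `children_of` and the sample data are hypothetical.

```ruby
# Hypothetical helper: assumes a hash-mode parser returns nil for a missing
# child, a single Hash for one child and an Array of Hashes for several.
# Always returns an Array, so callers can iterate without nil checks.
def children_of(node, key)
  case (value = node[key])
  when nil   then []
  when Array then value.compact # drop nil entries as well
  else            [value]       # note: Array(value) would split a Hash into pairs
  end
end

# Hypothetical data shaped like the assumption above (not real Ox output)
history = { 'revision' => [{ 'rev' => '1' }, nil, { 'rev' => '2' }] }
single  = { 'revision' => { 'rev' => '7' } }
empty   = { 'revision' => nil }

children_of(history, 'revision').each { |r| puts r['rev'] }
p children_of(single, 'revision').map { |r| r['rev'] }
p children_of(empty, 'revision')
```

This only hides the nil/single/Array distinction; every call site that currently relies on xmlhash's behavior would still need to be switched over, which is why the migration looks involved.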
#12406 for the big history parsing
We currently have two gems for parsing XML: we use either `xmlhash` or `Nokogiri`. We should only use `Nokogiri` since, unlike `xmlhash`, it is widely adopted by the Ruby community. It's the de facto gem for parsing XML.