perf: use -O2 to compile libgumbo and the nokogiri extension
My benchmarks show this generates code that is:

- 22% faster at HTML5 serialization
- 93% faster at HTML5 parsing

Note that `-O3` generates slightly slower HTML5 serialization
code (see 8220dc7).

Note also that this doesn't change the compiler options used for
libxml2 and libxslt (which already include `-O2`).

Using the following benchmark script:

    #! /usr/bin/env ruby
    # coding: utf-8

    require "bundler/inline"

    gemfile do
      source "https://rubygems.org"
      gem "nokogiri", path: "."
      gem "benchmark-ips"
    end

    require "nokogiri"
    require "benchmark/ips"

    input = File.read("test/files/tlm.html")
    puts "input #{input.length} bytes"

    html4_doc = Nokogiri::HTML4::Document.parse(input)
    html5_doc = Nokogiri::HTML5::Document.parse(input)

    puts RUBY_DESCRIPTION

    Benchmark.ips do |x|
      x.time = 10
      x.report("html5 parse") do
        Nokogiri::HTML5::Document.parse(input)
      end
      x.report("html4 parse") do
        Nokogiri::HTML4::Document.parse(input)
      end
      x.compare!
    end

    Benchmark.ips do |x|
      x.time = 10
      x.report("html5 serialize") do
        html5_doc.to_html
      end
      x.report("html4 serialize") do
        html4_doc.to_html
      end
      x.compare!
    end

with default settings on my dev system
(which are `-O3` for the extension and unspecified for libgumbo):

> input 70095 bytes
> ruby 3.1.2p20 (2022-04-12 revision 4491bb740a) [x86_64-linux]
> Warming up --------------------------------------
>          html5 parse    12.000  i/100ms
>          html4 parse    31.000  i/100ms
> Calculating -------------------------------------
>          html5 parse    129.637  (±16.2%) i/s -      1.260k in  10.051475s
>          html4 parse    355.723  (±21.4%) i/s -      3.441k in  10.104502s
>
> Comparison:
>          html4 parse:      355.7 i/s
>          html5 parse:      129.6 i/s - 2.74x  (± 0.00) slower
>
> Warming up --------------------------------------
>      html5 serialize    85.000  i/100ms
>      html4 serialize   131.000  i/100ms
> Calculating -------------------------------------
>      html5 serialize    843.993  (± 2.4%) i/s -      8.500k in  10.076902s
>      html4 serialize      1.319k (± 2.9%) i/s -     13.231k in  10.039827s
>
> Comparison:
>      html4 serialize:     1319.0 i/s
>      html5 serialize:      844.0 i/s - 1.56x  (± 0.00) slower

after enabling `-O2` on both gumbo and nokogiri source files:

> input 70095 bytes
> ruby 3.1.2p20 (2022-04-12 revision 4491bb740a) [x86_64-linux]
> Warming up --------------------------------------
>          html5 parse    21.000  i/100ms
>          html4 parse    36.000  i/100ms
> Calculating -------------------------------------
>          html5 parse    250.245  (±20.8%) i/s -      2.394k in  10.066381s
>          html4 parse    371.905  (±20.2%) i/s -      3.600k in  10.025980s
>
> Comparison:
>          html4 parse:      371.9 i/s
>          html5 parse:      250.2 i/s - same-ish: difference falls within error
>
> Warming up --------------------------------------
>      html5 serialize   101.000  i/100ms
>      html4 serialize   128.000  i/100ms
> Calculating -------------------------------------
>      html5 serialize      1.037k (± 3.3%) i/s -     10.403k in  10.042146s
>      html4 serialize      1.301k (± 4.2%) i/s -     13.056k in  10.055585s
>
> Comparison:
>      html4 serialize:     1300.8 i/s
>      html5 serialize:     1037.2 i/s - 1.25x  (± 0.00) slower
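
The headline percentages can be cross-checked against the two benchmark runs above. The following is a minimal sketch, not part of the commit: the helper name `speedup_percent` is made up for illustration, and the inputs are the `i/s` figures copied from the "Calculating" and "Comparison" output. Given the ±16-21% error bars on the parse runs, the parse figure should be read as approximate.

```ruby
# Hypothetical helper (not from the commit): relative throughput gain,
# in percent, when going from before_ips to after_ips iterations per second.
def speedup_percent(before_ips, after_ips)
  ((after_ips / before_ips) - 1.0) * 100.0
end

# i/s figures copied from the benchmark output above.
html5_parse     = speedup_percent(129.637, 250.245)
html5_serialize = speedup_percent(843.993, 1037.2)

puts format("html5 parse:     %.1f%% faster", html5_parse)
puts format("html5 serialize: %.1f%% faster", html5_serialize)
```

The results should land close to the 93% and 22% quoted at the top of the commit message.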
flavorjones committed Aug 28, 2022
1 parent dbb228a commit c30b63b
Showing 1 changed file with 4 additions and 1 deletion.
5 changes: 4 additions & 1 deletion ext/nokogiri/extconf.rb
@@ -615,6 +615,9 @@ def do_clean
# errors/warnings. see #2302
append_cflags(["-std=c99", "-Wno-declaration-after-statement"])

+# gumbo html5 serialization is slower with O3, let's make sure we use O2
+append_cflags("-O2")
+
# always include debugging information
append_cflags("-g")

@@ -956,7 +959,7 @@ def install
end

def compile
-cflags = concat_flags(ENV["CFLAGS"], "-fPIC", "-g")
+cflags = concat_flags(ENV["CFLAGS"], "-fPIC", "-O2", "-g")

env = { "CC" => gcc_cmd, "CFLAGS" => cflags }
if config_cross_build?

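One detail of the second hunk is the flag order: `-O2` is placed after `ENV["CFLAGS"]`. GCC documents that when several `-O` options are given, the last one is the one that takes effect, so this ordering would override an `-O3` coming from the environment. The sketch below illustrates that. `concat_flags_sketch` is a hypothetical stand-in for the `concat_flags` helper (its real implementation is not shown in this diff and may differ), and the `user_cflags` value is an example only.

```ruby
# Hypothetical stand-in for extconf.rb's concat_flags helper; the real
# helper is not shown in this diff. Joins flags with spaces, skipping
# nil or empty entries (ENV["CFLAGS"] may be unset).
def concat_flags_sketch(*flags)
  flags.compact.reject(&:empty?).join(" ")
end

# Returns the last -O<level> flag in a flag string, which is the one
# GCC honors when several are present; nil if there is none.
def effective_opt_level(cflags)
  cflags.split.grep(/\A-O/).last
end

user_cflags = "-O3 -march=native" # example value standing in for ENV["CFLAGS"]
cflags = concat_flags_sketch(user_cflags, "-fPIC", "-O2", "-g")

puts cflags
puts effective_opt_level(cflags)
```

Under these assumptions, a user who sets `CFLAGS=-O3` in the environment would still get libgumbo compiled at `-O2`.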