Introduce a benchmark template #26792

Merged: chancancode merged 1 commit into master from benchmark-template on Oct 15, 2016

Conversation

chancancode (Member)

This replaces the boilerplate in the “benchmark your code” section of the contributors’ guide with an executable template. I also amended the text to encourage best practices and codified them in the template.

For now, this is only suitable for relatively self-contained changes that can be inlined into a simple script. In the future, it can be expanded to cover measuring the difference between two commits.
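Roughly, a script built from the template amounts to something like the following sketch (assumptions: the benchmark-ips gem, a pure-Ruby `String#fast_blank?` standing in for the change being measured, and guessed scenario strings; this is an illustration, not the template itself):

```ruby
require "bundler/inline"

gemfile(true) do
  source "https://rubygems.org"
  gem "benchmark-ips"
  gem "activesupport"
end

require "benchmark/ips"
require "active_support/core_ext/object/blank"

class String
  # Hypothetical pure-Ruby stand-in for the optimized implementation.
  def fast_blank?
    empty? || /\A[[:space:]]*\z/.match?(self)
  end
end

SCENARIOS = {
  "Empty"             => "",
  "Single Space"      => " ",
  "Two Spaces"        => "  ",
  "Mixed Whitespaces" => " \t\r\n",
  "Very Long String"  => " " * 100
}

SCENARIOS.each_pair do |name, value|
  puts
  puts " #{name} ".center(80, "=")
  puts

  Benchmark.ips do |x|
    x.report("blank?")      { value.blank? }
    x.report("fast_blank?") { value.fast_blank? }
    x.compare!
  end
end
```

Such a script runs directly with `ruby benchmark.rb`; `bundler/inline` installs the gems on first run.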

The output looks like this:

```
==================================== Empty =====================================

Warming up --------------------------------------
              blank?   225.963k i/100ms
         fast_blank?   238.147k i/100ms
Calculating -------------------------------------
              blank?      8.825M (± 6.4%) i/s -     44.063M in   5.014824s
         fast_blank?      9.311M (± 6.3%) i/s -     46.439M in   5.009153s

Comparison:
         fast_blank?:  9310694.8 i/s
              blank?:  8824801.7 i/s - same-ish: difference falls within error


================================= Single Space =================================

Warming up --------------------------------------
              blank?    56.581k i/100ms
         fast_blank?   232.774k i/100ms
Calculating -------------------------------------
              blank?    813.985k (±16.7%) i/s -      4.017M in   5.076576s
         fast_blank?      9.547M (± 5.2%) i/s -     47.719M in   5.013204s

Comparison:
         fast_blank?:  9547414.0 i/s
              blank?:   813985.0 i/s - 11.73x  slower


================================== Two Spaces ==================================

Warming up --------------------------------------
              blank?    58.265k i/100ms
         fast_blank?   244.056k i/100ms
Calculating -------------------------------------
              blank?    823.343k (±16.2%) i/s -      4.020M in   5.014213s
         fast_blank?      9.484M (± 4.9%) i/s -     47.347M in   5.005339s

Comparison:
         fast_blank?:  9484021.6 i/s
              blank?:   823343.1 i/s - 11.52x  slower


============================== Mixed Whitespaces ===============================

Warming up --------------------------------------
              blank?    53.919k i/100ms
         fast_blank?   237.103k i/100ms
Calculating -------------------------------------
              blank?    763.435k (±16.8%) i/s -      3.720M in   5.018029s
         fast_blank?      9.672M (± 5.8%) i/s -     48.369M in   5.019356s

Comparison:
         fast_blank?:  9672467.2 i/s
              blank?:   763435.4 i/s - 12.67x  slower


=============================== Very Long String ===============================

Warming up --------------------------------------
              blank?    34.037k i/100ms
         fast_blank?   240.366k i/100ms
Calculating -------------------------------------
              blank?    409.731k (± 8.9%) i/s -      2.042M in   5.028235s
         fast_blank?      9.794M (± 4.3%) i/s -     49.035M in   5.016328s

Comparison:
         fast_blank?:  9794225.2 i/s
              blank?:   409731.4 i/s - 23.90x  slower
```

@chancancode changed the title from “Introduce a benchmark template [ci skip]” to “Introduce a benchmark template” on Oct 15, 2016
@tenderlove (Member) left a comment

yes

@chancancode (Member Author)

@fxn also reviewed this in person 😄

@chancancode merged commit 0bf90fa into master on Oct 15, 2016
@chancancode deleted the benchmark-template branch on October 15, 2016 at 10:36
@kaspth (Contributor) commented Oct 15, 2016

Nice! ❤️

@jonathanhefner (Member)

It is very easy to make an optimization that improves performance for a specific scenario you care about but regresses on other common cases. Therefore, you should test your change against a list of representative scenarios.

Also worth noting: looping over scenarios inside a micro-benchmark can skew measurements. Writing the loop around the benchmark, as codified in this PR, is more accurate, but it can make evaluating performance cumbersome when you want aggregated stats. I wrote a gem (repo) to help with this, which I hope could be useful.
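For contrast, here is a minimal sketch (assuming benchmark-ips, and the same `blank?`/`fast_blank?` methods as in the PR description) of the loop-inside-the-benchmark style being warned about here:

```ruby
require "benchmark/ips"

# Assumes String#blank? and String#fast_blank? are defined, as in the
# template sketch above.
SCENARIOS = ["", " ", "  ", " \t\r\n", " " * 100]

# Anti-pattern: the loop runs inside the measured block, so the reported
# i/s includes Array#each overhead and blends all scenarios together; a
# regression in any one scenario can be masked by the others.
Benchmark.ips do |x|
  x.report("blank?")      { SCENARIOS.each { |s| s.blank? } }
  x.report("fast_blank?") { SCENARIOS.each { |s| s.fast_blank? } }
  x.compare!
end
```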

Some examples:

```ruby
require "benchmark/inputs"                       # the gem mentioned above
require "active_support/core_ext/object/blank"   # provides String#blank?
# (assumes a String#fast_blank? implementation is defined, as in the template example)

WEIGHTED_SCENARIOS = [
  # 20% empty strings
  "", "",
  # 20% short blank strings
  " ", " \n",
  # 10% long blank strings
  " " * 100,
  # 50% non-blank strings of various lengths
  "abc", "xyz", "abcxyz", "abc xyz"
]

Benchmark.inputs(WEIGHTED_SCENARIOS) do |x|
  x.report('blank?')      {|value| value.blank? }
  x.report('fast_blank?') {|value| value.fast_blank? }
  x.compare!
end

# OUTPUT:
#
# blank?
#   883983.8 i/s (±8.41%)
# fast_blank?
#   10418275.3 i/s (±2.35%)
# 
# Comparison:
#   fast_blank?:  10418275.3 i/s
#        blank?:    883983.8 i/s - 11.79x slower
```

```ruby
AGGREGATED_SCENARIOS = {
  "Empty"            => [""],
  "Short Blank"      => [" ", "  ", " \t\r\n"],
  "Long Blank"       => [" " * 20, " " * 100],
  "Short Non-blank"  => ["abc", "abc xyz"],
  "Long Non-blank"   => ["x" * 20, "x" * 100],
}

AGGREGATED_SCENARIOS.each_pair do |name, values|
  puts
  puts " #{name} ".center(80, "=")
  puts

  Benchmark.inputs(values) do |x|
    x.report('blank?')      {|value| value.blank? }
    x.report('fast_blank?') {|value| value.fast_blank? }
    x.compare!
  end
end

# OUTPUT:
#
# ==================================== Empty =====================================
# 
# blank?
#   9173564.6 i/s (±2.49%)
# fast_blank?
#   10144075.8 i/s (±5.31%)
# 
# Comparison:
#   fast_blank?:  10144075.8 i/s
#        blank?:   9173564.6 i/s - 1.11x slower
# 
# 
# ================================= Short Blank ==================================
# 
# blank?
#   619318.1 i/s (±10.54%)
# fast_blank?
#   10222090.9 i/s (±3.30%)
# 
# Comparison:
#   fast_blank?:  10222090.9 i/s
#        blank?:    619318.1 i/s - 16.51x slower
# 
# 
# ================================== Long Blank ==================================
# 
# blank?
#   374167.5 i/s (±7.30%)
# fast_blank?
#   10409157.6 i/s (±6.49%)
# 
# Comparison:
#   fast_blank?:  10409157.6 i/s
#        blank?:    374167.5 i/s - 27.82x slower
# 
# 
# =============================== Short Non-blank ================================
# 
# blank?
#   1294028.6 i/s (±5.78%)
# fast_blank?
#   9928018.4 i/s (±4.39%)
# 
# Comparison:
#   fast_blank?:   9928018.4 i/s
#        blank?:   1294028.6 i/s - 7.67x slower
# 
# 
# ================================ Long Non-blank ================================
# 
# blank?
#   1256066.8 i/s (±5.38%)
# fast_blank?
#   10460655.2 i/s (±3.16%)
# 
# Comparison:
#   fast_blank?:  10460655.2 i/s
#        blank?:   1256066.8 i/s - 8.33x slower
```

@chancancode (Member Author)

That's pretty nice! I think the ideal approach (something I am hoping to try for my inflector changes) is to record the actual calls from production and replay them, in the same order and frequency, in the benchmarks. I think that is the most realistic way to do it, but it might make sharing the dataset more difficult, since it may contain sensitive information.
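A rough sketch of how that could look (the log path and recording hook are hypothetical, and `String#fast_blank?` is assumed to be defined as in the earlier examples), reusing `Benchmark.inputs` from above so the recorded values are replayed in the same order and frequency:

```ruby
require "benchmark/inputs"
require "active_support/core_ext/object/blank"

# Recording side (hypothetical): production code appends each receiver of
# the call under study to a log, one value per line.
#   File.open("blank_calls.log", "a") { |f| f.puts(value) }

# Replay side: drive the benchmark directly from the recorded stream.
recorded = File.readlines("blank_calls.log", chomp: true)

Benchmark.inputs(recorded) do |x|
  x.report("blank?")      { |value| value.blank? }
  x.report("fast_blank?") { |value| value.fast_blank? }
  x.compare!
end
```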
