Introduce a benchmark template #26792
Conversation
This replaces boilerplate in the “benchmark your code” section of the contributors’ guide with an executable template. I also amended the text to encourage best practices, and codified them in the template.

For now this is only good for relatively self-contained changes that can be inlined into a simple script. In the future, this can be expanded to cover how to measure the difference between two commits.

The output looks like this:

```
==================================== Empty =====================================
Warming up --------------------------------------
              blank?   225.963k i/100ms
         fast_blank?   238.147k i/100ms
Calculating -------------------------------------
              blank?      8.825M (± 6.4%) i/s -     44.063M in   5.014824s
         fast_blank?      9.311M (± 6.3%) i/s -     46.439M in   5.009153s

Comparison:
         fast_blank?:  9310694.8 i/s
              blank?:  8824801.7 i/s - same-ish: difference falls within error

================================= Single Space =================================
Warming up --------------------------------------
              blank?    56.581k i/100ms
         fast_blank?   232.774k i/100ms
Calculating -------------------------------------
              blank?    813.985k (±16.7%) i/s -      4.017M in   5.076576s
         fast_blank?      9.547M (± 5.2%) i/s -     47.719M in   5.013204s

Comparison:
         fast_blank?:  9547414.0 i/s
              blank?:   813985.0 i/s - 11.73x slower

================================== Two Spaces ==================================
Warming up --------------------------------------
              blank?    58.265k i/100ms
         fast_blank?   244.056k i/100ms
Calculating -------------------------------------
              blank?    823.343k (±16.2%) i/s -      4.020M in   5.014213s
         fast_blank?      9.484M (± 4.9%) i/s -     47.347M in   5.005339s

Comparison:
         fast_blank?:  9484021.6 i/s
              blank?:   823343.1 i/s - 11.52x slower

============================== Mixed Whitespaces ===============================
Warming up --------------------------------------
              blank?    53.919k i/100ms
         fast_blank?   237.103k i/100ms
Calculating -------------------------------------
              blank?    763.435k (±16.8%) i/s -      3.720M in   5.018029s
         fast_blank?      9.672M (± 5.8%) i/s -     48.369M in   5.019356s

Comparison:
         fast_blank?:  9672467.2 i/s
              blank?:   763435.4 i/s - 12.67x slower

=============================== Very Long String ===============================
Warming up --------------------------------------
              blank?    34.037k i/100ms
         fast_blank?   240.366k i/100ms
Calculating -------------------------------------
              blank?    409.731k (± 8.9%) i/s -      2.042M in   5.028235s
         fast_blank?      9.794M (± 4.3%) i/s -     49.035M in   5.016328s

Comparison:
         fast_blank?:  9794225.2 i/s
              blank?:   409731.4 i/s - 23.90x slower
```
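For reference, here is a condensed sketch of the kind of self-contained script the template calls for, assuming `benchmark-ips` and `bundler/inline`; the inlined `fast_blank?` body and the scenario values are illustrative, not the PR's verbatim template:

```
require "bundler/inline"

gemfile(true) do
  source "https://rubygems.org"
  gem "benchmark-ips"
  gem "activesupport"
end

require "active_support/core_ext/object/blank"
require "benchmark/ips"

# The patch under test is inlined into the script so the benchmark is
# reproducible from a single file. This implementation is illustrative.
class String
  def fast_blank?
    empty? || !(/[[:^space:]]/ =~ self)
  end
end

SCENARIOS = {
  "Empty"             => "",
  "Single Space"      => " ",
  "Two Spaces"        => "  ",
  "Mixed Whitespaces" => " \t\r\n",
  "Very Long String"  => " " * 100
}

# The loop lives around the benchmark, so each scenario is measured on its own.
SCENARIOS.each_pair do |name, value|
  puts " #{name} ".center(80, "=")
  Benchmark.ips do |x|
    x.report("blank?")      { value.blank? }
    x.report("fast_blank?") { value.fast_blank? }
    x.compare!
  end
end
```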
@fxn also reviewed this in person 😄
Nice! ❤️
Also worth noting: looping over scenarios inside a micro-benchmark can skew measurements. Writing the loop around the benchmark, as codified in this PR, is more accurate. But when you want aggregated stats, this style can make evaluating performance cumbersome. I wrote a gem (repo) to help with this, which I hope could be useful. Some examples (a sketch contrasting the two looping styles follows them):

```
WEIGHTED_SCENARIOS = [
# 20% empty strings
"", "",
# 20% short blank strings
" ", " \n",
# 10% long blank strings
" " * 100,
# 50% non-blank strings of various lengths
"abc", "xyz", "abcxyz", "abc xyz"
]
Benchmark.inputs(WEIGHTED_SCENARIOS) do |x|
x.report('blank?') {|value| value.blank? }
x.report('fast_blank?') {|value| value.fast_blank? }
x.compare!
end
# OUTPUT:
#
# blank?
# 883983.8 i/s (±8.41%)
# fast_blank?
# 10418275.3 i/s (±2.35%)
#
# Comparison:
# fast_blank?: 10418275.3 i/s
# blank?: 883983.8 i/s - 11.79x slower
```

```
AGGREGATED_SCENARIOS = {
"Empty" => [""],
"Short Blank" => [" ", " ", " \t\r\n"],
"Long Blank" => [" " * 20, " " * 100],
"Short Non-blank" => ["abc", "abc xyz"],
"Long Non-blank" => ["x" * 20, "x" * 100],
}
AGGREGATED_SCENARIOS.each_pair do |name, values|
puts
puts " #{name} ".center(80, "=")
puts
Benchmark.inputs(values) do |x|
x.report('blank?') {|value| value.blank? }
x.report('fast_blank?') {|value| value.fast_blank? }
x.compare!
end
end
# OUTPUT:
#
# ==================================== Empty =====================================
#
# blank?
# 9173564.6 i/s (±2.49%)
# fast_blank?
# 10144075.8 i/s (±5.31%)
#
# Comparison:
# fast_blank?: 10144075.8 i/s
# blank?: 9173564.6 i/s - 1.11x slower
#
#
# ================================= Short Blank ==================================
#
# blank?
# 619318.1 i/s (±10.54%)
# fast_blank?
# 10222090.9 i/s (±3.30%)
#
# Comparison:
# fast_blank?: 10222090.9 i/s
# blank?: 619318.1 i/s - 16.51x slower
#
#
# ================================== Long Blank ==================================
#
# blank?
# 374167.5 i/s (±7.30%)
# fast_blank?
# 10409157.6 i/s (±6.49%)
#
# Comparison:
# fast_blank?: 10409157.6 i/s
# blank?: 374167.5 i/s - 27.82x slower
#
#
# =============================== Short Non-blank ================================
#
# blank?
# 1294028.6 i/s (±5.78%)
# fast_blank?
# 9928018.4 i/s (±4.39%)
#
# Comparison:
# fast_blank?: 9928018.4 i/s
# blank?: 1294028.6 i/s - 7.67x slower
#
#
# ================================ Long Non-blank ================================
#
# blank?
# 1256066.8 i/s (±5.38%)
# fast_blank?
# 10460655.2 i/s (±3.16%)
#
# Comparison:
# fast_blank?: 10460655.2 i/s
# blank?: 1256066.8 i/s - 8.33x slower
```
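To make the first point above concrete, here is a minimal sketch of the two looping styles using plain `benchmark-ips` (the values are illustrative):

```
require "active_support/core_ext/object/blank"
require "benchmark/ips"

VALUES = ["", " ", "abc"]

# Looping inside the benchmark: the iteration overhead and the mix of cheap
# and expensive inputs are measured as one unit, so per-input differences
# get averaged away.
Benchmark.ips do |x|
  x.report("blank? (loop inside)") { VALUES.each(&:blank?) }
end

# Looping around the benchmark, as the template codifies: one clean
# measurement per input.
VALUES.each do |value|
  Benchmark.ips do |x|
    x.report("blank? (#{value.inspect})") { value.blank? }
  end
end
```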
That's pretty nice! I think the ideal approach (something I am hoping to try for my inflector changes) is to record the actual calls from production and replay them (in the same order/frequency) in the benchmarks. I think that is the most realistic way to do it, but it might make sharing the dataset more difficult, since it may contain sensitive information.
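A hypothetical sketch of that record-and-replay idea, assuming production arguments (say, every string passed to `camelize`) were captured one per line in a log file:

```
require "active_support/core_ext/string/inflections"
require "benchmark/ips"

# "camelize_args.log" is a hypothetical capture of production inputs,
# one argument per line, preserving their original order and frequency.
recorded_calls = File.readlines("camelize_args.log", chomp: true)

Benchmark.ips do |x|
  # Each iteration replays the whole recorded sequence, so the measurement
  # reflects the real-world mix of inputs rather than a synthetic one.
  x.report("camelize") { recorded_calls.each(&:camelize) }
end
```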