Use transform_values to build hash_rows #37614

jhawthorn · 2019-10-31T17:24:22Z

This commit changes hash_rows to build the first row as it did before, and then to use transform_values to build subsequent rows, using the first as a template.

In Ruby 2.4+ (first version to include transform_values) this is a tiny bit faster because the hash will never be realloc'd and the hash update logic is probably slightly simpler than add.

In Ruby 2.7+ this is a fair bit faster because transform_values is able to run without hashing the keys and only iterates over the values list.

In the case that we only have one row this is only one extra lvar set/read and the condition.

Benchmark

# frozen_string_literal: true
require "active_support/all"

columns = %w[id author_id section_id title byline body published_at created_at updated_at].map(&:-@)

row = [1, 2, 3, "Top 10 tricks to speed up hashes", "by John Hawthorn", "transform_values is quick and even faster in Ruby 2.7. Try it today!", Time.now, Time.now, Time.now]
rows = [row] * ENV.fetch('N', 10).to_i

def hash_like_rails_6_0(columns, row)
  length = columns.length
  hash = {}

  index = 0
  while index < length
    hash[columns[index]] = row[index]
    index += 1
  end

  hash
end

Benchmark.ips do |x|
  x.report "Rails 6.0" do
    rows.map { |row| hash_like_rails_6_0(columns, row) }
  end

  x.report "transform_values" do
    template = nil
    rows.map do |row|
      if template
        index = -1
        template.transform_values do
          row[index += 1]
        end
      else
        template = hash_like_rails_6_0(columns, row)
      end
    end
  end
end

I made a graph to show the current Rails 6.0 implementation vs transform_values on Ruby 2.6 and 2.7. Interestingly, the current Rails 6.0 implementation is actually slower on Ruby 2.7 (a regression I'm looking into), but that shouldn't affect the motivation for this change.

This is about 7% faster on Ruby 2.6 and Ruby 2.7+transform_values is 25% faster than Ruby 2.6+current implementation (using this measurement because Ruby 2.7's hash perf regression)

kaspth · 2019-10-31T17:40:35Z

row = [1, 2, 3, "Top 10 tricks to speed up hashes", "by John Hawthorn", "transform_values is quick and even faster in Ruby 2.7. Try it today!", Time.now, Time.now, Time.now]

lol! Very much approved on this, though I'm not seeing where template is assigned to yet. I'm guessing that's coming soon.

This commit changes hash_rows to build the first row as it did before, and then to use transform_values to build subsequent rows, using the first as a template. In Ruby 2.4+ (first version to include transform_values) this is marginally faster because the hash will never be realloc'd and the hash update logic is probably slightly simpler than add. In Ruby 2.7+ this is a fair bit faster because transform_values is able to run without hashing the keys and only iterates over the values list. In the case that we only have one row this is only one extra lvar set/read and the condition.

jhawthorn · 2019-10-31T17:55:26Z

I'm not seeing where template is assigned to yet

🤦‍♂ fixed

kaspth

Sweet!

p8 · 2019-10-31T21:19:51Z

activerecord/lib/active_record/result.rb

+                end
+              else
+                # In the past we used Hash[columns.zip(row)]
+                #  though elegant, the verbose way is much more efficient


Should this indentation be fixed?

jhawthorn · 2019-10-31T22:07:37Z

Tracked the 2.7 performance difference down to this commit ruby/ruby@72825c3 so the performance difference is intentional and we're just hitting the case where it's slower (hash construction) and my 9-column example happens to be the worst case 😅. In most cases that optimization is probably well worth it.

rails-bot · 2020-01-29T22:48:37Z

This pull request has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs.
Thank you for your contributions.

kaspth · 2020-01-30T00:51:07Z

activerecord/lib/active_record/result.rb

+                index = 0
+                while index < length
+                  hash[columns[index]] = row[index]
+                  index += 1
+                end


@vinistock since you've been doing some perf work, curious if you want to test if either of these are improvements speed/allocation wise:

hash = {} 0.upto length do |index| hash[columns[index]] = row[index] end hash = {} columns.each_with_index do |column, index| hash[column] = row[index] end

Absolutely! I benchmarked the current implementation vs the two options you suggested. For this benchmark, I removed the memoization using the instance variable.

It appears that the current implementation performs better than the others. And memory wise, it looks like they all perform equally.

Script

# frozen_string_literal: true require "bundler/inline" gemfile(true) do source "https://rubygems.org" git_source(:github) { |repo| "https://github.com/#{repo}.git" } gem "rails", github: "rails/rails", require: "rails/all" gem "benchmark-ips" gem "benchmark-memory", require: "benchmark/memory" end module ActiveRecord class Result private def hash_rows columns = @columns.map(&:-@) length = columns.length template = nil @rows.map { |row| if template index = -1 template.transform_values do row[index += 1] end else hash = {} index = 0 while index < length hash[columns[index]] = row[index] index += 1 end template = hash if hash.length == length hash end } end def upto_hash_rows columns = @columns.map(&:-@) length = columns.length template = nil @rows.map { |row| if template index = -1 template.transform_values do row[index += 1] end else hash = {} 0.upto length do |index| hash[columns[index]] = row[index] end template = hash if hash.length == length hash end } end def each_hash_rows columns = @columns.map(&:-@) length = columns.length template = nil @rows.map { |row| if template index = -1 template.transform_values do row[index += 1] end else hash = {} columns.each_with_index do |column, index| hash[column] = row[index] end template = hash if hash.length == length hash end } end end end result = ActiveRecord::Result.new((0..50).map { |i| "col_#{i}" }, (0..1000).map { |row| (0..50).map { |col| "row_#{row}_col_#{col}" } }) Benchmark.ips do |x| x.report("hash_rows") { result.send(:hash_rows) } x.report("upto_hash_rows") { result.send(:upto_hash_rows) } x.report("each_hash_rows") { result.send(:each_hash_rows) } x.compare! end Benchmark.memory do |x| x.report("hash_rows") { result.send(:hash_rows) } x.report("upto_hash_rows") { result.send(:upto_hash_rows) } x.report("each_hash_rows") { result.send(:each_hash_rows) } x.compare! end

Results

Warming up -------------------------------------- hash_rows 18.000 i/100ms upto_hash_rows 13.000 i/100ms each_hash_rows 25.000 i/100ms Calculating ------------------------------------- hash_rows 277.793 (±20.5%) i/s - 1.242k in 5.011782s upto_hash_rows 148.981 (±14.8%) i/s - 715.000 in 5.066219s each_hash_rows 277.071 (±20.2%) i/s - 1.275k in 5.059477s Comparison: hash_rows: 277.8 i/s each_hash_rows: 277.1 i/s - same-ish: difference falls within error upto_hash_rows: 149.0 i/s - 1.86x slower Calculating ------------------------------------- hash_rows 1.772M memsize ( 0.000 retained) 1.054k objects ( 0.000 retained) 50.000 strings ( 0.000 retained) upto_hash_rows 1.772M memsize ( 0.000 retained) 1.054k objects ( 0.000 retained) 50.000 strings ( 0.000 retained) each_hash_rows 1.772M memsize ( 0.000 retained) 1.054k objects ( 0.000 retained) 50.000 strings ( 0.000 retained) Comparison: hash_rows: 1772296 allocated upto_hash_rows: 1772296 allocated - same each_hash_rows: 1772296 allocated - same

Awesome, thanks!

hash_rows: 277.8 i/s each_hash_rows: 277.1 i/s - same-ish: difference falls within error

Seems like the each_with_index version is just as fast? I do like that we'd remove the index local variable reuse between the two large conditional branches.

Using while is a common optimization pattern to avoid each block evaluation.
That is a reason that didn't use each_with_index in this case.

Now, that part is only onece evaluated by resusing the first column hash template, so no longer extremely important the avoiding block evaluation.

Using while is a common optimization pattern to avoid each block evaluation.

Yeah, totally. It's just that if the block execution isn't slower and doesn't allocate more, then… it's not exactly optimizing that much? In any case I'm fine either way.

Last big refactor was in rails#37614. Somewhat extracted from rails#51744. The concern about columns with the same name didn't make much sense to me because in both code paths, the last one wins, so we can simplify the whole methods. Additionally by implementing `columns_index`, we have a decent template always available.

rails-bot bot added the activerecord label Oct 31, 2019

jhawthorn force-pushed the transform_values branch from b050aa7 to d495eae Compare October 31, 2019 17:52

Only use a template when no duplicate columns

911b600

kaspth approved these changes Oct 31, 2019

View reviewed changes

p8 reviewed Oct 31, 2019

View reviewed changes

rails-bot bot added the stale label Jan 29, 2020

jhawthorn merged commit c801763 into rails:master Jan 29, 2020

jhawthorn deleted the transform_values branch January 29, 2020 23:11

kaspth reviewed Jan 30, 2020

View reviewed changes

eugeneius removed the stale label Nov 15, 2020

casperisfine mentioned this pull request May 7, 2024

Simplify ActiveRecord::Result#hash_rows #51752

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Use transform_values to build hash_rows #37614

Use transform_values to build hash_rows #37614

jhawthorn commented Oct 31, 2019 •

edited

kaspth commented Oct 31, 2019

jhawthorn commented Oct 31, 2019

kaspth left a comment

p8 Oct 31, 2019

jhawthorn commented Oct 31, 2019 •

edited

rails-bot bot commented Jan 29, 2020

kaspth Jan 30, 2020

vinistock Jan 30, 2020

kaspth Jan 30, 2020

kamipo Jan 30, 2020

kaspth Jan 31, 2020

Use transform_values to build hash_rows #37614

Use transform_values to build hash_rows #37614

Conversation

jhawthorn commented Oct 31, 2019 • edited

Benchmark

kaspth commented Oct 31, 2019

jhawthorn commented Oct 31, 2019

kaspth left a comment

Choose a reason for hiding this comment

p8 Oct 31, 2019

Choose a reason for hiding this comment

jhawthorn commented Oct 31, 2019 • edited

rails-bot bot commented Jan 29, 2020

kaspth Jan 30, 2020

Choose a reason for hiding this comment

vinistock Jan 30, 2020

Choose a reason for hiding this comment

Script

Results

kaspth Jan 30, 2020

Choose a reason for hiding this comment

kamipo Jan 30, 2020

Choose a reason for hiding this comment

kaspth Jan 31, 2020

Choose a reason for hiding this comment

jhawthorn commented Oct 31, 2019 •

edited

jhawthorn commented Oct 31, 2019 •

edited