Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Use transform_values to build hash_rows #37614

Merged
merged 2 commits into from
Jan 29, 2020
Merged
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
39 changes: 27 additions & 12 deletions activerecord/lib/active_record/result.rb
Original file line number Diff line number Diff line change
Expand Up @@ -145,21 +145,36 @@ def hash_rows
# used as keys in ActiveRecord::Base's @attributes hash
columns = @columns.map(&:-@)
length = columns.length
template = nil

@rows.map { |row|
# In the past we used Hash[columns.zip(row)]
# though elegant, the verbose way is much more efficient
# both time and memory wise cause it avoids a big array allocation
# this method is called a lot and needs to be micro optimised
hash = {}

index = 0
while index < length
hash[columns[index]] = row[index]
index += 1
if template
# We use transform_values to build subsequent rows from the
# hash of the first row. This is faster because we avoid any
# reallocs and in Ruby 2.7+ avoid hashing entirely.
index = -1
template.transform_values do
row[index += 1]
end
else
# In the past we used Hash[columns.zip(row)]
# though elegant, the verbose way is much more efficient
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should this indentation be fixed?

# both time and memory wise cause it avoids a big array allocation
# this method is called a lot and needs to be micro optimised
hash = {}

index = 0
while index < length
hash[columns[index]] = row[index]
index += 1
end
Comment on lines +166 to +170
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@vinistock since you've been doing some perf work, curious if you want to test if either of these are improvements speed/allocation wise:

hash = {}
0.upto length do |index|
  hash[columns[index]] = row[index]
end

hash = {}
columns.each_with_index do |column, index|
  hash[column] = row[index]
end

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Absolutely! I benchmarked the current implementation vs the two options you suggested. For this benchmark, I removed the memoization using the instance variable.

It appears that the current implementation performs better than the others. And memory wise, it looks like they all perform equally.

Script

# frozen_string_literal: true

require "bundler/inline"

gemfile(true) do
  source "https://rubygems.org"

  git_source(:github) { |repo| "https://github.com/#{repo}.git" }

  gem "rails", github: "rails/rails", require: "rails/all"
  gem "benchmark-ips"
  gem "benchmark-memory", require: "benchmark/memory"
end

module ActiveRecord
  class Result
    private

    def hash_rows
      columns = @columns.map(&:-@)
      length  = columns.length
      template = nil

      @rows.map { |row|
        if template
          index = -1
          template.transform_values do
            row[index += 1]
          end
        else
          hash = {}

          index = 0
          while index < length
            hash[columns[index]] = row[index]
            index += 1
          end

          template = hash if hash.length == length
          hash
        end
      }
    end

    def upto_hash_rows
      columns = @columns.map(&:-@)
      length  = columns.length
      template = nil

      @rows.map { |row|
        if template
          index = -1
          template.transform_values do
            row[index += 1]
          end
        else
          hash = {}
          0.upto length do |index|
            hash[columns[index]] = row[index]
          end

          template = hash if hash.length == length
          hash
        end
      }
    end

    def each_hash_rows
      columns = @columns.map(&:-@)
      length  = columns.length
      template = nil

      @rows.map { |row|
        if template
          index = -1
          template.transform_values do
            row[index += 1]
          end
        else
          hash = {}
          columns.each_with_index do |column, index|
            hash[column] = row[index]
          end

          template = hash if hash.length == length
          hash
        end
      }
    end
  end
end

result = ActiveRecord::Result.new((0..50).map { |i| "col_#{i}" }, (0..1000).map { |row| (0..50).map { |col| "row_#{row}_col_#{col}" } })

Benchmark.ips do |x|
  x.report("hash_rows")      { result.send(:hash_rows) }
  x.report("upto_hash_rows") { result.send(:upto_hash_rows) }
  x.report("each_hash_rows") { result.send(:each_hash_rows) }
  x.compare!
end

Benchmark.memory do |x|
  x.report("hash_rows")      { result.send(:hash_rows) }
  x.report("upto_hash_rows") { result.send(:upto_hash_rows) }
  x.report("each_hash_rows") { result.send(:each_hash_rows) }
  x.compare!
end

Results

Warming up --------------------------------------
           hash_rows    18.000  i/100ms
      upto_hash_rows    13.000  i/100ms
      each_hash_rows    25.000  i/100ms
Calculating -------------------------------------
           hash_rows    277.793  (±20.5%) i/s -      1.242k in   5.011782s
      upto_hash_rows    148.981  (±14.8%) i/s -    715.000  in   5.066219s
      each_hash_rows    277.071  (±20.2%) i/s -      1.275k in   5.059477s

Comparison:
           hash_rows:      277.8 i/s
      each_hash_rows:      277.1 i/s - same-ish: difference falls within error
      upto_hash_rows:      149.0 i/s - 1.86x  slower

Calculating -------------------------------------
           hash_rows     1.772M memsize (     0.000  retained)
                         1.054k objects (     0.000  retained)
                        50.000  strings (     0.000  retained)
      upto_hash_rows     1.772M memsize (     0.000  retained)
                         1.054k objects (     0.000  retained)
                        50.000  strings (     0.000  retained)
      each_hash_rows     1.772M memsize (     0.000  retained)
                         1.054k objects (     0.000  retained)
                        50.000  strings (     0.000  retained)

Comparison:
           hash_rows:    1772296 allocated
      upto_hash_rows:    1772296 allocated - same
      each_hash_rows:    1772296 allocated - same

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Awesome, thanks!

           hash_rows:      277.8 i/s
      each_hash_rows:      277.1 i/s - same-ish: difference falls within error

Seems like the each_with_index version is just as fast? I do like that we'd remove the index local variable reuse between the two large conditional branches.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Using while is a common optimization pattern to avoid each block evaluation.
That is a reason that didn't use each_with_index in this case.

Now, that part is only onece evaluated by resusing the first column hash template, so no longer extremely important the avoiding block evaluation.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Using while is a common optimization pattern to avoid each block evaluation.

Yeah, totally. It's just that if the block execution isn't slower and doesn't allocate more, then… it's not exactly optimizing that much? In any case I'm fine either way.


# It's possible to select the same column twice, in which case
# we can't use a template
template = hash if hash.length == length

hash
end

hash
}
end
end
Expand Down