Skip to content

Improve performance of #uniq across a large number of nodes #206

Merged
merged 1 commit into from Aug 30, 2013

2 participants

@SamSaffron

Been chasing performance regressions in arel after upgrading to rails 4. This fixes

#205

Before:

image

After:

image

This is a mild improvement, I am chasing 10 missing milliseconds so every one counts :)

For context, Rails 3 has a way faster arel ... here is the same page in Rails 3.

image

@ernie
ernie commented Aug 30, 2013

I'm +1 on this, but would appreciate some clarification on "breaking hashes in parent relations" since from my perspective this makes #hash less technically correct. If it's creating a perf issue then we definitely need to fix it and let #eql? handle the false positives, but I was only partially following the twitter thread yesterday. My head was somewhere else, but some of the gists didn't seem to have enough info to help me understand what you mean in the comment.

@ernie
ernie commented Aug 30, 2013

Hm. Never mind. I read #205 and believe I understand what you mean. I was unaware that Array#hash behaved this way. By "this way" I am assuming that it stops once an element has reference loops, which is what causes the (expensive) false positives.

@SamSaffron

Array#hash short circuits and starts returning dupe hashes if there is a circular ref in an internal hash, it seems (which kind of makes sense)

Yeah, should have posted this code here :)

fake = Arel::Nodes::SqlLiteral.new
fake << "BAR"
fake2 = Arel::Nodes::SqlLiteral.new
fake2 << "FOO"
table = Arel::Table.new("T")
table_alias = Arel::Nodes::TableAlias.new(table, "A")
table.aliases << table_alias
attrib = Arel::Attributes::Attribute.new(table_alias,"B")


p [attrib, fake].hash
p [attrib, fake2].hash

# 938249829400974257
# 938249829400974257

Should be fairly easy to generate a standalone dupe of this, if you are interested.

@SamSaffron

@ernie FYI this is all part of my push to figure out why Rails 4 was 20% slower than Rails 3 for Discourse, been hacking at this problem tiny percents at a time, making some improvement, but still a way to go to reach Rails 3 perf .. see (got a few patches in AR and found that sprockets upgrade is stuck with a corrupt tmp/cache)

http://meta.discourse.org/t/when-should-we-upgrade-to-rails-4/4046/53?u=sam

and

http://meta.discourse.org/t/benchmarking-discourse-locally/9070?u=sam

Any help much appreciated

@ernie ernie merged commit a1a46fd into rails:master Aug 30, 2013

1 check passed

Details default The Travis CI build passed
@ernie
ernie commented Aug 30, 2013

Thanks for elaborating -- will pick up the Twitter discussion later. :)

@SamSaffron

@ernie one thing that may be worth considering is including @table_alias in the hash (if it is guaranteed to be a string and does not loop, possibly to protect certain self joins), I am unclear of the role of @aliases vs @table_alias and the tests are quite thin in describing it.

Is @table_alias simply a relic of the past in light of TableAlias?

@ernie
ernie commented Aug 30, 2013

@SamSaffron yeah, basically. You can see comments in the initializer to that effect. If you're ever curious about the use of a node, a great place to look is the ToSql visitor. In this case, you'll see:

      def visit_Arel_Table o, a
        if o.table_alias
          "#{quote_table_name o.name} #{quote_table_name o.table_alias}"
        else
          quote_table_name o.name
        end
      end

      def visit_Arel_Nodes_TableAlias o, a
        "#{visit o.relation, a} #{quote_table_name o.name}"
      end

As you can see, they result in the same SQL. There are similar relics in Function nodes, from a time when the aliasing was handled there.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Something went wrong with that request. Please try again.