PostgreSQL: Use a specialized Result object to defer retrieval of values #33312

larskanis · 2018-07-07T12:21:38Z

PG::Result stores values received from the database internally in a compact memory format. When the value is accessed, it is converted to a ruby string or casted to another ruby object.

While ActiveRecord::Result enforces the conversion of all rows and columns to ruby objects, this implementation defers conversion to when the single value is accessed. Therefore ActiveRecord::ConnectionAdapters::PostgreSQL::Result and PG::Tuple store a reference to the PG::Result object.

PG::Tuple caches all values after being materialized. It detaches the PG::Result object, when all columns have been materialized or when the object is marshaled.

The new Result object is disabled for pg versions < 1.1. This is because PG::Tuple is not yet available in older versions and because keeping the PG::Result around can lead to memory bloat with older pg versions. For the first issue I wrote a lightweight ruby implementation, but since it doesn't prevent memory bloat, I think it's best to make it pg-1.1 only.

This speedups wasteful queries (without explicit selecting required columns) by up to 50%. Calling column_type/get_oid_type only for necessary columns has another small performance advantage even if all columns are used.

pg-1.1.0 is not yet released, but I would like to get some feedback before. For now I patched the Gemfile to use pg from git.

cc @SamSaffron

rails-bot · 2018-07-07T12:21:53Z

r? @kaspth

(@rails-bot has picked a reviewer for you, use r? to override)

SamSaffron · 2018-07-08T22:58:28Z

I think to simplify the code here we should require PG 1.1 and up for Rails 6. I am pretty sure @matthewd / @rafaelfranca are fine with that.

SamSaffron · 2018-07-08T22:59:40Z

activerecord/lib/active_record/connection_adapters/postgresql/result.rb

+      @columns = @pg_result.fields
+    end
+
+    def initialize_copy(other)


hmm what is going on with this blank method?

This is called from the query cache. I'm uncertain why some of the data (not a deep copy of all the values) is duped in AR::Result. They are treated as read-only, so that there should be no need to dup them. Removing initialize_copy from AR::Result doesn't raise any failure in rake test:postgresql or rake test:sqlite3. I only added this method because I derived the class from AR::Result to satisfy some class based tests, but the base method isn't suitable for PostgreSQL::Result.

We could dup ruby objects here as well as kind of precaution. At least PG::Result is immutable, so that it doesn't need to be copied.

hmmm @tenderlove does it make sense to break off from this interface in Rails 6.0 and remove this. It is very odd. ... @larskanis at a minimum we should add a comment explaining why it is no-op.

SamSaffron · 2018-07-08T23:02:43Z

This is awesome so excited about this change!

tenderlove · 2018-07-09T23:06:05Z

It looks like ActiveRecord::ConnectionAdapters::PostgreSQL never calls clear on the underlying PG Result. I know the GC clears automatically, but I thought there was a reason we do it manually. And if there's a reason we do it manually, does that reason not apply to ActiveRecord::ConnectionAdapters::PostgreSQL too?

tenderlove · 2018-07-09T23:07:46Z

Ah, I think I remember the reason we do it manually: because of memory bloat that can be fixed by either using the Ruby allocators or informing the GC of the result size. @larskanis I know you were working on the later, does this branch use it?

SamSaffron · 2018-07-09T23:13:28Z

@tenderlove yes it is using the latest fix which handles the bloat problem using an estimator.

per:

ged/ruby-pg@14ebbef

PG mailing list have been awfully quiet about the complete fix here: https://www.postgresql.org/message-id/flat/fa16a288-9685-14f2-97c8-b8ac84365a4f%40greiz-reinsdorf.de we should bounce that topic. But that is going to take years till it is properly everywhere.

tenderlove · 2018-07-09T23:14:16Z

Nice!!

dillonwelch · 2018-09-28T21:42:50Z

Gemfile

@@ -124,7 +124,8 @@ platforms :ruby, :mswin, :mswin64, :mingw, :x64_mingw do
  gem "sqlite3", "~> 1.3.6"

  group :db do
-    gem "pg", ">= 0.18.0"
+    # gem "pg", ">= 0.18.0"
+    gem "pg", ">= 1.1.0.pre", github: "larskanis/ruby-pg", branch: "lazytuple"


Not sure if you're using something specific from the branch, but it looks like there's a general 1.1.3 version of pg available? https://rubygems.org/gems/pg/versions/1.1.3

larskanis · 2019-01-26T17:16:01Z

I finally managed to update the PR. pg-1.1.x is released since half a year, so that I removed the compatibility code for older versions and made pg-1.1 mandatory. I'll add some comments on specific lines to describe the rationale behind these changes. I hope we can discuss this PR then.

larskanis · 2019-01-26T17:18:41Z

actioncable/lib/action_cable/subscription_adapter/postgresql.rb

@@ -1,6 +1,6 @@
 # frozen_string_literal: true

-gem "pg", ">= 0.18", "< 2.0"
+gem "pg", "~> 1.1"


pg-1.1 isn't really required for ActionCable. I changed it just for consistency.

larskanis · 2019-01-26T17:27:45Z

activerecord/lib/active_record/result.rb

      end
+    end


column_types is replaced by column_type, so that type information can be retrieved for necessary columns only.

PG::Result stores values received from the database internally in a compact memory format. When the value is accessed, it is converted to a ruby string or casted to another ruby object. While ActiveRecord::Result enforces the conversion of all rows and columns to ruby objects, this implementation defers conversion to when the single value is accessed. Therefore ActiveRecord::ConnectionAdapters::PostgreSQL::Result and PG::Tuple store a reference to the PG::Result object. PG::Tuple caches all values after being materialized. It detaches the PG::Result object, when all columns have been materialized or when the object is marshaled. This raises the pg version constraint to ~> 1.1, which provides PG::Tuple and implements compensating controls to avoid memory bloat when PG::Result aren't cleared. [1] ActiveRecord::ConnectionAdapters::PostgreSQL::Result is derived from ActiveRecord::Result although almost all methods are re-implemented. This is to satisfy tests and to make it clear that it implements the same interface. This speedups wasteful queries (without explicit selecting required columns) by up to 50%. Calling column_type/get_oid_type only for necessary columns has another small performance advantage even if all columns are used. [1] https://samsaffron.com/archive/2018/06/13/ruby-x27-s-external-malloc-problem

larskanis · 2019-01-26T17:57:43Z

activerecord/lib/active_record/result.rb

+
+    def column_values(i)
+      @rows.map { |row| row[i] }
+    end


column_values allows to retrieve a single column. In PostgreSQL::Result this is implemented without allocation of Array<Array<value>> result set.

larskanis · 2019-01-26T18:08:56Z

activerecord/lib/active_record/connection_adapters/postgresql/result.rb

+        yield @pg_result.tuple(i)
+        i += 1
+      end
+    end


PostgreSQL::Result doesn't return Hash but PG::Tuple on each, [], first and last methods. PG::Tuple implements enough Hash-like functions to be used as a Hash replacement in ActiveRecord. It allows also Array like access. See: https://deveiate.org/code/pg/PG/Tuple.html

yoelblum · 2019-01-29T08:25:28Z

What a great effort! Can something similar be done with mysql?

SamSaffron · 2019-01-29T14:54:16Z

@tenderlove / @matthewd any chance you can review?

@larskanis any chance you can posts a few before / after micro benches?

larskanis · 2019-01-29T20:37:11Z

I updated the benchmark to the current master branch here. Essentially it's looks like:

# rails branch master:
#
# top 10 id / title PG      2.430k (± 1.7%) i/s -     12.272k in   5.051112s
# top 1000 id wasteful     36.095  (± 5.5%) i/s -    180.000  in   5.010574s
#      pluck 1000 mode    381.441  (± 1.3%) i/s -      1.938k in   5.081686s
#
# rails branch hyperrecord-v2:
#
# top 10 id / title PG      2.849k (± 1.0%) i/s -     14.278k in   5.012458s
# top 1000 id wasteful     61.599  (± 6.5%) i/s -    312.000  in   5.081960s
#      pluck 1000 mode    401.916  (± 1.2%) i/s -      2.028k in   5.046694s

So non-wasteful queries are a little bit faster due to less column lookups, but near to the measuring tolerance. Wasteful queries are pretty clear almost twice as fast. But please do your own benchmarks, since mine tend to be biased 😉

kamipo · 2019-01-30T04:44:50Z

activerecord/test/cases/adapter_test.rb

@@ -329,7 +329,7 @@ def test_select_methods_passing_a_association_relation
      author = Author.create!(name: "john")
      Post.create!(author: author, title: "foo", body: "bar")
      query = author.posts.where(title: "foo").select(:title)
-      assert_equal({ "title" => "foo" }, @connection.select_one(query))
+      assert_equal({ "title" => "foo" }, @connection.select_one(query).to_h)


Does this mean that this PR will change the public API?

Yes, as described above it changes the API. Even if PG::Tuple implements some Hash like methods, it never equals to a Hash, so to_h is required for comparison.

@rafaelfranca called ActiveRecord::Result a "public API by mistake". Although we have several scripts at work making use of this API, it never felt like a public API, since it is halfway between the model query interface and the raw_connection. It also changed several times in the past, due to the needs within AR.

If public API change is a concern, it is possible to exclude select_foo methods from this change and let only exec_foo methods return the specialized PostgreSQL::Result. I changed select_foo for consistency, but they are not used by model query methods and are therefore not so much relevant for performance.

SamSaffron · 2019-02-06T06:53:29Z

I just tested this on Discourse @larskanis with

git clone https://github.com/discourse/discourse.git
cd discourse 

# mangle Gemfile with
if rails_master?
  gem 'arel', git: 'https://github.com/rails/arel.git'
  gem 'rails', path: 'path/to/this/branch'
else

RAILS_MASTER=1 bundle
RAILS_MASTER=1 ruby script/bench.rb -i 100

Not noticing anything much different with or without the patch. Numbers are really close.

What this probably means is that over the years at Discourse we have optimised stuff around this limitation, not that the patch is bad or anything.

I still think this is a great change! just not expecting it to make too much of a diff for Discourse.

rails-bot · 2019-12-18T00:52:51Z

This pull request has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs.
Thank you for your contributions.

rails-bot assigned kaspth Jul 7, 2018

larskanis force-pushed the hyperrecord-v2 branch from dbf6dac to 4c9f704 Compare July 7, 2018 17:20

SamSaffron reviewed Jul 8, 2018

View reviewed changes

rafaelfranca added this to the 6.0.0 milestone Jul 9, 2018

gmcgibbon added activerecord PostgreSQL labels Sep 28, 2018

dillonwelch reviewed Sep 28, 2018

View reviewed changes

larskanis force-pushed the hyperrecord-v2 branch from 4c9f704 to bd423cb Compare January 26, 2019 14:03

rails-bot bot added actioncable railties labels Jan 26, 2019

larskanis commented Jan 26, 2019

View reviewed changes

larskanis force-pushed the hyperrecord-v2 branch from bd423cb to 30f6020 Compare January 26, 2019 17:35

larskanis force-pushed the hyperrecord-v2 branch from 30f6020 to 335efde Compare January 26, 2019 17:53

larskanis commented Jan 26, 2019

View reviewed changes

kamipo reviewed Jan 30, 2019

View reviewed changes

rafaelfranca removed this from the 6.0.0 milestone Apr 12, 2019

rails-bot bot added the stale label Dec 18, 2019

rails-bot bot closed this Dec 25, 2019

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

PostgreSQL: Use a specialized Result object to defer retrieval of values #33312

PostgreSQL: Use a specialized Result object to defer retrieval of values #33312

larskanis commented Jul 7, 2018 •

edited

rails-bot commented Jul 7, 2018

SamSaffron commented Jul 8, 2018

SamSaffron Jul 8, 2018

larskanis Jul 9, 2018

SamSaffron Jul 9, 2018

SamSaffron commented Jul 8, 2018

tenderlove commented Jul 9, 2018

tenderlove commented Jul 9, 2018

SamSaffron commented Jul 9, 2018

tenderlove commented Jul 9, 2018

dillonwelch Sep 28, 2018

larskanis commented Jan 26, 2019

larskanis Jan 26, 2019

larskanis Jan 26, 2019

larskanis Jan 26, 2019

larskanis Jan 26, 2019

yoelblum commented Jan 29, 2019

SamSaffron commented Jan 29, 2019

larskanis commented Jan 29, 2019

kamipo Jan 30, 2019 •

edited

larskanis Jan 30, 2019

SamSaffron commented Feb 6, 2019

rails-bot bot commented Dec 18, 2019

PostgreSQL: Use a specialized Result object to defer retrieval of values #33312

PostgreSQL: Use a specialized Result object to defer retrieval of values #33312

Conversation

larskanis commented Jul 7, 2018 • edited

rails-bot commented Jul 7, 2018

SamSaffron commented Jul 8, 2018

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

SamSaffron commented Jul 8, 2018

tenderlove commented Jul 9, 2018

tenderlove commented Jul 9, 2018

SamSaffron commented Jul 9, 2018

tenderlove commented Jul 9, 2018

Choose a reason for hiding this comment

larskanis commented Jan 26, 2019

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

yoelblum commented Jan 29, 2019

SamSaffron commented Jan 29, 2019

larskanis commented Jan 29, 2019

kamipo Jan 30, 2019 • edited

Choose a reason for hiding this comment

Choose a reason for hiding this comment

SamSaffron commented Feb 6, 2019

rails-bot bot commented Dec 18, 2019

larskanis commented Jul 7, 2018 •

edited

kamipo Jan 30, 2019 •

edited