Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

PostgreSQL: Use a specialized Result object to defer retrieval of values #33312

Closed
wants to merge 1 commit into from

Conversation

larskanis
Copy link
Contributor

@larskanis larskanis commented Jul 7, 2018

PG::Result stores values received from the database internally in a compact memory format. When the value is accessed, it is converted to a ruby string or casted to another ruby object.

While ActiveRecord::Result enforces the conversion of all rows and columns to ruby objects, this implementation defers conversion to when the single value is accessed. Therefore ActiveRecord::ConnectionAdapters::PostgreSQL::Result and PG::Tuple store a reference to the PG::Result object.

PG::Tuple caches all values after being materialized. It detaches the PG::Result object, when all columns have been materialized or when the object is marshaled.

The new Result object is disabled for pg versions < 1.1. This is because PG::Tuple is not yet available in older versions and because keeping the PG::Result around can lead to memory bloat with older pg versions. For the first issue I wrote a lightweight ruby implementation, but since it doesn't prevent memory bloat, I think it's best to make it pg-1.1 only.

This speedups wasteful queries (without explicit selecting required columns) by up to 50%. Calling column_type/get_oid_type only for necessary columns has another small performance advantage even if all columns are used.

pg-1.1.0 is not yet released, but I would like to get some feedback before. For now I patched the Gemfile to use pg from git.

cc @SamSaffron

@rails-bot
Copy link

r? @kaspth

(@rails-bot has picked a reviewer for you, use r? to override)

@SamSaffron
Copy link
Contributor

I think to simplify the code here we should require PG 1.1 and up for Rails 6. I am pretty sure @matthewd / @rafaelfranca are fine with that.

@columns = @pg_result.fields
end

def initialize_copy(other)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

hmm what is going on with this blank method?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is called from the query cache. I'm uncertain why some of the data (not a deep copy of all the values) is duped in AR::Result. They are treated as read-only, so that there should be no need to dup them. Removing initialize_copy from AR::Result doesn't raise any failure in rake test:postgresql or rake test:sqlite3. I only added this method because I derived the class from AR::Result to satisfy some class based tests, but the base method isn't suitable for PostgreSQL::Result.

We could dup ruby objects here as well as kind of precaution. At least PG::Result is immutable, so that it doesn't need to be copied.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

hmmm @tenderlove does it make sense to break off from this interface in Rails 6.0 and remove this. It is very odd. ... @larskanis at a minimum we should add a comment explaining why it is no-op.

@SamSaffron
Copy link
Contributor

This is awesome so excited about this change!

@rafaelfranca rafaelfranca added this to the 6.0.0 milestone Jul 9, 2018
@tenderlove
Copy link
Member

It looks like ActiveRecord::ConnectionAdapters::PostgreSQL never calls clear on the underlying PG Result. I know the GC clears automatically, but I thought there was a reason we do it manually. And if there's a reason we do it manually, does that reason not apply to ActiveRecord::ConnectionAdapters::PostgreSQL too?

@tenderlove
Copy link
Member

Ah, I think I remember the reason we do it manually: because of memory bloat that can be fixed by either using the Ruby allocators or informing the GC of the result size. @larskanis I know you were working on the later, does this branch use it?

@SamSaffron
Copy link
Contributor

@tenderlove yes it is using the latest fix which handles the bloat problem using an estimator.

per:

ged/ruby-pg@14ebbef

PG mailing list have been awfully quiet about the complete fix here: https://www.postgresql.org/message-id/flat/fa16a288-9685-14f2-97c8-b8ac84365a4f%40greiz-reinsdorf.de we should bounce that topic. But that is going to take years till it is properly everywhere.

@tenderlove
Copy link
Member

Nice!!

Gemfile Outdated
@@ -124,7 +124,8 @@ platforms :ruby, :mswin, :mswin64, :mingw, :x64_mingw do
gem "sqlite3", "~> 1.3.6"

group :db do
gem "pg", ">= 0.18.0"
# gem "pg", ">= 0.18.0"
gem "pg", ">= 1.1.0.pre", github: "larskanis/ruby-pg", branch: "lazytuple"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure if you're using something specific from the branch, but it looks like there's a general 1.1.3 version of pg available? https://rubygems.org/gems/pg/versions/1.1.3

@larskanis
Copy link
Contributor Author

I finally managed to update the PR. pg-1.1.x is released since half a year, so that I removed the compatibility code for older versions and made pg-1.1 mandatory. I'll add some comments on specific lines to describe the rationale behind these changes. I hope we can discuss this PR then.

@@ -1,6 +1,6 @@
# frozen_string_literal: true

gem "pg", ">= 0.18", "< 2.0"
gem "pg", "~> 1.1"
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

pg-1.1 isn't really required for ActionCable. I changed it just for consistency.

end
end
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

column_types is replaced by column_type, so that type information can be retrieved for necessary columns only.

PG::Result stores values received from the database internally in a compact
memory format. When the value is accessed, it is converted to a ruby string or
casted to another ruby object.

While ActiveRecord::Result enforces the conversion of all rows and columns to
ruby objects, this implementation defers conversion to when the single value is
accessed. Therefore ActiveRecord::ConnectionAdapters::PostgreSQL::Result and
PG::Tuple store a reference to the PG::Result object.

PG::Tuple caches all values after being materialized. It detaches the PG::Result
object, when all columns have been materialized or when the object is marshaled.

This raises the pg version constraint to ~> 1.1, which provides PG::Tuple and
implements compensating controls to avoid memory bloat when PG::Result aren't
cleared. [1]

ActiveRecord::ConnectionAdapters::PostgreSQL::Result is derived from
ActiveRecord::Result although almost all methods are re-implemented. This is to
satisfy tests and to make it clear that it implements the same interface.

This speedups wasteful queries (without explicit selecting required columns) by
up to 50%. Calling column_type/get_oid_type only for necessary columns has
another small performance advantage even if all columns are used.

[1] https://samsaffron.com/archive/2018/06/13/ruby-x27-s-external-malloc-problem

def column_values(i)
@rows.map { |row| row[i] }
end
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

column_values allows to retrieve a single column. In PostgreSQL::Result this is implemented without allocation of Array<Array<value>> result set.

yield @pg_result.tuple(i)
i += 1
end
end
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

PostgreSQL::Result doesn't return Hash but PG::Tuple on each, [], first and last methods. PG::Tuple implements enough Hash-like functions to be used as a Hash replacement in ActiveRecord. It allows also Array like access. See: https://deveiate.org/code/pg/PG/Tuple.html

@yoelblum
Copy link

What a great effort! Can something similar be done with mysql?

@SamSaffron
Copy link
Contributor

@tenderlove / @matthewd any chance you can review?

@larskanis any chance you can posts a few before / after micro benches?

@larskanis
Copy link
Contributor Author

I updated the benchmark to the current master branch here. Essentially it's looks like:

# rails branch master:
#
# top 10 id / title PG      2.430k (± 1.7%) i/s -     12.272k in   5.051112s
# top 1000 id wasteful     36.095  (± 5.5%) i/s -    180.000  in   5.010574s
#      pluck 1000 mode    381.441  (± 1.3%) i/s -      1.938k in   5.081686s
#
# rails branch hyperrecord-v2:
#
# top 10 id / title PG      2.849k (± 1.0%) i/s -     14.278k in   5.012458s
# top 1000 id wasteful     61.599  (± 6.5%) i/s -    312.000  in   5.081960s
#      pluck 1000 mode    401.916  (± 1.2%) i/s -      2.028k in   5.046694s

So non-wasteful queries are a little bit faster due to less column lookups, but near to the measuring tolerance. Wasteful queries are pretty clear almost twice as fast. But please do your own benchmarks, since mine tend to be biased 😉

@@ -329,7 +329,7 @@ def test_select_methods_passing_a_association_relation
author = Author.create!(name: "john")
Post.create!(author: author, title: "foo", body: "bar")
query = author.posts.where(title: "foo").select(:title)
assert_equal({ "title" => "foo" }, @connection.select_one(query))
assert_equal({ "title" => "foo" }, @connection.select_one(query).to_h)
Copy link
Member

@kamipo kamipo Jan 30, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does this mean that this PR will change the public API?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, as described above it changes the API. Even if PG::Tuple implements some Hash like methods, it never equals to a Hash, so to_h is required for comparison.

@rafaelfranca called ActiveRecord::Result a "public API by mistake". Although we have several scripts at work making use of this API, it never felt like a public API, since it is halfway between the model query interface and the raw_connection. It also changed several times in the past, due to the needs within AR.

If public API change is a concern, it is possible to exclude select_foo methods from this change and let only exec_foo methods return the specialized PostgreSQL::Result. I changed select_foo for consistency, but they are not used by model query methods and are therefore not so much relevant for performance.

@SamSaffron
Copy link
Contributor

I just tested this on Discourse @larskanis with

git clone https://github.com/discourse/discourse.git
cd discourse 

# mangle Gemfile with
if rails_master?
  gem 'arel', git: 'https://github.com/rails/arel.git'
  gem 'rails', path: 'path/to/this/branch'
else

RAILS_MASTER=1 bundle
RAILS_MASTER=1 ruby script/bench.rb -i 100

Not noticing anything much different with or without the patch. Numbers are really close.

What this probably means is that over the years at Discourse we have optimised stuff around this limitation, not that the patch is bad or anything.

I still think this is a great change! just not expecting it to make too much of a diff for Discourse.

@rafaelfranca rafaelfranca removed this from the 6.0.0 milestone Apr 12, 2019
@rails-bot
Copy link

rails-bot bot commented Dec 18, 2019

This pull request has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs.
Thank you for your contributions.

@rails-bot rails-bot bot added the stale label Dec 18, 2019
@rails-bot rails-bot bot closed this Dec 25, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

10 participants