
copy_data memory bloat in v1.4 #473

Closed
davidtaylorhq opened this issue Aug 8, 2022 · 2 comments · Fixed by #474

Comments

@davidtaylorhq

We use Connection#copy_data to stream large volumes of data into a temporary table. We recently observed significant performance degradation and increased memory use for this system. Here's a minimal reproduction:

require "securerandom"
require "objspace"
require "bundler/inline"

PG_VERSION = "1.4.2"
gemfile do
  source 'https://rubygems.org'
  gem 'pg', PG_VERSION
end

puts "PG::version #{PG::VERSION}"

def memory_use
  3.times { GC.start }
  objspace_size_mb = ObjectSpace.memsize_of_all / 1024 / 1024
  rss_mb = `ps -p #{Process.pid} -o rss`.split("\n")[1].to_i / 1024
  "objspace:#{objspace_size_mb}mb; rss:#{rss_mb}mb"
end

puts "Before: #{memory_use}"

start_at = Time.now

connection = PG.connect(dbname: 'discourse_development')
table_name = "my_temp_table"
connection.exec("CREATE TEMP TABLE #{table_name}(url text UNIQUE)")
connection.copy_data("COPY #{table_name} FROM STDIN CSV") do
  1_000_000.times do
    connection.put_copy_data("#{SecureRandom.hex(100)}\n")
  end
  puts "After loop, inside copy_data: #{memory_use}"
end

puts "After: #{memory_use}"
puts "Took #{Time.now - start_at}s"

With version 1.3.5, this script takes ~10s on my machine, and reports ~47mb RSS at the end. With version 1.4.0 (and 1.4.1, 1.4.2), it takes ~80s and reports ~182mb RSS at the end. The RSS appears to scale with the amount of data being copied.
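The scaling of RSS with data volume suggests unsent data accumulating in a send buffer. A minimal, self-contained simulation of that suspected mechanism (the `FakeConn` class and the byte counts are illustrative stand-ins, not ruby-pg internals): each `put_copy_data` appends to an internal buffer, and a non-blocking flush only drains as much as the "wire" accepts per call, so when the producer outpaces the wire the buffer grows without bound.

```ruby
# Hypothetical stand-in for a connection whose flushes are non-blocking.
class FakeConn
  WIRE_BYTES_PER_FLUSH = 64 # pretend the socket accepts only this much per flush

  def initialize
    @buffer = +""
  end

  def put_copy_data(data)
    @buffer << data
    nonblocking_flush
  end

  def nonblocking_flush
    # Drain only what the wire accepts; anything else stays buffered.
    @buffer.slice!(0, WIRE_BYTES_PER_FLUSH)
  end

  def buffered_bytes
    @buffer.bytesize
  end
end

conn = FakeConn.new
1_000.times { conn.put_copy_data("x" * 200) } # 200 B in, only 64 B out per call
puts conn.buffered_bytes # grows by (200 - 64) bytes per call => 136000
```

In this toy model the buffer grows linearly with the number of calls, mirroring how the repro script's RSS scales with the amount of data copied.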

larskanis added a commit to larskanis/ruby-pg that referenced this issue Aug 8, 2022
In pg-1.3.x we did a blocking flush on every call to put_copy_data.
This ensured that all data was sent before the next put_copy_data.
In ged#462 (and pg-1.4.0 to .2) the behaviour was changed to rely on the non-blocking flushes libpq does internally.
This gives a decent performance improvement, especially on Windows.
Unfortunately ged#473 showed that memory can bloat when data is sent more slowly than put_copy_data is called.

As a trade-off, this proposes doing a blocking flush only every 100 calls.

If libpq is running in blocking mode (PG::Connection.async_api = false), put_copy_data does a blocking flush every time new memory is allocated.
Unfortunately we don't have that information, since we have no access to libpq's PGconn struct and the return codes give no indication of when this happens.
So flushing at a fixed number of calls is a very simple heuristic.

Fixes ged#473
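The heuristic above can be sketched in pure Ruby (the `ThrottledConn` class and byte counts are hypothetical stand-ins; the real change lives in ruby-pg's C extension, and only the every-100-calls interval comes from the commit message): a blocking flush fully drains the buffer, so peak buffered memory is bounded by the flush interval regardless of how fast the producer runs.

```ruby
# Hypothetical stand-in illustrating the flush-every-N-calls heuristic.
class ThrottledConn
  FLUSH_INTERVAL = 100 # the interval proposed in the fix

  def initialize
    @buffer = +""
    @calls = 0
    @max_buffered = 0
  end

  def put_copy_data(data)
    @buffer << data
    @calls += 1
    @max_buffered = [@max_buffered, @buffer.bytesize].max
    blocking_flush if (@calls % FLUSH_INTERVAL).zero?
  end

  def blocking_flush
    @buffer.clear # stands in for blocking until libpq has sent everything
  end

  attr_reader :max_buffered
end

conn = ThrottledConn.new
10_000.times { conn.put_copy_data("x" * 200) }
puts conn.max_buffered # peaks at FLUSH_INTERVAL * 200 = 20000 bytes, then drains
```

Unlike the unbounded case, the peak here stays constant no matter how many rows are pushed; the trade-off is one blocking round-trip per 100 calls.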
@larskanis
Collaborator

I can reproduce this issue. Not as dramatic as you measured, but still measurable. The root cause is #462. I made that change when I noticed that put_copy_data is quite slow on Windows. Unfortunately it can result in memory bloat when put_copy_data is called with more data than can actually be transferred over the wire.

My proposal is to fix it as in #474. Not ideal, but I think it's the only practical trade-off we can make.

CC @SamSaffron

@davidtaylorhq
Author

Thanks for the quick fix @larskanis - #474 does indeed fix the issue in my minimal repro script ❤️

it can result in memory bloat when put_copy_data is called with more data than can actually be transferred over the wire.

Aha I see! Indeed, adding a sleep 60 to the end of the repro script and then measuring again shows that the memory drops back to normal levels. 👍

@davidtaylorhq davidtaylorhq changed the title copy_data memory leak in v1.4 copy_data memory bloat in v1.4 Aug 8, 2022
larskanis added a commit to larskanis/ruby-pg that referenced this issue Aug 9, 2022