Skip to content

Data corruption: bytea data is not properly escaped #7589

Closed
xb opened this Issue Sep 10, 2012 · 8 comments

4 participants

@xb
xb commented Sep 10, 2012

All as of http://www.postgresql.org/docs/9.1/static/datatype-binary.html bytea-strings have to be escaped with leading "E" as in

INSERT INTO table (col) VALUES (E'\\101foo')

and not as in

INSERT INTO table (col) VALUES ('\\101foo')

The difference is, whether 4 bytes (41666f6f, correct) or whether 7 bytes (5c313031666f6f, incorrect) are stored.

Specifically, https://github.com/rails/rails/blob/232d2223ebcfe5c9e0425c821f5d30a7d5968512/activerecord/lib/active_record/connection_adapters/postgresql/quoting.rb#L46 needs to be changed by adding an 'E' prefix.

This data loss bug has been introduced more than 2 years ago by this dac80f7#L0L379 commit.

@kennyj
kennyj commented Sep 10, 2012

I can reproduce this issue.
But when I added an 'E' prefix, some test cases are failed -- loading fixtures to binaries table.

After investigating, I think PGconn.escape_bytea(value) behavior is a bit strange ?.
BTW: If we use hex dump format (with E prefix), it works fine.

def escape_bytea(value)
  # PGconn.escape_bytea(value) if value
  if value
    "\\\\x" << (value.unpack('C*').map{ |b| "%02X" % b }.join(''))
  end
end

I'll try to investigate detail tomorrow.

@kennyj
kennyj commented Sep 15, 2012

sorry, I was confusing. Please tell me about the core of problem. Data size problem ???

@senny
Ruby on Rails member
senny commented Mar 9, 2013

What is the status on this issue? Is it still happening on master?

@senny senny added the PostgreSQL label Mar 26, 2014
@senny
Ruby on Rails member
senny commented Apr 6, 2014

/cc @matthewd

@matthewd
Ruby on Rails member
matthewd commented Apr 6, 2014

Since b839d40, we avoid doing any mangling of bytea data when it's typecast for a bind param. And we've been progressively increasing our use of bind parameters, so at this point, it'd probably take a few tries to find something that actually uses this path -- particularly with a bytea.

The result of PGconn.escape_bytea varies based on the connected server. (Yes, the class method, that doesn't have a connection.) It won't ever use hex encoding, but it will decide how many slashes should be used based on the current value of standard_conforming_strings. Which, as noted above, we explicitly set while connecting.

So as far as I can see, the current implementation is correct... as long as you're only talking to one database server, and not changing standard_conforming_strings.

If we were to use PGconn#escape_bytea instead, we'd get to use hex encoding, which is a win for memory and performance, and wouldn't risk getting confused between different connections. (Is there a reason we don't? At a glance, it seems we should have an @connection right there :interrobang:)

If you're changing standard_conforming_strings on "our" connection, you're still Gonna Have A Bad Time: we have to be able to trust that the server's definition of "correct" won't change between the time we construct the escaped value, and the time we send it.

There is actually a chance of a race condition, if the remote server is defaulting to standard_conforming_strings=off, and we call escape_bytea while another thread has just connected, and before it changes the setting, we could end up with incorrectly-doubled slashes. But that seems unlikely to be the source of the reported (and apparently reproducible) problem. Again, this would be fixed by changing escape_bytea to use the connection.


All that said, I can't rule out the possibility that some older version of libpq and/or the pg gem has some less-desirable behaviour: I've just been experimenting locally:

irb(main):001:0> s = "\001foo"
=> "\u0001foo"
irb(main):002:0> s.size
=> 4
irb(main):003:0> require 'pg'
=> true
irb(main):004:0> puts PGconn.escape_bytea(s)
\\001foo
=> nil
irb(main):005:0> c = PGconn.new(...)
=> #<PG::Connection:...>
irb(main):006:0> c.query('set standard_conforming_strings to on')
=> #<PG::Result:...>
irb(main):007:0> puts PGconn.escape_bytea(s)
\001foo
=> nil
@matthewd
Ruby on Rails member
matthewd commented Apr 6, 2014

'\\101foo' ... 4 bytes (41666f6f, correct)

Err, where did that string even come from?! I know of no escaping mechanism that would produce bare f and o, yet favour \101 over the equivalent-but-shorter A.

Sounds like (some of?) @xb's string had already been slash-escaped once before we got it.

@matthewd
Ruby on Rails member
matthewd commented Apr 6, 2014

tl;dr: We should probably change escape_bytea to use the connection for correctness and corner cases, but even without that, I can't find a real bug here -- at least not now.

@senny
Ruby on Rails member
senny commented Apr 6, 2014

@matthewd thank you for your detailed explanation :heart: . I'm giving this one a close, do you have time to prepare the PR to change to PGConn#escape_bytea ?

@senny senny closed this Apr 6, 2014
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Something went wrong with that request. Please try again.