Special characters not quoted in BYTEA columns #1

iustin opened this Issue Jan 7, 2011 · 8 comments


None yet
4 participants

iustin commented Jan 7, 2011


It seems that the BYTEA/ByteString column type doesn't quote "special" characters properly, which results in silent data truncation at the first NULL character.

Sample table:

create table test (name bytea);

Sample program:

import Database.HDBC.PostgreSQL
import Database.HDBC
import Data.ByteString

main = do
  db <- connectPostgreSQL "dbname=debug"
  stmt <- prepare db "INSERT INTO test (name) VALUES($1)"
  execute stmt [toSql $ pack [0]]
  execute stmt [toSql $ pack [65, 0, 66]]
  commit db

The statements as received by the postgres server are:

2011-01-07 16:56:04 CET LOG:  execute <unnamed>: INSERT INTO test (name) VALUES($1)
2011-01-07 16:56:04 CET DETAIL:  parameters: $1 = ''
2011-01-07 16:56:04 CET LOG:  execute <unnamed>: INSERT INTO test (name) VALUES($1)
2011-01-07 16:56:04 CET DETAIL:  parameters: $1 = 'A'

As you can see, the bindings do not correctly escape the NULL bytes and this results in silent truncation. The proper escaping rules for BYTEA columns are documented here: http://www.postgresql.org/docs/8.4/static/datatype-binary.html



jgoerzen commented Jan 7, 2011

Is it valid to escape all of these even when not going into a BYTEA column? I may need to go back and look at the code, but I believe that we do not always know whether a BYTEA value is expected.

iustin commented Jan 7, 2011

The escaping seems to be accepted:

debug=# SELECT E'\\000'::bytea;
(1 row)

debug=# SELECT E'\\000'::varchar;
(1 row)

So I don't think we'll have more breakage by this. On the other hand, I'm suprised varchar supports null chars :)

iustin commented Jan 10, 2011

An alternative to "quoting everywhere" would be to add such quoting to ByteString fields (on the haskell side), and disallow NULLs in plain Strings.


jgoerzen commented Jan 10, 2011

I have made commit 6cc0578 that I believe will fix this for you. Please test and let me know. It is rather unclear what level of binary escaping is needed using the C API vs. using psql. Also, it is unclear whether unesaping will have to be done on data come back the other way around. Let me know.


-- John


lpsmith commented Jan 17, 2011

Incidentally, I know that it is possible to efficiently transfer large binary blobs to and from Postgres using libpq without any quoting. I'd have to dig through the docs to figure out how though.


jgoerzen commented Jan 17, 2011

I would be happy to know what you find out. I had thought I already was doing this, but apparently not.


lpsmith commented Jan 18, 2011

Ok, I've uploaded an example of what needs to happen here.


soenkehahn commented Apr 16, 2013

Reading a ByteString from a bytea field yields the binary data in "bytea Hex Format" as described here: http://www.postgresql.org/docs/9.1/static/datatype-binary.html. This is probably not what one wants. Our fork (https://github.com/soenkehahn/hdbc-postgresql) fixes this by converting bytestrings to "Hex Format" before writing them to the DB and decoding the format when reading.
The downside to our current implementation is that you cannot use ByteStrings to write to any other fields than bytea fields (e.g. VARCHAR). This will probably break a lot of code. We think however that this restriction might be a good thing: ByteString is a type for binary data. For textual data SqlString should be used.

And comments on that?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment