Skip to content

HTTPS clone URL

Subversion checkout URL

You can clone with HTTPS or Subversion.

Download ZIP

Loading…

Special characters not quoted in BYTEA columns #1

Open
iustin opened this Issue · 8 comments

4 participants

@iustin

Hi,

It seems that the BYTEA/ByteString column type doesn't quote "special" characters properly, which results in silent data truncation at the first NULL character.

Sample table:

create table test (name bytea);

Sample program:

import Database.HDBC.PostgreSQL
import Database.HDBC
import Data.ByteString

main = do
  db <- connectPostgreSQL "dbname=debug"
  stmt <- prepare db "INSERT INTO test (name) VALUES($1)"
  execute stmt [toSql $ pack [0]]
  execute stmt [toSql $ pack [65, 0, 66]]
  commit db

The statements as received by the postgres server are:

2011-01-07 16:56:04 CET LOG:  execute <unnamed>: INSERT INTO test (name) VALUES($1)
2011-01-07 16:56:04 CET DETAIL:  parameters: $1 = ''
2011-01-07 16:56:04 CET LOG:  execute <unnamed>: INSERT INTO test (name) VALUES($1)
2011-01-07 16:56:04 CET DETAIL:  parameters: $1 = 'A'

As you can see, the bindings do not correctly escape the NULL bytes and this results in silent truncation. The proper escaping rules for BYTEA columns are documented here: http://www.postgresql.org/docs/8.4/static/datatype-binary.html

thanks,
iustin

@jgoerzen
Owner

Is it valid to escape all of these even when not going into a BYTEA column? I may need to go back and look at the code, but I believe that we do not always know whether a BYTEA value is expected.

@iustin

The escaping seems to be accepted:

debug=# SELECT E'\\000'::bytea;
 bytea 
-------
 \000
(1 row)

debug=# SELECT E'\\000'::varchar;
 varchar 
---------
 \000
(1 row)

So I don't think we'll have more breakage by this. On the other hand, I'm suprised varchar supports null chars :)

@iustin

An alternative to "quoting everywhere" would be to add such quoting to ByteString fields (on the haskell side), and disallow NULLs in plain Strings.

@jgoerzen
Owner

I have made commit 6cc0578 that I believe will fix this for you. Please test and let me know. It is rather unclear what level of binary escaping is needed using the C API vs. using psql. Also, it is unclear whether unesaping will have to be done on data come back the other way around. Let me know.

Thanks,

-- John

@lpsmith

Incidentally, I know that it is possible to efficiently transfer large binary blobs to and from Postgres using libpq without any quoting. I'd have to dig through the docs to figure out how though.

@jgoerzen
Owner

I would be happy to know what you find out. I had thought I already was doing this, but apparently not.

@lpsmith

Ok, I've uploaded an example of what needs to happen here.

@soenkehahn

Reading a ByteString from a bytea field yields the binary data in "bytea Hex Format" as described here: http://www.postgresql.org/docs/9.1/static/datatype-binary.html. This is probably not what one wants. Our fork (https://github.com/soenkehahn/hdbc-postgresql) fixes this by converting bytestrings to "Hex Format" before writing them to the DB and decoding the format when reading.
The downside to our current implementation is that you cannot use ByteStrings to write to any other fields than bytea fields (e.g. VARCHAR). This will probably break a lot of code. We think however that this restriction might be a good thing: ByteString is a type for binary data. For textual data SqlString should be used.

And comments on that?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Something went wrong with that request. Please try again.