db: fix CSV escaping by switching to jackc/pgx
lib/pq has been in maintenance mode for a while, and issue timescale#61 appears
to have run into one of its idiosyncrasies: its COPY implementation
assumes that you're using a query generated via pq.CopyIn(), which uses
the default TEXT format, so it runs all of the incoming data through an
additional escaping layer.

Our code uses CSV by default (and there appears to be no way to use TEXT
format, since we're using the old COPY syntax), which means that
incoming CSV containing its own escapes will be double-escaped and
corrupted. This is most visible with bytea columns, but the tests
currently document additional problems with tab and backslash
characters, and there are probably other problematic cases too.
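
To illustrate the double-escaping described above, here is a minimal Go sketch. The `escapeText` helper is hypothetical: it imitates the kind of backslash escaping that COPY's TEXT format calls for, and it is not lib/pq's actual code. Applying it to a line that is already valid CSV shows how a bytea hex literal picks up an extra backslash.

```go
package main

import (
	"fmt"
	"strings"
)

// escapeText is a hypothetical stand-in for a TEXT-format escaping layer.
// It escapes backslashes, tabs, newlines, and carriage returns in one pass.
func escapeText(s string) string {
	r := strings.NewReplacer(
		`\`, `\\`,
		"\t", `\t`,
		"\n", `\n`,
		"\r", `\r`,
	)
	return r.Replace(s)
}

func main() {
	// A CSV row whose second field is a bytea value in hex form.
	row := `1,\x00ff`

	// The row is already in its final CSV form, but the extra layer
	// rewrites it anyway.
	fmt.Println(escapeText(row)) // prints 1,\\x00ff
}
```

In CSV mode a backslash is an ordinary character by default, so the doubled backslash would reach the column as data rather than being collapsed again, and the value would no longer be the intended two bytes.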

To fix, switch from lib/pq over to jackc/pgx, and reimplement
db.CopyFromLines() using the PgConn.CopyFrom() API. We were already
depending on a part of this library before, so the new dependency isn't
as big of a change as it would have been otherwise, but the switch isn't
free. The compiled binary gains roughly 1.5 MB in size -- likely due to
jackc's extensive type conversion system, which is unfortunate because
we're not using it. Further optimization could probably be done, at the
expense of having most of the DB logic go through the low-level APIs
rather than database/sql.

We make use of the new sql.Conn.Raw() method to easily drop down to the
lowest API level, so bump our minimum Go version to 1.13. (1.12 has been
EOL for over two years now.) This escaping fix is a breaking change
for anyone who may have already worked around this problem, so bump the
utility's version to 0.4.0.
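
The pattern of dropping from database/sql down to the driver connection can be sketched as follows. This is an illustration under stated assumptions, not the code from this commit: `copier`, `fakeConn`, `fakeDriver`, `copyFromLines`, and `demo` are hypothetical names, and the fake driver only counts bytes so that the sketch runs without a database. With pgx's stdlib driver, the value passed to the `Raw()` callback would be a `*stdlib.Conn`, and the corresponding call would be `Conn().PgConn().CopyFrom(ctx, reader, sql)`.

```go
package main

import (
	"context"
	"database/sql"
	"database/sql/driver"
	"errors"
	"fmt"
	"io"
	"strings"
)

// copier is a hypothetical interface: any driver connection that can
// stream raw COPY data without re-escaping it.
type copier interface {
	CopyFrom(ctx context.Context, r io.Reader, sql string) (int64, error)
}

// fakeConn stands in for a real driver connection. It discards the data
// and reports how many bytes it was given.
type fakeConn struct{}

func (fakeConn) Prepare(string) (driver.Stmt, error) { return nil, errors.New("not supported") }
func (fakeConn) Close() error                        { return nil }
func (fakeConn) Begin() (driver.Tx, error)           { return nil, errors.New("not supported") }
func (fakeConn) CopyFrom(_ context.Context, r io.Reader, _ string) (int64, error) {
	return io.Copy(io.Discard, r)
}

type fakeDriver struct{}

func (fakeDriver) Open(string) (driver.Conn, error) { return fakeConn{}, nil }

func init() { sql.Register("fake", fakeDriver{}) }

// copyFromLines pins one connection from the pool, unwraps it with
// sql.Conn.Raw(), and streams the lines through unchanged.
func copyFromLines(ctx context.Context, db *sql.DB, copyCmd string, lines []string) (int64, error) {
	conn, err := db.Conn(ctx)
	if err != nil {
		return 0, err
	}
	defer conn.Close()

	var n int64
	err = conn.Raw(func(driverConn interface{}) error {
		c, ok := driverConn.(copier)
		if !ok {
			return fmt.Errorf("driver connection %T cannot stream COPY data", driverConn)
		}
		data := strings.NewReader(strings.Join(lines, "\n") + "\n")
		var copyErr error
		n, copyErr = c.CopyFrom(ctx, data, copyCmd)
		return copyErr
	})
	return n, err
}

// demo runs the sketch against the fake driver.
func demo() (int64, error) {
	db, err := sql.Open("fake", "")
	if err != nil {
		return 0, err
	}
	defer db.Close()

	lines := []string{`1,\x00ff`, `2,\x0102`}
	return copyFromLines(context.Background(), db, "COPY t FROM STDIN WITH CSV", lines)
}

func main() {
	n, err := demo()
	fmt.Println(n, err)
}
```

The type assertion inside the callback keeps the rest of the program on database/sql; only the COPY path needs to know about the driver's own connection type.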
jchampio committed Jun 9, 2022
1 parent 43ebc7f commit 25f285a
Showing 6 changed files with 176 additions and 54 deletions.
2 changes: 1 addition & 1 deletion README.md
@@ -5,7 +5,7 @@ PostgreSQL's built-in `COPY` functionality for bulk inserting data
into [TimescaleDB.](//github.com/timescale/timescaledb/)

### Getting started
-You need the Go runtime (1.6+) installed, then simply `go get` this repo:
+You need the Go runtime (1.13+) installed, then simply `go get` this repo:
```bash
$ go install github.com/timescale/timescaledb-parallel-copy/cmd/timescaledb-parallel-copy@latest
```
5 changes: 3 additions & 2 deletions cmd/timescaledb-parallel-copy/main.go
@@ -12,13 +12,14 @@ import (
"sync/atomic"
"time"

-_ "github.com/lib/pq"
+_ "github.com/jackc/pgx/v4/stdlib"

"github.com/timescale/timescaledb-parallel-copy/internal/db"
)

const (
binName = "timescaledb-parallel-copy"
-version = "0.3.0"
+version = "0.4.0-dev"
tabCharStr = "\\t"
)

8 changes: 4 additions & 4 deletions go.mod
@@ -1,11 +1,11 @@
module github.com/timescale/timescaledb-parallel-copy

-go 1.12
+go 1.13

require (
-github.com/jackc/pgconn v1.1.0
+github.com/jackc/pgconn v1.12.1
+github.com/jackc/pgx/v4 v4.16.1
github.com/jmoiron/sqlx v1.2.0
-github.com/lib/pq v1.2.0
-golang.org/x/crypto v0.0.0-20190911031432-227b76d455e7 // indirect
+github.com/lib/pq v1.10.6 // indirect
google.golang.org/appengine v1.6.5 // indirect
)
