
pg_find_bad_utf8

A Python script that takes a SQL pg_dump as input and produces two outputs. On STDOUT it writes COPY commands containing only the rows with bad UTF-8. On STDERR it writes a schema-only pg_dump command that dumps only the tables containing those rows. Together these make it easy to create a temporary database holding just the bad rows.
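The row detection described above could be sketched roughly as follows. This is a minimal illustration, not the code in pg_find_bad_utf8.py: the function name `find_bad_utf8_rows` is hypothetical, it assumes a plain-format dump where each data block starts with a `COPY ... FROM stdin;` line and ends with `\.`, and it uses Python's strict UTF-8 decoder as a stand-in for PostgreSQL's own validation, which may differ in edge cases.

```python
def find_bad_utf8_rows(lines):
    """Yield (copy_header, bad_rows) for each COPY block containing invalid UTF-8.

    `lines` is an iterable of bytes, one element per line of a plain-format
    pg_dump. The dump is read as bytes because decoding it up front would
    fail on exactly the rows we are looking for.
    """
    header = None   # the current "COPY ... FROM stdin;" line, or None outside a block
    bad_rows = []
    for line in lines:
        if header is None:
            # Outside a data block: look for the start of the next one.
            if line.startswith(b"COPY ") and line.rstrip().endswith(b"FROM stdin;"):
                header = line
                bad_rows = []
        elif line.rstrip(b"\r\n") == b"\\.":
            # End-of-data marker: report the block only if it had bad rows.
            if bad_rows:
                yield header, bad_rows
            header = None
        else:
            # Inside a data block: keep the row only if it is not valid UTF-8.
            try:
                line.decode("utf-8")
            except UnicodeDecodeError:
                bad_rows.append(line)
```

Fed with `sys.stdin.buffer`, a loop over the results could write each header, its bad rows, and a closing `\.` line to `sys.stdout.buffer` to produce a loadable SQL file.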

USAGE: pg_dump mydb | ./pg_find_bad_utf8.py > badtables.sql 2> dumpcommand.sh

Then you can create a db like so:

createdb badtables

Next, load the schema into it using the pg_dump command saved in dumpcommand.sh, piped to psql. For example:

pg_dump --schema-only --table badtable1 --table badtable2 mydb | psql badtables
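For illustration, a command of this shape could be assembled from a list of table names as in the sketch below. The helper `schema_dump_command` is hypothetical and not taken from the script. It builds only the pg_dump part, using the `--schema-only` and `--table` options shown above, and quotes each argument with `shlex.quote` so that unusual table names survive the shell.

```python
import shlex


def schema_dump_command(dbname, tables):
    """Return a shell command string that dumps only the schema of `tables`."""
    args = ["pg_dump", "--schema-only"]
    for table in tables:
        # One --table option per affected table.
        args += ["--table", table]
    args.append(dbname)
    # Quote every argument so the result is safe to paste into a shell.
    return " ".join(shlex.quote(arg) for arg in args)
```

Calling `schema_dump_command("mydb", ["badtable1", "badtable2"])` gives the pg_dump half of the pipeline above; the `| psql badtables` part would still be added by hand.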

Then load your SQL file full of COPY commands:

psql -f badtables.sql badtables

This script is licensed under the PostgreSQL License. See the LICENSE file for details.