Skip to content

ollieglass/sqlalchemy-pg-copy

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

7 Commits
 
 
 
 
 
 
 
 
 
 

Repository files navigation

sqlalchemy-pg-copy

Helper function to COPY a list of dicts to Postgres

Motivation

  • Postgres COPY is the fastest way to insert data
  • SQLAlchemy has no function to perform a Postgres COPY
  • COPY requires data to be in a specific CSV format
  • Python's CSV library can't transform data to that format.

This helper function:

  • takes a SQLAlchemy engine, list of dictionaries to insert, and table name
  • transforms the list of dictionaries to Postgres COPY CSV format
  • ignores duplicate records (by inserting via a temporary staging table)

Example

import pg_copy

objs = [
    {
        'id': i,
        'description': f'record description {i}'
    } for i in range(100_000)
]

pg_copy.insert_with_copy(engine, objs, target_table)

By default, values already present in the table are ignored. This can be changed by setting the ignore_duplicates argument to False.

insert_with_copy can take an optional target_schema argument.

See example.py

References

Use COPY to load all the rows in one command, instead of using a series of INSERT commands. The COPY command is optimized for loading large numbers of rows; it is less flexible than INSERT, but incurs significantly less overhead for large data loads. Since COPY is a single command, there is no need to disable autocommit if you use this method to populate a table.

Postgres docs (Populating a Database - Chapter 14. Performance Tips)

Stack Overflow - How does COPY work and why is it so much faster than INSERT?

SQLAlchemy docs, Bulk Operations

Example code and benchmarks for all methods of inserting data into Postgres. COPY is ~2-200x faster than any other method Fastest Way to Load Data Into PostgreSQL Using Python

Releases

No releases published

Packages

No packages published

Languages