Skip to content

No public API for COPY FROM STDIN (text/binary stream ingestion) #166

@Dev-iL

Description

@Dev-iL

Summary

psqlpy exposes no public method for streaming data into PostgreSQL via the COPY FROM STDIN protocol. binary_copy_to_table exists on Connection but requires pre-encoded binary data in PostgreSQL's binary COPY format, which is not interchangeable with the text/CSV COPY FROM STDIN path used by every other Python driver.

Environment

  • psqlpy: 0.11.12
  • Python: 3.14.3
  • PostgreSQL: 14.20

What other drivers provide

Driver API
asyncpg conn.copy_records_to_table(table, records=[...], columns=[...])
psycopg3 async with cur.copy("COPY t FROM STDIN") as copy: await copy.write(data)
psycopg2 cur.copy_from(file_obj, table, columns=[...])

COPY is the fastest bulk-load path in PostgreSQL (bypasses row-by-row parsing and WAL overhead for unlogged tables). Benchmarks against asyncpg's copy_records_to_table show it sustaining >1 M rows/sec on local connections.

What psqlpy currently provides

# Exists on Connection, but takes raw binary COPY format bytes:
conn.binary_copy_to_table(source, table_name, columns=None, schema_name=None)

There is no text/CSV COPY path, no record-list API, and no streaming write API analogous to psycopg3's async with cur.copy(...) as copy.

Expected / requested API

A high-level API that accepts Python objects (or at least raw text/bytes in PostgreSQL text COPY format) without requiring callers to produce binary COPY wire format manually:

# Option A: record list (like asyncpg)
await conn.copy_records_to_table(
    "my_table",
    records=[(1, "a"), (2, "b")],
    columns=["id", "name"],
)

# Option B: streaming writer (like psycopg3)
async with conn.copy("COPY my_table FROM STDIN") as writer:
    await writer.write("1\ta\n2\tb\n")

Impact

Any application that bulk-loads data — ETL pipelines, data import scripts, benchmarks — cannot use psqlpy for this workload without a lossy workaround (falling back to execute_many, which is significantly slower; see related issue on execute_many performance).

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions