
Support Arrow-backed Pandas dtypes (pyarrow string/float) in sender.dataframe() #115

@caolixiang

Description


Is your feature request related to a problem?

We're using the native ILP Python client (`sender.dataframe()`) to ingest Polars DataFrames after converting them to Pandas. When we convert with `DataFrame.to_pandas(use_pyarrow_extension_array=True)`, the client rejects the batch with errors such as:

Unsupported dtype large_string[pyarrow]
Unsupported dtype double[pyarrow]
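
A minimal reproduction sketch, assuming a recent `questdb` Python client and a local instance reachable over HTTP on port 9000 (the table and column names are illustrative):

```python
import polars as pl
from questdb.ingress import Sender, TimestampNanos

# Build a small Polars frame and convert it via the zero-copy Arrow path.
pl_df = pl.DataFrame({
    "symbol": ["AAPL", "MSFT"],
    "price": [182.3, 411.6],
})
pd_df = pl_df.to_pandas(use_pyarrow_extension_array=True)
# pd_df.dtypes now reports large_string[pyarrow] and double[pyarrow].

with Sender.from_conf("http::addr=localhost:9000;") as sender:
    # Fails with "Unsupported dtype large_string[pyarrow]".
    sender.dataframe(pd_df, table_name="trades", at=TimestampNanos.now())
```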
To work around this, we currently rerun the conversion with `use_pyarrow_extension_array=False`, which copies every column back to NumPy/Python dtypes. That re-conversion adds roughly 30–40% CPU and memory overhead for large batches, and it prevents us from using the zero-copy Arrow path that Polars and Pandas now offer.
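
Even a narrower fallback that copies only the Arrow-backed columns still has to materialize each one before sending. A sketch, assuming pandas >= 1.5 (`dearrow` is just an illustrative helper name, not part of any API):

```python
import pandas as pd

def dearrow(df: pd.DataFrame) -> pd.DataFrame:
    """Copy only the Arrow-backed columns back to NumPy/object dtypes."""
    converted = {}
    for name, col in df.items():
        if isinstance(col.dtype, pd.ArrowDtype):
            # to_numpy() forces the copy this issue asks to avoid:
            # strings land in an object array, numerics in NumPy arrays.
            converted[name] = pd.Series(col.to_numpy(), index=df.index, name=name)
        else:
            converted[name] = col
    return pd.DataFrame(converted)

# sender.dataframe(dearrow(pd_df), table_name="trades", at=TimestampNanos.now())
```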

Feature request

- Allow `sender.dataframe()` to accept Pandas columns backed by pyarrow dtypes (e.g., `string[pyarrow]`, `float64[pyarrow]`, `timestamp[pyarrow]`).
- Alternatively, provide an option for the client to detect pyarrow-backed columns and convert them transparently, without forcing us to re-materialize the entire batch in Python.
Why it matters

- Newer Pandas/Polars pipelines default to Arrow-backed storage for performance; QuestDB ingestion currently forces an extra copy step.
- Importing tens of millions of rows per batch becomes CPU-bound on the client simply because we have to downgrade data types.
Happy to provide sample code or traces if needed. Thanks for considering!

Describe the solution you'd like.

No response

Describe alternatives you've considered.

No response

Full Name:

Lixiang Cao

Affiliation:

I am a freelancer

Additional context

No response
