Description
Summary
Add CSV format support to the write_to_storage_sync() method on ArrowResult, similar to the existing Parquet support.
Use Case
When building data export tools that need to produce CSV files compatible with other systems (e.g., shell scripts, legacy tools), having native CSV export would simplify the workflow.
Currently, you can export to Parquet:
result = session.select_to_arrow("SELECT * FROM users")
result.write_to_storage_sync("/path/to/users.parquet", format_hint="parquet")
But for CSV, you need to convert to pandas first:
result = session.select_to_arrow("SELECT * FROM users")
df = result.to_pandas()
df.to_csv("/path/to/users.csv", sep="|", index=False)
Proposed Solution
Add format_hint="csv" support to write_to_storage_sync():
result = session.select_to_arrow("SELECT * FROM users")
result.write_to_storage_sync(
    "/path/to/users.csv",
    format_hint="csv",
    delimiter="|",
    header=True,
    quote_style="all",  # or "needed", "none"
)
Additional Options
Consider supporting these CSV options:
- delimiter - field separator (default: ,)
- header - include header row (default: True)
- quote_style - how to quote fields (all, needed, none)
- null_value - string to represent NULL values
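As a rough illustration of how the proposed options might translate to PyArrow, here is a minimal sketch. The helper names (`resolve_quoting_style`, `write_arrow_table_as_csv`) are hypothetical and not part of any existing API. It assumes a PyArrow version whose `WriteOptions` accepts `quoting_style`, and it maps the proposed `"all"` to PyArrow's `"all_valid"`, which leaves nulls unquoted. `null_value` is not covered here.

```python
from typing import Any

# Hypothetical mapping from the option names proposed in this issue to
# PyArrow's quoting_style values ("needed", "all_valid", "none").
_QUOTE_STYLE_MAP = {"all": "all_valid", "needed": "needed", "none": "none"}


def resolve_quoting_style(quote_style: str) -> str:
    """Translate the proposed quote_style value into PyArrow's quoting_style."""
    try:
        return _QUOTE_STYLE_MAP[quote_style]
    except KeyError:
        raise ValueError(f"unsupported quote_style: {quote_style!r}") from None


def write_arrow_table_as_csv(
    table: Any,
    path: str,
    *,
    delimiter: str = ",",
    header: bool = True,
    quote_style: str = "needed",
) -> None:
    """Write a pyarrow.Table to a local CSV file (hypothetical helper)."""
    # Imported lazily so this module loads even when PyArrow is not installed.
    import pyarrow.csv as pa_csv

    write_options = pa_csv.WriteOptions(
        include_header=header,
        delimiter=delimiter,
        quoting_style=resolve_quoting_style(quote_style),
    )
    pa_csv.write_csv(table, path, write_options=write_options)
```

Rejecting unknown `quote_style` values up front gives callers a clear error instead of whatever PyArrow raises for an unrecognized quoting style.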
Implementation Notes
PyArrow has pyarrow.csv.write_csv() which could be used for the implementation:
import pyarrow.csv as pa_csv
write_options = pa_csv.WriteOptions(
    include_header=True,
    delimiter=delimiter,
)
pa_csv.write_csv(table, path, write_options=write_options)
Context
This came up when building shell-script-compatible collection output for database migration tools where CSV with specific formatting (pipe delimiter, quoted strings) is required for downstream compatibility.