Description
Summary
Add an opt-in flag to preserve `SparseDtype` on Parquet/Feather roundtrip by storing minimal dtype metadata in the Arrow schema and reconstructing the sparse columns on read. No behavior change by default.
API
- Write: `DataFrame.to_parquet(..., preserve_sparse=False)`, `DataFrame.to_feather(..., preserve_sparse=False)`
- Read: `read_parquet(..., preserve_sparse=False)`, `read_feather(..., preserve_sparse=False)`
- Alternative name for feedback: `preserve_extension_arrays`.
Behavior
- Default (`False`): current behavior unchanged (dense on read).
- When `True`: write Arrow field metadata (subtype, fill_value); read reconstructs `SparseArray` (`SparseDtype`).
Implementation sketch
- Writer: detect `SparseDtype` columns, attach schema field metadata (e.g., `b"pandas.sparse.dtype"`, `b"pandas.sparse.version"`), keep physical encoding compatible.
- Reader: if `preserve_sparse=True` and metadata present, rebuild sparse columns from dense values + recorded `fill_value`/`subtype`.
Tests
- Parquet and Feather roundtrip.
- Subtypes: int64/float64/boolean; various `fill_value`s (0, 0.0, False, NaN).
- Mixed frames (sparse + dense).
- Verify off-by-default behavior.
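The subtype and fill-value matrix above could be exercised along these lines. This is a pandas-only sketch that checks the dense-values-plus-dtype rebuild step in isolation, without touching Parquet or Feather; the helper name `rebuild_matches` and the specific sample values are illustrative assumptions, and "boolean" is interpreted here as the NumPy `bool` subtype.

```python
import numpy as np
import pandas as pd

# (subtype, fill_value, sample values) combinations from the list above.
CASES = [
    ("int64", 0, [0, 0, 5, 0]),
    ("float64", 0.0, [0.0, 1.5, 0.0, 0.0]),
    ("float64", np.nan, [np.nan, 1.5, np.nan, 2.0]),
    ("bool", False, [False, True, False, False]),
]


def rebuild_matches(subtype, fill_value, values):
    """Densify a sparse column, rebuild it from the recorded dtype, compare."""
    dtype = pd.SparseDtype(subtype, fill_value)
    original = pd.Series(pd.arrays.SparseArray(values, dtype=dtype))
    dense = original.sparse.to_dense()   # what a dense-only file would hold
    rebuilt = dense.astype(dtype)        # what the reader would reconstruct
    # assert_array_equal treats NaN positions as equal.
    np.testing.assert_array_equal(np.asarray(original), np.asarray(rebuilt))
    return rebuilt.dtype == dtype


results = [rebuild_matches(*case) for case in CASES]
```

In a real test suite each case would more likely be a `pytest.mark.parametrize` entry that runs against both file formats.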
Notes
- Scoped strictly to I/O compatibility (acknowledges the DEPR: SparseDtype #56518 discussion).
- Backward compatible and opt-in.
- Namespaced metadata (e.g., `pandas.sparse.*`).
Request for feedback
- API flag name (`preserve_sparse` vs generalized).
- Metadata keys/placement and interop concerns.