
Add pandas integration with type inference for NSV format #4

Merged
namingbe merged 8 commits into master from
claude/constrain-pandas-csv-types-01SNTCQdxd21HTHqZd61uHHY
Mar 14, 2026

@namingbe
Collaborator

Summary

This PR adds pandas integration for the NSV (Newline-Separated Values) format, including automatic type inference in read_nsv() and correct handling of non-string types in to_nsv().

Key Changes

  • Enhanced read_nsv() function:

    • Implements per-column type inference that matches pandas' read_csv() behavior, using pd.to_numeric()
    • Adds an explicit dtype parameter to override inference
    • Keeps a column as strings when numeric conversion would lose data
  • Improved to_nsv() method:

    • Converts non-string values (integers, floats) to strings before writing
    • Handles NaN/NA values by converting them to empty strings
    • Enables round-trip serialization/deserialization
  • Test suite (tests/test_pandas.py):

    • Type inference tests comparing NSV behavior with CSV
    • Tests for integers, floats, mixed numeric/string data, scientific notation, negative numbers
    • Explicit dtype parameter tests
    • Roundtrip serialization tests for various data types
    • Edge case handling (empty fields, all-empty columns, NaN values)
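The to_nsv() cell conversion described above amounts to roughly the following. This is a sketch under assumptions: frame_to_cells is a hypothetical name, the header row is assumed to be written first, and the step that joins cells into NSV text is omitted.

```python
import pandas as pd


def frame_to_cells(df: pd.DataFrame) -> list[list[str]]:
    """Hypothetical helper: turn a DataFrame into rows of strings for an NSV writer.

    The header row comes first; every other cell goes through str(), and
    missing values (NaN, None, pd.NA) become empty strings.
    """
    rows = [[str(col) for col in df.columns]]
    for record in df.itertuples(index=False, name=None):
        rows.append(["" if pd.isna(value) else str(value) for value in record])
    return rows


df = pd.DataFrame({"id": [1, 2], "score": [0.5, float("nan")], "name": ["a", None]})
print(frame_to_cells(df))
```

Writing missing values as empty strings is what lets the reader side map empty fields back to NaN on the way in.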

Implementation Details

  • Type inference uses pd.to_numeric(..., errors='coerce') to safely attempt numeric conversion
  • A column is converted to a numeric type only if no non-empty value is lost (coerced to NaN) in the conversion
  • The patch_pandas() function now registers both read_nsv() and to_nsv() with pandas
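The "convert only if nothing is lost" rule can be sketched roughly as below. This is a minimal illustration of the approach in this summary, not the merged code: infer_column and build_frame are hypothetical names, cells are assumed to arrive as strings with "" for empty fields, and dtype is assumed to be a per-column dict.

```python
import pandas as pd


def infer_column(values: pd.Series) -> pd.Series:
    """Hypothetical helper: return a numeric Series only if no data is lost."""
    present = values != ""  # non-empty cells
    # Empty cells become NaN first; errors="coerce" then turns anything
    # unparseable into NaN instead of raising.
    converted = pd.to_numeric(values.where(present), errors="coerce")
    # A cell is "lost" if it had content but came back as NaN.
    lost = present & converted.isna()
    return values if lost.any() else converted


def build_frame(header, rows, dtype=None):
    """Hypothetical helper: build a DataFrame from string cells.

    Columns named in ``dtype`` (assumed to be a dict) skip inference.
    """
    df = pd.DataFrame(rows, columns=header)
    for col in df.columns:
        if dtype and col in dtype:
            df[col] = df[col].astype(dtype[col])
        else:
            df[col] = infer_column(df[col])
    return df


df = build_frame(
    ["count", "ratio", "label"],
    [["1", "1e3", "x"], ["-2", "", "7"]],
)
# dtypes: count -> int64, ratio -> float64 (with NaN), label stays as strings
print(df.dtypes)
```

The label column stays as strings because "x" would be coerced to NaN, which counts as a lost value.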

https://claude.ai/code/session_01SNTCQdxd21HTHqZd61uHHY

claude and others added 8 commits February 18, 2026 20:43
read_nsv now infers numeric types per-column (like read_csv) instead of
leaving everything as strings. to_nsv converts non-string values to str
and NaN to empty string before writing.
- reader.py: check() appended a bare int instead of a (pos, line, col)
  tuple when the string ends with a trailing backslash, which would crash
  the subsequent unpacking loop
- core.py: remove unused loop variable i in dumps()
- test_utils.py: load_then_dump() splatted the list into dumps() instead
  of passing it as a single iterable argument
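Both failure modes fixed in this commit are easy to reproduce in isolation. The sketch below uses hypothetical stand-ins (unpack_all for the loop that consumes check()'s results, dumps_stub for dumps()); it does not reproduce the nsv code itself, and the positions are illustrative values.

```python
# Failure mode 1: check() mixed a bare int into a list of (pos, line, col) tuples.
def unpack_all(items):
    """Hypothetical stand-in for the loop that consumes check()'s results."""
    return [(pos, line, col) for pos, line, col in items]


broken = [(0, 1, 1), 7]         # 7 mimics the old trailing-backslash branch
fixed = [(0, 1, 1), (7, 1, 8)]  # every entry is a tuple

try:
    unpack_all(broken)
except TypeError as exc:
    print("unpacking failed:", exc)
print(unpack_all(fixed))


# Failure mode 2: splatting the row list instead of passing it once.
def dumps_stub(rows):
    """Hypothetical stand-in for dumps(): takes ONE iterable of rows."""
    return [list(row) for row in rows]


data = [["a", "b"], ["c", "d"]]
try:
    dumps_stub(*data)  # same as dumps_stub(["a", "b"], ["c", "d"])
except TypeError as exc:
    print("splat failed:", exc)
print(dumps_stub(data))
```

With a single-row list the splatted call does not raise at all: the stub receives the row itself as the iterable and treats each cell as a row, which is a quieter and arguably worse failure.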
- Apply pandas' default NA value set (NA, NaN, nan, null, None, etc.)
  before type inference so NA strings become NaN in all column types,
  matching read_csv behaviour
- Detect all-true/false columns (case-insensitive) and cast to bool;
  bool+NA columns return object with Python bools and NaN, also
  matching read_csv behaviour
- Refactor inference into _infer_column() helper
- Add TestReadNsvNullInference and TestReadNsvBoolInference test classes,
  all using read_csv as the oracle
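The read_csv behaviour that these tests treat as the oracle can be checked directly. This sketch uses only pandas.read_csv with its default NA handling; the column names and data are made up for illustration.

```python
import io

import pandas as pd

# Hypothetical sample: one all-bool column, one bool column with a missing
# value, and one column mixing NA spellings with a real string.
csv_text = (
    "flag,maybe,label\n"
    "True,true,NA\n"
    "false,,null\n"
    "TRUE,False,x\n"
)
df = pd.read_csv(io.StringIO(csv_text))

# flag is bool; maybe falls back to object (True, NaN, False);
# "NA" and "null" in label are read as NaN by the default na_values.
print(df.dtypes)
print(df)
```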
CI runs without pandas (it's an optional dependency). Guard all test
classes with @skip_no_pandas so the suite passes without it.
The pandas extra was not being installed in CI. Switch to
pip install -e ".[pandas]" so the tests actually run.
Revert the skipUnless guards added in the previous commit.
Instead of reimplementing pandas' bool/NA/numeric detection,
convert NSV rows to CSV in memory and pass to read_csv directly.
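The approach in this commit, handing the rows to read_csv instead of reimplementing its rules, can be sketched with the standard csv module and an in-memory buffer. This is a hypothetical illustration: rows_to_frame is not the merged function, and it assumes the NSV reader has already produced a list of rows of strings with the header first.

```python
import csv
import io

import pandas as pd


def rows_to_frame(rows, **read_csv_kwargs):
    """Hypothetical sketch: first row is the header, remaining rows are data.

    Cells are written with csv.writer, then parsed by pandas.read_csv so its
    NA, bool and numeric inference applies unchanged.
    """
    buffer = io.StringIO()
    csv.writer(buffer).writerows(rows)
    buffer.seek(0)  # rewind so read_csv starts at the header
    return pd.read_csv(buffer, **read_csv_kwargs)


rows = [
    ["id", "score", "note"],
    ["1", "2.5", "hello, world"],
    ["2", "", "multi\nline"],
]
df = rows_to_frame(rows)
print(df.dtypes)
```

Because csv.writer quotes cells containing commas, quotes, or line breaks, multi-line NSV cells survive the trip, and read_csv keywords such as dtype pass straight through.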
Keep type inference local to patch_pandas rather than leaking
constants into module scope. No CSV serialization overhead.
@namingbe merged commit e321bed into master Mar 14, 2026
10 checks passed
@namingbe deleted the claude/constrain-pandas-csv-types-01SNTCQdxd21HTHqZd61uHHY branch March 14, 2026 23:38