Add pandas integration with type inference for NSV format by namingbe · Pull Request #4 · nsv-format/nsv-python

namingbe · 2026-03-14T23:08:04Z

Summary

This PR adds comprehensive pandas integration for the NSV (Newline-Separated Values) format, including automatic type inference for read_nsv() and proper handling of non-string types in to_nsv().

Key Changes

- Enhanced read_nsv() function:
  - Implements automatic type inference matching pandas' read_csv() behavior using pd.to_numeric()
  - Adds support for an explicit dtype parameter to override inference
  - Keeps a column as strings when numeric conversion would lose any non-empty value
- Improved to_nsv() method:
  - Converts non-string values (integers, floats) to strings
  - Handles NaN/NA values by converting them to empty strings
  - Enables roundtrip serialization/deserialization
- Comprehensive test suite (tests/test_pandas.py):
  - Type inference tests comparing NSV behavior with CSV
  - Tests for integers, floats, mixed numeric/string data, scientific notation, negative numbers
  - Explicit dtype parameter tests
  - Roundtrip serialization tests for various data types
  - Edge case handling (empty fields, all-empty columns, NaN values)
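For illustration, the to_nsv() conversion described above (non-string values to strings, NaN/NA to empty strings) might look roughly like the sketch below. The helper name `stringify_frame` is hypothetical and is not taken from this PR; it only shows the cell-level conversion, not how rows are then written out as NSV.

```python
import pandas as pd


def stringify_frame(df: pd.DataFrame) -> list[list[str]]:
    """Hypothetical helper: turn every cell into a string, missing values into ''."""
    rows = []
    # itertuples(index=False, name=None) yields one plain tuple per row.
    for record in df.itertuples(index=False, name=None):
        rows.append(["" if pd.isna(value) else str(value) for value in record])
    return rows
```

Because missing values are written as empty strings, a reader that treats empty fields as NA can recover them on the way back in, which is what makes the roundtrip possible.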

Implementation Details

- Type inference uses pd.to_numeric(..., errors='coerce') to safely attempt numeric conversion
- Columns are only converted to numeric types if no non-empty values are lost in the conversion
- The patch_pandas() function now properly registers both read_nsv() and to_nsv() with pandas
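A minimal sketch of the rule described above, assuming empty strings stand for missing values: attempt the conversion with errors='coerce', then keep the result only if every non-empty value survived. The function name `infer_numeric` is hypothetical and this is not the code from the PR.

```python
import pandas as pd


def infer_numeric(column: pd.Series) -> pd.Series:
    """Hypothetical helper: convert to numeric only if no non-empty value is lost."""
    column = column.astype(object)
    non_empty = column != ""
    # Empty fields become NaN up front so they are not counted as failures.
    converted = pd.to_numeric(column.where(non_empty), errors="coerce")
    # A NaN in a position that held text means the conversion dropped data.
    if converted[non_empty].isna().any():
        return column
    return converted
```

With this rule a column such as "1", "abc" stays as strings, while "1.5", "", "-2e3" becomes a float column with one NaN.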

https://claude.ai/code/session_01SNTCQdxd21HTHqZd61uHHY

- reader.py: check() appended a bare int instead of a (pos, line, col) tuple when the string ends with a trailing backslash, which would crash the subsequent unpacking loop
- core.py: remove unused loop variable i in dumps()
- test_utils.py: load_then_dump() splatted the list into dumps() instead of passing it as a single iterable argument

- Apply pandas' default NA value set (NA, NaN, nan, null, None, etc.) before type inference so NA strings become NaN in all column types, matching read_csv behaviour
- Detect all-true/false columns (case-insensitive) and cast to bool; bool+NA columns return object with Python bools and NaN, also matching read_csv behaviour
- Refactor inference into _infer_column() helper
- Add TestReadNsvNullInference and TestReadNsvBoolInference test classes, all using read_csv as the oracle
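The NA masking and bool detection described above could be sketched as follows. This is not the PR's _infer_column() helper, whose body is not shown here; `NA_STRINGS` is only an illustrative subset of pandas' default NA values, and `infer_bool` is a hypothetical name.

```python
import numpy as np
import pandas as pd

# Illustrative subset only; pandas' real default NA set is larger.
NA_STRINGS = {"", "NA", "NaN", "nan", "null", "None"}
BOOL_MAP = {"true": True, "false": False}


def infer_bool(values: list[str]) -> pd.Series:
    """Hypothetical helper: mask NA strings, then cast true/false columns."""
    masked = [np.nan if v in NA_STRINGS else v for v in values]
    present = [v for v in masked if isinstance(v, str)]
    # Leave the column alone unless every remaining value is true/false.
    if not present or any(v.lower() not in BOOL_MAP for v in present):
        return pd.Series(masked, dtype=object)
    mapped = [BOOL_MAP[v.lower()] if isinstance(v, str) else v for v in masked]
    if len(present) == len(masked):
        return pd.Series(mapped, dtype=bool)
    # bool + NA: keep object dtype holding Python bools and NaN.
    return pd.Series(mapped, dtype=object)
```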

claude and others added 8 commits February 18, 2026 20:43

constrain pandas patch types to match csv semantics

f6dfd38

read_nsv now infers numeric types per-column (like read_csv) instead of leaving everything as strings. to_nsv converts non-string values to str and NaN to empty string before writing.

Merge branch 'master' into claude/constrain-pandas-csv-types-01SNTCQdxd21HTHqZd61uHHY

7cfebeb

Skip pandas tests gracefully when pandas is not installed

0186ec8

CI runs without pandas (it's an optional dependency). Guard all test classes with @skip_no_pandas so the suite passes without it.
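A guard of this kind is commonly built on unittest.skipUnless. The sketch below is an assumption about how a `skip_no_pandas` decorator could be defined; the test class and its method are hypothetical placeholders, not tests from this PR.

```python
import importlib.util
import unittest

# True when pandas can be imported in the current environment.
HAS_PANDAS = importlib.util.find_spec("pandas") is not None

# Class decorator: skip every test in the class when pandas is missing.
skip_no_pandas = unittest.skipUnless(HAS_PANDAS, "pandas is not installed")


@skip_no_pandas
class TestReadNsvExample(unittest.TestCase):  # hypothetical test class
    def test_pandas_available(self):
        import pandas as pd  # imported lazily so module import never fails

        self.assertTrue(hasattr(pd, "DataFrame"))
```

The trade-off is that a skipped suite still reports success, so the tests only provide coverage when the optional dependency is actually installed.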

Install pandas in CI and revert test skip guards

1d5cbe4

The pandas extra was not being installed in CI. Switch to pip install -e ".[pandas]" so the tests actually run. Revert the skipUnless guards added in the previous commit.

Replace hand-rolled type inference with read_csv

f7a63f1

Instead of reimplementing pandas' bool/NA/numeric detection, convert NSV rows to CSV in memory and pass to read_csv directly.
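As a rough sketch of that approach, assuming the rows are already parsed into lists of strings and the first row is the header, the in-memory conversion can be done with the standard csv module and io.StringIO. The function name `rows_to_frame` is hypothetical.

```python
import csv
import io

import pandas as pd


def rows_to_frame(rows: list[list[str]]) -> pd.DataFrame:
    """Hypothetical helper: serialise rows as CSV in memory and let read_csv infer types."""
    buffer = io.StringIO()
    # csv.writer handles quoting of commas, quotes and newlines inside fields.
    csv.writer(buffer).writerows(rows)
    buffer.seek(0)
    # Assumption: the first row holds the column names.
    return pd.read_csv(buffer)
```

Delegating to read_csv gives its NA, bool and numeric inference for free, at the cost of serialising every row a second time.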

Scope bool_values inside patch_pandas, drop CSV roundtrip

ce1c4e3

Keep type inference local to patch_pandas rather than leaking constants into module scope. No CSV serialization overhead.

namingbe merged commit e321bed into master Mar 14, 2026
10 checks passed

namingbe deleted the claude/constrain-pandas-csv-types-01SNTCQdxd21HTHqZd61uHHY branch March 14, 2026 23:38
