@cofin commented Oct 8, 2025

Summary

Implements optional NumPy array serialization support in SQLSpec's serialization system, following the established pattern for optional dependencies (pydantic, msgspec, attrs, pyarrow).

Changes

Type System (_typing.py, typing.py)

  • Added NumpyArrayStub protocol for type safety when numpy is not installed
  • Added conditional numpy import following the PyArrow pattern
  • Exported NumpyArray type for public API
  • Added NUMPY_INSTALLED flag for runtime checks
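How `NUMPY_INSTALLED` is computed is not shown in this PR. One common approach, sketched here as an assumption rather than SQLSpec's actual code, is to ask the import system whether the module can be found without importing it; `module_available` is a hypothetical helper name.

```python
import importlib.util


def module_available(name: str) -> bool:
    """Return True if `name` can be imported, without importing it.

    Hypothetical helper for illustration; not taken from SQLSpec.
    """
    try:
        return importlib.util.find_spec(name) is not None
    except (ImportError, ValueError):
        # find_spec can raise for broken parent packages or odd sys.modules state
        return False


# Hypothetical flag, computed once when the module is loaded
NUMPY_INSTALLED = module_available("numpy")
```

Using `find_spec` avoids paying numpy's import cost just to set the flag; the trade-off is that a package that is found but fails to import would still report `True`.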

Serialization (_serialization.py)

  • Made numpy imports conditional based on NUMPY_INSTALLED flag
  • Added support for numpy array serialization to JSON (arrays → lists)
  • Fixed the _type_to_string return type annotation (declared str; now Any, since the function can return non-string values such as lists)
  • Updated the module docstring to mention numpy support
  • Graceful degradation when numpy is not available
  • OrjsonSerializer enables orjson's OPT_SERIALIZE_NUMPY option only when numpy is installed

Testing (test_serialization.py)

  • Added 27 function-based tests
  • Coverage includes: 1D/2D/3D arrays, dtypes, edge cases, integration tests
  • Tests cover both the with-numpy and without-numpy scenarios
  • All tests passing (42/42 serialization tests, 2,394 total unit tests)

Backward Compatibility

  • ✅ Zero breaking changes
  • ✅ NumPy remains an optional dependency
  • ✅ Type-safe with proper stub protocol
  • ✅ All existing tests pass
  • ✅ Linting passes (ruff, mypy, pyright)

Test Results

tests/unit/test_serialization.py: 42 passed
All unit tests: 2,394 passed, 9 skipped
Linting: ✓ ruff, mypy, pyright all clean

Implementation Details

Type Stubs Pattern

Following the established pattern from PyArrow:

from typing import Any, Protocol, runtime_checkable

@runtime_checkable
class NumpyArrayStub(Protocol):
    """Protocol stub for numpy.ndarray when numpy is not installed."""
    
    def tolist(self) -> list[Any]:
        """Convert array to Python list."""
        ...

try:
    from numpy import ndarray as NumpyArray
except ImportError:
    NumpyArray = NumpyArrayStub  # type: ignore[assignment,misc]
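Because `NumpyArrayStub` is `@runtime_checkable`, `isinstance` checks against it are purely structural: any object with a `tolist` attribute passes, and the method signature is not verified. The sketch below repeats the stub and uses a hypothetical `FakeArray` class to show this.

```python
from typing import Any, Protocol, runtime_checkable


@runtime_checkable
class NumpyArrayStub(Protocol):
    """Protocol stub for numpy.ndarray when numpy is not installed."""

    def tolist(self) -> list[Any]: ...


class FakeArray:
    """Hypothetical stand-in that merely has a tolist() method."""

    def __init__(self, items: list[Any]) -> None:
        self._items = items

    def tolist(self) -> list[Any]:
        return list(self._items)


# isinstance only checks that a `tolist` attribute exists
print(isinstance(FakeArray([1, 2]), NumpyArrayStub))  # True
print(isinstance([1, 2], NumpyArrayStub))  # False: list has no tolist()
```

This makes the stub suitable for type checking and fallbacks, but not proof that a value is a real `numpy.ndarray`; that is why the serializer checks `isinstance(value, np.ndarray)` only when `NUMPY_INSTALLED` is true.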

Conditional Serialization

In _type_to_string:

if NUMPY_INSTALLED:
    import numpy as np
    
    if isinstance(value, np.ndarray):
        return value.tolist()
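The same idea can be sketched with the standard library's `json` module, where a `default` hook plays the role `_type_to_string` plays here. This is an illustration under assumptions, not SQLSpec's code: `_default_hook` and `encode` are hypothetical names, the datetime/UUID branches only mirror the integration cases listed in this PR, and numpy is imported optionally so the module still loads without it.

```python
import json
from datetime import datetime
from typing import Any
from uuid import UUID

try:
    import numpy as np

    NUMPY_INSTALLED = True
except ImportError:  # numpy is optional
    np = None  # type: ignore[assignment]
    NUMPY_INSTALLED = False


def _default_hook(value: Any) -> Any:
    """Fallback for values json.dumps cannot serialize natively (hypothetical)."""
    if NUMPY_INSTALLED and isinstance(value, np.ndarray):
        # tolist() recursively converts elements to built-in Python scalars
        return value.tolist()
    if isinstance(value, datetime):
        return value.isoformat()
    if isinstance(value, UUID):
        return str(value)
    raise TypeError(f"Object of type {type(value).__name__} is not JSON serializable")


def encode(data: Any) -> str:
    """Hypothetical encoder: JSON text with arrays rendered as nested lists."""
    return json.dumps(data, default=_default_hook)
```

For example, `encode({"matrix": np.array([[1, 2], [3, 4]])})` returns `'{"matrix": [[1, 2], [3, 4]]}'`. Because the hook returns a list rather than a string, a `str` return annotation on such a function would be wrong.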

In OrjsonSerializer.encode:

from orjson import OPT_NAIVE_UTC, OPT_SERIALIZE_UUID

options = OPT_NAIVE_UTC | OPT_SERIALIZE_UUID

if NUMPY_INSTALLED:
    from orjson import OPT_SERIALIZE_NUMPY
    options |= OPT_SERIALIZE_NUMPY
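One orjson detail matters when enabling `OPT_SERIALIZE_NUMPY`: arrays it cannot serialize natively (arrays that are not C-contiguous, and unsupported dtypes such as strings) are handed to the `default` callable instead. The sketch below pairs the option with a `tolist()` fallback. `encode_bytes` and `_fallback` are hypothetical names, and the PR snippet does not show whether `OrjsonSerializer` passes a `default`. `OPT_SERIALIZE_UUID` is left out because orjson 3 serializes `uuid.UUID` by default.

```python
from typing import Any

try:
    import orjson
except ImportError:  # lets this sketch load even where orjson is absent
    orjson = None  # type: ignore[assignment]

try:
    import numpy as np

    NUMPY_INSTALLED = True
except ImportError:
    np = None  # type: ignore[assignment]
    NUMPY_INSTALLED = False


def _fallback(value: Any) -> Any:
    """orjson calls this for any object it cannot serialize natively."""
    # Even with OPT_SERIALIZE_NUMPY, non-C-contiguous arrays and
    # unsupported dtypes (e.g. string arrays) end up here.
    if NUMPY_INSTALLED and isinstance(value, np.ndarray):
        return value.tolist()
    raise TypeError(f"Type is not JSON serializable: {type(value).__name__}")


def encode_bytes(data: Any) -> bytes:
    """Hypothetical orjson encoder mirroring the option logic above."""
    if orjson is None:
        raise RuntimeError("orjson is not installed")
    options = orjson.OPT_NAIVE_UTC
    if NUMPY_INSTALLED:
        options |= orjson.OPT_SERIALIZE_NUMPY
    return orjson.dumps(data, default=_fallback, option=options)
```

Without the fallback, the string arrays listed under Test Coverage would raise, since orjson's native numpy path covers numeric, boolean, and datetime64 dtypes.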

Test Coverage

27 new tests covering:

  1. Basic Serialization: 1D, 2D, 3D arrays
  2. Edge Cases: Empty arrays, 0-D arrays, single element
  3. Data Types: int32, float64, bool, string arrays
  4. Round-trip: Encoding and decoding with bytes/string
  5. Integration: Works with datetime/UUID serialization
  6. Nested Structures: Arrays in dicts, lists, nested objects
  7. Performance: Large array handling
  8. Optional Handling: Graceful degradation without numpy
  9. Mixed Content: Arrays combined with other special types

Test Plan

  • With numpy installed: arrays serialize to lists
  • Without numpy: no import errors, graceful degradation
  • Type checking passes in both scenarios
  • Integration with datetime/UUID serialization
  • Edge cases: empty arrays, 0-D arrays, large arrays
  • Multiple dtypes: int32, float64, bool, string
  • All existing tests continue to pass
  • Ruff linting passes
  • Mypy type checking passes
  • Pyright type checking passes

Migration Guide

No migration is needed; this is a purely additive feature. If numpy is installed, arrays are serialized automatically; if not, everything works as before.

Example Usage

import numpy as np
from sqlspec._serialization import decode_json, encode_json

data = {
    "id": 123,
    "values": np.array([1.5, 2.5, 3.5]),
    "matrix": np.array([[1, 2], [3, 4]])
}

# Serialize (arrays become lists)
json_str = encode_json(data)

# Deserialize
result = decode_json(json_str)
# result["values"] == [1.5, 2.5, 3.5]
# result["matrix"] == [[1, 2], [3, 4]]

Related Issues

N/A - This is a proactive enhancement to support a common use case.

- Add NumpyArray type stubs in _typing.py for optional numpy support
- Make numpy imports conditional in _serialization.py
- Add 27 function-based tests covering:
  * 1D, 2D, 3D, and edge case array serialization
  * Integration with datetime/UUID serialization
  * Graceful degradation without numpy
  * Multiple dtypes (int32, float64, bool, string)
- Fix bug in _type_to_string return type
- Update documentation for optional numpy feature

Backward compatible: numpy remains an optional dependency
@cofin merged commit 8a63277 into main Oct 8, 2025
10 checks passed
@cofin deleted the feat/optional-numpy-serialization branch October 8, 2025 17:00