
fix: StringDType / <U interop and left join null handling (v2.2.1) #25

Merged
dsfulf merged 16 commits into main from bugfix/string-dtype-interop
Mar 25, 2026

Conversation

@dsfulf
Member

@dsfulf dsfulf commented Mar 24, 2026

Summary

  • Join dtype validation: compare base types (_reduce_dtype) instead of raw numpy dtypes — StringDType vs <U, <U8 vs <U12 no longer reject
  • Union dtype validation: same base-type comparison
  • update_dtypes_inplace: preserve raw numpy dtypes for casting instead of round-tripping through formatted string labels — fixes silent no-op when converting between <U and StringDType
  • Left join null-fill: string columns use StringDType(na_object=None) instead of object, float columns use NaN instead of object, int/bool fall back to object with a warning
  • Version bump to 2.2.1
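
The base-type comparison in the first two bullets can be sketched in plain NumPy; `reduce_dtype` below is an illustrative stand-in for tafra's `_reduce_dtype`, not its actual implementation:

```python
import numpy as np

def reduce_dtype(dtype: np.dtype) -> str:
    # Collapse all string variants (fixed-width '<U', and variable-width
    # StringDType, which reports kind 'T' on NumPy >= 2.0) to one 'str' label.
    if dtype.kind in ('U', 'T'):
        return 'str'
    return dtype.name

a = np.array(['alpha', 'beta'])          # dtype '<U5'
b = np.array(['a-much-longer-string'])   # dtype '<U20'
assert a.dtype != b.dtype                              # raw dtypes differ...
assert reduce_dtype(a.dtype) == reduce_dtype(b.dtype)  # ...base types match
```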

Motivation

pyodbc returns Python strings as object arrays, which tafra's ObjectFormatter converts to StringDType. Meanwhile, operations like .upper() on string columns produce fixed-width <U arrays. Joining these two tafras raised TypeError: dtype StringDType() does not match dtype <U12.

Test plan

  • 108 tests pass (3 new: test_mixed_string_dtypes, test_update_dtypes_string_conversion, test_left_join_object_fallback_warning)
  • Verified: inner join, left join (both directions), union, concat with mixed StringDType/<U/<U widths
  • Verified: left join null-fill preserves string dtype, uses NaN for float, warns on int→object
  • Verified: update_dtypes_inplace int→float, float→int conversions still work

🤖 Generated with Claude Code

dsfulf and others added 10 commits March 24, 2026 17:56
- Join _validate_dtypes: compare base types via _reduce_dtype instead of
  raw numpy dtypes, so StringDType vs <U and <U8 vs <U12 are accepted
- Union.apply: same base-type comparison for column dtype validation
- update_dtypes_inplace: preserve raw numpy dtypes for casting instead of
  round-tripping through _format_dtype string labels (fixes silent no-op
  when both <U and StringDType map to 'str')
- Add test_mixed_string_dtypes covering all interop scenarios
- Bump version to 2.2.1

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- String columns use StringDType(na_object=None) instead of object
- Float columns use NaN instead of object
- Int/bool columns still fall back to object (can't represent None)
- Both C-path and fallback path updated
- Tests updated for new null-fill behavior

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Warns once per column when an int/bool column has unmatched rows and
must fall back to object dtype. String and float columns are handled
natively (StringDType(na_object=None) and NaN respectively) and do
not warn.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…ning

- test_update_dtypes_string_conversion: verifies <U/StringDType handling,
  int->float, float->int conversions
- test_left_join_object_fallback_warning: asserts warning fires for int
  columns, does NOT fire for string/float columns

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- .readthedocs.yml: replaced by GitHub Pages / MkDocs
- MANIFEST.in: unnecessary with setuptools>=64 + pyproject.toml
- dev-requirements.txt: replaced by pyproject.toml + CI
- test/test.bat, test/test.sh: referenced flake8 + sphinx, both removed

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- E722: bare except → except StopIteration
- E501: wrap long StringDType/read_csv lines
- E712: == True/False → truthiness checks
- F541: remove extraneous f-string prefixes
- E401: split multi-imports

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Cast const npy_int64** to void* before free() to avoid
"different 'const' qualifiers" warning on MSVC.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Use cast() for the object-dtype branch in left join null-fill,
matching the pattern used elsewhere in the codebase. Add type: ignore
for StringDType(na_object=) which numpy stubs don't support yet.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Add type: ignore[assignment] on None assignments to ndarray (needed
  by Python 3.12 numpy stubs)
- Keep type: ignore[call-arg] on StringDType(na_object=) (needed by
  Python 3.10 numpy stubs)
- Disable warn_unused_ignores: numpy stubs differ across versions,
  making per-line ignores impossible to satisfy universally

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Contributor

Copilot AI left a comment


Pull request overview

This PR aims to improve dtype interoperability across joins/unions (notably StringDType vs NumPy <U...) and to improve left-join null filling semantics for string/float columns, along with a version bump to 2.2.1.

Changes:

  • Relax join/union dtype validation to compare base types via _reduce_dtype rather than exact NumPy dtypes.
  • Adjust left-join null fill behavior to preserve string dtypes (StringDType(na_object=None)) and use NaN for floats, with warnings + object fallback for int/bool.
  • Rework update_dtypes_inplace to preserve raw dtypes for casting (and bump version/changelog/tests accordingly).

Reviewed changes

Copilot reviewed 14 out of 14 changed files in this pull request and generated 6 comments.

Show a summary per file
File Description
tafra/group.py Base-type dtype validation for joins/unions; new left-join null-fill behavior + warnings.
tafra/base.py update_dtypes_inplace now keeps “raw” target dtypes for casting instead of round-tripping through formatted labels.
test/test_tafra.py Updates many StringDType array constructions and adds tests for mixed string dtypes, update_dtypes_inplace, and left-join warnings/nulls.
docs/changelog.md Adds 2.2.1 release notes describing dtype interop changes.
pyproject.toml Version bump to 2.2.1; mypy config tweak (warn_unused_ignores = false).
recipe/meta.yaml Version bump to 2.2.1 for packaging recipe.
tafra/_accel.c Minor free-cast tweak for col_data.
test/bench_vs_pandas_vs_polars.py Minor print formatting change.
test/bench_tafra.py Minor timer label formatting change.
test/test.sh Removed legacy test runner script.
test/test.bat Removed legacy test runner script.
dev-requirements.txt Removed legacy dev requirements file.
MANIFEST.in Removed manifest file.
.readthedocs.yml Removed Read the Docs config.


Comment on lines +10 to +12
instead of round-tripping through formatted string labels -- fixes silent
no-op when converting between `<U` and `StringDType`


Copilot AI Mar 25, 2026


The changelog claims update_dtypes_inplace fixes a silent no-op when converting between <U and StringDType, but the current implementation still skips casting for any 'str' → 'str' conversion. Either adjust the implementation to actually perform the conversion, or update this changelog entry to reflect the current behavior (e.g., only metadata updates / conversion intentionally skipped).

Suggested change
instead of round-tripping through formatted string labels -- fixes silent
no-op when converting between `<U` and `StringDType`
instead of round-tripping through formatted string labels; note that
`'str'` → `'str'` conversions (including `<U` → `StringDType`) still skip data
casting and only update dtype metadata

Comment on lines +231 to +234
yield {'y': np.array(
['one', 'two', 'one', 'two', 'one', 'two'],
dtype=np.dtypes.StringDType(),
)}

Copilot AI Mar 25, 2026


These continuation lines are misindented relative to the surrounding code, which hurts readability and makes future diffs noisier. Align the list/dtype arguments under the opening np.array( (as done in the iterator() block just above).

Comment on lines +246 to +249
yield ['y', np.array(
['one', 'two', 'one', 'two', 'one', 'two'],
dtype=np.dtypes.StringDType(),
)]

Copilot AI Mar 25, 2026


Same indentation issue here: the arguments to np.array( are not indented under the opening paren, unlike the rest of the file. Re-indent to match the prevailing style used elsewhere in this test module.

tafra/base.py Outdated
Comment on lines +1782 to +1785
# Skip if both are string types — StringDType and <U are
# compatible and casting between them can fail or lose data.
if (self._reduce_dtype(current_dtype) == 'str'
and self._reduce_dtype(target_dtype) == 'str'):

Copilot AI Mar 25, 2026


update_dtypes_inplace now unconditionally skips casting when both current and target dtypes reduce to 'str'. That means explicit conversions like <U10 → StringDType(na_object=None) (or StringDType → <U/str) will still be silently ignored, which contradicts the PR description/changelog about fixing the no-op for <U → StringDType. Consider either performing the requested cast when the concrete dtypes differ, or emitting a warning/raising when skipping so callers don't think the conversion happened.

Suggested change
# Skip if both are string types — StringDType and <U are
# compatible and casting between them can fail or lose data.
if (self._reduce_dtype(current_dtype) == 'str'
and self._reduce_dtype(target_dtype) == 'str'):
# Only skip when both are already the same concrete string dtype.
# This avoids unnecessary casts but still allows explicit
# conversions between different string dtypes (e.g. <U ↔ StringDType).
if (self._reduce_dtype(current_dtype) == 'str'
and self._reduce_dtype(target_dtype) == 'str'
and current_dtype == target_dtype):

tafra/group.py Outdated
Comment on lines +1246 to +1277
@@ -1253,9 +1256,29 @@ def apply(self, left_t: 'Tafra', right_t: 'Tafra') -> 'Tafra':
else:
# right column: fill matched rows, None for unmatched
if has_null:
out = np.empty(len(li), dtype=object)
out[matched] = right_t._data[c][ri[matched]]
out[~matched] = None
col_kind = right_t._data[c].dtype.kind
if col_kind in ('T', 'U', 'S'):
# String types: use StringDType(na_object=None)
out = np.empty(len(li), dtype=np.dtypes.StringDType(na_object=None)) # type: ignore[call-arg]
out[matched] = right_t._data[c][ri[matched]]
out[~matched] = None # type: ignore[assignment]
dtypes[c] = 'str'
elif col_kind == 'f':
# Float types: use NaN for missing
out = np.full(len(li), np.nan, dtype=right_t._data[c].dtype)
out[matched] = right_t._data[c][ri[matched]]
else:
# int, bool, etc.: fall back to object
warnings.warn(
f"Left join: column '{c}' (dtype {right_t._data[c].dtype}) "
f"has unmatched rows and no native null representation. "
f"Dtype has been cast to object. "
f"Use .astype(float) if NaN semantics are needed.",
stacklevel=4,

Copilot AI Mar 25, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Left-join null fill treats any non-('T','U','S','f') dtype as having no native nulls and falls back to object + warning. This incorrectly includes datetime64/timedelta64 (kinds 'M'/'m'), which do have a native missing value (NaT) and should be preserved similarly to floats (fill with NaT instead of casting to object/warning).
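
The dtype-matched NaT approach can be checked directly; this is only a sketch of the reviewer's point, not tafra's code:

```python
import numpy as np

# The same pattern works for both datetime64 and timedelta64 columns when the
# NaT scalar is built from the column's own dtype:
for dt in (np.dtype('datetime64[D]'), np.dtype('timedelta64[s]')):
    out = np.empty(3, dtype=dt)
    out[:] = np.array('NaT', dtype=dt)   # NaT matching the column dtype
    assert np.isnat(out).all()
```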

Comment on lines +1311 to +1321
col_kind = right_t._data[column].dtype.kind
if col_kind not in ('T', 'U', 'S', 'f') and dtypes[column] != 'object':
warnings.warn(
f"Left join: column '{column}' "
f"(dtype {right_t._data[column].dtype}) "
f"has unmatched rows and no native null "
f"representation. Dtype has been cast to "
f"object. Use .astype(float) if NaN "
f"semantics are needed.",
stacklevel=2,
)

Copilot AI Mar 25, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In the non-vectorized left-join path, the fallback-to-object warning logic also treats datetime64/timedelta64 columns (kinds 'M'/'m') as having no native null representation. To avoid unnecessary dtype changes and warnings, handle these kinds by preserving the original dtype and using NaT for unmatched rows.

dsfulf and others added 2 commits March 24, 2026 21:38
Dtype validation:
- Join/Union validate on _dtypes metadata (user intent), not raw array
  dtypes. Since _format_dtype collapses all string variants to 'str',
  StringDType vs <U no longer rejects.
- Restore metadata consistency check in joins (was removed, now back)

update_dtypes_inplace:
- 'str' label maps to StringDType(na_object=None) for nullable strings
- _from_init flag skips this during construction to preserve original dtype

Left join null-fill:
- String columns: StringDType(na_object=None) with None
- Float columns: original dtype with NaN
- Datetime/timedelta: original dtype with NaT
- Int/bool/bytes: object with None + warning
- Byte strings (kind='S') excluded from string path
- has_null flag replaces O(n) `None in v` scan
- stacklevel=3 on both C-path and fallback warnings

Test fixes:
- Restore TypeError expectation for corrupted _dtypes in joins
- Add datetime NaT null-fill test
- Fix test indentation (DictIterable, SequenceIterable)

Docs:
- CLAUDE.md: document dtype intent principle, update build/architecture
- README.md: add dtype metadata and left join null handling sections
- Changelog: updated to reflect all behavioral changes

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
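
The per-dtype null-fill policy listed above can be sketched as a standalone function; `null_filled` is hypothetical and only mirrors the per-dtype-kind dispatch (the real logic lives in tafra/group.py), and the string branch assumes NumPy >= 2.0:

```python
import warnings
import numpy as np

def null_filled(col: np.ndarray, n: int) -> np.ndarray:
    kind = col.dtype.kind
    if kind in ('U', 'T'):                      # strings: nullable StringDType
        out = np.empty(n, dtype=np.dtypes.StringDType(na_object=None))
        out[:] = None
    elif kind == 'f':                           # floats: NaN, dtype preserved
        out = np.full(n, np.nan, dtype=col.dtype)
    elif kind in ('M', 'm'):                    # datetime/timedelta: NaT
        out = np.empty(n, dtype=col.dtype)
        out[:] = np.array('NaT', dtype=col.dtype)
    else:                                       # int/bool/bytes: object + warn
        warnings.warn(f"dtype {col.dtype} cast to object")
        out = np.full(n, None, dtype=object)
    return out

assert np.isnan(null_filled(np.array([1.5]), 2)).all()
assert np.isnat(null_filled(np.array(['2024-01-01'], dtype='datetime64[D]'), 2)).all()
```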
30 tests in TestStringDtypeInterop covering all v2.2.1 bugfixes:
- StringDType vs <U joins (both directions)
- <U8 vs <U12 width mismatch joins
- Union with mixed string dtypes
- update_dtypes_inplace 'str' → StringDType(na_object=None)
- Construction preserves original dtype
- Left join null-fill for every dtype kind:
  string, float, float32, datetime64, timedelta64, int, bool, bytes
- Object fallback warning (fires for int/bool/bytes, not str/float/dt)
- Corrupted _dtypes metadata rejected in joins
- Matching metadata accepted despite different array representations
- Join key dtype combinations: int64, float64, StringDType, <U
- Cross-type join rejection: int vs float, int vs string
- Concat mixed string dtypes
- Full-match left join preserves original dtypes

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Contributor

Copilot AI left a comment


Pull request overview

Copilot reviewed 16 out of 16 changed files in this pull request and generated 4 comments.



tafra/group.py Outdated
# datetime64/timedelta64: use NaT for missing
out = np.empty(len(li), dtype=right_t._data[c].dtype)
out[matched] = right_t._data[c][ri[matched]]
out[~matched] = np.datetime64('NaT')

Copilot AI Mar 25, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For datetime64/timedelta64 columns, missing values are filled with np.datetime64('NaT') regardless of whether the target dtype is datetime64[...] or timedelta64[...]. Using a datetime64 scalar for timedelta64 relies on cross-dtype casting and can be fragile across NumPy versions/units. Prefer constructing a NaT scalar matching the column dtype (e.g., via np.array('NaT', dtype=right_t._data[c].dtype).item()) and assign that.

Suggested change
out[~matched] = np.datetime64('NaT')
nat_value = np.array('NaT', dtype=right_t._data[c].dtype).item()
out[~matched] = nat_value

tafra/base.py Outdated
Comment on lines 1761 to 1762
tafra: Tafra | None
The updated `Tafra`.

Copilot AI Mar 25, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The docstring's Returns section says tafra: Tafra | None ("The updated Tafra"), but update_dtypes_inplace is an in-place mutator that returns None. Please update the docstring to match the actual return value to avoid confusing callers.

Suggested change
tafra: Tafra | None
The updated `Tafra`.
None
This method mutates the `Tafra` in place.

Comment on lines +47 to +49
**`Tafra`** (`tafra/base.py`) — The dataframe class. Wraps `dict[str, np.ndarray]` where every column must share the same row count. Tracks dtypes separately in `_dtypes` — this metadata represents the user's declared intent for column types and controls dtype validation in joins, unions, and dtype updates. Implements dict-like access (keys, values, items, get, update). Decorated with `@dataclass`. String columns use numpy `StringDType(na_object=None)` to support `None` values; `_dtypes` stores `'str'` for all string variants (`StringDType`, `<U`).

**Dtype design principle:** `_dtypes` metadata is the user's intent. Join/union validation compares `_dtypes` labels, not raw numpy dtypes. `update_dtypes_inplace` is how users change their intent — it updates the label AND casts the array. The `'str'` label maps to `StringDType(na_object=None)`. If assigning directly to `_data`, you must call `_coalesce_dtypes()` afterwards.

Copilot AI Mar 25, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This claims Tafra string columns generally use StringDType(na_object=None), but current construction/assignment paths still create plain np.dtypes.StringDType() (e.g., _ensure_valid uses StringDType() and ObjectFormatter/CSVReader also use StringDType()). Either update those code paths to consistently use na_object=None, or clarify here that na_object=None is used only for specific cases (e.g. left-join null fill / update_dtypes_inplace({'x':'str'})).

CLAUDE.md Outdated
Comment on lines +70 to +71
- `formatter.py` — `ObjectFormatter` for custom dtype parsing (e.g., Decimal → float); auto-converts object arrays of strings to `StringDType(na_object=None)`
- `csvreader.py` — CSV reader with type inference; string columns produce `StringDType(na_object=None)`

Copilot AI Mar 25, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These bullets say ObjectFormatter and CSVReader produce StringDType(na_object=None), but the current implementations still cast to np.dtypes.StringDType() without na_object=None. Please align the docs with the code (or update tafra/formatter.py and tafra/csvreader.py if the new dtype is intended everywhere).

Suggested change
- `formatter.py` — `ObjectFormatter` for custom dtype parsing (e.g., Decimal → float); auto-converts object arrays of strings to `StringDType(na_object=None)`
- `csvreader.py` — CSV reader with type inference; string columns produce `StringDType(na_object=None)`
- `formatter.py` — `ObjectFormatter` for custom dtype parsing (e.g., Decimal → float); auto-converts object arrays of strings to `np.dtypes.StringDType()`
- `csvreader.py` — CSV reader with type inference; string columns produce `np.dtypes.StringDType()`

dsfulf and others added 2 commits March 24, 2026 22:05
- base.py _ensure_valid: string scalar creates nullable StringDType
- base.py describe(): stat column uses nullable StringDType
- base.py update_dtypes_inplace: fix docstring return type (None, not Tafra)
- formatter.py: object arrays of strings → StringDType(na_object=None)
- csvreader.py: CSV string columns → StringDType(na_object=None)
- group.py: dtype-matched NaT for datetime/timedelta null-fill
- CLAUDE.md: docs now accurately reflect na_object=None usage

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 18 out of 18 changed files in this pull request and generated 2 comments.



pyproject.toml Outdated
Comment on lines +96 to +98
# Disabled: numpy stubs differ across Python versions, causing
# type: ignore comments to be needed in some envs but not others.
warn_unused_ignores = false

Copilot AI Mar 25, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

warn_unused_ignores is disabled globally, which reduces mypy’s ability to catch stale/accidental # type: ignore usage across the entire codebase. Consider keeping it enabled globally and instead using a narrower override (e.g., per-module/file where numpy-stubs variance forces conditional ignores), so real unused ignores elsewhere are still surfaced.

Suggested change
# Disabled: numpy stubs differ across Python versions, causing
# type: ignore comments to be needed in some envs but not others.
warn_unused_ignores = false
# Note: numpy stubs can differ across Python versions, which may cause
# type: ignore comments to be needed in some envs but not others.
warn_unused_ignores = true
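
A narrower override of the kind suggested here (and adopted in a later commit on this PR) could look like the following pyproject.toml sketch; the module list is taken from that commit:

```toml
[tool.mypy]
warn_unused_ignores = true   # keep the global check so stale ignores surface

# numpy stub variance across Python versions forces conditional ignores only
# in the modules that touch StringDType, so relax the check just there:
[[tool.mypy.overrides]]
module = ["tafra.base", "tafra.group", "tafra.formatter", "tafra.csvreader"]
warn_unused_ignores = false
```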

dsfulf and others added 2 commits March 24, 2026 22:23
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
…numpy stubs

Keep warn_unused_ignores=true globally so stale type: ignore comments
are caught. Add per-module override for tafra.base, tafra.group,
tafra.formatter, tafra.csvreader where numpy stub variance across
Python versions requires conditional ignores.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 18 out of 18 changed files in this pull request and generated 2 comments.

Comments suppressed due to low confidence (1)

tafra/group.py:352

  • Union._validate zips left._data.items() with left._dtypes.items(), which can pair the wrong dtype with a column if dict insertion orders differ (e.g., user passes dtypes in a different key order than data). This can lead to incorrect dtype validation and misleading error messages. Iterate by column name (e.g., for col, dtype in left._dtypes.items(): value = left._data[col] ...) instead of zipping two dicts.
        for (data_column, value), (dtype_column, dtype) \
                in zip(left._data.items(), left._dtypes.items()):

            if data_column not in right._data or dtype_column not in right._dtypes:
                raise TypeError(
                    f'This `Tafra` column `{data_column}` does not exist in right `Tafra`.')

            # Compare user-declared dtypes (metadata = intent).
            # _format_dtype collapses string variants to 'str'.
            elif dtype != right._dtypes[dtype_column]:
                raise TypeError(
                    f'This `Tafra` column `{data_column}` dtype `{dtype}` '
                    f'does not match right `Tafra` dtype `{right._dtypes[dtype_column]}`.')
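
The mispairing the reviewer describes is easy to demonstrate with plain dicts (illustrative only, no tafra needed):

```python
# Zipping two dicts pairs entries by insertion order, which mispairs columns
# when the key orders differ:
data = {'a': 1, 'b': 2}
dtypes = {'b': 'int', 'a': 'int'}       # same keys, different insertion order
pairs = list(zip(data.items(), dtypes.items()))
assert pairs[0] == (('a', 1), ('b', 'int'))   # 'a' paired with 'b' dtype

# Iterating by column name avoids the mispairing:
safe = [(col, data[col], dt) for col, dt in dtypes.items()]
assert ('a', 1, 'int') in safe and ('b', 2, 'int') in safe
```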


Comment on lines +1179 to +1185
def test_mixed_string_dtypes() -> None:
"""StringDType vs <U should interop in joins, unions, and concat."""
left = Tafra({
'key': np.array(['a', 'b', 'c'], dtype=np.dtypes.StringDType()),
'val': np.array([1, 2, 3]),
})
right = Tafra({

Copilot AI Mar 25, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The new test_mixed_string_dtypes covers many of the same scenarios that are then repeated (often more granularly) in TestStringDtypeInterop below (joins, union, concat, left-join null handling). This duplication significantly increases test size/runtime and maintenance burden; consider keeping one canonical suite (either the single function or the class) and removing/reducing overlap.

Comment on lines +1555 to +1586

def test_left_join_warns_on_object_fallback(self) -> None:
import warnings
left = Tafra({'k': np.array([1, 2, 3]), 'v': np.array([10, 20, 30])})
right = Tafra({
'k': np.array([2, 3]),
'count': np.array([100, 200]),
})
with warnings.catch_warnings(record=True) as w:
warnings.simplefilter('always')
left.left_join(right, on=[('k', 'k', '==')])
msgs = [x for x in w if 'cast to object' in str(x.message)]
assert len(msgs) == 1
assert "'count'" in str(msgs[0].message)

def test_left_join_no_warning_for_native_nulls(self) -> None:
import warnings
left = Tafra({'k': np.array([1, 2, 3]), 'v': np.array([10, 20, 30])})
right = Tafra({
'k': np.array([2, 3]),
'name': np.array(['b', 'c'],
dtype=np.dtypes.StringDType()),
'score': np.array([1.5, 2.5]),
'ts': np.array(['2024-01-01', '2024-06-15'],
dtype='datetime64[D]'),
})
with warnings.catch_warnings(record=True) as w:
warnings.simplefilter('always')
left.left_join(right, on=[('k', 'k', '==')])
msgs = [x for x in w if 'cast to object' in str(x.message)]
assert len(msgs) == 0


Copilot AI Mar 25, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The left-join object fallback warning is tested twice (test_left_join_object_fallback_warning and TestStringDtypeInterop.test_left_join_warns_on_object_fallback), along with similar “no warning” coverage. Consider deduplicating to a single test location to avoid divergence over time (message matching, stacklevel expectations, etc.).

Suggested change
def test_left_join_warns_on_object_fallback(self) -> None:
import warnings
left = Tafra({'k': np.array([1, 2, 3]), 'v': np.array([10, 20, 30])})
right = Tafra({
'k': np.array([2, 3]),
'count': np.array([100, 200]),
})
with warnings.catch_warnings(record=True) as w:
warnings.simplefilter('always')
left.left_join(right, on=[('k', 'k', '==')])
msgs = [x for x in w if 'cast to object' in str(x.message)]
assert len(msgs) == 1
assert "'count'" in str(msgs[0].message)
def test_left_join_no_warning_for_native_nulls(self) -> None:
import warnings
left = Tafra({'k': np.array([1, 2, 3]), 'v': np.array([10, 20, 30])})
right = Tafra({
'k': np.array([2, 3]),
'name': np.array(['b', 'c'],
dtype=np.dtypes.StringDType()),
'score': np.array([1.5, 2.5]),
'ts': np.array(['2024-01-01', '2024-06-15'],
dtype='datetime64[D]'),
})
with warnings.catch_warnings(record=True) as w:
warnings.simplefilter('always')
left.left_join(right, on=[('k', 'k', '==')])
msgs = [x for x in w if 'cast to object' in str(x.message)]
assert len(msgs) == 0
# Covered in dedicated string-dtype interop tests (see TestStringDtypeInterop).

@dsfulf dsfulf merged commit 6cda733 into main Mar 25, 2026
11 checks passed
