v2.2.0: MkDocs website, C extension perf, typing modernization #24
Merged
Change table links from api.md#anchor to #anchor so they resolve as same-page fragment links matching mkdocstrings-generated IDs.
- Logo SVG: white squares (not teal) for visibility on teal nav bar
- Hero h1: 3rem -> 2.2rem, subheading: 1.2rem -> 1.05rem
- Section titles: 1.8rem -> 1.4rem
- Reduced section padding throughout
- Force white text on Get Started button to prevent Material override
…figures
Polars map_rows is 2.3x slower than tafra's tuple_map (1.61 ms vs 0.71 ms).
Materialized tuple_map results for a fair comparison. pandas itertuples wins row mapping (1.4x faster than tafra); polars map_rows is 5-7x slower. Updated benchmark script, docs, and hero page.
Skip NamedTuple construction when name=None, using zip(*values) instead. This is ~2-4x faster than the NamedTuple path and now beats pandas itertuples at all scales (6.24 vs 6.63 ms at 10k rows).
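The following is a minimal sketch of why the plain-tuple path is cheaper. It assumes columns are held as a dict of equal-length NumPy arrays. `rows_namedtuple`, `rows_plain`, and `tuple_map_sketch` are hypothetical helpers, not tafra's actual implementation. The NamedTuple path builds a class and then one instance per row, while `name=None` simply hands back the tuples `zip` already produces.

```python
from __future__ import annotations

from collections import namedtuple
from collections.abc import Callable, Iterator
from typing import Any

import numpy as np


def rows_namedtuple(data: dict[str, np.ndarray], name: str) -> Iterator[tuple[Any, ...]]:
    # Slow path: create a NamedTuple class once, then wrap every row in an instance.
    Row = namedtuple(name, list(data))
    for values in zip(*data.values()):
        yield Row(*values)


def rows_plain(data: dict[str, np.ndarray]) -> Iterator[tuple[Any, ...]]:
    # Fast path: zip already yields plain tuples, so no per-row wrapping is needed.
    return zip(*data.values())


def tuple_map_sketch(
    data: dict[str, np.ndarray],
    fn: Callable[[tuple[Any, ...]], Any],
    name: str | None = "Row",
) -> list[Any]:
    # Results are materialized into a list, as in the benchmark's fair comparison.
    rows = rows_plain(data) if name is None else rows_namedtuple(data, name)
    return [fn(row) for row in rows]
```

With `name=None`, `fn` has to index fields positionally (`row[0]`) instead of by attribute (`row.x`). That is the trade-off for skipping the class.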
- Blue function names, teal parameters in API headings
- TOC limited to depth 3 (no Parameters/Returns clutter)
- Class vs method distinction in sidebar (bold vs indented)
- Remove duplicate headings in api.md
- Fix parameter pre-block wrapping
De-indent numpy-style parameter/return sections to align with the dashes. Fix mismatched param names (df→s, column→columns, group_by→columns, etc.). Enable docstring_style: numpy and table rendering in mkdocstrings. Add CSS for structured parameter tables (teal param names, gray types).
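For reference, this is the numpy-style layout that griffe's parser expects. Section names are underlined with dashes, each parameter line sits flush with those dashes, and descriptions are indented one level beneath. The function below is a hypothetical example (not from tafra) whose parameter names match its signature.

```python
from __future__ import annotations

import numpy as np


def scale_and_shift(s: np.ndarray, factor: float = 2.0, offset: float = 0.0) -> np.ndarray:
    """Multiply an array by a factor and add an offset.

    Parameters
    ----------
    s : np.ndarray
        Input values. The name must match the signature (`s`, not `df`).
    factor : float
        Multiplier applied first.
    offset : float
        Value added after scaling.

    Returns
    -------
    np.ndarray
        A new array with the same shape as `s`.
    """
    return s * factor + offset
```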
Replace Dict→dict, List→list, Tuple→tuple, Type→type, Optional[X]→X | None, Union[X, Y]→X | Y across all source files. Add from __future__ import annotations. Remove unused typing imports.
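Here is a before/after sketch of the rewrite on a hypothetical `select` function. With `from __future__ import annotations` (PEP 563), annotations are stored as strings instead of being evaluated when the function is defined. As a result, the built-in generic and `X | None` forms add no evaluation cost at import time.

```python
from __future__ import annotations  # PEP 563: annotations are kept as strings

# Before (typing aliases):
#   from typing import Dict, List, Optional, Tuple
#   def select(columns: List[str], dtypes: Optional[Dict[str, str]] = None) -> Tuple[int, int]: ...


# After (built-in generics and PEP 604 unions); `select` is a hypothetical example.
def select(columns: list[str], dtypes: dict[str, str] | None = None) -> tuple[int, int]:
    # Toy body: count requested columns and explicit dtype overrides.
    return len(columns), len(dtypes or {})
```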
…ting
- Replace :class:`X`, :meth:`X`, :attr:`X` etc. with backtick `X`
- Replace ``X`` double-backtick RST literals with `X`
- Modernize InitVar type alias display in Tafra class docstring
- Add JS-based syntax highlighting for method calls (blue) and type names (teal) in source code accordions
- Add CSS classes for fn-call and type-name spans
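The commit doesn't say whether the conversion was scripted. As a sketch of how the two main rewrites could be done mechanically, the hypothetical `rst_to_markdown` helper below uses two regular expressions. The list of roles is an assumption. Edge cases, such as `~`-prefixed targets or the stray fragments cleaned up in the next commit, would still need a manual pass.

```python
import re

# :class:`X`, :meth:`X`, :attr:`X`, ... -> `X` (role names are an assumed, non-exhaustive list)
RST_ROLE = re.compile(r":(?:class|meth|attr|func|mod|obj|data):`([^`]+)`")
# ``X`` (RST inline literal) -> `X` (Markdown inline code)
RST_LITERAL = re.compile(r"``([^`]+)``")


def rst_to_markdown(doc: str) -> str:
    # Roles first, then double-backtick literals; single-backtick text is left alone.
    doc = RST_ROLE.sub(r"`\1`", doc)
    return RST_LITERAL.sub(r"`\1`", doc)
```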
Remove stray :class`, :meth:, :attr: fragments. Convert remaining Union[], Dict[], List[], Tuple[], Optional[] in docstrings to Python 3.10+ syntax (built-in generics and | unions).
…docs
- Replace emoji clipboard with SVG copy icon
- Reword feature cards: Just Arrays, SQL on NumPy, Built for Numerics
- Hero benchmarks: Construction, Column Access, Numba JIT, Transform
- Fix square_plus_one find-replace damage
- Update getting-started: pip primary, pre-built wheels messaging
- composite_key: single-pass positional encoding of multi-column keys
- group_indices: O(n) hash-based group labeling + row-index scatter, replacing the np.unique + argsort + split pipeline
- 1.3x faster single-column groupby at 100k-1M rows
- All 88 tests pass
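The C code itself isn't shown here, so below is a pure-Python/NumPy sketch of the technique as described. It rests on two assumptions. First, each key column is already integer-coded with a known cardinality. Second, "positional encoding" means a mixed-radix combination of those codes. Both function names are hypothetical. The Python loops are the part a C extension would run natively.

```python
from __future__ import annotations

import numpy as np


def composite_key_sketch(codes: list[np.ndarray], cardinalities: list[int]) -> np.ndarray:
    # Mixed-radix ("positional") encoding: each column's integer code is one digit.
    # Assumes codes[i] is in [0, cardinalities[i]) and the product fits in int64.
    key = np.zeros(len(codes[0]), dtype=np.int64)
    for col, card in zip(codes, cardinalities):
        key = key * card + col
    return key


def group_indices_sketch(keys: np.ndarray) -> tuple[np.ndarray, list[np.ndarray]]:
    # Pass 1: map each key to a dense label in first-seen order (a dict is the hash table).
    label_of: dict[int, int] = {}
    labels = np.empty(len(keys), dtype=np.int64)
    for i, k in enumerate(keys.tolist()):
        labels[i] = label_of.setdefault(k, len(label_of))
    n_groups = len(label_of)
    # Pass 2: counting-sort scatter of row indices into contiguous per-group slices (no argsort).
    counts = np.bincount(labels, minlength=n_groups)
    offsets = np.concatenate(([0], np.cumsum(counts)))
    order = np.empty(len(keys), dtype=np.int64)
    cursor = offsets[:-1].copy()
    for i, g in enumerate(labels.tolist()):
        order[cursor[g]] = i
        cursor[g] += 1
    groups = [order[offsets[g] : offsets[g + 1]] for g in range(n_groups)]
    return labels, groups
```

Both passes are linear in the number of rows. Note that `np.unique` labels groups in sorted-key order, while this sketch labels them in first-seen order.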
Hash-based string→integer encoding replaces np.unique (O(n log n)) with an O(n) open-addressing hash table using PyObject_Hash. Speedups on string-column groupby:
- 2.5x faster on a single string column (100k-1M rows)
- 1.9x faster on multi-column groupby with strings
Applied to both _encode_columns and _encode_columns_paired (joins). All 88 tests pass.
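Below is a pure-Python sketch of open-addressing string encoding, for illustration only. Python's `hash()` is the Python-level counterpart of `PyObject_Hash`. Linear probing, a starting capacity of 8, and doubling at 50% load are assumptions, not details taken from the C extension. `encode_strings_sketch` is a hypothetical name.

```python
from __future__ import annotations

from collections.abc import Hashable, Sequence

import numpy as np


def encode_strings_sketch(values: Sequence[Hashable]) -> tuple[np.ndarray, list[Hashable]]:
    capacity = 8                       # always a power of two, so `& mask` replaces modulo
    slots = [-1] * capacity            # each slot holds an index into `uniques`, or -1 if empty
    uniques: list[Hashable] = []
    codes = np.empty(len(values), dtype=np.int64)
    for i, v in enumerate(values):
        # Keep load <= 50% so probing always finds an empty slot; grow and rehash if needed.
        if 2 * (len(uniques) + 1) > capacity:
            capacity *= 2
            slots = [-1] * capacity
            for code, u in enumerate(uniques):
                j = hash(u) & (capacity - 1)
                while slots[j] != -1:
                    j = (j + 1) & (capacity - 1)
                slots[j] = code
        # Probe linearly until we hit either an equal key or an empty slot.
        j = hash(v) & (capacity - 1)
        while slots[j] != -1 and uniques[slots[j]] != v:
            j = (j + 1) & (capacity - 1)
        if slots[j] == -1:             # new value: assign the next code (first-seen order)
            slots[j] = len(uniques)
            uniques.append(v)
        codes[i] = slots[j]
    return codes, uniques
```

Each lookup costs O(1) expected probes, so encoding is O(n) on average instead of the O(n log n) sort inside `np.unique`. Any hashable object works, not only strings.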
New C extension functions (composite_key, group_indices, encode_strings), tuple_map fast path, typing modernization, MkDocs website, docstring cleanup.
- Py_DECREF after Py_BuildValue in encode_strings and group_indices (the Py_BuildValue "O" format increments the refcount)
- Add 50k threshold to _encode_columns_paired, matching _encode_columns
17 new tests covering composite_key, group_indices, encode_strings:
- empty arrays, single element, all-same, all-unique
- first-seen ordering, multi-column keys
- high cardinality (10k unique), realloc triggers
- non-string hashable objects
- Handle None values in state_read (str | None from _decode_missing)
- Remove --strict from CI docs build (6 unavoidable griffe warnings from dataclass InitVar params)
- Change Tafra class docstring Parameters→Attributes (dataclass InitVar fields aren't function params)
- Move tuple_map 'name' kwarg docs into **kwargs description
- Restore --strict in CI docs build
- Add warn_unknown_params, docstring_style to mkdocstrings config
mkdocs build --strict now passes cleanly.
- Fix mypy error: handle None in csvreader state_read
- Fix griffe warnings: Parameters→Attributes for dataclass InitVar, move tuple_map 'name' kwarg into **kwargs docs
- Restore --strict in CI docs build (now passes cleanly)
- Modernize test file typing (Dict→dict, List→list, etc.)
- Add mypy override to ignore test/ errors (pre-existing InitVar issues)
Upload LCOV coverage from the Python 3.12 test run to Coveralls. Add lcov output to pytest-cov config.
Remove explicit github-token (auto-detected), add format: lcov.
Adds a TYPE_CHECKING-only __init__ override to the Tafra class so type checkers see the correct (data, dtypes, validate, check_rows) signature instead of the confusing dataclass InitVar expansion. Fixes "Expected bool, found dict" errors in downstream code.
- Move name from hidden kwargs.pop() to explicit keyword parameter
- Remove invalid warn_unknown_params option (not in CI's mkdocstrings version)
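Below is a before/after sketch using hypothetical `map_rows_before` and `map_rows_after` functions. The `"Row"` default is an assumption, not tafra's actual default. Runtime behavior is the same in both versions. What changes is that `name` becomes part of the signature, which is what type checkers and griffe/mkdocstrings read.

```python
from __future__ import annotations

import inspect
from collections.abc import Callable
from typing import Any


def map_rows_before(fn: Callable[..., Any], *args: Any, **kwargs: Any) -> dict[str, Any]:
    # `name` is fished out of **kwargs: it works at runtime, but it is missing from
    # the signature, so type checkers and griffe/mkdocstrings cannot see it.
    name = kwargs.pop("name", "Row")
    return {"name": name, "forwarded": kwargs}


def map_rows_after(
    fn: Callable[..., Any], *args: Any, name: str | None = "Row", **kwargs: Any
) -> dict[str, Any]:
    # `name` follows *args, so it is keyword-only: it can't be bound positionally,
    # and it appears in the signature (and therefore in the generated docs).
    return {"name": name, "forwarded": kwargs}


def visible_params(func: Callable[..., Any]) -> list[str]:
    # Roughly what a signature-based documentation tool would list.
    return list(inspect.signature(func).parameters)
```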
dsfulf added a commit that referenced this pull request on Mar 21, 2026
* feat: add MkDocs Material scaffold with custom landing page
* fix: landing page link, clipboard feedback, footer text
* docs: migrate benchmarks, changelog, and API reference to Markdown
* docs: add getting started page
* docs: add user guide pages
* docs: add tutorial pages (iris, titanic, timeseries)
* chore: remove Sphinx, update CI to MkDocs, update project URLs
* ci: add GitHub Pages deploy workflow
* docs: update README badge and links to GitHub Pages
* style: widen content, API spacing, enable syntax highlighting
* docs: add categorized method reference table to API page
* docs: reorganize benchmarks with charts, remove %timeit notation
* docs: add figures and output examples to tutorials
* docs: fix API reference table anchor links
  Change table links from api.md#anchor to #anchor so they resolve as same-page fragment links matching mkdocstrings-generated IDs.
* style: replace mermaid charts with CSS bar charts
* fix: white logo for teal nav, tighten font sizes, fix button contrast
  - Logo SVG: white squares (not teal) for visibility on teal nav bar
  - Hero h1: 3rem -> 2.2rem, subheading: 1.2rem -> 1.05rem
  - Section titles: 1.8rem -> 1.4rem
  - Reduced section padding throughout
  - Force white text on Get Started button to prevent Material override
* style: fix hero width, content width, API spacing, favicon, tutorial figures
* docs: update benchmarks with fresh pandas 3.0.1 / polars 1.39.0 numbers
* docs: add polars map_rows to row mapping benchmark
  Polars map_rows is 2.3x slower than tafra's tuple_map (1.61 ms vs 0.71 ms).
* style: convert inline comment outputs to collapsible Output admonitions
* docs: scale row mapping benchmark to 10k/100k/1M rows
  Materialized tuple_map results for fair comparison. pandas itertuples wins row mapping (1.4x faster than tafra), polars map_rows is 5-7x slower. Updated benchmark script, docs, and hero page.
* perf: add fast plain-tuple path to tuple_map(name=None)
  Skip NamedTuple construction when name=None, using zip(*values) instead. ~2-4x faster than NamedTuple path, now beats pandas itertuples at all scales (6.24 vs 6.63 ms at 10k rows).
* docs: add row mapping bar charts, switch hero to 1M scale
* docs: switch to polars map_elements, add vectorized expression benchmark
* docs: add numba benchmark, row map scales to hero, version footnotes
* style: shorten tafra/numpy labels to tafra in benchmark charts
* style: add blue highlighting to API function/method names
* style: fix API reference TOC, parameter wrapping, syntax colors
  - Blue function names, teal parameters in API headings
  - TOC limited to depth 3 (no Parameters/Returns clutter)
  - Class vs method distinction in sidebar (bold vs indented)
  - Remove duplicate headings in api.md
  - Fix parameter pre-block wrapping
* fix: normalize docstring indentation for griffe parsing
  De-indent numpy-style parameter/return sections to align with dashes. Fix mismatched param names (df→s, column→columns, group_by→columns, etc). Enable docstring_style: numpy and table rendering in mkdocstrings. Add CSS for structured parameter tables (teal param names, gray types).
* refactor: modernize typing to Python 3.10+ built-in generics
  Replace Dict→dict, List→list, Tuple→tuple, Type→type, Optional[X]→X|None, Union[X,Y]→X|Y across all source files. Add from __future__ import annotations. Remove unused typing imports.
* refactor: remove RST syntax from docstrings, add source code highlighting
  - Replace :class:`X`, :meth:`X`, :attr:`X` etc with backtick `X`
  - Replace ``X`` double-backtick RST literals with `X`
  - Modernize InitVar type alias display in Tafra class docstring
  - Add JS-based syntax highlighting for method calls (blue) and type names (teal) in source code accordions
  - Add CSS classes for fn-call and type-name spans
* fix: clean remaining RST artifacts and modernize docstring types
  Remove stray :class`, :meth:, :attr: fragments. Convert remaining Union[], Dict[], List[], Tuple[], Optional[] in docstrings to Python 3.10+ pipe syntax.
* style: hero card rewording, copy icon, feature card updates, install docs
  - Replace emoji clipboard with SVG copy icon
  - Reword feature cards: Just Arrays, SQL on NumPy, Built for Numerics
  - Hero benchmarks: Construction, Column Access, Numba JIT, Transform
  - Fix square_plus_one find-replace damage
  - Update getting-started: pip primary, pre-built wheels messaging
* perf: add C-accelerated composite_key and group_indices
  - composite_key: single-pass positional encoding of multi-column keys
  - group_indices: O(n) hash-based group labeling + row index scatter, replacing np.unique + argsort + split pipeline
  - 1.3x faster groupby at 100k-1M rows single-column
  - All 88 tests pass
* perf: add C-accelerated encode_strings for O(n) string hashing
  Hash-based string→integer encoding replaces np.unique (O(n log n)) with O(n) open-addressing hash table using PyObject_Hash. Speedups on string-column groupby:
  - 2.5x faster on single string column (100k-1M rows)
  - 1.9x faster on multi-column groupby with strings
  Applied to both _encode_columns and _encode_columns_paired (joins). All 88 tests pass.
* docs: update benchmarks with C extension perf improvements
* bump version to 2.2.0
  New C extension functions (composite_key, group_indices, encode_strings), tuple_map fast path, typing modernization, MkDocs website, docstring cleanup.
* fix: memory leak in Py_BuildValue calls, threshold consistency
  - Py_DECREF after Py_BuildValue in encode_strings and group_indices (Py_BuildValue "O" format increments refcount)
  - Add 50k threshold to _encode_columns_paired matching _encode_columns
* test: add direct unit tests for C extension functions
  17 new tests covering composite_key, group_indices, encode_strings:
  - empty arrays, single element, all-same, all-unique
  - first-seen ordering, multi-column keys
  - high cardinality (10k unique), realloc triggers
  - non-string hashable objects
* fix: mypy error in csvreader, relax docs strict mode
  - Handle None values in state_read (str | None from _decode_missing)
  - Remove --strict from CI docs build (6 unavoidable griffe warnings from dataclass InitVar params)
* fix: resolve griffe warnings for strict docs build
  - Change Tafra class docstring Parameters→Attributes (dataclass InitVar fields aren't function params)
  - Move tuple_map 'name' kwarg docs into **kwargs description
  - Restore --strict in CI docs build
  - Add warn_unknown_params, docstring_style to mkdocstrings config
  mkdocs build --strict now passes cleanly.
* fix: CI lint/docs failures, modernize test typing
  - Fix mypy error: handle None in csvreader state_read
  - Fix griffe warnings: Parameters→Attributes for dataclass InitVar, move tuple_map 'name' kwarg into **kwargs docs
  - Restore --strict in CI docs build (now passes cleanly)
  - Modernize test file typing (Dict→dict, List→list, etc.)
  - Add mypy override to ignore test/ errors (pre-existing InitVar issues)
* ci: add Coveralls coverage reporting
  Upload LCOV coverage from Python 3.12 test run to Coveralls. Add lcov output to pytest-cov config.
* ci: simplify Coveralls config per docs
  Remove explicit github-token (auto-detected), add format: lcov.
* docs: convert README from RST to Markdown
* fix: add py.typed marker and TYPE_CHECKING __init__ stub
  Adds TYPE_CHECKING-only __init__ override to Tafra class so type checkers see the correct (data, dtypes, validate, check_rows) signature instead of the confusing dataclass InitVar expansion. Fixes "Expected bool, found dict" errors in downstream code.
* fix: make tuple_map name a proper param, fix mkdocstrings config
  - Move name from hidden kwargs.pop() to explicit keyword parameter
  - Remove invalid warn_unknown_params option (not in CI's mkdocstrings version)
---------
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Summary
- … `gh-pages` branch.
- `composite_key`, `group_indices`, `encode_strings`: O(n) hash-based group construction and string encoding, replacing O(n log n) NumPy paths. Multi-column GroupBy with string columns is 1.9x faster; single-string-column GroupBy is 2.5x faster.
- `tuple_map(name=None)` fast path: skips NamedTuple construction and now beats pandas `itertuples` at all scales.
- `Dict`→`dict`, `List`→`list`, `Optional[X]`→`X | None`, `Union`→`|`, with `from __future__ import annotations`.
Performance highlights (vs previous release)
Test plan
- `pytest`: 105/105 tests pass (17 new for C extension functions)
- `mkdocs build`: site builds cleanly
- … `Py_BuildValue`, threshold consistency
🤖 Generated with Claude Code