v2.2.0: MkDocs website, C extension perf, typing modernization #24

Merged
dsfulf merged 47 commits into main from feature/mkdocs-website on Mar 20, 2026

Conversation

@dsfulf (Member) commented Mar 20, 2026

Summary

  • MkDocs Material website with custom landing page, benchmark charts, tutorials, user guide, and API reference with structured parameter tables. Replaces Sphinx/ReadTheDocs. Deploys to GitHub Pages via gh-pages branch.
  • C extension additions: composite_key, group_indices, encode_strings — O(n) hash-based group construction and string encoding, replacing O(n log n) numpy paths. Multi-column GroupBy 1.9x faster, string-column GroupBy 2.5x faster.
  • tuple_map(name=None) fast path — skips NamedTuple construction, now beats pandas itertuples at all scales.
  • Typing modernization: Dict → dict, List → list, Optional[X] → X | None, Union → |, with from __future__ import annotations.
  • Docstring cleanup — RST cross-references removed, indentation normalized for griffe, parameter name mismatches fixed.
  • Version bump to 2.2.0. No breaking changes.
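
To illustrate the O(n) hash-based path described in the summary, here is a minimal pure-Python sketch of first-seen integer encoding followed by a row-index scatter. It is not the C implementation: the names `encode_first_seen` and `group_rows` are hypothetical, and a Python dict stands in for the open-addressing hash table.

```python
import numpy as np


def encode_first_seen(values):
    """Map each value to an integer code in first-seen order, in one pass."""
    codes = np.empty(len(values), dtype=np.int64)
    seen = {}     # value -> code; dict lookup is O(1) on average
    uniques = []  # code -> value, in order of first appearance
    for i, v in enumerate(values):
        code = seen.get(v)
        if code is None:
            code = len(uniques)
            seen[v] = code
            uniques.append(v)
        codes[i] = code
    return codes, uniques


def group_rows(codes, n_groups):
    """Scatter row indices into one list per group code."""
    groups = [[] for _ in range(n_groups)]
    for row, code in enumerate(codes):
        groups[code].append(row)
    return groups


codes, uniques = encode_first_seen(["b", "a", "b", "c", "a"])
groups = group_rows(codes, len(uniques))
```

Unlike `np.unique(..., return_inverse=True)`, which sorts the distinct values, this approach never sorts, so codes follow first appearance rather than sorted order.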

Performance highlights (vs previous release)

| Benchmark | v2.1.0 | v2.2.0 | Improvement |
|---|---|---|---|
| GroupBy 100k, 2-col | 8.72 ms | 4.00 ms | 2.2x |
| GroupBy 1M, 2-col | 97 ms | 48 ms | 2.0x |
| GroupBy 100k, str-col | 8.68 ms | 3.52 ms | 2.5x |
| tuple_map 100k | 87.6 ms | 62.2 ms | 1.4x |
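
The tuple_map row reflects the name=None fast path. A rough sketch of the idea, assuming columns are held as a dict of equal-length sequences: `map_rows` is a hypothetical stand-in and does not reproduce tafra's actual tuple_map signature.

```python
from collections import namedtuple


def map_rows(fn, columns, name="Row"):
    """Apply fn to each row of a dict of equal-length columns."""
    if name is None:
        # Fast path: plain tuples straight from zip, no named-tuple construction.
        return [fn(row) for row in zip(*columns.values())]
    Row = namedtuple(name, columns.keys())
    return [fn(Row(*row)) for row in zip(*columns.values())]


columns = {"x": [1, 2, 3], "y": [10, 20, 30]}
named = map_rows(lambda r: r.x + r.y, columns)               # access by field name
plain = map_rows(lambda r: r[0] + r[1], columns, name=None)  # positional access
```

The trade-off is that callers on the fast path index fields by position instead of by name.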

Test plan

  • pytest — 105/105 tests pass (17 new for C extension functions)
  • mkdocs build — site builds cleanly
  • C extension edge cases: empty arrays, single element, high cardinality, realloc triggers
  • Code review: fixed memory leak in Py_BuildValue, threshold consistency
  • Verify GitHub Pages deploy after merge
  • Verify PyPI wheel includes new C functions
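
As a reference point for the edge cases listed above, the following sketch combines per-column integer codes into a single key using mixed-radix arithmetic. The encoding scheme is an assumption about what "positional encoding" of multi-column keys could look like, and `composite_key_sketch` is a hypothetical name, not the C extension's function.

```python
import numpy as np


def composite_key_sketch(code_columns, cardinalities):
    """Combine per-column integer codes into one key via mixed-radix encoding."""
    n = len(code_columns[0]) if code_columns else 0
    key = np.zeros(n, dtype=np.int64)
    for codes, card in zip(code_columns, cardinalities):
        # Shift the running key by this column's cardinality, then add its code.
        key = key * card + np.asarray(codes, dtype=np.int64)
    return key


# Two columns with cardinalities 2 and 3; rows 0 and 3 hold the same pair.
key = composite_key_sketch([[0, 1, 1, 0], [2, 0, 2, 2]], [2, 3])

# Edge cases from the test plan: empty input and a single element.
empty = composite_key_sketch([np.array([], dtype=np.int64)], [1])
single = composite_key_sketch([[0], [0]], [1, 1])
```

A sketch like this would overflow int64 once the product of cardinalities grows too large, so a real implementation needs a guard or a fallback for that case.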

🤖 Generated with Claude Code

dsfulf and others added 30 commits March 18, 2026 19:35
dsfulf and others added 17 commits March 20, 2026 10:40
@dsfulf merged commit 7d83e0a into main on Mar 20, 2026
7 checks passed
dsfulf added a commit that referenced this pull request Mar 21, 2026
* feat: add MkDocs Material scaffold with custom landing page

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: landing page link, clipboard feedback, footer text

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* docs: migrate benchmarks, changelog, and API reference to Markdown

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* docs: add getting started page

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* docs: add user guide pages

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* docs: add tutorial pages (iris, titanic, timeseries)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* chore: remove Sphinx, update CI to MkDocs, update project URLs

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* ci: add GitHub Pages deploy workflow

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* docs: update README badge and links to GitHub Pages

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* style: widen content, API spacing, enable syntax highlighting

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* docs: add categorized method reference table to API page

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* docs: reorganize benchmarks with charts, remove %timeit notation

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* docs: add figures and output examples to tutorials

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* docs: fix API reference table anchor links

Change table links from api.md#anchor to #anchor so they resolve
as same-page fragment links matching mkdocstrings-generated IDs.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* style: replace mermaid charts with CSS bar charts

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: white logo for teal nav, tighten font sizes, fix button contrast

- Logo SVG: white squares (not teal) for visibility on teal nav bar
- Hero h1: 3rem -> 2.2rem, subheading: 1.2rem -> 1.05rem
- Section titles: 1.8rem -> 1.4rem
- Reduced section padding throughout
- Force white text on Get Started button to prevent Material override

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* style: fix hero width, content width, API spacing, favicon, tutorial figures

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* docs: update benchmarks with fresh pandas 3.0.1 / polars 1.39.0 numbers

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* docs: add polars map_rows to row mapping benchmark

Polars map_rows is 2.3x slower than tafra's tuple_map (1.61 ms vs 0.71 ms).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* style: convert inline comment outputs to collapsible Output admonitions

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* docs: scale row mapping benchmark to 10k/100k/1M rows

Materialized tuple_map results for fair comparison. pandas itertuples
wins row mapping (1.4x faster than tafra), polars map_rows is 5-7x
slower. Updated benchmark script, docs, and hero page.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* perf: add fast plain-tuple path to tuple_map(name=None)

Skip NamedTuple construction when name=None, using zip(*values)
instead. ~2-4x faster than NamedTuple path, now beats pandas
itertuples at all scales (6.24 vs 6.63 ms at 10k rows).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* docs: add row mapping bar charts, switch hero to 1M scale

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* docs: switch to polars map_elements, add vectorized expression benchmark

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* docs: add numba benchmark, row map scales to hero, version footnotes

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* style: shorten tafra/numpy labels to tafra in benchmark charts

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* style: add blue highlighting to API function/method names

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* style: fix API reference TOC, parameter wrapping, syntax colors

- Blue function names, teal parameters in API headings
- TOC limited to depth 3 (no Parameters/Returns clutter)
- Class vs method distinction in sidebar (bold vs indented)
- Remove duplicate headings in api.md
- Fix parameter pre-block wrapping

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: normalize docstring indentation for griffe parsing

De-indent numpy-style parameter/return sections to align with dashes.
Fix mismatched param names (df→s, column→columns, group_by→columns,
etc). Enable docstring_style: numpy and table rendering in mkdocstrings.
Add CSS for structured parameter tables (teal param names, gray types).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* refactor: modernize typing to Python 3.10+ built-in generics

Replace Dict→dict, List→list, Tuple→tuple, Type→type,
Optional[X]→X|None, Union[X,Y]→X|Y across all source files.
Add from __future__ import annotations. Remove unused typing imports.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* refactor: remove RST syntax from docstrings, add source code highlighting

- Replace :class:`X`, :meth:`X`, :attr:`X` etc with backtick `X`
- Replace ``X`` double-backtick RST literals with `X`
- Modernize InitVar type alias display in Tafra class docstring
- Add JS-based syntax highlighting for method calls (blue) and
  type names (teal) in source code accordions
- Add CSS classes for fn-call and type-name spans

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: clean remaining RST artifacts and modernize docstring types

Remove stray :class`, :meth:, :attr: fragments. Convert remaining
Union[], Dict[], List[], Tuple[], Optional[] in docstrings to
Python 3.10+ pipe syntax.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* style: hero card rewording, copy icon, feature card updates, install docs

- Replace emoji clipboard with SVG copy icon
- Reword feature cards: Just Arrays, SQL on NumPy, Built for Numerics
- Hero benchmarks: Construction, Column Access, Numba JIT, Transform
- Fix square_plus_one find-replace damage
- Update getting-started: pip primary, pre-built wheels messaging

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* perf: add C-accelerated composite_key and group_indices

- composite_key: single-pass positional encoding of multi-column keys
- group_indices: O(n) hash-based group labeling + row index scatter,
  replacing np.unique + argsort + split pipeline
- 1.3x faster groupby at 100k-1M rows single-column
- All 88 tests pass

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* perf: add C-accelerated encode_strings for O(n) string hashing

Hash-based string→integer encoding replaces np.unique (O(n log n))
with O(n) open-addressing hash table using PyObject_Hash.

Speedups on string-column groupby:
- 2.5x faster on single string column (100k-1M rows)
- 1.9x faster on multi-column groupby with strings

Applied to both _encode_columns and _encode_columns_paired (joins).
All 88 tests pass.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* docs: update benchmarks with C extension perf improvements

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* bump version to 2.2.0

New C extension functions (composite_key, group_indices, encode_strings),
tuple_map fast path, typing modernization, MkDocs website, docstring cleanup.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: memory leak in Py_BuildValue calls, threshold consistency

- Py_DECREF after Py_BuildValue in encode_strings and group_indices
  (Py_BuildValue "O" format increments refcount)
- Add 50k threshold to _encode_columns_paired matching _encode_columns

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* test: add direct unit tests for C extension functions

17 new tests covering composite_key, group_indices, encode_strings:
- empty arrays, single element, all-same, all-unique
- first-seen ordering, multi-column keys
- high cardinality (10k unique), realloc triggers
- non-string hashable objects

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: mypy error in csvreader, relax docs strict mode

- Handle None values in state_read (str | None from _decode_missing)
- Remove --strict from CI docs build (6 unavoidable griffe warnings
  from dataclass InitVar params)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: resolve griffe warnings for strict docs build

- Change Tafra class docstring Parameters→Attributes (dataclass InitVar
  fields aren't function params)
- Move tuple_map 'name' kwarg docs into **kwargs description
- Restore --strict in CI docs build
- Add warn_unknown_params, docstring_style to mkdocstrings config

mkdocs build --strict now passes cleanly.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: CI lint/docs failures, modernize test typing

- Fix mypy error: handle None in csvreader state_read
- Fix griffe warnings: Parameters→Attributes for dataclass InitVar,
  move tuple_map 'name' kwarg into **kwargs docs
- Restore --strict in CI docs build (now passes cleanly)
- Modernize test file typing (Dict→dict, List→list, etc.)
- Add mypy override to ignore test/ errors (pre-existing InitVar issues)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* ci: add Coveralls coverage reporting

Upload LCOV coverage from Python 3.12 test run to Coveralls.
Add lcov output to pytest-cov config.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* ci: simplify Coveralls config per docs

Remove explicit github-token (auto-detected), add format: lcov.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* docs: convert README from RST to Markdown

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: add py.typed marker and TYPE_CHECKING __init__ stub

Adds TYPE_CHECKING-only __init__ override to Tafra class so type
checkers see the correct (data, dtypes, validate, check_rows) signature
instead of the confusing dataclass InitVar expansion. Fixes "Expected
bool, found dict" errors in downstream code.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: make tuple_map name a proper param, fix mkdocstrings config

- Move name from hidden kwargs.pop() to explicit keyword parameter
- Remove invalid warn_unknown_params option (not in CI's mkdocstrings version)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>