charex 0.5.0

@nmehran nmehran released this 01 Jun 14:53

charex 0.5.0 is the shape/layout release for the Numba 0.65.1 compatibility
line. It extends the read-only string operation surface from scalar, 0-D, and
1-D inputs to NumPy-matching N-D and broadcast-compatible shapes.

Compatibility

  • Python >=3.10,<3.15
  • Numba >=0.65.1,<0.66
  • NumPy supported/tested window: >=1.22,<1.27 or >=2.0,<2.5
  • llvmlite 0.47.x

np.strings support is conditional on NumPy 2.x.
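
As a rough illustration of the version windows above, the following sketch checks version strings against them using only the standard library. The helper names (`parse_version`, `numpy_in_window`, `numba_in_window`, `has_np_strings`) are hypothetical and not part of charex; pre-release suffixes such as `rc1` are ignored, which is a simplification.

```python
import re


def parse_version(version: str) -> tuple[int, int, int]:
    """Return (major, minor, patch) from strings like '2.4.6' or '2.0.0rc1'.

    A missing patch component is treated as 0; any suffix is ignored.
    """
    match = re.match(r"(\d+)\.(\d+)(?:\.(\d+))?", version)
    if match is None:
        raise ValueError(f"unrecognized version string: {version!r}")
    major, minor, patch = match.groups()
    return int(major), int(minor), int(patch or 0)


def numpy_in_window(version: str) -> bool:
    """True for NumPy >=1.22,<1.27 or >=2.0,<2.5."""
    v = parse_version(version)
    return (1, 22, 0) <= v < (1, 27, 0) or (2, 0, 0) <= v < (2, 5, 0)


def numba_in_window(version: str) -> bool:
    """True for Numba >=0.65.1,<0.66."""
    v = parse_version(version)
    return (0, 65, 1) <= v < (0, 66, 0)


def has_np_strings(version: str) -> bool:
    """np.strings paths apply only on NumPy 2.x."""
    return parse_version(version)[0] == 2
```

In practice the installed versions could be read with `importlib.metadata.version("numpy")` and `importlib.metadata.version("numba")` and passed to these helpers.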

Highlights

  • Supports N-D and broadcast-compatible shapes for fixed-width S/U arrays
    through both np.char and np.strings.
  • Supports N-D and broadcast-compatible shapes for NumPy 2.x variable-width
    StringDType arrays through np.strings.
  • Supports contiguous arrays, read-only views, positive and negative strides,
    zero-stride views, and empty views for the supported read-only catalog.
  • Supports default StringDType() and StringDType(na_object=...) variants
    with NumPy-matching operation-specific null behavior.
  • Preserves separate np.char and np.strings semantics, including the
    trailing whitespace/NUL behavior difference.
  • Keeps transformation/output-producing operations outside the scope of this
    release.
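
The shapes and layouts listed above can be illustrated with plain NumPy, which is the reference behavior this release is described as matching. The sketch below does not call charex or Numba; how charex is enabled inside compiled code is not shown here, and the array contents are made up for illustration.

```python
import numpy as np

# Fixed-width U array with shape (2, 3).
words = np.array([["alpha", "beta", "gamma"],
                  ["delta", "alps", "al"]])

# Broadcasting: (2, 3) against (3,) yields a (2, 3) boolean result.
prefixes = np.array(["al", "be", "g"])
starts = np.char.startswith(words, prefixes)

# Negative-stride view: columns reversed without copying.
reversed_cols = words[:, ::-1]
lens = np.char.str_len(reversed_cols)

# Zero-stride, read-only view: one element presented as shape (2, 3).
tiled = np.broadcast_to(np.array(["al"]), words.shape)
same = np.char.equal(words, tiled)

# Empty view: the result keeps the (0, 3) shape.
empty_lens = np.char.str_len(words[:0])

# np.char comparisons ignore trailing whitespace; np.strings comparisons
# (NumPy 2.x only) do not.
padded = np.array(["a "])
plain = np.array(["a"])
char_equal = np.char.equal(padded, plain)
strings_equal = np.strings.equal(padded, plain) if hasattr(np, "strings") else None
```

Here `char_equal` is `[True]`, while `strings_equal` is `[False]` on NumPy 2.x and `None` when `np.strings` is unavailable.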

Supported Read-Only Catalog

  • comparisons: equal, not_equal, greater, greater_equal, less,
    less_equal;
  • occurrence/search: count, startswith, endswith, find, rfind,
    index, rindex;
  • information/predicates: str_len, isalpha, isalnum, isdigit,
    isdecimal, isnumeric, isspace, islower, isupper, istitle;
  • np.char.compare_chararrays for fixed-width S/U.
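
For orientation, here is a small NumPy-only sketch of a few catalog entries applied to a 2-D fixed-width array. It shows the NumPy reference results rather than charex itself, and the sample data is invented for illustration.

```python
import numpy as np

codes = np.array([["ab12", "2024"],
                  ["abab", "xyz"]])
other = np.array([["abab", "xyz"],
                  ["ab12", "2024"]])

# Occurrence/search: element-wise count and first index (-1 when absent).
counts = np.char.count(codes, "ab")
first_two = np.char.find(codes, "2")

# Information/predicates: element-wise, same shape as the input.
digits = np.char.isdigit(codes)
alphas = np.char.isalpha(codes)

# Fixed-width comparison with an explicit operator string; the final
# argument asks for trailing whitespace to be stripped before comparing.
less = np.char.compare_chararrays(codes, other, "<", True)
```

Each result has shape (2, 2). For example, `counts` is `[[1, 0], [2, 0]]` and `first_two` is `[[3, 0], [-1, -1]]`.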

Parity Audit

The full shape audit on Python 3.12.8, NumPy 2.4.6, and Numba 0.65.1 reports:

  • rows: 1702
  • matching rows: 1702
  • mismatches: 0
  • NumPy accepts but charex rejects: 0

The audit CSV is written to
docs/exploration/string_array_shape_audit.csv; the summary is maintained in
docs/string-array-shape-parity.md.

Benchmarks

The Numba 0.65.1 benchmark matrix in
docs/benchmarks/numba-v-0.65.1 includes
fixed-width np.char inputs and NumPy 2.x StringDType inputs through
np.strings.

The current matrix reports a 1.60x median speedup over NumPy across 135
fixed-width and StringDType cases, with per-case speedups ranging from 1.02x to
6.51x.

Not In Scope

  • Transformation/output-producing operations such as replace, case conversion,
    strip, pad, join, split, encode, and decode.
  • Object array bridges.
  • Max-performance experimental kernels that have not been distilled.