perf(color): map categorical shapes colour by code, not per-row hex parse #737

Merged
timtreis merged 1 commit into main from perf/colorspec-codes-gather on Jun 22, 2026

@timtreis

Member

Problem

ColorSpec.to_rgba — the matplotlib shapes fill + outline colour mapping — parses every row via to_rgba_array(list(color_vector)). For a categorical color_vector (K distinct hex strings over N rows) that is O(N) string parsing for O(K) actual colours.

Fix

  • Categorical color_vector: parse the K categories once, gather by .codes (codes ≥ 0 — resolution fills NaN with an explicit na_color category, _color.py:682–685).
  • Object color_vector (align_to_length pad / uniform na): pd.factorize(sort=False) on the distinct colours, then gather. (np.unique would sort the hex strings and is slower than the baseline at scale — deliberately avoided.)
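
The two paths above can be sketched roughly as follows. This is a minimal illustration, not the code in `_color.py`: the helper names `rgba_from_categorical` and `rgba_from_object` are hypothetical, and the sketch assumes every value is a colour string matplotlib can parse, with no NaN left in the vector.

```python
import numpy as np
import pandas as pd
from matplotlib.colors import to_rgba_array


def rgba_from_categorical(color_vector: pd.Categorical) -> np.ndarray:
    """Parse the K categories once, then gather N rows by integer code.

    Hypothetical helper for illustration; assumes NaN was already replaced
    by an explicit na_color category, so every code is >= 0.
    """
    codes = np.asarray(color_vector.codes)
    if (codes < 0).any():
        raise ValueError("expected NaN to be filled with an na_color category")
    palette = to_rgba_array(list(color_vector.categories))  # shape (K, 4)
    return palette[codes]  # shape (N, 4)


def rgba_from_object(color_vector) -> np.ndarray:
    """Factorize an object vector in first-seen order, then gather.

    Hypothetical helper for illustration; sort=False avoids sorting the
    colour strings.
    """
    values = np.asarray(color_vector, dtype=object)
    codes, uniques = pd.factorize(values, sort=False)
    palette = to_rgba_array(list(uniques))  # shape (K, 4)
    return palette[codes]  # shape (N, 4)
```

In both helpers the string parsing cost scales with the number of distinct colours; the per-row work is a single integer-indexed gather.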

Byte-identical

Verified main-vs-branch on real renders — categorical fill, categorical+outline, categorical+groups, none — max|diff| = 0. Unit test TestColorSpecToRgba locks to_rgba against the per-row to_rgba_array(list(...)) across all 5 color_vector variants (clean / na-category / filtered / none-uniform / object-after-align).
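
For readers who want to reproduce the equivalence check on a toy input, here is a small sketch of the na-category variant. It is not the `TestColorSpecToRgba` test itself: the `NA_COLOR` value and the toy data are assumptions made for this example only.

```python
import numpy as np
import pandas as pd
from matplotlib.colors import to_rgba_array

NA_COLOR = "#d3d3d3"  # hypothetical na_color, chosen only for this sketch

# Categorical with a missing value, then NaN filled by an explicit category.
raw = pd.Series(pd.Categorical(["#ff0000", None, "#00ff00", "#ff0000"]))
filled = raw.cat.add_categories([NA_COLOR]).fillna(NA_COLOR)

# Compact form: parse the categories once and gather by code.
codes = filled.cat.codes.to_numpy()
gathered = to_rgba_array(list(filled.cat.categories))[codes]

# Baseline: parse every row.
per_row = to_rgba_array(list(filled))

max_abs_diff = float(np.abs(gathered - per_row).max())
print(max_abs_diff)
```

After the fill, no code is negative, so the gather never indexes the palette with -1.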

Scope & impact

@codecov-commenter

codecov-commenter commented Jun 22, 2026


Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 79.28%. Comparing base (55f4970) to head (f486614).

Additional details and impacted files
@@            Coverage Diff             @@
##             main     #737      +/-   ##
==========================================
+ Coverage   79.26%   79.28%   +0.02%     
==========================================
  Files          17       17              
  Lines        4596     4601       +5     
  Branches     1028     1029       +1     
==========================================
+ Hits         3643     3648       +5     
  Misses        603      603              
  Partials      350      350              
Files with missing lines Coverage Δ
src/spatialdata_plot/pl/_color.py 69.30% <100.00%> (+0.24%) ⬆️

…arse

ColorSpec.to_rgba (the matplotlib shapes fill+outline mapping) parsed every
row via to_rgba_array(list(color_vector)). For a categorical color_vector
(K distinct hex over N rows) this is O(N) string parsing for O(K) colours.

Map the K categories once and gather by code (codes >= 0: resolution fills
NaN with an na_color category); object vectors (align_to_length pad, uniform
na) use pd.factorize(sort=False) on the distinct colours. Byte-identical to
the per-row parse (verified main-vs-branch on categorical fill/outline/groups,
max|diff|=0), ~56x on the categorical path, ~16x on the object path.

Continuous (source_vector is None) is untouched; labels (map_array codes) and
points (#731) already use the compact form.
@timtreis timtreis force-pushed the perf/colorspec-codes-gather branch from 3c8c1df to f486614 Compare June 22, 2026 08:28
@timtreis timtreis merged commit eb0491d into main Jun 22, 2026
7 of 8 checks passed
@timtreis timtreis deleted the perf/colorspec-codes-gather branch June 22, 2026 08:44
2 participants