Speed up identity distance metric computation for two different sequence arrays #701

Merged
grst merged 2 commits into scverse:main from felixpetschko:performance/identity-metric on May 7, 2026

@felixpetschko
Collaborator

Summary

This PR speeds up the identity distance metric computation when comparing two different sequence arrays via the IdentityDistanceCalculator.

Instead of checking every pairwise combination of seqs and seqs2, the implementation now builds a lookup from sequences in seqs2 to their column indices and uses it to get matching matrix entries directly.

The previous pairwise approach became a bottleneck with a large number of cells, for example in a call like this:

ir.pp.ir_dist(mdata, vdjdb, metric="identity", sequence="aa")

With this change, I was able to run the IdentityDistanceCalculator with millions of sequences in seconds.
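
The lookup idea from the summary can be sketched in plain Python as follows. This is a minimal illustration, not the code merged in this PR: the function names `identity_matches` and `identity_matches_naive` and the sample sequences are hypothetical, and the sketch returns plain `(row, col)` pairs rather than filling a matrix.

```python
from collections import defaultdict


def identity_matches(seqs, seqs2):
    """Return (row, col) pairs where seqs[row] == seqs2[col].

    Hypothetical helper for illustration; not scirpy's actual code.
    """
    # Map each sequence in seqs2 to every column index where it occurs.
    columns_by_seq = defaultdict(list)
    for col, seq in enumerate(seqs2):
        columns_by_seq[seq].append(col)

    # One dictionary lookup per row replaces a scan over all of seqs2.
    pairs = []
    for row, seq in enumerate(seqs):
        for col in columns_by_seq.get(seq, ()):
            pairs.append((row, col))
    return pairs


def identity_matches_naive(seqs, seqs2):
    """Reference version that checks every pairwise combination."""
    return [
        (row, col)
        for row, a in enumerate(seqs)
        for col, b in enumerate(seqs2)
        if a == b
    ]


# Made-up example sequences; duplicates in seqs2 yield several columns.
seqs = ["CASSL", "CASSF", "CAVRD"]
seqs2 = ["CASSF", "CASSL", "CASSF"]
print(identity_matches(seqs, seqs2))  # [(0, 1), (1, 0), (1, 2)]
```

The naive version does `len(seqs) * len(seqs2)` comparisons, while the lookup version does work proportional to `len(seqs) + len(seqs2)` plus the number of matches, which is why the difference grows with the number of cells.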

@felixpetschko changed the title from "Speed up identity distance metric computation for two sequence arrays" to "Speed up identity distance metric computation for two different sequence arrays" on May 7, 2026
@codecov
codecov Bot commented May 7, 2026

Codecov Report

❌ Patch coverage is 0% with 11 lines in your changes missing coverage. Please review.
✅ Project coverage is 19.75%. Comparing base (05344e5) to head (644ad8b).

| Files with missing lines | Patch % | Lines |
|---|---|---|
| src/scirpy/ir_dist/metrics.py | 0.00% | 11 Missing ⚠️ |
Additional details and impacted files
@@            Coverage Diff             @@
##             main     #701      +/-   ##
==========================================
- Coverage   19.76%   19.75%   -0.02%     
==========================================
  Files          51       51              
  Lines        4578     4581       +3     
==========================================
  Hits          905      905              
- Misses       3673     3676       +3     
| Files with missing lines | Coverage Δ |
|---|---|
| src/scirpy/ir_dist/metrics.py | 14.56% <0.00%> (-0.08%) ⬇️ |

Collaborator

@grst left a comment

Thanks!

@grst grst merged commit 4e970d1 into scverse:main May 7, 2026
15 of 16 checks passed