
Add tcrblosum support to TCRdist #685

Draft
felixpetschko wants to merge 11 commits into scverse:main from felixpetschko:feature/tcrblosum


@felixpetschko
Collaborator

Until now, the TCRdist metric has used a distance matrix derived from the blosum62 substitution matrix. This PR extends TCRdistDistanceCalculator with a new base_matrix="tcrblosum" option alongside the existing default blosum62 behavior. With this option, the TCRdist calculation uses distance matrices derived from the tcrblosum substitution matrices (separate matrices for the alpha and beta chains).

I illustrate how I derived the tcrblosum-based distance matrices in this Google Colab notebook.

The usage of the tcrblosum matrices was already discussed in #591.

@codecov

codecov Bot commented Apr 6, 2026

Codecov Report

❌ Patch coverage is 29.41176% with 12 lines in your changes missing coverage. Please review.
✅ Project coverage is 19.20%. Comparing base (c0a3947) to head (09da693).

| Files with missing lines | Patch % | Lines |
|---|---|---|
| src/scirpy/ir_dist/metrics.py | 28.57% | 10 Missing ⚠️ |
| src/scirpy/ir_dist/__init__.py | 33.33% | 2 Missing ⚠️ |
Additional details and impacted files
@@            Coverage Diff             @@
##             main     #685      +/-   ##
==========================================
- Coverage   19.31%   19.20%   -0.12%     
==========================================
  Files          51       51              
  Lines        4633     4645      +12     
==========================================
- Hits          895      892       -3     
- Misses       3738     3753      +15     
| Files with missing lines | Coverage Δ |
|---|---|
| src/scirpy/ir_dist/__init__.py | 21.21% <33.33%> (ø) |
| src/scirpy/ir_dist/metrics.py | 13.81% <28.57%> (-0.84%) ⬇️ |

@felixpetschko felixpetschko requested a review from grst April 6, 2026 10:01
@grst grst added the run-gpu-ci runs GPU CI label Apr 7, 2026
Collaborator

@grst grst left a comment


Implementation-wise this looks great!

What's still missing is

  • changelog update
  • Documentation update for the user-facing method (pp.ir_dist). Probably best to add a new metric tcrblosum or tcrdist_tcrblosum.
  • Reference to the TCRblosum paper in the documentation
  • Maybe tutorial update?

# fmt: off
tcr_dict_distance_matrix = {('A', 'A'): 0, ('A', 'C'): 4, ('A', 'D'): 4, ('A', 'E'): 4, ('A', 'F'): 4, ('A', 'G'): 4, ('A', 'H'): 4, ('A', 'I'): 4, ('A', 'K'): 4, ('A', 'L'): 4, ('A', 'M'): 4, ('A', 'N'): 4, ('A', 'P'): 4, ('A', 'Q'): 4, ('A', 'R'): 4, ('A', 'S'): 3, ('A', 'T'): 4, ('A', 'V'): 4, ('A', 'W'): 4, ('A', 'Y'): 4, ('C', 'A'): 4, ('C', 'C'): 0, ('C', 'D'): 4, ('C', 'E'): 4, ('C', 'F'): 4, ('C', 'G'): 4, ('C', 'H'): 4, ('C', 'I'): 4, ('C', 'K'): 4, ('C', 'L'): 4, ('C', 'M'): 4, ('C', 'N'): 4, ('C', 'P'): 4, ('C', 'Q'): 4, ('C', 'R'): 4, ('C', 'S'): 4, ('C', 'T'): 4, ('C', 'V'): 4, ('C', 'W'): 4, ('C', 'Y'): 4, ('D', 'A'): 4, ('D', 'C'): 4, ('D', 'D'): 0, ('D', 'E'): 2, ('D', 'F'): 4, ('D', 'G'): 4, ('D', 'H'): 4, ('D', 'I'): 4, ('D', 'K'): 4, ('D', 'L'): 4, ('D', 'M'): 4, ('D', 'N'): 3, ('D', 'P'): 4, ('D', 'Q'): 4, ('D', 'R'): 4, ('D', 'S'): 4, ('D', 'T'): 4, ('D', 'V'): 4, ('D', 'W'): 4, ('D', 'Y'): 4, ('E', 'A'): 4, ('E', 'C'): 4, ('E', 'D'): 2, ('E', 'E'): 0, ('E', 'F'): 4, ('E', 'G'): 4, ('E', 'H'): 4, ('E', 'I'): 4, ('E', 'K'): 3, ('E', 'L'): 4, ('E', 'M'): 4, ('E', 'N'): 4, ('E', 'P'): 4, ('E', 'Q'): 2, ('E', 'R'): 4, ('E', 'S'): 4, ('E', 'T'): 4, ('E', 'V'): 4, ('E', 'W'): 4, ('E', 'Y'): 4, ('F', 'A'): 4, ('F', 'C'): 4, ('F', 'D'): 4, ('F', 'E'): 4, ('F', 'F'): 0, ('F', 'G'): 4, ('F', 'H'): 4, ('F', 'I'): 4, ('F', 'K'): 4, ('F', 'L'): 4, ('F', 'M'): 4, ('F', 'N'): 4, ('F', 'P'): 4, ('F', 'Q'): 4, ('F', 'R'): 4, ('F', 'S'): 4, ('F', 'T'): 4, ('F', 'V'): 4, ('F', 'W'): 3, ('F', 'Y'): 1, ('G', 'A'): 4, ('G', 'C'): 4, ('G', 'D'): 4, ('G', 'E'): 4, ('G', 'F'): 4, ('G', 'G'): 0, ('G', 'H'): 4, ('G', 'I'): 4, ('G', 'K'): 4, ('G', 'L'): 4, ('G', 'M'): 4, ('G', 'N'): 4, ('G', 'P'): 4, ('G', 'Q'): 4, ('G', 'R'): 4, ('G', 'S'): 4, ('G', 'T'): 4, ('G', 'V'): 4, ('G', 'W'): 4, ('G', 'Y'): 4, ('H', 'A'): 4, ('H', 'C'): 4, ('H', 'D'): 4, ('H', 'E'): 4, ('H', 'F'): 4, ('H', 'G'): 4, ('H', 'H'): 0, ('H', 'I'): 4, ('H', 'K'): 4, ('H', 'L'): 4, ('H', 'M'): 4, ('H', 
'N'): 3, ('H', 'P'): 4, ('H', 'Q'): 4, ('H', 'R'): 4, ('H', 'S'): 4, ('H', 'T'): 4, ('H', 'V'): 4, ('H', 'W'): 4, ('H', 'Y'): 2, ('I', 'A'): 4, ('I', 'C'): 4, ('I', 'D'): 4, ('I', 'E'): 4, ('I', 'F'): 4, ('I', 'G'): 4, ('I', 'H'): 4, ('I', 'I'): 0, ('I', 'K'): 4, ('I', 'L'): 2, ('I', 'M'): 3, ('I', 'N'): 4, ('I', 'P'): 4, ('I', 'Q'): 4, ('I', 'R'): 4, ('I', 'S'): 4, ('I', 'T'): 4, ('I', 'V'): 1, ('I', 'W'): 4, ('I', 'Y'): 4, ('K', 'A'): 4, ('K', 'C'): 4, ('K', 'D'): 4, ('K', 'E'): 3, ('K', 'F'): 4, ('K', 'G'): 4, ('K', 'H'): 4, ('K', 'I'): 4, ('K', 'K'): 0, ('K', 'L'): 4, ('K', 'M'): 4, ('K', 'N'): 4, ('K', 'P'): 4, ('K', 'Q'): 3, ('K', 'R'): 2, ('K', 'S'): 4, ('K', 'T'): 4, ('K', 'V'): 4, ('K', 'W'): 4, ('K', 'Y'): 4, ('L', 'A'): 4, ('L', 'C'): 4, ('L', 'D'): 4, ('L', 'E'): 4, ('L', 'F'): 4, ('L', 'G'): 4, ('L', 'H'): 4, ('L', 'I'): 2, ('L', 'K'): 4, ('L', 'L'): 0, ('L', 'M'): 2, ('L', 'N'): 4, ('L', 'P'): 4, ('L', 'Q'): 4, ('L', 'R'): 4, ('L', 'S'): 4, ('L', 'T'): 4, ('L', 'V'): 3, ('L', 'W'): 4, ('L', 'Y'): 4, ('M', 'A'): 4, ('M', 'C'): 4, ('M', 'D'): 4, ('M', 'E'): 4, ('M', 'F'): 4, ('M', 'G'): 4, ('M', 'H'): 4, ('M', 'I'): 3, ('M', 'K'): 4, ('M', 'L'): 2, ('M', 'M'): 0, ('M', 'N'): 4, ('M', 'P'): 4, ('M', 'Q'): 4, ('M', 'R'): 4, ('M', 'S'): 4, ('M', 'T'): 4, ('M', 'V'): 3, ('M', 'W'): 4, ('M', 'Y'): 4, ('N', 'A'): 4, ('N', 'C'): 4, ('N', 'D'): 3, ('N', 'E'): 4, ('N', 'F'): 4, ('N', 'G'): 4, ('N', 'H'): 3, ('N', 'I'): 4, ('N', 'K'): 4, ('N', 'L'): 4, ('N', 'M'): 4, ('N', 'N'): 0, ('N', 'P'): 4, ('N', 'Q'): 4, ('N', 'R'): 4, ('N', 'S'): 3, ('N', 'T'): 4, ('N', 'V'): 4, ('N', 'W'): 4, ('N', 'Y'): 4, ('P', 'A'): 4, ('P', 'C'): 4, ('P', 'D'): 4, ('P', 'E'): 4, ('P', 'F'): 4, ('P', 'G'): 4, ('P', 'H'): 4, ('P', 'I'): 4, ('P', 'K'): 4, ('P', 'L'): 4, ('P', 'M'): 4, ('P', 'N'): 4, ('P', 'P'): 0, ('P', 'Q'): 4, ('P', 'R'): 4, ('P', 'S'): 4, ('P', 'T'): 4, ('P', 'V'): 4, ('P', 'W'): 4, ('P', 'Y'): 4, ('Q', 'A'): 4, ('Q', 'C'): 4, ('Q', 'D'): 4, ('Q', 'E'): 2, ('Q', 
'F'): 4, ('Q', 'G'): 4, ('Q', 'H'): 4, ('Q', 'I'): 4, ('Q', 'K'): 3, ('Q', 'L'): 4, ('Q', 'M'): 4, ('Q', 'N'): 4, ('Q', 'P'): 4, ('Q', 'Q'): 0, ('Q', 'R'): 3, ('Q', 'S'): 4, ('Q', 'T'): 4, ('Q', 'V'): 4, ('Q', 'W'): 4, ('Q', 'Y'): 4, ('R', 'A'): 4, ('R', 'C'): 4, ('R', 'D'): 4, ('R', 'E'): 4, ('R', 'F'): 4, ('R', 'G'): 4, ('R', 'H'): 4, ('R', 'I'): 4, ('R', 'K'): 2, ('R', 'L'): 4, ('R', 'M'): 4, ('R', 'N'): 4, ('R', 'P'): 4, ('R', 'Q'): 3, ('R', 'R'): 0, ('R', 'S'): 4, ('R', 'T'): 4, ('R', 'V'): 4, ('R', 'W'): 4, ('R', 'Y'): 4, ('S', 'A'): 3, ('S', 'C'): 4, ('S', 'D'): 4, ('S', 'E'): 4, ('S', 'F'): 4, ('S', 'G'): 4, ('S', 'H'): 4, ('S', 'I'): 4, ('S', 'K'): 4, ('S', 'L'): 4, ('S', 'M'): 4, ('S', 'N'): 3, ('S', 'P'): 4, ('S', 'Q'): 4, ('S', 'R'): 4, ('S', 'S'): 0, ('S', 'T'): 3, ('S', 'V'): 4, ('S', 'W'): 4, ('S', 'Y'): 4, ('T', 'A'): 4, ('T', 'C'): 4, ('T', 'D'): 4, ('T', 'E'): 4, ('T', 'F'): 4, ('T', 'G'): 4, ('T', 'H'): 4, ('T', 'I'): 4, ('T', 'K'): 4, ('T', 'L'): 4, ('T', 'M'): 4, ('T', 'N'): 4, ('T', 'P'): 4, ('T', 'Q'): 4, ('T', 'R'): 4, ('T', 'S'): 3, ('T', 'T'): 0, ('T', 'V'): 4, ('T', 'W'): 4, ('T', 'Y'): 4, ('V', 'A'): 4, ('V', 'C'): 4, ('V', 'D'): 4, ('V', 'E'): 4, ('V', 'F'): 4, ('V', 'G'): 4, ('V', 'H'): 4, ('V', 'I'): 1, ('V', 'K'): 4, ('V', 'L'): 3, ('V', 'M'): 3, ('V', 'N'): 4, ('V', 'P'): 4, ('V', 'Q'): 4, ('V', 'R'): 4, ('V', 'S'): 4, ('V', 'T'): 4, ('V', 'V'): 0, ('V', 'W'): 4, ('V', 'Y'): 4, ('W', 'A'): 4, ('W', 'C'): 4, ('W', 'D'): 4, ('W', 'E'): 4, ('W', 'F'): 3, ('W', 'G'): 4, ('W', 'H'): 4, ('W', 'I'): 4, ('W', 'K'): 4, ('W', 'L'): 4, ('W', 'M'): 4, ('W', 'N'): 4, ('W', 'P'): 4, ('W', 'Q'): 4, ('W', 'R'): 4, ('W', 'S'): 4, ('W', 'T'): 4, ('W', 'V'): 4, ('W', 'W'): 0, ('W', 'Y'): 2, ('Y', 'A'): 4, ('Y', 'C'): 4, ('Y', 'D'): 4, ('Y', 'E'): 4, ('Y', 'F'): 1, ('Y', 'G'): 4, ('Y', 'H'): 2, ('Y', 'I'): 4, ('Y', 'K'): 4, ('Y', 'L'): 4, ('Y', 'M'): 4, ('Y', 'N'): 4, ('Y', 'P'): 4, ('Y', 'Q'): 4, ('Y', 'R'): 4, ('Y', 'S'): 4, ('Y', 'T'): 4, ('Y', 
'V'): 4, ('Y', 'W'): 2, ('Y', 'Y'): 0}
blosum62_distance_matrix = {('A', 'A'): 0, ('A', 'C'): 4, ('A', 'D'): 4, ('A', 'E'): 4, ('A', 'F'): 4, ('A', 'G'): 4, ('A', 'H'): 4, ('A', 'I'): 4, ('A', 'K'): 4, ('A', 'L'): 4, ('A', 'M'): 4, ('A', 'N'): 4, ('A', 'P'): 4, ('A', 'Q'): 4, ('A', 'R'): 4, ('A', 'S'): 3, ('A', 'T'): 4, ('A', 'V'): 4, ('A', 'W'): 4, ('A', 'Y'): 4, ('C', 'A'): 4, ('C', 'C'): 0, ('C', 'D'): 4, ('C', 'E'): 4, ('C', 'F'): 4, ('C', 'G'): 4, ('C', 'H'): 4, ('C', 'I'): 4, ('C', 'K'): 4, ('C', 'L'): 4, ('C', 'M'): 4, ('C', 'N'): 4, ('C', 'P'): 4, ('C', 'Q'): 4, ('C', 'R'): 4, ('C', 'S'): 4, ('C', 'T'): 4, ('C', 'V'): 4, ('C', 'W'): 4, ('C', 'Y'): 4, ('D', 'A'): 4, ('D', 'C'): 4, ('D', 'D'): 0, ('D', 'E'): 2, ('D', 'F'): 4, ('D', 'G'): 4, ('D', 'H'): 4, ('D', 'I'): 4, ('D', 'K'): 4, ('D', 'L'): 4, ('D', 'M'): 4, ('D', 'N'): 3, ('D', 'P'): 4, ('D', 'Q'): 4, ('D', 'R'): 4, ('D', 'S'): 4, ('D', 'T'): 4, ('D', 'V'): 4, ('D', 'W'): 4, ('D', 'Y'): 4, ('E', 'A'): 4, ('E', 'C'): 4, ('E', 'D'): 2, ('E', 'E'): 0, ('E', 'F'): 4, ('E', 'G'): 4, ('E', 'H'): 4, ('E', 'I'): 4, ('E', 'K'): 3, ('E', 'L'): 4, ('E', 'M'): 4, ('E', 'N'): 4, ('E', 'P'): 4, ('E', 'Q'): 2, ('E', 'R'): 4, ('E', 'S'): 4, ('E', 'T'): 4, ('E', 'V'): 4, ('E', 'W'): 4, ('E', 'Y'): 4, ('F', 'A'): 4, ('F', 'C'): 4, ('F', 'D'): 4, ('F', 'E'): 4, ('F', 'F'): 0, ('F', 'G'): 4, ('F', 'H'): 4, ('F', 'I'): 4, ('F', 'K'): 4, ('F', 'L'): 4, ('F', 'M'): 4, ('F', 'N'): 4, ('F', 'P'): 4, ('F', 'Q'): 4, ('F', 'R'): 4, ('F', 'S'): 4, ('F', 'T'): 4, ('F', 'V'): 4, ('F', 'W'): 3, ('F', 'Y'): 1, ('G', 'A'): 4, ('G', 'C'): 4, ('G', 'D'): 4, ('G', 'E'): 4, ('G', 'F'): 4, ('G', 'G'): 0, ('G', 'H'): 4, ('G', 'I'): 4, ('G', 'K'): 4, ('G', 'L'): 4, ('G', 'M'): 4, ('G', 'N'): 4, ('G', 'P'): 4, ('G', 'Q'): 4, ('G', 'R'): 4, ('G', 'S'): 4, ('G', 'T'): 4, ('G', 'V'): 4, ('G', 'W'): 4, ('G', 'Y'): 4, ('H', 'A'): 4, ('H', 'C'): 4, ('H', 'D'): 4, ('H', 'E'): 4, ('H', 'F'): 4, ('H', 'G'): 4, ('H', 'H'): 0, ('H', 'I'): 4, ('H', 'K'): 4, ('H', 'L'): 4, ('H', 'M'): 4, ('H', 
'N'): 3, ('H', 'P'): 4, ('H', 'Q'): 4, ('H', 'R'): 4, ('H', 'S'): 4, ('H', 'T'): 4, ('H', 'V'): 4, ('H', 'W'): 4, ('H', 'Y'): 2, ('I', 'A'): 4, ('I', 'C'): 4, ('I', 'D'): 4, ('I', 'E'): 4, ('I', 'F'): 4, ('I', 'G'): 4, ('I', 'H'): 4, ('I', 'I'): 0, ('I', 'K'): 4, ('I', 'L'): 2, ('I', 'M'): 3, ('I', 'N'): 4, ('I', 'P'): 4, ('I', 'Q'): 4, ('I', 'R'): 4, ('I', 'S'): 4, ('I', 'T'): 4, ('I', 'V'): 1, ('I', 'W'): 4, ('I', 'Y'): 4, ('K', 'A'): 4, ('K', 'C'): 4, ('K', 'D'): 4, ('K', 'E'): 3, ('K', 'F'): 4, ('K', 'G'): 4, ('K', 'H'): 4, ('K', 'I'): 4, ('K', 'K'): 0, ('K', 'L'): 4, ('K', 'M'): 4, ('K', 'N'): 4, ('K', 'P'): 4, ('K', 'Q'): 3, ('K', 'R'): 2, ('K', 'S'): 4, ('K', 'T'): 4, ('K', 'V'): 4, ('K', 'W'): 4, ('K', 'Y'): 4, ('L', 'A'): 4, ('L', 'C'): 4, ('L', 'D'): 4, ('L', 'E'): 4, ('L', 'F'): 4, ('L', 'G'): 4, ('L', 'H'): 4, ('L', 'I'): 2, ('L', 'K'): 4, ('L', 'L'): 0, ('L', 'M'): 2, ('L', 'N'): 4, ('L', 'P'): 4, ('L', 'Q'): 4, ('L', 'R'): 4, ('L', 'S'): 4, ('L', 'T'): 4, ('L', 'V'): 3, ('L', 'W'): 4, ('L', 'Y'): 4, ('M', 'A'): 4, ('M', 'C'): 4, ('M', 'D'): 4, ('M', 'E'): 4, ('M', 'F'): 4, ('M', 'G'): 4, ('M', 'H'): 4, ('M', 'I'): 3, ('M', 'K'): 4, ('M', 'L'): 2, ('M', 'M'): 0, ('M', 'N'): 4, ('M', 'P'): 4, ('M', 'Q'): 4, ('M', 'R'): 4, ('M', 'S'): 4, ('M', 'T'): 4, ('M', 'V'): 3, ('M', 'W'): 4, ('M', 'Y'): 4, ('N', 'A'): 4, ('N', 'C'): 4, ('N', 'D'): 3, ('N', 'E'): 4, ('N', 'F'): 4, ('N', 'G'): 4, ('N', 'H'): 3, ('N', 'I'): 4, ('N', 'K'): 4, ('N', 'L'): 4, ('N', 'M'): 4, ('N', 'N'): 0, ('N', 'P'): 4, ('N', 'Q'): 4, ('N', 'R'): 4, ('N', 'S'): 3, ('N', 'T'): 4, ('N', 'V'): 4, ('N', 'W'): 4, ('N', 'Y'): 4, ('P', 'A'): 4, ('P', 'C'): 4, ('P', 'D'): 4, ('P', 'E'): 4, ('P', 'F'): 4, ('P', 'G'): 4, ('P', 'H'): 4, ('P', 'I'): 4, ('P', 'K'): 4, ('P', 'L'): 4, ('P', 'M'): 4, ('P', 'N'): 4, ('P', 'P'): 0, ('P', 'Q'): 4, ('P', 'R'): 4, ('P', 'S'): 4, ('P', 'T'): 4, ('P', 'V'): 4, ('P', 'W'): 4, ('P', 'Y'): 4, ('Q', 'A'): 4, ('Q', 'C'): 4, ('Q', 'D'): 4, ('Q', 'E'): 2, ('Q', 
'F'): 4, ('Q', 'G'): 4, ('Q', 'H'): 4, ('Q', 'I'): 4, ('Q', 'K'): 3, ('Q', 'L'): 4, ('Q', 'M'): 4, ('Q', 'N'): 4, ('Q', 'P'): 4, ('Q', 'Q'): 0, ('Q', 'R'): 3, ('Q', 'S'): 4, ('Q', 'T'): 4, ('Q', 'V'): 4, ('Q', 'W'): 4, ('Q', 'Y'): 4, ('R', 'A'): 4, ('R', 'C'): 4, ('R', 'D'): 4, ('R', 'E'): 4, ('R', 'F'): 4, ('R', 'G'): 4, ('R', 'H'): 4, ('R', 'I'): 4, ('R', 'K'): 2, ('R', 'L'): 4, ('R', 'M'): 4, ('R', 'N'): 4, ('R', 'P'): 4, ('R', 'Q'): 3, ('R', 'R'): 0, ('R', 'S'): 4, ('R', 'T'): 4, ('R', 'V'): 4, ('R', 'W'): 4, ('R', 'Y'): 4, ('S', 'A'): 3, ('S', 'C'): 4, ('S', 'D'): 4, ('S', 'E'): 4, ('S', 'F'): 4, ('S', 'G'): 4, ('S', 'H'): 4, ('S', 'I'): 4, ('S', 'K'): 4, ('S', 'L'): 4, ('S', 'M'): 4, ('S', 'N'): 3, ('S', 'P'): 4, ('S', 'Q'): 4, ('S', 'R'): 4, ('S', 'S'): 0, ('S', 'T'): 3, ('S', 'V'): 4, ('S', 'W'): 4, ('S', 'Y'): 4, ('T', 'A'): 4, ('T', 'C'): 4, ('T', 'D'): 4, ('T', 'E'): 4, ('T', 'F'): 4, ('T', 'G'): 4, ('T', 'H'): 4, ('T', 'I'): 4, ('T', 'K'): 4, ('T', 'L'): 4, ('T', 'M'): 4, ('T', 'N'): 4, ('T', 'P'): 4, ('T', 'Q'): 4, ('T', 'R'): 4, ('T', 'S'): 3, ('T', 'T'): 0, ('T', 'V'): 4, ('T', 'W'): 4, ('T', 'Y'): 4, ('V', 'A'): 4, ('V', 'C'): 4, ('V', 'D'): 4, ('V', 'E'): 4, ('V', 'F'): 4, ('V', 'G'): 4, ('V', 'H'): 4, ('V', 'I'): 1, ('V', 'K'): 4, ('V', 'L'): 3, ('V', 'M'): 3, ('V', 'N'): 4, ('V', 'P'): 4, ('V', 'Q'): 4, ('V', 'R'): 4, ('V', 'S'): 4, ('V', 'T'): 4, ('V', 'V'): 0, ('V', 'W'): 4, ('V', 'Y'): 4, ('W', 'A'): 4, ('W', 'C'): 4, ('W', 'D'): 4, ('W', 'E'): 4, ('W', 'F'): 3, ('W', 'G'): 4, ('W', 'H'): 4, ('W', 'I'): 4, ('W', 'K'): 4, ('W', 'L'): 4, ('W', 'M'): 4, ('W', 'N'): 4, ('W', 'P'): 4, ('W', 'Q'): 4, ('W', 'R'): 4, ('W', 'S'): 4, ('W', 'T'): 4, ('W', 'V'): 4, ('W', 'W'): 0, ('W', 'Y'): 2, ('Y', 'A'): 4, ('Y', 'C'): 4, ('Y', 'D'): 4, ('Y', 'E'): 4, ('Y', 'F'): 1, ('Y', 'G'): 4, ('Y', 'H'): 2, ('Y', 'I'): 4, ('Y', 'K'): 4, ('Y', 'L'): 4, ('Y', 'M'): 4, ('Y', 'N'): 4, ('Y', 'P'): 4, ('Y', 'Q'): 4, ('Y', 'R'): 4, ('Y', 'S'): 4, ('Y', 'T'): 4, ('Y', 
'V'): 4, ('Y', 'W'): 2, ('Y', 'Y'): 0}
tcrblosum_alpha_distance_matrix = {('A', 'A'): 0, ('A', 'R'): 4, ('A', 'N'): 4, ('A', 'D'): 4, ('A', 'C'): 4, ('A', 'Q'): 4, ('A', 'E'): 4, ('A', 'G'): 4, ('A', 'H'): 4, ('A', 'I'): 4, ('A', 'L'): 4, ('A', 'K'): 4, ('A', 'M'): 4, ('A', 'F'): 4, ('A', 'P'): 4, ('A', 'S'): 4, ('A', 'T'): 4, ('A', 'W'): 4, ('A', 'Y'): 4, ('A', 'V'): 4, ('R', 'A'): 4, ('R', 'R'): 0, ('R', 'N'): 4, ('R', 'D'): 4, ('R', 'C'): 3, ('R', 'Q'): 4, ('R', 'E'): 4, ('R', 'G'): 4, ('R', 'H'): 4, ('R', 'I'): 4, ('R', 'L'): 4, ('R', 'K'): 4, ('R', 'M'): 4, ('R', 'F'): 4, ('R', 'P'): 4, ('R', 'S'): 4, ('R', 'T'): 4, ('R', 'W'): 4, ('R', 'Y'): 4, ('R', 'V'): 4, ('N', 'A'): 4, ('N', 'R'): 4, ('N', 'N'): 0, ('N', 'D'): 4, ('N', 'C'): 4, ('N', 'Q'): 4, ('N', 'E'): 4, ('N', 'G'): 4, ('N', 'H'): 4, ('N', 'I'): 4, ('N', 'L'): 4, ('N', 'K'): 3, ('N', 'M'): 4, ('N', 'F'): 4, ('N', 'P'): 4, ('N', 'S'): 4, ('N', 'T'): 4, ('N', 'W'): 4, ('N', 'Y'): 4, ('N', 'V'): 4, ('D', 'A'): 4, ('D', 'R'): 4, ('D', 'N'): 4, ('D', 'D'): 0, ('D', 'C'): 4, ('D', 'Q'): 4, ('D', 'E'): 4, ('D', 'G'): 4, ('D', 'H'): 4, ('D', 'I'): 4, ('D', 'L'): 4, ('D', 'K'): 4, ('D', 'M'): 4, ('D', 'F'): 4, ('D', 'P'): 4, ('D', 'S'): 4, ('D', 'T'): 4, ('D', 'W'): 4, ('D', 'Y'): 4, ('D', 'V'): 4, ('C', 'A'): 4, ('C', 'R'): 3, ('C', 'N'): 4, ('C', 'D'): 4, ('C', 'C'): 0, ('C', 'Q'): 4, ('C', 'E'): 4, ('C', 'G'): 4, ('C', 'H'): 4, ('C', 'I'): 4, ('C', 'L'): 4, ('C', 'K'): 4, ('C', 'M'): 4, ('C', 'F'): 4, ('C', 'P'): 4, ('C', 'S'): 4, ('C', 'T'): 4, ('C', 'W'): 4, ('C', 'Y'): 4, ('C', 'V'): 4, ('Q', 'A'): 4, ('Q', 'R'): 4, ('Q', 'N'): 4, ('Q', 'D'): 4, ('Q', 'C'): 4, ('Q', 'Q'): 0, ('Q', 'E'): 4, ('Q', 'G'): 4, ('Q', 'H'): 4, ('Q', 'I'): 4, ('Q', 'L'): 4, ('Q', 'K'): 3, ('Q', 'M'): 4, ('Q', 'F'): 4, ('Q', 'P'): 4, ('Q', 'S'): 4, ('Q', 'T'): 4, ('Q', 'W'): 4, ('Q', 'Y'): 4, ('Q', 'V'): 4, ('E', 'A'): 4, ('E', 'R'): 4, ('E', 'N'): 4, ('E', 'D'): 4, ('E', 'C'): 4, ('E', 'Q'): 4, ('E', 'E'): 0, ('E', 'G'): 4, ('E', 'H'): 3, ('E', 'I'): 4, ('E', 'L'): 4, 
('E', 'K'): 4, ('E', 'M'): 4, ('E', 'F'): 4, ('E', 'P'): 4, ('E', 'S'): 4, ('E', 'T'): 4, ('E', 'W'): 4, ('E', 'Y'): 4, ('E', 'V'): 4, ('G', 'A'): 4, ('G', 'R'): 4, ('G', 'N'): 4, ('G', 'D'): 4, ('G', 'C'): 4, ('G', 'Q'): 4, ('G', 'E'): 4, ('G', 'G'): 0, ('G', 'H'): 4, ('G', 'I'): 4, ('G', 'L'): 4, ('G', 'K'): 4, ('G', 'M'): 4, ('G', 'F'): 4, ('G', 'P'): 4, ('G', 'S'): 4, ('G', 'T'): 4, ('G', 'W'): 4, ('G', 'Y'): 4, ('G', 'V'): 4, ('H', 'A'): 4, ('H', 'R'): 4, ('H', 'N'): 4, ('H', 'D'): 4, ('H', 'C'): 4, ('H', 'Q'): 4, ('H', 'E'): 3, ('H', 'G'): 4, ('H', 'H'): 0, ('H', 'I'): 4, ('H', 'L'): 4, ('H', 'K'): 4, ('H', 'M'): 4, ('H', 'F'): 4, ('H', 'P'): 3, ('H', 'S'): 4, ('H', 'T'): 4, ('H', 'W'): 3, ('H', 'Y'): 4, ('H', 'V'): 4, ('I', 'A'): 4, ('I', 'R'): 4, ('I', 'N'): 4, ('I', 'D'): 4, ('I', 'C'): 4, ('I', 'Q'): 4, ('I', 'E'): 4, ('I', 'G'): 4, ('I', 'H'): 4, ('I', 'I'): 0, ('I', 'L'): 4, ('I', 'K'): 4, ('I', 'M'): 4, ('I', 'F'): 4, ('I', 'P'): 4, ('I', 'S'): 4, ('I', 'T'): 3, ('I', 'W'): 4, ('I', 'Y'): 4, ('I', 'V'): 4, ('L', 'A'): 4, ('L', 'R'): 4, ('L', 'N'): 4, ('L', 'D'): 4, ('L', 'C'): 4, ('L', 'Q'): 4, ('L', 'E'): 4, ('L', 'G'): 4, ('L', 'H'): 4, ('L', 'I'): 4, ('L', 'L'): 0, ('L', 'K'): 4, ('L', 'M'): 4, ('L', 'F'): 3, ('L', 'P'): 4, ('L', 'S'): 4, ('L', 'T'): 4, ('L', 'W'): 4, ('L', 'Y'): 4, ('L', 'V'): 4, ('K', 'A'): 4, ('K', 'R'): 4, ('K', 'N'): 3, ('K', 'D'): 4, ('K', 'C'): 4, ('K', 'Q'): 3, ('K', 'E'): 4, ('K', 'G'): 4, ('K', 'H'): 4, ('K', 'I'): 4, ('K', 'L'): 4, ('K', 'K'): 0, ('K', 'M'): 4, ('K', 'F'): 4, ('K', 'P'): 4, ('K', 'S'): 4, ('K', 'T'): 4, ('K', 'W'): 4, ('K', 'Y'): 4, ('K', 'V'): 4, ('M', 'A'): 4, ('M', 'R'): 4, ('M', 'N'): 4, ('M', 'D'): 4, ('M', 'C'): 4, ('M', 'Q'): 4, ('M', 'E'): 4, ('M', 'G'): 4, ('M', 'H'): 4, ('M', 'I'): 4, ('M', 'L'): 4, ('M', 'K'): 4, ('M', 'M'): 0, ('M', 'F'): 4, ('M', 'P'): 4, ('M', 'S'): 4, ('M', 'T'): 4, ('M', 'W'): 4, ('M', 'Y'): 4, ('M', 'V'): 4, ('F', 'A'): 4, ('F', 'R'): 4, ('F', 'N'): 4, ('F', 'D'): 4, 
('F', 'C'): 4, ('F', 'Q'): 4, ('F', 'E'): 4, ('F', 'G'): 4, ('F', 'H'): 4, ('F', 'I'): 4, ('F', 'L'): 3, ('F', 'K'): 4, ('F', 'M'): 4, ('F', 'F'): 0, ('F', 'P'): 4, ('F', 'S'): 4, ('F', 'T'): 4, ('F', 'W'): 4, ('F', 'Y'): 4, ('F', 'V'): 4, ('P', 'A'): 4, ('P', 'R'): 4, ('P', 'N'): 4, ('P', 'D'): 4, ('P', 'C'): 4, ('P', 'Q'): 4, ('P', 'E'): 4, ('P', 'G'): 4, ('P', 'H'): 3, ('P', 'I'): 4, ('P', 'L'): 4, ('P', 'K'): 4, ('P', 'M'): 4, ('P', 'F'): 4, ('P', 'P'): 0, ('P', 'S'): 4, ('P', 'T'): 4, ('P', 'W'): 4, ('P', 'Y'): 4, ('P', 'V'): 4, ('S', 'A'): 4, ('S', 'R'): 4, ('S', 'N'): 4, ('S', 'D'): 4, ('S', 'C'): 4, ('S', 'Q'): 4, ('S', 'E'): 4, ('S', 'G'): 4, ('S', 'H'): 4, ('S', 'I'): 4, ('S', 'L'): 4, ('S', 'K'): 4, ('S', 'M'): 4, ('S', 'F'): 4, ('S', 'P'): 4, ('S', 'S'): 0, ('S', 'T'): 4, ('S', 'W'): 4, ('S', 'Y'): 4, ('S', 'V'): 4, ('T', 'A'): 4, ('T', 'R'): 4, ('T', 'N'): 4, ('T', 'D'): 4, ('T', 'C'): 4, ('T', 'Q'): 4, ('T', 'E'): 4, ('T', 'G'): 4, ('T', 'H'): 4, ('T', 'I'): 3, ('T', 'L'): 4, ('T', 'K'): 4, ('T', 'M'): 4, ('T', 'F'): 4, ('T', 'P'): 4, ('T', 'S'): 4, ('T', 'T'): 0, ('T', 'W'): 4, ('T', 'Y'): 4, ('T', 'V'): 4, ('W', 'A'): 4, ('W', 'R'): 4, ('W', 'N'): 4, ('W', 'D'): 4, ('W', 'C'): 4, ('W', 'Q'): 4, ('W', 'E'): 4, ('W', 'G'): 4, ('W', 'H'): 3, ('W', 'I'): 4, ('W', 'L'): 4, ('W', 'K'): 4, ('W', 'M'): 4, ('W', 'F'): 4, ('W', 'P'): 4, ('W', 'S'): 4, ('W', 'T'): 4, ('W', 'W'): 0, ('W', 'Y'): 4, ('W', 'V'): 4, ('Y', 'A'): 4, ('Y', 'R'): 4, ('Y', 'N'): 4, ('Y', 'D'): 4, ('Y', 'C'): 4, ('Y', 'Q'): 4, ('Y', 'E'): 4, ('Y', 'G'): 4, ('Y', 'H'): 4, ('Y', 'I'): 4, ('Y', 'L'): 4, ('Y', 'K'): 4, ('Y', 'M'): 4, ('Y', 'F'): 4, ('Y', 'P'): 4, ('Y', 'S'): 4, ('Y', 'T'): 4, ('Y', 'W'): 4, ('Y', 'Y'): 0, ('Y', 'V'): 4, ('V', 'A'): 4, ('V', 'R'): 4, ('V', 'N'): 4, ('V', 'D'): 4, ('V', 'C'): 4, ('V', 'Q'): 4, ('V', 'E'): 4, ('V', 'G'): 4, ('V', 'H'): 4, ('V', 'I'): 4, ('V', 'L'): 4, ('V', 'K'): 4, ('V', 'M'): 4, ('V', 'F'): 4, ('V', 'P'): 4, ('V', 'S'): 4, ('V', 'T'): 4, 
('V', 'W'): 4, ('V', 'Y'): 4, ('V', 'V'): 0}
Collaborator


Did you make any modifications to the original TCRblosum matrix? e.g. the cap at 4?

Collaborator Author

@felixpetschko felixpetschko Apr 7, 2026


First of all, a conversion from the substitution matrix (blosum62, tcrblosum) to a distance matrix is always required.
In the tcrblosum paper, they just mention "Firstly, we transformed the tcrBLOSUM similarity matrix into a distance matrix according to the rules of TCRdist [12]".

In the original TCRdist paper they write:
"The mismatch distance is defined based on the BLOSUM62 (ref. 37) substitution matrix as follows: distance (a, a)=0; distance (a, b)=min (4, 4-BLOSUM62 (a, b)), where 4 is 1 unit greater than the most favourable BLOSUM62 score for a mismatch, and a and b are amino acids".

Now there would be two options:

  1. also use the constant cap 4 like in the original tcrdist paper
  2. use a cap that is one unit greater than the most favourable score

However, the most favourable mismatch score would be 1 for tcrblosum_alpha and 2 for tcrblosum_beta, which would result in quite a low cap (see the matrices in the notebook). Therefore I went with the fixed cap of 4, i.e. the transformation formula from substitution matrix to distance matrix used by the original tcrdist implementation:
distance(a, a) = 0 and distance(a, b) = min(4, 4 - score)
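As a minimal sketch of that transformation: the function name `substitution_to_distance` and the toy scores below are made up for illustration. They are not the real tcrblosum values and not the code in this PR.

```python
def substitution_to_distance(scores, cap=4):
    """Convert substitution scores {(a, b): score} into TCRdist-style distances.

    Identical residues get distance 0; mismatches get min(cap, cap - score).
    """
    distances = {}
    for (a, b), score in scores.items():
        if a == b:
            distances[(a, b)] = 0
        else:
            distances[(a, b)] = min(cap, cap - score)
    return distances


# Toy scores for illustration only (assumed values, not a real matrix).
toy_scores = {
    ("A", "A"): 5, ("S", "S"): 4, ("C", "C"): 9,
    ("A", "S"): 1, ("S", "A"): 1,    # mildly favourable mismatch
    ("S", "C"): 0, ("C", "S"): 0,    # neutral mismatch
    ("A", "C"): -3, ("C", "A"): -3,  # strongly unfavourable mismatch
}

toy_distances = substitution_to_distance(toy_scores)
print(toy_distances[("A", "S")], toy_distances[("S", "C")], toy_distances[("A", "C")])
```

With the fixed cap, only mismatches with a positive score end up below 4; the neutral and the unfavourable mismatch both land on the cap.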

Unfortunately, I wasn't able to find out exactly how they did it in the tcrblosum paper. Another thing to consider is that the distance values for the alpha and beta chains might be compared later during clonotype clustering, so having two different caps might cause problems.
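To make the concern about two different caps concrete, here is a small sketch of option 2, read as cap = 1 + most favourable mismatch score. The helpers `derived_cap` and `to_distance` and the toy score dictionaries are hypothetical; only the maxima (1 for alpha, 2 for beta) come from the discussion above.

```python
def derived_cap(scores):
    """Cap that is one unit above the most favourable mismatch score."""
    return 1 + max(s for (a, b), s in scores.items() if a != b)


def to_distance(scores, cap):
    """distance(a, a) = 0, distance(a, b) = min(cap, cap - score)."""
    return {
        (a, b): 0 if a == b else min(cap, cap - s)
        for (a, b), s in scores.items()
    }


# Toy scores (assumed): best mismatch is 1 for "alpha" and 2 for "beta".
toy_alpha = {("A", "A"): 4, ("A", "S"): 1, ("A", "C"): -3}
toy_beta = {("A", "A"): 4, ("A", "S"): 2, ("A", "C"): -3}

cap_alpha = derived_cap(toy_alpha)
cap_beta = derived_cap(toy_beta)
dist_alpha = to_distance(toy_alpha, cap_alpha)
dist_beta = to_distance(toy_beta, cap_beta)

# The same unfavourable mismatch gets a different maximum distance per chain.
print(cap_alpha, dist_alpha[("A", "C")])
print(cap_beta, dist_beta[("A", "C")])
```

Under this reading, the worst possible mismatch costs 2 on the alpha chain but 3 on the beta chain, so per-chain distances would no longer share a scale.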

What do you think would be the best option?

Collaborator


Dear @apostovskaya,

we are working on integrating your tcrBLOSUM matrix into scirpy, which contains a reimplementation of the TCRdist algorithm.

We are unsure how you intended the matrix to be used with TCRdist. Could you please clarify what you mean by

Firstly, we transformed the tcrBLOSUM similarity matrix into a distance matrix according to the rules of TCRdist [12]

in your paper and how you would suggest setting the cap?

Collaborator


"The mismatch distance is defined based on the BLOSUM62 (ref. 37) substitution matrix as follows: distance (a, a)=0; distance (a, b)=min (4, 4-BLOSUM62 (a, b)), where 4 is 1 unit greater than the most favourable BLOSUM62 score for a mismatch, and a and b are amino acids".

tbh, I never understood this part about TCRdist. It skews the matrix quite a bit, basically assigning a distance of 4 to all mismatches that have a negative score in BLOSUM62 (even if it's just -1). So the only way to get a distance $1 \leq d \leq 3$ is if you have one of the rare pairs with a positive mismatch score according to BLOSUM62.
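A quick sketch of that collapse, applying the quoted formula to every integer score from -4 to 3 (the range is chosen for illustration; 3 is the most favourable mismatch score according to the quote):

```python
# Apply distance = min(4, 4 - score) to a range of mismatch scores.
mismatch_scores = list(range(-4, 4))  # -4, -3, ..., 3
distance_for_score = {s: min(4, 4 - s) for s in mismatch_scores}

for s, d in distance_for_score.items():
    print(f"score {s:+d} -> distance {d}")

# Scores that all end up at the cap and are therefore indistinguishable.
collapsed = [s for s, d in distance_for_score.items() if d == 4]
print("scores mapped to 4:", collapsed)
```

Every score from -4 up to 0 maps to the same distance of 4, and only the scores 1, 2 and 3 produce the distances 3, 2 and 1.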

Having a cap of 2 with TCRblosum doesn't make sense to me, because then the strong negative effect of mismatches with C would be completely gone which defeats the purpose of TCRblosum. Using 4 is just as arbitrary.

