Add tcrblosum support to TCRdist #685
Codecov Report
❌ Patch coverage is
Additional details and impacted files
@@            Coverage Diff             @@
##             main     #685      +/-   ##
==========================================
- Coverage   19.31%   19.20%   -0.12%
==========================================
  Files          51       51
  Lines        4633     4645      +12
==========================================
- Hits          895      892       -3
- Misses       3738     3753      +15
grst left a comment:
Implementation-wise this looks great!
What's still missing:
- Changelog update
- Documentation update of the user-facing method (pp.ir_dist). Probably best to add a new metric tcrblosum or tcrdist_tcrblosum.
- Reference to the TCRblosum paper in the documentation
- Maybe a tutorial update?
# fmt: off
tcr_dict_distance_matrix = {('A', 'A'): 0, ('A', 'C'): 4, ('A', 'D'): 4, ('A', 'E'): 4, ('A', 'F'): 4, ('A', 'G'): 4, ('A', 'H'): 4, ('A', 'I'): 4, ('A', 'K'): 4, ('A', 'L'): 4, ('A', 'M'): 4, ('A', 'N'): 4, ('A', 'P'): 4, ('A', 'Q'): 4, ('A', 'R'): 4, ('A', 'S'): 3, ('A', 'T'): 4, ('A', 'V'): 4, ('A', 'W'): 4, ('A', 'Y'): 4, ('C', 'A'): 4, ('C', 'C'): 0, ('C', 'D'): 4, ('C', 'E'): 4, ('C', 'F'): 4, ('C', 'G'): 4, ('C', 'H'): 4, ('C', 'I'): 4, ('C', 'K'): 4, ('C', 'L'): 4, ('C', 'M'): 4, ('C', 'N'): 4, ('C', 'P'): 4, ('C', 'Q'): 4, ('C', 'R'): 4, ('C', 'S'): 4, ('C', 'T'): 4, ('C', 'V'): 4, ('C', 'W'): 4, ('C', 'Y'): 4, ('D', 'A'): 4, ('D', 'C'): 4, ('D', 'D'): 0, ('D', 'E'): 2, ('D', 'F'): 4, ('D', 'G'): 4, ('D', 'H'): 4, ('D', 'I'): 4, ('D', 'K'): 4, ('D', 'L'): 4, ('D', 'M'): 4, ('D', 'N'): 3, ('D', 'P'): 4, ('D', 'Q'): 4, ('D', 'R'): 4, ('D', 'S'): 4, ('D', 'T'): 4, ('D', 'V'): 4, ('D', 'W'): 4, ('D', 'Y'): 4, ('E', 'A'): 4, ('E', 'C'): 4, ('E', 'D'): 2, ('E', 'E'): 0, ('E', 'F'): 4, ('E', 'G'): 4, ('E', 'H'): 4, ('E', 'I'): 4, ('E', 'K'): 3, ('E', 'L'): 4, ('E', 'M'): 4, ('E', 'N'): 4, ('E', 'P'): 4, ('E', 'Q'): 2, ('E', 'R'): 4, ('E', 'S'): 4, ('E', 'T'): 4, ('E', 'V'): 4, ('E', 'W'): 4, ('E', 'Y'): 4, ('F', 'A'): 4, ('F', 'C'): 4, ('F', 'D'): 4, ('F', 'E'): 4, ('F', 'F'): 0, ('F', 'G'): 4, ('F', 'H'): 4, ('F', 'I'): 4, ('F', 'K'): 4, ('F', 'L'): 4, ('F', 'M'): 4, ('F', 'N'): 4, ('F', 'P'): 4, ('F', 'Q'): 4, ('F', 'R'): 4, ('F', 'S'): 4, ('F', 'T'): 4, ('F', 'V'): 4, ('F', 'W'): 3, ('F', 'Y'): 1, ('G', 'A'): 4, ('G', 'C'): 4, ('G', 'D'): 4, ('G', 'E'): 4, ('G', 'F'): 4, ('G', 'G'): 0, ('G', 'H'): 4, ('G', 'I'): 4, ('G', 'K'): 4, ('G', 'L'): 4, ('G', 'M'): 4, ('G', 'N'): 4, ('G', 'P'): 4, ('G', 'Q'): 4, ('G', 'R'): 4, ('G', 'S'): 4, ('G', 'T'): 4, ('G', 'V'): 4, ('G', 'W'): 4, ('G', 'Y'): 4, ('H', 'A'): 4, ('H', 'C'): 4, ('H', 'D'): 4, ('H', 'E'): 4, ('H', 'F'): 4, ('H', 'G'): 4, ('H', 'H'): 0, ('H', 'I'): 4, ('H', 'K'): 4, ('H', 'L'): 4, ('H', 'M'): 4,
('H', 'N'): 3, ('H', 'P'): 4, ('H', 'Q'): 4, ('H', 'R'): 4, ('H', 'S'): 4, ('H', 'T'): 4, ('H', 'V'): 4, ('H', 'W'): 4, ('H', 'Y'): 2, ('I', 'A'): 4, ('I', 'C'): 4, ('I', 'D'): 4, ('I', 'E'): 4, ('I', 'F'): 4, ('I', 'G'): 4, ('I', 'H'): 4, ('I', 'I'): 0, ('I', 'K'): 4, ('I', 'L'): 2, ('I', 'M'): 3, ('I', 'N'): 4, ('I', 'P'): 4, ('I', 'Q'): 4, ('I', 'R'): 4, ('I', 'S'): 4, ('I', 'T'): 4, ('I', 'V'): 1, ('I', 'W'): 4, ('I', 'Y'): 4, ('K', 'A'): 4, ('K', 'C'): 4, ('K', 'D'): 4, ('K', 'E'): 3, ('K', 'F'): 4, ('K', 'G'): 4, ('K', 'H'): 4, ('K', 'I'): 4, ('K', 'K'): 0, ('K', 'L'): 4, ('K', 'M'): 4, ('K', 'N'): 4, ('K', 'P'): 4, ('K', 'Q'): 3, ('K', 'R'): 2, ('K', 'S'): 4, ('K', 'T'): 4, ('K', 'V'): 4, ('K', 'W'): 4, ('K', 'Y'): 4, ('L', 'A'): 4, ('L', 'C'): 4, ('L', 'D'): 4, ('L', 'E'): 4, ('L', 'F'): 4, ('L', 'G'): 4, ('L', 'H'): 4, ('L', 'I'): 2, ('L', 'K'): 4, ('L', 'L'): 0, ('L', 'M'): 2, ('L', 'N'): 4, ('L', 'P'): 4, ('L', 'Q'): 4, ('L', 'R'): 4, ('L', 'S'): 4, ('L', 'T'): 4, ('L', 'V'): 3, ('L', 'W'): 4, ('L', 'Y'): 4, ('M', 'A'): 4, ('M', 'C'): 4, ('M', 'D'): 4, ('M', 'E'): 4, ('M', 'F'): 4, ('M', 'G'): 4, ('M', 'H'): 4, ('M', 'I'): 3, ('M', 'K'): 4, ('M', 'L'): 2, ('M', 'M'): 0, ('M', 'N'): 4, ('M', 'P'): 4, ('M', 'Q'): 4, ('M', 'R'): 4, ('M', 'S'): 4, ('M', 'T'): 4, ('M', 'V'): 3, ('M', 'W'): 4, ('M', 'Y'): 4, ('N', 'A'): 4, ('N', 'C'): 4, ('N', 'D'): 3, ('N', 'E'): 4, ('N', 'F'): 4, ('N', 'G'): 4, ('N', 'H'): 3, ('N', 'I'): 4, ('N', 'K'): 4, ('N', 'L'): 4, ('N', 'M'): 4, ('N', 'N'): 0, ('N', 'P'): 4, ('N', 'Q'): 4, ('N', 'R'): 4, ('N', 'S'): 3, ('N', 'T'): 4, ('N', 'V'): 4, ('N', 'W'): 4, ('N', 'Y'): 4, ('P', 'A'): 4, ('P', 'C'): 4, ('P', 'D'): 4, ('P', 'E'): 4, ('P', 'F'): 4, ('P', 'G'): 4, ('P', 'H'): 4, ('P', 'I'): 4, ('P', 'K'): 4, ('P', 'L'): 4, ('P', 'M'): 4, ('P', 'N'): 4, ('P', 'P'): 0, ('P', 'Q'): 4, ('P', 'R'): 4, ('P', 'S'): 4, ('P', 'T'): 4, ('P', 'V'): 4, ('P', 'W'): 4, ('P', 'Y'): 4, ('Q', 'A'): 4, ('Q', 'C'): 4, ('Q', 'D'): 4, ('Q', 'E'): 2, 
('Q', 'F'): 4, ('Q', 'G'): 4, ('Q', 'H'): 4, ('Q', 'I'): 4, ('Q', 'K'): 3, ('Q', 'L'): 4, ('Q', 'M'): 4, ('Q', 'N'): 4, ('Q', 'P'): 4, ('Q', 'Q'): 0, ('Q', 'R'): 3, ('Q', 'S'): 4, ('Q', 'T'): 4, ('Q', 'V'): 4, ('Q', 'W'): 4, ('Q', 'Y'): 4, ('R', 'A'): 4, ('R', 'C'): 4, ('R', 'D'): 4, ('R', 'E'): 4, ('R', 'F'): 4, ('R', 'G'): 4, ('R', 'H'): 4, ('R', 'I'): 4, ('R', 'K'): 2, ('R', 'L'): 4, ('R', 'M'): 4, ('R', 'N'): 4, ('R', 'P'): 4, ('R', 'Q'): 3, ('R', 'R'): 0, ('R', 'S'): 4, ('R', 'T'): 4, ('R', 'V'): 4, ('R', 'W'): 4, ('R', 'Y'): 4, ('S', 'A'): 3, ('S', 'C'): 4, ('S', 'D'): 4, ('S', 'E'): 4, ('S', 'F'): 4, ('S', 'G'): 4, ('S', 'H'): 4, ('S', 'I'): 4, ('S', 'K'): 4, ('S', 'L'): 4, ('S', 'M'): 4, ('S', 'N'): 3, ('S', 'P'): 4, ('S', 'Q'): 4, ('S', 'R'): 4, ('S', 'S'): 0, ('S', 'T'): 3, ('S', 'V'): 4, ('S', 'W'): 4, ('S', 'Y'): 4, ('T', 'A'): 4, ('T', 'C'): 4, ('T', 'D'): 4, ('T', 'E'): 4, ('T', 'F'): 4, ('T', 'G'): 4, ('T', 'H'): 4, ('T', 'I'): 4, ('T', 'K'): 4, ('T', 'L'): 4, ('T', 'M'): 4, ('T', 'N'): 4, ('T', 'P'): 4, ('T', 'Q'): 4, ('T', 'R'): 4, ('T', 'S'): 3, ('T', 'T'): 0, ('T', 'V'): 4, ('T', 'W'): 4, ('T', 'Y'): 4, ('V', 'A'): 4, ('V', 'C'): 4, ('V', 'D'): 4, ('V', 'E'): 4, ('V', 'F'): 4, ('V', 'G'): 4, ('V', 'H'): 4, ('V', 'I'): 1, ('V', 'K'): 4, ('V', 'L'): 3, ('V', 'M'): 3, ('V', 'N'): 4, ('V', 'P'): 4, ('V', 'Q'): 4, ('V', 'R'): 4, ('V', 'S'): 4, ('V', 'T'): 4, ('V', 'V'): 0, ('V', 'W'): 4, ('V', 'Y'): 4, ('W', 'A'): 4, ('W', 'C'): 4, ('W', 'D'): 4, ('W', 'E'): 4, ('W', 'F'): 3, ('W', 'G'): 4, ('W', 'H'): 4, ('W', 'I'): 4, ('W', 'K'): 4, ('W', 'L'): 4, ('W', 'M'): 4, ('W', 'N'): 4, ('W', 'P'): 4, ('W', 'Q'): 4, ('W', 'R'): 4, ('W', 'S'): 4, ('W', 'T'): 4, ('W', 'V'): 4, ('W', 'W'): 0, ('W', 'Y'): 2, ('Y', 'A'): 4, ('Y', 'C'): 4, ('Y', 'D'): 4, ('Y', 'E'): 4, ('Y', 'F'): 1, ('Y', 'G'): 4, ('Y', 'H'): 2, ('Y', 'I'): 4, ('Y', 'K'): 4, ('Y', 'L'): 4, ('Y', 'M'): 4, ('Y', 'N'): 4, ('Y', 'P'): 4, ('Y', 'Q'): 4, ('Y', 'R'): 4, ('Y', 'S'): 4, ('Y', 'T'): 4, 
('Y', 'V'): 4, ('Y', 'W'): 2, ('Y', 'Y'): 0}
blosum62_distance_matrix = {('A', 'A'): 0, ('A', 'C'): 4, ('A', 'D'): 4, ('A', 'E'): 4, ('A', 'F'): 4, ('A', 'G'): 4, ('A', 'H'): 4, ('A', 'I'): 4, ('A', 'K'): 4, ('A', 'L'): 4, ('A', 'M'): 4, ('A', 'N'): 4, ('A', 'P'): 4, ('A', 'Q'): 4, ('A', 'R'): 4, ('A', 'S'): 3, ('A', 'T'): 4, ('A', 'V'): 4, ('A', 'W'): 4, ('A', 'Y'): 4, ('C', 'A'): 4, ('C', 'C'): 0, ('C', 'D'): 4, ('C', 'E'): 4, ('C', 'F'): 4, ('C', 'G'): 4, ('C', 'H'): 4, ('C', 'I'): 4, ('C', 'K'): 4, ('C', 'L'): 4, ('C', 'M'): 4, ('C', 'N'): 4, ('C', 'P'): 4, ('C', 'Q'): 4, ('C', 'R'): 4, ('C', 'S'): 4, ('C', 'T'): 4, ('C', 'V'): 4, ('C', 'W'): 4, ('C', 'Y'): 4, ('D', 'A'): 4, ('D', 'C'): 4, ('D', 'D'): 0, ('D', 'E'): 2, ('D', 'F'): 4, ('D', 'G'): 4, ('D', 'H'): 4, ('D', 'I'): 4, ('D', 'K'): 4, ('D', 'L'): 4, ('D', 'M'): 4, ('D', 'N'): 3, ('D', 'P'): 4, ('D', 'Q'): 4, ('D', 'R'): 4, ('D', 'S'): 4, ('D', 'T'): 4, ('D', 'V'): 4, ('D', 'W'): 4, ('D', 'Y'): 4, ('E', 'A'): 4, ('E', 'C'): 4, ('E', 'D'): 2, ('E', 'E'): 0, ('E', 'F'): 4, ('E', 'G'): 4, ('E', 'H'): 4, ('E', 'I'): 4, ('E', 'K'): 3, ('E', 'L'): 4, ('E', 'M'): 4, ('E', 'N'): 4, ('E', 'P'): 4, ('E', 'Q'): 2, ('E', 'R'): 4, ('E', 'S'): 4, ('E', 'T'): 4, ('E', 'V'): 4, ('E', 'W'): 4, ('E', 'Y'): 4, ('F', 'A'): 4, ('F', 'C'): 4, ('F', 'D'): 4, ('F', 'E'): 4, ('F', 'F'): 0, ('F', 'G'): 4, ('F', 'H'): 4, ('F', 'I'): 4, ('F', 'K'): 4, ('F', 'L'): 4, ('F', 'M'): 4, ('F', 'N'): 4, ('F', 'P'): 4, ('F', 'Q'): 4, ('F', 'R'): 4, ('F', 'S'): 4, ('F', 'T'): 4, ('F', 'V'): 4, ('F', 'W'): 3, ('F', 'Y'): 1, ('G', 'A'): 4, ('G', 'C'): 4, ('G', 'D'): 4, ('G', 'E'): 4, ('G', 'F'): 4, ('G', 'G'): 0, ('G', 'H'): 4, ('G', 'I'): 4, ('G', 'K'): 4, ('G', 'L'): 4, ('G', 'M'): 4, ('G', 'N'): 4, ('G', 'P'): 4, ('G', 'Q'): 4, ('G', 'R'): 4, ('G', 'S'): 4, ('G', 'T'): 4, ('G', 'V'): 4, ('G', 'W'): 4, ('G', 'Y'): 4, ('H', 'A'): 4, ('H', 'C'): 4, ('H', 'D'): 4, ('H', 'E'): 4, ('H', 'F'): 4, ('H', 'G'): 4, ('H', 'H'): 0, ('H', 'I'): 4, ('H', 'K'): 4, ('H', 'L'): 4, ('H', 'M'): 4,
('H', 'N'): 3, ('H', 'P'): 4, ('H', 'Q'): 4, ('H', 'R'): 4, ('H', 'S'): 4, ('H', 'T'): 4, ('H', 'V'): 4, ('H', 'W'): 4, ('H', 'Y'): 2, ('I', 'A'): 4, ('I', 'C'): 4, ('I', 'D'): 4, ('I', 'E'): 4, ('I', 'F'): 4, ('I', 'G'): 4, ('I', 'H'): 4, ('I', 'I'): 0, ('I', 'K'): 4, ('I', 'L'): 2, ('I', 'M'): 3, ('I', 'N'): 4, ('I', 'P'): 4, ('I', 'Q'): 4, ('I', 'R'): 4, ('I', 'S'): 4, ('I', 'T'): 4, ('I', 'V'): 1, ('I', 'W'): 4, ('I', 'Y'): 4, ('K', 'A'): 4, ('K', 'C'): 4, ('K', 'D'): 4, ('K', 'E'): 3, ('K', 'F'): 4, ('K', 'G'): 4, ('K', 'H'): 4, ('K', 'I'): 4, ('K', 'K'): 0, ('K', 'L'): 4, ('K', 'M'): 4, ('K', 'N'): 4, ('K', 'P'): 4, ('K', 'Q'): 3, ('K', 'R'): 2, ('K', 'S'): 4, ('K', 'T'): 4, ('K', 'V'): 4, ('K', 'W'): 4, ('K', 'Y'): 4, ('L', 'A'): 4, ('L', 'C'): 4, ('L', 'D'): 4, ('L', 'E'): 4, ('L', 'F'): 4, ('L', 'G'): 4, ('L', 'H'): 4, ('L', 'I'): 2, ('L', 'K'): 4, ('L', 'L'): 0, ('L', 'M'): 2, ('L', 'N'): 4, ('L', 'P'): 4, ('L', 'Q'): 4, ('L', 'R'): 4, ('L', 'S'): 4, ('L', 'T'): 4, ('L', 'V'): 3, ('L', 'W'): 4, ('L', 'Y'): 4, ('M', 'A'): 4, ('M', 'C'): 4, ('M', 'D'): 4, ('M', 'E'): 4, ('M', 'F'): 4, ('M', 'G'): 4, ('M', 'H'): 4, ('M', 'I'): 3, ('M', 'K'): 4, ('M', 'L'): 2, ('M', 'M'): 0, ('M', 'N'): 4, ('M', 'P'): 4, ('M', 'Q'): 4, ('M', 'R'): 4, ('M', 'S'): 4, ('M', 'T'): 4, ('M', 'V'): 3, ('M', 'W'): 4, ('M', 'Y'): 4, ('N', 'A'): 4, ('N', 'C'): 4, ('N', 'D'): 3, ('N', 'E'): 4, ('N', 'F'): 4, ('N', 'G'): 4, ('N', 'H'): 3, ('N', 'I'): 4, ('N', 'K'): 4, ('N', 'L'): 4, ('N', 'M'): 4, ('N', 'N'): 0, ('N', 'P'): 4, ('N', 'Q'): 4, ('N', 'R'): 4, ('N', 'S'): 3, ('N', 'T'): 4, ('N', 'V'): 4, ('N', 'W'): 4, ('N', 'Y'): 4, ('P', 'A'): 4, ('P', 'C'): 4, ('P', 'D'): 4, ('P', 'E'): 4, ('P', 'F'): 4, ('P', 'G'): 4, ('P', 'H'): 4, ('P', 'I'): 4, ('P', 'K'): 4, ('P', 'L'): 4, ('P', 'M'): 4, ('P', 'N'): 4, ('P', 'P'): 0, ('P', 'Q'): 4, ('P', 'R'): 4, ('P', 'S'): 4, ('P', 'T'): 4, ('P', 'V'): 4, ('P', 'W'): 4, ('P', 'Y'): 4, ('Q', 'A'): 4, ('Q', 'C'): 4, ('Q', 'D'): 4, ('Q', 'E'): 2, 
('Q', 'F'): 4, ('Q', 'G'): 4, ('Q', 'H'): 4, ('Q', 'I'): 4, ('Q', 'K'): 3, ('Q', 'L'): 4, ('Q', 'M'): 4, ('Q', 'N'): 4, ('Q', 'P'): 4, ('Q', 'Q'): 0, ('Q', 'R'): 3, ('Q', 'S'): 4, ('Q', 'T'): 4, ('Q', 'V'): 4, ('Q', 'W'): 4, ('Q', 'Y'): 4, ('R', 'A'): 4, ('R', 'C'): 4, ('R', 'D'): 4, ('R', 'E'): 4, ('R', 'F'): 4, ('R', 'G'): 4, ('R', 'H'): 4, ('R', 'I'): 4, ('R', 'K'): 2, ('R', 'L'): 4, ('R', 'M'): 4, ('R', 'N'): 4, ('R', 'P'): 4, ('R', 'Q'): 3, ('R', 'R'): 0, ('R', 'S'): 4, ('R', 'T'): 4, ('R', 'V'): 4, ('R', 'W'): 4, ('R', 'Y'): 4, ('S', 'A'): 3, ('S', 'C'): 4, ('S', 'D'): 4, ('S', 'E'): 4, ('S', 'F'): 4, ('S', 'G'): 4, ('S', 'H'): 4, ('S', 'I'): 4, ('S', 'K'): 4, ('S', 'L'): 4, ('S', 'M'): 4, ('S', 'N'): 3, ('S', 'P'): 4, ('S', 'Q'): 4, ('S', 'R'): 4, ('S', 'S'): 0, ('S', 'T'): 3, ('S', 'V'): 4, ('S', 'W'): 4, ('S', 'Y'): 4, ('T', 'A'): 4, ('T', 'C'): 4, ('T', 'D'): 4, ('T', 'E'): 4, ('T', 'F'): 4, ('T', 'G'): 4, ('T', 'H'): 4, ('T', 'I'): 4, ('T', 'K'): 4, ('T', 'L'): 4, ('T', 'M'): 4, ('T', 'N'): 4, ('T', 'P'): 4, ('T', 'Q'): 4, ('T', 'R'): 4, ('T', 'S'): 3, ('T', 'T'): 0, ('T', 'V'): 4, ('T', 'W'): 4, ('T', 'Y'): 4, ('V', 'A'): 4, ('V', 'C'): 4, ('V', 'D'): 4, ('V', 'E'): 4, ('V', 'F'): 4, ('V', 'G'): 4, ('V', 'H'): 4, ('V', 'I'): 1, ('V', 'K'): 4, ('V', 'L'): 3, ('V', 'M'): 3, ('V', 'N'): 4, ('V', 'P'): 4, ('V', 'Q'): 4, ('V', 'R'): 4, ('V', 'S'): 4, ('V', 'T'): 4, ('V', 'V'): 0, ('V', 'W'): 4, ('V', 'Y'): 4, ('W', 'A'): 4, ('W', 'C'): 4, ('W', 'D'): 4, ('W', 'E'): 4, ('W', 'F'): 3, ('W', 'G'): 4, ('W', 'H'): 4, ('W', 'I'): 4, ('W', 'K'): 4, ('W', 'L'): 4, ('W', 'M'): 4, ('W', 'N'): 4, ('W', 'P'): 4, ('W', 'Q'): 4, ('W', 'R'): 4, ('W', 'S'): 4, ('W', 'T'): 4, ('W', 'V'): 4, ('W', 'W'): 0, ('W', 'Y'): 2, ('Y', 'A'): 4, ('Y', 'C'): 4, ('Y', 'D'): 4, ('Y', 'E'): 4, ('Y', 'F'): 1, ('Y', 'G'): 4, ('Y', 'H'): 2, ('Y', 'I'): 4, ('Y', 'K'): 4, ('Y', 'L'): 4, ('Y', 'M'): 4, ('Y', 'N'): 4, ('Y', 'P'): 4, ('Y', 'Q'): 4, ('Y', 'R'): 4, ('Y', 'S'): 4, ('Y', 'T'): 4, 
('Y', 'V'): 4, ('Y', 'W'): 2, ('Y', 'Y'): 0}
tcrblosum_alpha_distance_matrix = {('A', 'A'): 0, ('A', 'R'): 4, ('A', 'N'): 4, ('A', 'D'): 4, ('A', 'C'): 4, ('A', 'Q'): 4, ('A', 'E'): 4, ('A', 'G'): 4, ('A', 'H'): 4, ('A', 'I'): 4, ('A', 'L'): 4, ('A', 'K'): 4, ('A', 'M'): 4, ('A', 'F'): 4, ('A', 'P'): 4, ('A', 'S'): 4, ('A', 'T'): 4, ('A', 'W'): 4, ('A', 'Y'): 4, ('A', 'V'): 4, ('R', 'A'): 4, ('R', 'R'): 0, ('R', 'N'): 4, ('R', 'D'): 4, ('R', 'C'): 3, ('R', 'Q'): 4, ('R', 'E'): 4, ('R', 'G'): 4, ('R', 'H'): 4, ('R', 'I'): 4, ('R', 'L'): 4, ('R', 'K'): 4, ('R', 'M'): 4, ('R', 'F'): 4, ('R', 'P'): 4, ('R', 'S'): 4, ('R', 'T'): 4, ('R', 'W'): 4, ('R', 'Y'): 4, ('R', 'V'): 4, ('N', 'A'): 4, ('N', 'R'): 4, ('N', 'N'): 0, ('N', 'D'): 4, ('N', 'C'): 4, ('N', 'Q'): 4, ('N', 'E'): 4, ('N', 'G'): 4, ('N', 'H'): 4, ('N', 'I'): 4, ('N', 'L'): 4, ('N', 'K'): 3, ('N', 'M'): 4, ('N', 'F'): 4, ('N', 'P'): 4, ('N', 'S'): 4, ('N', 'T'): 4, ('N', 'W'): 4, ('N', 'Y'): 4, ('N', 'V'): 4, ('D', 'A'): 4, ('D', 'R'): 4, ('D', 'N'): 4, ('D', 'D'): 0, ('D', 'C'): 4, ('D', 'Q'): 4, ('D', 'E'): 4, ('D', 'G'): 4, ('D', 'H'): 4, ('D', 'I'): 4, ('D', 'L'): 4, ('D', 'K'): 4, ('D', 'M'): 4, ('D', 'F'): 4, ('D', 'P'): 4, ('D', 'S'): 4, ('D', 'T'): 4, ('D', 'W'): 4, ('D', 'Y'): 4, ('D', 'V'): 4, ('C', 'A'): 4, ('C', 'R'): 3, ('C', 'N'): 4, ('C', 'D'): 4, ('C', 'C'): 0, ('C', 'Q'): 4, ('C', 'E'): 4, ('C', 'G'): 4, ('C', 'H'): 4, ('C', 'I'): 4, ('C', 'L'): 4, ('C', 'K'): 4, ('C', 'M'): 4, ('C', 'F'): 4, ('C', 'P'): 4, ('C', 'S'): 4, ('C', 'T'): 4, ('C', 'W'): 4, ('C', 'Y'): 4, ('C', 'V'): 4, ('Q', 'A'): 4, ('Q', 'R'): 4, ('Q', 'N'): 4, ('Q', 'D'): 4, ('Q', 'C'): 4, ('Q', 'Q'): 0, ('Q', 'E'): 4, ('Q', 'G'): 4, ('Q', 'H'): 4, ('Q', 'I'): 4, ('Q', 'L'): 4, ('Q', 'K'): 3, ('Q', 'M'): 4, ('Q', 'F'): 4, ('Q', 'P'): 4, ('Q', 'S'): 4, ('Q', 'T'): 4, ('Q', 'W'): 4, ('Q', 'Y'): 4, ('Q', 'V'): 4, ('E', 'A'): 4, ('E', 'R'): 4, ('E', 'N'): 4, ('E', 'D'): 4, ('E', 'C'): 4, ('E', 'Q'): 4, ('E', 'E'): 0, ('E', 'G'): 4, ('E', 'H'): 3, ('E', 'I'): 4, ('E', 'L'):
4, ('E', 'K'): 4, ('E', 'M'): 4, ('E', 'F'): 4, ('E', 'P'): 4, ('E', 'S'): 4, ('E', 'T'): 4, ('E', 'W'): 4, ('E', 'Y'): 4, ('E', 'V'): 4, ('G', 'A'): 4, ('G', 'R'): 4, ('G', 'N'): 4, ('G', 'D'): 4, ('G', 'C'): 4, ('G', 'Q'): 4, ('G', 'E'): 4, ('G', 'G'): 0, ('G', 'H'): 4, ('G', 'I'): 4, ('G', 'L'): 4, ('G', 'K'): 4, ('G', 'M'): 4, ('G', 'F'): 4, ('G', 'P'): 4, ('G', 'S'): 4, ('G', 'T'): 4, ('G', 'W'): 4, ('G', 'Y'): 4, ('G', 'V'): 4, ('H', 'A'): 4, ('H', 'R'): 4, ('H', 'N'): 4, ('H', 'D'): 4, ('H', 'C'): 4, ('H', 'Q'): 4, ('H', 'E'): 3, ('H', 'G'): 4, ('H', 'H'): 0, ('H', 'I'): 4, ('H', 'L'): 4, ('H', 'K'): 4, ('H', 'M'): 4, ('H', 'F'): 4, ('H', 'P'): 3, ('H', 'S'): 4, ('H', 'T'): 4, ('H', 'W'): 3, ('H', 'Y'): 4, ('H', 'V'): 4, ('I', 'A'): 4, ('I', 'R'): 4, ('I', 'N'): 4, ('I', 'D'): 4, ('I', 'C'): 4, ('I', 'Q'): 4, ('I', 'E'): 4, ('I', 'G'): 4, ('I', 'H'): 4, ('I', 'I'): 0, ('I', 'L'): 4, ('I', 'K'): 4, ('I', 'M'): 4, ('I', 'F'): 4, ('I', 'P'): 4, ('I', 'S'): 4, ('I', 'T'): 3, ('I', 'W'): 4, ('I', 'Y'): 4, ('I', 'V'): 4, ('L', 'A'): 4, ('L', 'R'): 4, ('L', 'N'): 4, ('L', 'D'): 4, ('L', 'C'): 4, ('L', 'Q'): 4, ('L', 'E'): 4, ('L', 'G'): 4, ('L', 'H'): 4, ('L', 'I'): 4, ('L', 'L'): 0, ('L', 'K'): 4, ('L', 'M'): 4, ('L', 'F'): 3, ('L', 'P'): 4, ('L', 'S'): 4, ('L', 'T'): 4, ('L', 'W'): 4, ('L', 'Y'): 4, ('L', 'V'): 4, ('K', 'A'): 4, ('K', 'R'): 4, ('K', 'N'): 3, ('K', 'D'): 4, ('K', 'C'): 4, ('K', 'Q'): 3, ('K', 'E'): 4, ('K', 'G'): 4, ('K', 'H'): 4, ('K', 'I'): 4, ('K', 'L'): 4, ('K', 'K'): 0, ('K', 'M'): 4, ('K', 'F'): 4, ('K', 'P'): 4, ('K', 'S'): 4, ('K', 'T'): 4, ('K', 'W'): 4, ('K', 'Y'): 4, ('K', 'V'): 4, ('M', 'A'): 4, ('M', 'R'): 4, ('M', 'N'): 4, ('M', 'D'): 4, ('M', 'C'): 4, ('M', 'Q'): 4, ('M', 'E'): 4, ('M', 'G'): 4, ('M', 'H'): 4, ('M', 'I'): 4, ('M', 'L'): 4, ('M', 'K'): 4, ('M', 'M'): 0, ('M', 'F'): 4, ('M', 'P'): 4, ('M', 'S'): 4, ('M', 'T'): 4, ('M', 'W'): 4, ('M', 'Y'): 4, ('M', 'V'): 4, ('F', 'A'): 4, ('F', 'R'): 4, ('F', 'N'): 4, ('F', 'D'): 4, 
('F', 'C'): 4, ('F', 'Q'): 4, ('F', 'E'): 4, ('F', 'G'): 4, ('F', 'H'): 4, ('F', 'I'): 4, ('F', 'L'): 3, ('F', 'K'): 4, ('F', 'M'): 4, ('F', 'F'): 0, ('F', 'P'): 4, ('F', 'S'): 4, ('F', 'T'): 4, ('F', 'W'): 4, ('F', 'Y'): 4, ('F', 'V'): 4, ('P', 'A'): 4, ('P', 'R'): 4, ('P', 'N'): 4, ('P', 'D'): 4, ('P', 'C'): 4, ('P', 'Q'): 4, ('P', 'E'): 4, ('P', 'G'): 4, ('P', 'H'): 3, ('P', 'I'): 4, ('P', 'L'): 4, ('P', 'K'): 4, ('P', 'M'): 4, ('P', 'F'): 4, ('P', 'P'): 0, ('P', 'S'): 4, ('P', 'T'): 4, ('P', 'W'): 4, ('P', 'Y'): 4, ('P', 'V'): 4, ('S', 'A'): 4, ('S', 'R'): 4, ('S', 'N'): 4, ('S', 'D'): 4, ('S', 'C'): 4, ('S', 'Q'): 4, ('S', 'E'): 4, ('S', 'G'): 4, ('S', 'H'): 4, ('S', 'I'): 4, ('S', 'L'): 4, ('S', 'K'): 4, ('S', 'M'): 4, ('S', 'F'): 4, ('S', 'P'): 4, ('S', 'S'): 0, ('S', 'T'): 4, ('S', 'W'): 4, ('S', 'Y'): 4, ('S', 'V'): 4, ('T', 'A'): 4, ('T', 'R'): 4, ('T', 'N'): 4, ('T', 'D'): 4, ('T', 'C'): 4, ('T', 'Q'): 4, ('T', 'E'): 4, ('T', 'G'): 4, ('T', 'H'): 4, ('T', 'I'): 3, ('T', 'L'): 4, ('T', 'K'): 4, ('T', 'M'): 4, ('T', 'F'): 4, ('T', 'P'): 4, ('T', 'S'): 4, ('T', 'T'): 0, ('T', 'W'): 4, ('T', 'Y'): 4, ('T', 'V'): 4, ('W', 'A'): 4, ('W', 'R'): 4, ('W', 'N'): 4, ('W', 'D'): 4, ('W', 'C'): 4, ('W', 'Q'): 4, ('W', 'E'): 4, ('W', 'G'): 4, ('W', 'H'): 3, ('W', 'I'): 4, ('W', 'L'): 4, ('W', 'K'): 4, ('W', 'M'): 4, ('W', 'F'): 4, ('W', 'P'): 4, ('W', 'S'): 4, ('W', 'T'): 4, ('W', 'W'): 0, ('W', 'Y'): 4, ('W', 'V'): 4, ('Y', 'A'): 4, ('Y', 'R'): 4, ('Y', 'N'): 4, ('Y', 'D'): 4, ('Y', 'C'): 4, ('Y', 'Q'): 4, ('Y', 'E'): 4, ('Y', 'G'): 4, ('Y', 'H'): 4, ('Y', 'I'): 4, ('Y', 'L'): 4, ('Y', 'K'): 4, ('Y', 'M'): 4, ('Y', 'F'): 4, ('Y', 'P'): 4, ('Y', 'S'): 4, ('Y', 'T'): 4, ('Y', 'W'): 4, ('Y', 'Y'): 0, ('Y', 'V'): 4, ('V', 'A'): 4, ('V', 'R'): 4, ('V', 'N'): 4, ('V', 'D'): 4, ('V', 'C'): 4, ('V', 'Q'): 4, ('V', 'E'): 4, ('V', 'G'): 4, ('V', 'H'): 4, ('V', 'I'): 4, ('V', 'L'): 4, ('V', 'K'): 4, ('V', 'M'): 4, ('V', 'F'): 4, ('V', 'P'): 4, ('V', 'S'): 4, ('V', 'T'): 4, 
('V', 'W'): 4, ('V', 'Y'): 4, ('V', 'V'): 0}
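Because the dictionaries above are too long to review by eye, a quick programmatic sanity check can help. The following is a minimal sketch; the helper check_distance_dict and its defaults (the 20 standard amino acids, a maximum distance of 4) are assumptions for illustration and not part of scirpy. It checks that every pair is present, the diagonal is zero, all values lie between 0 and the cap, and the matrix is symmetric.

```python
from itertools import product


# Hypothetical helper (not part of scirpy): check the invariants that a
# TCRdist-style distance dict keyed by (aa1, aa2) tuples should satisfy.
def check_distance_dict(dist, alphabet="ACDEFGHIKLMNPQRSTVWY", cap=4):
    problems = []
    for a, b in product(alphabet, repeat=2):
        if (a, b) not in dist:
            problems.append(f"missing key {(a, b)}")
            continue
        d = dist[(a, b)]
        # identical residues must have distance 0
        if a == b and d != 0:
            problems.append(f"non-zero diagonal at {(a, b)}: {d}")
        # all distances must lie in [0, cap]
        if not 0 <= d <= cap:
            problems.append(f"out of range at {(a, b)}: {d}")
        # compare each unordered pair only once for symmetry
        if a < b and dist.get((b, a)) != d:
            problems.append(f"asymmetric at {(a, b)}")
    return problems


toy = {("A", "A"): 0, ("A", "S"): 3, ("S", "A"): 3, ("S", "S"): 0}
print(check_distance_dict(toy, alphabet="AS"))  # []
```

Running it on each full dictionary, e.g. check_distance_dict(tcrblosum_alpha_distance_matrix), should return an empty list if the matrix is well-formed; key order does not matter because lookups are by tuple.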
Did you make any modifications to the original TCRblosum matrix, e.g. the cap at 4?
First of all, there is always a conversion step from the substitution matrix (blosum62, tcrblosum) to a distance matrix.
In the tcrblosum paper, they only mention: "Firstly, we transformed the tcrBLOSUM similarity matrix into a distance matrix according to the rules of TCRdist [12]".
In the original TCRdist paper they write:
"The mismatch distance is defined based on the BLOSUM62 (ref. 37) substitution matrix as follows: distance (a, a)=0; distance (a, b)=min (4, 4-BLOSUM62 (a, b)), where 4 is 1 unit greater than the most favourable BLOSUM62 score for a mismatch, and a and b are amino acids".
There are two options:
- use the constant cap of 4, as in the original TCRdist paper
- use a cap that is one unit greater than the most favourable mismatch score of each matrix
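The second option can be sketched as follows. The helper data_driven_cap is hypothetical, and both score dictionaries are small subsets for illustration: the first uses real BLOSUM62 values, while the second is a made-up toy (not actual tcrBLOSUM values) whose best mismatch score is 1, mimicking the tcrblosum_alpha case discussed in this thread.

```python
# Hypothetical helper: a cap one unit greater than the most favourable
# (highest) off-diagonal score, following the wording of the TCRdist paper.
def data_driven_cap(scores: dict) -> int:
    best_mismatch = max(s for (a, b), s in scores.items() if a != b)
    return best_mismatch + 1


# Subset of real BLOSUM62 scores; the best mismatch score is 3 (F/Y, I/V)
blosum62_subset = {("F", "F"): 6, ("F", "Y"): 3, ("I", "V"): 3, ("A", "S"): 1, ("C", "W"): -2}
# Made-up toy matrix (NOT real tcrBLOSUM values) with best mismatch score 1
toy_alpha_like = {("C", "C"): 9, ("R", "C"): 1, ("A", "W"): -3}

print(data_driven_cap(blosum62_subset))  # 4
print(data_driven_cap(toy_alpha_like))   # 2
```

For BLOSUM62 this reproduces the cap of 4. With a cap of 2, min(2, 2 - score) would leave a mismatch only the distances 1 or 2, which is why such a cap compresses the matrix so strongly.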
However, the most favourable mismatch score is 1 for tcrblosum_alpha and 2 for tcrblosum_beta, which would result in rather low caps of 2 and 3, respectively (see the matrices in the notebook). Therefore, I kept the fixed cap of 4 from the original TCRdist implementation for transforming the substitution matrix into a distance matrix:
distance(a, a) = 0 and distance(a, b) = min(4, 4 - score(a, b)) for a ≠ b
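A possible way to generate the tuple-keyed dictionaries used in this PR from a square substitution matrix, applying the fixed-cap rule above, is sketched below. The function name to_distance_dict and the list-of-lists input format are hypothetical; the derivation actually used for this PR is the one in the author's Colab notebook. The example uses the real BLOSUM62 entries for F, W and Y.

```python
# Hypothetical sketch: turn a square substitution matrix (rows/columns in the
# order given by `alphabet`) into a dict keyed by (aa1, aa2) tuples, applying
# distance(a, a) = 0 and distance(a, b) = min(cap, cap - score) otherwise.
def to_distance_dict(alphabet: str, scores: list, cap: int = 4) -> dict:
    n = len(alphabet)
    if len(scores) != n or any(len(row) != n for row in scores):
        raise ValueError("score matrix must be square and match the alphabet")
    dist = {}
    for i, a in enumerate(alphabet):
        for j, b in enumerate(alphabet):
            dist[(a, b)] = 0 if i == j else min(cap, cap - scores[i][j])
    return dist


# BLOSUM62 scores for F, W, Y (rows and columns in the order "FWY")
fwy = [
    [6, 1, 3],
    [1, 11, 2],
    [3, 2, 7],
]
d = to_distance_dict("FWY", fwy)
print(d[("F", "W")], d[("F", "Y")], d[("W", "Y")])  # 3 1 2
```

These values match the F/W, F/Y and W/Y entries of blosum62_distance_matrix. Since lookups are by key, the alphabetical order of blosum62_distance_matrix and the ARNDCQEGHILKMFPSTWYV order of tcrblosum_alpha_distance_matrix are interchangeable.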
Unfortunately, I couldn't find out exactly how this was done in the tcrblosum paper. Another thing to consider is that the distance values for the alpha and beta chains might be compared later during clonotype clustering, so having two different caps might cause problems.
What do you think would be the best option?
Dear @apostovskaya,
we are working on integrating your tcrBLOSUM matrix into scirpy, which contains a reimplementation of the TCRdist algorithm.
We are unsure how you intended the matrix to be used with TCRdist. Could you please clarify what you mean by
Firstly, we transformed the tcrBLOSUM similarity matrix into a distance matrix according to the rules of TCRdist [12]
in your paper, and how you would suggest setting the cap?
"The mismatch distance is defined based on the BLOSUM62 (ref. 37) substitution matrix as follows: distance (a, a)=0; distance (a, b)=min (4, 4-BLOSUM62 (a, b)), where 4 is 1 unit greater than the most favourable BLOSUM62 score for a mismatch, and a and b are amino acids".
tbh, I never understood this part of TCRdist. It skews the matrix quite a bit, basically assigning a distance of 4 to all mismatches that have a negative (or zero) score in BLOSUM62 (even if it's just -1). So the only way to get a score
Having a cap of 2 with TCRblosum doesn't make sense to me, because then the strong negative effect of mismatches with C would be completely gone, which defeats the purpose of TCRblosum. Using 4 is just as arbitrary.
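The skew described in this comment can be reproduced by applying the quoted TCRdist rule to a few BLOSUM62 scores. This is a minimal sketch; the function name mismatch_distance is hypothetical, and the cap is exposed as a parameter (default 4, as in the TCRdist paper).

```python
# Sketch of the TCRdist mismatch rule:
#   distance(a, a) = 0
#   distance(a, b) = min(cap, cap - score(a, b)) for a != b
def mismatch_distance(a: str, b: str, score: int, cap: int = 4) -> int:
    if a == b:
        return 0
    return min(cap, cap - score)


# Real BLOSUM62 scores for a few residue pairs
blosum62_examples = {("F", "Y"): 3, ("A", "S"): 1, ("A", "C"): 0, ("A", "R"): -1, ("C", "W"): -2}

for (a, b), s in blosum62_examples.items():
    # prints: residue pair, BLOSUM62 score, resulting distance
    print(a, b, s, mismatch_distance(a, b, s))
```

A/C (0), A/R (-1) and C/W (-2) all end up at the cap of 4; only positive scores yield distances below 4. The cap also has to exceed the best mismatch score of the matrix, otherwise cap - score can reach zero or become negative for a mismatch.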
So far, the TCRdist metric has used a distance matrix derived from the BLOSUM62 substitution matrix. This PR extends TCRdistDistanceCalculator with a new base_matrix="tcrblosum" option alongside the existing default blosum62 behavior. With this option, distance matrices based on the tcrblosum substitution matrices (separate matrices for the alpha and beta chains) are used for the TCRdist calculation.
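To illustrate the idea of the new option, here is a hypothetical sketch of how a base_matrix switch could pick a per-chain distance dictionary. The function select_distance_matrix, the constant names and the chain labels "alpha"/"beta" are assumptions for illustration and do not reflect the actual TCRdistDistanceCalculator code; the empty dictionaries are placeholders for the full matrices.

```python
# Placeholders standing in for the full 20x20 distance dicts in this PR
BLOSUM62_DIST: dict = {}
TCRBLOSUM_ALPHA_DIST: dict = {}
TCRBLOSUM_BETA_DIST: dict = {}


def select_distance_matrix(chain: str, base_matrix: str = "blosum62") -> dict:
    """Hypothetical dispatch: return the distance dict to use for one chain."""
    if chain not in ("alpha", "beta"):
        raise ValueError(f"unknown chain: {chain!r}")
    if base_matrix == "blosum62":
        # default behaviour: one matrix shared by both chains
        return BLOSUM62_DIST
    if base_matrix == "tcrblosum":
        # tcrBLOSUM provides separate matrices for the alpha and beta chains
        return TCRBLOSUM_ALPHA_DIST if chain == "alpha" else TCRBLOSUM_BETA_DIST
    raise ValueError(f"unknown base_matrix: {base_matrix!r}")
```

Raising on unknown values, rather than silently falling back to BLOSUM62, makes a typo in base_matrix visible immediately.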
I illustrate how I derived the tcrblosum-based distance matrices in this Google Colab notebook.
The usage of the tcrblosum matrices was already discussed in #591.