tcrdist3

Flexible distance measures for comparing T cell receptors

tcrdist3 is a Python toolkit with an API for analyzing T-cell receptor repertoires. Some of its functionality and code is adapted from the original tcr-dist package, which was released with the publication of Dash et al. Nature (2017) doi:10.1038/nature22383. This package provides a new API for computing TCR distance measures as well as new features for biomarker development (bioRxiv (2020)). It has been expanded to include gamma-delta TCRs and recoded with numba, a high-performance just-in-time compiler, to increase CPU efficiency.

Installation

pip install tcrdist3

or

pip install git+https://github.com/kmayerb/tcrdist3.git@0.2.2

Docker

docker pull quay.io/kmayerb/tcrdist3:0.2.2

User-Contributed Colab Notebook Examples Using tcrdist3

1. Example K Nearest Neighbor Classification using tcrdist3

open in colab (Author: Liel Cohen-Lavi). This notebook illustrates how to integrate tcrdist3 with scikit-learn's implementation of K Nearest Neighbor classification. TCRdist-based KNN classification performance on a set of labeled receptors is assessed with cross-validation or training/test splits. This simple method is proposed as a quickly implementable benchmark for the performance of more computationally intensive TCR-epitope specificity prediction approaches.
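
The idea behind distance-based KNN classification can be sketched without any dependencies. This is not the notebook's code: the function name `knn_predict`, the epitope labels, and the distance values below are all made up for illustration. In practice the distances would come from tcrdist3, and scikit-learn's `KNeighborsClassifier(metric="precomputed")` would likely replace the hand-written vote.

```python
from collections import Counter

def knn_predict(dist_rows, train_labels, k=3):
    """Predict a label for each query row of an (n_query x n_train) distance matrix.

    Hypothetical helper for illustration; not part of tcrdist3 or scikit-learn.
    """
    predictions = []
    for row in dist_rows:
        # Indices of the k training receptors closest to this query receptor.
        nearest = sorted(range(len(row)), key=lambda j: row[j])[:k]
        # Majority vote over the labels of those neighbors.
        votes = Counter(train_labels[j] for j in nearest)
        predictions.append(votes.most_common(1)[0][0])
    return predictions

# Toy distances from 2 query receptors to 5 labeled training receptors.
train_labels = ["PA", "PA", "NP", "NP", "NP"]
dist_rows = [
    [0, 12, 90, 85, 110],   # closest to the two "PA" receptors
    [95, 100, 24, 0, 36],   # closest to the "NP" receptors
]
predictions = knn_predict(dist_rows, train_labels, k=3)
print(predictions)
```

Because the classifier only needs distances, any TCR distance measure can be swapped in without changing the voting step.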

Package Documentation

More documentation can be found at tcrdist3.readthedocs.

Basic Usage

import pandas as pd
from tcrdist.repertoire import TCRrep

df = pd.read_csv("dash.csv")
tr = TCRrep(cell_df = df, 
            organism = 'mouse', 
            chains = ['alpha','beta'], 
            db_file = 'alphabeta_gammadelta_db.tsv')

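# Pairwise distance matrices are stored as attributes of the TCRrep object
tr.pw_alpha       # alpha-chain distances
tr.pw_beta        # beta-chain distances
tr.pw_cdr3_a_aa   # alpha-chain CDR3 amino acid distances
tr.pw_cdr3_b_aa   # beta-chain CDR3 amino acid distances

# For each clone, find the neighbors within a beta-chain distance radius of 50
from tcrdist.public import _neighbors_fixed_radius
_neighbors_fixed_radius(tr.pw_beta, 50)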
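
As a rough illustration of what a fixed-radius neighbor lookup does on a pairwise distance matrix, the following dependency-free sketch lists, for each row, the columns within the radius. The helper `neighbors_within_radius` is hypothetical, the matrix values are invented, and whether the library's function uses an inclusive threshold or returns the same structure is an assumption here.

```python
def neighbors_within_radius(pw, radius):
    """For each row i, list the column indices j with pw[i][j] <= radius.

    Hypothetical stand-in; the inclusive comparison is an assumption.
    """
    return [[j for j, d in enumerate(row) if d <= radius] for row in pw]

# Toy symmetric 4 x 4 distance matrix (values are made up).
pw = [
    [0, 24, 60, 120],
    [24, 0, 48, 96],
    [60, 48, 0, 50],
    [120, 96, 50, 0],
]
neighbors = neighbors_within_radius(pw, 50)
print(neighbors)
```

Each clone appears in its own neighbor list because its distance to itself is zero.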

Sparse Matrix Representation

import pandas as pd
from tcrdist.repertoire import TCRrep
from tcrdist.breadth import get_safe_chunk

df = pd.read_csv("dash.csv")
tr = TCRrep(cell_df = df[['subject','epitope','count','v_b_gene','j_b_gene','cdr3_b_aa','cdr3_b_nucseq']], 
            organism = 'mouse', 
            chains = ['beta'], 
            compute_distances = False)

# Set to desired number of CPUs
tr.cpus = 2

# Identify a safe chunk size based on the input data shape and the target number of
# pairwise distances to be held temporarily in memory per node.
safe_chunk_size = get_safe_chunk(
            tr.clone_df.shape[0], 
            tr.clone_df.shape[0], 
            target = 10**7) 

tr.compute_sparse_rect_distances(
        df = tr.clone_df, 
        radius=50,
        chunk_size = safe_chunk_size)

# The sparse beta-chain distance matrix is stored as tr.rw_beta
print(tr.rw_beta)
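
The reasoning behind chunking and a radius cutoff can be sketched in plain Python. This is a minimal illustration under stated assumptions, not tcrdist3's implementation: `safe_chunk_rows`, `sparse_radius_distances`, and the toy `hamming` distance are hypothetical, and the dictionary stands in for a real sparse matrix.

```python
def safe_chunk_rows(n_rows, n_cols, target=10**7):
    """Rows per chunk so that rows * n_cols stays at or below target (at least 1)."""
    return max(1, min(n_rows, target // n_cols))

def sparse_radius_distances(items, dist, radius, chunk_rows):
    """Return {(i, j): d} for pairs with d <= radius, computed chunk by chunk."""
    kept = {}
    for start in range(0, len(items), chunk_rows):
        stop = min(start + chunk_rows, len(items))
        # Only this block of rows is held densely in memory at one time.
        block = [[dist(items[i], b) for b in items] for i in range(start, stop)]
        for offset, row in enumerate(block):
            for j, d in enumerate(row):
                if d <= radius:
                    kept[(start + offset, j)] = d
    return kept

def hamming(a, b):
    """Toy stand-in distance: mismatches plus length difference (not TCRdist)."""
    return sum(x != y for x, y in zip(a, b)) + abs(len(a) - len(b))

cdr3s = ["CASSLGF", "CASSLGY", "CASRPGF", "CAWSVQGTF"]
chunk = safe_chunk_rows(len(cdr3s), len(cdr3s), target=8)  # 8 // 4 = 2 rows per chunk
sparse = sparse_radius_distances(cdr3s, hamming, radius=1, chunk_rows=chunk)
print(chunk, len(sparse))
```

Of the 16 possible pairs, only those within the radius are stored, which is what keeps memory use manageable as the number of clones grows.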

Citing

TCR meta-clonotypes for biomarker discovery with tcrdist3 enabled identification of public, HLA-restricted clusters of SARS-CoV-2 TCRs

Mayer-Blackwell K, Schattgen S, Cohen-Lavi L, Crawford JC, Souquette A, Gaevert JA, Hertz T, Thomas PG, Bradley PH, Fiore-Gartland A. eLife (2021).

Quantifiable predictive features define epitope-specific T cell receptor repertoires

Dash P, Fiore-Gartland AJ, Hertz T, Wang GC, Sharma S, Souquette A, Crawford JC, Clemens EB, Nguyen THO, Kedzierska K, La Gruta NL, Bradley P, Thomas PG. Nature (2017).