# Structural Diversity Assessment

DeepTCR can be used to assess the structural diversity within a repertoire. One can think of this structural diversity as a measure of the number of antigens or concepts within a repertoire: a repertoire with low diversity is recognizing few antigens, while one with high diversity can recognize more. We will assess two measures of diversity. The first is the number of clusters (concepts) of TCR sequences present in each sample/repertoire. The second is the entropy of the distribution of sequence reads across these clusters. For example, a sample may have 12 clusters, but if 90% of the repertoire falls in one cluster, it is a low-entropy repertoire.
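
To make the entropy measure concrete, here is a worked example of the 12-cluster case above. It assumes Shannon entropy in natural-log units (nats), where $p_k$ is the fraction of a sample's reads falling in cluster $k$ of $K$ clusters; the exact log base or normalization DeepTCR uses is not stated here, so treat this as an illustration rather than the library's formula.

$$H = -\sum_{k=1}^{K} p_k \ln p_k$$

$$H_{\text{uniform}} = -\sum_{k=1}^{12} \tfrac{1}{12} \ln \tfrac{1}{12} = \ln 12 \approx 2.485$$

$$H_{\text{skewed}} = -\left[0.9 \ln 0.9 + 11 \cdot \tfrac{0.1}{11} \ln \tfrac{0.1}{11}\right] = -\left[0.9 \ln 0.9 + 0.1 \ln \tfrac{0.1}{11}\right] \approx 0.565$$

Both samples have 12 clusters, but the one with 90% of its reads in a single cluster has less than a quarter of the maximum possible entropy.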

First, we will load data and train the VAE.

```python
import sys
sys.path.append('../../')
from DeepTCR.DeepTCR_U import DeepTCR_U

# Instantiate training object
DTCRU = DeepTCR_U('Tutorial')

# Load data from directories; the column arguments give the positions of the
# CDR3 beta amino acid sequence, counts, V beta gene, and J beta gene in each file
DTCRU.Get_Data(directory='../../Data/Murine_Antigens', Load_Prev_Data=False, aggregate_by_aa=True,
               aa_column_beta=0, count_column=1, v_beta_column=2, j_beta_column=3)

# Train VAE
DTCRU.Train_VAE(Load_Prev_Data=False)
```

We will then execute the following command to generate diversity measurements.

```python
# Compute per-sample structural diversity (number of clusters and entropy)
DTCRU.Structural_Diversity()
```

Because the first step of this algorithm is to cluster the data with the PhenoGraph algorithm, we can also set the sample parameter to cluster only a sub-sample of the sequences, and then apply a K-nearest neighbors algorithm to assign the remaining sequences to those clusters. This is helpful for very large TCRSeq files, such as those collected from tumor-infiltrating lymphocytes (TIL).

```python
# Cluster a sub-sample of 500 sequences, then assign the remaining sequences by KNN
DTCRU.Structural_Diversity(sample=500)
```
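
The sub-sample-then-assign idea can be sketched as follows. This is a minimal illustration, not DeepTCR's implementation: a small deterministic k-means stands in for PhenoGraph (which picks the number of clusters itself, whereas here `n_clusters` must be given), each sequence is assumed to be represented by a feature vector such as a VAE embedding, and the helper names `farthest_first_kmeans`, `knn_assign`, and `cluster_with_subsample` are hypothetical.

```python
import numpy as np


def farthest_first_kmeans(X, k, n_iter=100):
    """Stand-in for PhenoGraph: k-means with deterministic farthest-first init."""
    centers = [X[0]]
    for _ in range(1, k):
        # Pick the point farthest from all centers chosen so far
        d = np.min([((X - c) ** 2).sum(axis=1) for c in centers], axis=0)
        centers.append(X[d.argmax()])
    centers = np.array(centers)
    for _ in range(n_iter):
        d = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = d.argmin(axis=1)
        # Recompute centers; keep the old center if a cluster went empty
        new = np.array([X[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
                        for j in range(k)])
        if np.allclose(new, centers):
            break
        centers = new
    return labels


def knn_assign(X_ref, labels_ref, X_query, k=5):
    """Give each query point the majority cluster label of its k nearest reference points."""
    k = min(k, len(X_ref))
    d = ((X_query[:, None, :] - X_ref[None, :, :]) ** 2).sum(axis=2)
    nn = np.argsort(d, axis=1)[:, :k]
    return np.array([np.bincount(row).argmax() for row in labels_ref[nn]])


def cluster_with_subsample(X, n_clusters, sample=None, k=5, seed=0):
    """Cluster `sample` points, then extend the labels to all points by KNN."""
    n = len(X)
    if sample is None or sample >= n:
        return farthest_first_kmeans(X, n_clusters)
    rng = np.random.default_rng(seed)
    idx = rng.choice(n, size=sample, replace=False)
    rest = np.setdiff1d(np.arange(n), idx)
    labels = np.empty(n, dtype=int)
    labels[idx] = farthest_first_kmeans(X[idx], n_clusters)
    labels[rest] = knn_assign(X[idx], labels[idx], X[rest], k=k)
    return labels


# Toy data: two well-separated groups of 300 points each
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0.0, 0.5, size=(300, 2)),
               rng.normal(10.0, 0.5, size=(300, 2))])
labels = cluster_with_subsample(X, n_clusters=2, sample=50)
print(len(set(labels[:300].tolist())), len(set(labels[300:].tolist())))  # prints: 1 1
```

Only the 50 sub-sampled points go through the (expensive) clustering step; the other 550 are labeled by a cheap nearest-neighbor vote, which is why this pattern scales to large repertoires.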

We can then view the structural diversity metrics in the object's Structural_Diversity_DF attribute.

```python
# Per-sample diversity metrics computed by Structural_Diversity()
print(DTCRU.Structural_Diversity_DF)
```

We can see the entropy and number of clusters for each sample.
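
The two per-sample metrics can be sketched from per-sequence cluster assignments. This is a hypothetical helper, not DeepTCR's own code: it assumes each sequence already has a cluster ID and a read count, uses Shannon entropy in nats, and the record layout and the name `structural_diversity_metrics` are illustrative only.

```python
import math
from collections import defaultdict


def structural_diversity_metrics(records):
    """Hypothetical helper, not part of DeepTCR.

    records: iterable of (sample, cluster_id, read_count) tuples, one per sequence.
    Returns {sample: {"num_clusters": int, "entropy": float}}, entropy in nats.
    """
    # Sum reads per cluster within each sample
    reads = defaultdict(lambda: defaultdict(float))
    for sample, cluster, count in records:
        reads[sample][cluster] += count

    metrics = {}
    for sample, per_cluster in reads.items():
        total = sum(per_cluster.values())
        if total <= 0:
            continue  # skip samples with no reads
        # Fraction of the sample's reads in each non-empty cluster
        probs = [c / total for c in per_cluster.values() if c > 0]
        entropy = -sum(p * math.log(p) for p in probs)
        metrics[sample] = {"num_clusters": len(probs), "entropy": entropy}
    return metrics


# Toy data: two samples with four clusters each (counts are made up for illustration)
records = [
    ("skewed", 0, 70), ("skewed", 1, 10), ("skewed", 2, 10), ("skewed", 3, 10),
    ("even", 0, 25), ("even", 1, 25), ("even", 2, 25), ("even", 3, 25),
]
result = structural_diversity_metrics(records)
for sample, m in result.items():
    print(sample, m["num_clusters"], round(m["entropy"], 3))
# skewed 4 0.94
# even 4 1.386
```

Both samples have the same number of clusters, so only the entropy distinguishes the repertoire dominated by one cluster from the evenly spread one.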