## Data loading

For loading the data I mostly reuse the same pipeline. The important points here are that:

- invalid TCRs are removed
- only TCRs with v_calls that can be processed by Sebastiaan's tools are kept
- each row represents a TCR that arose independently: rows are grouped by (patient_id, junction_nt, v_call, j_call)

In [2]:
import pandas as pd
from pathlib import Path

from raptcr.io.pipeline import ProcessingPipeline
from raptcr.io.mappers import RegexMapper

2025-02-14 14:09:35 - RepertoireReader - INFO - Logging initialized


In [3]:
data_path = Path('/home/vincent/Documents/projects/alex_hiv/data/mixcr')
patient_id_mapper = RegexMapper(pattern=r"\d+\_(C?\d+)\-")
repertoire_id_mapper = RegexMapper(pattern=r"\d+\_(C?\d+)(\-[vV]\d)?\-(EM|EMRA|CM|NAIVE|NC|ACT)", group=[1,3])
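
To make the two patterns above easier to read, here is a minimal sketch that applies them with Python's standard `re` module. The file names are hypothetical, and joining groups 1 and 3 with "+" is an assumption based on the repertoire_id values that appear further down (e.g. `111+EM`); how `RegexMapper` applies the pattern internally is not shown in this notebook.

```python
import re

# Patterns copied from the cell above.
patient_pattern = re.compile(r"\d+\_(C?\d+)\-")
repertoire_pattern = re.compile(r"\d+\_(C?\d+)(\-[vV]\d)?\-(EM|EMRA|CM|NAIVE|NC|ACT)")

def extract_ids(filename):
    """Return (patient_id, repertoire_id) parsed from a file name."""
    patient_id = patient_pattern.search(filename).group(1)
    match = repertoire_pattern.search(filename)
    # Assumption: groups 1 and 3 are joined with "+", as the repertoire_id values suggest.
    repertoire_id = "+".join(match.group(1, 3))
    return patient_id, repertoire_id

# Hypothetical file names, for illustration only.
ids_control = extract_ids("12_C21-v2-NAIVE.tsv")
ids_emra = extract_ids("07_111-EMRA.tsv")
print(ids_control)
print(ids_emra)
```

One thing to be aware of: Python's `re` tries alternatives from left to right, so with `EM` listed before `EMRA` a file name containing `-EMRA` is captured as `EM` in this sketch. If EMRA samples are present and the mapper behaves like `re.search`, listing `EMRA` before `EM` would keep the two subsets apart.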

In [4]:
pipe = ProcessingPipeline(
    patient_id_mapping=patient_id_mapper,
    repertoire_id_mapping=repertoire_id_mapper
)

In [5]:
data = pipe.process_dataset(data_path)

2025-02-14 14:09:36 - RepertoireReader - INFO - Read 4629 sequences with reader:mixcr_umi
2025-02-14 14:09:36 - RepertoireReader - INFO - Filtered 239 (5.16%) rows with missing values


2025-02-14 14:09:36 - RepertoireReader - INFO - Filtered 0 (0.00%) rows with invalid V/J genes
2025-02-14 14:09:36 - RepertoireReader - INFO - Filtered 535 (12.19%) rows with invalid junction_aa's
2025-02-14 14:09:36 - RepertoireReader - INFO - Filtered 40 (1.04%) rows with non-productive V/J genes
2025-02-14 14:09:36 - RepertoireReader - INFO - Trimmed 3815 (1.00) junction_nt to CDR3 sequence
2025-02-14 14:09:36 - RepertoireReader - INFO - Grouped 1932 (50.64%) clonotypes based on (['junction', 'v_gene', 'junction_aa', 'j_gene'])
2025-02-14 14:09:36 - RepertoireReader - INFO - Read 2631 sequences with reader:mixcr_umi
2025-02-14 14:09:36 - RepertoireReader - INFO - Filtered 140 (5.32%) rows with missing values
2025-02-14 14:09:36 - RepertoireReader - INFO - Filtered 0 (0.00%) rows with invalid V/J genes
2025-02-14 14:09:36 - RepertoireReader - INFO - Filtered 219 (8.79%) rows with invalid junction_aa's
2025-02-14 14:09:36 - RepertoireReader - INFO - Filtered 33 (1.45%) rows with non-p

In [6]:
# for this example, I only use the TRB chain
data["chain"] = data["v_gene"].str[:3]
data = data.query("chain == 'TRB'").reset_index(drop=True)

In [7]:
data["v_call"] = data["v_gene"]+"*01"
data["j_call"] = data["j_gene"]+"*01"

In [8]:
data

Unnamed: 0,junction,v_gene,junction_aa,j_gene,duplicate_count,repertoire_id,patient_id,chain,v_call,j_call
0,tgcgccagcagcaaccacagggcgggggagcagtacgtc,TRBV10-2,CASSNHRAGEQYV,TRBJ2-7,1954,111+EM,111,TRB,TRBV10-2*01,TRBJ2-7*01
1,tgcagtgctagcggggtgggcaatgagcagttcttc,TRBV20-1,CSASGVGNEQFF,TRBJ2-1,1754,111+EM,111,TRB,TRBV20-1*01,TRBJ2-1*01
2,tgtgccagcagccaagaatcaggggggatcgccggggagctgtttttt,TRBV3-1,CASSQESGGIAGELFF,TRBJ2-2,403,111+EM,111,TRB,TRBV3-1*01,TRBJ2-2*01
3,tgcgccagcagccaagaacccaggcccggggacggggagctgtttttt,TRBV4-1,CASSQEPRPGDGELFF,TRBJ2-2,316,111+EM,111,TRB,TRBV4-1*01,TRBJ2-2*01
4,tgtgccagcagcttggttgccggcacagatacgcagtatttt,TRBV5-6,CASSLVAGTDTQYF,TRBJ2-3,264,111+EM,111,TRB,TRBV5-6*01,TRBJ2-3*01
...,...,...,...,...,...,...,...,...,...,...
271434,tgtgccagcagtgaggtcctagccggggcctacgagcagtacttc,TRBV25-1,CASSEVLAGAYEQYF,TRBJ2-7,1,C21+NAIVE,C21,TRB,TRBV25-1*01,TRBJ2-7*01
271435,tgtgccagcagtgagggtggacaggcatcaccgtggtaccaagaga...,TRBV25-1,CASSEGGQASPWYQETQYF,TRBJ2-5,1,C21+NAIVE,C21,TRB,TRBV25-1*01,TRBJ2-5*01
271436,tgtgccagcagtgacgcgagagacagtgccctgggctacaccttc,TRBV2,CASSDARDSALGYTF,TRBJ1-2,1,C21+NAIVE,C21,TRB,TRBV2*01,TRBJ1-2*01
271437,tgtgccagcagtgacctaccggggggcactgaagctttcttt,TRBV6-4,CASSDLPGGTEAFF,TRBJ1-1,1,C21+NAIVE,C21,TRB,TRBV6-4*01,TRBJ1-1*01


## Background generation

In [9]:
from clustcrdist.background import BackgroundModel

2025-02-14 14:09:54 - faiss.loader - INFO - Loading faiss.
2025-02-14 14:09:54 - faiss.loader - INFO - Successfully loaded faiss.
2025-02-14 14:09:56 - matplotlib.font_manager - INFO - Failed to extract font properties from /usr/share/fonts/truetype/noto/NotoColorEmoji.ttf: In FT2Font: Could not set the fontsize (invalid pixel size; error code 0x17)

Due to the on going maintenance burden of keeping command line application
wrappers up to date, we have decided to deprecate and eventually remove these
modules.

We instead now recommend building your command line and invoking it directly
with the subprocess module.


In [10]:
bg_depth = 1
bg_data = []
skipped = []

for repertoire_id, repertoire_df in data.groupby("repertoire_id"): # create a background per input repertoire

    if len(repertoire_df) < 20:
        print("Skipping, too few sequences")
        skipped.append(repertoire_id)
        continue

    bg_repertoire_id = f"{repertoire_id}_bg"
    bgm = BackgroundModel(repertoire=repertoire_df, factor=bg_depth, verbose=True)

    shuffled_rep = bgm.shuffle(chain="TRB")
    shuffled_rep["repertoire_id"] = bg_repertoire_id
    bg_data.append(shuffled_rep)

Skipping, too few sequences
B
Single chain detected.
parse_tcr_junctions: 0 2391
success_rate: 91.59
resample_background_tcrs: build nucseq_srclist 0 7173
desirable_Ncounts: [0, 1]
Ns: 0 fg_ncounts: 70 bg_ncounts: 48 bad_ncounts: 71 sum: 119
Ns: 1 fg_ncounts: 98 bg_ncounts: 86 bad_ncounts: 160 sum: 246
final Ndevs: 4 11 5 -12 6 13 7 8 8 -10 9 -5 10 2 11 11 12 -14 13 -2 15 5 16 -7 17 3 19 4 20 -3 21 -2 22 -2 23 -1 24 -2 25 1 26 -1 28 1 29 1 30 1 final Ldevs: B
Single chain detected.
parse_tcr_junctions: 0 2410
success_rate: 92.09
resample_background_tcrs: build nucseq_srclist 0 7230
dont have enough of all lens
desirable_Ncounts: [0, 1]
Ns: 0 fg_ncounts: 58 bg_ncounts: 46 bad_ncounts: 88 sum: 134
Ns: 1 fg_ncounts: 86 bg_ncounts: 93 bad_ncounts: 164 sum: 257
final Ndevs: 1 -3 2 -2 3 12 4 5 5 -18 6 17 7 23 8 -34 9 35 10 -19 11 -5 12 -11 13 9 14 -2 15 -11 16 -9 17 3 18 3 19 1 20 2 21 3 23 1 24 1 26 -1 final Ldevs: 7 1 9 -1 B
Single chain detected.
parse_tcr_junctions: 0 10420
success_rate:

In [11]:
bg_data = pd.concat(bg_data).drop_duplicates()
bg_data["patient_id"] = bg_data["repertoire_id"].str.split("+").str[0] + "_bg"
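
The patient_id of each background repertoire is derived from its repertoire_id by taking the part before the "+" and appending "_bg". A small sketch on two repertoire ids taken from the table below (passing `regex=False` is my addition to make the literal split explicit):

```python
import pandas as pd

# Two background repertoire ids as they appear in bg_data.
repertoire_ids = pd.Series(["101+CM_bg", "C38+NAIVE_bg"])

# Keep the part before "+" (the patient) and tag it as background.
bg_patient_ids = repertoire_ids.str.split("+", regex=False).str[0] + "_bg"
print(bg_patient_ids.tolist())
```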

In [12]:
bg_data

Unnamed: 0,v_call,j_call,junction_aa,junction,repertoire_id,patient_id
0,TRBV20-1*01,TRBJ2-1*01,CSARVSGLAILNEQFF,tgcagtgctagagtatcgggactagcgattctgaatgagcagttcttc,101+CM_bg,101_bg
1,TRBV19*01,TRBJ1-1*01,CASNTGQNTEAFF,tgtgccagtaacaccgggcagaacactgaagctttcttt,101+CM_bg,101_bg
2,TRBV25-1*01,TRBJ1-4*01,CASSAYTGGNEKLFF,tgtgccagcagtgcctacaccggcggtaatgaaaaactgtttttt,101+CM_bg,101_bg
3,TRBV27*01,TRBJ2-7*01,CASSLGDRGPYEQYF,tgtgccagcagtttgggggacagggggccctacgagcagtacttc,101+CM_bg,101_bg
4,TRBV7-9*01,TRBJ2-1*01,CASSGDNYNEQFF,tgtgccagcagcggggacaactacaatgagcagttcttc,101+CM_bg,101_bg
...,...,...,...,...,...,...
590,TRBV29-1*01,TRBJ2-5*01,CSAWGGWAAETQYF,tgcagcgcctggggcggttgggcggcggagacccagtacttc,C38+NAIVE_bg,C38_bg
591,TRBV27*01,TRBJ2-7*01,CASSLDRGTHEQYF,tgtgccagcagcctagacaggggcacccacgagcagtacttc,C38+NAIVE_bg,C38_bg
592,TRBV29-1*01,TRBJ2-3*01,CSVEQESTDTQYF,tgcagcgtcgaacaggaatcgacagatacgcagtatttt,C38+NAIVE_bg,C38_bg
593,TRBV19*01,TRBJ2-1*01,CASSAGKNEQFF,tgtgccagtagtgccgggaaaaatgagcagttcttc,C38+NAIVE_bg,C38_bg


In [None]:
# concatenate foreground and background data
merged_data = (
    pd.concat([
        data.query("repertoire_id not in @skipped"),
        bg_data,
    ])
    [["junction", "v_call", "junction_aa", "j_call", "repertoire_id", "patient_id"]]
    .reset_index(drop=True)
)

merged_data["background"] = merged_data["repertoire_id"].str.contains("_bg")

In [14]:
merged_data

Unnamed: 0,junction,v_call,junction_aa,j_call,repertoire_id,patient_id,background
0,tgcgccagcagcaaccacagggcgggggagcagtacgtc,TRBV10-2*01,CASSNHRAGEQYV,TRBJ2-7*01,111+EM,111,False
1,tgcagtgctagcggggtgggcaatgagcagttcttc,TRBV20-1*01,CSASGVGNEQFF,TRBJ2-1*01,111+EM,111,False
2,tgtgccagcagccaagaatcaggggggatcgccggggagctgtttttt,TRBV3-1*01,CASSQESGGIAGELFF,TRBJ2-2*01,111+EM,111,False
3,tgcgccagcagccaagaacccaggcccggggacggggagctgtttttt,TRBV4-1*01,CASSQEPRPGDGELFF,TRBJ2-2*01,111+EM,111,False
4,tgtgccagcagcttggttgccggcacagatacgcagtatttt,TRBV5-6*01,CASSLVAGTDTQYF,TRBJ2-3*01,111+EM,111,False
...,...,...,...,...,...,...,...
542221,tgcagcgcctggggcggttgggcggcggagacccagtacttc,TRBV29-1*01,CSAWGGWAAETQYF,TRBJ2-5*01,C38+NAIVE_bg,C38_bg,True
542222,tgtgccagcagcctagacaggggcacccacgagcagtacttc,TRBV27*01,CASSLDRGTHEQYF,TRBJ2-7*01,C38+NAIVE_bg,C38_bg,True
542223,tgcagcgtcgaacaggaatcgacagatacgcagtatttt,TRBV29-1*01,CSVEQESTDTQYF,TRBJ2-3*01,C38+NAIVE_bg,C38_bg,True
542224,tgtgccagtagtgccgggaaaaatgagcagttcttc,TRBV19*01,CASSAGKNEQFF,TRBJ2-1*01,C38+NAIVE_bg,C38_bg,True


In [17]:
# make sure clonotypes that did not arise independently are merged:
# identical (junction, v_call, junction_aa, j_call) within one patient become a single row

data_unique = merged_data.groupby(["junction", "v_call", "junction_aa", "j_call", "patient_id"], as_index=False).agg({"repertoire_id": list})
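
A minimal sketch of what this grouping does, on toy data (the sequences and ids below are made up for illustration): the same clonotype seen in two repertoires of one patient collapses into one row, while the same sequence in the patient's shuffled background stays separate because its patient_id differs.

```python
import pandas as pd

# Toy data: hypothetical sequences and ids, not taken from the real dataset.
toy = pd.DataFrame({
    "junction": ["tgtaaa", "tgtaaa", "tgtaaa", "tgtccc"],
    "v_call": ["TRBV27*01"] * 4,
    "junction_aa": ["CK", "CK", "CK", "CP"],
    "j_call": ["TRBJ2-7*01"] * 4,
    "patient_id": ["139", "139", "139_bg", "139"],
    "repertoire_id": ["139+NAIVE", "139+CM", "139+NAIVE_bg", "139+CM"],
})

keys = ["junction", "v_call", "junction_aa", "j_call", "patient_id"]

# One row per (clonotype, patient); the repertoires it was seen in are kept as a list.
toy_unique = toy.groupby(keys, as_index=False).agg({"repertoire_id": list})
print(toy_unique)
```

The four input rows become three: the first two share all grouping keys and are merged, with both repertoire ids kept in the list.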

In [18]:
data_unique

Unnamed: 0,junction,v_call,junction_aa,j_call,patient_id,repertoire_id
0,tgcaaacaggggttgaccagcacagatacgcagtatttt,TRBV10-3*01,CKQGLTSTDTQYF,TRBJ2-3*01,502,[502+NAIVE]
1,tgcaacaagacggggactgaagctttcttt,TRBV20-1*01,CNKTGTEAFF,TRBJ1-1*01,116,[116+EM]
2,tgcaacaagacggggggcgagcagtacgtc,TRBV20-1*01,CNKTGGEQYV,TRBJ2-7*01,116_bg,[116+EM_bg]
3,tgcaacacaggggatgcgagcggggaagctttcttt,TRBV29-1*01,CNTGDASGEAFF,TRBJ1-1*01,C35,[C35+NAIVE]
4,tgcaacattctcgggacagccgccacagatacgcagtatttt,TRBV29-1*01,CNILGTAATDTQYF,TRBJ2-3*01,131_bg,[131+CM_bg]
...,...,...,...,...,...,...
533581,tgttgtggcgagggtacgtttacctacgagcagtacttc,TRBV27*01,CCGEGTFTYEQYF,TRBJ2-7*01,115_bg,[115+NAIVE_bg]
533582,tgttgtggcgagggtacgtttagcgagcagtacttc,TRBV27*01,CCGEGTFSEQYF,TRBJ2-7*01,115_bg,[115+NAIVE_bg]
533583,tgttgtggcgagggtacgttttacaatgagcagttcttc,TRBV27*01,CCGEGTFYNEQFF,TRBJ2-1*01,115,[115+NAIVE]
533584,tgtttcgtgatccgggtagggaacactgaagctttcttt,TRBV7-6*01,CFVIRVGNTEAFF,TRBJ1-1*01,303,[303+CM]


## Convergence analysis

In [19]:
from raptcr.neighborhood import ConvergenceAnalysis, Fisher
from raptcr.hashing import TCRDistEmbedder

In [21]:
tcr_embedder = TCRDistEmbedder(full_tcr=False).fit() # the full TCR is not needed when grouping by v_call



### Option 1: test one group



In [22]:
fisher = Fisher(
    group_column="background", # compare values in the background column
    positive_groups=[False] # compare non-background (positive group) to background (negative group)
)

### Option 2: test multiple groups at once

In [31]:
patients = data_unique["patient_id"].unique()

# e.g. compare hiv to non-hiv, AND foreground to background
fisher = Fisher(
    group_column="patient_id", # compare values in the patient_id column
    positive_groups={
        "hiv": [x for x in patients if "C" not in x],
        "foreground": [x for x in patients if "bg" not in x],
    }
)
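
To see which patient ids end up in each positive group, the two list comprehensions can be run on a handful of ids in the same format as the data (the short list below is only an illustrative subset):

```python
# Illustrative subset of patient ids in the format used above.
patients = ["111", "116", "C21", "116_bg", "C21_bg"]

positive_groups = {
    # everything without a "C" (control) prefix
    "hiv": [x for x in patients if "C" not in x],
    # everything that is not a shuffled background
    "foreground": [x for x in patients if "bg" not in x],
}
print(positive_groups)
```

Note that with these filters the "hiv" group also contains the shuffled backgrounds of the HIV patients (e.g. `116_bg`), since those ids have no "C" either. Whether that is desired depends on the comparison you have in mind.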


In [32]:
# pass the embedder and the fisher method to the ConvergenceAnalysis object:

cva = ConvergenceAnalysis(
    tcr_embedder=tcr_embedder,
    convergence_metric=fisher,
    index_method="auto", # switch to approximate nearest neighbors for larger v_call groups
    verbose=True
)

In [33]:
data_unique["to_test"] = ~data_unique["patient_id"].str.contains("_bg")

cva_res = cva.batched_fit_transform(
    data_unique,
    test_selection_column="to_test" # only compute statistics for TCRs that are not background
)

Processing V gene TRBV10-1*01 (0/48)
Finished searching neighbors (0.05 seconds).
Finished calculating convergence (0.03 seconds).
Processing V gene TRBV10-2*01 (1/48)




Finished searching neighbors (0.22 seconds).
Finished calculating convergence (0.03 seconds).
Processing V gene TRBV10-3*01 (2/48)




Finished searching neighbors (0.90 seconds).
Finished calculating convergence (0.44 seconds).
Processing V gene TRBV11-1*01 (3/48)
Finished searching neighbors (0.14 seconds).
Finished calculating convergence (0.02 seconds).
Processing V gene TRBV11-2*01 (4/48)




Finished searching neighbors (1.16 seconds).
Finished calculating convergence (0.46 seconds).
Processing V gene TRBV11-3*01 (5/48)
Finished searching neighbors (0.06 seconds).
Finished calculating convergence (0.05 seconds).
Processing V gene TRBV12-3*01 (6/48)




Finished searching neighbors (4.48 seconds).
Finished calculating convergence (1.74 seconds).
Processing V gene TRBV12-4*01 (7/48)
Finished searching neighbors (0.01 seconds).
Finished calculating convergence (0.02 seconds).
Processing V gene TRBV12-5*01 (8/48)




Finished searching neighbors (0.73 seconds).
Finished calculating convergence (0.17 seconds).
Processing V gene TRBV13*01 (9/48)
Finished searching neighbors (1.53 seconds).
Finished calculating convergence (0.66 seconds).
Processing V gene TRBV14*01 (10/48)
Finished searching neighbors (0.33 seconds).
Finished calculating convergence (0.07 seconds).
Processing V gene TRBV15*01 (11/48)
Finished searching neighbors (0.58 seconds).
Finished calculating convergence (0.10 seconds).
Processing V gene TRBV16*01 (12/48)
Finished searching neighbors (0.03 seconds).
Finished calculating convergence (0.04 seconds).
Processing V gene TRBV18*01 (13/48)




Finished searching neighbors (0.51 seconds).
Finished calculating convergence (0.10 seconds).
Processing V gene TRBV19*01 (14/48)
Training index (nlist=279) on 77914 vectors...
Exact search time (1000 vecs): 1.2698s
[1, 2, 3, 4, 7, 11, 16, 24, 37, 55, 83, 124, 186, 278]
	nprobe=1, recall@2048=0.1065, search_time=0.2004s
	nprobe=2, recall@2048=0.1951, search_time=0.1330s
	nprobe=3, recall@2048=0.2688, search_time=0.1276s
	nprobe=4, recall@2048=0.3358, search_time=0.1644s
	nprobe=7, recall@2048=0.4873, search_time=0.1921s
	nprobe=11, recall@2048=0.6255, search_time=0.1796s
	nprobe=16, recall@2048=0.7360, search_time=0.3039s
	nprobe=24, recall@2048=0.8390, search_time=0.2822s
	nprobe=37, recall@2048=0.9177, search_time=0.2571s
	nprobe=55, recall@2048=0.9625, search_time=0.2764s
	nprobe=83, recall@2048=0.9873, search_time=0.3380s
	nprobe=124, recall@2048=0.9972, search_time=0.4049s
	nprobe=186, recall@2048=0.9998, search_time=0.5308s
	nprobe=278, recall@2048=1.0000, search_time=0.6326s
Sel



Finished searching neighbors (0.92 seconds).
Finished calculating convergence (0.06 seconds).
Processing V gene TRBV6-2*01 (33/48)
Finished searching neighbors (0.67 seconds).
Finished calculating convergence (0.08 seconds).
Processing V gene TRBV6-3*01 (34/48)
Finished searching neighbors (0.00 seconds).
Finished calculating convergence (0.00 seconds).
Processing V gene TRBV6-4*01 (35/48)




Finished searching neighbors (0.22 seconds).
Finished calculating convergence (0.08 seconds).
Processing V gene TRBV6-5*01 (36/48)




Finished searching neighbors (1.37 seconds).
Finished calculating convergence (0.12 seconds).
Processing V gene TRBV6-6*01 (37/48)
Finished searching neighbors (0.77 seconds).
Finished calculating convergence (0.07 seconds).
Processing V gene TRBV6-8*01 (38/48)
Finished searching neighbors (0.00 seconds).
Finished calculating convergence (0.00 seconds).
Processing V gene TRBV6-9*01 (39/48)
Finished searching neighbors (0.01 seconds).
Finished calculating convergence (0.01 seconds).
Processing V gene TRBV7-2*01 (40/48)




Finished searching neighbors (3.14 seconds).
Finished calculating convergence (0.36 seconds).
Processing V gene TRBV7-3*01 (41/48)
Finished searching neighbors (1.34 seconds).
Finished calculating convergence (0.21 seconds).
Processing V gene TRBV7-4*01 (42/48)
Finished searching neighbors (0.00 seconds).
Finished calculating convergence (0.00 seconds).
Processing V gene TRBV7-6*01 (43/48)




Finished searching neighbors (0.31 seconds).
Finished calculating convergence (0.03 seconds).
Processing V gene TRBV7-7*01 (44/48)
Finished searching neighbors (0.01 seconds).
Finished calculating convergence (0.01 seconds).
Processing V gene TRBV7-8*01 (45/48)




Finished searching neighbors (0.53 seconds).
Finished calculating convergence (0.04 seconds).
Processing V gene TRBV7-9*01 (46/48)
Finished searching neighbors (2.37 seconds).
Finished calculating convergence (0.29 seconds).
Processing V gene TRBV9*01 (47/48)
Finished searching neighbors (4.49 seconds).
Finished calculating convergence (0.90 seconds).


### Result interpretation

- match_true, match_false: the number of highly similar clonotypes that come from the positive and negative group, respectively
- background_true, background_false: the number of non-similar (background) clonotypes that come from the positive and negative group, respectively
- pvalue: significance of the difference in the number of highly similar clonotypes between the positive and negative group, relative to the background
- convergence: effect size of that difference, expressed as a log2 fold ratio

These columns are added for each feature that was tested, prefixed with the feature name (e.g. hiv_pvalue, foreground_pvalue).
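
As a sanity check on how to read the convergence column: the values in the table below are consistent with a plain log2 odds ratio of the four counts. The sketch below recomputes it for the first result row; this is my reading of the numbers, not a statement about how raptcr computes the value internally (for example, how zero counts are handled is not shown here).

```python
import math

def log2_odds_ratio(match_true, match_false, background_true, background_false):
    """log2 of the odds ratio of the 2x2 table (no pseudocounts; zero counts are not handled)."""
    odds_match = match_true / match_false
    odds_background = background_true / background_false
    return math.log2(odds_match / odds_background)

# Counts copied from the hiv_* and foreground_* columns of the first result row below.
hiv_convergence = log2_odds_ratio(56, 2, 1576, 418)
foreground_convergence = log2_odds_ratio(41, 17, 1091, 903)
print(round(hiv_convergence, 4), round(foreground_convergence, 4))
```

Both results agree with the reported hiv_convergence (2.892662) and foreground_convergence (0.997236) to about three decimals.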

In [42]:
cva_res.sort_values("hiv_pvalue").query("hiv_convergence > 0").head(100)

Unnamed: 0,junction,v_call,junction_aa,j_call,patient_id,repertoire_id,to_test,hiv_match_true,hiv_match_false,hiv_background_true,hiv_background_false,hiv_statistic,hiv_pvalue,hiv_convergence,foreground_match_true,foreground_match_false,foreground_background_true,foreground_background_false,foreground_statistic,foreground_pvalue,foreground_convergence
149487,tgtgccagcagtttatcgcgggagtcttacgagcagtacttc,TRBV27*01,CASSLSRESYEQYF,TRBJ2-7*01,139,[139+NAIVE],True,56,2,1576,418,14.561905,0.000065,2.892662,41,17,1091,903,2.068807,1.393532e-02,0.997236
149486,tgtgccagcagtttatcgcgggagagctacgagcagtacttc,TRBV27*01,CASSLSRESYEQYF,TRBJ2-7*01,139,[139+NAIVE],True,56,2,1576,418,14.561905,0.000065,2.892662,41,17,1091,903,2.068807,1.393532e-02,0.997236
148202,tgtgccagcagtttatccagggagtcctacgagcagtacttc,TRBV27*01,CASSLSRESYEQYF,TRBJ2-7*01,139,"[139+NAIVE, 139+CM]",True,56,2,1576,418,14.561905,0.000065,2.892662,41,17,1091,903,2.068807,1.393532e-02,0.997236
154523,tgtgccagcagtttgagcagggaatcctacgagcagtacttc,TRBV27*01,CASSLSRESYEQYF,TRBJ2-7*01,139,[139+CM],True,56,2,1576,418,14.561905,0.000065,2.892662,41,17,1091,903,2.068807,1.393532e-02,0.997236
156161,tgtgccagcagtttgtccagggaaagctacgagcagtacttc,TRBV27*01,CASSLSRESYEQYF,TRBJ2-7*01,139,[139+CM],True,56,2,1576,418,14.561905,0.000065,2.892662,41,17,1091,903,2.068807,1.393532e-02,0.997236
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
91614,tgcagtgctagagatacgagaagctacggctacaccttc,TRBV20-1*01,CSARDTRSYGYTF,TRBJ1-2*01,140,"[140+EM, 140+NC]",True,36,3,1524,489,5.607354,0.005517,1.945006,36,3,1007,1006,17.482604,9.089799e-09,3.583529
91616,tgcagtgctagagatacgaggagctatggctacaccttc,TRBV20-1*01,CSARDTRSYGYTF,TRBJ1-2*01,140,[140+EM],True,36,3,1524,489,5.607354,0.005517,1.945006,36,3,1007,1006,17.482604,9.089799e-09,3.583529
97944,tgcagtgctagggacaccagaagctatggctacaccttc,TRBV20-1*01,CSARDTRSYGYTF,TRBJ1-2*01,140,"[140+EM, 140+NC]",True,36,3,1524,489,5.607354,0.005517,1.945006,36,3,1007,1006,17.482604,9.089799e-09,3.583529
91625,tgcagtgctagagatacgcgctcctatggctacaccttc,TRBV20-1*01,CSARDTRSYGYTF,TRBJ1-2*01,117,[117+ACT],True,36,3,1524,489,5.607354,0.005517,1.945006,36,3,1007,1006,17.482604,9.089799e-09,3.583529
