# Basic usage

**Load some 10X data**

In [2]:
import tcrconvert
import pandas as pd
from importlib.resources import files

tcr_file = files('tcrconvert') / 'examples/example_10x.csv'

tcrs = pd.read_csv(tcr_file)[['barcode', 'v_gene', 'd_gene', 'j_gene', 'c_gene', 'cdr3']]
tcrs

| | barcode | v_gene | d_gene | j_gene | c_gene | cdr3 |
| --- | --- | --- | --- | --- | --- | --- |
| 0 | AAACCTGAGACCACGA-1 | TRAV1-2 | TRBD1 | TRAJ12 | TRAC | CAVMDSSYKLIF |
| 1 | AAACCTGAGACCACGA-1 | TRBV6-1 | TRBD2 | TRBJ2-1 | TRBC2 | CASSGLAGGYNEQFF |
| 2 | AAACCTGAGGCTCTTA-1 | TRBV6-4 | TRBD2 | TRBJ2-3 | TRBC2 | CASSGVAGGTDTQYF |
| 3 | AAACCTGAGGCTCTTA-1 | TRAV1-2 | TRBD1 | TRAJ33 | TRAC | CAVKDSNYQLIW |
| 4 | AAACCTGAGTGAACGC-1 | TRBV2 | TRBD1 | TRBJ1-2 | TRBC1 | CASNQGLNYGYTF |


**Convert gene names from the 10X format to the Adaptive format**

In [3]:
new_tcrs = tcrconvert.convert_gene(tcrs, frm='tenx', to='adaptive')
new_tcrs



| | barcode | v_gene | d_gene | j_gene | c_gene | cdr3 |
| --- | --- | --- | --- | --- | --- | --- |
| 0 | AAACCTGAGACCACGA-1 | TCRAV01-02*01 | TCRBD01-01*01 | TCRAJ12-01*01 | | CAVMDSSYKLIF |
| 1 | AAACCTGAGACCACGA-1 | TCRBV06-01*01 | TCRBD02-01*01 | TCRBJ02-01*01 | | CASSGLAGGYNEQFF |
| 2 | AAACCTGAGGCTCTTA-1 | TCRBV06-04*01 | TCRBD02-01*01 | TCRBJ02-03*01 | | CASSGVAGGTDTQYF |
| 3 | AAACCTGAGGCTCTTA-1 | TCRAV01-02*01 | TCRBD01-01*01 | TCRAJ33-01*01 | | CAVKDSNYQLIW |
| 4 | AAACCTGAGTGAACGC-1 | TCRBV02-01*01 | TCRBD01-01*01 | TCRBJ01-02*01 | | CASNQGLNYGYTF |
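
In the Adaptive output above, every `c_gene` value comes back empty. A minimal sketch for listing which input values ended up missing after a conversion is shown here. The helper `lost_values` is hypothetical and not part of TCRconvert, and it assumes the converted DataFrame keeps the same row index as the input. The two frames are hand-made stand-ins for `tcrs` and `new_tcrs`.

```python
import pandas as pd

def lost_values(before, after, cols):
    """Return, per column, the input values that are missing after conversion."""
    lost = {}
    for col in cols:
        # Rows that held a value before conversion but are missing afterwards
        mask = before[col].notna() & after[col].isna()
        lost[col] = sorted(before.loc[mask, col].unique())
    return lost

# Hand-made stand-ins for the input and the converted result (assumption)
before = pd.DataFrame({'v_gene': ['TRAV1-2', 'TRBV6-1'],
                       'c_gene': ['TRAC', 'TRBC2']})
after = pd.DataFrame({'v_gene': ['TCRAV01-02*01', 'TCRBV06-01*01'],
                      'c_gene': [None, None]})

print(lost_values(before, after, ['v_gene', 'c_gene']))
# {'v_gene': [], 'c_gene': ['TRAC', 'TRBC2']}
```

With real data the call would look like `lost_values(tcrs, new_tcrs, ['v_gene', 'd_gene', 'j_gene', 'c_gene'])`.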


**Or convert to the IMGT format**

In [4]:
new_tcrs = tcrconvert.convert_gene(tcrs, frm='tenx', to='imgt')
new_tcrs



| | barcode | v_gene | d_gene | j_gene | c_gene | cdr3 |
| --- | --- | --- | --- | --- | --- | --- |
| 0 | AAACCTGAGACCACGA-1 | TRAV1-2*01 | TRBD1*01 | TRAJ12*01 | TRAC*01 | CAVMDSSYKLIF |
| 1 | AAACCTGAGACCACGA-1 | TRBV6-1*01 | TRBD2*01 | TRBJ2-1*01 | TRBC2*01 | CASSGLAGGYNEQFF |
| 2 | AAACCTGAGGCTCTTA-1 | TRBV6-4*01 | TRBD2*01 | TRBJ2-3*01 | TRBC2*01 | CASSGVAGGTDTQYF |
| 3 | AAACCTGAGGCTCTTA-1 | TRAV1-2*01 | TRBD1*01 | TRAJ33*01 | TRAC*01 | CAVKDSNYQLIW |
| 4 | AAACCTGAGTGAACGC-1 | TRBV2*01 | TRBD1*01 | TRBJ1-2*01 | TRBC1*01 | CASNQGLNYGYTF |


**Convert from IMGT back to 10X and see that no data is lost**

In [5]:
back_tcrs = tcrconvert.convert_gene(new_tcrs, frm='imgt', to='tenx')

back_tcrs.equals(tcrs)

No column names provided for IMGT data, will assume 10X column names:
('v_gene', 'd_gene', 'j_gene', 'c_gene')


True
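
`DataFrame.equals` only answers yes or no. If a round trip ever returned `False`, one way to see where the frames differ is `pandas.testing.assert_frame_equal`, which raises an `AssertionError` describing the mismatch. The sketch below wraps it in a hypothetical helper, `report_round_trip`, and uses small hand-made frames instead of TCRconvert output.

```python
import pandas as pd
from pandas.testing import assert_frame_equal

def report_round_trip(original, restored):
    """Return None if the frames match, otherwise a message describing the mismatch."""
    try:
        assert_frame_equal(original, restored)
    except AssertionError as err:
        return str(err)
    return None

# Hand-made frames for illustration only
original = pd.DataFrame({'v_gene': ['TRAV1-2', 'TRBV2']})
changed = pd.DataFrame({'v_gene': ['TRAV1-2', 'TRBV6-1']})

print(report_round_trip(original, original.copy()))  # None
print(report_round_trip(original, changed))          # message naming the differing column
```

With the data from this section the call would be `report_round_trip(tcrs, back_tcrs)`.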

# Usage: Custom column names

TCRconvert assumes the column names for V, D, J, and C genes based on the input format.

`'tenx'`

* `v_gene`
* `d_gene`
* `j_gene`
* `c_gene`

Adaptive does not capture C genes, so its formats have no C gene column:

`'adaptive'`

* `v_resolved`
* `d_resolved`
* `j_resolved`

`'adaptivev2'`

* `vMaxResolved`
* `dMaxResolved`
* `jMaxResolved`

IMGT does not define standard column names, so TCRconvert assumes the 10X names:

`'imgt'`

* `v_gene`
* `d_gene`
* `j_gene`
* `c_gene`
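
The defaults listed above can be written down as a small lookup table to check a file before converting it. This is only a sketch: `DEFAULT_COLS` and `missing_default_cols` are hypothetical names, not part of TCRconvert, and whether TCRconvert itself tolerates an absent column is not covered here.

```python
# Default gene columns per input format, copied from the lists above
DEFAULT_COLS = {
    'tenx': ['v_gene', 'd_gene', 'j_gene', 'c_gene'],
    'adaptive': ['v_resolved', 'd_resolved', 'j_resolved'],
    'adaptivev2': ['vMaxResolved', 'dMaxResolved', 'jMaxResolved'],
    'imgt': ['v_gene', 'd_gene', 'j_gene', 'c_gene'],
}

def missing_default_cols(columns, frm):
    """List the default columns for format `frm` that are absent from `columns`."""
    present = set(columns)
    return [col for col in DEFAULT_COLS[frm] if col not in present]

# A 10X-style header that lacks two of the default gene columns
print(missing_default_cols(['barcode', 'v_gene', 'j_gene', 'cdr3'], 'tenx'))
# ['d_gene', 'c_gene']
```

For a pandas DataFrame `df`, pass `df.columns` as the first argument. A non-empty result suggests the data uses custom column names.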

> *What if my Adaptive data doesn't have x_resolved or xMaxResolved columns?*
> 
> Simply make them yourself by combining the gene and allele columns using `*` as a separator. Then proceed with TCRconvert.
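
A minimal sketch of building the resolved columns with pandas is shown below. The input column names (`v_gene`, `v_allele`, and so on) are assumptions for illustration, so substitute whatever your Adaptive export actually uses.

```python
import pandas as pd

# Hypothetical export with separate gene and allele columns (column names are assumptions)
adaptive = pd.DataFrame({
    'v_gene': ['TCRBV06-01', 'TCRBV02-01'],
    'v_allele': ['01', '01'],
    'd_gene': ['TCRBD02-01', 'TCRBD01-01'],
    'd_allele': ['01', '01'],
    'j_gene': ['TCRBJ02-01', 'TCRBJ01-02'],
    'j_allele': ['01', '01'],
})

for seg in ['v', 'd', 'j']:
    # Join gene and allele with '*'; a missing value in either column gives a missing result
    adaptive[f'{seg}_resolved'] = adaptive[f'{seg}_gene'].str.cat(
        adaptive[f'{seg}_allele'], sep='*')

print(adaptive[['v_resolved', 'd_resolved', 'j_resolved']])
```

When loading such a file, reading the allele columns as strings (for example `pd.read_csv(path, dtype=str)`) keeps a value like `01` from being parsed as the number `1`.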

**Sometimes you have your own column names**

In [10]:
custom_file = files('tcrconvert') / 'examples/example_columns.csv'

custom = pd.read_csv(custom_file)
custom

| | myVgene | myDgene | myJgene | myCgene | myCDR3 | antigen |
| --- | --- | --- | --- | --- | --- | --- |
| 0 | TRAV1-2 | TRBD1 | TRAJ12 | TRAC | CAVMDSSYKLIF | Flu |
| 1 | TRBV6-1 | TRBD2 | TRBJ2-1 | TRBC2 | CASSGLAGGYNEQFF | Flu |
| 2 | TRBV6-4 | TRBD2 | TRBJ2-3 | TRBC2 | CASSGVAGGTDTQYF | CMV |
| 3 | TRAV1-2 | TRBD1 | TRAJ33 | TRAC | CAVKDSNYQLIW | CMV |
| 4 | TRBV2 | TRBD1 | TRBJ1-2 | TRBC1 | CASNQGLNYGYTF | CMV |


**Specify the V, D, J, and C column names with** `frm_cols` 

In [14]:
custom_new = tcrconvert.convert_gene(custom, frm='tenx', to='imgt',
                                     frm_cols=['myVgene', 'myDgene', 'myJgene', 'myCgene'])
custom_new

 ['myVgene', 'myDgene', 'myJgene', 'myCgene']


| | myVgene | myDgene | myJgene | myCgene | myCDR3 | antigen |
| --- | --- | --- | --- | --- | --- | --- |
| 0 | TRAV1-2*01 | TRBD1*01 | TRAJ12*01 | TRAC*01 | CAVMDSSYKLIF | Flu |
| 1 | TRBV6-1*01 | TRBD2*01 | TRBJ2-1*01 | TRBC2*01 | CASSGLAGGYNEQFF | Flu |
| 2 | TRBV6-4*01 | TRBD2*01 | TRBJ2-3*01 | TRBC2*01 | CASSGVAGGTDTQYF | CMV |
| 3 | TRAV1-2*01 | TRBD1*01 | TRAJ33*01 | TRAC*01 | CAVKDSNYQLIW | CMV |
| 4 | TRBV2*01 | TRBD1*01 | TRBJ1-2*01 | TRBC1*01 | CASNQGLNYGYTF | CMV |


# Usage: Rhesus or mouse

**Specify that the species is rhesus macaque or mouse if needed**

In [None]:
new_tcrs = tcrconvert.convert_gene(tcrs, frm='tenx', to='adaptive',
                                   species='rhesus')  # or 'mouse'

 {'TRBV2'}
