# Usage

## Basic usage

TCRconvert takes a pandas DataFrame with at least one column of gene names as input and returns a pandas DataFrame with the converted gene names.

**Load some 10X data.**

In [20]:
# Use importlib for this example notebook. Not necessary for normal usage.
from importlib.resources import files

In [19]:
import tcrconvert
import pandas as pd

tcr_file = files('tcrconvert') / 'data/examples/example_10x.csv'

tcrs = pd.read_csv(tcr_file)[['barcode', 'v_gene', 'd_gene', 'j_gene', 'c_gene', 'cdr3']]
tcrs

| | barcode | v_gene | d_gene | j_gene | c_gene | cdr3 |
|---|---|---|---|---|---|---|
| 0 | AAACCTGAGACCACGA-1 | TRAV1-2 | TRBD1 | TRAJ12 | TRAC | CAVMDSSYKLIF |
| 1 | AAACCTGAGACCACGA-1 | TRBV6-1 | TRBD2 | TRBJ2-1 | TRBC2 | CASSGLAGGYNEQFF |
| 2 | AAACCTGAGGCTCTTA-1 | TRBV6-4 | TRBD2 | TRBJ2-3 | TRBC2 | CASSGVAGGTDTQYF |
| 3 | AAACCTGAGGCTCTTA-1 | TRAV1-2 | TRBD1 | TRAJ33 | TRAC | CAVKDSNYQLIW |
| 4 | AAACCTGAGTGAACGC-1 | TRBV2 | TRBD1 | TRBJ1-2 | TRBC1 | CASNQGLNYGYTF |


**Convert gene names from 10X to Adaptive...**

In [4]:
new_tcrs = tcrconvert.convert_gene(tcrs, frm='tenx', to='adaptive')
new_tcrs



| | barcode | v_gene | d_gene | j_gene | c_gene | cdr3 |
|---|---|---|---|---|---|---|
| 0 | AAACCTGAGACCACGA-1 | TCRAV01-02*01 | TCRBD01-01*01 | TCRAJ12-01*01 | | CAVMDSSYKLIF |
| 1 | AAACCTGAGACCACGA-1 | TCRBV06-01*01 | TCRBD02-01*01 | TCRBJ02-01*01 | | CASSGLAGGYNEQFF |
| 2 | AAACCTGAGGCTCTTA-1 | TCRBV06-04*01 | TCRBD02-01*01 | TCRBJ02-03*01 | | CASSGVAGGTDTQYF |
| 3 | AAACCTGAGGCTCTTA-1 | TCRAV01-02*01 | TCRBD01-01*01 | TCRAJ33-01*01 | | CAVKDSNYQLIW |
| 4 | AAACCTGAGTGAACGC-1 | TCRBV02-01*01 | TCRBD01-01*01 | TCRBJ01-02*01 | | CASNQGLNYGYTF |


**...or to IMGT.**

In [5]:
new_tcrs = tcrconvert.convert_gene(tcrs, frm='tenx', to='imgt')
new_tcrs



| | barcode | v_gene | d_gene | j_gene | c_gene | cdr3 |
|---|---|---|---|---|---|---|
| 0 | AAACCTGAGACCACGA-1 | TRAV1-2*01 | TRBD1*01 | TRAJ12*01 | TRAC*01 | CAVMDSSYKLIF |
| 1 | AAACCTGAGACCACGA-1 | TRBV6-1*01 | TRBD2*01 | TRBJ2-1*01 | TRBC2*01 | CASSGLAGGYNEQFF |
| 2 | AAACCTGAGGCTCTTA-1 | TRBV6-4*01 | TRBD2*01 | TRBJ2-3*01 | TRBC2*01 | CASSGVAGGTDTQYF |
| 3 | AAACCTGAGGCTCTTA-1 | TRAV1-2*01 | TRBD1*01 | TRAJ33*01 | TRAC*01 | CAVKDSNYQLIW |
| 4 | AAACCTGAGTGAACGC-1 | TRBV2*01 | TRBD1*01 | TRBJ1-2*01 | TRBC1*01 | CASNQGLNYGYTF |


> Tip: Suppress warnings by setting `quiet=True`.

**Convert back to 10X to see that no data is lost.**

In [6]:
back_tcrs = tcrconvert.convert_gene(new_tcrs, frm='imgt', to='tenx')
back_tcrs.equals(tcrs)

True

## Custom column names

TCRconvert uses the gene column names below based on the `frm` parameter. Note that there are no standard IMGT column names and that Adaptive does not capture C genes.

* `frm='imgt'` : `['v_gene', 'd_gene', 'j_gene', 'c_gene']`
* `frm='tenx'` : `['v_gene', 'd_gene', 'j_gene', 'c_gene']`
* `frm='adaptive'` : `['v_resolved', 'd_resolved', 'j_resolved']`
* `frm='adaptivev2'` : `['vMaxResolved', 'dMaxResolved', 'jMaxResolved']`

At least one of the assumed columns needs to be in the input data. You can use your own columns with the `frm_cols` parameter.
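
A quick way to check this before converting is to compare your DataFrame's columns against the defaults listed above. The sketch below is not part of TCRconvert: `DEFAULT_COLS` and `present_gene_cols` are hypothetical names, and the dictionary only restates the list above.

```python
import pandas as pd

# Default gene columns per input format, copied from the list above.
# DEFAULT_COLS and present_gene_cols are illustrative names, not TCRconvert API.
DEFAULT_COLS = {
    'imgt': ['v_gene', 'd_gene', 'j_gene', 'c_gene'],
    'tenx': ['v_gene', 'd_gene', 'j_gene', 'c_gene'],
    'adaptive': ['v_resolved', 'd_resolved', 'j_resolved'],
    'adaptivev2': ['vMaxResolved', 'dMaxResolved', 'jMaxResolved'],
}

def present_gene_cols(df, frm):
    """Return the default gene columns for `frm` that exist in `df`."""
    return [col for col in DEFAULT_COLS[frm] if col in df.columns]

# Made-up one-row table with a single 10X-style gene column.
example = pd.DataFrame({'v_gene': ['TRAV1-2'], 'cdr3': ['CAVMDSSYKLIF']})

found = present_gene_cols(example, 'tenx')            # at least one default column present
missing_all = present_gene_cols(example, 'adaptive')  # no Adaptive default columns present
```

An empty result suggests that either `frm` does not match the data or the columns need to be named explicitly with `frm_cols`.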

If you're using AIRR-formatted files, use: `frm_cols=['v_call', 'd_call', 'j_call', 'c_call']`

> Tip: If your Adaptive data doesn't have `x_resolved` or `xMaxResolved` columns, make them yourself by combining text from the gene and allele columns using `*` as a separator.
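
One way to build such a column with pandas is sketched below. The input column names (`v_gene`, `v_allele`), the example values, and the helper `combine_gene_allele` are assumptions for illustration; substitute whatever your Adaptive export actually uses.

```python
import pandas as pd

# Hypothetical Adaptive-style input: column names and values are made up for illustration.
adaptive = pd.DataFrame({
    'v_gene': ['TCRBV06-01', 'TCRBV02-01'],
    'v_allele': ['01', '01'],
})

def combine_gene_allele(df, gene_col, allele_col, sep='*'):
    """Join a gene column and an allele column as '<gene><sep><allele>'."""
    # The nullable string dtype keeps missing values missing instead of producing 'nan'.
    gene = df[gene_col].astype('string')
    allele = df[allele_col].astype('string')
    return gene + sep + allele

# Create the column that frm='adaptive' looks for by default.
adaptive['v_resolved'] = combine_gene_allele(adaptive, 'v_gene', 'v_allele')
```

If you load the file with `pd.read_csv`, consider reading the allele columns as text (for example with `dtype=str`) so that values such as `01` keep their leading zero.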

**Load 10X data with custom column names:**

In [8]:
custom_file = files('tcrconvert') / 'data/examples/example_columns.csv'

custom = pd.read_csv(custom_file)
custom

| | myVgene | myDgene | myJgene | myCgene | myCDR3 | antigen |
|---|---|---|---|---|---|---|
| 0 | TRAV1-2 | TRBD1 | TRAJ12 | TRAC | CAVMDSSYKLIF | Flu |
| 1 | TRBV6-1 | TRBD2 | TRBJ2-1 | TRBC2 | CASSGLAGGYNEQFF | Flu |
| 2 | TRBV6-4 | TRBD2 | TRBJ2-3 | TRBC2 | CASSGVAGGTDTQYF | CMV |
| 3 | TRAV1-2 | TRBD1 | TRAJ33 | TRAC | CAVKDSNYQLIW | CMV |
| 4 | TRBV2 | TRBD1 | TRBJ1-2 | TRBC1 | CASNQGLNYGYTF | CMV |


**Specify names using** `frm_cols` **and convert to IMGT**:

In [14]:
custom_new = tcrconvert.convert_gene(custom, frm='tenx', to='imgt', quiet=True,
                                     frm_cols=['myVgene', 'myDgene', 'myJgene', 'myCgene'])
custom_new

| | myVgene | myDgene | myJgene | myCgene | myCDR3 | antigen |
|---|---|---|---|---|---|---|
| 0 | TRAV1-2*01 | TRBD1*01 | TRAJ12*01 | TRAC*01 | CAVMDSSYKLIF | Flu |
| 1 | TRBV6-1*01 | TRBD2*01 | TRBJ2-1*01 | TRBC2*01 | CASSGLAGGYNEQFF | Flu |
| 2 | TRBV6-4*01 | TRBD2*01 | TRBJ2-3*01 | TRBC2*01 | CASSGVAGGTDTQYF | CMV |
| 3 | TRAV1-2*01 | TRBD1*01 | TRAJ33*01 | TRAC*01 | CAVKDSNYQLIW | CMV |
| 4 | TRBV2*01 | TRBD1*01 | TRBJ1-2*01 | TRBC1*01 | CASNQGLNYGYTF | CMV |


## Rhesus or mouse data

**Specify the species if not human:**

In [15]:
new_tcrs = tcrconvert.convert_gene(tcrs, frm='tenx', to='imgt', quiet=True,
                                   species='rhesus')  # or 'mouse'

## Using a custom reference

**You may want to create a reference for a species that isn't already included**, such as rabbit. To do so, you'll need FASTA files that contain TCR gene names in the headers in this format:

```
>SomeText|TRBV10-1*02|MoreText|...
```
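
To sanity-check your FASTA headers against this format, a small helper like the one below may be useful. It assumes only what the example header shows, namely that the gene name is the second `|`-delimited field. `gene_from_header` is a hypothetical name, and this is not necessarily how `build_lookup_from_fastas()` parses headers internally.

```python
def gene_from_header(header):
    """Return the second '|'-delimited field of a FASTA header line.

    Illustrative helper, not part of TCRconvert.
    """
    fields = header.strip().lstrip('>').split('|')
    if len(fields) < 2:
        raise ValueError(f'No gene name field in header: {header!r}')
    return fields[1]

# Header taken from the format example above.
name = gene_from_header('>SomeText|TRBV10-1*02|MoreText|...')
```

Running this over every line that starts with `>` in a downloaded file is a cheap way to spot headers without a gene name field before building the lookup tables.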

1. The simplest way to do this is to download reference FASTAs for every gene group from [IMGT](https://www.imgt.org/vquest/refseqh.html).

2. Save the FASTAs to a new species folder under `tcrconvert/data/` (e.g., `tcrconvert/data/rabbit/`).

3. Then, run `build_lookup_from_fastas()` from any location to create the lookup tables in that folder:

```python
import tcrconvert

tcrconvert.build_lookup_from_fastas('path/to/tcrconvert/tcrconvert/data/rabbit/')
```

4. Re-install TCRconvert:

```
$ pip install .
```

5. When using `convert_gene()`, pass the name of the new folder as `species`, e.g., `species='rabbit'`.

**If you just want to add some gene(s) to the existing lookup tables**, you can edit the CSV files under `tcrconvert/data/` and then re-install TCRconvert. This is a bit hacky: if you later `git pull`, your changes may be overwritten or cause merge conflicts.