# Usage (library)

## Basic usage

TCRconvert takes a Pandas DataFrame with at least one column of gene names as input. It produces a Pandas DataFrame with converted gene names as output.

**Load some 10X data.**

```python
import tcrconvert
import pandas as pd

tcr_file = tcrconvert.get_example_path('tenx.csv')

tcrs = pd.read_csv(tcr_file)[
    ['barcode', 'v_gene', 'd_gene', 'j_gene', 'c_gene', 'cdr3']
]
tcrs
```

**Convert gene names from 10X to Adaptive...**

```python
new_tcrs = tcrconvert.convert_gene(tcrs, frm='tenx', to='adaptive')
new_tcrs
```

**...or to IMGT.**

```python
new_tcrs = tcrconvert.convert_gene(tcrs, frm='tenx', to='imgt')
new_tcrs
```

> Tip: Suppress INFO-level messages by setting `verbose=False`.

**Convert back to 10X to see that no data is lost.**

```python
back_tcrs = tcrconvert.convert_gene(new_tcrs, frm='imgt', to='tenx')
back_tcrs.equals(tcrs)
```
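
If `equals` returns `False`, it can help to see which cells changed. The sketch below uses pandas' `DataFrame.compare` on two small made-up frames. The gene names and the simulated mismatch are illustrative only, not TCRconvert output. With real data you could compare `tcrs` and `back_tcrs` the same way, assuming both share the same row and column labels.

```python
import pandas as pd

# Toy stand-ins for `tcrs` and `back_tcrs`; values are illustrative only.
before = pd.DataFrame({
    'v_gene': ['TRBV10-1', 'TRBV5-1'],
    'j_gene': ['TRBJ2-1', 'TRBJ1-1'],
})
after = before.copy()
after.loc[1, 'v_gene'] = 'TRBV5-4'  # simulate one gene name that did not round-trip

# equals() only reports whether the two frames match as a whole.
same = before.equals(after)

# compare() keeps only the differing cells, labeled 'self' (before) and 'other' (after).
diff = before.compare(after)

print(same)
print(diff)
```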

## Custom column names

By default, TCRconvert looks for the gene columns listed below, chosen according to the `frm` parameter. Note that there are no standard IMGT column names and that Adaptive does not capture C genes.

* `frm='imgt'` uses `['v_gene', 'd_gene', 'j_gene', 'c_gene']`
* `frm='tenx'` uses `['v_gene', 'd_gene', 'j_gene', 'c_gene']`
* `frm='adaptive'` uses `['v_resolved', 'd_resolved', 'j_resolved']`
* `frm='adaptivev2'` uses `['vMaxResolved', 'dMaxResolved', 'jMaxResolved']`

At least one of these default columns must be present in the input data. To use different column names, pass them with the `frm_cols` parameter.

If you're using AIRR-formatted files, use: `frm_cols=['v_call', 'd_call', 'j_call', 'c_call']`
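
Before converting, it can be useful to check which of the expected gene columns your DataFrame actually contains. The following is a minimal sketch: `DEFAULT_GENE_COLS` and `present_gene_cols` are hypothetical names that simply restate the defaults listed above and are not part of TCRconvert. The `frm` value is used here only as a lookup key, and the AIRR-style rows are made up.

```python
import pandas as pd

# Default gene columns per input format, copied from the list above.
# Hypothetical constant for this sketch; not a TCRconvert object.
DEFAULT_GENE_COLS = {
    'imgt': ['v_gene', 'd_gene', 'j_gene', 'c_gene'],
    'tenx': ['v_gene', 'd_gene', 'j_gene', 'c_gene'],
    'adaptive': ['v_resolved', 'd_resolved', 'j_resolved'],
    'adaptivev2': ['vMaxResolved', 'dMaxResolved', 'jMaxResolved'],
}


def present_gene_cols(df, frm, frm_cols=None):
    """Return the expected gene columns that exist in df, in expected order."""
    expected = frm_cols if frm_cols else DEFAULT_GENE_COLS[frm]
    return [col for col in expected if col in df.columns]


# Made-up AIRR-style data with only V and J calls.
airr = pd.DataFrame({
    'v_call': ['TRBV10-1*02'],
    'j_call': ['TRBJ2-1*01'],
    'junction_aa': ['CASSLGQGYEQYF'],
})

airr_cols = ['v_call', 'd_call', 'j_call', 'c_call']
found_default = present_gene_cols(airr, 'imgt')
found_custom = present_gene_cols(airr, 'imgt', frm_cols=airr_cols)

print(found_default)  # none of the default columns are present
print(found_custom)   # the AIRR columns that are present
```

An empty result for the defaults is a sign that `frm_cols` needs to be supplied.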

> Tip: If your Adaptive data doesn't have `x_resolved` or `xMaxResolved` columns, simply make them yourself by combining text from the gene and allele columns using `*` as a separator.
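
One way to build those columns with pandas is sketched below. The input column names (`v_gene`, `v_allele`, and so on) and the values are assumptions for illustration, so adjust them to match your own export.

```python
import pandas as pd

# Hypothetical Adaptive-style export with separate gene and allele columns.
# Read allele columns as strings (e.g. dtype=str in read_csv) so that
# '01' is not parsed as the number 1.
adaptive = pd.DataFrame({
    'v_gene': ['TCRBV10-01', 'TCRBV05-01'],
    'v_allele': ['02', '01'],
    'j_gene': ['TCRBJ02-01', 'TCRBJ01-01'],
    'j_allele': ['01', '01'],
})

# Join gene and allele with '*' to create the x_resolved columns.
for segment in ['v', 'j']:
    adaptive[f'{segment}_resolved'] = (
        adaptive[f'{segment}_gene'] + '*' + adaptive[f'{segment}_allele']
    )

print(adaptive[['v_resolved', 'j_resolved']])
```

With this concatenation, rows where the gene or the allele is missing end up missing in the result, so you may want to inspect those rows separately.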

**Load 10X data with custom column names:**

```python
custom_file = tcrconvert.get_example_path('customcols.csv')

custom = pd.read_csv(custom_file)
custom
```

**Specify names using `frm_cols` and convert to IMGT:**

```python
custom_new = tcrconvert.convert_gene(
    custom,
    frm='tenx',
    to='imgt',
    verbose=False,
    frm_cols=['myVgene', 'myDgene', 'myJgene', 'myCgene'],
)
custom_new
```

## Rhesus or mouse data

**Specify the species if not human:**

```python
new_tcrs = tcrconvert.convert_gene(
    tcrs, frm='tenx', to='imgt', verbose=False, species='rhesus'
)  # or 'mouse'
```

## Using a custom reference

**You may want to create a reference for a species that isn't already included**, such as rabbit. To do so, you'll need FASTA files that contain TCR gene names in the headers in this format:

```
>SomeText|TRBV10-1*02|MoreText|...
```
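
To make the header layout concrete, the sketch below pulls out the second `|`-delimited field, which is where the gene name sits in the format shown above. `gene_from_header` is a hypothetical helper written for illustration. It is not necessarily how TCRconvert parses headers, but it can serve as a quick sanity check on your own FASTA files.

```python
def gene_from_header(header):
    """Return the second '|'-delimited field of a FASTA header line."""
    fields = header.lstrip('>').strip().split('|')
    if len(fields) < 2 or not fields[1]:
        raise ValueError(f'No gene name field in header: {header!r}')
    return fields[1]


header = '>SomeText|TRBV10-1*02|MoreText|...'
gene = gene_from_header(header)
print(gene)  # TRBV10-1*02
```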

1. The easiest way is to download the reference FASTAs for every gene group from [IMGT](https://www.imgt.org/vquest/refseqh.html) into a folder.

2. Build the lookup tables, specifying the species name you'll use when running TCRconvert:

```python
import tcrconvert

# Second argument is the species name to pass later as `species=`
tcrconvert.build_lookup_from_fastas('path/to/fasta/dir/', 'rabbit')
```