# Pygor

[Pygor3](https://github.com/statbiophys/pygor3) is a Python 3 framework to analyze, visualize, generate and infer V(D)J recombination models using a modified (forked) version of [IGoR](https://github.com/statbiophys/IGoR).

Pygor3 is part of the [Statistical Biophysics Consortium @ ENS](https://github.com/statbiophys).

For further details, check out the [documentation](https://pygor3.readthedocs.io/en/latest/); to report or browse problems, see the [issues](https://github.com/statbiophys/pygor3/issues) page.

# Run IGoR using pygor3

Consider the following human TRB DNA sequence:

In [None]:
str_seq = "TTGAAATGTGAACAACATCTGGGTCATAACGCTATGTATTGGTACAAGCAAAGTGCTAAGAAGCCACTGGAGCTCATGTTTGTCTACAGTCTTGAAGAACGGGTTGAAAACAACAGTGTGCCAAGTCGCTTCTCACCTGAATGCCCCAACAGCTCTCACTTATCCCTTCACCTACACACCCTGCAGCCAGAAGACTCGGCCCTGTATCTCTGCGCCAGCAGCCGTAGGCGAGCGCGGCGGGGAGCTGTTTTTTGGAGAAGGCTCTAGGCTGACCGTACTGG"
str_seq
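Before passing a read on for analysis, a quick sanity check that it contains only unambiguous nucleotides can be useful. The following is a minimal sketch in plain Python; is_unambiguous_dna is a hypothetical helper written for this illustration and is not part of pygor3 or IGoR.

```python
def is_unambiguous_dna(seq):
    """Return True if seq is non-empty and contains only A, C, G, T (case-insensitive)."""
    return len(seq) > 0 and set(seq.upper()) <= set("ACGT")


# First 30 nucleotides of the TRB read above, used here to keep the example short.
read_prefix = "TTGAAATGTGAACAACATCTGGGTCATAAC"

print(is_unambiguous_dna(read_prefix))   # read made only of A, C, G, T
print(is_unambiguous_dna("TTGANATG"))    # contains an ambiguous base (N)
```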

To determine which germline genes were chosen to produce this recombined sequence, we can use IGoR.

First, import the pygor3 package:

In [None]:
import pygor3 as p3

IGoR ships with default models, and pygor3 uses them by default; this can be changed in pygor3's configuration file. Here we load the default model for the human TCR beta chain:

In [None]:
species="human"
chain="tcr_beta"
mdl = p3.IgorModel.load_default(species, chain)

After loading, the IgorModel object (mdl) encapsulates all the necessary information about a recombination model:

In [None]:
mdl

In [None]:
mdl.genomic_dataframe_dict['J']

This genomic data will be used to align and assemble the sequence.
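To make the idea of assembly concrete, the sketch below builds a recombined sequence from germline segments, given the deletions and insertions of one scenario. It is a toy illustration under simplifying assumptions (no palindromic insertions, made-up germline strings); the function assemble and its parameter names are hypothetical and do not correspond to pygor3's API.

```python
def assemble(v, d, j, v_3_del, d_5_del, d_3_del, j_5_del, vd_ins, dj_ins):
    """Assemble a toy V(D)J sequence from germline segments.

    Deletions are non-negative numbers of nucleotides trimmed from the
    indicated gene end; vd_ins and dj_ins are the inserted nucleotide strings.
    """
    v_part = v[:len(v) - v_3_del]           # trim the 3' end of V
    d_part = d[d_5_del:len(d) - d_3_del]    # trim both ends of D
    j_part = j[j_5_del:]                    # trim the 5' end of J
    return v_part + vd_ins + d_part + dj_ins + j_part


# Made-up germline segments, chosen only so each part is easy to spot.
toy_seq = assemble(
    v="AAAACCCC", d="GGGGGG", j="TTTTAAAA",
    v_3_del=2, d_5_del=1, d_3_del=2, j_5_del=3,
    vd_ins="AT", dj_ins="C",
)
print(toy_seq)
```

Evaluating a read amounts to the reverse problem: finding which combinations of gene choices, deletions and insertions could have produced it, and how probable each one is under the model.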

## Evaluate Sequences

To determine the scenarios, we use IGoR's evaluation process.

In [None]:
str_seq = "TTGAAATGTGAACAACATCTGGGTCATAACGCTATGTATTGGTACAAGCAAAGTGCTAAGAAGCCACTGGAGCTCATGTTTGTCTACAGTCTTGAAGAACGGGTTGAAAACAACAGTGTGCCAAGTCGCTTCTCACCTGAATGCCCCAACAGCTCTCACTTATCCCTTCACCTACACACCCTGCAGCCAGAAGACTCGGCCCTGTATCTCTGCGCCAGCAGCCGTAGGCGAGCGCGGCGGGGAGCTGTTTTTTGGAGAAGGCTCTAGGCTGACCGTACTGG"
str_seq

In [None]:
help(p3.evaluate)

In [None]:
df_scens = p3.evaluate(str_seq, mdl, N_scenarios=1)

In [None]:
df_scens

In [None]:
mdl.parms['v_choice']

### Visualize a scenario

In [None]:
ps_scenario = df_scens.iloc[0]
ps_scenario
mdl.plot_scenario(ps_scenario)

In [None]:
mdl.plot_scenario(ps_scenario, nt_lim=(50, 80))

In [None]:
str_seq_small = str_seq[150:]
df_scens, df_V_offsets = p3.evaluate(str_seq_small, mdl, N_scenarios=10, b_V_offset=True)

In [None]:
df_scens

In [None]:
df_V_offsets
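The truncated read no longer starts at the beginning of the V gene, which is why the V offsets are requested alongside the scenarios. The toy sketch below illustrates the idea, assuming the usual IGoR convention that an offset is the position of the germline's first nucleotide in read coordinates, so it is negative when the read starts inside the gene. The germline string and the v_offset helper are hypothetical, and this naive exact-match search is not how IGoR performs alignments.

```python
def v_offset(germline, read, seed_len=6):
    """Locate the read's first seed_len nucleotides in the germline by exact match.

    Returns the germline start position in read coordinates,
    or None if the seed is not found.
    """
    start_in_germline = germline.find(read[:seed_len])
    if start_in_germline < 0:
        return None
    return -start_in_germline


toy_germline = "ACGTTGCAAGGCTA"        # hypothetical V gene, not a real TRBV
toy_read = toy_germline[5:] + "TTT"    # read missing the first 5 nt of the gene

print(v_offset(toy_germline, toy_read))
```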

In [None]:
str_seq_small = str_seq[180:]
str_seq_small

In [None]:
df_scens, df_V_offsets = p3.evaluate(str_seq_small, mdl, N_scenarios=10, b_V_offset=True)

In [None]:
df_scens

In [None]:
df_V_offsets

In [None]:
ps_scenario = df_scens.iloc[0]
ps_scenario
seq_algn = (0, str_seq_small, -int(df_V_offsets.loc[0, ps_scenario['v_choice']]))
seq_algn
mdl.plot_scenario(ps_scenario, seq_aligned=seq_algn)

In [None]:
mdl.plot_scenario(ps_scenario, nt_lim=(220, 280), seq_aligned=seq_algn)

In [None]:
ps_scenario = df_scens.iloc[3]
seq_algn = (0, str_seq_small, -int(df_V_offsets.loc[0, ps_scenario['v_choice']]))
mdl.plot_scenario(ps_scenario, seq_aligned=seq_algn)

In [None]:
mdl.plot_scenario(ps_scenario, nt_lim=(220, 280), seq_aligned=seq_algn)

In [None]:
p3.evaluate_pgen(str_seq, mdl)
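In IGoR's framework, the generation probability (Pgen) of a sequence is the sum of the probabilities of all recombination scenarios that can produce it. The sketch below shows that relationship with made-up numbers; the scenario probabilities are hypothetical and are not output from pygor3.

```python
import math

# Hypothetical probabilities of the scenarios compatible with one read.
scenario_probs = [1.0e-9, 4.0e-10, 1.0e-10]

# Pgen is the total probability over all compatible scenarios.
pgen = math.fsum(scenario_probs)

# Each scenario's share of Pgen is its posterior probability given the read.
posteriors = [p / pgen for p in scenario_probs]

print(pgen)
print(posteriors)
```

This is also why restricting an evaluation to the single best scenario (N_scenarios=1) gives only a lower bound on the probability of the sequence.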

## Generating random sequences from a model

In [None]:
df_gen_seqs = p3.generate(Nseqs=10, mdl=mdl)
df_gen_seqs

In [None]:
help(p3.generate)

In [None]:
df_gen_seqs = p3.generate(Nseqs=10, mdl=mdl, return_scenarios=True)
df_gen_seqs