# Pygor Tutorial

Welcome to the pygor3 tutorial.

Pygor3 is an open-source Python package for analyzing, inferring, evaluating and generating V(D)J sequences using IGoR.

Pygor3 can help you run simple calculations and produce visualizations of V(D)J recombination statistics.

## Introduction

![IGoR diagram](IGoR_diagram.png)

An IGoR model encapsulates the probabilistic parameters of the Bayesian network that describes a V(D)J recombination process.
IGoR is shipped with a set of default models.

As an example, let's load the recombination model for the human T-cell receptor $\beta$ chain.

## Loading a default IgorModel

In [None]:
import pygor3 as p3
mdl_hb = p3.get_default_IgorModel("human", "tcr_beta")

In [None]:
mdl_hb

### Conditional probabilities

In [None]:
mdl_hb.export_plot_Pconditionals('hb_CP')

In [None]:
P_J_g_V = mdl_hb['j_choice']
P_J_g_V

In [None]:
P_J_g_V[{'j_choice': 3, 'v_choice': 7}]

In [None]:
mdl_hb.plot_Event('j_choice')

### Marginal probabilities
Examples of marginal probabilities:

$ P(J) = \sum_V P(J|V) P(V) $

$ P(D) = \sum_{V,J} P(D|V, J) P(V,J) $
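
The formula for $P(J)$ can be checked by hand on a small example. The sketch below uses made-up numbers for two V genes and three J genes; the tables `p_v` and `p_j_given_v` and the helper `marginal_j` are hypothetical and are not part of the pygor3 API or taken from any IGoR model.

```python
# Toy tables (hypothetical values, not taken from an IGoR model).
p_v = {"V1": 0.6, "V2": 0.4}                     # P(V)
p_j_given_v = {                                  # P(J|V); each row sums to 1
    "V1": {"J1": 0.5, "J2": 0.3, "J3": 0.2},
    "V2": {"J1": 0.1, "J2": 0.6, "J3": 0.3},
}


def marginal_j(p_v, p_j_given_v):
    """Return P(J) = sum_V P(J|V) P(V) as a dict keyed by J."""
    p_j = {}
    for v, pv in p_v.items():
        for j, pj_v in p_j_given_v[v].items():
            p_j[j] = p_j.get(j, 0.0) + pj_v * pv
    return p_j


p_j = marginal_j(p_v, p_j_given_v)
print(p_j)
```

Because every row of `p_j_given_v` sums to 1 and `p_v` sums to 1, the resulting `p_j` is itself a normalized distribution.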

In [None]:
mdl_hb.export_plot_Pmarginals('hb_MP')

In [None]:
mdl_hb.Pmarginal['j_choice']

In [None]:
mdl_hb.plot_Event_Marginal('j_choice')

### Joint probabilities

In [None]:
P_V_J = mdl_hb.get_P_joint(['v_choice', 'j_choice'])
P_V_J

In [None]:
P_V_J.plot(cmap='gnuplot2_r')

### Entropy
$H = -\sum_{\vec{E}} P(\vec{E}) \log_2 P(\vec{E})$
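
As a small illustration, Shannon entropy in bits adds up $-p \log_2 p$ over all outcomes of a distribution. The sketch below applies this to two made-up distributions; `entropy_bits` is a hypothetical helper written for this example, not a pygor3 function.

```python
import math


def entropy_bits(probs):
    """Shannon entropy in bits; zero-probability outcomes contribute nothing."""
    return -sum(p * math.log2(p) for p in probs if p > 0)


uniform = [0.25, 0.25, 0.25, 0.25]   # four equally likely outcomes
skewed = [0.7, 0.1, 0.1, 0.1]        # one dominant outcome

print(entropy_bits(uniform))
print(entropy_bits(skewed))
```

The uniform distribution reaches the maximum entropy for four outcomes, while the skewed one is lower because its outcome is more predictable.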

In [None]:
mdl_hb.plot_recombination_entropy()

In [None]:
mdl_hb.get_df_entropy_decomposition()

In [None]:
da_mi = mdl_hb.get_mutual_information()
mdl_hb.plot_mutual_information(da_mi)
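
For intuition, the standard definition of mutual information between two events is $I(X;Y) = \sum_{x,y} P(x,y) \log_2 \frac{P(x,y)}{P(x)P(y)}$. The sketch below evaluates it on a made-up joint table; the table `p_vj` and the helper `mutual_information_bits` are hypothetical, and whether `get_mutual_information` uses the same units and conventions is an assumption that is not confirmed here.

```python
import math

# Toy joint distribution P(V, J) (hypothetical values that sum to 1).
p_vj = {
    ("V1", "J1"): 0.30, ("V1", "J2"): 0.18, ("V1", "J3"): 0.12,
    ("V2", "J1"): 0.04, ("V2", "J2"): 0.24, ("V2", "J3"): 0.12,
}


def mutual_information_bits(p_xy):
    """I(X;Y) = sum_{x,y} P(x,y) log2[P(x,y) / (P(x) P(y))]."""
    p_x, p_y = {}, {}
    # Accumulate both marginals from the joint table.
    for (x, y), p in p_xy.items():
        p_x[x] = p_x.get(x, 0.0) + p
        p_y[y] = p_y.get(y, 0.0) + p
    return sum(
        p * math.log2(p / (p_x[x] * p_y[y]))
        for (x, y), p in p_xy.items()
        if p > 0
    )


mi_vj = mutual_information_bits(p_vj)
print(mi_vj)
```

The value is zero when the two events are independent and grows as knowing one event tells you more about the other.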

## Evaluate Sequences

In [None]:
str_seq = "TTGAAATGTGAACAACATCTGGGTCATAACGCTATGTATTGGTACAAGCAAAGTGCTAAGAAGCCACTGGAGCTCATGTTTGTCTACAGTCTTGAAGAACGGGTTGAAAACAACAGTGTGCCAAGTCGCTTCTCACCTGAATGCCCCAACAGCTCTCACTTATCCCTTCACCTACACACCCTGCAGCCAGAAGACTCGGCCCTGTATCTCTGCGCCAGCAGCCGTAGGCGAGCGCGGCGGGGAGCTGTTTTTTGGAGAAGGCTCTAGGCTGACCGTACTGG"
str_seq

In [None]:
df_scens = p3.evaluate(str_seq, mdl_hb, N_scenarios=10, igor_wd='tmp', batch_clean=False)

The column `scenario_proba_cond_seq` gives the probability of each scenario given the sequence $\sigma$:

$ P(\text{scenario}|\sigma) $

The event columns give the realization selected for each event in the scenario, so that the scenario probability factorizes as

$ P(\text{scenario}) = P(\text{v_choice}_\text{id}) \times P(\text{j_choice}_\text{id}|\text{v_choice}_\text{id}) \times P(\text{d_gene}_\text{id} | \text{j_choice}_\text{id}, \text{v_choice}_\text{id}) \cdots $
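
The factorization can be illustrated with plain numbers. The sketch below multiplies three made-up conditional probabilities, one per event; a real scenario has more factors (the remaining events of the model), and the values in `scenario_factors` as well as both helper functions are hypothetical, not part of pygor3.

```python
import math

# Hypothetical conditional probabilities for the events of one scenario.
scenario_factors = {
    "P(v_choice)": 0.05,
    "P(j_choice | v_choice)": 0.10,
    "P(d_gene | j_choice, v_choice)": 0.60,
}


def scenario_probability(factors):
    """Multiply the factors of the Bayesian-network factorization."""
    return math.prod(factors.values())


def scenario_log_probability(factors):
    """Same product in log space, which avoids underflow with many factors."""
    return sum(math.log(p) for p in factors.values())


p_scenario = scenario_probability(scenario_factors)
print(p_scenario)
print(scenario_log_probability(scenario_factors))
```

Working in log space is a common design choice once the number of events grows, because the product of many small probabilities quickly approaches the limits of floating-point precision.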

In [None]:
df_scens

### Visualize a scenario

In [None]:
ps_scenario = df_scens.iloc[0]
ps_scenario
mdl_hb.plot_scenario(ps_scenario)

In [None]:
# Plot every scenario in df_scens, one figure per row
for index, ps_scenario in df_scens.iterrows():
    mdl_hb.plot_scenario(ps_scenario)

In [None]:
mdl_hb.genomic_dataframe_dict['J']

## Generating random sequences from a model

In [None]:
df_gen_seqs = p3.generate(Nseqs=10, mdl=mdl_hb)
df_gen_seqs

In [None]:
df_gen_seqs = p3.generate(Nseqs=10, mdl=mdl_hb, return_scenarios=True)
df_gen_seqs

In [None]:
df_gen_seqs['nt_sequence']

## Inferring a new model

In [None]:
# FIXME: For this example we use some Emerson's data, add the reference

In [None]:
import pandas as pd
df_input = pd.read_csv('HIP00110.tsv.gz', sep='\t')
df_input

### Get Genomic Germline templates from IMGT

In [None]:
imgt_species_list = p3.imgt.get_species_list()
print(imgt_species_list)

In [None]:
imgt_species = 'Homo+sapiens'
imgt_chain = 'TRB'
hb_genomic_dict = p3.imgt.download_ref_genome(imgt_species, imgt_chain, dropna=True)

In [None]:
hb_genomic_dict

### Create a new model

In [None]:
hb_mdl_0 = p3.IgorModel.make_default_from_Dataframe_dict(hb_genomic_dict)

In [None]:
df_input_test = df_input['nucleotide'].loc[:500]
df_input_test

In [None]:
df_functionality, df_CDR3 = p3.naive_align(df_input_test, hb_mdl_0)

In [None]:
df_functionality

In [None]:
df_CDR3

In [None]:
df_input_test_no_productive = df_input_test.loc[~df_functionality['functionality']]
df_input_test_no_productive

In [None]:
hb_mdl_new, df_likelihoods = p3.infer(df_input_test_no_productive, hb_mdl_0, N_iter=10, return_likelihoods=True)

In [None]:
df_likelihoods

In [None]:
df_likelihoods['mean_log_Likelihood'].plot()

In [None]:
hb_mdl_new.export_plot_Pconditionals('CP_hb_new')