# PitViper Notebook Report

This notebook was generated automatically by PitViper.

It can be used in two ways:

1. By using the functions already created and present in the following cells.

2. By creating new cells and writing Python 3 code in them.

Graphs are generated using the Python library [Altair](https://altair-viz.github.io/index.html). Each graph can be downloaded in SVG format from the drop-down menu at its top right.

The next two cells call the functions needed to visualize the results; you should not modify them.

In [2]:
# Load necessary libraries
import sys
import os

# Load PitViper functions
modules_path = ['workflow/notebooks/', "../../../workflow/notebooks/"]
for module in modules_path:
    module_path = os.path.abspath(os.path.join(module))
    if module_path not in sys.path:
        sys.path.append(module_path)

from functions_pitviper_report import * 

# Change working directory
working_directory_update(snakemake.output[0])

# Initialize token
token = snakemake.params

# Add a button that toggles the visibility of the notebook's code cells
HTML('''<script>
code_show = true;
function code_toggle() {
    if (code_show) {
        $('div.input').hide();
    } else {
        $('div.input').show();
    }
    code_show = !code_show;
}
$(document).ready(code_toggle);
</script>
<form action="javascript:code_toggle()"><input type="submit" value="Toggle Code"></form>''')

## Import results

The next function scans the `results/` directory to retrieve all results generated by PitViper.

`tools_available` is a Python dictionary in which all results are stored, indexed by tool, then by comparison, then by file:


> tools_available[`tool`][`comparison`][`file`] = pandas dataframe

Example:

> tools_available["MAGeCK_MLE"]["D25_vs_D4"]["D25_vs_D4.genesummary.txt"] returns a pandas dataframe
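
The nested layout described above can be walked with ordinary dictionary iteration. The following is a minimal sketch, not PitViper code: `iter_results` is a hypothetical helper, and the toy dictionary uses a placeholder string where the real `tools_available` holds pandas dataframes.

```python
def iter_results(tools_available):
    """Yield (tool, comparison, file, table) for every stored result.

    Hypothetical helper: it only assumes the three-level nesting
    tools_available[tool][comparison][file] described above.
    """
    for tool, comparisons in tools_available.items():
        for comparison, files in comparisons.items():
            for file_name, table in files.items():
                yield tool, comparison, file_name, table


# Toy stand-in for the real dictionary; the values would be pandas dataframes.
toy_tools_available = {
    "MAGeCK_MLE": {
        "D25_vs_D4": {"D25_vs_D4.genesummary.txt": "<dataframe placeholder>"},
    },
}

for tool, comparison, file_name, _ in iter_results(toy_tools_available):
    print(f"{tool} / {comparison} / {file_name}")
```

In a new cell, the same loop could be run on the real `tools_available` to list which tools, comparisons and files were found.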

In [3]:
results_directory, tools_available = import_results(token)

## Mapping Quality Control

If available, mapping quality control metrics are shown by the `show_mapping_qc` function.

In [4]:
show_mapping_qc(token)

## Read count distribution

The normalized read count distribution for all replicates is shown by calling the `show_read_count_distribution` function.

In [5]:
alt.data_transformers.disable_max_rows()

show_read_count_distribution(token)

## Principal component analysis

PCA projection of normalized read counts from all replicates is shown using `pca_counts`.

In [6]:
pca_counts(token)

## Global results

The `snake_plot` function makes it easy to browse the results of each tool.

- MAGeCK MLE:

> The **beta score** describes how the gene is selected: a positive beta score indicates a positive selection, and a negative beta score indicates a negative selection. [source](https://www.bioconductor.org/packages/release/bioc/vignettes/MAGeCKFlute/inst/doc/MAGeCKFlute.html)

- MAGeCK RRA:

> lfc:  **Gene log fold changes** (LFC) from sgRNA LFCs. Median by default. [source](https://sourceforge.net/p/mageck/wiki/Home/)

- BAGEL:

> BF: evaluates the **likelihood** that the observed fold changes for gRNA targeting the gene were drawn from either the essential or the nonessential training distributions. [source](https://bmcbioinformatics.biomedcentral.com/articles/10.1186/s12859-016-1015-8)

- CRISPhieRmix:

> locfdr: a mixture deconvolution approach to estimate **local false discovery rates**. [source](https://genomebiology.biomedcentral.com/articles/10.1186/s13059-018-1538-6)

- GSEA-like:

> NES: **normalized enrichment score** (NES) is the primary statistic for examining gene set enrichment results. By normalizing the enrichment score, GSEA accounts for differences in gene set size and in correlations between gene sets and the expression dataset; therefore, the normalized enrichment scores (NES) can be used to compare analysis results across gene sets. In this context, genesets are replaced by lists of sgRNAs targeting the same element. [source](https://www.gsea-msigdb.org/gsea/doc/GSEAUserGuideFrame.html)

- In-house method:

> score: TODO
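
To make the MAGeCK RRA `lfc` entry above concrete, here is a minimal sketch of summarizing sgRNA-level log fold changes into one gene-level value with the median. It is an illustration only, not MAGeCK's implementation: `gene_lfc_from_sgrnas` is a hypothetical helper and the values are made up.

```python
from collections import defaultdict
from statistics import median


def gene_lfc_from_sgrnas(sgrna_lfcs):
    """Summarize sgRNA-level log fold changes into one value per gene.

    sgrna_lfcs: iterable of (gene, lfc) pairs, one per sgRNA.
    Returns a dict mapping each gene to the median of its sgRNA LFCs.
    """
    by_gene = defaultdict(list)
    for gene, lfc in sgrna_lfcs:
        by_gene[gene].append(lfc)
    return {gene: median(values) for gene, values in by_gene.items()}


# Made-up values for illustration only.
example = [
    ("GENE_A", -2.0), ("GENE_A", -1.0), ("GENE_A", -3.0),
    ("GENE_B", 0.5), ("GENE_B", 1.5),
]
print(gene_lfc_from_sgrnas(example))
```

The median is less sensitive than the mean to a single outlying guide, which is one reason it is a common default for this kind of summary.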

In [7]:
tool_results(results_directory, tools_available)

## sgRNA read counts by element

The `show_sgRNA_counts` function displays a row-normalized heatmap of read counts by guide.

Replicates can be discarded or rearranged by dragging and dropping them from right to left.

Once the heatmap is shown, click on the `INI` column to reorder the columns.
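
As an illustration of what row normalization does, the sketch below standardizes each guide's counts across replicates with a z-score. This is an assumption made for the example: the normalization actually applied by `show_sgRNA_counts` may differ, and `row_zscores` and the counts are hypothetical.

```python
from statistics import mean, pstdev


def row_zscores(counts_by_guide):
    """Standardize each guide's counts across replicates (row-wise z-score).

    Assumption: z-scoring is used here only as one common form of row
    normalization; PitViper's own heatmap may normalize differently.
    """
    normalized = {}
    for guide, counts in counts_by_guide.items():
        mu = mean(counts)
        sigma = pstdev(counts)
        if sigma == 0:
            # A constant row carries no contrast; map it to zeros.
            normalized[guide] = [0.0 for _ in counts]
        else:
            normalized[guide] = [(c - mu) / sigma for c in counts]
    return normalized


# Made-up counts: one row per guide, one value per replicate.
toy_counts = {
    "sgRNA_1": [10, 20, 30],
    "sgRNA_2": [5, 5, 5],
}
print(row_zscores(toy_counts))
```

Normalizing per row puts guides with very different absolute counts on the same color scale, so the heatmap highlights changes between replicates rather than overall abundance.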

In [8]:
show_sgRNA_counts(token)

In [9]:
show_sgRNA_counts_lines(token)

## Results by tool and by element

In [1]:
# tool_results_by_element(results_directory, tools_available)
tool_results(results_directory, tools_available)


## EnrichR

In [16]:
enrichr_plots(token, tools_available)

## GeneMania

In [17]:
genemania_link_results(token, tools_available)

## Data exploration charts

In [18]:
multiple_tools_results(tools_available, token)

In [19]:
call_form(tools_available)