# **kallisto | bustools** Report
This Jupyter notebook was auto-generated by `kb`.

### Install packages
Uncomment and run this cell to install packages used in this notebook.

In [None]:
{{ packages }}

### Import packages

In [None]:
import json

import numpy as np
import plotly.graph_objects as go
import plotly.offline as py
import scanpy as sc
from sklearn.decomposition import PCA

from kb_python.report import (
    dict_to_table,
    elbow_plot,
    genes_detected_plot,
    knee_plot,
    pca_plot
)
from kb_python.utils import import_matrix_as_anndata

py.init_notebook_mode(connected=True)

def load_json(path):
    with open(path, 'r') as f:
        return json.load(f)

## Basic run statistics

This section contains basic run statistics collected from `kb`, kallisto, and the output BUS file.

### `kb` run info
Overall run statistics

In [None]:
stats = load_json('{{ stats_path }}')
py.iplot(dict_to_table(stats))

### kallisto run info
Read from the kallisto run log (`run_info.json`)

In [None]:
kallisto_info = load_json('{{ info_path }}')
py.iplot(dict_to_table(kallisto_info))

### BUS file info
From the `bustools inspect` command (`inspect.json`)

In [None]:
inspect = load_json('{{ inspect_path }}')
py.iplot(dict_to_table(inspect))

## Count matrix statistics
This section contains information on the count matrix.

### Load and process the matrix

In [None]:
adata = import_matrix_as_anndata('{{ matrix_path }}', '{{ barcodes_path }}', '{{ genes_path }}', t2g_path='{{ t2g_path }}')

# Filter out barcodes (cells) with no detected genes or no UMI counts;
# the 1e-3 thresholds keep any barcode with a nonzero value
sc.pp.filter_cells(adata, min_genes=1e-3)
sc.pp.filter_cells(adata, min_counts=1e-3)
n_counts = adata.obs['n_counts']
n_genes = adata.obs['n_genes']

# Run PCA
sc.pp.normalize_total(adata, target_sum=1e4)
sc.pp.log1p(adata)
pca = PCA(n_components=10)
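# scikit-learn expects a NumPy ndarray rather than the np.matrix that .todense() returns
X = adata.X.toarray() if hasattr(adata.X, 'toarray') else np.asarray(adata.X)
pc = pca.fit_transform(X)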
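
The filtering and normalization steps above can be sketched in plain NumPy. This is a minimal illustration that assumes a small dense cells × genes count array, not the sparse AnnData matrix the notebook loads. The helper name `filter_and_normalize` is hypothetical. It follows our understanding of the default behavior of `sc.pp.normalize_total(target_sum=1e4)` followed by `sc.pp.log1p`. It is not scanpy's implementation.

```python
import numpy as np

def filter_and_normalize(X, target_sum=1e4):
    """Drop empty barcodes, then library-size normalize and log1p-transform.

    Hypothetical helper: X is assumed to be a dense cells x genes array of
    raw UMI counts.
    """
    X = np.asarray(X, dtype=float)
    n_counts = X.sum(axis=1)          # total UMIs per barcode
    n_genes = (X > 0).sum(axis=1)     # genes with a nonzero count per barcode
    keep = (n_genes > 0) & (n_counts > 0)
    X, n_counts, n_genes = X[keep], n_counts[keep], n_genes[keep]
    # Scale each cell so its counts sum to target_sum, then take log(1 + x)
    X_norm = np.log1p(X / n_counts[:, None] * target_sum)
    return X_norm, n_counts, n_genes

counts = np.array([[0, 0, 0], [1, 3, 0], [2, 2, 4]])
X_norm, n_counts, n_genes = filter_and_normalize(counts)
print(n_counts)  # [4. 8.]
print(n_genes)   # [2 3]
```

With non-negative integer counts, a barcode has a nonzero total exactly when it has at least one detected gene. The two `filter_cells` calls therefore remove the same barcodes.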

### Gene matrix info
Statistics calculated from the gene count matrix

In [None]:
py.iplot(dict_to_table({
    'Median UMIs per cell': np.median(n_counts),
    'Mean UMIs per cell': np.mean(n_counts),
    'Median genes per cell': np.median(n_genes),
    'Mean genes per cell': np.mean(n_genes),
}))

## Plots

### Knee plot
For a given UMI count (x-axis), the number of cells that contain at least that many UMI counts (y-axis).
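
The following is a sketch of that definition. It is not necessarily how `kb_python.report.knee_plot` builds its figure. Sorting the per-barcode UMI counts in descending order yields the curve directly: the barcode at rank r has the r-th largest count, so at least r barcodes reach that count. The function name `knee_curve` is hypothetical.

```python
import numpy as np

def knee_curve(n_counts):
    """Return (umi_counts, n_barcodes) points for a knee plot (hypothetical helper).

    After a descending sort, the barcode at 1-based rank r has the r-th
    largest count, so at least r barcodes have that many UMIs or more.
    """
    umi_counts = np.sort(np.asarray(n_counts))[::-1]
    n_barcodes = np.arange(1, len(umi_counts) + 1)
    return umi_counts, n_barcodes

x, y = knee_curve([5, 100, 20, 1])
print(list(zip(x.tolist(), y.tolist())))  # [(100, 1), (20, 2), (5, 3), (1, 4)]
```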

In [None]:
py.iplot(knee_plot(n_counts))

### Genes detected
Number of genes detected per cell as a function of the number of UMI counts in that cell.

In [None]:
py.iplot(genes_detected_plot(n_counts, n_genes))

### Elbow plot
Fraction of the variance in the data explained by each of the first ten principal components.
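
The NumPy sketch below shows how these ratios can be computed. It assumes a dense cells × genes array, and the helper `explained_variance_ratio` is hypothetical. The notebook itself passes scikit-learn's `pca.explained_variance_ratio_` to `elbow_plot`. With an exact solver the values should agree with this sketch. A randomized solver may give slightly different values.

```python
import numpy as np

def explained_variance_ratio(X, n_components=10):
    """Fraction of total variance captured by each leading PC (hypothetical helper).

    Uses an exact SVD of the column-centered matrix.
    """
    X = np.asarray(X, dtype=float)
    Xc = X - X.mean(axis=0)                   # PCA operates on centered features
    s = np.linalg.svd(Xc, compute_uv=False)   # singular values, descending
    var = s**2 / (X.shape[0] - 1)             # variance along each PC
    return var[:n_components] / var.sum()     # divide by total variance

rng = np.random.default_rng(0)
X = rng.normal(size=(50, 20))
ratios = explained_variance_ratio(X, n_components=10)
print(ratios.round(3))
```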

In [None]:
py.iplot(elbow_plot(pca.explained_variance_ratio_))

### Principal component analysis
Cells projected onto the first two principal components.
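
The plotted coordinates are PC scores: the centered data multiplied by the principal axes. The hypothetical helper `first_two_pcs` sketches this with NumPy for the first two axes. It should match the first two columns of `PCA.fit_transform` up to the sign of each component, and that sign is arbitrary.

```python
import numpy as np

def first_two_pcs(X):
    """Project cells onto the first two principal axes (hypothetical helper).

    Rows of Vt are the principal axes. A plot may appear mirrored relative
    to another tool because each axis's sign is arbitrary.
    """
    X = np.asarray(X, dtype=float)
    Xc = X - X.mean(axis=0)
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt[:2].T                      # shape (n_cells, 2)

X = np.array([[0.0, 0.0], [1.0, 1.0], [2.0, 2.0]])
coords = first_two_pcs(X)
print(np.abs(coords).round(3))
```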

In [None]:
py.iplot(pca_plot(pc))