# Quality control analysis of the 10X FASTQ files
This Python Jupyter notebook performs quality control analysis on the FASTQ files created by [cellranger mkfastq](https://support.10xgenomics.com/single-cell-gene-expression/software/pipelines/latest/using/mkfastq).

## Parameters for notebook
First, set the parameters for the notebook.
Do this in the next cell, which is tagged `parameters` so that [papermill parameterization](https://papermill.readthedocs.io/en/latest/usage-parameterize.html) can inject the values at run time:

In [None]:
# parameters cell; for the notebook to run, this cell must define:
#  - illumina_runs_10x: list of names of Illumina 10X runs
#  - input_qc_stats: list of `cellranger mkfastq` QC stats CSV files,
#    one per run, in the same order as `illumina_runs_10x`
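
As a minimal sketch of what the injected parameters might look like, the cell below uses made-up run names and file paths (both are assumptions for illustration, not values from any real run). Because the two lists are later paired with `zip`, which silently truncates to the shorter list, it can be worth checking that their lengths agree:

```python
# Hypothetical values for illustration only; in practice papermill injects these.
illumina_runs_10x = ['run_A', 'run_B']
input_qc_stats = ['results/run_A_qc_stats.csv',
                  'results/run_B_qc_stats.csv']

# zip() stops at the shorter list, so a length mismatch would drop runs silently
if len(illumina_runs_10x) != len(input_qc_stats):
    raise ValueError('`illumina_runs_10x` and `input_qc_stats` differ in length: '
                     f'{len(illumina_runs_10x)} vs {len(input_qc_stats)}')
```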

## Import Python modules
We use [plotnine](https://plotnine.readthedocs.io/) for ggplot2-style plotting:

In [None]:
import mizani
from IPython.display import display, HTML
import pandas as pd
from plotnine import *

Set the [plotnine theme](https://plotnine.readthedocs.io/en/stable/api.html#themes):

In [None]:
_ = theme_set(theme_classic)

## Read and aggregate stats
Read the QC stats for each Illumina run:

In [None]:
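print('Reading 10X FASTQ QC stats from:\n\t' +
      '\n\t'.join(input_qc_stats))

# each stats file is read as a header-less, two-column CSV (statistic, value)
# and labeled with the name of its run before the runs are concatenated
stats = pd.concat([(pd.read_csv(statfile,
                                names=['statistic', 'value'])
                    .assign(run10x=run10x)
                    )
                   for statfile, run10x in zip(input_qc_stats,
                                               illumina_runs_10x)
                   ])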

display(HTML(
    stats
    .pivot_table(index='statistic', values='value', columns='run10x')
    .to_html()
    ))
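
The read-and-pivot step can be tried without any real files. The sketch below uses in-memory CSV text with invented statistic names and values (assumptions for illustration; the actual `cellranger mkfastq` statistics may differ) and builds the same long and wide tables:

```python
import io

import pandas as pd

# Hypothetical header-less CSV contents, one string per run
fake_files = {
    'run_A': 'fraction_q30,0.93\nnumber_reads,1000\n',
    'run_B': 'fraction_q30,0.91\nnumber_reads,2000\n',
}

# long format: one row per (statistic, run)
stats_demo = pd.concat(
    [pd.read_csv(io.StringIO(text), names=['statistic', 'value'])
     .assign(run10x=run)
     for run, text in fake_files.items()],
    ignore_index=True,
)

# wide format: statistics as rows, runs as columns
wide = stats_demo.pivot_table(index='statistic', values='value',
                              columns='run10x')
print(wide)
```

Note that `pivot_table` aggregates with the mean by default, so if a statistic appeared more than once for the same run, the table would show the average rather than raise an error.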

## Plot the QC stats
Plot each statistic in its own facet, with one point per run:

In [None]:
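p = (ggplot(stats, aes('run10x', 'value')) +
     geom_point(size=2) +
     # one facet per statistic, each with its own y-axis scale
     facet_wrap('~ statistic', ncol=4, scales='free_y') +
     theme(axis_text_x=element_text(angle=90),
           figure_size=(12, 4), panel_spacing_x=0.6) +
     # make sure every y-axis includes both 0 and 1
     expand_limits(y=(0, 1)) +
     # label y-axis ticks with two significant digits
     scale_y_continuous(labels=mizani.formatters.custom_format('{:.2g}'))
     )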
_ = p.draw()
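
The y-axis labels use the format string `'{:.2g}'`, which keeps two significant digits and switches to scientific notation for large values. A quick sketch with plain Python string formatting (the example numbers are arbitrary) shows what that produces:

```python
fmt = '{:.2g}'

# arbitrary example values spanning fractions and large counts
examples = [0.8765, 12, 100, 1234567]
labels = [fmt.format(x) for x in examples]
print(labels)  # ['0.88', '12', '1e+02', '1.2e+06']
```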