# Analysis pipeline results

***

## Introduction

Once your sample data is in the Pathogen Informatics databases, it becomes available to the automated analysis pipelines. Pathogen Informatics maintain the following automated analysis pipelines:

  * [Quality control (QC)](http://mediawiki.internal.sanger.ac.uk/index.php/Pathogen_Sequencing_Informatics#QC_Pipeline)
  * [Mapping](http://mediawiki.internal.sanger.ac.uk/index.php/Pathogen_Informatics_Mapping_Pipeline)
  * [SNP calling](http://mediawiki.internal.sanger.ac.uk/index.php/Pathogen_Informatics_SNP_Calling_Pipeline)
  * [Bacterial](http://mediawiki.internal.sanger.ac.uk/index.php/Pathogen_Informatics_Bacterial_Assembly_Pipeline), [Eukaryote](http://mediawiki.internal.sanger.ac.uk/index.php/Pathogen_Informatics_Eukaryote_Assembly_Pipeline) and [PacBio](http://mediawiki.internal.sanger.ac.uk/index.php/Pathogen_Informatics_Automated_PacBio_Assembly_Pipeline) assembly
  * [Annotation](http://mediawiki.internal.sanger.ac.uk/index.php/Pathogen_Informatics_Automated_Annotation_Pipeline)
  * [RNA-Seq expression](http://mediawiki.internal.sanger.ac.uk/index.php/Pathogen_Informatics_RNA-Seq_Expression_Pipeline)

Once the pipelines have been requested and run, you can use the `pf` scripts to return the results of each of the automated analysis pipelines.

| Command | Description |
| ---     | ---         |
| **pf qc**         | used to find the Kraken results for a given study, sample or lane | 
| **pf map**        | used to find the location of BAM files produced by the mapping pipeline |           
| **pf snp**        | used to find the location of VCF files produced by the SNP calling pipeline |
| **pf assembly**   | used to find the location of the contig FASTA files produced by the assembly pipeline | 
| **pf annotation** | used to find the location of the GFF files produced by the annotation pipeline | 
| **pf rnaseq**     | used to find the location of expression counts produced by the RNA-Seq analysis pipeline |

In this section of the tutorial we will cover:

  * using `pf qc` to get quality control (QC) pipeline results
  * using `pf map` to get mapping pipeline results
  * using `pf snp` to get SNP calling pipeline results
  * using `pf assembly` to get assembly pipeline results
  * using `pf annotation` to get annotation pipeline results
  * using `pf rnaseq` to get RNA-Seq expression pipeline results

***

## Exercise 5

**First, let's tell the system the location of our tutorial configuration file.**

In [None]:
export PF_CONFIG_FILE=$PWD/data/pathfind.conf
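
Before running any `pf` command, it can be worth confirming that the variable really points at a readable file. The following is a minimal sketch of such a check; the helper name `check_pf_config` is hypothetical and is not part of the `pf` scripts.

```shell
# Hypothetical helper (not part of pf): report whether PF_CONFIG_FILE
# is set and points at a readable regular file.
check_pf_config() {
  if [ -z "${PF_CONFIG_FILE:-}" ]; then
    echo "PF_CONFIG_FILE is not set"
    return 1
  elif [ ! -f "$PF_CONFIG_FILE" ] || [ ! -r "$PF_CONFIG_FILE" ]; then
    echo "cannot read config file: $PF_CONFIG_FILE"
    return 1
  fi
  echo "using config file: $PF_CONFIG_FILE"
}

# Run the check; "|| true" keeps the shell session going if it fails.
check_pf_config || true
```

If the check reports a problem, make sure you ran the `export` command from the tutorial directory, since the path is built from `$PWD`.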

### Getting quality control (QC) pipeline results

First up, we're going to look at how to get the output from the QC pipeline. The QC pipeline generates a series of QC statistics about your data and runs [Kraken](https://ccb.jhu.edu/software/kraken/), which assigns each read to a taxon and so gives a broad picture of what has been sequenced.

To get the QC results, we will be using `pf qc`, which returns the location of the Kraken report for a given study, sample or lane.

**First, let's take a look at the `pf qc` usage.**

In [None]:
pf qc -h

**Let's get the QC results for lane 5477_6#1.**

In [None]:
pf qc -t lane -i 5477_6#1

This returned the location of the Kraken report on disk.
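
Because `pf qc` prints locations, a quick follow-up is to confirm that each one exists on disk. The sketch below assumes that `pf qc` writes one path per line to standard output, which is an assumption and not documented behaviour; `check_paths` is a hypothetical helper, not part of the `pf` scripts.

```shell
# Hypothetical helper: read paths from standard input, one per line,
# and report whether each one exists.
check_paths() {
  while IFS= read -r path; do
    [ -n "$path" ] || continue   # skip blank lines
    if [ -e "$path" ]; then
      echo "found: $path"
    else
      echo "missing: $path"
    fi
  done
}

# Only call pf if it is installed. Assumes one path per output line.
if command -v pf >/dev/null 2>&1; then
  pf qc -t lane -i '5477_6#1' | check_paths
fi
```

The lane identifier is quoted here only as a precaution; the `#` is not treated as a comment character when it appears in the middle of a word.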

We can get a summary of the Kraken report using the `--summary` or `-s` option. This will generate a new file called "qc_summary.csv" which contains the taxon level Kraken results.

**Let's get our taxon level QC summary for lane 5477_6#1.**

In [None]:
pf qc -t lane -i 5477_6#1 -s

**Let's take a look at "qc_summary.csv".**

In [None]:
head qc_summary.csv

Here you can see the taxon-level Kraken results; for example, 1.08% of the reads were assigned to the _Streptococcus pneumoniae_ strain Hungary19A-6.

We can look at the results for different taxonomic levels using the `--level` or `-L` option.

**Let's try looking at the species level QC results for lane 5477_6#1.**

In [None]:
pf qc -t lane -i 5477_6#1 -L S -s qc_species_summary.csv

In [None]:
head qc_species_summary.csv

Here we can see that 87.58% of the reads were classified as _Streptococcus pneumoniae_. This is promising as the sample is from _Streptococcus pneumoniae_.
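
To pull out only the rows for an expected organism without scanning the `head` output by eye, a plain text search over the summary file is usually enough. This sketch relies only on `grep` and makes no assumption about the column layout of the CSV; `taxon_rows` is a hypothetical helper, not part of the `pf` scripts.

```shell
# Hypothetical helper: print the rows of a summary file that mention a
# taxon name (case-insensitive, fixed-string match).
taxon_rows() {
  grep -i -F -- "$1" "$2"
}

# Only search the tutorial output if it has already been generated.
if [ -f qc_species_summary.csv ]; then
  taxon_rows "Streptococcus pneumoniae" qc_species_summary.csv \
    || echo "no matching rows"
fi
```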

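### Getting mapping pipeline results

Next, we're going to look at how to get the output from the mapping pipeline. To do this, we will be using `pf map`, which returns the location of the BAM files produced by the mapping pipeline.

**Let's get the mapping results for lane 5477_6#1, using the `--details` option.**
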
In [None]:
pf map -t lane -i 5477_6#1 --details

***

## Questions

***

## What's next?

For a quick recap of how to get metadata and accessions, head back to [analysis pipeline status](pipeline-status.ipynb).

Otherwise, let's move on to how to get your [analysis pipeline results](pipeline-results.ipynb).