# Pangenome Construction with Roary

## Introduction

When analysing prokaryotic genomes, it is important to take into account differences in gene content resulting from horizontal gene transfer, gene duplication and gene loss. Such differences can contribute to increased virulence or antimicrobial drug resistance. Aligning an isolate's sequence data to a single reference genome can miss sequences that are absent from the reference, and therefore fail to account for these types of differences.

One approach to overcome this limitation is to perform a pangenome analysis of your isolates. The pangenome is the set of all genes found in a set of isolates. Within the pangenome, genes are often described as belonging either to the ‘core’ genome (the set of genes present in all isolates) or to the non-core (‘accessory’) genome. Gaining a better picture of a species' conserved genes and its accessory genome can lead to a better understanding of key processes such as selection and evolution.
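To make the core/accessory distinction concrete, here is a minimal shell sketch that classifies genes from a small presence/absence table (1 = present, 0 = absent). The gene names, isolates and table layout are hypothetical example data, not a Roary output file; the sketch only illustrates the "present in all isolates" rule described above.

```bash
#!/usr/bin/env bash
# Classify genes as 'core' or 'accessory' from a toy presence/absence matrix.
# The matrix is hypothetical example data: one row per gene, one column per
# isolate, 1 = gene present, 0 = gene absent.

classify_genes() {
  # Skip the header line, sum the presence calls in each row, and call a gene
  # 'core' only if it was found in every isolate (NF - 1 isolate columns).
  awk 'NR > 1 {
    present = 0
    for (i = 2; i <= NF; i++) present += $i
    print $1, (present == NF - 1 ? "core" : "accessory")
  }'
}

result=$(classify_genes <<'EOF'
gene   isolate_A isolate_B isolate_C
dnaA   1         1         1
gyrB   1         1         1
blaTEM 1         0         0
vanA   0         1         1
EOF
)

printf '%s\n' "$result"
```

Some pipelines relax the strict rule (for example, counting a gene as core if it is present in at least 99% of isolates) to tolerate assembly and annotation errors; the sketch uses the strict definition given above.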

There are several tools available for pangenome analysis, including Panaroo and Roary. In this tutorial we will demonstrate how Roary can be used to perform a pangenome analysis. For more in-depth information about Roary, have a look at the Roary paper:

> **Roary: Rapid large-scale prokaryote pan genome analysis**  
> Andrew J. Page, Carla A. Cummins, Martin Hunt, Vanessa K. Wong, Sandra Reuter, Matthew T. G. Holden, Maria Fookes, Daniel Falush, Jacqueline A. Keane, Julian Parkhill  
> _Bioinformatics, 2015;31(22):3691-3693 doi:[10.1093/bioinformatics/btv421](http://bioinformatics.oxfordjournals.org/content/31/22/3691)_

A copy of the paper can be found at 

`~/course_data/pangenome/roary_paper.pdf`

Or visit the [Roary manual at http://sanger-pathogens.github.io/Roary/](http://sanger-pathogens.github.io/Roary/).

## Learning outcomes
By the end of this tutorial you can expect to be able to:

* Describe what a pangenome is
* Use Roary to construct a pangenome 
* Understand the different output files produced by Roary
* Draw a basic tree from the core gene alignment produced by Roary
* Query the pangenome results produced by Roary
* Use Phandango to visualise the results produced by Roary
* Generate an annotated genome assembly

## Tutorial sections
This tutorial comprises the following sections:   
 1. [The pangenome explained](pan_genome.ipynb)   
 2. [Preparing the input data for Roary](prepare_data.ipynb)   
 3. [Performing QC on your data](qc.ipynb)   
 4. [Constructing a pangenome with Roary](run_roary.ipynb)   
 5. [Exploring the results](results.ipynb)   
 6. [Visualising the results with Phandango](phandango.ipynb)   
 7. [Creating genome assemblies](assembly.ipynb)   

## Authors and License
This tutorial was created by [Jacqui Keane](https://github.com/jacquikeane) and [Sara Sjunnebo](https://github.com/ssjunnebo).

The content is licensed under a [Creative Commons Attribution 4.0 International License (CC-By 4.0)](https://creativecommons.org/licenses/by/4.0/).

## Running the commands from this tutorial
You can follow this tutorial by running all the commands you see in a terminal window on your computer. Remember, the terminal window is similar to the "Command Prompt" window on MS Windows systems, which allows the user to type DOS commands to manage files.

To get started, open a terminal window, type the command below and press the `Enter` key:

```bash
cd ~/course_data/pangenome/data
```

## Prerequisites
This tutorial assumes that you have the following software and their dependencies installed on your computer. Because this software may be updated from time to time, we also list the version that was used when writing this tutorial.


| Package name  | Link for download/installation instructions                          | Version |
| :----------:  | :------------------------------------------------------------------: |:------: |
| Prokka        | https://github.com/tseemann/prokka                                   | 1.14.6  |
| Roary         | https://github.com/sanger-pathogens/roary                            | 3.13.0  |
| FastTree      | http://www.microbesonline.org/fasttree/                              | 2.1.11  |
| SPAdes        | https://github.com/ablab/spades                                      | 3.15.5  |
| assembly-stats| https://github.com/sanger-pathogens/assembly-stats                   | 1.0.1   |

The easiest way to install the required software is with `conda`, a software package manager. The software has already been installed on the computer for you. To activate it, run:

```bash
conda activate pangenome
```
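If you are unsure whether the activation worked, you can check the `CONDA_DEFAULT_ENV` variable, which conda sets to the name of the active environment. The `check_env` function below is a hypothetical helper written for this tutorial, a small sketch that assumes the environment was activated by name (for path-based environments the variable holds a path instead).

```bash
#!/usr/bin/env bash
# Hypothetical helper: check whether a named conda environment is active.
# conda sets CONDA_DEFAULT_ENV to the active environment's name.
check_env() {
  local expected="$1"
  if [ "${CONDA_DEFAULT_ENV:-}" = "$expected" ]; then
    echo "active: $expected"
  else
    echo "not active: $expected (run: conda activate $expected)"
    return 1
  fi
}

check_env pangenome
```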

Once the environment is activated, run the following commands:

```bash
prokka --help
roary --help
fasttree -h
spades.py -h
assembly-stats
```

This should return the help messages for all the software tools you will use in this tutorial.
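If one of the help commands fails, it can be quicker to check all the tools at once. The sketch below uses the shell built-in `command -v` to report which commands are on your `PATH`; `check_tools` is a hypothetical helper, and the command names are assumed to match the ones used above, so adjust them if your installation names them differently.

```bash
#!/usr/bin/env bash
# Hypothetical helper: report which of the given commands are on the PATH.
# Returns 0 only if every command was found.
check_tools() {
  local missing=0
  for tool in "$@"; do
    if command -v "$tool" >/dev/null 2>&1; then
      printf 'found:   %s\n' "$tool"
    else
      printf 'MISSING: %s\n' "$tool"
      missing=$((missing + 1))
    fi
  done
  [ "$missing" -eq 0 ]
}

check_tools prokka roary fasttree spades.py assembly-stats
```

Any line starting with `MISSING:` points to a tool that is not available in the current environment, which usually means the `pangenome` environment has not been activated.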

To get started with the tutorial, go to the first section: [The pangenome explained](pan_genome.ipynb)  