# Constructing a Pangenome with Roary

At this stage you should have three GFF files generated by `Prokka`, each in its own directory. Provided your QC looked ok, you are now ready to run `Roary` to generate the pangenome. 

We are going to run `Roary` twice: first with the default settings, and then using `MAFFT` to generate a core gene alignment. For both runs we want all the annotation files in the same directory, so let's copy them to our current directory:

In [None]:
cp annotated_sample*/*.gff .
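
Before running `Roary`, it can be worth confirming that the copy picked up all three annotation files. The following is a minimal sketch; the helper name `count_gff` is hypothetical and not part of `Roary` or `Prokka`.

```shell
# Hypothetical helper: count the .gff files in a directory (default: current directory).
count_gff() {
    dir="${1:-.}"
    n=0
    for f in "$dir"/*.gff; do
        # If nothing matches, the glob stays literal, so check that the file really exists.
        if [ -e "$f" ]; then
            n=$((n + 1))
        fi
    done
    echo "$n"
}

# We expect three files here, one per Prokka-annotated sample.
echo "GFF files found: $(count_gff .)"
```

If the count is not three, check that each `annotated_sample*` directory contains a `.gff` file before continuing.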

## Run Roary with default settings
To run `Roary` with the default settings all you need to do is run `roary *.gff` and it will create a pangenome using all GFF files in the current directory. Try the following command:

In [None]:
roary -f output_no_alignment *.gff

Because we are running `Roary` twice with different settings, we use the __-f__ option to name the output directory where `Roary` should put the results. This keeps the output files from each run separate. The command will run for a few minutes.

We will take a closer look at the results in the next section, so for now let us just check that there are some output files in the directory we asked `Roary` to create:

In [None]:
ls -l output_no_alignment

## Run Roary with MAFFT
To draw a phylogenetic tree, we need a multi-FASTA alignment of the core genes, which the default run does not produce. So try:

In [None]:
roary -f output_with_alignment -e --mafft -p 2 *.gff

Here we have run `Roary` again, but this time with some more options.

|Option    |Description                                     |
|------    |-----------                                     |
|`-e`      |Create a multi-FASTA alignment of the core genes|
|`--mafft` |Use with -e to use MAFFT instead of PRANK       |
|`-p`      |Number of threads to use                        |
  
By default, `Roary` uses `PRANK` when the `-e` option is specified. `PRANK` is accurate but slow; `MAFFT` is less accurate but very fast, so we use it instead by specifying the `--mafft` option. To speed things up further, we use 2 threads (the `-p` option). For all usage options, have a look at the [Roary website (https://sanger-pathogens.github.io/Roary/)](https://sanger-pathogens.github.io/Roary/).
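
The value given to `-p` should not exceed the number of CPUs on your machine. As a rough sketch, the thread count could be chosen like this. The helper name `pick_threads` is hypothetical, and the sketch assumes `getconf _NPROCESSORS_ONLN` is available, as it is on most Linux and macOS systems.

```shell
# Hypothetical helper: choose a thread count for -p, capped at a maximum (default 2).
pick_threads() {
    max="${1:-2}"
    # Assumption: getconf reports the number of online CPUs; fall back to 1 if it fails.
    cpus=$(getconf _NPROCESSORS_ONLN 2>/dev/null || echo 1)
    if [ "$cpus" -lt "$max" ]; then
        echo "$cpus"
    else
        echo "$max"
    fi
}

threads=$(pick_threads 2)
# Print the command instead of running it, so it can be inspected first.
echo "roary -f output_with_alignment -e --mafft -p $threads *.gff"
```

On a machine with two or more CPUs this prints the same command used in this section, with `-p 2`.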

This will take a bit longer to run than the previous command, maybe 5 or 10 minutes. While you wait for it to complete, you could answer the questions at the end of this section.

Once it has finished, you should have a directory called `output_with_alignment` containing the output files, this time including a `core_gene_alignment.aln` file. Quickly check that this is the case:

In [None]:
ls -l output_with_alignment
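
Instead of scanning the listing by eye, the alignment file can be tested for directly. This is a minimal sketch; the helper name `has_core_alignment` is hypothetical, and it only checks that the file exists and is not empty, not that the alignment is valid.

```shell
# Hypothetical helper: succeed if a non-empty core gene alignment exists in the given directory.
has_core_alignment() {
    [ -s "$1/core_gene_alignment.aln" ]
}

if has_core_alignment output_with_alignment; then
    echo "core_gene_alignment.aln found"
else
    echo "core_gene_alignment.aln missing or empty"
fi
```

If the file is reported as missing, `Roary` may still be running, or the `-e` option may have been left out.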

## Check your understanding
**Q9: Why do we want to run Roary with MAFFT?**  
a) Because it's quicker than running Roary without the -e option  
b) To get more accurate results  
c) To generate a core gene alignment  
  
**Q10: Why do we use the -p option?**  
a) We have to when we use MAFFT  
b) To speed up the run  
c) To get a nice tree  
  
Now go to the next section: [Exploring the results](results.ipynb).