# Viewing ARIBA results in Phandango

This section describes how to use [Phandango](http://phandango.net/) to view a summary of ARIBA results from many samples.

The most important output file from ARIBA is the report called `report.tsv`. For this tutorial, we have all 1517 reports in the directory `data/ARIBA_reports/`.

In [4]:
ls data/ARIBA_reports | wc -l

1517


See the [previous section](run_ariba.ipynb) for how to generate a report file for each sample.

ARIBA has a function called "[summary](https://github.com/sanger-pathogens/ariba/wiki/Task:-summary)" that can summarise presence/absence of sequences and/or SNPs across samples. It takes at least two ARIBA reports as input and makes a CSV file that can be opened in your favourite spreadsheet program. It also makes input files for Phandango. The two Phandango files (a tree and a CSV file) can be dropped straight into the Phandango page for viewing.

The tree that ARIBA makes is based on the CSV file, which contains results of presence/absence of sequences and SNPs, and other information such as percent identity between contigs and reference sequences. This means that it does not necessarily represent the real phylogeny of the samples. It is more accurate to provide a tree built from the sequencing data. For this reason, we will use a pre-computed tree file `tree_for_phandango.tre`.

## Basic usage of ariba summary

First, let's run `ariba summary` using the default settings, except we will skip making the tree:

In [3]:
ariba summary --no_tree out data/ARIBA_reports/*.tsv

We can see that this made two files:

In [4]:
ls out.*

out.csv  out.phandango.csv


They are the same except for the first line, which has Phandango-specific information. ARIBA uses the filenames as sample names in the output: 

In [5]:
head -n 2 out.phandango.csv

name,16S.match,16S.match:colour,23S.match,23S.match:colour,blaTEM.match,blaTEM.match:colour,folP.match,folP.match:colour,gyrA.match,gyrA.match:colour,mtrR.match,mtrR.match:colour,mtrR_promoter.match,mtrR_promoter.match:colour,parC.match,parC.match:colour,parE.match,parE.match:colour,penA.match,penA.match:colour,ponA.match,ponA.match:colour,porA.match,porA.match:colour,porB1a.match,porB1a.match:colour,porB1b.match,porB1b.match:colour,rpoB.match,rpoB.match:colour,rpsJ.match,rpsJ.match:colour,tetM.match,tetM.match:colour
data/ARIBA_reports/ERR1067709.tsv,no,#fb9a99,no,#fb9a99,no,#fb9a99,yes,#33a02c,no,#fb9a99,no,#fb9a99,yes,#33a02c,no,#fb9a99,no,#fb9a99,no,#fb9a99,no,#fb9a99,no,#fb9a99,no,#fb9a99,yes,#33a02c,yes,#33a02c,yes,#33a02c,no,#fb9a99


The first name is "data/ARIBA_reports/ERR1067709.tsv", and the rest are named similarly. This is not ideal, as it will look ugly in Phandango. Further, the names must exactly match the names in the tree file for Phandango to work (have a look in the tree file `tree_for_phandango.tre`). You could do a little hacking here using the Unix command `sed` on the CSV file. Instead, we can supply ARIBA with a file of filenames that also tells ARIBA what to call the samples in its output CSV files. Instead of "data/ARIBA_reports/ERR1067709.tsv", we would like to simply use "ERR1067709", which is cleaner and matches the tree file. It also means we can (and will) repeatedly run `ariba summary` with different options, and get output files that can be loaded straight into Phandango. This is one way to make the file with the naming information:

In [6]:
ls data/ARIBA_reports/* | awk -F/ '{print $0,$NF}' | sed 's/\.tsv$//' > filenames.fofn

The file is quite simple. Column 1 is the filename, and column 2 is the name we would like to use in the output.

In [7]:
head filenames.fofn

data/ARIBA_reports/ERR1067709.tsv ERR1067709
data/ARIBA_reports/ERR1067710.tsv ERR1067710
data/ARIBA_reports/ERR1067711.tsv ERR1067711
data/ARIBA_reports/ERR1067712.tsv ERR1067712
data/ARIBA_reports/ERR1067713.tsv ERR1067713
data/ARIBA_reports/ERR1067714.tsv ERR1067714
data/ARIBA_reports/ERR1067715.tsv ERR1067715
data/ARIBA_reports/ERR1067716.tsv ERR1067716
data/ARIBA_reports/ERR1067717.tsv ERR1067717
data/ARIBA_reports/ERR1067718.tsv ERR1067718

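Before passing the file to `ariba summary`, it can be worth checking that it really has the two-column layout described above. The following is a minimal sketch, not part of ARIBA: `check_fofn` is a hypothetical helper name, and it assumes that neither filenames nor sample names contain whitespace.

```shell
# Hypothetical helper (not part of ARIBA): sanity-check a two-column fofn.
# Prints a one-line summary and returns non-zero if any problem was found.
check_fofn() {
    fofn=$1
    # Lines that do not have exactly two whitespace-separated fields.
    malformed=$(awk 'NF != 2 {n++} END {print n + 0}' "$fofn")
    # Sample names (column 2) that occur more than once.
    duplicated=$(awk '{print $2}' "$fofn" | sort | uniq -d | awk 'END {print NR}')
    # Report files (column 1) that do not exist on disk.
    missing=0
    while read -r path name; do
        [ -f "$path" ] || missing=$((missing + 1))
    done < "$fofn"
    echo "malformed=$malformed duplicated=$duplicated missing=$missing"
    [ "$malformed" -eq 0 ] && [ "$duplicated" -eq 0 ] && [ "$missing" -eq 0 ]
}
```

It would be run as `check_fofn filenames.fofn`. All three counts should be zero for a file made as shown above.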

Now we can rerun summary using this input file. Note the use of the new option `--fofn`.

In [8]:
ariba summary --no_tree --fofn filenames.fofn out data/ARIBA_reports/*.tsv

Check that the renaming worked:

In [9]:
head -n 2 out.phandango.csv

name,16S.match,16S.match:colour,23S.match,23S.match:colour,blaTEM.match,blaTEM.match:colour,folP.match,folP.match:colour,gyrA.match,gyrA.match:colour,mtrR.match,mtrR.match:colour,mtrR_promoter.match,mtrR_promoter.match:colour,parC.match,parC.match:colour,parE.match,parE.match:colour,penA.match,penA.match:colour,ponA.match,ponA.match:colour,porA.match,porA.match:colour,porB1a.match,porB1a.match:colour,porB1b.match,porB1b.match:colour,rpoB.match,rpoB.match:colour,rpsJ.match,rpsJ.match:colour,tetM.match,tetM.match:colour
ERR1067709,no,#fb9a99,no,#fb9a99,no,#fb9a99,yes,#33a02c,no,#fb9a99,no,#fb9a99,yes,#33a02c,no,#fb9a99,no,#fb9a99,no,#fb9a99,no,#fb9a99,no,#fb9a99,no,#fb9a99,yes,#33a02c,yes,#33a02c,yes,#33a02c,no,#fb9a99

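Because Phandango needs the names in the CSV file to match those in the tree exactly, a quick comparison can save some confusion later. The sketch below is not part of ARIBA or Phandango: `compare_names` is a hypothetical helper, and it assumes the tree is in Newick format with simple, unquoted leaf labels (no spaces, brackets, commas or colons inside a name).

```shell
# Hypothetical helper (not part of ARIBA or Phandango).
# Prints names found in only one of the two files; prints nothing if they agree.
compare_names() {
    csv=$1
    tree=$2
    tmpdir=$(mktemp -d)
    # Sample names: first column of the CSV, skipping the header line.
    tail -n +2 "$csv" | cut -d, -f1 | sort -u > "$tmpdir/csv"
    # Leaf labels: in Newick, a leaf label directly follows "(" or ",".
    # Labels after ")" are internal node labels or support values and are skipped.
    tr -d '\n' < "$tree" | grep -oE '[(,][^(),:;]+' | tr -d '(,' | sort -u > "$tmpdir/tree"
    # comm -3 hides names common to both files.
    # Column 1 holds CSV-only names, column 2 (after a tab) tree-only names.
    comm -3 "$tmpdir/csv" "$tmpdir/tree" | awk -F'\t' '
        $1 != "" {print "CSV only: " $1}
        $1 == "" {print "tree only: " $2}'
    rm -r "$tmpdir"
}
```

It would be run as `compare_names out.phandango.csv tree_for_phandango.tre`. No output means that every name appears in both files.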

Now go to [Phandango](http://phandango.net/) and drag and drop the files `out.phandango.csv` and `tree_for_phandango.tre` into the window. The result should look like this:

![title](Screenshots/screenshot.phandango.default.png)
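
As a text-only complement to the picture, the same CSV file can be tallied on the command line. This is a rough sketch rather than an ARIBA feature: `count_matches` is a hypothetical helper that relies on the layout shown earlier, where the first line is a header and the presence/absence columns have names ending in `.match` with values `yes` or `no`.

```shell
# Hypothetical helper (not part of ARIBA): for every column whose header ends
# in ".match", print the column name, the number of samples with "yes",
# and the total number of samples (tab-separated).
count_matches() {
    awk -F, '
        NR == 1 {
            # Remember the position and name of each ".match" column.
            for (i = 2; i <= NF; i++) {
                if ($i ~ /\.match$/) { n++; idx[n] = i; name[n] = $i }
            }
            next
        }
        {
            # One row per sample: count the "yes" calls per column.
            for (k = 1; k <= n; k++) if ($(idx[k]) == "yes") hits[k]++
        }
        END {
            for (k = 1; k <= n; k++) printf "%s\t%d\t%d\n", name[k], hits[k], NR - 1
        }
    ' "$1"
}
```

It would be run as `count_matches out.phandango.csv`. The `:colour` columns are ignored because their names do not end in `.match`.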