# Investigating MIC data

We will use the *N. gonorrhoeae* dataset. This tutorial includes pre-computed output from running ARIBA on all the samples, and the ARIBA database that was made in the [first section](make_custom_db.ipynb). Do not worry if you did not follow that part of the tutorial; we will use a pre-computed version of the database, `data/Ref/Ngo_ARIBAdb/`.

ARIBA has a function called "micplot" that generates plots showing the distribution of MICs across samples with different combinations of genotypes. To use it, you need a file of MIC data with one row per sample and a column for at least one drug. It looks like this:

```bash
head data/mic_data.tsv
```

```
Sample	Names	Reference	Azithromycin	Cefixime	Ceftriaxone	Ciprofloxacin
ERR1067709	GCGS0277	grad2014_2016	0.25	0.008	0.008	0.015
ERR1067710	GCGS0839	grad2014_2016	0.25	0.125	0.015	0.002
ERR1067711	GCGS0605	grad2014_2016	0.25	0.03	0.03	0.015
ERR1067712	GCGS0287	grad2014_2016	0.25	0.008	0.008	0.008
ERR1067713	GCGS0613	grad2014_2016	0.5	0.03	0.03	0.004
ERR1067714	GCGS0766	grad2014_2016	0.25	0.06	0.008	0.004
ERR1067715	GCGS0624	grad2014_2016	0.25	0.03	0.015	2
ERR1067716	GCGS0625	grad2014_2016	0.25	0.03	0.015	2
ERR1067717	GCGS0440	grad2014_2016	0.25	0.015	0.004	0.002
```


The first column must be named "Sample", and its entries must exactly match the sample names in the ARIBA summary file used as input to micplot (we will see this shortly). The remaining columns should contain drug names and MIC values. Note, however, that the second and third columns ("Names" and "Reference") contain other data, which ARIBA ignores: when ARIBA loads the file, it tries to convert everything from column 2 onwards to a number and assigns "NA" when this is not possible.
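
Before running micplot, a quick sanity check of the MIC file can save time. The sketch below uses `awk` to confirm that the first header field is exactly `Sample`, and then lists each column (from the second onwards) that contains at least one non-numeric value, i.e. a value that would be read as "NA". It assumes a tab-separated file and treats "numeric" as a plain decimal number with an optional exponent; this only approximates ARIBA's behaviour and is not its parser. `check_mic_file` is a hypothetical helper name, and the test data are the header and first two rows of `data/mic_data.tsv` shown above.

```bash
#!/usr/bin/env bash
# Sketch: sanity-check an MIC file before running "ariba micplot".
# Assumptions: tab-separated input; "numeric" means a plain decimal number with
# an optional exponent. This approximates ARIBA's behaviour; it is not its parser.
set -euo pipefail

check_mic_file() {
  awk -F'\t' '
    NR == 1 {
      # The first header field must be exactly "Sample"
      if ($1 != "Sample") {
        print "ERROR: first column is \"" $1 "\", expected \"Sample\"" > "/dev/stderr"
        bad_header = 1
        exit 1
      }
      for (i = 2; i <= NF; i++) name[i] = $i
      ncol = NF
      next
    }
    {
      # Remember every column (2 onwards) that holds a non-numeric value
      for (i = 2; i <= ncol; i++)
        if ($i !~ /^[+-]?([0-9]+\.?[0-9]*|\.[0-9]+)([eE][+-]?[0-9]+)?$/) nonnum[i] = 1
    }
    END {
      if (bad_header) exit 1
      # Columns with at least one value that would be read as "NA"
      for (i = 2; i <= ncol; i++) if (i in nonnum) print name[i]
    }
  ' "$1"
}

# The header and first two rows of data/mic_data.tsv, as shown above
tmp=$(mktemp)
trap 'rm -f "$tmp"' EXIT
printf '%s\t%s\t%s\t%s\t%s\t%s\t%s\n' \
  Sample Names Reference Azithromycin Cefixime Ceftriaxone Ciprofloxacin \
  ERR1067709 GCGS0277 grad2014_2016 0.25 0.008 0.008 0.015 \
  ERR1067710 GCGS0839 grad2014_2016 0.25 0.125 0.015 0.002 > "$tmp"

echo "Columns containing values that would become NA:"
check_mic_file "$tmp"
```

For this file the check reports the `Names` and `Reference` columns, which is expected: they hold identifiers rather than MICs.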

To run micplot, we need an MIC file like the one above, and an ARIBA summary file (as described in the [previous section](phandango.ipynb)). The following command generates a summary of the known 23S and mtrR mutations, and includes the "assembled" cluster column so that interrupted mtrR genes can be identified:

```bash
ariba summary --row_filter n --cluster_cols assembled,known_var --only_clusters 23S,mtrR \
  --v_groups --no_tree --fofn data/filenames.fofn summary.AZMknowngroups
```
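
Because the names in the "Sample" column must exactly match those in the summary file, it can help to list any MIC samples that have no match before plotting. The sketch below compares the two name lists with `comm`. It assumes that the first column of the summary CSV holds the sample names after a single header row; check this against your own `summary.AZMknowngroups.csv`. The helper `unmatched_samples`, the sample `ERR9999999` and the `23S.assembled` header in the example files are all hypothetical.

```bash
#!/usr/bin/env bash
# Sketch: list sample names that appear in the MIC file but not in the summary CSV.
# Assumption: column 1 of the summary CSV holds the sample names, after one header row.
set -euo pipefail

unmatched_samples() {
  local mic_tsv=$1 summary_csv=$2
  # comm requires both inputs sorted in the same collation order, hence LC_ALL=C
  LC_ALL=C comm -23 \
    <(tail -n +2 "$mic_tsv" | cut -f1 | LC_ALL=C sort -u) \
    <(tail -n +2 "$summary_csv" | cut -d, -f1 | LC_ALL=C sort -u)
}

# Hypothetical example files: ERR9999999 and the "23S.assembled" header are made up
dir=$(mktemp -d)
trap 'rm -rf "$dir"' EXIT
printf 'Sample\tAzithromycin\nERR1067709\t0.25\nERR1067710\t0.25\nERR9999999\t1\n' > "$dir/mic.tsv"
printf 'name,23S.assembled\nERR1067709,yes\nERR1067710,yes\n' > "$dir/summary.csv"

unmatched_samples "$dir/mic.tsv" "$dir/summary.csv"
```

Any name printed here is a sample with MIC data but no genotype row, so it cannot be placed in the plot.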

Now we can run micplot using the new file `summary.AZMknowngroups.csv` and the MIC file `data/mic_data.tsv`, to show the azithromycin MICs for each combination of sequences and known mutations in 23S and mtrR:

```bash
ariba micplot data/Ref/Ngo_ARIBAdb/ --interrupted Azithromycin \
  data/mic_data.tsv summary.AZMknowngroups.csv micplot.AZMknowngroups
```

This produces a PDF file, `micplot.AZMknowngroups.pdf`, that looks like this:

![micplot AZMknowngroups](Screenshots/screenshot.micplot.AZMknowngroups.png)

There are various options that can be changed. We will show some of them here, but try running `ariba micplot --help` to see all the options.


## Horizontal lines

Horizontal lines can be added with the option `--hlines` to mark important MIC cutoffs, in this case 0.25 and 2:

```bash
ariba micplot data/Ref/Ngo_ARIBAdb/ --interrupted --hlines 0.25,2 \
  Azithromycin data/mic_data.tsv summary.AZMknowngroups.csv micplot.AZMknowngroups
```

Here is the result:

![micplot hlines](Screenshots/screenshot.micplot.AZMknowngroups.hlines.png)
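
To put numbers on the horizontal lines, you can count how many samples lie strictly above each cutoff directly from the MIC file. This `awk` sketch finds the drug column by its header name and skips non-numeric entries, roughly mirroring how ARIBA treats them as "NA". `count_above` is a hypothetical helper, and the data are the Sample, Azithromycin and Ciprofloxacin columns of the `head` output shown earlier.

```bash
#!/usr/bin/env bash
# Sketch: count samples whose MIC for one drug is strictly above a cutoff.
# Non-numeric entries are skipped, roughly like ARIBA treating them as NA.
set -euo pipefail

count_above() {
  local mic_tsv=$1 drug=$2 cutoff=$3
  awk -F'\t' -v drug="$drug" -v cutoff="$cutoff" '
    NR == 1 { for (i = 1; i <= NF; i++) if ($i == drug) col = i; next }
    col && ($col ~ /^[0-9.]+$/) && ($col + 0 > cutoff + 0) { n++ }
    END {
      if (!col) { print "drug column not found: " drug > "/dev/stderr"; exit 1 }
      print n + 0
    }
  ' "$mic_tsv"
}

# Sample, Azithromycin and Ciprofloxacin columns from the head output above
tmp=$(mktemp)
trap 'rm -f "$tmp"' EXIT
printf '%s\t%s\t%s\n' Sample Azithromycin Ciprofloxacin \
  ERR1067709 0.25 0.015  ERR1067710 0.25 0.002  ERR1067711 0.25 0.015 \
  ERR1067712 0.25 0.008  ERR1067713 0.5 0.004   ERR1067714 0.25 0.004 \
  ERR1067715 0.25 2      ERR1067716 0.25 2      ERR1067717 0.25 0.002 > "$tmp"

for cutoff in 0.25 2; do
  echo "Azithromycin MIC > $cutoff: $(count_above "$tmp" Azithromycin "$cutoff") sample(s)"
done
```

Running the same function on the full `data/mic_data.tsv` gives the counts for the whole dataset.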

## Plot styles

In the plots above, there is one point per sample. Even though the points are jittered horizontally, it can be hard to see how many there are. The option `--point_size` sets the size of the points; setting it to zero instead groups the points together and plots circles whose sizes are proportional to the number of samples.

```bash
ariba micplot data/Ref/Ngo_ARIBAdb/ --interrupted --hlines 0.25,2 \
  --point_size 0 \
  Azithromycin data/mic_data.tsv summary.AZMknowngroups.csv micplot.AZMknowngroups
```

Here is the result:

![micplot point size zero](Screenshots/screenshot.micplot.AZMknowngroups.point_zero.png)
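
With `--point_size 0`, each circle stands for the samples that share the same MIC. The counts behind those circle sizes can be tallied with a short `awk` sketch. For simplicity it groups by MIC value alone, because the genotype groups come from the summary file, which this example does not read. `tally_mics` is a hypothetical helper, and the data are columns from the `head` output shown earlier.

```bash
#!/usr/bin/env bash
# Sketch: count samples per MIC value for one drug, i.e. the counts that circle
# sizes represent when --point_size is 0 (genotype groups are ignored here).
set -euo pipefail

tally_mics() {
  local mic_tsv=$1 drug=$2
  awk -F'\t' -v drug="$drug" '
    NR == 1 { for (i = 1; i <= NF; i++) if ($i == drug) col = i; next }
    col && $col != "" { count[$col]++ }
    END { for (v in count) print v "\t" count[v] }
  ' "$mic_tsv" | LC_ALL=C sort -n -k1,1   # order the output by MIC value
}

# Sample, Azithromycin and Ciprofloxacin columns from the head output above
tmp=$(mktemp)
trap 'rm -f "$tmp"' EXIT
printf '%s\t%s\t%s\n' Sample Azithromycin Ciprofloxacin \
  ERR1067709 0.25 0.015  ERR1067710 0.25 0.002  ERR1067711 0.25 0.015 \
  ERR1067712 0.25 0.008  ERR1067713 0.5 0.004   ERR1067714 0.25 0.004 \
  ERR1067715 0.25 2      ERR1067716 0.25 2      ERR1067717 0.25 0.002 > "$tmp"

tally_mics "$tmp" Azithromycin
```

In this small sample almost every azithromycin MIC is 0.25, which is exactly the kind of overlap that individual jittered points make hard to read.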

You can hide the violin plots or the dots in the upper plot using the option `--plot_types`. The default is `violin,point`, which shows both. To show only the dots:

```bash
ariba micplot data/Ref/Ngo_ARIBAdb/ --interrupted --hlines 0.25,2 \
  --plot_types point --point_size 0 \
  Azithromycin data/mic_data.tsv summary.AZMknowngroups.csv micplot.AZMknowngroups
```

Here is the result:

![micplot dots only](Screenshots/screenshot.micplot.AZMknowngroups.dots_only.png)

## Colours

There are various colour options. See the [matplotlib colourmaps page](http://matplotlib.org/users/colormaps.html) for all of the available colour palettes. The default is "Accent", which has 8 colours; ARIBA cycles through these, repeating colours if there are more than 8 columns in the plot. The palette can be changed using the option `--colourmap`.

```bash
ariba micplot data/Ref/Ngo_ARIBAdb/ --interrupted --hlines 0.25,2 \
  --colourmap PiYG \
  --point_size 0 \
  Azithromycin data/mic_data.tsv summary.AZMknowngroups.csv micplot.AZMknowngroups
```

Here is the result:

![micplot PiYG](Screenshots/screenshot.micplot.AZMknowngroups.PiYG.png)

The palette PiYG is continuous and is almost white in the middle, which is not ideal. We can skip the middle part of the range, in this case 35-65%, using the option `--colour_skip`:

```bash
ariba micplot data/Ref/Ngo_ARIBAdb/ --interrupted --hlines 0.25,2 \
  --colourmap PiYG --colour_skip 0.35,0.65 \
  --point_size 0 \
  Azithromycin data/mic_data.tsv summary.AZMknowngroups.csv micplot.AZMknowngroups
```

Here is the new plot:

![micplot colour_skip](Screenshots/screenshot.micplot.AZMknowngroups.colour_skip.png)
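
The values `0.35,0.65` are fractions of the colourmap's range from 0 to 1, so colours between those two positions are not used. To make this concrete, the sketch below shows one plausible way to pick *n* evenly spaced positions from what is left, [0, 0.35] and [0.65, 1]. It illustrates the idea only; it is not ARIBA's actual algorithm, and `skip_positions` is a hypothetical name.

```bash
#!/usr/bin/env bash
# Sketch: choose n evenly spaced positions in [0, 1] that avoid the interval
# (a, b). Illustrates the idea behind --colour_skip; NOT ARIBA's own code.
set -euo pipefail

skip_positions() {
  local n=$1 a=$2 b=$3
  awk -v n="$n" -v a="$a" -v b="$b" 'BEGIN {
    n += 0; a += 0; b += 0
    kept = a + (1 - b)                         # total length still available
    for (i = 0; i < n; i++) {
      t = (n > 1) ? i * kept / (n - 1) : 0     # even spacing over the kept length
      pos = (t <= a) ? t : t + (b - a)         # jump over the skipped middle part
      printf "%s%.3f", (i ? " " : ""), pos
    }
    print ""
  }'
}

skip_positions 4 0.35 0.65
```

With four colours this picks positions 0.000, 0.233, 0.767 and 1.000, so two colours come from each end of the palette and none from the pale middle.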

The number of colours can be set to fewer than the number of columns using the option `--number_of_colours`, in which case ARIBA cycles through the colours. Here is an example using the first three colours from the "Dark2" colour palette:

```bash
ariba micplot data/Ref/Ngo_ARIBAdb/ --interrupted --hlines 0.25,2 \
  --colourmap Dark2 --number_of_colours 3 \
  --point_size 0 \
  Azithromycin data/mic_data.tsv summary.AZMknowngroups.csv micplot.AZMknowngroups
```

And now there are only three colours:

![micplot 3 colours](Screenshots/screenshot.micplot.AZMknowngroups.3colours.png)
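
Cycling means that, counting columns from 1, each column takes the next colour in turn and starts again from the first colour once all of them have been used. With `--number_of_colours 3`, the fourth column therefore reuses the first colour. The sketch below prints that assignment. It assumes a simple modulo scheme, which matches the description above but is not taken from ARIBA's source, and `colour_index` is a hypothetical helper.

```bash
#!/usr/bin/env bash
# Sketch: which colour each plot column would get if colours are reused
# cyclically (assumed modulo scheme; not taken from ARIBA's source code).
set -euo pipefail

colour_index() {
  local columns=$1 colours=$2 i
  for ((i = 0; i < columns; i++)); do
    # After the last colour, column i+1 wraps around to colour 1 again
    printf 'column %d -> colour %d\n' "$((i + 1))" "$((i % colours + 1))"
  done
}

colour_index 7 3
```

Calling `colour_index 10 8` shows the same behaviour for the default 8-colour "Accent" palette, where columns 9 and 10 reuse the first two colours.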

Setting the number of colours to one results in a black and white figure.

```bash
ariba micplot data/Ref/Ngo_ARIBAdb/ --interrupted --hlines 0.25,2 \
  --number_of_colours 1 \
  --point_size 0 \
  Azithromycin data/mic_data.tsv summary.AZMknowngroups.csv micplot.AZMknowngroups
```

Here is the black and white figure:

![micplot black and white](Screenshots/screenshot.micplot.AZMknowngroups.blackwhite.png)