# Plot taxonomy profiles

## README

This notebook allows you to import taxonomic annotations and abundance information for genes and to interactively explore the taxonomic profiles in your dataset.

**DO THIS FIRST:**
Start off by selecting 'Cell->Run All' from the menu. 

Once all cells have executed, proceed to the [Select data](#Select_data) section to select and load your files from the 'datadir/' directory. You can then use the [barplot](#Barplot) and [clustermap](#Clustermap) interactive widgets to plot your data.

## Import packages

In [None]:
%config InlineBackend.figure_format = 'svg'
%matplotlib inline
import seaborn as sns, pandas as pd, matplotlib.pyplot as plt, numpy as np
from glob import glob
from ipywidgets import interact, interactive, fixed
import ipywidgets as widgets

plt.style.use('ggplot')

## Initialize empty dataframe for the plotdata
data = pd.DataFrame()

## Functions

In [None]:
## Function for generating dataframe for plotting
def make_plotdata(taxfile, covfile, sep):
    print("Loading files")
    global taxdf
    global covdf
    global data
    ## Translate the radio button label into the actual delimiter
    sep = "," if sep == "csv" else "\t"
    taxdf = pd.read_csv(taxfile, header=0, sep=sep, index_col=0)
    covdf = pd.read_csv(covfile, header=0, sep=sep, index_col=0)
    print("Merging dataframes")
    data = pd.merge(taxdf,covdf,left_index=True,right_index=True)
    ## Populate the widgets for plotting
    rank_select.options=list(data.columns[0:len(taxdf.columns)])
    sample_select.options=list(data.columns[len(taxdf.columns):])
    taxa_select.options = list(set(data[rank_select.value]))
    print("Data ready")

## Function for the barplot
def barplot(rank, taxa, unc, samples, ylim, renorm, w, h, title, f, f_format):
    plt.close("all")
    df = data
    ## Groupby rank and normalize
    ## numeric_only=True keeps the text columns of the other ranks out of the sums
    dg = df.groupby(rank).sum(numeric_only=True)
    dg = dg.div(dg.sum())*100
    
    ## Limit to samples
    if list(samples)!=[]: dg = dg.loc[:,list(samples)]
    ## Limit to taxa
    if list(taxa)!=[]: dg = dg.loc[list(taxa)]
    
    if renorm: dg = dg.div(dg.sum())*100
    ax = dg.T.plot(kind="bar", stacked=True, ylim=ylim, title=title, figsize=(w,h))
    ax.set_ylabel("%")
    plt.legend(bbox_to_anchor=(1.05, 1), loc=2, borderaxespad=0.);
    if f!="": 
        outfile = f+"."+f_format
        plt.savefig(outfile,format=f_format,dpi=300,bbox_inches="tight")
        print("Saved figure to "+outfile)

## Function for the clustermap
def clustermap(rank, taxa, unc, samples, renorm, row_clust, col_clust, clust_metric, clust_method, w, h, z_c, z_r, title, f, f_format):
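    ## seaborn's z_score argument: 0 standardizes rows, 1 standardizes columns
    ## If both boxes are checked, rows take precedence
    if z_r: z = 0
    elif z_c: z = 1
    else: z = None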
    plt.close("all")
    df = data
    ## Groupby rank and normalize
    ## numeric_only=True keeps the text columns of the other ranks out of the sums
    dg = df.groupby(rank).sum(numeric_only=True)
    dg = dg.div(dg.sum())*100
    
    ## Limit to samples
    if list(samples)!=[]: dg = dg.loc[:,list(samples)]
    ## Limit to taxa
    if list(taxa)!=[]: dg = dg.loc[list(taxa)]
    
    if renorm: dg = dg.div(dg.sum())*100
    
    ax = sns.clustermap(dg, col_cluster=col_clust, z_score=z, row_cluster=row_clust, method=clust_method,
                        metric=clust_metric,figsize=(w,h))
    ax.ax_heatmap.set_title(title)
    plt.setp(ax.ax_heatmap.yaxis.get_majorticklabels(), rotation=0)
    if f!="": 
        outfile = f+"."+f_format
        plt.savefig(outfile,format=f_format,dpi=300,bbox_inches="tight")
        print("Saved figure to "+outfile)
    
## Function to update taxa on changes of the rank dropdown
def update_taxa(*args): 
    newrank = args[0]['new']
    ## Rank taxa by total abundance across all samples (numeric columns only)
    top = list(data.groupby(newrank).sum(numeric_only=True).sum(axis=1).sort_values(ascending=False).index)
    if unc_radio.value=='exclude': top = [t for t in top if "Unclassified" not in t]
    top = top[0:14]
    taxa_select.options = top

## Widgets

In [None]:
## Widgets for make_plotdata
taxfile_select = widgets.Select(options=glob("datadir/*"), description="Taxonomy")
covfile_select = widgets.Select(options=glob("datadir/*"), description="Coverage")
sep_select = widgets.RadioButtons(options=["tab","csv"], description="Separator")    

## Widgets for barplot
rank_select = widgets.Dropdown(options=[], description="Rank")
sample_select = widgets.SelectMultiple(options=[], description="Samples")
taxa_select = widgets.SelectMultiple(options=[], description="Taxa")
rank_select.observe(update_taxa, 'value')
ylim_select = widgets.IntRangeSlider(min=0,max=100,value=(0,100), description="y-axis range")
title_box = widgets.Text(description="Title")
file_box = widgets.Text(description="Outfile")
file_type = widgets.RadioButtons(options=["png","pdf"], description="Format")
unc_radio = widgets.RadioButtons(options=["exclude","include"], description="Unknown")
re_normcheck = widgets.Checkbox(description="Renorm",value=False)
width_select = widgets.IntSlider(min=1,max=30,value=7, description="Width")
height_select = widgets.IntSlider(min=1,max=30,value=7, description="Height")

## Widgets for clustermap
clustmetric_select = widgets.Select(options=["euclidean","braycurtis","correlation","jaccard"], value="braycurtis", description="Metric")
clustermethod_select = widgets.Select(options=["complete","average","single","weighted","centroid"], value="complete", description="Method")
z_col = widgets.Checkbox(value=False, description="z-score columns")
z_row = widgets.Checkbox(value=False, description="z-score rows")
col_clust = widgets.Checkbox(value=False, description="Cluster columns")
row_clust = widgets.Checkbox(value=True, description="Cluster rows")

## Plotting

<a id='Select_data'></a>
### Select datafiles

Choose one file that holds the taxonomic annotation for ORFs and one that contains the abundance values for ORFs in samples. In both files, ORFs are rows.
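As a rough illustration of the expected layout, the sketch below builds two tiny tab-separated tables in memory and merges them on the ORF index, in the same way `make_plotdata` does. The ORF identifiers, rank names, and sample names are invented for this example and are not taken from any real dataset.

```python
import io

import pandas as pd

# Hypothetical taxonomy table: one row per ORF, one column per rank.
tax_text = "\n".join([
    "orf\tphylum\tgenus",
    "orf1\tProteobacteria\tEscherichia",
    "orf2\tFirmicutes\tBacillus",
    "orf3\tProteobacteria\tUnclassified",
])

# Hypothetical abundance table: one row per ORF, one column per sample.
cov_text = "\n".join([
    "orf\tsample1\tsample2",
    "orf1\t10.0\t0.0",
    "orf2\t5.0\t20.0",
    "orf4\t1.0\t1.0",
])

taxdf = pd.read_csv(io.StringIO(tax_text), sep="\t", header=0, index_col=0)
covdf = pd.read_csv(io.StringIO(cov_text), sep="\t", header=0, index_col=0)

# Inner join on the ORF index: only ORFs present in both tables are kept.
merged = pd.merge(taxdf, covdf, left_index=True, right_index=True)

# The first columns come from the taxonomy table, the rest are samples.
rank_columns = list(merged.columns[:len(taxdf.columns)])
sample_columns = list(merged.columns[len(taxdf.columns):])

print(merged)
```

Note that ORFs found in only one of the two files (here `orf3` and `orf4`) are dropped by the merge, so both files should use the same ORF identifiers.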

In [None]:
## interact_manual adds a button so the function only runs when clicked
widgets.interact_manual(make_plotdata, taxfile=taxfile_select, covfile=covfile_select, sep=sep_select)

### Barplot

#### README

Create a barplot of taxa in your samples.

**Rank**: Selects a rank for which the data will be summed and plotted.

**Taxa**: Select from the (currently) 14 most abundant taxa for the selected rank, shown in decreasing order of abundance. If no selection, all taxa at that rank will be shown.

**Unknown**: Include/Exclude "Unclassified" taxa in the Taxa field. The Taxa list is only updated when the Rank dropdown changes, so re-select a rank after changing this option.

**Samples**: Select samples. If no selection, all will be shown.

**y-axis range**: Change the limits of the y-axis.

**Renorm**: Normalize data again after filtering by taxa. This is helpful if you want to see changes in relative abundance for a few select taxa.

**Title**: Set a title for the plot.

**Outfile** and **Format**: Enter a file name (without extension) in Outfile to save the resulting plot in either png or pdf format. Leave Outfile empty to skip saving.
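The normalization steps behind the Rank, Taxa, and Renorm options can be sketched on a small table. This is a minimal example with made-up abundance values and taxon names, not the notebook's actual data; it only mirrors the group, normalize, filter, and renormalize sequence used by the plotting function.

```python
import pandas as pd

# Hypothetical merged table: one rank column plus two sample columns.
data = pd.DataFrame(
    {
        "phylum": ["Proteobacteria", "Proteobacteria", "Firmicutes", "Unclassified"],
        "sample1": [30.0, 10.0, 40.0, 20.0],
        "sample2": [5.0, 5.0, 10.0, 30.0],
    },
    index=["orf1", "orf2", "orf3", "orf4"],
)

# Sum ORF abundances per taxon, then scale each sample (column) to 100 %.
percent = data.groupby("phylum").sum(numeric_only=True)
percent = percent.div(percent.sum()) * 100

# Keep only the selected taxa; columns no longer sum to 100 % after this.
selected = percent.loc[["Proteobacteria", "Firmicutes"]]

# Renorm: rescale so the selected taxa sum to 100 % within each sample.
renormed = selected.div(selected.sum()) * 100

print(percent)
print(renormed)
```

Without Renorm, the bars show each taxon's share of the whole sample. With Renorm, they show its share among the selected taxa only.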

In [None]:
## interact_manual adds a button so the plot is only drawn when clicked
widgets.interact_manual(barplot, rank=rank_select, taxa=taxa_select, unc=unc_radio, samples=sample_select, ylim=ylim_select, renorm=re_normcheck, w=width_select, h=height_select, title=title_box, f=file_box, f_format=file_type)

### Clustermap

The first few widgets work as in the [barplot](#Barplot). Specific to the clustermap are:

**Cluster rows and Cluster columns**: Lets you cluster the rows and/or columns using the metric and method selected below. This can highlight patterns in your samples.

**Metric**: Selects the cluster metric. See [here](https://docs.scipy.org/doc/scipy/reference/generated/scipy.spatial.distance.pdist.html) for a description of the different metrics.

**Method**: Selects the cluster method. See [here](https://docs.scipy.org/doc/scipy/reference/generated/scipy.cluster.hierarchy.linkage.html) for a description of the different methods.

**z-score columns and z-score rows**: Lets you calculate z-scores for rows or columns (if both are checked, rows take precedence). Z-scores are computed as z = (x - mean)/std: the mean of each row (column) is subtracted from its values, which are then divided by the standard deviation of that row (column). Each row (column) then has a mean of 0 and a variance of 1.
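The z-score formula can be checked by hand on a small table. The sketch below uses invented values and assumes the sample standard deviation (the pandas default, `ddof=1`); it is meant to illustrate the arithmetic rather than reproduce the plotting library's internals.

```python
import pandas as pd

# Hypothetical percentages: taxa as rows, samples as columns.
dg = pd.DataFrame(
    {"sample1": [10.0, 60.0], "sample2": [20.0, 30.0], "sample3": [30.0, 0.0]},
    index=["TaxonA", "TaxonB"],
)

# Row-wise z-scores: subtract each row's mean, divide by each row's std.
row_z = dg.sub(dg.mean(axis=1), axis=0).div(dg.std(axis=1), axis=0)

# Column-wise z-scores: subtract each column's mean, divide by its std.
col_z = (dg - dg.mean()) / dg.std()

print(row_z)
print(col_z)
```

Row-wise scaling makes taxa with very different abundance levels comparable across samples, while column-wise scaling compares taxa within each sample.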

In [None]:
## interact_manual adds a button so the plot is only drawn when clicked
widgets.interact_manual(clustermap, rank=rank_select, taxa=taxa_select, unc=unc_radio, samples=sample_select, renorm=re_normcheck, row_clust=row_clust, col_clust=col_clust, clust_method=clustermethod_select, clust_metric=clustmetric_select, w=width_select, h=height_select, z_c=z_col, z_r=z_row, title=title_box, f=file_box, f_format=file_type)