# Habitat Switching in Amphibian Larvae
## Based on data from:
Bletz MC, Goedbloed DJ, Sanchez E, Reinhardt T, Tebbe CC, Bhuju S, Geffers R, Jarek M, Vences M, Steinfartz S. 2016.

Amphibian gut microbiota shifts differentially in community structure but converges on habitat-specific predicted functions.

Nat Commun 7:13699.

# Initial setup

In [1]:
import pandas as pd

## import calour

In [2]:
import calour as ca


## Set the warning level to INFO
(can be DEBUG/INFO/WARN/ERROR/CRITICAL)

In [3]:
ca.set_log_level('INFO')

## We need %matplotlib notebook for the interactive jupyter gui heatmaps

In [4]:
%matplotlib notebook

# Load the dataset (amphibians)
## Get rid of samples with < 1000 reads, normalize each sample to 10000 reads

In [5]:
dat=ca.read_amplicon('./all.fixids.biom','./map.txt', normalize=10000, min_reads=1000)

2018-07-12 14:38:42 INFO loaded 370 samples, 2442 features
2018-07-12 14:38:42 INFO After filtering, 340 remaining
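
To make the two loading options concrete, here is a minimal, dependency-free sketch of what min_reads and normalize do conceptually. It is not calour's implementation: the helper name filter_and_normalize and the toy counts are hypothetical, and treating the min_reads boundary as inclusive is an assumption.

```python
def filter_and_normalize(counts, min_reads=1000, total=10000):
    """Drop shallow samples, then rescale each kept sample to `total` reads.

    counts: dict mapping sample id -> list of per-feature read counts.
    Hypothetical helper for illustration only (not part of calour).
    """
    kept = {}
    for sample_id, reads in counts.items():
        depth = sum(reads)
        # Assumption: samples with exactly min_reads reads are kept
        if depth < min_reads:
            continue
        kept[sample_id] = [r * total / depth for r in reads]
    return kept


# Toy data: s2 has only 600 reads and is discarded
toy = {
    's1': [500, 1500, 0],
    's2': [100, 200, 300],
    's3': [4000, 0, 1000],
}
norm = filter_and_normalize(toy)
print(norm)
```

After this step every remaining sample sums to 10000, so abundances are comparable across samples regardless of sequencing depth.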


# Analysis

## Reorder features by clustering
Clustering places bacteria that behave similarly across samples next to each other.

We also discard features with < 10 reads in total over all samples, since they are non-informative.

In [6]:
datc=dat.cluster_features(min_abundance=10)

2018-07-12 14:38:43 INFO After filtering, 2404 remaining
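
The min_abundance filter can be pictured as a column-sum threshold. The sketch below shows only that filtering step on a toy table; the clustering and reordering that cluster_features also performs are omitted. The helper name and the toy feature ids are hypothetical, and the inclusive threshold is an assumption.

```python
def filter_features_by_total(table, feature_ids, min_abundance=10):
    """Keep features whose summed abundance over all samples reaches the cutoff.

    table: list of samples (rows), each a list of per-feature values.
    Hypothetical helper for illustration only (not part of calour).
    """
    totals = [sum(sample[j] for sample in table)
              for j in range(len(feature_ids))]
    # Assumption: a feature with exactly min_abundance reads is kept
    keep = [j for j, t in enumerate(totals) if t >= min_abundance]
    kept_ids = [feature_ids[j] for j in keep]
    kept_table = [[sample[j] for j in keep] for sample in table]
    return kept_table, kept_ids


# Toy table: 3 samples x 3 features, feature totals are 12, 3 and 450
toy_table = [[5, 0, 100],
             [7, 1, 300],
             [0, 2, 50]]
toy_ids = ['otu_a', 'otu_b', 'otu_c']
kept_table, kept_ids = filter_features_by_total(toy_table, toy_ids)
print(kept_ids)
```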


## Quick look at the experiment

In [7]:
datc

AmpliconExperiment ("all.fixids.biom") with 340 samples, 2404 features

In [8]:
datc.sample_metadata.columns

Index(['extraction', 'BarcodeSequence', 'LinkerPrimerSequence', 'Country',
       'Date', 'Species', 'Location', 'Study.ID', 'Survey.Habitat',
       'Start.Loc', 'End.Loc', 'Experimental.Cat', 'startloc', 'endloc',
       'Experiment.ID', 'Life.Stage', 'Pond.Ind', 'Individual_ID', 'Swab_Type',
       'Extraction_ID', 'SequenceRun_ID', 'Description', '_sample_id',
       '_calour_original_abundance'],
      dtype='object')

## Reorder the samples by sorting by field "Extraction_ID"

In [9]:
datc=datc.sort_samples('Extraction_ID')

## fix values in the mapping file (sample_metadata)
To get a nicer label in the plot

In [10]:
datc.sample_metadata.endloc = ['Prior' if pd.isnull(i) else i for i in datc.sample_metadata.endloc]

## Get just the transfer experiment samples
(pond<->stream transfer)

In [11]:
trans=datc.filter_samples('Study.ID','Transplant')

In [12]:
trans=trans.cluster_features(min_abundance=10)

2018-07-12 14:38:48 INFO After filtering, 2054 remaining


In [13]:
trans=trans.sort_samples('Experimental.Cat')

## just the skin samples

In [14]:
tskin=trans.filter_samples('Swab_Type','skin').cluster_features(min_abundance=10).normalize(100, axis='s')

2018-07-12 14:38:50 INFO After filtering, 1650 remaining


## Plot all features (fig1D)
Some plot parameters:

- sample_field : str, name of the x (sample) field used to draw the separating lines and x labels (or None to not draw)
- barx_fields : list of names of the x (sample) fields to plot as color bars at the top
- barx_label : bool, True to show the barx value texts, False to show just the colors
- feature_field : str, name of the y (feature) field used to display the per-feature label (or None to not show)
- clim : tuple of [min, max] for the heatmap color range
- gui : str, name of the heatmap GUI to use. Can be 'jupyter' for an interactive plot inside the notebook, 'qt5' for a separate qt5 window GUI, or 'none' for a static plot

In [15]:
f = tskin.sort_samples('Experimental.Cat').plot(sample_field='Experimental.Cat',gui='jupyter',
                                                barx_fields=['endloc','startloc'], feature_field=None, 
                                                clim=[0, 20])
f.ax_hm.set_xlabel('experiment category')

Text(0.5,7.02824,'experiment category')

## And save to PDF

In [17]:
f.save_figure('figure-1D-skin-all.pdf')

## pond/stream cluster (fig S1A)

In [20]:
f = tskin.sort_samples('Experimental.Cat').plot(sample_field='Experimental.Cat',gui='jupyter',barx_fields=['endloc','startloc'],
                                        feature_field=None, clim=[0, 20], rect=[-0.5, 118.5, 1572, 1505])
f.ax_hm.set_xlabel('experiment category')

Text(0.5,7.02824,'experiment category')

In [21]:
f.save_figure('figure-S1A-skin-source.pdf')

## extraction plate cluster (fig1E)
This is a zoomed-in view of the previous plot. We specify the coordinates using the rect=[] parameter.

To obtain the coordinates, we zoom and pan in the interactive heatmap (using the "," , "." , "-" , "=" keys) and then press the "print axes range" button.

This lets us recreate an interactive zoom as a fixed view in the notebook.

In [22]:
f = tskin.sort_samples('Experimental.Cat').plot(sample_field='Experimental.Cat',gui='jupyter',
                                                barx_fields=['endloc','startloc'], feature_field=None,
                                                clim=[0,50],
                                                rect=[-0.5, 118.5, 1648, 1598])
f.ax_hm.set_xlabel('experiment category')

Text(0.5,7.02824,'experiment category')

In [23]:
f.save_figure('figure-1E-skin-extraction.pdf')

## extraction cluster ordered by extraction (fig1F)

In [24]:
ttskin=tskin.sort_samples('extraction')

In [25]:
f = ttskin.plot(sample_field='extraction', gui='jupyter',feature_field=None,
                clim=[0,20],
                rect=[-0.5, 118.5, 1648, 1598])
f.ax_hm.set_xlabel('extraction plate')

Text(0.5,15.0716,'extraction plate')

In [26]:
f.save_figure('figure-1F-skin-extraction-ordered-v3.pdf')

## supervised difference between pond and stream
We use diff_abundance to find features whose abundance differs between two groups of samples, defined by the values of a given metadata field.
The default test is rank-mean difference with dsFDR correction.

Results are stored in a new experiment, with features sorted by effect size.

We look for bacteria that differ between the pond-only ("P", "P>P") and stream-only ("S", "S>S") samples, since we don't know what to expect from the pond-to-stream and stream-to-pond transfer samples.

We use the random_seed parameter to make re-runs repeatable, since the significance is calculated using a random permutation test.

In [27]:
dd=tskin.diff_abundance('Experimental.Cat',['P','P>P'],['S','S>S'], random_seed=2018)

2018-07-12 14:39:27 INFO 69 samples with both values
2018-07-12 14:39:27 INFO After filtering, 1472 remaining
2018-07-12 14:39:27 INFO 39 samples with value 1 (['P', 'P>P'])
2018-07-12 14:39:27 INFO method meandiff. number of higher in ['P', 'P>P'] : 74. number of higher in ['S', 'S>S'] : 39. total 113
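
To illustrate why a fixed seed makes the result repeatable, here is a minimal sketch of a permutation test on the rank-mean difference for a single feature. It is not calour's implementation and it does not include the dsFDR correction, which works across many features at once. All function names and the toy abundances are hypothetical.

```python
import random


def rank_transform(values):
    """Return 1-based ranks; tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def mean_diff(ranks, labels):
    """Mean rank of group 1 (label True) minus mean rank of group 2."""
    g1 = [r for r, lab in zip(ranks, labels) if lab]
    g2 = [r for r, lab in zip(ranks, labels) if not lab]
    return sum(g1) / len(g1) - sum(g2) / len(g2)


def permutation_pvalue(values, labels, n_perm=1000, seed=2018):
    """Two-sided p-value obtained by shuffling the group labels."""
    rng = random.Random(seed)  # fixed seed -> identical shuffles on re-run
    ranks = rank_transform(values)
    observed = abs(mean_diff(ranks, labels))
    shuffled = list(labels)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if abs(mean_diff(ranks, shuffled)) >= observed:
            hits += 1
    return (hits + 1) / (n_perm + 1)


# Toy abundances of one bacterium in 6 pond and 6 stream samples
pond = [12, 15, 9, 20, 18, 14]
stream = [1, 0, 3, 2, 0, 4]
values = pond + stream
labels = [True] * len(pond) + [False] * len(stream)

p1 = permutation_pvalue(values, labels)
p2 = permutation_pvalue(values, labels)
print(p1, p1 == p2)
```

Because both calls start from the same seed, they draw the same label shuffles and return the same p-value. Without a fixed seed, features near the significance threshold could move in or out of the result between runs.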


In [28]:
f = dd.sort_samples('Experimental.Cat').plot(sample_field='Experimental.Cat',gui='jupyter',
                                     barx_fields=['endloc','startloc'], feature_field=None,
                                     clim=[0,20])
f.ax_hm.set_xlabel('experiment category')

Text(0.5,7.02824,'experiment category')

In [29]:
f.save_figure('figure-S1B-supervised-pond-stream.pdf')

## Gut

In [30]:
tgut=trans.filter_samples('Swab_Type','Intestines').cluster_features(min_abundance=10).normalize(100, axis='s')
tgut

2018-07-12 14:39:32 INFO After filtering, 975 remaining


AmpliconExperiment ("all.fixids.biom") with 154 samples, 975 features

## All features (fig1A)

In [31]:
f = tgut.sort_samples('Experimental.Cat').plot(sample_field='Experimental.Cat',gui='jupyter',
                                               barx_fields=['endloc','startloc'], feature_field=None,
                                               clim=[0, 20])
f.ax_hm.set_xlabel('experiment category')

Text(0.5,7.02824,'experiment category')

In [32]:
f.save_figure('figure-1A-gut-all.pdf')

## pond/stream cluster (fig1B)

In [33]:
f = tgut.sort_samples('Experimental.Cat').plot(sample_field='Experimental.Cat',gui='jupyter',
                                       barx_fields=['endloc','startloc'], feature_field=None,
                                       clim=[0, 20],
                                       rect=[-0.5, 153.5, 415, 300])
f.ax_hm.set_xlabel('experiment category')

Text(0.5,7.02824,'experiment category')

In [34]:
f.save_figure('figure-1B-gut-zoom.pdf')

## supervised pond/stream bacteria (fig1C)

In [35]:
tgut=trans.filter_samples('Swab_Type','Intestines')

### remove effect of large outliers on compositionality
Because the data are compositional, a difference between the two groups in a high-frequency bacterium can cause artificial differences in the other bacteria, so we want to correct for these high-frequency bacteria.

normalize_compositional normalizes each sample (to total reads=total) while ignoring all bacteria with mean frequency > min_frac. The ignored bacteria are then scaled by the same per-sample factor (so the per-sample sum can be > 100).

In [36]:
tgut = tgut.normalize_compositional(min_frac=0.05, total=100)

2018-07-12 14:39:43 INFO After filtering, 2 remaining
2018-07-12 14:39:43 INFO ignoring 2 features
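
The idea can be sketched without calour on a toy table. This is an illustration of the description above, not the library's code: the helper below is a hypothetical re-implementation, and the toy run uses min_frac=0.4 only because the table has just three features.

```python
def normalize_compositional_sketch(table, min_frac=0.05, total=100):
    """Normalize each sample using only the low-frequency features.

    table: list of samples (rows), each a list of per-feature counts.
    Returns the rescaled table and the indices of the ignored features.
    Hypothetical helper for illustration only (not calour's code).
    """
    n_samples = len(table)
    n_features = len(table[0])

    # Mean relative frequency of each feature over all samples
    mean_freq = [0.0] * n_features
    for sample in table:
        depth = sum(sample)
        for j, reads in enumerate(sample):
            mean_freq[j] += reads / depth / n_samples

    ignored = {j for j in range(n_features) if mean_freq[j] > min_frac}

    normalized = []
    for sample in table:
        # The scaling factor is computed from the non-ignored features only
        kept_sum = sum(r for j, r in enumerate(sample) if j not in ignored)
        if kept_sum == 0:
            raise ValueError('sample has no reads outside the ignored features')
        factor = total / kept_sum
        # Ignored features are scaled by the same factor, so the sum can exceed total
        normalized.append([r * factor for r in sample])
    return normalized, sorted(ignored)


# Feature 0 dominates sample 1; features 1 and 2 only look different
# between the samples because feature 0 takes up most of the reads.
toy = [[80, 10, 10],
       [20, 40, 40]]
normalized, ignored = normalize_compositional_sketch(toy, min_frac=0.4, total=100)
print(ignored)
print(normalized)
```

In the toy output, features 1 and 2 have the same value in both samples once the dominant feature is excluded from the normalization, which is the artificial difference this step is meant to remove.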


In [37]:
dd=tgut.diff_abundance('Experimental.Cat',['P','P>P'],['S','S>S'], random_seed=2018)

2018-07-12 14:39:44 INFO 91 samples with both values
2018-07-12 14:39:44 INFO After filtering, 1070 remaining
2018-07-12 14:39:44 INFO 47 samples with value 1 (['P', 'P>P'])
2018-07-12 14:39:44 INFO method meandiff. number of higher in ['P', 'P>P'] : 178. number of higher in ['S', 'S>S'] : 89. total 267


In [38]:
f = dd.sort_samples('Experimental.Cat').plot(sample_field='Experimental.Cat',gui='jupyter',
                                     barx_fields=['endloc','startloc'], feature_field=None,
                                     clim=[0, 20])
f.ax_hm.set_xlabel('experiment category')

Text(0.5,7.02824,'experiment category')

In [39]:
f.save_figure('figure-1C-gut-supervised-pond-stream.pdf')