# Chronic Trichuris muris infection in mice
## Using data from:
Holm JB, Sorobetea D, Kiilerich P, Ramayo-Caldas Y, Estellé J, Ma T, Madsen L, Kristiansen K, Svensson-Frej M. 2015. Chronic Trichuris muris Infection Decreases Diversity of the Intestinal Microbiota and Concomitantly Increases the Abundance of Lactobacilli. PLoS One 10:e0125495.


## Setup
### Load calour

In [1]:
import calour as ca




### Set the verbosity of calour
Can be 'DEBUG' / 'INFO' / 'WARN' / 'ERROR'

In [2]:
ca.set_log_level('INFO')

### We want interactive figures inside the notebook
so we need to enable interactive matplotlib figures

In [3]:
%matplotlib notebook

## Load the data
We load the deblurred biom table and corresponding mapping file into a new calour AmpliconExperiment

We throw away samples with < 1000 reads total, and normalize each sample to 10000 reads (using total-sum scaling (TSS), not rarefaction)

In [4]:
dat=ca.read_amplicon('./all.withtax.biom','./map.txt',normalize=10000, min_reads=1000)

2018-07-23 18:46:31 INFO loaded 140 samples, 43716 features
2018-07-23 18:46:31 INFO After filtering, 140 remaining
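
To make the filtering and normalization step concrete, here is a minimal sketch of total-sum scaling on a toy table. It is not calour's implementation: the function name `tss_normalize`, the dict-of-lists layout, and the toy counts are all hypothetical and used only for illustration.

```python
def tss_normalize(counts, target=10000, min_reads=1000):
    """Drop samples with fewer than min_reads reads, then rescale each
    remaining sample so that its reads sum to target.

    counts: dict mapping sample id -> list of per-feature read counts.
    (Hypothetical layout, chosen for illustration only.)
    """
    normalized = {}
    for sample, reads in counts.items():
        total = sum(reads)
        if total < min_reads:
            continue  # too few reads: discard the sample
        normalized[sample] = [r * target / total for r in reads]
    return normalized


# Hypothetical toy table: three samples, three features.
toy = {
    "s1": [500, 300, 200],   # 1000 reads: kept
    "s2": [4000, 0, 1000],   # 5000 reads: kept
    "s3": [10, 5, 0],        # 15 reads: dropped
}
norm = tss_normalize(toy)
print(norm)
```

Unlike rarefaction, which randomly subsamples reads down to a fixed depth, this scaling is deterministic and keeps fractional values.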


## Reorder the features (bacteria)
Features are reordered so that similar features are adjacent.

We also discard features with < 10 reads summed over all samples, since they are uninteresting.

In [5]:
datc=dat.cluster_features(min_abundance=10)

2018-07-23 18:46:31 INFO After filtering, 1094 remaining
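
The abundance filter described above can be sketched as follows. This only illustrates the thresholding idea (it does not reproduce the clustering, and it is not calour's code); `filter_min_abundance` and the toy feature names are hypothetical.

```python
def filter_min_abundance(table, min_abundance=10):
    """Keep features whose abundance, summed over all samples,
    is at least min_abundance.

    table: dict mapping feature id -> list of per-sample abundances.
    (Hypothetical layout, chosen for illustration only.)
    """
    return {
        feature: values
        for feature, values in table.items()
        if sum(values) >= min_abundance
    }


# Hypothetical toy table: three features measured in three samples.
toy_features = {
    "featA": [0, 3, 2],    # sum 5: dropped
    "featB": [40, 0, 15],  # sum 55: kept
    "featC": [5, 5, 0],    # sum 10: kept
}
kept = filter_min_abundance(toy_features)
print(list(kept))
```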


## What we have

In [6]:
datc

AmpliconExperiment ("all.withtax.biom") with 140 samples, 1094 features

In [7]:
datc.sample_metadata.columns

Index(['BioSample_s', 'Experiment_s', 'MBases_l', 'MBytes_l', 'Run_s',
       'SRA_Sample_s', 'Sample_Name_s', 'host_body_site_s',
       'miscellaneous_parameter_s', 'disease', 'day', 'Assay_Type_s',
       'AvgSpotLen_l', 'BioProject_s', 'Center_Name_s', 'Consent_s',
       'InsertSize_l', 'Instrument_s', 'LibraryLayout_s', 'LibrarySelection_s',
       'LibrarySource_s', 'Library_Name_s', 'LoadDate_s', 'Organism_s',
       'Platform_s', 'ReleaseDate_s', 'SRA_Study_s', 'collection_date_s',
       'environment_biome_s', 'environment_feature_s',
       'environment_material_s', 'environmental_package_s',
       'geographic_location_country_and_or_sea_s',
       'geographic_location_latitude_s', 'geographic_location_longitude_s',
       'host_common_name_s', 'host_diet_s', 'host_sex_s', 'host_taxid_s',
       'investigation_type_s', 'library_construction_method_s',
       'nucleic_acid_extraction_s', 'project_name_s',
       'sample_storage_temperature_s', 'sequencing_method_s', '_sample

## Reorder the samples
Sort by the disease column, within each disease value by day, and within each day by host_body_site_s.

Since sorting keeps the previous order in the case of ties, we sort first by the least important field (host_body_site_s), then by day, and last by disease.

In [8]:
datc=datc.sort_samples('host_body_site_s').sort_samples('day').sort_samples('disease')
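
The same "least important key first" pattern can be checked with plain Python, whose `sorted` is also stable. The sample records below are hypothetical and only mimic the three metadata fields used above.

```python
# Hypothetical sample records; field values are for illustration only.
samples = [
    {"id": "a", "disease": "Infected", "day": 35, "site": "Colon content"},
    {"id": "b", "disease": "Uninfected", "day": 0, "site": "Feces"},
    {"id": "c", "disease": "Infected", "day": 13, "site": "Feces"},
    {"id": "d", "disease": "Infected", "day": 35, "site": "Cecum"},
    {"id": "e", "disease": "Uninfected", "day": 0, "site": "Colon content"},
]

# Sort by the least important key first; each later stable sort keeps
# the earlier order among ties.
ordered = samples
for key in ("site", "day", "disease"):
    ordered = sorted(ordered, key=lambda s: s[key])

order_ids = [s["id"] for s in ordered]

# A single sort on a (disease, day, site) tuple gives the same order.
one_pass = sorted(samples, key=lambda s: (s["disease"], s["day"], s["site"]))
print(order_ids, ordered == one_pass)
```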

## Looking at all the data

In [9]:
f = datc.normalize(100, axis='s').plot(sample_field='day',gui='jupyter',
                                       feature_field=None, barx_fields=['disease'],
                                       clim=[0,10])


## Get the features (bacteria) different between infected and uninfected mice
Since the difference appears mainly at the late infected time points (days 27 and 35), we compare late infected samples to all uninfected time points; the call below uses 'Infected day 35' as the infected group.

We set random_seed for reproducibility, since the differential abundance p-value calculation and dsFDR control are based on random permutations.

In [10]:
dd=datc.diff_abundance('miscellaneous_parameter_s',
                       val1=['Uninfected day 0','Uninfected day 13','Uninfected day 20','Uninfected day 27','Uninfected day 35'],
                       val2='Infected day 35',
                       random_seed=2018)

2018-07-23 18:46:41 INFO 100 samples with both values
2018-07-23 18:46:41 INFO After filtering, 1091 remaining
2018-07-23 18:46:41 INFO 80 samples with value 1 (['Uninfected day 0', 'Uninfected day 13', 'Uninfected day 20', 'Uninfected day 27', 'Uninfected day 35'])
2018-07-23 18:46:41 INFO method meandiff. number of higher in ['Uninfected day 0', 'Uninfected day 13', 'Uninfected day 20', 'Uninfected day 27', 'Uninfected day 35'] : 313. number of higher in ['Infected day 35'] : 130. total 443
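
To illustrate why a fixed seed matters, here is a minimal sketch of a permutation test on the difference of group means for a single feature. It is a simplified stand-in, not calour's `diff_abundance`: it omits the dsFDR multiple-testing control, and the function name, the number of permutations, and the toy values are all assumptions.

```python
import random


def permutation_pvalue(group1, group2, n_perm=1000, seed=2018):
    """Two-sided permutation p-value for the difference in group means.

    Hypothetical helper for illustration; group labels are shuffled
    n_perm times using a seeded random generator.
    """
    rng = random.Random(seed)
    n1 = len(group1)
    observed = abs(sum(group1) / n1 - sum(group2) / len(group2))
    pooled = list(group1) + list(group2)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)  # random reassignment of group labels
        perm1, perm2 = pooled[:n1], pooled[n1:]
        diff = abs(sum(perm1) / n1 - sum(perm2) / len(perm2))
        if diff >= observed:
            hits += 1
    # Add one to numerator and denominator so the p-value is never 0.
    return (hits + 1) / (n_perm + 1)


# Hypothetical abundances of one feature in two groups of samples.
uninfected = [52, 48, 50, 55, 47, 51]
infected = [5, 8, 3, 6, 4, 7]

p_first = permutation_pvalue(uninfected, infected, seed=2018)
p_again = permutation_pvalue(uninfected, infected, seed=2018)
print(p_first, p_first == p_again)
```

With the same seed the shuffles are identical, so the p-value is the same on every run; a different seed can shift it slightly.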


## Plot to see the results
Only significant features are kept, sorted by effect size.

In [11]:
dd=dd.sort_samples('host_body_site_s')

In [12]:
dd.sort_samples('miscellaneous_parameter_s').normalize(100).plot(sample_field='miscellaneous_parameter_s',
                                                                 gui='jupyter',
                                                                 barx_fields=['host_body_site_s'],
                                                                 clim=[0,100])


## Looking only at colon samples

In [13]:
colon=datc.filter_samples('host_body_site_s','Colon content')

In [14]:
colon=colon.cluster_features(min_abundance=10)

2018-07-23 18:46:52 INFO After filtering, 1005 remaining


## plot all features (fig S2A)

In [15]:
f=colon.normalize(100, axis='s').plot(sample_field='day',gui='jupyter', feature_field=None,
                                      barx_fields=['disease'], clim=[0, 10])


In [20]:
f.save_figure('figure-S2A-mouse-worm-all.pdf')

## plot differentially abundant features (fig 2A)
between healthy and infected at late timepoints

In [16]:
dd=colon.diff_abundance('miscellaneous_parameter_s',
                        val1=['Uninfected day 0','Uninfected day 13','Uninfected day 20','Uninfected day 27','Uninfected day 35'],
                        val2='Infected day 35',
                        random_seed=2018)

2018-07-23 18:46:57 INFO 70 samples with both values
2018-07-23 18:46:57 INFO After filtering, 999 remaining
2018-07-23 18:46:57 INFO 60 samples with value 1 (['Uninfected day 0', 'Uninfected day 13', 'Uninfected day 20', 'Uninfected day 27', 'Uninfected day 35'])
2018-07-23 18:46:58 INFO method meandiff. number of higher in ['Uninfected day 0', 'Uninfected day 13', 'Uninfected day 20', 'Uninfected day 27', 'Uninfected day 35'] : 170. number of higher in ['Infected day 35'] : 44. total 214


In [17]:
fig=dd.normalize(100, axis='s').plot(sample_field='day',gui='jupyter', feature_field=None,
                                     barx_fields=['disease'])


In [25]:
fig.save_figure('figure-2A-mouse-worm-diff.pdf')

## Plot the enriched terms between the 2 groups
Blue marks terms enriched in the "higher in control" group; orange marks terms enriched in the "lower in control" group.

We ignore experiment 119 since this is the experiment we are analysing (don't want to get annotations from this experiment)

In [18]:
f, d = dd.plot_diff_abundance_enrichment(ignore_exp=[119],
                                         colors=['blue','orange'],
                                         show_legend=False,
                                         transform_type='rankdata', 
                                         max_len=60,
                                         min_exps=2)

2018-07-17 15:20:23 INFO Getting dbBact annotations for 214 sequences, please wait...
2018-07-17 15:20:34 INFO Got 5154 annotations
2018-07-17 15:20:34 INFO Added annotation data to experiment. Total 1667 annotations, 214 terms
2018-07-17 15:20:34 INFO removed 1682 terms



In [32]:
f.figure.savefig('figure-2C-worm-terms.pdf')

In [19]:
fig=dd.normalize(100, axis='s').plot(sample_field='day',gui='qt5', feature_field=None,
                                     barx_fields=['disease'])

2018-07-17 15:21:09 INFO removed 0 terms
2018-07-17 15:21:22 INFO found 23 annotations with term
2018-07-17 15:21:22 INFO After filtering, 18 remaining
2018-07-17 15:22:50 INFO found 4 annotations with term
2018-07-17 15:22:50 INFO After filtering, 4 remaining


In [20]:
db=ca.database._get_database_class('dbbact')

In [None]:
db.show_term_details_diff('mus musculus',dd,gui='qt5')

2018-07-17 15:24:16 INFO found 169 annotations with term
2018-07-17 15:24:16 INFO After filtering, 154 remaining


## Create the interactive HTML map

In [86]:
dd.export_html(sample_field='disease',
               title='bacteria different between infected and non-infected mice',
               output_file='./mice-worm-diff.html',
               feature_field='taxonomy')


2018-07-25 17:46:42 INFO exported experiment to html file ./mice-worm-diff.html


In [20]:
res