# Parkinson's Mouse Tutorial - Reference Frames

Run this notebook in `qiime2-2020.6`.

This notebook continues the [pd-mouse tutorial](https://docs.qiime2.org/2021.11/tutorials/pd-mice/). It covers beta diversity and differential abundance methods that do not rely on rarefaction.

This notebook focuses on [DEICODE](https://github.com/biocore/DEICODE), [songbird](https://github.com/biocore/songbird/), and [qurro](https://github.com/biocore/qurro). For more in-depth information, see this forum post:

- [Question on Deicode and songbird](https://forum.qiime2.org/t/question-on-deicode-and-songbird/11829/2)

In [None]:
from os import getcwd, listdir, chdir, mkdir
import qiime2 as q2

In [None]:
getcwd()

In [None]:
chdir('../processed')
getcwd()

## DEICODE

Beta diversity without rarefaction.

- [DEICODE GitHub page and tutorial links](https://github.com/biocore/DEICODE)
- [Questions about interpreting DEICODE and Qurro output](https://forum.qiime2.org/t/questions-about-interpreting-deicode-and-qurro-output/14888/4)
- [Help understanding DEICODE](https://forum.qiime2.org/t/help-understanding-deicode/8803)

Visualize ranks with [qurro](https://github.com/biocore/qurro).
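As background for the cells that follow: DEICODE is described by its authors as a robust Aitchison PCA, which applies a robust centered log-ratio (rclr) transform to the non-zero counts and then uses matrix completion, so no rarefaction or pseudocount is needed. The snippet below is only a minimal sketch of the rclr step for a single sample, written in plain Python; the function name `rclr` and the toy counts are assumptions for illustration and are not DEICODE's own code.

```python
import math

def rclr(counts):
    """Robust centered log-ratio for one sample.

    Zeros are treated as missing (returned as None) instead of being
    replaced by a pseudocount; only observed counts are centered.
    """
    logs = [math.log(c) for c in counts if c > 0]
    if not logs:
        return [None] * len(counts)
    center = sum(logs) / len(logs)  # mean log of the observed counts
    return [math.log(c) - center if c > 0 else None for c in counts]

# toy sample with one unobserved feature (hypothetical counts)
sample = [10, 0, 100, 1000]
transformed = rclr(sample)
print(transformed)
```

The missing entries are what the matrix-completion step is meant to handle; this sketch stops before that step.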



In [None]:
! qiime deicode auto-rpca \
    --i-table ./table-no-ecmu-hits.qza \
    --o-biplot ./deicode-biplot.qza \
    --o-distance-matrix ./deicode-matrix.qza

In [None]:
! qiime emperor biplot \
    --i-biplot ./deicode-biplot.qza \
    --m-sample-metadata-file ./metadata.tsv \
    --m-feature-metadata-file ./taxonomy.qza \
    --p-number-of-features 8 \
    --o-visualization ./biplot.qzv

In [None]:
q2.Visualization.load('./biplot.qzv')

In [None]:
!qiime diversity beta-group-significance \
    --i-distance-matrix ./deicode-matrix.qza \
    --m-metadata-file metadata.tsv \
    --m-metadata-column donor \
    --o-visualization ./deicode-donor-significance.qzv

In [None]:
q2.Visualization.load('./deicode-donor-significance.qzv')
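`qiime diversity beta-group-significance` runs PERMANOVA by default. To make the reported pseudo-F statistic and permutation p-value less of a black box, here is a minimal plain-Python sketch of that test on a toy distance matrix. The function names, the toy positions, and the group labels are assumptions for illustration; this is not the QIIME 2 implementation.

```python
import random

def pseudo_f(dist, groups):
    """PERMANOVA pseudo-F from a square distance matrix and group labels."""
    n = len(groups)
    labels = sorted(set(groups))
    a = len(labels)
    # total sum of squares from all pairwise squared distances
    ss_total = sum(dist[i][j] ** 2
                   for i in range(n) for j in range(i + 1, n)) / n
    # within-group sum of squares, one term per group
    ss_within = 0.0
    for g in labels:
        idx = [i for i in range(n) if groups[i] == g]
        ss_within += sum(dist[i][j] ** 2
                         for k, i in enumerate(idx)
                         for j in idx[k + 1:]) / len(idx)
    ss_among = ss_total - ss_within
    return (ss_among / (a - 1)) / (ss_within / (n - a))

def permanova(dist, groups, permutations=999, seed=0):
    """Return (observed pseudo-F, permutation p-value)."""
    rng = random.Random(seed)
    observed = pseudo_f(dist, groups)
    shuffled = list(groups)
    hits = 0
    for _ in range(permutations):
        rng.shuffle(shuffled)  # break the link between labels and samples
        if pseudo_f(dist, shuffled) >= observed:
            hits += 1
    return observed, (hits + 1) / (permutations + 1)

# toy data: six samples on a line, two well-separated groups (hypothetical)
positions = [0, 1, 2, 10, 11, 12]
groups = ['hc'] * 3 + ['pd'] * 3
dist = [[abs(p - q) for q in positions] for p in positions]

f_obs, p_value = permanova(dist, groups)
print(f_obs, p_value)
```

With only six samples there are few distinct ways to relabel the groups, so the p-value cannot become very small even though the groups are clearly separated. The same caveat applies to any permutation test on a small study.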

In [None]:
! qiime qurro loading-plot \
    --i-table ./table-no-ecmu-hits.qza \
    --i-ranks ./deicode-biplot.qza \
    --m-sample-metadata-file ./metadata.tsv \
    --m-feature-metadata-file ./taxonomy.qza \
    --o-visualization ./deicode-qurro-plot.qzv

In [None]:
q2.Visualization.load('./deicode-qurro-plot.qzv')
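In Qurro, selecting numerator and denominator features produces one log-ratio per sample. The sketch below shows the arithmetic as I understand it: sum the counts of the numerator features, sum the counts of the denominator features, and take the natural log of the quotient. The function name and the feature IDs are hypothetical, and returning `None` for an undefined ratio is a choice made for this sketch.

```python
import math

def natural_log_ratio(sample_counts, numerator_ids, denominator_ids):
    """Natural log of (summed numerator counts / summed denominator counts).

    Returns None when either sum is zero, because the log-ratio is
    undefined for that sample.
    """
    num = sum(sample_counts.get(f, 0) for f in numerator_ids)
    den = sum(sample_counts.get(f, 0) for f in denominator_ids)
    if num == 0 or den == 0:
        return None
    return math.log(num / den)

# hypothetical counts for one sample
counts = {'firmicutes_1': 30, 'bacteroidota_1': 10, 'bacteroidota_2': 10}
lr = natural_log_ratio(counts, ['firmicutes_1'],
                       ['bacteroidota_1', 'bacteroidota_2'])
print(lr)
```

Samples with an undefined log-ratio carry no value for that comparison, which is presumably why the post-hoc cells drop rows with a missing `Current_Natural_Log_Ratio`.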

**Post-hoc test**

In [None]:
# post-hoc test of the log-ratio between donor groups:
#   p__Firmicutes (numerator) vs. p__Bacteroidota (denominator)
import pandas as pd
import itertools
from scipy.stats import ttest_ind

# load the per-sample log-ratios (e.g. sample data exported from Qurro)
lrdf = pd.read_csv('firm-vs-bact.tsv',
                   sep='\t', index_col=0).dropna(subset=['Current_Natural_Log_Ratio'])

# split the log-ratios by donor
lrs = {type_:df_.Current_Natural_Log_Ratio
       for type_, df_ in lrdf.groupby('donor')}

# get all combos
ids_ = list(itertools.combinations(lrs.keys(), 2))

# take t-test
tst = pd.DataFrame({(id1_, id2_):ttest_ind(lrs[id1_], lrs[id2_])
                    for id1_, id2_ in ids_},
                   ['test-stat','p-value']).T
tst.index.names = ['group one vs.', 'group two']

# view results
tst
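`scipy.stats.ttest_ind` assumes equal variances by default (`equal_var=True`), so the `test-stat` column above is a pooled-variance Student's t statistic. As a rough illustration of what that number is, the sketch below recomputes it with the standard library only; the function name and the toy log-ratios are assumptions for illustration.

```python
import math
from statistics import mean, variance

def pooled_t_statistic(x, y):
    """Student's t statistic for two independent samples, pooled variance."""
    n1, n2 = len(x), len(y)
    pooled_var = ((n1 - 1) * variance(x) + (n2 - 1) * variance(y)) / (n1 + n2 - 2)
    standard_error = math.sqrt(pooled_var * (1 / n1 + 1 / n2))
    return (mean(x) - mean(y)) / standard_error

# hypothetical log-ratios for two donor groups
group_one = [1.0, 2.0, 3.0]
group_two = [2.0, 4.0, 6.0]
t_stat = pooled_t_statistic(group_one, group_two)
print(t_stat)
```

If the groups have clearly different variances, passing `equal_var=False` to `ttest_ind` switches to Welch's t-test, which may be the safer choice.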

## Songbird

Supervised differential abundance without rarefaction.

- [Songbird GitHub page](https://github.com/biocore/songbird/)
- [Songbird differentials interpretation](https://forum.qiime2.org/t/songbird-differentials-interpretation/17558)


Visualize differentials with [qurro](https://github.com/biocore/qurro).

**Model data based on 'donor'**

In [None]:
! qiime songbird multinomial \
    --i-table ./table-no-ecmu-hits.qza \
    --m-metadata-file ./metadata.tsv \
    --p-formula "donor" \
    --p-epochs 10000 \
    --p-differential-prior 0.5 \
    --p-summary-interval 1 \
    --o-differentials songbird-differentials.qza \
    --o-regression-stats songbird-regression-stats.qza \
    --o-regression-biplot songbird-regression-biplot.qza \
    --verbose

In [None]:
! qiime metadata tabulate \
	--m-input-file songbird-differentials.qza \
	--o-visualization songbird-differentials-viz.qzv

In [None]:
q2.Visualization.load('songbird-differentials-viz.qzv')

**Run a null model**

In [None]:
# Generate a null model
! qiime songbird multinomial \
    --i-table ./table-no-ecmu-hits.qza \
    --m-metadata-file ./metadata.tsv \
    --p-formula "1" \
    --p-epochs 10000 \
    --p-differential-prior 0.5 \
    --p-summary-interval 1 \
    --o-differentials songbird-null-diff.qza \
    --o-regression-stats songbird-null-stats.qza \
    --o-regression-biplot songbird-null-biplot.qza \
    --verbose

**Compare null model vs our actual model**

How much more are we learning compared to a null model?
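One way to put a number on this question is the pseudo-Q² score that the Songbird README describes for `summarize-paired`: one minus the ratio of the model's cross-validation error to the baseline (null) model's cross-validation error. The sketch below only illustrates that arithmetic; the function name and the error values are hypothetical and are not taken from this dataset.

```python
def pseudo_q_squared(model_cv_error, baseline_cv_error):
    """Pseudo-Q^2: 1 - (model CV error / baseline CV error)."""
    return 1 - model_cv_error / baseline_cv_error

# hypothetical average cross-validation errors
better_than_null = pseudo_q_squared(40.0, 50.0)   # model beats the null model
worse_than_null = pseudo_q_squared(60.0, 50.0)    # model is worse than the null
print(better_than_null, worse_than_null)
```

A value near 1 suggests the formula adds predictive power over the null model, while a value near or below zero suggests it adds little and may be overfitting.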

In [None]:
! qiime songbird summarize-paired \
	--i-regression-stats songbird-regression-stats.qza \
	--i-baseline-stats songbird-null-stats.qza \
	--o-visualization songbird-paired-summary.qzv

In [None]:
q2.Visualization.load('songbird-paired-summary.qzv')

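In [None]:
# requires songbird-biplot.qzv, e.g. built from songbird-regression-biplot.qza with `qiime emperor biplot`
q2.Visualization.load('songbird-biplot.qzv')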

**Let's look at our differentials.**

In [None]:
! qiime qurro differential-plot \
    --i-table ./table-no-ecmu-hits.qza \
    --i-ranks songbird-differentials.qza \
    --m-sample-metadata-file metadata.tsv \
    --m-feature-metadata-file taxonomy.qza \
    --o-visualization songbird-differentials-qurro.qzv \
    --verbose

In [None]:
q2.Visualization.load('songbird-differentials-qurro.qzv')
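Songbird differentials are best read as a ranking of features relative to one another, not as absolute changes. That is why the Qurro plot sorts features by rank and lets you pick numerator and denominator sets from the two ends. The sketch below shows one simple way to do that selection by hand; the feature names, the differential values, and the cutoff `k` are all hypothetical.

```python
# hypothetical differentials for one covariate (feature ID -> value)
differentials = {
    'feature_a': 1.8,
    'feature_b': -0.4,
    'feature_c': 0.1,
    'feature_d': -2.3,
    'feature_e': 0.9,
}

# rank features from most negative to most positive differential
ranked = sorted(differentials, key=differentials.get)

k = 2  # number of features taken from each end of the ranking
denominator = ranked[:k]   # lowest-ranked features
numerator = ranked[-k:]    # highest-ranked features
print(numerator, denominator)
```

The log-ratio of the summed numerator counts to the summed denominator counts can then be compared across donor groups, as in the post-hoc cell that follows.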

In [None]:
# post-hoc test of the log-ratio between donor groups:
#   p__Firmicutes (numerator) vs. p__Bacteroidota (denominator)
import pandas as pd
import itertools
from scipy.stats import ttest_ind

# load the per-sample log-ratios (e.g. sample data exported from Qurro)
lrdf = pd.read_csv('songbird-firm-bact-data.tsv',
                   sep='\t', index_col=0).dropna(subset=['Current_Natural_Log_Ratio'])
# split the log-ratios by donor
lrs = {type_:df_.Current_Natural_Log_Ratio
       for type_, df_ in lrdf.groupby('donor')}
# get all combos
ids_ = list(itertools.combinations(lrs.keys(), 2))
# take t-test
tst = pd.DataFrame({(id1_, id2_):ttest_ind(lrs[id1_], lrs[id2_])
                    for id1_, id2_ in ids_},
                   ['test-stat','p-value']).T
tst.index.names = ['group one vs.', 'group two']
# view results
tst