# Parkinson's Mouse Tutorial - Differential Abundance

Run this notebook in `qiime2-2020.6`.

This notebook continues the [pd-mouse tutorial](https://docs.qiime2.org/2021.11/tutorials/pd-mice/), picking up at the [differential abundance with ANCOM](https://docs.qiime2.org/2021.11/tutorials/pd-mice/#differential-abundance-with-ancom) section.

In [1]:
from os import getcwd, listdir, chdir, mkdir
import pandas as pd
import qiime2 as q2

In [2]:
getcwd()

'/home/mrobeson/projects/pd_mouse_tutorial'

In [4]:
chdir('./processed')
getcwd()

'/home/mrobeson/projects/pd_mouse_tutorial/processed'

## ANCOM (ANalysis of COmposition of Microbiomes)

*Non-rarefaction based compositional differential abundance.*

ANCOM table result interpretation is well described [here](https://forum.qiime2.org/t/interpreting-values-from-an-ancom-percentile-abundance-table/1497/14).

- [Paper](https://dx.doi.org/10.3402%2Fmehd.v26.27663).
- [ANCOM Explained](http://mortonjt.blogspot.com/2016/06/ancom-explained.html).
- [ANCOM Explained 2](https://forum.qiime2.org/t/specify-w-cutoff-for-anacom/1844/10).
- [ANCOM Presentation](http://weallseqtoseq.blogspot.com/2018/07/microbiome-abundance-or-relative.html?).
- [QIIME 2: ANCOM video](https://www.youtube.com/watch?v=A6o2nOnDsJU&list=PLbVDKwGpb3XmkQmoBy1wh3QfWlWdn_pTT).

In [5]:
# filter features
! qiime feature-table filter-features \
    --i-table ./table-no-ecmu-hits.qza \
    --p-min-frequency 50 \
    --p-min-samples 4 \
    --o-filtered-table ./table-no-ecmu-hits-abund.qza

Saved FeatureTable[Frequency] to: ./table-no-ecmu-hits-abund.qza

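As a rough illustration of what the two thresholds above mean: `--p-min-frequency 50` retains a feature only if its counts summed across all samples reach 50, and `--p-min-samples 4` only if it is observed in at least 4 samples. The sketch below applies both rules to a hypothetical toy table. It is a conceptual sketch, not the `q2-feature-table` implementation; `filter_features` and `toy_table` are illustrative names.

```python
# Conceptual sketch only; not the q2-feature-table implementation.
# The table maps feature ID -> {sample ID: count} (hypothetical toy data).

def filter_features(table, min_frequency=50, min_samples=4):
    """Keep features that meet both the total-count and the prevalence threshold."""
    kept = {}
    for feature_id, counts in table.items():
        total = sum(counts.values())                           # summed over all samples
        prevalence = sum(1 for c in counts.values() if c > 0)  # samples where observed
        if total >= min_frequency and prevalence >= min_samples:
            kept[feature_id] = counts
    return kept

toy_table = {
    "feat_a": {"s1": 30, "s2": 20, "s3": 5, "s4": 1, "s5": 0},  # total 56, 4 samples
    "feat_b": {"s1": 100, "s2": 0, "s3": 0, "s4": 0, "s5": 0},  # total 100, 1 sample
    "feat_c": {"s1": 3, "s2": 2, "s3": 1, "s4": 1, "s5": 1},    # total 8, 5 samples
}

filtered = filter_features(toy_table)
print(sorted(filtered))
```

Only `feat_a` passes both thresholds: `feat_b` is abundant but seen in a single sample, and `feat_c` is widespread but rare.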

In [6]:
# add pseudo-count
! qiime composition add-pseudocount \
    --i-table ./table-no-ecmu-hits-abund.qza \
    --o-composition-table ./table-no-ecmu-hits-abund-comp.qza

Saved FeatureTable[Composition] to: ./table-no-ecmu-hits-abund-comp.qza

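Why the pseudocount step is needed: ANCOM works with log-ratios of feature abundances, and a log-ratio is undefined when one of the counts is zero. `add-pseudocount` adds a constant (1 by default) to every count. The sketch below shows the idea on hypothetical counts; `add_pseudocount` and `raw_counts` are illustrative names, not part of `q2-composition`.

```python
import math

def add_pseudocount(counts, pseudocount=1):
    """Add a constant to every count so that no zeros remain."""
    return [count + pseudocount for count in counts]

raw_counts = [0, 9, 99]  # hypothetical counts for three features in one sample

# With a zero count, the log-ratio cannot be computed.
try:
    math.log(raw_counts[0] / raw_counts[2])
    zero_is_ok = True
except ValueError:
    zero_is_ok = False

# After adding the pseudocount, every log-ratio is defined.
composition = add_pseudocount(raw_counts)
log_ratio = math.log(composition[0] / composition[2])

print(zero_is_ok, composition, round(log_ratio, 3))
```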

In [7]:
! qiime composition ancom \
    --i-table ./table-no-ecmu-hits-abund-comp.qza \
    --m-metadata-file ./metadata.tsv \
    --m-metadata-column donor \
    --o-visualization ./ancom-donor.qzv

! qiime composition ancom \
    --i-table ./table-no-ecmu-hits-abund-comp.qza \
    --m-metadata-file ./metadata.tsv \
    --m-metadata-column genotype \
    --o-visualization ./ancom-genotype.qzv

Saved Visualization to: ./ancom-donor.qzv
Saved Visualization to: ./ancom-genotype.qzv


In [None]:
q2.Visualization.load('./ancom-donor.qzv')

In [None]:
q2.Visualization.load('./ancom-genotype.qzv')

## OPTIONAL:
### ALDEx2 (Anova-Like Differential EXpression, v2)

*Non-rarefaction based compositional differential abundance.*

- [ALDEx2 tutorial](https://library.qiime2.org/plugins/q2-aldex2/24/).
- [ALDEx2 Documentation](https://rdrr.io/bioc/ALDEx2/f/vignettes/ALDEx2_vignette.Rmd), see "Explaining the outputs" section for output descriptions.


Works with versions `qiime2-2020.6` through `qiime2-2021.2`.

Install:
```
R -e 'install.packages("BiocManager", repos="http://cran.us.r-project.org")'
R -e 'BiocManager::install("ALDEx2")'
conda install -c dgiguere q2-aldex2 -y
qiime dev refresh-cache
```

[Paper](https://doi.org/10.1186/2049-2618-2-15).

In [8]:
! qiime aldex2 aldex2 \
    --i-table ./table-no-ecmu-hits.qza  \
    --m-metadata-file ./metadata.tsv \
    --m-metadata-column donor \
    --output-dir aldex2-donor

Saved FeatureData[Differential] to: aldex2-donor/differentials.qza


In [9]:
! qiime aldex2 effect-plot \
    --i-table aldex2-donor/differentials.qza \
    --o-visualization aldex2-donor/differentials.qzv

Saved Visualization to: aldex2-donor/differentials.qzv


In [10]:
! qiime aldex2 extract-differences \
    --i-table aldex2-donor/differentials.qza \
    --p-sig-threshold 0.1 \
    --p-effect-threshold 0 \
    --p-difference-threshold 0 \
    --o-differentials aldex2-donor/differentials-sig.qza

Saved FeatureData[Differential] to: aldex2-donor/differentials-sig.qza


*You can export the TSV and view it in the terminal, in Excel, etc.*

In [11]:
! qiime tools export \
    --input-path aldex2-donor/differentials-sig.qza \
    --output-path aldex2-donor/differentials-sig-export/

Exported aldex2-donor/differentials-sig.qza as DifferentialDirectoryFormat to directory aldex2-donor/differentials-sig-export/


In [12]:
! head aldex2-donor/differentials-sig-export/differentials.tsv

featureid	rab.all	rab.win.hc_1	rab.win.pd_1	diff.btw	diff.win	effect	overlap	we.ep	we.eBH	wi.ep	wi.eBH
#q2:types	numeric	numeric	numeric	numeric	numeric	numeric	numeric	numeric	numeric	numeric	numeric
60c57911662a9159dfdd0fc05d975a55	0.552951638066072	4.16064310757047	-0.484237826253939	-4.53523543324383	5.82467267266369	-0.697060547097619	0.259596627816163	0.00727913493447486	0.0292281533456087	0.0142666257634695	0.044958563294386
c5b4c6b372dbc13b6a7f2d466fc7335f	7.49862023978907	6.40531878979824	7.90913705655747	1.53200712539108	5.77587350709854	0.355687299919167	0.26302084412879	0.0109326537835353	0.0480947320884806	0.00510374301703793	0.0260801246165414
0f4cadbeaa245b98e599c6d662ae9c19	2.258949831832	-0.34923550321395	9.26029397311959	9.29586049265328	3.9820940953238	1.96842709982578	0.0657124891873758	2.06984422571497e-08	2.59232745717264e-07	7.26024150277244e-08	9.33991052293633e-07
dafe809740d0545dc25c6939a84a1820	5.68754838703497	4.89108666960638	6.15028312185463	1.445638564006
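
The exported file is plain tab-separated text with a `#q2:types` row directly under the header, so it can also be read with Python's standard library. The following is a minimal sketch, assuming the layout shown above; `read_differentials` is a hypothetical helper, and the inline `excerpt` is a shortened copy of two rows (only the `effect` and `wi.eBH` columns) so that the example runs without the exported file.

```python
import csv
import io

def read_differentials(handle):
    """Return {feature ID: {column: float}} from an exported differentials TSV."""
    reader = csv.DictReader(handle, delimiter="\t")
    results = {}
    for row in reader:
        feature_id = row.pop("featureid")
        if feature_id.startswith("#q2:"):
            continue  # skip the '#q2:types' row that follows the header
        results[feature_id] = {column: float(value) for column, value in row.items()}
    return results

# Shortened excerpt of the export (two features, two columns) for illustration.
excerpt = "\n".join([
    "featureid\teffect\twi.eBH",
    "#q2:types\tnumeric\tnumeric",
    "60c57911662a9159dfdd0fc05d975a55\t-0.697061\t0.04495856",
    "0f4cadbeaa245b98e599c6d662ae9c19\t1.968427\t9.339911e-07",
])

stats = read_differentials(io.StringIO(excerpt))
# Feature with the largest absolute effect size in the excerpt.
strongest = max(stats, key=lambda fid: abs(stats[fid]["effect"]))
print(strongest)
```

To read the real export, pass an open file instead of the `io.StringIO` object, for example `open('aldex2-donor/differentials-sig-export/differentials.tsv', newline='')`.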

*Or you can use the QIIME 2 API to view the results.*

In [13]:
aldex_stats = q2.Artifact.load('aldex2-donor/differentials-sig.qza')

In [14]:
pd.set_option('display.max_rows', None)
df = aldex_stats.view(pd.DataFrame)
df

featureid,rab.all,rab.win.hc_1,rab.win.pd_1,diff.btw,diff.win,effect,overlap,we.ep,we.eBH,wi.ep,wi.eBH
60c57911662a9159dfdd0fc05d975a55,0.552952,4.160643,-0.484238,-4.535235,5.824673,-0.697061,0.259597,0.007279135,0.02922815,0.01426663,0.04495856
c5b4c6b372dbc13b6a7f2d466fc7335f,7.49862,6.405319,7.909137,1.532007,5.775874,0.355687,0.263021,0.01093265,0.04809473,0.005103743,0.02608012
0f4cadbeaa245b98e599c6d662ae9c19,2.25895,-0.349236,9.260294,9.29586,3.982094,1.968427,0.065712,2.069844e-08,2.592327e-07,7.260242e-08,9.339911e-07
dafe809740d0545dc25c6939a84a1820,5.687548,4.891087,6.150283,1.445639,2.462563,0.492158,0.195836,0.006682281,0.02926429,0.0005466409,0.003483005
3d766867c97f511431bf97e059d6498f,6.182103,-0.10555,8.417866,8.387335,4.055537,1.860054,0.048146,1.823702e-06,1.455458e-05,3.790236e-09,7.131326e-08
3f55a8ca784ae7190790023a30cd859c,0.605001,4.124664,-0.383462,-4.321328,5.936039,-0.652224,0.255042,0.009409469,0.03737263,0.012994,0.04750446
d39f37b814ebd81091a949f8bd9d1710,2.858053,8.434923,-0.534015,-8.907173,2.805022,-3.103294,0.022121,1.142194e-10,2.871002e-09,8.966655e-11,3.486608e-09
d2d1d9d57e61a764383ea2c84cef04c5,1.492857,-0.473731,5.431675,5.531877,4.57814,1.081886,0.141927,0.0001589836,0.001086041,3.842123e-05,0.0003037841
599ba6458e91e2527f358f547ea39261,2.14818,7.245565,-0.338755,-7.517825,3.916554,-1.762403,0.0625,1.965152e-06,1.747633e-05,7.482394e-08,9.511225e-07
eea6b86c0c75e740670ccc50613b1b23,1.065443,5.673306,-0.513326,-5.912153,5.313039,-0.972136,0.169161,0.000375157,0.002238411,0.0002563649,0.001666356


In [15]:
df[df.loc[:,'wi.eBH'] < 0.05]

featureid,rab.all,rab.win.hc_1,rab.win.pd_1,diff.btw,diff.win,effect,overlap,we.ep,we.eBH,wi.ep,wi.eBH
60c57911662a9159dfdd0fc05d975a55,0.552952,4.160643,-0.484238,-4.535235,5.824673,-0.697061,0.259597,0.007279135,0.02922815,0.01426663,0.04495856
c5b4c6b372dbc13b6a7f2d466fc7335f,7.49862,6.405319,7.909137,1.532007,5.775874,0.355687,0.263021,0.01093265,0.04809473,0.005103743,0.02608012
0f4cadbeaa245b98e599c6d662ae9c19,2.25895,-0.349236,9.260294,9.29586,3.982094,1.968427,0.065712,2.069844e-08,2.592327e-07,7.260242e-08,9.339911e-07
dafe809740d0545dc25c6939a84a1820,5.687548,4.891087,6.150283,1.445639,2.462563,0.492158,0.195836,0.006682281,0.02926429,0.0005466409,0.003483005
3d766867c97f511431bf97e059d6498f,6.182103,-0.10555,8.417866,8.387335,4.055537,1.860054,0.048146,1.823702e-06,1.455458e-05,3.790236e-09,7.131326e-08
3f55a8ca784ae7190790023a30cd859c,0.605001,4.124664,-0.383462,-4.321328,5.936039,-0.652224,0.255042,0.009409469,0.03737263,0.012994,0.04750446
d39f37b814ebd81091a949f8bd9d1710,2.858053,8.434923,-0.534015,-8.907173,2.805022,-3.103294,0.022121,1.142194e-10,2.871002e-09,8.966655e-11,3.486608e-09
d2d1d9d57e61a764383ea2c84cef04c5,1.492857,-0.473731,5.431675,5.531877,4.57814,1.081886,0.141927,0.0001589836,0.001086041,3.842123e-05,0.0003037841
599ba6458e91e2527f358f547ea39261,2.14818,7.245565,-0.338755,-7.517825,3.916554,-1.762403,0.0625,1.965152e-06,1.747633e-05,7.482394e-08,9.511225e-07
eea6b86c0c75e740670ccc50613b1b23,1.065443,5.673306,-0.513326,-5.912153,5.313039,-0.972136,0.169161,0.000375157,0.002238411,0.0002563649,0.001666356
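
For context on the filter above: per the ALDEx2 documentation, the `we.eBH` and `wi.eBH` columns hold expected Benjamini-Hochberg-corrected p-values for the Welch's t and Wilcoxon tests, respectively. The sketch below shows the Benjamini-Hochberg adjustment itself on a hypothetical list of p-values. It is not ALDEx2's code, which additionally averages over Monte Carlo instances; `benjamini_hochberg` and `p_values` are illustrative names.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    m = len(p_values)
    # Indices of the p-values, sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted

p_values = [0.01, 0.04, 0.03, 0.20]  # hypothetical raw p-values
q_values = benjamini_hochberg(p_values)
significant = [q < 0.05 for q in q_values]
print([round(q, 4) for q in q_values], significant)
```

In this toy case only the first feature stays below 0.05 after correction, even though three raw p-values were below 0.05.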


In [16]:
df.shape

(47, 11)

## OPTIONAL: 
### dsFDR (Discrete False-Discovery Rate)

*Warning: may not work with current and later versions of QIIME 2 due to changes in the dependencies that the plugin requires. Try earlier versions, e.g. `qiime2-2020.6` through `qiime2-2021.2`.*

Rarefaction-based differential abundance through the [q2-dsfdr plugin](https://forum.qiime2.org/t/q2-dsfdr-community-tutorial/5559).

[Paper](https://msystems.asm.org/content/2/6/e00092-17).

*Rarefy the table first!*


In [None]:
! qiime feature-table rarefy \
    --i-table ./table-no-ecmu-hits.qza \
    --p-sampling-depth 2000 \
    --o-rarefied-table ./table-no-ecmu-hits-rar.qza
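
What rarefying to a depth of 2000 means, roughly: each sample is randomly subsampled without replacement until its counts sum to exactly 2000, and samples with fewer than 2000 total counts are dropped. The sketch below illustrates this on hypothetical samples; it is a conceptual sketch, not the `q2-feature-table` implementation, and `rarefy_sample` is an illustrative name.

```python
import random

def rarefy_sample(counts, depth, seed=None):
    """Subsample one sample's counts without replacement down to `depth`."""
    total = sum(counts.values())
    if total < depth:
        return None  # too shallow: the sample is dropped from the table
    rng = random.Random(seed)
    # Expand the counts into one entry per observed read, then draw `depth` of them.
    pool = [feature_id for feature_id, n in counts.items() for _ in range(n)]
    rarefied = {}
    for feature_id in rng.sample(pool, depth):
        rarefied[feature_id] = rarefied.get(feature_id, 0) + 1
    return rarefied

deep = {"feat_a": 1500, "feat_b": 900, "feat_c": 100}  # 2500 reads (hypothetical)
shallow = {"feat_a": 800, "feat_b": 200}               # 1000 reads (hypothetical)

deep_rarefied = rarefy_sample(deep, depth=2000, seed=42)
shallow_rarefied = rarefy_sample(shallow, depth=2000, seed=42)
print(sum(deep_rarefied.values()), shallow_rarefied)
```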

In [None]:
! qiime dsfdr permutation-fdr \
    --i-table ./table-no-ecmu-hits-rar.qza \
    --m-metadata-file metadata.tsv \
    --m-metadata-column 'donor' \
    --o-visualization dsfdr-donor.qzv \
    --verbose

In [None]:
dsfdr_results = q2.Visualization.load('dsfdr-donor.qzv')

In [None]:
# Export the CSV file. You can look at it in Excel or view it within the notebook below.
dsfdr_results.export_data('./dsfdr-export')
listdir('./dsfdr-export')

In [None]:
pd.set_option('display.max_rows', None)
dsfdr_sig_results = pd.read_csv('./dsfdr-export/dsfdr.csv')
dsfdr_sig_results

In [None]:
dsfdr_sig_results[dsfdr_sig_results.loc[:,'Reject'] == True]