# Run this notebook in qiime2-2023.2. Dokdo has been installed within this environment.

This notebook follows the tutorial below to run LEfSe and plot its results, including the cladogram:
`https://forum.qiime2.org/t/lefse-after-qiime2-to-test-at-all-taxonomic-levels/15462/3`

In [1]:
from os import getcwd, listdir, chdir, mkdir
import pandas as pd
import qiime2 as q2

In [2]:
getcwd()

'/mnt/e/NASA_microbiome'

In [3]:
chdir('./processed')
getcwd()

'/mnt/e/NASA_microbiome/processed'

# LEfSe Preparation with Dokdo
We'll prepare our input files with Dokdo for use with LEfSe. You can read the LEfSe paper for more details.

In [4]:
! dokdo prepare-lefse -h

usage: dokdo prepare-lefse -t PATH -x PATH -m PATH -o PATH -c TEXT [-s TEXT]
                           [-u TEXT] [-w TEXT] [-h]

Create a TSV file which can be used as input for the LEfSe tool. This command
1) collapses the input feature table at the genus level, 2) computes relative
frequency of the features, 3) performs sample filtration if requested, 4)
changes the format of feature names, 5) adds the relevant metadata as 'Class',
'Subclass', and 'Subject', and 6) writes a text file which can be used as
input for LEfSe.

Arguments:
  -t PATH, --table-file PATH
                        Path to the table file with the
                        'FeatureTable[Frequency]' type. [required]
  -x PATH, --taxonomy-file PATH
                        Path to the taxonomy file with the
                        'FeatureData[Taxonomy]' type. [required]
  -m PATH, --metadata-file PATH
                        Path to the metadata file. [required]
  -o PATH, --output-file PATH
                        Pa
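The help text lists what `prepare-lefse` does. As a rough illustration of the resulting layout, here is a minimal, standard-library sketch of steps 1, 2, 4 and 5 on toy data: counts are summed per genus-level lineage, converted to per-sample relative frequencies, named with `|` between ranks, and topped with a `Class` row. This is not Dokdo's implementation. The helper `to_lefse_rows`, its dictionary inputs, the assumption of seven-rank `;`-separated QIIME 2 taxonomy strings (so genus is the sixth rank) and the `|` naming are all assumptions for the example, and the real file may carry extra rows such as `Subclass` or `Subject`.

```python
from collections import defaultdict


def to_lefse_rows(counts, taxonomy, classes, level=6):
    """Hypothetical sketch of a LEfSe-style table (genus level, relative frequencies).

    counts   -- {feature_id: {sample_id: count}}
    taxonomy -- {feature_id: 'k__...; p__...; ...'} (';'-separated, QIIME 2 style)
    classes  -- {sample_id: class label}
    level    -- number of ranks kept (6 = genus for 7-rank taxonomies, assumed)
    """
    samples = sorted(classes)
    # Step 1: sum the counts of features that share a lineage down to `level`.
    collapsed = defaultdict(lambda: defaultdict(float))
    for feature, per_sample in counts.items():
        ranks = [r.strip() for r in taxonomy[feature].split(";")][:level]
        name = "|".join(ranks)  # Step 4: '|' between ranks (assumed format)
        for sample, n in per_sample.items():
            collapsed[name][sample] += n
    # Step 2: per-sample totals used to turn counts into relative frequencies.
    totals = {s: sum(row[s] for row in collapsed.values()) for s in samples}
    # Step 5: metadata row on top, then one row per collapsed feature.
    rows = [["Class"] + [classes[s] for s in samples]]
    for name in sorted(collapsed):
        rows.append([name] + [
            collapsed[name][s] / totals[s] if totals[s] else 0.0
            for s in samples
        ])
    return rows
```

Writing `rows` out with `csv.writer(fh, delimiter="\t")` would cover step 6.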

In [7]:
! dokdo prepare-lefse \
    -t ./table-no-ecmu-hits-abund.qza \
    -x ./taxonomy.qza \
    -m ./NASA-Metadata.tsv \
    -c TreatmentGroup \
    -u Dose-Gy \
    -o lefse_table.tsv \
    -w "[TreatmentGroup] IN ('sham', 'IR', 'IR+HLU')"


In [None]:
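# Needs the LEfSe scripts on PATH, e.g. from the separate `lefse` conda environment described below
! lefse-format_input.py \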
/mnt/e/NASA_microbiome/processed/lefse_table.tsv \
/mnt/e/NASA_microbiome/processed/formatted_table.in \
-c 1 \
-u 2 \
-o 1000000 \
--output_table /mnt/e/NASA_microbiome/processed/formatted_table.tsv

In [None]:
! run_lefse.py \
/mnt/e/NASA_microbiome/processed/formatted_table.in \
/mnt/e/NASA_microbiome/processed/output.res

In [None]:
! lefse-plot_res.py \
/mnt/e/NASA_microbiome/processed/output.res \
/mnt/e/NASA_microbiome/processed/output.pdf \
--format pdf

In [None]:
! lefse-plot_cladogram.py \
/mnt/e/NASA_microbiome/processed/output.res \
/mnt/e/NASA_microbiome/processed/output.cladogram.pdf \
--format pdf

### Run all
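Because the LEfSe scripts live in a different conda environment from this notebook, one way to run all four steps from here is `conda run`, which executes a command inside a named environment. The sketch below assumes the environment is called `lefse` (as in the tutorial that follows), that your conda version provides `conda run`, and that `lefse_table.tsv` sits in the working directory used above. `lefse_commands` and `run_all` are hypothetical helpers that reuse the arguments from the cells above.

```python
import subprocess
from pathlib import Path


def lefse_commands(workdir, env="lefse"):
    """The four LEfSe steps from the cells above, each wrapped in `conda run`."""
    d = Path(workdir)
    run = ["conda", "run", "-n", env]  # execute inside the named environment
    return [
        run + ["lefse-format_input.py", str(d / "lefse_table.tsv"),
               str(d / "formatted_table.in"), "-c", "1", "-u", "2",
               "-o", "1000000", "--output_table", str(d / "formatted_table.tsv")],
        run + ["run_lefse.py", str(d / "formatted_table.in"), str(d / "output.res")],
        run + ["lefse-plot_res.py", str(d / "output.res"),
               str(d / "output.pdf"), "--format", "pdf"],
        run + ["lefse-plot_cladogram.py", str(d / "output.res"),
               str(d / "output.cladogram.pdf"), "--format", "pdf"],
    ]


def run_all(workdir, env="lefse", dry_run=False):
    """Print each command and, unless dry_run, run it; stop at the first failure."""
    for cmd in lefse_commands(workdir, env):
        print(" ".join(cmd))
        if not dry_run:
            subprocess.run(cmd, check=True)
```

For example, `run_all("/mnt/e/NASA_microbiome/processed", dry_run=True)` only prints the commands, which is a cheap way to check the paths before running them for real.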

### LEfSe
In this section, I will walk you through how I run the LEfSe (linear discriminant analysis effect size) tool. Before I do, it is important to note the following:

The LEfSe method is more of a discriminant analysis method than a differential abundance (DA) method (Lin and Peddada, 2020; PMID: 33268781).

In order to use LEfSe, you will need to open two Terminal windows: one for your usual QIIME 2 environment and another for running LEfSe. For the latter, you should create a new conda environment and install LEfSe as described below.

Terminal for running QIIME 2 and Dokdo:
$ conda activate qiime2-2020.8
Terminal for running LEfSe:
$ conda create -n lefse -c conda-forge python=2.7.15
$ conda activate lefse
$ conda install -c bioconda -c conda-forge lefse
After you have set up both terminals, you can create an input file for LEfSe from a QIIME 2 feature table. We will use the "Moving Pictures" tutorial as an example (run the following in the QIIME 2 terminal):

$ dokdo prepare-lefse \
-t data/moving-pictures-tutorial/table.qza \
-x data/moving-pictures-tutorial/taxonomy.qza \
-m data/moving-pictures-tutorial/sample-metadata.tsv \
-o output/Useful-Information/input_table.tsv \
-c body-site \
-u subject \
-w "[body-site] IN ('tongue', 'gut', 'left palm')"
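The `-w` option takes a filter written as an SQLite `WHERE` clause (the syntax QIIME 2 also uses for its `--p-where` metadata filters), with column names wrapped in square brackets. To see which samples a clause keeps, you can evaluate it against toy metadata in an in-memory table with Python's built-in `sqlite3` module. This only mimics the filtering: `filter_samples` is a hypothetical helper, every column is stored as text (QIIME 2 also handles numeric columns), and the clause is pasted into the SQL as-is, so use it only with clauses you wrote yourself.

```python
import sqlite3


def filter_samples(metadata, where):
    """Return sample IDs whose metadata satisfy an SQLite WHERE clause.

    metadata -- {sample_id: {column: value}}; every value is stored as TEXT
    where    -- e.g. "[body-site] IN ('tongue', 'gut')" (trusted input only)
    """
    columns = ["id"] + sorted({c for row in metadata.values() for c in row})
    con = sqlite3.connect(":memory:")
    try:
        # Square brackets let SQLite accept names such as "body-site".
        col_defs = ", ".join(f"[{c}] TEXT" for c in columns)
        con.execute(f"CREATE TABLE metadata ({col_defs})")
        placeholders = ", ".join("?" for _ in columns)
        con.executemany(
            f"INSERT INTO metadata VALUES ({placeholders})",
            [[sid] + [row.get(c) for c in columns[1:]] for sid, row in metadata.items()],
        )
        query = f"SELECT [id] FROM metadata WHERE {where} ORDER BY [id]"
        return [r[0] for r in con.execute(query)]
    finally:
        con.close()
```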

Next, we need to format the input table (run the following in the LEfSe terminal):

$ lefse-format_input.py \
output/Useful-Information/input_table.tsv \
output/Useful-Information/formatted_table.in \
-c 1 \
-u 2 \
-o 1000000 \
--output_table output/Useful-Information/formatted_table.tsv
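In `lefse-format_input.py`, `-c 1` and `-u 2` mark row 1 of the Dokdo file as the class row and row 2 as the subject row, and `-o 1000000` sets the normalization value, which roughly means each sample is rescaled so that its values sum to one million. The plain-Python sketch below shows only that idea on a toy table; it is a simplification of what `lefse-format_input.py` does internally, and `normalize_samples` is a hypothetical helper.

```python
def normalize_samples(columns, target=1_000_000):
    """Scale each sample so its abundances sum to `target`.

    columns -- {sample_id: [abundance of each feature]}
    Samples whose total is zero are returned as all zeros.
    """
    scaled = {}
    for sample, values in columns.items():
        total = sum(values)
        factor = target / total if total else 0.0
        scaled[sample] = [v * factor for v in values]
    return scaled
```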

We can then run LEfSe (run the following in the LEfSe terminal):

$ run_lefse.py \
output/Useful-Information/formatted_table.in \
output/Useful-Information/output.res
This prints:

Number of significantly discriminative features: 199 ( 199 ) before internal wilcoxon
Number of discriminative features with abs LDA score > 2.0 : 199
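LEfSe works in three stages: a Kruskal-Wallis test of each feature across classes, pairwise Wilcoxon tests between subclasses, and linear discriminant analysis to estimate an effect size (the `abs LDA score > 2.0` line reflects LEfSe's default LDA cutoff of 2.0). To make the first stage concrete, here is a pure-Python sketch of a Kruskal-Wallis filter at alpha = 0.05, LEfSe's default for that step (`-a`). It is not LEfSe's code: the p-value uses the usual chi-square approximation computed with a series expansion written for this example, and `chi2_sf`, `kruskal_wallis` and `kruskal_filter` are hypothetical helpers. In practice `scipy.stats.kruskal` performs the same tie-corrected test.

```python
import math


def chi2_sf(x, df):
    """P(X > x) for a chi-square variable with `df` degrees of freedom."""
    if x <= 0:
        return 1.0
    a, z = df / 2.0, x / 2.0
    # Power series for the regularized lower incomplete gamma P(a, z).
    term = total = 1.0 / a
    n = 1
    while term > total * 1e-15 and n < 10_000:
        term *= z / (a + n)
        total += term
        n += 1
    lower = total * math.exp(a * math.log(z) - z - math.lgamma(a))
    return max(0.0, 1.0 - lower)


def kruskal_wallis(groups):
    """Tie-corrected Kruskal-Wallis H and its chi-square p-value."""
    groups = [list(g) for g in groups if len(g) > 0]
    pooled = sorted((v, gi) for gi, g in enumerate(groups) for v in g)
    n_total = len(pooled)
    if len(groups) < 2 or n_total < 2:
        return 0.0, 1.0
    rank_sums = [0.0] * len(groups)
    ties = 0.0
    i = 0
    while i < n_total:
        j = i
        while j + 1 < n_total and pooled[j + 1][0] == pooled[i][0]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # tied values share the mean rank
        ties += (j - i + 1) ** 3 - (j - i + 1)
        for k in range(i, j + 1):
            rank_sums[pooled[k][1]] += avg_rank
        i = j + 1
    h = 12 / (n_total * (n_total + 1)) * sum(
        r * r / len(g) for r, g in zip(rank_sums, groups)
    ) - 3 * (n_total + 1)
    correction = 1 - ties / (n_total ** 3 - n_total)
    if correction == 0:  # every value identical
        return 0.0, 1.0
    h /= correction
    return h, chi2_sf(h, len(groups) - 1)


def kruskal_filter(features, labels, alpha=0.05):
    """Names of features whose Kruskal-Wallis p-value is below `alpha`."""
    classes = sorted(set(labels))
    kept = []
    for name, values in features.items():
        groups = [[v for v, c in zip(values, labels) if c == cls] for cls in classes]
        if kruskal_wallis(groups)[1] < alpha:
            kept.append(name)
    return kept
```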

We can then plot the discriminative features and their LDA scores as a bar chart (run the following in the LEfSe terminal):

$ lefse-plot_res.py \
output/Useful-Information/output.res \
output/Useful-Information/output.pdf \
--format pdf
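`lefse-plot_res.py` draws the LDA scores stored in `output.res`. To list the same features as text, you can parse that file directly. The sketch assumes five tab-separated columns (feature name, log10 of the highest class mean, enriched class, LDA score, p-value), with the class and LDA fields left empty for features that are not discriminative; check this against your own `output.res` before relying on it. `read_lefse_res` is a hypothetical helper.

```python
import csv


def read_lefse_res(lines):
    """Return (feature, class, lda, pvalue) for discriminative features,
    sorted by LDA score, largest first. Column layout is assumed."""
    hits = []
    for row in csv.reader(lines, delimiter="\t"):
        if len(row) < 5 or not row[2] or not row[3]:
            continue  # no enriched class / LDA score: not discriminative
        feature, _log_max_mean, cls, lda, pvalue = row[:5]
        hits.append((feature, cls, float(lda), pvalue))
    return sorted(hits, key=lambda h: h[2], reverse=True)
```

For example, `with open("output.res") as fh: hits = read_lefse_res(fh)`, then print `hits[:10]` for the ten features with the largest LDA scores.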

Finally, you can create a cladogram of the discriminative features (run the following in the LEfSe terminal):

$ lefse-plot_cladogram.py \
output/Useful-Information/output.res \
output/Useful-Information/output.cladogram.pdf \
--format pdf
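The cladogram arranges the discriminative features on the taxonomic tree implied by their names. The nesting itself is easy to reproduce: this plain-Python sketch builds and prints that tree, assuming `.`-separated lineages as they appear in `output.res` (LEfSe is assumed here to swap the `|` separators of its input for `.`). It does no plotting, and `build_lineage_tree` and `print_tree` are hypothetical helpers.

```python
def build_lineage_tree(features, sep="."):
    """Nest names like 'k__Bacteria.p__Firmicutes.g__X' into a dict tree."""
    tree = {}
    for name in features:
        node = tree
        for level in name.split(sep):
            node = node.setdefault(level, {})  # reuse the node if it exists
    return tree


def print_tree(tree, depth=0):
    """Print one rank per line, indented by depth in the lineage."""
    for name in sorted(tree):
        print("  " * depth + name)
        print_tree(tree[name], depth + 1)
```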