We need to combine a few datasets:

* ICU (work with Paul Wischmeyer).
* Infant time series (work with Ruth Ley).
* Fecal Microbiome Transplant (work with Mike Sadowsky).
* American Gut Data.

All the files are provided in the `data-for-animations` folder on the American Gut FTP site. The commands below also assume that the Greengenes database is located in a folder named `gg_13_8_otus` in the current working directory.

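Although the task seems trivial, combining the data properly requires some
finesse, because the datasets come from different sequencing technologies and
sequence processing algorithms.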
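
Since every step reads from `data-for-animations` and `gg_13_8_otus`, it may help to confirm that the expected inputs are present before starting. The following is a minimal sketch and not part of the original workflow: the function name `check_inputs` is hypothetical, and the paths in the usage comment are simply the inputs referenced by the commands in this notebook.

```shell
# Hypothetical helper: report which of the given paths are missing.
# Prints one "missing: <path>" line per absent path and returns
# non-zero if at least one path is missing.
check_inputs() {
    local missing=0
    local path
    for path in "$@"; do
        if [ ! -e "$path" ]; then
            echo "missing: $path"
            missing=1
        fi
    done
    return "$missing"
}

# Example usage with inputs referenced in this notebook:
# check_inputs \
#     data-for-animations/fmt-seqs.demux \
#     data-for-animations/icu-seqs.demux \
#     data-for-animations/its-seqs.fna \
#     gg_13_8_otus/rep_set/99_otus.fasta \
#     gg_13_8_otus/taxonomy/99_otu_taxonomy.txt \
#     gg_13_8_otus/trees/99_otus.tree
```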

# 1) Deblur ICU and FMT data.

We start by deblurring the ICU and FMT data. First, grab the data from Qiita (HDF5-formatted sequences) and import it into QIIME 2.


In [None]:
%%bash


scripts/make_importable.sh data-for-animations/fmt-seqs.demux data-for-animations/fmt-sequences-qiime2 1
qiime tools import --input-path data-for-animations/fmt-sequences-qiime2/ \
--type 'SampleData[SequencesWithQuality]' \
--output-path data-for-animations/fmt-seqs.qza

scripts/make_importable.sh data-for-animations/icu-seqs.demux data-for-animations/icu-sequences-qiime2 1
qiime tools import --input-path data-for-animations/icu-sequences-qiime2/ \
--type 'SampleData[SequencesWithQuality]' \
--output-path data-for-animations/icu-seqs.qza

# 1.1) Deblur and trim at 125 NT.

In [None]:
%%bash

qiime deblur denoise-16S \
--output-dir data-for-animations/deblur-fmt \
--i-demultiplexed-seqs data-for-animations/fmt-seqs.qza \
--p-trim-length 125 \
--p-no-hashed-feature-ids \
--verbose --p-jobs-to-start 16

qiime deblur denoise-16S \
--output-dir data-for-animations/deblur-icu \
--i-demultiplexed-seqs data-for-animations/icu-seqs.qza \
--p-trim-length 125 \
--p-no-hashed-feature-ids \
--verbose --p-jobs-to-start 16

# 1.2) Merge sequences

In [None]:
%%bash

qiime feature-table merge \
--i-table1 data-for-animations/deblur-fmt/table.qza \
--i-table2 data-for-animations/deblur-icu/table.qza \
--o-merged-table data-for-animations/deblur-fmt-and-icu/table.qza

# 1.3) Remove blooms

Download the blooms FASTA file and remove any representative sequences that match a bloom sequence.

In [None]:
%%bash

wget https://raw.githubusercontent.com/knightlab-analyses/bloom-analyses/master/data/newbloom.all.fna
scripts/remove-blooms.py
qiime tools import \
--input-path data-for-animations/deblur-fmt-and-icu/representative-sequences.upper.fna \
--output-path data-for-animations/deblur-fmt-and-icu/representative_sequences.qza \
--type 'FeatureData[Sequence]'
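
The script `scripts/remove-blooms.py` is not shown in this notebook, so the following is only an illustration of the general idea of exact-match bloom filtering, not the script's actual logic. It assumes single-line FASTA records, compares only the first 125 nucleotides (the trim length used above), and upper-cases the output; the function name `filter_blooms` is hypothetical.

```shell
# Hypothetical illustration: drop FASTA records whose sequence matches a bloom.
# Assumes each record is one header line followed by one sequence line.
# Usage: filter_blooms BLOOMS.fna SEQS.fna [TRIM_LENGTH]
filter_blooms() {
    local blooms=$1 seqs=$2 trim=${3:-125}
    awk -v n="$trim" '
        # First file: remember every bloom sequence, trimmed and upper-cased.
        FILENAME == ARGV[1] {
            if ($0 !~ /^>/) bloom[toupper(substr($0, 1, n))] = 1
            next
        }
        # Second file: hold the header until its sequence line is seen.
        /^>/ { header = $0; next }
        {
            seq = toupper(substr($0, 1, n))
            # Keep the record only if its trimmed sequence is not a bloom.
            if (!(seq in bloom)) { print header; print toupper($0) }
        }
    ' "$blooms" "$seqs"
}

# Example usage (output file name is illustrative only):
# filter_blooms newbloom.all.fna input.fna 125 > filtered.fna
```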

# 2) Process the infant time series (ITS) data

# 2.1) Trim to 125 NTs

In [None]:
%%bash

trim_fasta.py \
-i data-for-animations/its-seqs.fna \
-o data-for-animations/its.seqs.125nt.fna \
-l 125
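
Trimming to 125 nucleotides matches the trim length used for the deblurred data in step 1.1. The `trim_fasta.py` script is not shown here, so the sketch below is only a stand-in that illustrates the operation. It assumes single-line FASTA records, and the choice to drop reads shorter than the target length is an assumption rather than documented behavior; the function name `trim_fasta_sketch` is hypothetical.

```shell
# Hypothetical stand-in for the trimming step: keep the first LENGTH
# nucleotides of each record and drop records shorter than LENGTH.
# Assumes one header line followed by one sequence line per record.
# Usage: trim_fasta_sketch INPUT.fna LENGTH > OUTPUT.fna
trim_fasta_sketch() {
    local input=$1 trim=$2
    awk -v n="$trim" '
        # Remember the header; it is printed only if the read is long enough.
        /^>/ { header = $0; next }
        length($0) >= n { print header; print substr($0, 1, n) }
    ' "$input"
}

# Example usage:
# trim_fasta_sketch data-for-animations/its-seqs.fna 125 > its.trimmed.fna
```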

# 2.2) Pick OTUs at 99%.

**Note** that this step requires switching to an environment with QIIME 1.9.1 installed.

`qsub` command used:

```bash
qsub -l mem=128gb,nodes=1:ppn=32 -l walltime=120:00:00 -e sortmerna.e -o sortmerna.o -N closed commands.sh
```

In [None]:
%%bash

pick_closed_reference_otus.py \
-i data-for-animations/its.seqs.125nt.fna \
-o data-for-animations/closed-ref-its \
-p data-for-animations/sortmerna-params.txt \
-r gg_13_8_otus/rep_set/99_otus.fasta \
-t gg_13_8_otus/taxonomy/99_otu_taxonomy.txt
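
The `commands.sh` file passed to `qsub` is not included in this notebook; presumably it wraps the command in the cell above. As a sketch under that assumption, a job script could be generated as follows. The helper name `write_job_script` is hypothetical. `PBS_O_WORKDIR` is the variable that PBS/Torque sets to the directory from which `qsub` was run.

```shell
# Hypothetical helper: write a job script that qsub can submit.
# Usage: write_job_script OUTPUT.sh "COMMAND LINE"
write_job_script() {
    local script=$1 command_line=$2
    {
        echo '#!/bin/bash'
        # PBS jobs start in the home directory, so return to the
        # directory the job was submitted from (or stay put if unset).
        echo 'cd "${PBS_O_WORKDIR:-.}"'
        printf '%s\n' "$command_line"
    } > "$script"
    chmod +x "$script"
}

# Example usage with the command from the cell above:
# write_job_script commands.sh "pick_closed_reference_otus.py \
#     -i data-for-animations/its.seqs.125nt.fna \
#     -o data-for-animations/closed-ref-its \
#     -p data-for-animations/sortmerna-params.txt \
#     -r gg_13_8_otus/rep_set/99_otus.fasta \
#     -t gg_13_8_otus/taxonomy/99_otu_taxonomy.txt"
```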

# 3) Combine all the Illumina data (FMT, ICU and AGP).

# 3.1) Get American gut data

In [None]:
%%bash

# Create the directory that the import commands below write to.
mkdir -p data-for-animations/deblur-ag
qiime tools import \
--input-path data-for-animations/otu_table_no_blooms_125nt_with_tax_min1250.biom \
--output-path data-for-animations/deblur-ag/table.qza \
--type 'FeatureTable[Frequency]'

scripts/make-representative-sequences.py

qiime tools import \
--input-path data-for-animations/deblur-ag/representative-sequences.upper.fna \
--output-path data-for-animations/deblur-ag/representative_sequences.qza \
--type 'FeatureData[Sequence]'

# 3.2) Combine the American Gut data with the FMT and ICU data

In [None]:
%%bash

qiime feature-table merge \
--i-table1 data-for-animations/deblur-fmt-and-icu/table.noblooms.qza \
--i-table2 data-for-animations/deblur-ag/table.qza \
--o-merged-table data-for-animations/deblur-ag-fmt-icu/table.qza

qiime feature-table merge-seq-data \
--i-data1 data-for-animations/deblur-ag/representative_sequences.qza \
--i-data2 data-for-animations/deblur-fmt-and-icu/representative_sequences.qza \
--o-merged-data data-for-animations/deblur-ag-fmt-icu/representative_sequences.qza

# 3.2) Re-pick OTUs from the deblurred sequences.

This will help reconcile the differences between the two technologies and sequence processing algorithms.

In [None]:
%%bash

qiime tools export data-for-animations/deblur-ag-fmt-icu/representative_sequences.qza \
--output-dir data-for-animations/deblur-ag-fmt-icu/

`qsub` command:

```bash
qsub -l mem=128gb,nodes=1:ppn=32 -l walltime=120:00:00 -e sortmerna.e -o sortmerna.o -N closed commands.sh
```

In [None]:
%%bash

pick_closed_reference_otus.py \
-i data-for-animations/deblur-ag-fmt-icu/dna-sequences.fasta \
-o data-for-animations/deblur-ag-fmt-icu/closed-ref/ \
-p data-for-animations/sortmerna-params.txt \
-r gg_13_8_otus/rep_set/99_otus.fasta \
-t gg_13_8_otus/taxonomy/99_otu_taxonomy.txt

# 3.3) Re-map into an OTU table using Daniel's script

In [None]:
%%bash

scripts/expand.py \
data-for-animations/deblur-ag-fmt-icu/table.qza \
data-for-animations/deblur-ag-fmt-icu/closed-ref/sortmerna_picked_otus/dna-sequences_otus.txt \
data-for-animations/deblur-ag-fmt-icu/expanded-otu-table.qza
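
The `scripts/expand.py` script is not shown in this notebook. As an illustration of what re-mapping means here, the sketch below sums feature counts by the OTU each feature was assigned to. It assumes a QIIME 1 style OTU map (OTU ID followed by its member sequence IDs, tab-separated) and a plain tab-separated table rather than a `.qza` artifact; the function name `collapse_by_otu` is hypothetical, and the real script may behave differently.

```shell
# Hypothetical illustration of re-mapping: sum feature counts per OTU.
# OTU_MAP: "otu_id<TAB>feature_1<TAB>feature_2..." on each line.
# TABLE:   tab-separated, header row of sample IDs, then one feature per row.
# Features absent from the map are dropped, since closed-reference
# picking discards sequences that fail to hit the reference.
# Usage: collapse_by_otu OTU_MAP TABLE
collapse_by_otu() {
    local otu_map=$1 table=$2
    awk -F'\t' -v OFS='\t' '
        # First file: record which OTU each feature belongs to.
        FILENAME == ARGV[1] {
            for (i = 2; i <= NF; i++) otu_of[$i] = $1
            next
        }
        # Table header: print it unchanged and remember the column count.
        FNR == 1 { print; ncol = NF; next }
        ($1 in otu_of) {
            otu = otu_of[$1]
            # Track first-seen order so the output is deterministic.
            if (!(otu in seen)) { seen[otu] = 1; order[++n_otus] = otu }
            for (i = 2; i <= ncol; i++) total[otu, i] += $i
        }
        END {
            for (k = 1; k <= n_otus; k++) {
                line = order[k]
                for (i = 2; i <= ncol; i++) line = line OFS (total[order[k], i] + 0)
                print line
            }
        }
    ' "$otu_map" "$table"
}
```

For example, if features `A` and `B` both map to the same Greengenes OTU, their per-sample counts are added together into a single row labeled with that OTU ID.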

# 3.4) Import GG tree

In [None]:
%%bash

qiime tools import \
--input-path gg_13_8_otus/trees/99_otus.tree \
--output-path data-for-animations/deblur-ag-fmt-icu/closed-ref/greengenes.99.qza \
--type 'Phylogeny[Rooted]'

# 4) Combine re-mapped OTU table and ITS OTU table.

# 4.1) Import ITS table into QIIME2

In [None]:
%%bash

qiime tools import \
--input-path data-for-animations/closed-ref-its/otu_table.biom  \
--output-path data-for-animations/closed-ref-its/table.qza \
--type 'FeatureTable[Frequency]'

# 4.2) Merge OTU tables and rarefy

In [None]:
%%bash

# Create the directory that the merge and rarefy commands below write to.
mkdir -p data-for-animations/remapped-ag-fmt-icu-its

qiime feature-table merge \
--i-table1 data-for-animations/deblur-ag-fmt-icu/expanded-otu-table.qza \
--i-table2 data-for-animations/closed-ref-its/table.qza \
--o-merged-table data-for-animations/remapped-ag-fmt-icu-its/table.qza

qiime feature-table rarefy \
--i-table data-for-animations/remapped-ag-fmt-icu-its/table.qza \
--p-sampling-depth 1250 \
--o-rarefied-table data-for-animations/remapped-ag-fmt-icu-its/table.even1250.qza

# 5) Use Greengenes 99% to compute UniFrac

# 5.1) Compute UniFrac

`qsub` command:
    
```bash
qsub -l mem=64gb,nodes=1:ppn=10 -l walltime=120:00:00 -o state-unifrac.o -e state-unifrac.e -N state commands.sh
```

In [None]:
%%bash

qiime state-unifrac unweighted \
--i-table data-for-animations/remapped-ag-fmt-icu-its/table.even1250.qza \
--i-phylogeny data-for-animations/deblur-ag-fmt-icu/closed-ref/greengenes.99.qza \
--p-threads 10 \
--o-distance-matrix data-for-animations/remapped-ag-fmt-icu-its/unweighted-unifrac.even1250.qza

# 5.2) Ordinate distance matrix

In [None]:
%%bash

qiime diversity pcoa \
--i-distance-matrix data-for-animations/remapped-ag-fmt-icu-its/unweighted-unifrac.even1250.qza  \
--o-pcoa data-for-animations/remapped-ag-fmt-icu-its/pcoa.unweighted-unifrac.even1250.qza