# Parkinson's Mouse Tutorial - Import & Demux

Run this notebook in `qiime2-2021.11`.

We'll be working through the [pd-mouse tutorial](https://docs.qiime2.org/2021.11/tutorials/pd-mice/).

*Note: did you run `jupyter serverextension enable --py qiime2 --sys-prefix` before getting here?*

Also, see the [Jupyter Markdown documentation](https://jupyter.brynmawr.edu/services/public/dblank/Jupyter%20Notebook%20Users%20Manual.ipynb).

In [1]:
from os import getcwd, listdir, chdir, mkdir
import qiime2 as q2

In [2]:
getcwd()

'/home/raotoo/omics/MultiOmics/Microbiome_Qiime2'

In [3]:
listdir()

['01-Parkinson-Mouse-Tutorial-Import-Demux.ipynb',
 '.ipynb_checkpoints',
 '02-Parkinson-Mouse-Tutorial-Taxonomy-Phylogeny.ipynb',
 'readme.md',
 '03-Parkinson-Mouse-Tutorial-Diversity.ipynb']

In [14]:
mkdir('../Microbiome_Qiime2/processed')

In [16]:
chdir('../Microbiome_Qiime2/processed')
getcwd()

'/home/raotoo/omics/MultiOmics/Microbiome_Qiime2/processed'
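
*Note: `mkdir` raises `FileExistsError` if this notebook is re-run after `processed` already exists. Below is a minimal sketch of a re-runnable alternative, assuming `pathlib` is acceptable here. The helper name `ensure_dir` is illustrative and not part of the tutorial.*

```python
from pathlib import Path


def ensure_dir(path):
    """Create `path` (and any missing parents) if needed and return it as a Path."""
    directory = Path(path)
    # exist_ok=True makes the call a no-op when the directory is already there,
    # so re-running the cell does not raise FileExistsError.
    directory.mkdir(parents=True, exist_ok=True)
    return directory
```

Calling `ensure_dir('../Microbiome_Qiime2/processed')` before `chdir` would then be safe to repeat.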

## Download and View Metadata

We'll use `wget` to download the metadata file, and then visualize it in one of three ways:
 - [QIIME 2 View Website](https://view.qiime2.org/)
 - [QIIME 2 CLI / Utilities](https://docs.qiime2.org/2021.11/tutorials/utilities/)
 - [QIIME 2 API](https://docs.qiime2.org/2021.11/interfaces/artifact-api/)
 
 *Note: If you are running this notebook on the HPC, you may need to copy and paste these commands into the "Grace Shell Access" under the "Clusters" menu of the Grace HPC Portal page. Make sure you are downloading the files into the appropriate directory. Alternatively, download the files to your computer and upload them with Jupyter Lab.*
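
If `wget` is not available in your environment, the same download can be done from Python. This is a minimal sketch using only the standard library; the helper name `download` is an assumption for illustration, not something the tutorial provides.

```python
import shutil
from urllib.request import urlopen


def download(url, dest):
    """Stream `url` to the local file `dest` and return `dest`."""
    # urlopen follows HTTP redirects by default, so redirected URLs
    # are handled without extra code.
    with urlopen(url) as response, open(dest, "wb") as out:
        shutil.copyfileobj(response, out)
    return dest
```

For example, `download("https://data.qiime2.org/2021.11/tutorials/pd-mice/sample_metadata.tsv", "metadata.tsv")` should produce the same file as the `wget` cell.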

In [17]:
# Download Metadata
! wget \
    -O "metadata.tsv" \
    "https://data.qiime2.org/2021.11/tutorials/pd-mice/sample_metadata.tsv"

--2022-09-06 14:24:52--  https://data.qiime2.org/2021.11/tutorials/pd-mice/sample_metadata.tsv
Resolving data.qiime2.org (data.qiime2.org)... 54.200.1.12
Connecting to data.qiime2.org (data.qiime2.org)|54.200.1.12|:443... connected.
HTTP request sent, awaiting response... 302 FOUND
Location: https://docs.google.com/spreadsheets/d/e/2PACX-1vS6QgFSVey6INsa6vQLSPNGOyg18sck918nszO-cY7WE5llesrZtKhMIeI2xXY462S5-0EeW1H9MmNF/pub?gid=1509704122&single=true&output=tsv [following]
--2022-09-06 14:24:53--  https://docs.google.com/spreadsheets/d/e/2PACX-1vS6QgFSVey6INsa6vQLSPNGOyg18sck918nszO-cY7WE5llesrZtKhMIeI2xXY462S5-0EeW1H9MmNF/pub?gid=1509704122&single=true&output=tsv
Resolving docs.google.com (docs.google.com)... 142.251.32.206, 2607:f8b0:4000:80a::200e
Connecting to docs.google.com (docs.google.com)|142.251.32.206|:443... connected.
HTTP request sent, awaiting response... 307 Temporary Redirect
Location: https://doc-0k-6o-sheets.googleusercontent.com/pub/3sm34aofsvmt5ehut1q6bddr3o/tk6d0lrrs

In [18]:
# Peek at the metadata
! qiime tools inspect-metadata metadata.tsv

              COLUMN NAME  TYPE
                  barcode  categorical
                 mouse_id  categorical
                 genotype  categorical
                  cage_id  categorical
                    donor  categorical
             donor_status  categorical
     days_post_transplant  numeric
genotype_and_donor_status  categorical
                     IDS:  48
                 COLUMNS:  8


**Make metadata Visualization**

In [19]:
! qiime metadata tabulate \
  --m-input-file metadata.tsv \
  --o-visualization metadata.qzv

Saved Visualization to: metadata.qzv

In [20]:
! qiime tools peek metadata.qzv

UUID:        43838896-9962-48e5-92b2-b3e6d0e69253
Type:        Visualization


In [21]:
# Visualize via API
q2.Visualization.load('metadata.qzv')
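
The UUID and type reported by `qiime tools peek` are stored inside the file itself: `.qza` and `.qzv` files are zip archives. The sketch below reads that information with the standard library only. It assumes the archive has a single root directory containing a flat `metadata.yaml`; the helper name `peek_archive` is hypothetical, and this is not how QIIME 2 implements `peek`.

```python
import zipfile


def peek_archive(path):
    """Return the key/value pairs from the top-level metadata.yaml of a .qza/.qzv."""
    with zipfile.ZipFile(path) as archive:
        # Assumption: the archive holds one root directory with
        # metadata.yaml directly inside it.
        candidates = [
            name for name in archive.namelist()
            if name.count("/") == 1 and name.endswith("/metadata.yaml")
        ]
        if not candidates:
            raise ValueError(f"no top-level metadata.yaml found in {path}")
        text = archive.read(candidates[0]).decode("utf-8")

    info = {}
    for line in text.splitlines():
        # Simple "key: value" parsing; enough for a flat file, not general YAML.
        key, sep, value = line.partition(":")
        if sep:
            info[key.strip()] = value.strip()
    return info
```

Running `peek_archive('metadata.qzv')` should return a dictionary whose `uuid` and `type` entries match the `peek` output.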

## Import data into QIIME 2

We will import:
 - [Manifest File](https://docs.qiime2.org/2021.11/tutorials/importing/#fastq-manifest-formats)
 - Demultiplexed Sequences (as opposed to Multiplexed Sequences)
 
See the [Importing Data Tutorial](https://docs.qiime2.org/2021.11/tutorials/importing/#importing-data) for more information.

In [23]:
# get manifest file
!wget \
  -O "manifest.tsv" \
  "https://data.qiime2.org/2021.11/tutorials/pd-mice/manifest"

--2022-09-06 14:28:48--  https://data.qiime2.org/2021.11/tutorials/pd-mice/manifest
Resolving data.qiime2.org (data.qiime2.org)... 54.200.1.12
Connecting to data.qiime2.org (data.qiime2.org)|54.200.1.12|:443... connected.
HTTP request sent, awaiting response... 302 FOUND
Location: https://s3-us-west-2.amazonaws.com/qiime2-data/2021.11/tutorials/pd-mice/manifest [following]
--2022-09-06 14:28:49--  https://s3-us-west-2.amazonaws.com/qiime2-data/2021.11/tutorials/pd-mice/manifest
Resolving s3-us-west-2.amazonaws.com (s3-us-west-2.amazonaws.com)... 52.218.220.8
Connecting to s3-us-west-2.amazonaws.com (s3-us-west-2.amazonaws.com)|52.218.220.8|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 4640 (4.5K) [binary/octet-stream]
Saving to: 'manifest.tsv'


2022-09-06 14:28:49 (17.6 MB/s) - 'manifest.tsv' saved [4640/4640]



In [24]:
# get demultiplexed sequences
!wget \
  -O "demultiplexed_seqs.zip" \
  "https://data.qiime2.org/2021.11/tutorials/pd-mice/demultiplexed_seqs.zip"

--2022-09-06 14:28:52--  https://data.qiime2.org/2021.11/tutorials/pd-mice/demultiplexed_seqs.zip
Resolving data.qiime2.org (data.qiime2.org)... 54.200.1.12
Connecting to data.qiime2.org (data.qiime2.org)|54.200.1.12|:443... connected.
HTTP request sent, awaiting response... 302 FOUND
Location: https://s3-us-west-2.amazonaws.com/qiime2-data/2021.11/tutorials/pd-mice/demultiplexed_seqs.zip [following]
--2022-09-06 14:28:52--  https://s3-us-west-2.amazonaws.com/qiime2-data/2021.11/tutorials/pd-mice/demultiplexed_seqs.zip
Resolving s3-us-west-2.amazonaws.com (s3-us-west-2.amazonaws.com)... 52.218.220.8
Connecting to s3-us-west-2.amazonaws.com (s3-us-west-2.amazonaws.com)|52.218.220.8|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 21508775 (21M) [application/zip]
Saving to: 'demultiplexed_seqs.zip'


2022-09-06 14:28:53 (21.9 MB/s) - 'demultiplexed_seqs.zip' saved [21508775/21508775]



In [25]:
# unzip sequences
! unzip demultiplexed_seqs.zip

Archive:  demultiplexed_seqs.zip
   creating: demultiplexed_seqs/
  inflating: demultiplexed_seqs/10483.recip.539.ASO.PD4.D7_4_L001_R1_001.fastq.gz  
  inflating: demultiplexed_seqs/10483.recip.539.ASO.PD4.D14_5_L001_R1_001.fastq.gz  
  inflating: demultiplexed_seqs/10483.recip.413.WT.HC2.D7_12_L001_R1_001.fastq.gz  
  inflating: demultiplexed_seqs/10483.recip.220.WT.OB1.D7_30_L001_R1_001.fastq.gz  
  inflating: demultiplexed_seqs/10483.recip.458.ASO.HC3.D49_2_L001_R1_001.fastq.gz  
  inflating: demultiplexed_seqs/10483.recip.538.WT.PD4.D21_4_L001_R1_001.fastq.gz  
  inflating: demultiplexed_seqs/10483.recip.459.WT.HC3.D14_2_L001_R1_001.fastq.gz  
  inflating: demultiplexed_seqs/10483.recip.461.ASO.HC3.D7_20_L001_R1_001.fastq.gz  
  inflating: demultiplexed_seqs/10483.recip.465.ASO.PD3.D14_16_L001_R1_001.fastq.gz  
  inflating: demultiplexed_seqs/10483.recip.461.ASO.HC3.D21_11_L001_R1_001.fastq.gz  
  inflating: demultiplexed_seqs/10483.recip.540.ASO.HC4.D7_7_L001_R1_001.fastq.gz  
  i

In [26]:
! head manifest.tsv

sample-id	absolute-filepath
recip.220.WT.OB1.D7	$PWD/demultiplexed_seqs/10483.recip.220.WT.OB1.D7_30_L001_R1_001.fastq.gz
recip.290.ASO.OB2.D1	$PWD/demultiplexed_seqs/10483.recip.290.ASO.OB2.D1_27_L001_R1_001.fastq.gz
recip.389.WT.HC2.D21	$PWD/demultiplexed_seqs/10483.recip.389.WT.HC2.D21_1_L001_R1_001.fastq.gz
recip.391.ASO.PD2.D14	$PWD/demultiplexed_seqs/10483.recip.391.ASO.PD2.D14_5_L001_R1_001.fastq.gz
recip.391.ASO.PD2.D21	$PWD/demultiplexed_seqs/10483.recip.391.ASO.PD2.D21_1_L001_R1_001.fastq.gz
recip.391.ASO.PD2.D7	$PWD/demultiplexed_seqs/10483.recip.391.ASO.PD2.D7_15_L001_R1_001.fastq.gz
recip.400.ASO.HC2.D14	$PWD/demultiplexed_seqs/10483.recip.400.ASO.HC2.D14_32_L001_R1_001.fastq.gz
recip.401.ASO.HC2.D7	$PWD/demultiplexed_seqs/10483.recip.401.ASO.HC2.D7_22_L001_R1_001.fastq.gz
recip.403.ASO.PD2.D21	$PWD/demultiplexed_seqs/10483.recip.403.ASO.PD2.D21_31_L001_R1_001.fastq.gz
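
Each manifest row pairs a sample ID with a path that starts with the `$PWD` placeholder, so the import only works when it is run from the directory that contains `demultiplexed_seqs/`. Here is a minimal sketch for checking, before import, that every listed file exists. The helper name `check_manifest` and its `base_dir` parameter are assumptions for illustration.

```python
import csv
import os


def check_manifest(manifest_path, base_dir=None):
    """Return (sample_id, resolved_path) pairs whose FASTQ file is missing."""
    base_dir = os.path.abspath(base_dir or os.getcwd())
    missing = []
    with open(manifest_path, newline="") as handle:
        reader = csv.DictReader(handle, delimiter="\t")
        for row in reader:
            # Substitute the literal "$PWD" placeholder ourselves instead of
            # relying on the environment variable being set.
            resolved = row["absolute-filepath"].replace("$PWD", base_dir)
            if not os.path.isfile(resolved):
                missing.append((row["sample-id"], resolved))
    return missing
```

An empty list from `check_manifest('manifest.tsv')` would indicate that all files are in place.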


**Import and Summarize Data**

In [27]:
! qiime tools import \
  --type "SampleData[SequencesWithQuality]" \
  --input-format SingleEndFastqManifestPhred33V2 \
  --input-path ./manifest.tsv \
  --output-path ./demux_seqs.qza

Imported ./manifest.tsv as SingleEndFastqManifestPhred33V2 to ./demux_seqs.qza

In [28]:
! qiime demux summarize \
  --i-data ./demux_seqs.qza \
  --o-visualization ./demux_seqs.qzv

Saved Visualization to: ./demux_seqs.qzv

In [29]:
q2.Visualization.load('demux_seqs.qzv')
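
The read lengths in this summary guide the truncation length chosen for denoising. As a cross-check on a single sample, read lengths can also be tallied directly from one of the unzipped files. This is a minimal sketch that assumes standard 4-line FASTQ records (no wrapped sequences); the helper name `read_length_counts` is hypothetical.

```python
import gzip
from collections import Counter


def read_length_counts(fastq_gz_path):
    """Count reads by length in a gzipped FASTQ file with 4-line records."""
    lengths = Counter()
    with gzip.open(fastq_gz_path, "rt") as handle:
        for line_number, line in enumerate(handle):
            # In a 4-line FASTQ record the sequence is the second line.
            if line_number % 4 == 1:
                lengths[len(line.strip())] += 1
    return lengths
```

Summing the values of the returned counter gives the number of reads in that file.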

## Denoising Sequence Data

We'll try three approaches:
 - DADA2 as outlined in the tutorial.
 - DADA2 with alternate trimming.
 - Deblur with default trimming.

### Default

In [30]:
getcwd()

'/home/raotoo/omics/MultiOmics/Microbiome_Qiime2/processed'

In [31]:
! qiime dada2 denoise-single \
    --i-demultiplexed-seqs ./demux_seqs.qza \
    --p-trunc-len 150 \
    --p-n-threads 8 \
    --o-table ./dada2_table.qza \
    --o-representative-sequences ./dada2_rep_set.qza \
    --o-denoising-stats ./dada2_stats.qza \
    --verbose

Running external command line application(s). This may print messages to stdout and/or stderr.
The command(s) being run are below. These commands cannot be manually re-run as they will depend on temporary files that no longer exist.

Command: run_dada_single.R /tmp/qiime2-archive-146a342y/5e2c35e8-e01c-45db-91b3-ea249c16a24f/data /tmp/tmptb5im1yi/output.tsv.biom /tmp/tmptb5im1yi/track.tsv /tmp/tmptb5im1yi 150 0 2.0 2 Inf independent consensus 1.0 8 1000000 NULL 16

1: Setting LC_CTYPE failed, using "C" 
2: Setting LC_COLLATE failed, using "C" 
3: Setting LC_TIME failed, using "C" 
4: Setting LC_MESSAGES failed, using "C" 
5: Setting LC_MONETARY failed, using "C" 
6: Setting LC_PAPER failed, using "C" 
7: Setting LC_MEASUREMENT failed, using "C" 
R version 4.1.2 (2021-11-01) 
Loading required package: Rcpp
DADA2: 1.22.0 / Rcpp: 1.0.8 / RcppParallel: 5.1.5 
1) Filtering ................................................
2) Learning Error Rates
35926200 total bases in 239508 reads from 48 s
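
The numbers in the error-learning line are consistent with `--p-trunc-len 150`: if every read used at this step has been truncated to exactly 150 bases, total bases should equal reads times 150. A quick check, using only the figures printed in the log:

```python
# Numbers copied from the DADA2 log line above.
total_bases = 35926200
reads = 239508
trunc_len = 150  # value passed as --p-trunc-len

# Every read used for error learning has been truncated to trunc_len bases.
assert reads * trunc_len == total_bases
bases_per_read = total_bases // reads
print(bases_per_read)  # 150
```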

In [32]:
# summarize denoising stats
! qiime metadata tabulate \
    --m-input-file ./dada2_stats.qza  \
    --o-visualization ./dada2_stats.qzv

Saved Visualization to: ./dada2_stats.qzv

In [33]:
q2.Visualization.load('dada2_stats.qzv')

In [34]:
# summarize ESV table
! qiime feature-table summarize \
    --i-table ./dada2_table.qza \
    --m-sample-metadata-file ./metadata.tsv \
    --o-visualization ./dada2_table.qzv

Saved Visualization to: ./dada2_table.qzv

In [35]:
q2.Visualization.load('dada2_table.qzv')

In [36]:
! qiime feature-table tabulate-seqs \
    --i-data ./dada2_rep_set.qza \
    --o-visualization ./dada2_rep_set.qzv

Saved Visualization to: ./dada2_rep_set.qzv

In [None]:
q2.Visualization.load('dada2_rep_set.qzv')

### Alternate Trimming w/ DADA2

In [None]:
! qiime dada2 denoise-single \
    --i-demultiplexed-seqs ./demux_seqs.qza \
    --p-trim-left 30 \
    --p-trunc-len 130 \
    --o-table ./dada2_table_alt.qza \
    --o-representative-sequences ./dada2_rep_set_alt.qza \
    --o-denoising-stats ./dada2_stats_alt.qza \
    --verbose
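
To see what `--p-trim-left 30` and `--p-trunc-len 130` do together, here is a toy sketch of the trimming logic on a single read. It assumes DADA2's documented behaviour: reads shorter than the truncation length are discarded, and the remaining reads end up `trunc-len` minus `trim-left` bases long. The function `trim_read` is illustrative only and is not DADA2 code.

```python
def trim_read(seq, trim_left=0, trunc_len=0):
    """Return the trimmed read, or None if it is too short to keep."""
    if trunc_len:
        if len(seq) < trunc_len:
            return None  # shorter than trunc_len: discarded
        seq = seq[:trunc_len]  # cut the 3' end at position trunc_len
    return seq[trim_left:]  # drop trim_left bases from the 5' end


read = "A" * 30 + "C" * 100 + "G" * 20  # a toy 150-base read
kept = trim_read(read, trim_left=30, trunc_len=130)
print(len(kept))  # 100
```

With these settings each retained read is 100 bases long, compared with 150 in the default run.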

In [None]:
# summarize denoising stats
! qiime metadata tabulate \
    --m-input-file ./dada2_stats_alt.qza  \
    --o-visualization ./dada2_stats_alt.qzv

In [None]:
q2.Visualization.load('dada2_stats_alt.qzv')

In [None]:
# summarize ESV table
! qiime feature-table summarize \
    --i-table ./dada2_table_alt.qza \
    --m-sample-metadata-file ./metadata.tsv \
    --o-visualization ./dada2_table_alt.qzv

In [None]:
q2.Visualization.load('dada2_table_alt.qzv')

### Deblur w/ Default Trimming

In [None]:
! qiime quality-filter q-score \
    --i-demux ./demux_seqs.qza \
    --o-filtered-sequences demux-seqs-deblur.qza \
    --o-filter-stats demux-deblur-stats.qza
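
Quality filtering works on the per-base Phred scores stored in the FASTQ files, which we imported as Phred33-encoded. The sketch below shows how those characters map to scores and error probabilities. It illustrates the encoding only and does not reproduce the plugin's filtering rules; both function names are hypothetical.

```python
def phred33_scores(quality_line):
    """Decode a Phred+33 FASTQ quality string into integer Q scores."""
    return [ord(char) - 33 for char in quality_line]


def error_probability(q_score):
    """Convert a Phred score to the estimated probability of a wrong base call."""
    return 10 ** (-q_score / 10)


scores = phred33_scores("II5#")
print(scores)  # [40, 40, 20, 2]
```

A score of 20 corresponds to an estimated error probability of 0.01, and a score of 40 to 0.0001.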

In [None]:
# `denoise-16S` uses Greengenes as its reference database.
#    To use SILVA or another reference database, run
#    `qiime deblur denoise-other` instead.
#    SILVA files are listed here: https://docs.qiime2.org/2021.11/data-resources/
! qiime deblur denoise-16S \
    --i-demultiplexed-seqs demux-seqs-deblur.qza \
    --p-trim-length 150 \
    --o-representative-sequences rep-seqs-deblur.qza \
    --o-table table-deblur.qza \
    --p-sample-stats \
    --o-stats deblur-stats.qza

In [None]:
! qiime metadata tabulate \
    --m-input-file demux-deblur-stats.qza \
    --o-visualization demux-deblur-stats.qzv

! qiime deblur visualize-stats \
    --i-deblur-stats deblur-stats.qza \
    --o-visualization deblur-stats.qzv

In [None]:
q2.Visualization.load('demux-deblur-stats.qzv')

In [None]:
q2.Visualization.load('deblur-stats.qzv')

In [None]:
! qiime feature-table summarize \
    --i-table table-deblur.qza \
    --o-visualization table-deblur.qzv \
    --m-sample-metadata-file metadata.tsv

! qiime feature-table tabulate-seqs \
    --i-data rep-seqs-deblur.qza \
    --o-visualization rep-seqs-deblur.qzv

In [None]:
q2.Visualization.load('table-deblur.qzv')

In [None]:
q2.Visualization.load('rep-seqs-deblur.qzv')