# Parkinson's Mouse Tutorial - Import & Demux

Run this notebook in `qiime2-2021.11`.

We'll be working through the [pd-mouse tutorial](https://docs.qiime2.org/2021.11/tutorials/pd-mice/).

*Note: did you run `jupyter serverextension enable --py qiime2 --sys-prefix` before getting here?*

Also, see the [Jupyter Markdown documentation](https://jupyter.brynmawr.edu/services/public/dblank/Jupyter%20Notebook%20Users%20Manual.ipynb).

In [6]:
from os import getcwd, listdir, chdir, mkdir
import qiime2 as q2

In [7]:
getcwd()

'/Users/raymondotoo/Library/CloudStorage/OneDrive-UALittleRock/github_stuff/MultiOmics/Microbiome_Qiime2/Microbiome_Qiime2'

In [None]:
listdir()

['01-Parkinson-Mouse-Tutorial-Import-Demux.ipynb',
 '02-Parkinson-Mouse-Tutorial-Taxonomy-Phylogeny.ipynb',
 'metadata.tsv',
 'manifest.tsv',
 'demultiplexed_seqs.zip',
 'readme.md',
 '.ipynb_checkpoints',
 '03-Parkinson-Mouse-Tutorial-Diversity.ipynb',
 'processed']

In [None]:
mkdir('../Microbiome_Qiime2/processed')

FileExistsError: [Errno 17] File exists: '../Microbiome_Qiime2/processed'
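
The `FileExistsError` above is raised because the `processed` directory already exists from an earlier run. One way to make the cell safe to re-run is `os.makedirs` with `exist_ok=True`. The sketch below illustrates the idea; `ensure_dir` is a hypothetical helper name, and a temporary directory stands in for the real path.

```python
import os
import tempfile


def ensure_dir(path):
    """Create `path` (and any missing parents); do nothing if it already exists."""
    os.makedirs(path, exist_ok=True)
    return os.path.abspath(path)


# Demonstration in a throwaway directory (a stand-in for '../Microbiome_Qiime2/processed').
with tempfile.TemporaryDirectory() as tmp:
    target = os.path.join(tmp, "processed")
    first = ensure_dir(target)   # creates the directory
    second = ensure_dir(target)  # second call is a no-op instead of raising
    created = os.path.isdir(target)
```

In the notebook itself, the equivalent call would be `ensure_dir('../Microbiome_Qiime2/processed')`.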

In [None]:
chdir('../Microbiome_Qiime2/processed')
getcwd()

'/Users/raymondotoo/Library/CloudStorage/OneDrive-UALittleRock/github_stuff/MultiOmics/Microbiome_Qiime2/Microbiome_Qiime2/processed'

## Download and View Metadata

We'll use `wget` to download the metadata file, and then visualize it in one of three ways:
 - [QIIME 2 View Website](https://view.qiime2.org/)
 - [QIIME 2 CLI / Utilities](https://docs.qiime2.org/2021.11/tutorials/utilities/)
 - [QIIME 2 API](https://docs.qiime2.org/2021.11/interfaces/artifact-api/)
 
 *Note: If you are running this notebook on the HPC, you may need to copy and paste these commands into the "Grace Shell Access" under the "Clusters" menu of the Grace HPC Portal page. Make sure you are downloading the files into the appropriate directory. Alternatively, simply download the files to your computer and use Jupyter Lab to upload them.*
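
If `wget` is not available in your environment, the same download can be done from Python's standard library. This is a minimal sketch rather than part of the tutorial; `download` is a hypothetical helper name, and the example call is left commented out because it needs network access.

```python
import shutil
import urllib.request
from pathlib import Path


def download(url, dest):
    """Fetch `url` and write the response body to `dest` (HTTP redirects are followed)."""
    dest = Path(dest)
    with urllib.request.urlopen(url) as response, open(dest, "wb") as out:
        shutil.copyfileobj(response, out)
    return dest


# Example call (not executed here; requires network access):
# download("https://data.qiime2.org/2021.11/tutorials/pd-mice/sample_metadata.tsv", "metadata.tsv")
```

Streaming with `shutil.copyfileobj` avoids holding the whole file in memory, which matters more for the sequence archive than for the small metadata file.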

In [None]:
# Download Metadata
! wget \
    -O "metadata.tsv" \
    "https://data.qiime2.org/2021.11/tutorials/pd-mice/sample_metadata.tsv"

--2022-09-08 11:15:18--  https://data.qiime2.org/2021.11/tutorials/pd-mice/sample_metadata.tsv
Resolving data.qiime2.org (data.qiime2.org)... 54.200.1.12
Connecting to data.qiime2.org (data.qiime2.org)|54.200.1.12|:443... connected.
HTTP request sent, awaiting response... 302 FOUND
Location: https://docs.google.com/spreadsheets/d/e/2PACX-1vS6QgFSVey6INsa6vQLSPNGOyg18sck918nszO-cY7WE5llesrZtKhMIeI2xXY462S5-0EeW1H9MmNF/pub?gid=1509704122&single=true&output=tsv [following]
--2022-09-08 11:15:18--  https://docs.google.com/spreadsheets/d/e/2PACX-1vS6QgFSVey6INsa6vQLSPNGOyg18sck918nszO-cY7WE5llesrZtKhMIeI2xXY462S5-0EeW1H9MmNF/pub?gid=1509704122&single=true&output=tsv
Resolving docs.google.com (docs.google.com)... 142.251.32.238
Connecting to docs.google.com (docs.google.com)|142.251.32.238|:443... connected.
HTTP request sent, awaiting response... 307 Temporary Redirect
Location: https://doc-0k-6o-sheets.googleusercontent.com/pub/3sm34aofsvmt5ehut1q6bddr3o/r26h1rmdvahqmfrkv0u5dgne4g/16626537

In [None]:
# Peek at the metadata
! qiime tools inspect-metadata metadata.tsv

              COLUMN NAME  TYPE
                  barcode  categorical
                 mouse_id  categorical
                 genotype  categorical
                  cage_id  categorical
                    donor  categorical
             donor_status  categorical
     days_post_transplant  numeric
genotype_and_donor_status  categorical
                     IDS:  48
                 COLUMNS:  8


**Make Metadata Visualization**

In [None]:
! qiime metadata tabulate \
  --m-input-file metadata.tsv \
  --o-visualization metadata.qzv

Saved Visualization to: metadata.qzv

In [None]:
! qiime tools peek metadata.qzv

UUID:        8dfd7149-c2aa-44ec-a6e0-a2df56d055bb
Type:        Visualization


In [None]:
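# Note: pip reads `update` as a package name, so this command also installs an
# unrelated package called `update` (see the output). To upgrade the plugin,
# the usual form is `pip install --upgrade q2_itsxpress`.
pip install update q2_itsxpress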

Collecting update
  Downloading update-0.0.1-py2.py3-none-any.whl (2.9 kB)
Collecting q2_itsxpress
  Downloading q2_itsxpress-1.8.0-py3-none-any.whl (13 kB)
Collecting style==1.1.0
  Downloading style-1.1.0-py2.py3-none-any.whl (6.4 kB)
Collecting itsxpress>=1.8.0
  Downloading itsxpress-1.8.0-py3-none-any.whl (1.6 MB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 1.6/1.6 MB 10.2 MB/s eta 0:00:00
Collecting biopython>=1.60
  Downloading biopython-1.79-cp38-cp38-macosx_10_9_x86_64.whl (2.3 MB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 2.3/2.3 MB 14.1 MB/s eta 0:00:00
Installing collected packages: style, update, biopython, itsxpress, q2_itsxpress
Successfully installed biopython-1.79 itsxpress-1.8.0 q2_itsxpress-1.8.0 style-1.1.0 update-0.0.1
Note: you may need to restart the kernel to use updated packages.


In [None]:
conda update itsxpress


PackageNotInstalledError: Package is not installed in prefix.
  prefix: /Users/raymondotoo/opt/anaconda3/envs/qiime2-2022.2
  package name: itsxpress



Note: you may need to restart the kernel to use updated packages.


In [None]:
# Visualize via API
q2.Visualization.load('metadata.qzv')
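
To cross-check the column types that `qiime tools inspect-metadata` reported earlier in this section, the TSV can also be summarized with the standard library. The sketch below is a simplified approximation and not QIIME 2's own parser: it treats a column as numeric only when every non-empty value parses as a float, and it skips comment or directive rows. `summarize_metadata` is a hypothetical helper name, and the toy rows are invented for illustration.

```python
import csv
import io


def _is_number(value):
    """True if `value` parses as a float (simplified numeric check)."""
    try:
        float(value)
        return True
    except ValueError:
        return False


def summarize_metadata(tsv_text):
    """Return (n_ids, {column: 'numeric' | 'categorical'}) for a metadata TSV string."""
    rows = list(csv.reader(io.StringIO(tsv_text), delimiter="\t"))
    header, body = rows[0], rows[1:]
    # Skip empty rows and comment/directive rows such as '#q2:types' (simplification).
    body = [r for r in body if r and not r[0].startswith("#")]
    types = {}
    for i, name in enumerate(header[1:], start=1):
        values = [r[i] for r in body if i < len(r) and r[i] != ""]
        numeric = bool(values) and all(_is_number(v) for v in values)
        types[name] = "numeric" if numeric else "categorical"
    return len(body), types


# Toy input (invented rows, not the real metadata file).
toy = (
    "sample-id\tgenotype\tdays_post_transplant\n"
    "s1\tgroup_a\t7\n"
    "s2\tgroup_b\t14\n"
)
n_ids, column_types = summarize_metadata(toy)
```

For the real file, pass the text of `metadata.tsv`, for example `summarize_metadata(open('metadata.tsv').read())`.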

## Import data into QIIME 2

We will import:
 - [Manifest File](https://docs.qiime2.org/2021.11/tutorials/importing/#fastq-manifest-formats)
 - Demultiplexed Sequences (as opposed to multiplexed sequences)
 
See the [Importing Data Tutorial](https://docs.qiime2.org/2021.11/tutorials/importing/#importing-data) for more information.

In [None]:
# get manifest file
!wget \
  -O "manifest.tsv" \
  "https://data.qiime2.org/2021.11/tutorials/pd-mice/manifest"

--2022-09-08 11:18:02--  https://data.qiime2.org/2021.11/tutorials/pd-mice/manifest
Resolving data.qiime2.org (data.qiime2.org)... 54.200.1.12
Connecting to data.qiime2.org (data.qiime2.org)|54.200.1.12|:443... connected.
HTTP request sent, awaiting response... 302 FOUND
Location: https://s3-us-west-2.amazonaws.com/qiime2-data/2021.11/tutorials/pd-mice/manifest [following]
--2022-09-08 11:18:03--  https://s3-us-west-2.amazonaws.com/qiime2-data/2021.11/tutorials/pd-mice/manifest
Resolving s3-us-west-2.amazonaws.com (s3-us-west-2.amazonaws.com)... 52.218.245.48
Connecting to s3-us-west-2.amazonaws.com (s3-us-west-2.amazonaws.com)|52.218.245.48|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 4640 (4.5K) [binary/octet-stream]
Saving to: 'manifest.tsv'


2022-09-08 11:18:03 (17.2 MB/s) - 'manifest.tsv' saved [4640/4640]



In [None]:
# get demultiplexed sequences
!wget \
  -O "demultiplexed_seqs.zip" \
  "https://data.qiime2.org/2021.11/tutorials/pd-mice/demultiplexed_seqs.zip"

--2022-09-08 11:18:12--  https://data.qiime2.org/2021.11/tutorials/pd-mice/demultiplexed_seqs.zip
Resolving data.qiime2.org (data.qiime2.org)... 54.200.1.12
Connecting to data.qiime2.org (data.qiime2.org)|54.200.1.12|:443... connected.
HTTP request sent, awaiting response... 302 FOUND
Location: https://s3-us-west-2.amazonaws.com/qiime2-data/2021.11/tutorials/pd-mice/demultiplexed_seqs.zip [following]
--2022-09-08 11:18:13--  https://s3-us-west-2.amazonaws.com/qiime2-data/2021.11/tutorials/pd-mice/demultiplexed_seqs.zip
Resolving s3-us-west-2.amazonaws.com (s3-us-west-2.amazonaws.com)... 52.218.245.48
Connecting to s3-us-west-2.amazonaws.com (s3-us-west-2.amazonaws.com)|52.218.245.48|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 21508775 (21M) [application/zip]
Saving to: 'demultiplexed_seqs.zip'


2022-09-08 11:18:15 (12.0 MB/s) - 'demultiplexed_seqs.zip' saved [21508775/21508775]



In [1]:
# unzip sequences
! unzip demultiplexed_seqs.zip

Archive:  demultiplexed_seqs.zip
   creating: demultiplexed_seqs/
  inflating: demultiplexed_seqs/10483.recip.539.ASO.PD4.D7_4_L001_R1_001.fastq.gz  
  inflating: demultiplexed_seqs/10483.recip.539.ASO.PD4.D14_5_L001_R1_001.fastq.gz  
  inflating: demultiplexed_seqs/10483.recip.413.WT.HC2.D7_12_L001_R1_001.fastq.gz  
  inflating: demultiplexed_seqs/10483.recip.220.WT.OB1.D7_30_L001_R1_001.fastq.gz  
  inflating: demultiplexed_seqs/10483.recip.458.ASO.HC3.D49_2_L001_R1_001.fastq.gz  
  inflating: demultiplexed_seqs/10483.recip.538.WT.PD4.D21_4_L001_R1_001.fastq.gz  
  inflating: demultiplexed_seqs/10483.recip.459.WT.HC3.D14_2_L001_R1_001.fastq.gz  
  inflating: demultiplexed_seqs/10483.recip.461.ASO.HC3.D7_20_L001_R1_001.fastq.gz  
  inflating: demultiplexed_seqs/10483.recip.465.ASO.PD3.D14_16_L001_R1_001.fastq.gz  
  inflating: demultiplexed_seqs/10483.recip.461.ASO.HC3.D21_11_L001_R1_001.fastq.gz  
  inflating: demultiplexed_seqs/10483.recip.540.ASO.HC4.D7_7_L001_R1_001.fastq.gz  
  i

In [2]:
! head manifest.tsv

sample-id	absolute-filepath
recip.220.WT.OB1.D7	$PWD/demultiplexed_seqs/10483.recip.220.WT.OB1.D7_30_L001_R1_001.fastq.gz
recip.290.ASO.OB2.D1	$PWD/demultiplexed_seqs/10483.recip.290.ASO.OB2.D1_27_L001_R1_001.fastq.gz
recip.389.WT.HC2.D21	$PWD/demultiplexed_seqs/10483.recip.389.WT.HC2.D21_1_L001_R1_001.fastq.gz
recip.391.ASO.PD2.D14	$PWD/demultiplexed_seqs/10483.recip.391.ASO.PD2.D14_5_L001_R1_001.fastq.gz
recip.391.ASO.PD2.D21	$PWD/demultiplexed_seqs/10483.recip.391.ASO.PD2.D21_1_L001_R1_001.fastq.gz
recip.391.ASO.PD2.D7	$PWD/demultiplexed_seqs/10483.recip.391.ASO.PD2.D7_15_L001_R1_001.fastq.gz
recip.400.ASO.HC2.D14	$PWD/demultiplexed_seqs/10483.recip.400.ASO.HC2.D14_32_L001_R1_001.fastq.gz
recip.401.ASO.HC2.D7	$PWD/demultiplexed_seqs/10483.recip.401.ASO.HC2.D7_22_L001_R1_001.fastq.gz
recip.403.ASO.PD2.D21	$PWD/demultiplexed_seqs/10483.recip.403.ASO.PD2.D21_31_L001_R1_001.fastq.gz
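
Each manifest row maps a `sample-id` to a path that starts with `$PWD`, so the import only works when the unzipped `demultiplexed_seqs` folder sits in the directory the command is run from. Before importing, it can help to confirm that every listed file exists. This is a minimal sketch under that assumption; `check_manifest` is a hypothetical helper, and it only expands `$PWD`, not other environment variables.

```python
import csv
import os
from pathlib import Path


def check_manifest(manifest_path, base_dir=None):
    """Return a list of (sample_id, path) entries whose FASTQ file is missing."""
    base_dir = str(base_dir if base_dir is not None else os.getcwd())
    missing = []
    with open(manifest_path, newline="") as handle:
        reader = csv.DictReader(handle, delimiter="\t")
        for row in reader:
            # Replace the literal '$PWD' placeholder with the chosen base directory.
            path = row["absolute-filepath"].replace("$PWD", base_dir)
            if not Path(path).is_file():
                missing.append((row["sample-id"], path))
    return missing
```

Calling `check_manifest('manifest.tsv')` from the directory that contains `demultiplexed_seqs/` returns an empty list when every file in the manifest is found.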


**Import and Summarize Data**

In [3]:
! qiime tools import \
  --type "SampleData[SequencesWithQuality]" \
  --input-format SingleEndFastqManifestPhred33V2 \
  --input-path ./manifest.tsv \
  --output-path ./demux_seqs.qza

Imported ./manifest.tsv as SingleEndFastqManifestPhred33V2 to ./demux_seqs.qza

In [4]:
! qiime demux summarize \
  --i-data ./demux_seqs.qza \
  --o-visualization ./demux_seqs.qzv

Saved Visualization to: ./demux_seqs.qzv

In [8]:
q2.Visualization.load('demux_seqs.qzv')
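
QIIME 2 artifacts (`.qza`) and visualizations (`.qzv`) are zip archives, so the UUID and type that `qiime tools peek` prints can also be read with the standard library. The sketch below assumes the usual archive layout, in which a top-level folder named after the UUID contains a `metadata.yaml` file with `uuid`, `type`, and `format` keys. `peek_archive` is a hypothetical helper, and it parses only simple `key: value` lines rather than full YAML.

```python
import zipfile


def peek_archive(path):
    """Read the key/value pairs from the top-level metadata.yaml of a .qza/.qzv file."""
    with zipfile.ZipFile(path) as archive:
        # Assumed layout: '<uuid>/metadata.yaml' sits one level below the archive root.
        name = next(
            n for n in archive.namelist()
            if n.count("/") == 1 and n.endswith("/metadata.yaml")
        )
        text = archive.read(name).decode("utf-8")
    info = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep:
            info[key.strip()] = value.strip().strip("'\"")
    return info
```

For example, `peek_archive('demux_seqs.qzv')` returns a dictionary whose `type` entry can be compared with what `qiime tools peek` reports.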

## Denoising Sequence Data

 - DADA2 approach as outlined in the tutorial.
 - Alternate trimming w/ DADA2.
 - Using deblur w/ default trimming.

### Default

In [9]:
getcwd()

'/Users/raymondotoo/Library/CloudStorage/OneDrive-UALittleRock/github_stuff/MultiOmics/Microbiome_Qiime2/Microbiome_Qiime2'

In [10]:
! qiime dada2 denoise-single \
    --i-demultiplexed-seqs ./demux_seqs.qza \
    --p-trunc-len 150 \
    --p-n-threads 8 \
    --o-table ./dada2_table.qza \
    --o-representative-sequences ./dada2_rep_set.qza \
    --o-denoising-stats ./dada2_stats.qza \
    --verbose

Running external command line application(s). This may print messages to stdout and/or stderr.
The command(s) being run are below. These commands cannot be manually re-run as they will depend on temporary files that no longer exist.

Command: run_dada_single.R /var/folders/60/239m7ccs0ksbbl7n3b9s1cqh0000gn/T/qiime2-archive-g9dcae_t/93f2d449-3fea-4f25-8ca2-4ef43c99578f/data /var/folders/60/239m7ccs0ksbbl7n3b9s1cqh0000gn/T/tmp1cyidx12/output.tsv.biom /var/folders/60/239m7ccs0ksbbl7n3b9s1cqh0000gn/T/tmp1cyidx12/track.tsv /var/folders/60/239m7ccs0ksbbl7n3b9s1cqh0000gn/T/tmp1cyidx12 150 0 2.0 2 Inf independent consensus 1.0 8 1000000 NULL 16

R version 4.1.3 (2022-03-10) 
Loading required package: Rcpp
DADA2: 1.22.0 / Rcpp: 1.0.8.3 / RcppParallel: 5.1.5
1) Filtering ................................................
2) Learning Error Rates
35926200 total bases in 239508 reads from 48 samples will be used for learning the error rates

In [11]:
# summarize denoising stats
! qiime metadata tabulate \
    --m-input-file ./dada2_stats.qza  \
    --o-visualization ./dada2_stats.qzv

Saved Visualization to: ./dada2_stats.qzv

In [12]:
q2.Visualization.load('dada2_stats.qzv')

In [13]:
# summarize ESV table
! qiime feature-table summarize \
    --i-table ./dada2_table.qza \
    --m-sample-metadata-file ./metadata.tsv \
    --o-visualization ./dada2_table.qzv

Saved Visualization to: ./dada2_table.qzv

In [14]:
q2.Visualization.load('dada2_table.qzv')

In [15]:
! qiime feature-table tabulate-seqs \
    --i-data ./dada2_rep_set.qza \
    --o-visualization ./dada2_rep_set.qzv

Saved Visualization to: ./dada2_rep_set.qzv

In [16]:
q2.Visualization.load('dada2_rep_set.qzv')

### Alternate Trimming w/ DADA2

In [17]:
! qiime dada2 denoise-single \
    --i-demultiplexed-seqs ./demux_seqs.qza \
    --p-trim-left 30 \
    --p-trunc-len 130 \
    --o-table ./dada2_table_alt.qza \
    --o-representative-sequences ./dada2_rep_set_alt.qza \
    --o-denoising-stats ./dada2_stats_alt.qza \
    --verbose

Running external command line application(s). This may print messages to stdout and/or stderr.
The command(s) being run are below. These commands cannot be manually re-run as they will depend on temporary files that no longer exist.

Command: run_dada_single.R /var/folders/60/239m7ccs0ksbbl7n3b9s1cqh0000gn/T/qiime2-archive-j03z300v/93f2d449-3fea-4f25-8ca2-4ef43c99578f/data /var/folders/60/239m7ccs0ksbbl7n3b9s1cqh0000gn/T/tmpohuoccsz/output.tsv.biom /var/folders/60/239m7ccs0ksbbl7n3b9s1cqh0000gn/T/tmpohuoccsz/track.tsv /var/folders/60/239m7ccs0ksbbl7n3b9s1cqh0000gn/T/tmpohuoccsz 130 30 2.0 2 Inf independent consensus 1.0 1 1000000 NULL 16

R version 4.1.3 (2022-03-10) 
Loading required package: Rcpp
DADA2: 1.22.0 / Rcpp: 1.0.8.3 / RcppParallel: 5.1.5
1) Filtering ................................................
2) Learning Error Rates
25075800 total bases in 250758 reads from 48 samples will be used for learning the error rate

In [18]:
# summarize denoising stats
! qiime metadata tabulate \
    --m-input-file ./dada2_stats_alt.qza  \
    --o-visualization ./dada2_stats_alt.qzv

Saved Visualization to: ./dada2_stats_alt.qzv

In [19]:
q2.Visualization.load('dada2_stats_alt.qzv')

In [20]:
# summarize ESV table
! qiime feature-table summarize \
    --i-table ./dada2_table_alt.qza \
    --m-sample-metadata-file ./metadata.tsv \
    --o-visualization ./dada2_table_alt.qzv

Saved Visualization to: ./dada2_table_alt.qzv

In [21]:
q2.Visualization.load('dada2_table_alt.qzv')
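
The base counts in the two DADA2 logs are consistent with the trimming parameters: each retained read contributes `trunc_len - trim_left` bases. The sketch below reproduces that arithmetic from the read counts in the logs. `retained_length` is a hypothetical helper, and it assumes a positive `trunc_len` (a value of 0, which disables truncation, is not handled).

```python
def retained_length(trunc_len, trim_left=0):
    """Bases kept per read after truncating at `trunc_len` and trimming `trim_left` from the 5' end."""
    if trim_left >= trunc_len:
        raise ValueError("trim_left must be smaller than trunc_len")
    return trunc_len - trim_left


# Read counts are taken from the two DADA2 logs in this notebook.
default_bases = 239508 * retained_length(150)   # --p-trunc-len 150
alt_bases = 250758 * retained_length(130, 30)   # --p-trim-left 30 --p-trunc-len 130

print(default_bases)  # 35926200, as in the default run's log
print(alt_bases)      # 25075800, as in the alternate run's log
```

The alternate settings keep 100 bases per read instead of 150, which is worth bearing in mind when comparing the two feature tables.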

### Deblur w/ Default

In [22]:
! qiime quality-filter q-score \
    --i-demux ./demux_seqs.qza \
    --o-filtered-sequences demux-seqs-deblur.qza \
    --o-filter-stats demux-deblur-stats.qza

Saved SampleData[SequencesWithQuality] to: demux-seqs-deblur.qza
Saved QualityFilterStats to: demux-deblur-stats.qza

In [23]:
# Defaults to Greengenes.
#    If you want to use SILVA or another reference database, use:
#    `qiime deblur denoise-other`
#    SILVA files are listed here: https://docs.qiime2.org/2021.11/data-resources/
! qiime deblur denoise-16S \
    --i-demultiplexed-seqs demux-seqs-deblur.qza \
    --p-trim-length 150 \
    --o-representative-sequences rep-seqs-deblur.qza \
    --o-table table-deblur.qza \
    --p-sample-stats \
    --o-stats deblur-stats.qza

Saved FeatureTable[Frequency] to: table-deblur.qza
Saved FeatureData[Sequence] to: rep-seqs-deblur.qza
Saved DeblurStats to: deblur-stats.qza

In [24]:
! qiime metadata tabulate \
    --m-input-file demux-deblur-stats.qza \
    --o-visualization demux-deblur-stats.qzv

! qiime deblur visualize-stats \
    --i-deblur-stats deblur-stats.qza \
    --o-visualization deblur-stats.qzv

Saved Visualization to: demux-deblur-stats.qzv
Saved Visualization to: deblur-stats.qzv

In [25]:
q2.Visualization.load('demux-deblur-stats.qzv')

In [26]:
q2.Visualization.load('deblur-stats.qzv')

In [27]:
! qiime feature-table summarize \
    --i-table table-deblur.qza \
    --o-visualization table-deblur.qzv \
    --m-sample-metadata-file metadata.tsv

! qiime feature-table tabulate-seqs \
    --i-data rep-seqs-deblur.qza \
    --o-visualization rep-seqs-deblur.qzv

Saved Visualization to: table-deblur.qzv
Saved Visualization to: rep-seqs-deblur.qzv

In [28]:
q2.Visualization.load('table-deblur.qzv')

In [29]:
q2.Visualization.load('rep-seqs-deblur.qzv')