## Overview

The 16S sequences were provided to me by Mr. DNA via a Dropbox download link. They are **demultiplexed** (aka **demuxed**) sequences that still contain the forward and reverse primers.

-   The Raw Data is **demultiplexed**

-   An R1 and an R2 fastq.gz file have been generated for each individual sample

-   All forward reads are binned into the R1 fastq.gz files

-   All reverse reads are binned into the R2 fastq.gz files

-   Other than demultiplexing, you can consider the Raw Data on BaseSpace as untouched (**the forward and reverse primer sequences have not been removed**)
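
Since every sample should come with both an R1 and an R2 file, a quick sanity check is to confirm that no file is missing its mate. The sketch below is only an illustration: `unpaired_samples` is a hypothetical helper, the file names in the example are made up, and it assumes the read direction appears in the name as `_R1_` or `_R2_`.

```python
from collections import defaultdict


def unpaired_samples(filenames):
    """Return file-name prefixes that lack either an R1 or an R2 file."""
    seen = defaultdict(set)
    for name in filenames:
        if not name.endswith(".fastq.gz"):
            continue
        for direction in ("R1", "R2"):
            tag = f"_{direction}_"
            if tag in name:
                # Everything before the last _R1_/_R2_ identifies the sample and lane.
                prefix = name.rsplit(tag, 1)[0]
                seen[prefix].add(direction)
    return sorted(p for p, dirs in seen.items() if dirs != {"R1", "R2"})


# Made-up file names for illustration; in practice pass os.listdir(<demux folder>).
example_names = [
    "A_S1_L001_R1_001.fastq.gz",
    "A_S1_L001_R2_001.fastq.gz",
    "B_S2_L001_R1_001.fastq.gz",
]
missing_mates = unpaired_samples(example_names)
print(missing_mates)
```

An empty list means every R1 file has a matching R2 file.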

Here I follow the QIIME2 [Casava 1.8 paired-end demultiplexed fastq](https://docs.qiime2.org/2023.5/tutorials/importing/#:~:text=Casava%201.8%20paired%2Dend%20demultiplexed%20fastq) tutorial example on importing data, using the files provided to me by Mr. DNA, Molecular Research via Dropbox.

## Data download

I received an email from Mr. DNA with a Dropbox link to the data files, from which I downloaded two .zip folders: one with the raw data files and the other with analysis pipeline files that Mr. DNA generated.

Here I am working with the raw data files located in `coral-pae-temp/analysis/microbiome/rawdata/demux`.

The `demux` folder contains an R1 and an R2 `fastq.gz` file for each sample.

Each file name includes the sample identifier and should look like `4.Ea_S1_L001_R1_001.fastq.gz`.
The underscore-separated fields in this file name are:

1.  the sample identifier,

2.  the barcode sequence or a barcode identifier,

3.  the lane number,

4.  the direction of the read (i.e. R1 or R2, because these are paired-end reads), and

5.  the set number.
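
As a minimal sketch of how these five fields can be pulled apart in Python, the file name can be split from the right so that a sample identifier containing dots, such as `4.Ea`, stays intact. The names `CasavaName` and `parse_casava_name` are hypothetical and used only for illustration; they are not part of QIIME 2.

```python
from typing import NamedTuple


class CasavaName(NamedTuple):
    sample_id: str
    barcode_id: str
    lane: str
    direction: str
    set_number: str


def parse_casava_name(filename: str) -> CasavaName:
    """Split a Casava 1.8 style file name into its five underscore-separated fields."""
    suffix = ".fastq.gz"
    if not filename.endswith(suffix):
        raise ValueError(f"not a fastq.gz file: {filename}")
    stem = filename[: -len(suffix)]
    # Split from the right: the last four fields are fixed, the rest is the sample ID.
    parts = stem.rsplit("_", 4)
    if len(parts) != 5:
        raise ValueError(f"expected 5 underscore-separated fields: {filename}")
    return CasavaName(*parts)


fields = parse_casava_name("4.Ea_S1_L001_R1_001.fastq.gz")
print(fields)
```

Splitting from the right is a design choice of this sketch: it keeps the four trailing fields fixed and treats whatever remains as the sample identifier.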


Make an output directory

In [8]:
!mkdir -p ../output

Tabulate the metadata into a visualization that you can open in QIIME 2 View

In [9]:
!qiime metadata tabulate \
  --m-input-file ../rawdata/sample-metadata.tsv \
  --o-visualization ../output/metadata.qzv

Saved Visualization to: ../output/metadata.qzv

Drag `metadata.qzv` into [view.qiime2.org](https://view.qiime2.org/) to view it.

Import the sequences into QIIME 2

In [10]:
!qiime tools import \
  --type 'SampleData[PairedEndSequencesWithQuality]' \
  --input-path ../rawdata/demux \
  --input-format CasavaOneEightSingleLanePerSampleDirFmt \
  --output-path ../output/demux-paired-end.qza

Imported ../rawdata/demux as CasavaOneEightSingleLanePerSampleDirFmt to ../output/demux-paired-end.qza

The `demux-paired-end.qza` artifact contains raw, demultiplexed sequences that still have the forward and reverse primers.

## Trim primers from paired-end sequences using `cutadapt`

> "The PCR primers (F515/R806) were developed against the V4 region of the 16S rRNA, which we determined would yield optimal community clustering with reads of this length using a procedure similar to that of ref. 15. [For reference, this primer pair amplifies the region 533–786 in the Escherichia coli strain 83972 sequence (greengenes accession no. prokMSA_id:470367).] The reverse PCR primer is barcoded with a 12-base error-correcting Golay code to facilitate multiplexing of up to ≈1,500 samples per lane, and both PCR primers contain sequencer adapter regions." - (Caporaso et al. 2011)

Caporaso, J. G., Lauber, C. L., Walters, W. A., Berg-Lyons, D., Lozupone, C. A., Turnbaugh, P. J., Fierer, N., & Knight, R. (2011). Global patterns of 16S rRNA diversity at a depth of millions of sequences per sample. Proceedings of the National Academy of Sciences, 108(supplement_1), 4516–4522. https://doi.org/10.1073/pnas.1000080107

> "The V4 variable region of the 16S rRNA gene was amplified using the 515F (5′-GTGCCAGCMGCCGCGGTAA-3′) and 806R (5′-GGACTACHVGGGTWTCTAAT-3′) primer set (Caporaso et al. 2011)." - (Brown et al. 2021)

Brown, Tanya, Dylan Sonett, Jesse R. Zaneveld, and Jacqueline L. Padilla-Gamiño. 2021. “Characterization of the Microbiome and Immune Response in Corals with Chronic Montipora White Syndrome.” Molecular Ecology 30 (11): 2591–2606. https://doi.org/10.1111/mec.15899.
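
Both primers contain IUPAC degenerate base codes (M, H, V, W), so each primer string stands for several concrete sequences. As a rough illustration, and not a description of how `cutadapt` matches primers internally, the sketch below expands the codes into a regular expression and counts how many concrete sequences each primer represents. The helper names `primer_to_regex` and `count_variants` are hypothetical, and the example read is made up.

```python
import math
import re

# Standard IUPAC nucleotide codes and the bases each one stands for.
IUPAC_BASES = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT",
    "K": "GT", "M": "AC", "B": "CGT", "D": "AGT",
    "H": "ACT", "V": "ACG", "N": "ACGT",
}


def primer_to_regex(primer):
    """Compile a regex that matches any concrete sequence the primer encodes."""
    return re.compile("".join(f"[{IUPAC_BASES[b]}]" for b in primer.upper()))


def count_variants(primer):
    """Number of concrete sequences represented by a degenerate primer."""
    return math.prod(len(IUPAC_BASES[b]) for b in primer.upper())


FWD = "GTGCCAGCMGCCGCGGTAA"   # 515F, as quoted above
REV = "GGACTACHVGGGTWTCTAAT"  # 806R, as quoted above

fwd_re = primer_to_regex(FWD)

# Made-up read that begins with one concrete version of 515F (M read as A).
example_read = "GTGCCAGCAGCCGCGGTAATACGGAGGG"
starts_with_primer = fwd_re.match(example_read) is not None

print(count_variants(FWD), count_variants(REV), starts_with_primer)
```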

In [13]:
!qiime cutadapt trim-paired \
  --i-demultiplexed-sequences ../output/demux-paired-end.qza \
  --p-cores 4 \
  --p-front-f GTGCCAGCMGCCGCGGTAA \
  --p-front-r GGACTACHVGGGTWTCTAAT \
  --o-trimmed-sequences ../output/demux-trimmed.qza

Saved SampleData[PairedEndSequencesWithQuality] to: ../output/demux-trimmed.qza

## Visualize trimmed & demultiplexed sequences

In [14]:
!qiime demux summarize \
  --i-data ../output/demux-trimmed.qza \
  --o-visualization ../output/demux-trimmed-summary.qzv

Saved Visualization to: ../output/demux-trimmed-summary.qzv

In [1]:
conda activate qiime2-2023.5

usage: conda [-h] [--no-plugins] [-V] COMMAND ...
conda: error: argument COMMAND: invalid choice: 'activate' (choose from 'clean', 'compare', 'config', 'create', 'info', 'init', 'install', 'list', 'notices', 'package', 'remove', 'uninstall', 'rename', 'run', 'search', 'update', 'upgrade', 'doctor', 'env', 'content-trust')

Note: you may need to restart the kernel to use updated packages.


In [2]:
from qiime2 import Visualization
Visualization.load('../output/demux-trimmed-summary.qzv')

## Denoise

In [15]:
!qiime dada2 denoise-paired \
  --i-demultiplexed-seqs ../output/demux-trimmed.qza \
  --p-trim-left-f 0 \
  --p-trim-left-r 0 \
  --p-trunc-len-f 298 \
  --p-trunc-len-r 298 \
  --o-table ../output/table.qza \
  --o-representative-sequences ../output/rep-seqs.qza \
  --o-denoising-stats ../output/denoising-stats.qza

[31m[1mPlugin error from dada2:

  No reads passed the filter. trunc_len_f (298) or trunc_len_r (298) may be individually longer than read lengths, or trunc_len_f + trunc_len_r may be shorter than the length of the amplicon + 12 nucleotides (the length of the overlap). Alternatively, other arguments (such as max_ee or trunc_q) may be preventing reads from passing the filter.

Debug info has been saved to /tmp/qiime2-q2cli-err-s30_1g9r.log[0m
[0m
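
The error message lists a few possible causes. One way to examine the first two (truncation lengths longer than the reads, or truncated reads too short to overlap) is to tally read lengths directly from a `fastq.gz` file and compare them with the chosen values. The sketch below uses only the standard library and assumes simple four-line FASTQ records. `read_length_counts` and `trunc_len_problems` are hypothetical helpers, the demo file is written on the fly, and the numbers passed in the last call are placeholders rather than measurements from this dataset.

```python
import gzip
import os
import tempfile
from collections import Counter


def read_length_counts(path, max_reads=10000):
    """Tally sequence lengths for up to max_reads records in a gzipped FASTQ file."""
    counts = Counter()
    with gzip.open(path, "rt") as handle:
        for i, line in enumerate(handle):
            if i // 4 >= max_reads:
                break
            if i % 4 == 1:  # the second line of each record is the sequence
                counts[len(line.rstrip("\n"))] += 1
    return counts


def trunc_len_problems(trunc_f, trunc_r, max_len_f, max_len_r,
                       amplicon_len, min_overlap=12):
    """List which conditions named in the DADA2 error message are violated."""
    problems = []
    if trunc_f > max_len_f:
        problems.append("trunc_len_f is longer than every forward read")
    if trunc_r > max_len_r:
        problems.append("trunc_len_r is longer than every reverse read")
    if trunc_f + trunc_r < amplicon_len + min_overlap:
        problems.append("trunc_len_f + trunc_len_r is too short for the overlap")
    return problems


# Tiny demo file so the sketch runs on its own; point this at a real file instead.
demo_path = os.path.join(tempfile.mkdtemp(), "demo_R1.fastq.gz")
with gzip.open(demo_path, "wt") as out:
    out.write("@read1\nACGTACGT\n+\nIIIIIIII\n@read2\nACGT\n+\nIIII\n")

demo_counts = read_length_counts(demo_path)
print(sorted(demo_counts.items()))

# Placeholder numbers only: substitute the observed maximum read lengths
# and the expected amplicon length for the primer pair.
demo_problems = trunc_len_problems(298, 298, max_len_f=250, max_len_r=250,
                                   amplicon_len=254)
print(demo_problems)
```

Keep in mind that the files in `../rawdata/demux` still contain the primers, so reads in `demux-trimmed.qza` will generally be shorter than the raw lengths; the interactive quality plot in `demux-trimmed-summary.qzv` reports lengths after trimming.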