---
title: "Microbiome Sequence Analysis"
subtitle: "Importing 16S Sequence Data"
author: "Sarah Tanja"
format: html
editor: visual
toc: true
toc-title: Contents <i class="bi bi-bookmark-heart"></i>
toc-depth: 5
toc-location: left
reference-location: margin
citation-location: margin
bibliography: ./references/qiime2.bib
---

## Overview

The 16S sequences were provided to me by Mr. DNA via a Dropbox download link. They are **demultiplexed** (aka **demuxed**) sequences that still contain the forward and reverse primers.

-   The Raw Data is **demultiplexed**

-   An R1 and an R2 fastq.gz file have been generated for each individual sample

-   All forward reads are binned into the R1 fastq.gz files

-   All reverse reads are binned into the R2 fastq.gz files

-   Other than demultiplexing, you can consider the Raw Data on BaseSpace as untouched (**the forward and reverse primer sequences have not been removed**)

Here I follow the QIIME2 [Casava 1.8 paired-end demultiplexed fastq](https://docs.qiime2.org/2023.5/tutorials/importing/#:~:text=Casava%201.8%20paired%2Dend%20demultiplexed%20fastq) tutorial example on importing data, using the files provided to me by Mr. DNA, Molecular Research via Dropbox.

## Data download

I got an email from Mr. DNA with a Dropbox link to the data files, from which I downloaded two .zip archives: one with the raw data files and one with the analysis pipeline files that Mr. DNA generated.

Here I am working with the raw data files located in `coral-pae-temp/analysis/microbiome/rawdata/demux`.

The `demux` folder contains a pair of `fastq.gz` files (R1 and R2) for each sample.
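Before importing, it can help to confirm that every sample really has both an R1 and an R2 file. The following is a minimal sketch, not part of the QIIME 2 tooling: `find_unpaired` is a hypothetical helper, and it assumes file names end in `_R1_<set>.fastq.gz` or `_R2_<set>.fastq.gz`. Adjust `demux_dir` to your own working directory.

```python
import re
from collections import defaultdict
from pathlib import Path

# Assumed naming: <prefix>_R1_<set>.fastq.gz or <prefix>_R2_<set>.fastq.gz
SUFFIX_RE = re.compile(r"^(?P<prefix>.+)_(?P<direction>R[12])_\d+\.fastq\.gz$")


def find_unpaired(filenames):
    """Return {prefix: missing directions} for samples lacking an R1 or R2 file."""
    seen = defaultdict(set)
    for name in filenames:
        match = SUFFIX_RE.match(name)
        if match:  # silently skip files that do not follow the naming scheme
            seen[match["prefix"]].add(match["direction"])
    both = {"R1", "R2"}
    return {prefix: both - found for prefix, found in seen.items() if found != both}


# Path taken from this notebook; the list is empty if the folder is not present.
demux_dir = Path("coral-pae-temp/analysis/microbiome/rawdata/demux")
names = [p.name for p in demux_dir.glob("*.fastq.gz")] if demux_dir.is_dir() else []
print(find_unpaired(names))
```

An empty dictionary means every file prefix that was found has both read directions.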

The file name includes the sample identifier and should look like `4.Ea_S1_L001_R1_001.fastq.gz`. 
The underscore-separated fields in this file name are:

1.  the sample identifier,

2.  the barcode sequence or a barcode identifier,

3.  the lane number,

4.  the direction of the read (i.e. R1 or R2, because these are paired-end reads), and

5.  the set number.
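The five fields listed above can be pulled out of a file name with a regular expression. This is a small sketch under the assumption that all files follow the pattern of the example name; `parse_casava_name` and the field names in the pattern are my own labels, not QIIME 2 identifiers.

```python
import re

# Fields, in order: sample id, barcode id, lane, read direction, set number.
CASAVA_RE = re.compile(
    r"^(?P<sample_id>.+)_(?P<barcode_id>[^_]+)_(?P<lane>L\d+)"
    r"_(?P<direction>R[12])_(?P<set_number>\d+)\.fastq\.gz$"
)


def parse_casava_name(filename):
    """Split a Casava 1.8 style file name into its underscore-separated fields."""
    match = CASAVA_RE.match(filename)
    if match is None:
        raise ValueError(f"not a Casava 1.8 style file name: {filename!r}")
    return match.groupdict()


fields = parse_casava_name("4.Ea_S1_L001_R1_001.fastq.gz")
print(fields["sample_id"], fields["barcode_id"], fields["lane"],
      fields["direction"], fields["set_number"])
```

Anchoring the pattern at the end of the name means a sample identifier may itself contain underscores without confusing the other four fields.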


See also: [Using Python to run QIIME2](https://docs.qiime2.org/2023.5/interfaces/artifact-api/)

    !qiime metadata tabulate \
      --m-input-file ../rawdata/sample-metadata.tsv \
      --o-visualization ../output/metadata.qzv

Output:

    Saved Visualization to: ../output/metadata.qzv

Drag `metadata.qzv` into [view.qiime2.org](https://view.qiime2.org/) to view it.

## How do you run qiime2 in Jupyter?

Check out [this forum post](https://forum.qiime2.org/t/how-to-run-qiime2-in-jupiter-notebook/24705).

Option 1. Activate the qiime2 environment and install Jupyter Notebook there.

Option 2. If Jupyter Notebook should be launched from outside the qiime2 environment, install `nb_conda_kernels` in "base". Note that the command below installs into an environment named `notebook_env`; substitute the name of the environment that launches the notebook:

    conda install -n notebook_env nb_conda_kernels

and then, inside the qiime2 environment:

    conda install -c anaconda ipykernel

In this case you will be able to choose the qiime2 kernel in the notebook launched from "base".

Once the notebook is running, qiime2 commands can be executed by putting a "!" at the beginning of the first line of the command, or by using qiime2

See also [this forum post](https://forum.qiime2.org/t/activating-jupyterlab-in-qiime2/9697) on activating JupyterLab in qiime2.
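As an alternative to the "!" prefix, a notebook cell can call the QIIME 2 command line through Python's standard `subprocess` module. The sketch below rebuilds the `qiime metadata tabulate` call used earlier in this notebook. `build_tabulate_cmd` and `run_qiime` are hypothetical helper names, and the actual run is left commented out because it assumes `qiime` is on the PATH of the active kernel.

```python
import shutil
import subprocess


def build_tabulate_cmd(metadata_tsv, output_qzv):
    """Return the argument list for `qiime metadata tabulate` (flags as used in this notebook)."""
    return [
        "qiime", "metadata", "tabulate",
        "--m-input-file", str(metadata_tsv),
        "--o-visualization", str(output_qzv),
    ]


def run_qiime(cmd):
    """Run a QIIME 2 CLI command and return its stdout; raise if it is missing or fails."""
    if shutil.which(cmd[0]) is None:
        raise FileNotFoundError(f"{cmd[0]!r} not found on PATH; is the qiime2 environment active?")
    result = subprocess.run(cmd, check=True, capture_output=True, text=True)
    return result.stdout


cmd = build_tabulate_cmd("../rawdata/sample-metadata.tsv", "../output/metadata.qzv")
print(" ".join(cmd))
# print(run_qiime(cmd))  # uncomment in a kernel where the qiime executable is available
```

Passing the command as a list avoids shell quoting problems with paths that contain spaces, and `check=True` turns a non-zero exit status into a Python exception instead of a silently ignored failure.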



## Import sequences into QIIME 2

    !qiime tools import \
      --type 'SampleData[PairedEndSequencesWithQuality]' \
      --input-path ../rawdata/demux \
      --input-format CasavaOneEightSingleLanePerSampleDirFmt \
      --output-path ../output/demux-paired-end.qza

Output:

    Imported ../rawdata/demux as CasavaOneEightSingleLanePerSampleDirFmt to ../output/demux-paired-end.qza

Check out the `demux-paired-end.qza` in
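A `.qza` file is a zip archive, so it can also be inspected without any QIIME 2 tooling. The sketch below reads the top-level `metadata.yaml`, which records the artifact's UUID, semantic type and format. `read_artifact_metadata` is a hypothetical helper, and it assumes the usual `<uuid>/metadata.yaml` layout inside the archive.

```python
import zipfile
from pathlib import Path


def read_artifact_metadata(qza_path):
    """Return the text of the top-level metadata.yaml inside a .qza or .qzv archive."""
    with zipfile.ZipFile(qza_path) as archive:
        # Top-level entry is <uuid>/metadata.yaml; deeper ones belong to provenance.
        candidates = [
            name for name in archive.namelist()
            if name.count("/") == 1 and name.endswith("/metadata.yaml")
        ]
        if not candidates:
            raise ValueError(f"no top-level metadata.yaml found in {qza_path}")
        return archive.read(candidates[0]).decode("utf-8")


qza = Path("../output/demux-paired-end.qza")  # output path used in the import step
if qza.exists():
    print(read_artifact_metadata(qza))
```

For the import above, the reported type should match the `--type` passed on the command line.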

    !qiime --help

Output:

    Usage: qiime [OPTIONS] COMMAND [ARGS]...

      QIIME 2 command-line interface (q2cli)
      --------------------------------------

      To get help with QIIME 2, visit https://qiime2.org.

      To enable tab completion in Bash, run the following command or add it to
      your .bashrc/.bash_profile:

          source tab-qiime

      To enable tab completion in ZSH, run the following commands or add them to
      your .zshrc:

          autoload -Uz compinit && compinit
          autoload bashcompinit && bashcompinit
          source tab-qiime

    Options:
      --version   Show the version and exit.
      --help      Show this message and exit.

    Commands:
      info                Display information about current deployment.
      tools               Tools for working with QIIME 2 files.
      dev                 Utilities for developers and advanced users.
      alignment           Plugin for generating and manipulating alignments.
      comp

## Demultiplexed

    !qiime demux emp-paired \
      --m-barcodes-file ../rawdata/sample-metadata.tsv \
      --m-barcodes-column BarcodeSequence \
      --p-no-rev-comp-mapping-barcodes \
      --i-seqs emp-paired-end-sequences.qza \
      --o-per-sample-sequences demux-full.qza \
      --o-error-correction-details demux-details.qza \
      --p-no-golay-error-correction

Output:

    Usage: qiime demux emp-paired [OPTIONS]

      Demultiplex paired-end sequence data (i.e., map barcode reads to sample ids)
      for data generated with the Earth Microbiome Project (EMP) amplicon
      sequencing protocol. Details about this protocol can be found at
      http://www.earthmicrobiome.org/protocols-and-standards/

    Inputs:
      --i-seqs ARTIFACT EMPPairedEndSequences
                           The paired-end sequences to be demultiplexed.
                                                                        [required]
    Parameters:
      --m-barcodes-file METADATA
      --m-barcodes-column COLUMN  MetadataColumn[Categorical]
                           The sample metadata column containing the per-sample
                           barcodes.                                    [required]
      --p-golay-error-correction / --p-no-golay-error-correction
                           Perform 12nt Golay error corre

    !qiime tools import \
      --type 'SampleData[PairedEndSequencesWithQuality]' \
      --input-path casava-18-paired-end-demultiplexed \
      --input-format CasavaOneEightSingleLanePerSampleDirFmt \
      --output-path demux-paired-end.qza

Output:

    SyntaxError: invalid syntax (1215720388.py, line 1)