---
title: "Microbiome Sequence Analysis"
subtitle: "Importing 16S Sequence Data"
author: "Sarah Tanja"
format: html
editor: visual
toc: true
toc-title: Contents <i class="bi bi-bookmark-heart"></i>
toc-depth: 5
toc-location: left
reference-location: margin
citation-location: margin
bibliography: ./references/qiime2.bib
---

## Overview

The 16S sequences were provided to me by Mr. DNA via a Dropbox download link. They are **demultiplexed** (aka **demuxed**) sequences that still contain the forward and reverse primers.

-   The Raw Data is **demultiplexed**

-   An R1 and an R2 fastq.gz file has been generated for each individual sample

-   All forward reads are binned into the R1 fastq.gz files

-   All reverse reads are binned into the R2 fastq.gz files

-   Other than demultiplexing, you can consider the Raw Data on BaseSpace as untouched (**the forward and reverse primer sequences have not been removed**)
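
As a quick sanity check on the layout described above, here is a minimal Python sketch that groups the `fastq.gz` files in a folder by sample and reports any sample that is missing its R1 or R2 file. The function name `find_unpaired_samples` is hypothetical, and the sketch assumes that file names contain `_R1_` or `_R2_`, as in the Casava-style names used in this project.

```python
from collections import defaultdict
from pathlib import Path


def find_unpaired_samples(demux_dir):
    """Return {sample_prefix: [missing reads]} for samples lacking R1 or R2.

    Assumes file names contain '_R1_' or '_R2_' (an assumption based on the
    Casava-style names used here). The text before that tag is treated as
    the sample prefix.
    """
    reads_by_sample = defaultdict(set)
    for path in sorted(Path(demux_dir).glob("*.fastq.gz")):
        for tag in ("_R1_", "_R2_"):
            if tag in path.name:
                prefix = path.name.split(tag)[0]
                reads_by_sample[prefix].add(tag.strip("_"))
    # Keep only the samples where one of the two read directions is absent
    return {
        prefix: sorted({"R1", "R2"} - reads)
        for prefix, reads in reads_by_sample.items()
        if reads != {"R1", "R2"}
    }
```

An empty dictionary means every sample has both read files, for example when calling `find_unpaired_samples("rawdata/demux")` from the `microbiome` folder.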

Here I follow the QIIME2 [Casava 1.8 paired-end demultiplexed fastq](https://docs.qiime2.org/2023.5/tutorials/importing/#:~:text=Casava%201.8%20paired%2Dend%20demultiplexed%20fastq) tutorial example on importing data, using the files provided to me by Mr. DNA, Molecular Research via Dropbox.

## Data download

I got an email from Mr. DNA with a Dropbox link to the data files, from which I downloaded two .zip archives: one with the raw data files and the other with the analysis pipeline files that Mr. DNA generated.

Here I am working with the raw data files located in `coral-pae-temp/analysis/microbiome/rawdata/demux`

In the `demux` folder there are two `fastq.gz` files for each sample: one for the forward reads (R1) and one for the reverse reads (R2).

The file name includes the sample identifier and should look like `4.Ea_S1_L001_R1_001.fastq.gz`. 
The underscore-separated fields in this file name are:

1.  the sample identifier,

2.  the barcode sequence or a barcode identifier,

3.  the lane number,

4.  the direction of the read (i.e. R1 or R2, because these are paired-end reads), and

5.  the set number.
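
The five fields listed above can be pulled out of a file name with a regular expression. This is a minimal sketch: the names `CASAVA_NAME` and `parse_casava_name` are hypothetical, and the pattern assumes the lane field starts with `L` and the read field is `R1` or `R2`, as in the example file name.

```python
import re

# Pattern for names like 4.Ea_S1_L001_R1_001.fastq.gz. The sample identifier
# is matched greedily and the last four fields are anchored to the end of the
# name, so underscores inside a sample identifier would not break the split.
CASAVA_NAME = re.compile(
    r"^(?P<sample_id>.+)_(?P<barcode>[^_]+)_L(?P<lane>\d+)"
    r"_R(?P<read>[12])_(?P<set_number>\d+)\.fastq\.gz$"
)


def parse_casava_name(filename):
    """Split a Casava 1.8 style file name into its five fields."""
    match = CASAVA_NAME.match(filename)
    if match is None:
        raise ValueError(f"not a Casava-style file name: {filename}")
    return match.groupdict()


print(parse_casava_name("4.Ea_S1_L001_R1_001.fastq.gz"))
```

For the example file, the sample identifier is `4.Ea`, the barcode identifier is `S1`, the lane is `001`, the read direction is `1`, and the set number is `001`.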


## Using Python to run QIIME2

See [Using Python to run QIIME2](https://docs.qiime2.org/2023.5/interfaces/artifact-api/) in the QIIME2 documentation.

```python
from qiime2.plugins import feature_table
from qiime2 import Artifact
```

## How do you run QIIME2 in Jupyter?

Check out [this forum post](https://forum.qiime2.org/t/how-to-run-qiime2-in-jupiter-notebook/24705), which describes two options.

**Option 1.** Activate the qiime2 environment and install Jupyter Notebook there.

**Option 2.** If Jupyter Notebook is launched from outside the qiime2 environment, install `nb_conda_kernels` in the environment that Jupyter runs from (for example "base"; `notebook_env` below stands for that environment's name):

```bash
conda install -n notebook_env nb_conda_kernels
```

and then, inside the qiime2 environment:

```bash
conda install -c anaconda ipykernel
```

In this case you will be able to choose the qiime2 kernel in a notebook launched from "base".

Once the notebook is running, qiime2 commands can be executed by putting a `!` at the beginning of the first line of the command, or by using the QIIME2 Python API. Check out [this forum post](https://forum.qiime2.org/t/activating-jupyterlab-in-qiime2/9697) as well.


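Trying to activate the environment from a notebook cell fails. `conda activate` is a shell function that is only set up in an initialized interactive shell, and each `!` command runs in its own subshell, so an activation would not carry over to later cells anyway.

```python
!conda activate qiime2-2023.5
```

```text
usage: conda [-h] [--no-plugins] [-V] COMMAND ...
conda: error: argument COMMAND: invalid choice: 'activate' (choose from 'clean', 'compare', 'config', 'create', 'info', 'init', 'install', 'list', 'notices', 'package', 'remove', 'uninstall', 'rename', 'run', 'search', 'update', 'upgrade', 'doctor', 'env', 'content-trust')
```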
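
The error message lists `run` among conda's valid subcommands, and `conda run -n <env> <command>` executes a command inside a named environment without needing `conda activate`. The sketch below wraps that call in Python. The helper names `conda_run_command` and `run_in_env` are hypothetical, and it assumes that the `conda` executable is on the `PATH` and that an environment named `qiime2-2023.5` exists.

```python
import shutil
import subprocess


def conda_run_command(env_name, *args):
    """Build the argument list for `conda run -n <env_name> <args...>`."""
    return ["conda", "run", "-n", env_name, *args]


def run_in_env(env_name, *args):
    """Run a command inside a conda environment and return its stdout.

    Raises FileNotFoundError if conda is not on the PATH and
    subprocess.CalledProcessError if the command exits with an error.
    """
    if shutil.which("conda") is None:
        raise FileNotFoundError("conda executable not found on PATH")
    result = subprocess.run(
        conda_run_command(env_name, *args),
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout


# Example (assumes the environment exists):
# print(run_in_env("qiime2-2023.5", "qiime", "--help"))
```

The same idea works directly in a notebook cell as `!conda run -n qiime2-2023.5 qiime --help`.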


```python
!qiime --help
```

```text
/bin/bash: qiime: command not found
```
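
`command not found` means the `qiime` executable is not on the `PATH` seen by the notebook's shell, which often indicates that the kernel was not started from the QIIME2 environment. A small sketch for checking this from inside the notebook, using only the Python standard library (the function name `describe_kernel_environment` is hypothetical):

```python
import os
import shutil
import sys


def describe_kernel_environment():
    """Collect basic facts about the environment the kernel is running in."""
    return {
        # Python interpreter backing the current kernel
        "python": sys.executable,
        # Root of the environment that interpreter belongs to
        "prefix": sys.prefix,
        # Set by `conda activate`; None if no environment was activated
        "conda_env": os.environ.get("CONDA_DEFAULT_ENV"),
        # Full path to the qiime executable, or None if it is not on PATH
        "qiime_path": shutil.which("qiime"),
    }


for key, value in describe_kernel_environment().items():
    print(f"{key}: {value}")
```

If `qiime_path` is `None`, the notebook cannot see the QIIME2 command line tools and `!qiime ...` cells will fail in the same way.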


```python
!ls
```

```text
1_install-qiime2.ipynb	2_import-sequences.ipynb
1-install-qiime2.qmd	2-import-sequences.qmd
```


```bash
qiime tools import \
  --type 'SampleData[PairedEndSequencesWithQuality]' \
  --input-path casava-18-paired-end-demultiplexed \
  --input-format CasavaOneEightSingleLanePerSampleDirFmt \
  --output-path demux-paired-end.qza
```
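
The same import can also be done through the Python Artifact API instead of the command line, which avoids the shell entirely when the notebook kernel runs inside the QIIME2 environment. This is a minimal sketch rather than a tested pipeline step: the wrapper name `import_paired_end_reads` is hypothetical, and it assumes the `qiime2` package is importable from the active kernel.

```python
def import_paired_end_reads(input_dir, output_path):
    """Import Casava 1.8 paired-end reads into a QIIME2 artifact.

    Mirrors `qiime tools import` with the same semantic type and input
    format. The qiime2 import happens inside the function so that the
    definition itself does not fail in a kernel without qiime2.
    """
    from qiime2 import Artifact

    artifact = Artifact.import_data(
        "SampleData[PairedEndSequencesWithQuality]",
        input_dir,
        view_type="CasavaOneEightSingleLanePerSampleDirFmt",
    )
    # Write the artifact to disk as a .qza file
    artifact.save(output_path)
    return artifact


# Example, using the same paths as the command above:
# import_paired_end_reads(
#     "casava-18-paired-end-demultiplexed", "demux-paired-end.qza"
# )
```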