# Import Sequences into QIIME 2 and Trim Adapters

This is the first step in the analysis: read in the raw sequences, then trim the adapters from them. None of these steps takes particularly long, so they can be run locally instead of on a cluster or server.

## Import Demultiplexed Sequences

Since the sequencing was performed in two different runs, it is best practice to process each run separately (denoisers such as DADA2 estimate error rates per sequencing run) and to merge the feature tables and sequences later.

In [None]:
%%bash
qiime tools import \
  --type 'SampleData[PairedEndSequencesWithQuality]' \
  --input-path batch_1_reads/ \
  --input-format CasavaOneEightSingleLanePerSampleDirFmt \
  --output-path batch1-demux-paired-end.qza

qiime tools import \
  --type 'SampleData[PairedEndSequencesWithQuality]' \
  --input-path batch_2_reads/ \
  --input-format CasavaOneEightSingleLanePerSampleDirFmt \
  --output-path batch2-demux-paired-end.qza
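
The import above relies on the Casava 1.8 naming convention, in which each file name carries the sample identifier, a barcode sequence or identifier, the lane number, the read direction and the set number (for example `L2S357_15_L001_R1_001.fastq.gz`). As a minimal sketch, a quick pre-check of the input directories could look like the following. The helper name `check_casava_names` and the regular expression are assumptions made for illustration and are not part of QIIME 2 itself.

```bash
#!/usr/bin/env bash
# Hypothetical helper: list every file in a directory whose name does not
# look like a Casava 1.8 per-sample FASTQ file, for example
# L2S357_15_L001_R1_001.fastq.gz. Returns non-zero if any such file is found.
check_casava_names() {
    local dir="$1" status=0 path name
    # Assumed pattern: <sample>_<barcode>_L<lane>_R<1|2>_001.fastq.gz
    local pattern='^.+_.+_L[0-9]{3}_R[12]_001\.fastq\.gz$'
    for path in "$dir"/*; do
        [ -e "$path" ] || continue      # glob matched nothing (empty directory)
        name="${path##*/}"              # strip the directory part
        if [[ ! "$name" =~ $pattern ]]; then
            echo "unexpected file name: $name"
            status=1
        fi
    done
    return "$status"
}

# Usage, with the directory names from the import commands above:
# check_casava_names batch_1_reads && check_casava_names batch_2_reads
```

Files reported by this check would need to be renamed or moved out of the directory before running `qiime tools import`.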

## Summarize the demultiplexed sequences

Since the sequences have already been demultiplexed, we can summarize the `batch[12]-demux-paired-end.qza` files as they are. Generate a summary of the demultiplexing results to:

- determine how many sequences were obtained per sample
- get a summary of the distribution of sequence qualities at each position in the sequence data

In [None]:
%%bash
qiime demux summarize \
   --i-data batch1-demux-paired-end.qza \
   --o-visualization batch1-demux-paired-end-summary.qzv
   
qiime demux summarize \
   --i-data batch2-demux-paired-end.qza \
   --o-visualization batch2-demux-paired-end-summary.qzv
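
The per-sample sequence counts reported in the summary can be cross-checked against the raw files, since a FASTQ record normally spans four lines. The following is a rough sketch only: `count_reads` is a hypothetical helper, and it assumes gzip-compressed FASTQ files without wrapped sequence lines.

```bash
#!/usr/bin/env bash
# Hypothetical helper: count the reads in one gzip-compressed FASTQ file.
# Assumes the usual four-lines-per-record layout (no wrapped sequences).
count_reads() {
    local lines
    lines=$(gzip -cd "$1" | wc -l)
    echo $(( lines / 4 ))
}

# Usage, with the directory name from the import step:
# for f in batch_1_reads/*_R1_001.fastq.gz; do
#     printf '%s\t%s\n' "${f##*/}" "$(count_reads "$f")"
# done
```

For paired-end data the forward and reverse file of a sample should give the same count.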

## Trim adapters from the demuxed reads

Since adapters are still present on the sequences, we can use Cutadapt to trim them before passing the reads to DADA2; adapter sequence left on the reads may otherwise produce undesirable effects during denoising.

In [None]:
%%bash
qiime cutadapt trim-paired \
    --i-demultiplexed-sequences batch1-demux-paired-end.qza \
    --p-cores 16 \
    --p-front-f GCCTACGGGNGGCWGCAG \
    --p-front-r GGACTACHVGGGTATCTAATCC \
    --output-dir b1-trimmed-demuxed-seqs
    
# batch 2    
qiime cutadapt trim-paired \
    --i-demultiplexed-sequences batch2-demux-paired-end.qza \
    --p-cores 16 \
    --p-front-f GCCTACGGGNGGCWGCAG \
    --p-front-r GGACTACHVGGGTATCTAATCC \
    --output-dir b2-trimmed-demuxed-seqs
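
The sequences passed to `--p-front-f` and `--p-front-r` contain IUPAC degenerate codes (`N`, `W`, `H`, `V`), each of which stands for a set of bases. To illustrate what they match, the sketch below expands such a sequence into an extended regular expression. The helper `iupac_to_regex` is hypothetical and only meant for spot-checking raw reads; Cutadapt itself additionally tolerates mismatches, which a plain regex does not.

```bash
#!/usr/bin/env bash
# Hypothetical helper: expand IUPAC nucleotide codes into an ERE,
# e.g. W (A or T) becomes [AT] and N (any base) becomes [ACGT].
iupac_to_regex() {
    local seq="$1" regex="" i base
    for (( i = 0; i < ${#seq}; i++ )); do
        base="${seq:i:1}"
        case "$base" in
            A|C|G|T) regex+="$base" ;;
            R) regex+="[AG]" ;;
            Y) regex+="[CT]" ;;
            S) regex+="[CG]" ;;
            W) regex+="[AT]" ;;
            K) regex+="[GT]" ;;
            M) regex+="[AC]" ;;
            B) regex+="[CGT]" ;;
            D) regex+="[AGT]" ;;
            H) regex+="[ACT]" ;;
            V) regex+="[ACG]" ;;
            N) regex+="[ACGT]" ;;
            *) echo "unknown IUPAC code: $base" >&2; return 1 ;;
        esac
    done
    echo "$regex"
}

# Usage: count forward reads in one raw file that start with the forward sequence
# (the file name is a placeholder):
# gzip -cd sample_R1.fastq.gz | awk 'NR % 4 == 2' \
#     | grep -cE "^$(iupac_to_regex GCCTACGGGNGGCWGCAG)"
```

If only a small fraction of raw reads begins with the expected sequence, that is worth investigating before denoising.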

### Summarize and visualize the trimmed sequences

This gives us insight into the quality of the reads after trimming.

In [None]:
%%bash
qiime demux summarize \
   --i-data b1-trimmed-demuxed-seqs/trimmed_sequences.qza \
   --o-visualization b1-trimmed-sequences.qzv
   
qiime demux summarize \
   --i-data b2-trimmed-demuxed-seqs/trimmed_sequences.qza \
   --o-visualization b2-trimmed-sequences.qzv

We can then run `qiime tools view` on each of the `b[12]-trimmed-sequences.qzv` files to inspect the quality of the reads before denoising with DADA2.