# Sequence Quality Control and Feature Table Construction

The DADA2 steps take quite some time, so it is recommended to run them in a terminal multiplexer such as `tmux`. The input for DADA2 is the trimmed, demultiplexed reads produced in the `import-and-trim.ipynb` notebook. Before running DADA2, it is advisable to inspect the read quality to determine the trimming/truncation values to use.

## Run DADA2 on both sets of data

Note the different truncation settings for the two runs: the reads produced in the second run were generally of lower quality than those from the first run, so they are truncated at shorter lengths.
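When choosing truncation lengths for paired-end reads, it helps to confirm that the truncated forward and reverse reads still overlap enough to be merged. The sketch below is only an illustration: the helper `overlap_after_truncation` is hypothetical, and the amplicon length of 420 is an assumed placeholder, not a value taken from this dataset. DADA2 is generally described as needing on the order of 12 bases of overlap to merge read pairs, so treat that as a lower bound to check against.

```shell
# Hypothetical helper: bases of overlap left after truncating both reads.
#   $1 = forward truncation length (--p-trunc-len-f)
#   $2 = reverse truncation length (--p-trunc-len-r)
#   $3 = expected amplicon length after primer removal (ASSUMED value)
overlap_after_truncation() {
    echo $(( $1 + $2 - $3 ))
}

amplicon_len=420   # assumption for illustration; substitute your own amplicon length

# Truncation settings used for the first and second sets of reads
echo "batch 1 overlap: $(overlap_after_truncation 265 235 "$amplicon_len")"
echo "batch 2 overlap: $(overlap_after_truncation 260 215 "$amplicon_len")"
```

If the reported overlap comes close to or below the minimum, shortening the truncation further would likely cause most read pairs to fail merging.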

In [None]:
%%bash
# first set of reads
qiime dada2 denoise-paired \
    --i-demultiplexed-seqs b1-trimmed-demuxed-seqs/trimmed_sequences.qza \
    --p-trunc-len-f 265 \
    --p-trunc-len-r 235 \
    --p-n-threads 16 \
    --o-table b1-table.qza \
    --o-representative-sequences b1-rep-seqs.qza \
    --o-denoising-stats b1-stats.qza

# second set of reads
qiime dada2 denoise-paired \
    --i-demultiplexed-seqs b2-trimmed-demuxed-seqs/trimmed_sequences.qza \
    --p-trunc-len-f 260 \
    --p-trunc-len-r 215 \
    --p-n-threads 16 \
    --o-table b2-table.qza \
    --o-representative-sequences b2-rep-seqs.qza \
    --o-denoising-stats b2-stats.qza

## Visualize the denoising stats 

Check that no single step dropped too many reads. Depending on the step, a large loss can indicate a problem with the trimming/truncation lengths.

**Note**: Many reads were lost in the filtering step in the second batch, presumably because the second batch had worse sequence quality. Even so, these results are still acceptable because tens of thousands of reads remain.
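As a rough way to compare read retention across samples outside the visualization, the stats table can be summarized on the command line. The sketch below assumes the stats have been exported to a tab-separated file with `input` and `non-chimeric` columns and a `#q2:types` directive line; the helper `percent_retained`, the file name `example-stats.tsv`, and the numbers in it are hypothetical and are not results from this study.

```shell
# Hypothetical helper: for each sample in a DADA2 stats TSV, print the
# percentage of input reads that remain after chimera removal.
percent_retained() {
    awk -F'\t' '
        NR == 1 {
            # Locate the columns by name rather than by position.
            for (i = 1; i <= NF; i++) {
                if ($i == "input") in_col = i
                if ($i == "non-chimeric") out_col = i
            }
            next
        }
        /^#/ { next }   # skip directive lines such as "#q2:types"
        $in_col > 0 { printf "%s\t%.1f\n", $1, 100 * $out_col / $in_col }
    ' "$1"
}

# Made-up table for illustration only.
printf 'sample-id\tinput\tfiltered\tdenoised\tmerged\tnon-chimeric\n' > example-stats.tsv
printf '#q2:types\tnumeric\tnumeric\tnumeric\tnumeric\tnumeric\n' >> example-stats.tsv
printf 'sampleA\t50000\t40000\t39000\t35000\t30000\n' >> example-stats.tsv
printf 'sampleB\t60000\t30000\t29000\t26000\t24000\n' >> example-stats.tsv

percent_retained example-stats.tsv
```

Looking up columns by header name keeps the helper working if the exported table contains additional columns, such as per-step percentages.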

In [None]:
%%bash
qiime metadata tabulate \
    --m-input-file b1-stats.qza \
    --o-visualization b1-stats.qzv
    
qiime metadata tabulate \
    --m-input-file b2-stats.qza \
    --o-visualization b2-stats.qzv

## Merge the Feature Tables

Next, we can merge the ASV tables produced by the two DADA2 runs into a single feature table.

In [None]:
%%bash
qiime feature-table merge \
    --i-tables b1-table.qza \
    --i-tables b2-table.qza \
    --o-merged-table table.qza

## Merge the Representative Sequences

Now merge the representative sequences from both runs into a single artifact.

In [None]:
%%bash
qiime feature-table merge-seqs \
    --i-data b1-rep-seqs.qza \
    --i-data b2-rep-seqs.qza \
    --o-merged-data rep-seqs.qza

## Visualize the merged tables and seqs

In [None]:
%%bash
# visualize table
qiime feature-table summarize \
    --i-table table.qza \
    --o-visualization table.qzv \
    --m-sample-metadata-file sample-metadata.tsv

# visualize merged representative sequences
qiime feature-table tabulate-seqs \
    --i-data rep-seqs.qza \
    --o-visualization rep-seqs.qzv