# 07/04/2020

Source: https://docs.qiime2.org/2020.2/tutorials/importing/#pairedendfastqmanifestphred33v2, and 
https://docs.qiime2.org/2020.2/tutorials/atacama-soils/#paired-end-read-analysis-commands

1. Import data 
2. Denoise with DADA2

Note: the reads are already demultiplexed, so there is no need to demultiplex.

```bash
cd /xdisk/tfaily/mig2020/extra/nathaliagg/sulfate_experiment/microbial_16S/
mkdir qiime2
cd qiime2
pwd
# /xdisk/tfaily/mig2020/extra/nathaliagg/sulfate_experiment/microbial_16S/qiime2
cp ../raw_data/manifest.tsv .
cp ../metadata.tsv .
```

### Load module and activate conda environment

```bash
module load anaconda/2020/2020.02
```



```bash
source activate qiime2-2020.2
```


### 1. Import data

The sequences are imported into an artifact of type `SampleData[PairedEndSequencesWithQuality]` using the input format `PairedEndFastqManifestPhred33V2`.
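For reference, a `PairedEndFastqManifestPhred33V2` manifest is a tab-separated file whose header is `sample-id`, `forward-absolute-filepath`, `reverse-absolute-filepath`. The `manifest.tsv` used here was copied from `raw_data/` and is not shown, so the sketch below is only one hypothetical way to build such a file. It assumes reads are named `<sample>_R1.fastq.gz` / `<sample>_R2.fastq.gz`, and `make_manifest` is an illustrative helper, not part of QIIME 2.

```bash
# make_manifest DATADIR
# Hypothetical helper: prints a PairedEndFastqManifestPhred33V2 manifest
# (tab-separated) for files named <sample>_R1.fastq.gz / <sample>_R2.fastq.gz.
make_manifest() {
  local datadir r1 r2 sample
  # Resolve to an absolute path, since the manifest requires absolute paths.
  datadir=$(cd "$1" && pwd) || return 1
  printf 'sample-id\tforward-absolute-filepath\treverse-absolute-filepath\n'
  for r1 in "$datadir"/*_R1.fastq.gz; do
    [ -e "$r1" ] || continue                 # glob matched nothing
    sample=$(basename "$r1" _R1.fastq.gz)    # strip the R1 suffix
    r2="$datadir/${sample}_R2.fastq.gz"
    if [ -e "$r2" ]; then
      printf '%s\t%s\t%s\n' "$sample" "$r1" "$r2"
    else
      echo "warning: no reverse read file for $sample" >&2
    fi
  done
}
```

For example, `make_manifest "$DATADIR" > manifest.tsv`, after adjusting the filename patterns to the real ones. Samples without a matching reverse file are skipped with a warning rather than written as incomplete rows.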

```bash
export DATADIR="/xdisk/tfaily/mig2020/extra/nathaliagg/sulfate_experiment/microbial_16S/raw_data"
```


```bash
qiime tools import \
  --type 'SampleData[PairedEndSequencesWithQuality]' \
  --input-path manifest.tsv \
  --output-path paired-end-demux.qza \
  --input-format PairedEndFastqManifestPhred33V2
```

```
Imported manifest.tsv as PairedEndFastqManifestPhred33V2 to paired-end-demux.qza
```

Output artifact `paired-end-demux.qza`

Since the data are already demultiplexed, I can go straight to generating a summary visualization of how many sequences were obtained per sample.

```bash
qiime demux summarize \
  --i-data paired-end-demux.qza \
  --o-visualization demux.qzv
```

```
Saved Visualization to: demux.qzv
```

Output visualization `demux.qzv`

Load `demux.qzv` into `view.qiime2.org` and open the `Interactive Quality Plot` tab. These plots show the quality scores for the forward and reverse reads, and I use them to choose the trimming parameters for denoising with DADA2 (next step).
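The quality plot is essentially a per-position summary of Phred scores. As a rough command-line cross-check, the hypothetical helper below computes the mean Phred+33 quality (Q = ASCII code - 33) at each position of one gzipped FASTQ file. It assumes standard 4-line FASTQ records with no line-wrapped sequences, and it reports means only, whereas the QIIME 2 plot shows quality distributions computed from a random subsample of reads.

```bash
# mean_quality_by_position FILE.fastq.gz
# Hypothetical helper: prints "position<TAB>mean_quality<TAB>n_reads" for each
# read position, decoding Phred+33 quality characters.
mean_quality_by_position() {
  gzip -dc "$1" | awk '
    BEGIN {
      # Lookup table from quality character to Phred score.
      for (i = 33; i <= 126; i++) phred[sprintf("%c", i)] = i - 33
    }
    NR % 4 == 0 {                      # 4th line of each record = qualities
      n = length($0)
      for (p = 1; p <= n; p++) {
        sum[p] += phred[substr($0, p, 1)]
        cnt[p]++
      }
      if (n > maxlen) maxlen = n
    }
    END {
      for (p = 1; p <= maxlen; p++)
        printf "%d\t%.2f\t%d\n", p, sum[p] / cnt[p], cnt[p]
    }'
}
```

For example, `mean_quality_by_position "$DATADIR/S1_R1.fastq.gz" | awk '$2 < 30'` would list positions whose mean quality is below 30 (`S1_R1.fastq.gz` is a hypothetical file name). The third column shows how many reads reach each position, which matters here because the reads were already trimmed to variable lengths.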

These are 300-bp reads, both forward and reverse (according to MultiQC). The reads must remain long enough to overlap when the paired ends are joined. Usually, the first and last few bases need to be trimmed. I already trimmed the reads with Trimmomatic, but the `Interactive Quality Plot` shows that quality still drops towards the end of the reads.

I'll truncate both forward and reverse reads at 270 bp (`--p-trunc-len-f 270` and `--p-trunc-len-r 270`) as an extra trim.
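Because the reads were already trimmed with Trimmomatic, their lengths vary, and DADA2's `--p-trunc-len-f` help text notes that reads shorter than the truncation length are discarded. A quick way to estimate how many reads in one file would pass a 270-bp cutoff is sketched below with a hypothetical helper, `count_reads_at_least`, which assumes a gzipped FASTQ with 4-line records. It checks the length criterion only; DADA2 applies further quality filters, so actual retention will be lower.

```bash
# count_reads_at_least FILE.fastq.gz MIN_LEN
# Hypothetical helper: prints "total<TAB>kept<TAB>percent", where kept is the
# number of reads whose sequence length is >= MIN_LEN.
count_reads_at_least() {
  gzip -dc "$1" | awk -v min="$2" '
    NR % 4 == 2 {                     # 2nd line of each record = sequence
      total++
      if (length($0) >= min) kept++
    }
    END {
      printf "%d\t%d\t%.1f%%\n", total, kept, (total ? 100 * kept / total : 0)
    }'
}
```

For example, `count_reads_at_least "$DATADIR/S1_R2.fastq.gz" 270`, with `S1_R2.fastq.gz` standing in for a real file. Running it on a few forward and reverse files before committing to a truncation length shows whether the cutoff would throw away a large share of the trimmed reads.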

### 2. Denoise with DADA2

```bash
qiime dada2 denoise-paired --help
```

```
Usage: qiime dada2 denoise-paired [OPTIONS]

  This method denoises paired-end sequences, dereplicates them, and filters
  chimeras.

Inputs:
  --i-demultiplexed-seqs ARTIFACT SampleData[PairedEndSequencesWithQuality]
                         The paired-end demultiplexed sequences to be
                         denoised.                                  [required]
Parameters:
  --p-trunc-len-f INTEGER
                         Position at which forward read sequences should be
                         truncated due to decrease in quality. This truncates
                         the 3' end of the of the input sequences, which will
                         be the bases that were sequenced in the last cycles.
                         Reads that are shorter than this value will be
                         discarded. After this parameter is applied there must
                         still be at least a 12 nucleotide overla
```
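The overlap requirement in the help text can be checked with simple arithmetic: after truncation, a read pair overlaps by `trunc_len_f + trunc_len_r - amplicon_length` bases, and that must be at least 12. The amplicon length is not recorded in this notebook, so the value in the usage line below is a placeholder, and both helper names are hypothetical.

```bash
# expected_overlap TRUNC_F TRUNC_R AMPLICON_LEN
# Overlap of a merged pair = forward length + reverse length - amplicon length.
expected_overlap() {
  local f="$1" r="$2" amp="$3"
  echo $(( f + r - amp ))
}

# check_overlap TRUNC_F TRUNC_R AMPLICON_LEN [MIN_OVERLAP]
# Returns non-zero when the expected overlap is below MIN_OVERLAP (default 12).
check_overlap() {
  local ov min="${4:-12}"
  ov=$(expected_overlap "$1" "$2" "$3")
  if (( ov >= min )); then
    echo "ok: ${ov} nt overlap (minimum ${min})"
  else
    echo "too short: ${ov} nt overlap (minimum ${min})"
    return 1
  fi
}
```

For a hypothetical 460-bp amplicon, `check_overlap 270 270 460` prints `ok: 80 nt overlap (minimum 12)`; substitute the real amplicon length for the primer pair used.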


```bash
dt=$(date '+%d/%m/%Y %H:%M:%S');
echo "$dt"

qiime dada2 denoise-paired \
  --i-demultiplexed-seqs paired-end-demux.qza \
  --p-trunc-len-f 270 \
  --p-trunc-len-r 270 \
  --p-n-threads 25 \
  --output-dir denoise_dada2 \
  --o-table table.qza \
  --o-representative-sequences rep-seqs.qza \
  --o-denoising-stats denoising-stats.qza

dt=$(date '+%d/%m/%Y %H:%M:%S');
echo "$dt"
```

```
04/07/2020 09:16:35
Saved FeatureTable[Frequency] to: table.qza
Saved FeatureData[Sequence] to: rep-seqs.qza
Saved SampleData[DADA2Stats] to: denoising-stats.qza
04/07/2020 12:02:24
```
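The DADA2 run above took 2:45:49 (09:16:35 to 12:02:24). Instead of echoing two timestamps and subtracting by hand, one option is a small wrapper that reports the elapsed wall-clock time. `time_it` and `format_duration` are hypothetical helpers; bash's built-in `time` keyword is a simpler alternative if a fixed output format is not needed.

```bash
# format_duration SECONDS -> HH:MM:SS
format_duration() {
  local s="$1"
  printf '%02d:%02d:%02d\n' $(( s / 3600 )) $(( s % 3600 / 60 )) $(( s % 60 ))
}

# time_it CMD [ARGS...]
# Hypothetical helper: runs CMD, prints its wall-clock time to stderr,
# and returns CMD's exit status.
time_it() {
  local start status
  start=$(date +%s)
  "$@" && status=0 || status=$?   # capture the status even under `set -e`
  echo "elapsed: $(format_duration $(( $(date +%s) - start )))" >&2
  return "$status"
}
```

Prefixing the DADA2 command with `time_it` would print a single `elapsed:` line on stderr when it finishes, leaving the QIIME 2 messages on stdout untouched.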

Output artifacts `denoising-stats.qza`, `rep-seqs.qza`, and `table.qza`.

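But the results were not written to that directory: `--output-dir` only collects outputs that are not given an explicit path, and all three outputs here were named with `--o-*` options.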

I'll manually move them:

```bash
mv denoising-stats.qza denoise_dada2/
mv rep-seqs.qza denoise_dada2/
mv table.qza denoise_dada2/
```


`denoising-stats.qza` contains the denoising statistics. Generate a visualization by running:

```bash
qiime metadata tabulate \
  --m-input-file denoise_dada2/denoising-stats.qza \
  --o-visualization denoise_dada2/denoising-stats.qzv
```

```
Saved Visualization to: denoise_dada2/denoising-stats.qzv
```

Output `denoising-stats.qzv` --> `view.qiime2.org`
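To flag samples that lost many reads without opening the viewer, the statistics can also be summarized on the command line. This sketch rests on assumptions: that `qiime tools export --input-path denoise_dada2/denoising-stats.qza --output-path denoise_dada2/denoising-stats` writes a TSV (assumed here to be `stats.tsv`), and that its header contains `sample-id`, `input` and `non-chimeric` columns followed by a `#q2:types` row. Check the real file with `head` before relying on it. `summarize_dada2_stats` is a hypothetical helper that looks columns up by name and prints the percentage of input reads kept after chimera removal.

```bash
# summarize_dada2_stats STATS.tsv
# Hypothetical helper: prints "sample<TAB>input<TAB>non-chimeric<TAB>percent kept"
# for each sample, looking columns up by header name.
summarize_dada2_stats() {
  awk -F'\t' '
    /^#/ { next }                         # skip the "#q2:types" row and comments
    !hdr {                                # first remaining line = header
      for (i = 1; i <= NF; i++) col[$i] = i
      hdr = 1
      next
    }
    {
      inp  = $(col["input"])
      kept = $(col["non-chimeric"])
      printf "%s\t%d\t%d\t%.1f%%\n", $(col["sample-id"]), inp, kept, (inp > 0 ? 100 * kept / inp : 0)
    }' "$1"
}
```

For example, `summarize_dada2_stats denoise_dada2/denoising-stats/stats.tsv | sort -t $'\t' -k4,4n` would list samples from lowest to highest retention, which is a quick way to see whether the 270-bp truncation discarded too many reads.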

`table.qza` and `rep-seqs.qza` contain the feature table and corresponding feature sequences, respectively.

Generate summaries and visualizations:

```bash
qiime feature-table summarize \
  --i-table denoise_dada2/table.qza \
  --o-visualization denoise_dada2/table.qzv \
  --m-sample-metadata-file metadata.tsv

qiime feature-table tabulate-seqs \
  --i-data denoise_dada2/rep-seqs.qza \
  --o-visualization denoise_dada2/rep-seqs.qzv
```

```
Saved Visualization to: denoise_dada2/table.qzv
Saved Visualization to: denoise_dada2/rep-seqs.qzv
```

Output visualizations: `table.qzv` and `rep-seqs.qzv` --> `view.qiime2.org`

Ready for the next notebook!

```bash
source deactivate qiime2-2020.2
```

