# Environmental analysis

![scheme](img01.jpg)

RW: Raw Water before treatment

BC: Before Chlorination

FW: Finished Water 

DS1-DS3: tap water

PB1-PB2: biofilm in pipes

WM: biofilm in water meters

We are going to study how microbial populations vary between raw water before treatment (RW), water at the end of the treatment (FW) and tap water from three houses (DS1, DS2 and DS3). We will also compare samples taken in June and July.

I suggest you run the pipeline I showed you in Unit 2 with this dataset and answer the following questions:

1)	How many reads have we got for each sample?

2)	What trimming length are you using for the denoising step?

3)	How many ASVs do you have before filtering the deblur table? And after filtering?

4)	What is the average frequency of sequences per sample before filtering the deblur table?

5)	Which sample has the lowest number of sequences after filtering the deblur table? And which has the highest?

6)	What is the most abundant phylum in each sample?

7)	Does the study have enough coverage to allow us to make statistical inferences about community diversity?

8)	Looking at the weighted UniFrac PCoA plot, is there any effect of water treatment or sampling time on the bacterial communities?

9)	If we compare untreated samples (RW) with treated samples (FW, DS1, DS2, DS3), which phyla or classes explain the differences between the two groups? (Hint: use LEfSe)

**Dataset Contents**:

1)	fastq folder: raw sample sequences

2)	85_otus.fasta and 85_otu_taxonomy.txt: taxonomy database (reference sequences and their taxonomy)

3)	metadata.txt: sample metadata. I have included some columns useful for diversity analyses.

4)	primers.txt: information on primers used for 16S PCR amplification

5)	quiz.docx: this file

6)	samplemanifest: manifest file with the location of the fastq files and their corresponding sample tags.

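The manifest can also be generated programmatically. Below is a minimal sketch, assuming the comma-separated layout read by `PairedEndFastqManifestPhred33` (`sample-id,absolute-filepath,direction`), files named `<accession>_1.fastq` (forward) and `<accession>_2.fastq` (reverse), and the SRR accession used as sample ID. The provided `samplemanifest` may use other sample IDs; `write_manifest` is a hypothetical helper, not part of QIIME 2.

```python
import csv
from pathlib import Path


def write_manifest(fastq_dir, out_path):
    """Write a paired-end CSV manifest for <accession>_1/_2.fastq files.

    Assumption: the sample ID is the SRR accession, _1 files hold forward
    reads and _2 files hold reverse reads.
    """
    fastq_dir = Path(fastq_dir).resolve()  # the manifest needs absolute paths
    rows = []
    for fwd in sorted(fastq_dir.glob("*_1.fastq")):
        accession = fwd.name[: -len("_1.fastq")]
        rev = fastq_dir / f"{accession}_2.fastq"
        if not rev.exists():
            raise FileNotFoundError(f"Missing reverse file for {accession}")
        rows.append((accession, str(fwd), "forward"))
        rows.append((accession, str(rev), "reverse"))
    with open(out_path, "w", newline="") as handle:
        writer = csv.writer(handle)
        writer.writerow(["sample-id", "absolute-filepath", "direction"])
        writer.writerows(rows)
    return rows
```

For example, `write_manifest("fastq", "samplemanifest_generated")` would write a manifest for the `fastq` folder without overwriting the provided one.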
## Responses

1)	How many reads have we got for each sample?


>Counted with *grep* on the fastq files and verified after importing into *qiime*.

| File              |#reads|
| :------------------|--------:|
|SRR3593621_1.fastq |52945| 
|SRR3593621_2.fastq |52945|
|SRR3593622_1.fastq |62218|
|SRR3593622_2.fastq |62218|
|SRR3593623_1.fastq |92740|
|SRR3593623_2.fastq |92740|
|SRR3593625_1.fastq |70366|
|SRR3593625_2.fastq |70366|
|SRR3593627_1.fastq |100615|
|SRR3593627_2.fastq |100615|
|SRR3593628_1.fastq |78495|
|SRR3593628_2.fastq |78495|
|SRR3593631_1.fastq |97332|
|SRR3593631_2.fastq |97332|
|SRR3593632_1.fastq |84361|
|SRR3593632_2.fastq |84361|
|SRR3593664_1.fastq |101827|
|SRR3593664_2.fastq |101827|
|SRR3593665_1.fastq |84850|
|SRR3593665_2.fastq |84850|

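Matching `@SRR` is safe with these SRA headers, but a count based on the four-line FASTQ record structure does not depend on the header prefix. A minimal sketch, assuming uncompressed FASTQ files with exactly four lines per record (no wrapped sequences); `count_fastq_reads` and `count_reads_in_dir` are hypothetical helpers.

```python
from pathlib import Path


def count_fastq_reads(path):
    """Count records in an uncompressed FASTQ file with 4 lines per record."""
    n_lines = 0
    with open(path) as handle:
        for n_lines, line in enumerate(handle, start=1):
            # Only the first line of each 4-line record must start with '@';
            # quality lines may also start with '@', so they are not counted.
            if n_lines % 4 == 1 and not line.startswith("@"):
                raise ValueError(f"{path}: line {n_lines} is not a FASTQ header")
    if n_lines % 4 != 0:
        raise ValueError(f"{path}: {n_lines} lines is not a multiple of 4")
    return n_lines // 4


def count_reads_in_dir(fastq_dir):
    """Return {file name: read count} for every *.fastq file in a folder."""
    return {p.name: count_fastq_reads(p)
            for p in sorted(Path(fastq_dir).glob("*.fastq"))}
```

`count_reads_in_dir("fastq")` would return one count per file; for each sample the `_1` and `_2` counts should be identical, as in the table above.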

2)	What trimming length are you using for the denoising step?

>$300$. This is the read position at which sequence quality starts to drop.

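The position where quality drops can be read off the quality plots in `paired-end-demux.qzv`, or computed directly as the first position whose mean Phred score falls below a cut-off. A minimal sketch, assuming Phred+33 encoding (matching the `PairedEndFastqManifestPhred33` import) and four-line FASTQ records; the cut-off of 25 is an arbitrary, hypothetical choice and the helper names are illustrative.

```python
def mean_quality_by_position(path, offset=33):
    """Mean Phred score at each read position in a 4-line FASTQ file."""
    totals, counts = [], []
    with open(path) as handle:
        for i, line in enumerate(handle):
            if i % 4 != 3:  # the 4th line of each record holds the qualities
                continue
            for pos, char in enumerate(line.rstrip("\r\n")):
                if pos == len(totals):  # first read reaching this position
                    totals.append(0)
                    counts.append(0)
                totals[pos] += ord(char) - offset
                counts[pos] += 1
    return [t / c for t, c in zip(totals, counts)]


def first_position_below(means, cutoff=25.0):
    """First 1-based position whose mean quality is below cutoff, or None.

    The default cutoff of 25 is a hypothetical choice, not a QIIME 2 default.
    """
    for pos, q in enumerate(means, start=1):
        if q < cutoff:
            return pos
    return None
```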
3)	How many ASVs do you have before filtering the deblur table? And after filtering?

>Before filtering there are $813$ features (ASVs); after filtering, $468$.

4)	What is the average frequency of sequences per sample before filtering the deblur table?

>$35{,}290$

5)	Which sample has the lowest number of sequences after filtering the deblur table? And which has the highest?

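The values for questions 3 to 5 are read from the `feature-table summarize` visualizations (`deblur_table_summary.qzv` and `deblur_table_filt_summary.qzv`). They can also be recomputed from a table exported with `qiime tools export` and converted with `biom convert --to-tsv`, as in the LEfSe step of the pipeline. A minimal sketch, assuming the usual `biom convert` TSV layout (a `# Constructed from biom file` line followed by a `#OTU ID` header); `read_biom_tsv` and `summarize` are hypothetical helpers.

```python
import csv


def read_biom_tsv(path):
    """Parse a `biom convert --to-tsv` table into (sample IDs, {feature: counts})."""
    samples, table = None, {}
    with open(path, newline="") as handle:
        for row in csv.reader(handle, delimiter="\t", quoting=csv.QUOTE_NONE):
            if not row:
                continue
            if row[0] == "#OTU ID":
                # Drop an optional observation-metadata column such as "taxonomy".
                samples = [s for s in row[1:] if s != "taxonomy"]
            elif not row[0].startswith("#"):  # skip "# Constructed from biom file"
                table[row[0]] = [float(x) for x in row[1:len(samples) + 1]]
    return samples, table


def summarize(samples, table):
    """Feature count, mean frequency per sample, shallowest and deepest sample."""
    depth = {s: sum(counts[i] for counts in table.values())
             for i, s in enumerate(samples)}
    return {
        "n_features": len(table),
        "mean_depth": sum(depth.values()) / len(depth),
        "min_sample": min(depth, key=depth.get),
        "max_sample": max(depth, key=depth.get),
        "depth": depth,
    }

# Example (path is illustrative):
# samples, table = read_biom_tsv("exported_table/feature-table.txt")
# print(summarize(samples, table))
```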
6)	What is the most abundant phylum in each sample?

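For question 6, the level-2 view of `taxa/taxa_barplot.qzv` shows the answer per sample. The same can be computed from a phylum-level table (`qiime taxa collapse --p-level 2`, then exported and converted to TSV as in the LEfSe step of the pipeline). A minimal sketch that takes the parsed sample IDs and a `{taxon: counts}` mapping; `top_taxon_per_sample` is a hypothetical helper, ties go to the taxon listed first, and the toy counts below are made up for illustration, not results from this dataset.

```python
def top_taxon_per_sample(samples, table):
    """Return {sample: (taxon, relative abundance)} for the most abundant taxon."""
    result = {}
    for i, sample in enumerate(samples):
        total = sum(counts[i] for counts in table.values())
        # max() keeps the first taxon among equals, in dictionary order.
        taxon = max(table, key=lambda t: table[t][i])
        result[sample] = (taxon, table[taxon][i] / total if total else 0.0)
    return result


# Toy example with made-up counts and placeholder phylum names.
samples = ["S1", "S2"]
table = {
    "k__Bacteria;p__PhylumA": [80.0, 10.0],
    "k__Bacteria;p__PhylumB": [20.0, 90.0],
}
print(top_taxon_per_sample(samples, table))
```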
7)	Does the study have enough coverage to allow us to make statistical inferences about community diversity?

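Rarefaction curves (`rarefaction_curves.qzv` in the pipeline) are the usual way to answer this: curves that level off before the chosen sampling depth suggest that sequencing captured most of the diversity in each sample. A complementary per-sample figure is Good's coverage estimator, $C = 1 - F_1/N$, where $F_1$ is the number of features observed exactly once in a sample and $N$ is the sample's total read count. A minimal sketch; note that Deblur output and frequency-filtered tables contain few singletons, so this estimate can be optimistic. `goods_coverage` is a hypothetical helper.

```python
def goods_coverage(counts):
    """Good's coverage 1 - F1/N for one sample's per-feature read counts."""
    counts = [c for c in counts if c > 0]  # ignore features absent from the sample
    total = sum(counts)
    if total == 0:
        raise ValueError("sample has no reads")
    singletons = sum(1 for c in counts if c == 1)  # F1
    return 1.0 - singletons / total

# Example: goods_coverage([10, 5, 1, 1, 3]) has F1 = 2 and N = 20.
```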
8)	Looking at the weighted UniFrac PCoA plot, is there any effect of water treatment or sampling time on the bacterial communities?

9)	If we compare untreated samples (RW) with treated samples (FW, DS1, DS2, DS3), which phyla or classes explain the differences between the two groups? (Hint: use LEfSe)



## Pipeline

In [None]:
%%bash
#Alignment and phylogeny
mkdir tree_out

qiime alignment mafft --i-sequences deblur_output/rep_seqs_filt.qza \
                      --p-n-threads 3 \
                      --o-alignment tree_out/rep_seqs_filt_aligned.qza

qiime alignment mask --i-alignment tree_out/rep_seqs_filt_aligned.qza \
                     --o-masked-alignment tree_out/rep_seqs_filt_aligned_masked.qza

qiime phylogeny fasttree --i-alignment tree_out/rep_seqs_filt_aligned_masked.qza \
                         --p-n-threads 2 \
                         --o-tree tree_out/rep_seqs_filt_aligned_masked_tree

qiime phylogeny midpoint-root --i-tree tree_out/rep_seqs_filt_aligned_masked_tree.qza \
                              --o-rooted-tree tree_out/rep_seqs_filt_aligned_masked_tree_rooted.qza

#Training database
qiime tools import \
  --type 'FeatureData[Sequence]' \
  --input-path 85_otus.fasta \
  --output-path 85_otus.qza

qiime tools import \
  --type 'FeatureData[Taxonomy]' \
  --input-format HeaderlessTSVTaxonomyFormat \
  --input-path 85_otu_taxonomy.txt \
  --output-path ref-taxonomy.qza

qiime feature-classifier extract-reads \
  --i-sequences 85_otus.qza \
  --p-f-primer AACMGGATTAGATACCCKG \
  --p-r-primer ACGTCATCCCCACCTTCC \
  --p-min-length 100 \
  --p-max-length 400 \
  --o-reads ref-seqs.qza

qiime feature-classifier fit-classifier-naive-bayes \
  --i-reference-reads ref-seqs.qza \
  --i-reference-taxonomy ref-taxonomy.qza \
  --o-classifier classifier.qza

#Taxonomic assignment

qiime feature-classifier classify-sklearn --i-reads deblur_output/rep_seqs_filt.qza \
                                          --i-classifier classifier.qza \
                                          --p-n-jobs 2 \
                                          --output-dir taxa

qiime tools export --input-path taxa/classification.qza --output-path taxa

qiime taxa barplot --i-table deblur_output/deblur_table_filt.qza \
                   --i-taxonomy taxa/classification.qza \
                   --m-metadata-file metadata.txt \
                   --o-visualization taxa/taxa_barplot.qzv

qiime feature-table group --i-table deblur_output/deblur_table_filt.qza \
                          --p-axis sample \
                          --p-mode sum \
                          --m-metadata-file metadata.txt \
                          --m-metadata-column Plant \
                          --o-grouped-table deblur_output/deblur_table_filt_Plant.qza

qiime taxa barplot --i-table deblur_output/deblur_table_filt_Plant.qza \
                   --i-taxonomy taxa/classification.qza \
                   --m-metadata-file metadata.txt \
                   --o-visualization taxa/taxa_barplot_Plant.qzv

#Diversity

qiime diversity alpha-rarefaction --i-table deblur_output/deblur_table_filt.qza \
                                  --p-max-depth 5496 \
                                  --p-steps 20 \
                                  --i-phylogeny tree_out/rep_seqs_filt_aligned_masked_tree_rooted.qza \
                                  --m-metadata-file metadata.txt \
                                  --o-visualization rarefaction_curves.qzv

qiime diversity core-metrics-phylogenetic --i-table deblur_output/deblur_table_filt.qza \
                                          --i-phylogeny tree_out/rep_seqs_filt_aligned_masked_tree_rooted.qza \
                                          --p-sampling-depth 1411 \
                                          --m-metadata-file metadata.txt \
                                          --p-n-jobs 2 \
                                          --output-dir diversity

qiime diversity alpha-group-significance --i-alpha-diversity diversity/shannon_vector.qza \
                                         --m-metadata-file metadata.txt \
                                         --o-visualization diversity/shannon_compare_groups.qzv

#Differential Abundance
#Ancom

qiime taxa collapse \
  --i-table deblur_output/deblur_table_filt.qza \
  --i-taxonomy taxa/classification.qza \
  --p-level 6 \
  --o-collapsed-table deblur_output/deblur_collapsed.qza

qiime composition add-pseudocount \
  --i-table deblur_output/deblur_collapsed.qza \
  --o-composition-table deblur_output/comp-deblur-l6.qza

qiime composition ancom \
  --i-table deblur_output/comp-deblur-l6.qza \
  --m-metadata-file metadata.txt \
  --m-metadata-column Plant \
  --o-visualization deblur_output/l6-ancom-Subject.qzv

#LEfSe

qiime taxa collapse --i-table deblur_output/deblur_table_filt.qza \
                    --o-collapsed-table deblur_output/L3_collapse_table.qza \
                    --p-level 3 \
                    --i-taxonomy taxa/classification.qza

qiime tools export --input-path deblur_output/L3_collapse_table.qza \
                   --output-path lefse_table/

biom convert -i lefse_table/feature-table.biom \
             -o lefse_table/feature-table.txt \
             --header-key "taxonomy" --to-tsv

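LEfSe expects taxa in rows and samples in columns, with class labels in one of the first rows and taxonomic levels separated by `|`. A minimal sketch that turns `lefse_table/feature-table.txt` into such a file, assuming a `sample ID -> group` mapping built by hand from `metadata.txt` (for question 9, `Untreated` for RW and `Treated` for FW and DS1-DS3); `biom_tsv_to_lefse`, the mapping and the output file name are hypothetical.

```python
import csv


def biom_tsv_to_lefse(in_path, out_path, group_of):
    """Write a LEfSe input table: class row, subject row, then one row per taxon.

    group_of maps each sample ID to its class label (e.g. "Untreated"/"Treated").
    QIIME's ";"-separated taxon names are rewritten with "|" separators.
    """
    samples, rows = None, []
    with open(in_path, newline="") as handle:
        for row in csv.reader(handle, delimiter="\t", quoting=csv.QUOTE_NONE):
            if not row:
                continue
            if row[0] == "#OTU ID":
                samples = [s for s in row[1:] if s != "taxonomy"]
            elif not row[0].startswith("#"):
                taxon = "|".join(part.strip() for part in row[0].split(";"))
                rows.append([taxon] + row[1:len(samples) + 1])
    with open(out_path, "w", newline="") as handle:
        writer = csv.writer(handle, delimiter="\t", lineterminator="\n")
        writer.writerow(["class"] + [group_of[s] for s in samples])
        writer.writerow(["subject_id"] + samples)
        writer.writerows(rows)

# Hypothetical usage; keys must match the sample columns of feature-table.txt:
# groups = {"<RW sample ID>": "Untreated", "<FW sample ID>": "Treated"}
# biom_tsv_to_lefse("lefse_table/feature-table.txt", "lefse_table/lefse_input.txt", groups)
```

The resulting file could then be passed to LEfSe's `format_input.py`, indicating that the class is in row 1 and the subject in row 2 (`-c 1 -u 2`).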
### Preprocessing and quality check


In [5]:
import warnings
warnings.filterwarnings('ignore')
import pandas as pd
import matplotlib.pyplot as plt

FILE_ID = "SRR"
FASTQ_STR = "@SRR"

In [26]:
%%bash -s "$FILE_ID" "$FASTQ_STR"
ssh microbioinf@192.168.56.101 env FILE_ID=$1 FASTQ_STR=$2 2>/dev/null /bin/bash <<"EOT"
export PATH=$PATH:/home/microbioinf/miniconda3/bin
echo "#### Check files FILE_ID=${FILE_ID}, FASTQ_STR=$FASTQ_STR"
cd Documentos/Tema_2/fastq
head -4 ${FILE_ID}*fastq
grep -c $FASTQ_STR ${FILE_ID}*fastq

#echo "#### Compute quality"
#mkdir ${FILE_ID}_Quality
#fastqc ${FILE_ID}_R1.fastq -o ${FILE_ID}_Quality/
#fastqc ${FILE_ID}_R2.fastq -o ${FILE_ID}_Quality/
#
#echo "#### Replace ' ' by '_' in header"
#head -n 1 ${FILE_ID}*fastq
#cat ${FILE_ID}_R1.fastq | sed 's/ /_/g' > ${FILE_ID}_R1_.fastq
#cat ${FILE_ID}_R2.fastq | sed 's/ /_/g' > ${FILE_ID}_R2_.fastq
#head -n 1 ${FILE_ID}*fastq
EOT

#### Check files FILE_ID=SRR, FASTQ_STR=@SRR
==> SRR3593621_1.fastq <==
@SRR3593621.1 1 length=300
TACGTAGGGTGCAAGCGTTATCCGGATTCACTGTTCTTCCCTATTCGTTTAGTTTTTTTTTTCCTTCTCACTTCACAGCCCTTTGCTTTACCTCGTCCTTCTTTTCTTCTTTACTATACTCGTTTTTTATATTTGTACGTGGTTCTCCTTTTTGTTCTGTGCCTTGCGTTGTGTTCTTGTTGTACCCCAATTGCCATTGCTCCTTTCTTCTTCATTCCTGTCACTCTTCCACGCAAGCTATCGTACTCCATCAGTTTAGTCCCCTCCTTTTTTCTAGCCCTCAACTCTTCCCTGCTAGTT
+SRR3593621.1 1 length=300
<6BCCGGGGGGGGGGGGGG7CEE6+,8@@,;C,,,<,<,,,,,,;6,;,6,,,;-,:+8+86,,,<9,:,6CE,,,,,,66,,,,,5:,,6696,,8,89,,,9,<,,,,9,,,:,<,,95+,+++9+,9,,,,,,,,+,,+++:>;,74,4,+,,5,,8,,,,8,,+66+++,66,7,,,,,,:,:5++,,+2,,,,+3+74<2=2@,,+22,5,5*4+,4*/5++++3*.)*)((//++(-(((*+*))+)))3.)))++,/;*/(/)//587))6((,(*),-.).)-44)43)).-

==> SRR3593621_2.fastq <==
@SRR3593621.1 1 length=300
TTTCCTCTTTCTCTCTTCCTCCCCACCCTTCCTCCTTCCCTCTTTCTCTTCCTTCTCTCCTCACATTCTCCTCCCTCCCCTGCTTCCCTTTATTTCTCCCTCTACTCCTCTCTCTTCTCTGCTTCTCTCCTCTACCTTTCGTTTCTCAGTGTCATCTATTTCCCTGCCCGTTGCCTTCTCCTTTCTTCTTCCTCCTCATCTCCACTCATTTCACTTC

In [25]:
%%bash -s "$FILE_ID" "$FASTQ_STR"
ssh microbioinf@192.168.56.101 env FILE_ID=$1 FASTQ_STR=$2 2>/dev/null /bin/bash <<"EOT"
echo "#### Check files FILE_ID=${FILE_ID}, FASTQ_STR=$FASTQ_STR"
export PATH=$PATH:/home/microbioinf/miniconda3/bin
source activate qiime2-2018.11
cd Documentos/Tema_2
qiime tools import --type 'SampleData[PairedEndSequencesWithQuality]' \
                   --input-path samplemanifest \
                   --output-path paired-end-demux.qza \
                   --input-format PairedEndFastqManifestPhred33
EOT

#### Check files FILE_ID=SRR, FASTQ_STR=@SRR

In [27]:
%%bash -s "$FILE_ID" "$FASTQ_STR"
ssh microbioinf@192.168.56.101 env FILE_ID=$1 FASTQ_STR=$2 2>/dev/null /bin/bash <<"EOT"
echo "#### Check files FILE_ID=${FILE_ID}, FASTQ_STR=$FASTQ_STR"
export PATH=$PATH:/home/microbioinf/miniconda3/bin
source activate qiime2-2018.11
cd Documentos/Tema_2
qiime demux summarize --i-data paired-end-demux.qza --o-visualization paired-end-demux.qzv
EOT

#### Check files FILE_ID=SRR, FASTQ_STR=@SRR
Saved Visualization to: paired-end-demux.qzv


### Determination of sOTUs using Deblur

#### Sample pre-processing

In [28]:
%%bash -s "$FILE_ID" "$FASTQ_STR"
ssh microbioinf@192.168.56.101 env FILE_ID=$1 FASTQ_STR=$2 2>/dev/null /bin/bash <<"EOT"
echo "#### Check files FILE_ID=${FILE_ID}, FASTQ_STR=$FASTQ_STR"
export PATH=$PATH:/home/microbioinf/miniconda3/bin
source activate qiime2-2018.11
cd Documentos/Tema_2

# Deblur does not currently support unjoined paired-end reads,
# so we first merge the read pairs with VSEARCH:
qiime vsearch join-pairs --i-demultiplexed-seqs paired-end-demux.qza \
                         --o-joined-sequences joined-reads.qza
qiime demux summarize --i-data joined-reads.qza --o-visualization joined-reads.qzv

# Filter the joined reads by quality score.
qiime quality-filter q-score-joined --i-demux joined-reads.qza \
                                    --o-filter-stats filt_stats.qza \
                                    --o-filtered-sequences joined-filt-reads.qza

qiime demux summarize --i-data joined-filt-reads.qza --o-visualization joined-filt-reads.qzv

EOT

#### Check files FILE_ID=SRR, FASTQ_STR=@SRR
Saved SampleData[JoinedSequencesWithQuality] to: joined-reads.qza
Saved Visualization to: joined-reads.qzv
Saved SampleData[JoinedSequencesWithQuality] to: joined-filt-reads.qza
Saved QualityFilterStats to: filt_stats.qza
Saved Visualization to: joined-filt-reads.qzv


#### Determination of sOTUs/features

In [31]:
%%bash -s "$FILE_ID" "$FASTQ_STR"
ssh microbioinf@192.168.56.101 env FILE_ID=$1 FASTQ_STR=$2 2>/dev/null /bin/bash <<"EOT"
export PATH=$PATH:/home/microbioinf/miniconda3/bin
source activate qiime2-2018.11
cd Documentos/Tema_2

qiime deblur denoise-16S --i-demultiplexed-seqs joined-filt-reads.qza \
                         --p-trim-length 300 \
                         --p-sample-stats \
                         --p-jobs-to-start 2 \
                         --p-min-reads 1 \
                         --output-dir deblur_output

qiime feature-table summarize --i-table deblur_output/table.qza --o-visualization deblur_output/deblur_table_summary.qzv


EOT

Saved Visualization to: deblur_output/deblur_table_summary.qzv


In [34]:
%%bash -s "$FILE_ID" "$FASTQ_STR"
ssh microbioinf@192.168.56.101 env FILE_ID=$1 FASTQ_STR=$2 2>/dev/null /bin/bash <<"EOT"
export PATH=$PATH:/home/microbioinf/miniconda3/bin
source activate qiime2-2018.11
cd Documentos/Tema_2

# Exclude sOTUs with a total frequency below 0.1% of the mean sample depth.
# This threshold is meant to remove sOTUs caused by Illumina sequencing errors (about 0.1% of total reads).

qiime feature-table filter-features --i-table deblur_output/table.qza \
                                    --p-min-frequency 3 \
                                    --p-min-samples 1 \
                                    --o-filtered-table deblur_output/deblur_table_filt.qza

# Keep only the representative sequences of the sOTUs retained in the filtered table
qiime feature-table filter-seqs --i-data deblur_output/representative_sequences.qza \
                                --i-table deblur_output/deblur_table_filt.qza \
                                --o-filtered-data deblur_output/rep_seqs_filt.qza


# Summarize
qiime feature-table summarize --i-table deblur_output/deblur_table_filt.qza --o-visualization deblur_output/deblur_table_filt_summary.qzv

EOT

Saved FeatureTable[Frequency] to: deblur_output/deblur_table_filt.qza
Saved FeatureData[Sequence] to: deblur_output/rep_seqs_filt.qza
Saved Visualization to: deblur_output/deblur_table_filt_summary.qzv

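The comment at the top of the filtering cell describes the cut-off as 0.1% of the mean sample depth. A minimal sketch of that rule as arithmetic; `min_frequency_threshold` is a hypothetical helper, and rounding up to a whole read is an assumption. With the mean of 35,290 reads per sample reported for question 4, the rule gives a cut-off of 36 reads.

```python
import math


def min_frequency_threshold(sample_depths, fraction=0.001):
    """Smallest whole number of reads >= fraction * mean sample depth.

    Rounding up (math.ceil) is an assumption; fraction=0.001 is the 0.1% rule.
    """
    if not sample_depths:
        raise ValueError("need at least one sample depth")
    mean_depth = sum(sample_depths) / len(sample_depths)
    return math.ceil(mean_depth * fraction)


# Using the reported mean depth directly as a one-element list.
print(min_frequency_threshold([35290]))
```

Passing the per-sample frequencies listed in `deblur_table_summary.qzv` instead of the mean gives the same result.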

### Determination of phylogenetic distances using FastTree

# Outputs

In [10]:
%%bash
jupyter nbconvert --to=latex --template=~/report.tplx environmental_population_analysis.ipynb
/Library/TeX/texbin/pdflatex -shell-escape environmental_population_analysis
jupyter nbconvert --to html_toc environmental_population_analysis.ipynb 1>/dev/null 2>/dev/null
