# Tutorial for using QIIME on PATUNG

Tutorial for running QIIME on the LANCIS PATUNG server.
It assumes you already have a username and password for the PATUNG server.
You should know some Bash to create and move files. This is not strictly necessary, since all the commands are given here, but it helps.

* Note: if you don't like Linux (like me), the whole process can be done on Windows 10 using the Windows Subsystem for Linux (Ubuntu), without a virtual machine.


To start this tutorial, you must be logged in to your PATUNG session, with the Notebook running remotely and opened from your local machine. To set this up, please follow the instructions at the following link: __```url```__



## Using QIIME

Una ve

Set the path to the analysis directory and to a temporary directory, and activate the QIIME 2 environment.

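```bash
# Root folder of the analysis; every later path is built from it
ANALYSIS_PATH=/srv/home/anavarro/agave_metagenom
echo "$ANALYSIS_PATH"

# Temporary folder for QIIME (written into each job script below)
TMPDIR=/srv/home/anavarro/tmp
# Activate the QIIME 2 (2018.8) conda environment; getenv = True passes it on to the Condor jobs
source activate qiime2-2018.8
```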
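The cells below write each script into the current directory, while the Condor files expect it in `$ANALYSIS_PATH/3_ejecutables`, and the `biom` commands use `../2_resultados`, which only works from that folder. Condor also needs `4_outs`, `5_logs` and `6_errores` to exist. A minimal setup sketch, assuming the folder layout implied by those paths (the function name `setup_analysis_dirs` is hypothetical):

```bash
# Hypothetical helper: create the folder layout used throughout this tutorial.
# Folder names are taken from the paths used in the cells below.
setup_analysis_dirs() {
    local base="$1" d
    for d in 1_secuencias 2_resultados 3_ejecutables 4_outs 5_logs 6_errores 7_databases; do
        mkdir -p "$base/$d" || return 1
    done
}
```

Usage: `setup_analysis_dirs "$ANALYSIS_PATH" && cd "$ANALYSIS_PATH/3_ejecutables"`, so that the scripts land where the Condor files look for them.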

1) Import seqs
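The import step reads `1_secuencias/2_seqs_manifest.txt`. For the `PairedEndFastqManifestPhred33` format used here, QIIME 2 (2018.8) expects a comma-separated file whose first line is `sample-id,absolute-filepath,direction`, followed by one line per FASTQ file with direction `forward` or `reverse`. A sketch that builds it, assuming Illumina-style names such as `Sample1_S1_L001_R1_001.fastq.gz` and taking the text before the first `_` as the sample ID (both are assumptions; adapt them to your file names). `make_manifest` is a hypothetical name:

```bash
# Hypothetical helper: write a QIIME 2 paired-end manifest (CSV) for all
# *_R1_*.fastq.gz / *_R2_*.fastq.gz files in a directory.
make_manifest() {
    local fastq_dir="$1" out="$2" f name dir
    echo 'sample-id,absolute-filepath,direction' > "$out"
    for f in "$fastq_dir"/*_R[12]_*.fastq.gz; do
        [ -e "$f" ] || continue            # no match: skip the literal pattern
        name=$(basename "$f")
        case "$name" in
            *_R1_*) dir=forward ;;
            *)      dir=reverse ;;
        esac
        # Sample ID = text before the first underscore (assumed naming scheme)
        printf '%s,%s,%s\n' "${name%%_*}" "$(cd "$(dirname "$f")" && pwd)/$name" "$dir" >> "$out"
    done
}
```

Usage: `make_manifest /path/to/fastq "$ANALYSIS_PATH/1_secuencias/2_seqs_manifest.txt"`. Check the result before importing, since a wrong sample ID is easy to miss.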

```bash
{
echo '#!/bin/bash'
printf 'export TMPDIR=%s\n' "$TMPDIR"

echo 'qiime tools import \'
echo "--type 'SampleData[PairedEndSequencesWithQuality]' \\"
printf -- '--input-path %s/1_secuencias/2_seqs_manifest.txt \\\n' "$ANALYSIS_PATH"
echo '--input-format PairedEndFastqManifestPhred33 \'
printf -- '--output-path %s/2_resultados/1_demultiplexed_pairedEnd_seqs.qza' "$ANALYSIS_PATH"

} > 1_import_seqs.sh
```

```bash
{
printf 'executable 	= %s/3_ejecutables/1_import_seqs.sh\n' "$ANALYSIS_PATH"
echo 'getenv		= True'
printf 'output		= %s/4_outs/1_import_seqs$(Process).out\n' "$ANALYSIS_PATH"
printf 'log  		= %s/5_logs/1_import_seqs$(Process).log\n' "$ANALYSIS_PATH"
printf 'error		= %s/6_errores/1_import_seqs$(Process).error\n' "$ANALYSIS_PATH"

echo 'request_cpus	=5'
echo 'queue'
} > 1_import_seqs_condor.condor
```

```bash
condor_submit 1_import_seqs_condor.condor
```

```bash
# Check the queue; wait until the job is no longer listed before starting the next step
condor_q
```
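Each step reads the `.qza` written by the previous one, so the next job should only be submitted once `condor_q` no longer lists the current one (HTCondor's `condor_wait` can also block on the job's `.log` file in `5_logs`). As an extra safeguard, a small check can be run before each `condor_submit` to confirm that the expected inputs exist and are not empty. `require_outputs` is a hypothetical name, not part of QIIME or HTCondor:

```bash
# Hypothetical guard: return 1 if any of the given files is missing or empty,
# so a step is not submitted before its input has been produced.
require_outputs() {
    local f status=0
    for f in "$@"; do
        if [ ! -s "$f" ]; then
            echo "Missing or empty: $f" >&2
            status=1
        fi
    done
    return "$status"
}
```

Usage: `require_outputs "$ANALYSIS_PATH/2_resultados/1_demultiplexed_pairedEnd_seqs.qza" && condor_submit 2_vsearch_join-pairs_submit.condor`.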

2) Join paired ends

```bash
{
echo '#!/bin/bash'
printf 'export TMPDIR=%s\n' "$TMPDIR"

echo 'qiime vsearch join-pairs \'
printf -- '--i-demultiplexed-seqs %s/2_resultados/1_demultiplexed_pairedEnd_seqs.qza \\\n' "$ANALYSIS_PATH"
printf -- '--o-joined-sequences %s/2_resultados/2_demultiplexed_joined_seqs_wq.qza \\\n' "$ANALYSIS_PATH"
echo '--p-allowmergestagger \'
echo '--verbose'
} > 2_vsearch_join-pairs.sh
```

```bash
{
printf 'executable 	= %s/3_ejecutables/2_vsearch_join-pairs.sh\n' "$ANALYSIS_PATH"
echo 'getenv		= True'
printf 'output		= %s/4_outs/2_vsearch_join$(Process).out\n' "$ANALYSIS_PATH"
printf 'log  		= %s/5_logs/2_vsearch_join$(Process).log\n' "$ANALYSIS_PATH"
printf 'error		= %s/6_errores/2_vsearch_join$(Process).error\n' "$ANALYSIS_PATH"

echo 'request_cpus	=5'

echo 'queue'
} > 2_vsearch_join-pairs_submit.condor
```

```bash
condor_submit 2_vsearch_join-pairs_submit.condor
```
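Steps 1 and 2 use the same submit-file template, and it repeats in every step below; only the script name, the log tag and the CPU count change. A hypothetical helper, `write_condor_submit`, that writes the same keys as the cells in this tutorial, assuming `ANALYSIS_PATH` is set as in the first cell. It is a sketch for reducing repetition, not part of the original workflow:

```bash
# Hypothetical helper: write the same submit description used in this tutorial.
# Arguments: script name, log tag, submit file to write, CPUs (default 5).
write_condor_submit() {
    local script="$1" tag="$2" submit_file="$3" cpus="${4:-5}"
    {
        printf 'executable\t= %s/3_ejecutables/%s\n' "$ANALYSIS_PATH" "$script"
        printf 'getenv\t\t= True\n'
        # $(Process) is expanded by HTCondor, not by the shell (single quotes)
        printf 'output\t\t= %s/4_outs/%s$(Process).out\n' "$ANALYSIS_PATH" "$tag"
        printf 'log\t\t= %s/5_logs/%s$(Process).log\n' "$ANALYSIS_PATH" "$tag"
        printf 'error\t\t= %s/6_errores/%s$(Process).error\n' "$ANALYSIS_PATH" "$tag"
        printf 'request_cpus\t= %s\n' "$cpus"
        printf 'queue\n'
    } > "$submit_file"
}
```

Usage: `write_condor_submit 3_quality-filter.sh 3_quality-filter 3_quality-filter_submit.condor && condor_submit 3_quality-filter_submit.condor`.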

3) Quality filter

```bash
{
echo '#!/bin/bash'
printf 'export TMPDIR=%s\n' "$TMPDIR"

echo 'qiime quality-filter q-score-joined \'
printf -- '--i-demux %s/2_resultados/2_demultiplexed_joined_seqs_wq.qza \\\n' "$ANALYSIS_PATH"
echo '--p-min-quality 4 \'
echo '--p-quality-window 3 \'
echo '--p-min-length-fraction 0.75 \'
echo '--p-max-ambiguous 2 \'
printf -- '--output-dir %s/2_resultados/3_q_filter_output \\\n' "$ANALYSIS_PATH"
echo '--verbose'
} > 3_quality-filter.sh
```

```bash
{
printf 'executable 	= %s/3_ejecutables/3_quality-filter.sh\n' "$ANALYSIS_PATH"
echo 'getenv		= True'
printf 'output		= %s/4_outs/3_quality-filter$(Process).out\n' "$ANALYSIS_PATH"
printf 'log  		= %s/5_logs/3_quality-filter$(Process).log\n' "$ANALYSIS_PATH"
printf 'error		= %s/6_errores/3_quality-filter$(Process).error\n' "$ANALYSIS_PATH"

echo 'request_cpus	=5'

echo 'queue'
} > 3_quality-filter_submit.condor
```

```bash
condor_submit 3_quality-filter_submit.condor
```

4) Deblur denoise

```bash
{
echo '#!/bin/bash'
printf 'export TMPDIR=%s\n' "$TMPDIR"

echo 'qiime deblur denoise-16S \'
printf -- '--i-demultiplexed-seqs %s/2_resultados/3_q_filter_output/filtered_sequences.qza \\\n' "$ANALYSIS_PATH"
echo '--p-trim-length 250 \'
echo '--p-sample-stats \'
echo '--p-jobs-to-start 40 \'
printf -- '--output-dir %s/2_resultados/4_deblur_output \\\n' "$ANALYSIS_PATH"
echo '--verbose'
} > 4_deblur-denoise.sh
```

```bash
{
printf 'executable 	= %s/3_ejecutables/4_deblur-denoise.sh\n' "$ANALYSIS_PATH"

echo 'getenv		= True'
printf 'output		= %s/4_outs/4_deblur-denoise$(Process).out\n' "$ANALYSIS_PATH"
printf 'log  		= %s/5_logs/4_deblur-denoise$(Process).log\n' "$ANALYSIS_PATH"
printf 'error		= %s/6_errores/4_deblur-denoise$(Process).error\n' "$ANALYSIS_PATH"

echo 'request_cpus	=5'

echo 'queue'
} > 4_deblur-denoise_submit.condor
```

```bash
condor_submit 4_deblur-denoise_submit.condor
```
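Note that the Deblur script starts 40 jobs (`--p-jobs-to-start 40`) while its submit file requests only 5 CPUs. Steps 5 and 7 ask for all available cores (`--p-threads 0`, `--p-n-jobs -1`) and steps 6 and 12 use a single thread, all with a 5-CPU request. A job that uses more cores than it requested can oversubscribe the execute node. A hypothetical check, `check_cpus`, that compares a thread flag in a generated script with the `request_cpus` line of its submit file (a sketch; the parsing assumes the one-option-per-line layout used in these scripts):

```bash
# Hypothetical check: compare the value of a thread/job flag in a generated
# job script with request_cpus in its Condor submit file.
check_cpus() {
    local script="$1" submit_file="$2" flag="$3" threads cpus
    threads=$(sed -n "s/^${flag} \([-0-9]*\).*/\1/p" "$script" | head -n 1)
    cpus=$(sed -n 's/^request_cpus[[:space:]]*=[[:space:]]*\([0-9]*\).*/\1/p' "$submit_file" | head -n 1)
    if [ "$threads" != "$cpus" ]; then
        echo "$script: $flag $threads, but request_cpus = $cpus" >&2
        return 1
    fi
}
```

Usage: from `3_ejecutables`, pass the Deblur script, its submit file and `--p-jobs-to-start`; then either lower the job count or raise `request_cpus` so the two agree.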

5) De novo clustering at 97% identity (`cluster-features-de-novo`).

```bash
{
echo '#!/bin/bash'
printf 'export TMPDIR=%s\n' "$TMPDIR"

echo 'qiime vsearch cluster-features-de-novo \'
printf -- '--i-sequences %s/2_resultados/4_deblur_output/representative_sequences.qza \\\n' "$ANALYSIS_PATH"
printf -- '--i-table %s/2_resultados/4_deblur_output/table.qza \\\n' "$ANALYSIS_PATH"
echo '--p-perc-identity 0.97 \'
echo '--p-threads 0 \'
printf -- '--output-dir %s/2_resultados/5_vsearch_output \\\n' "$ANALYSIS_PATH"
echo '--verbose'
} > 5_vsearch_cluster_denovo.sh
```

```bash
{
printf 'executable 	= %s/3_ejecutables/5_vsearch_cluster_denovo.sh\n' "$ANALYSIS_PATH"
echo 'getenv		= True'

printf 'output		= %s/4_outs/5_vsearch_denovo$(Process).out\n' "$ANALYSIS_PATH"
printf 'log  		= %s/5_logs/5_vsearch_denovo$(Process).log\n' "$ANALYSIS_PATH"
printf 'error		= %s/6_errores/5_vsearch_denovo$(Process).error\n' "$ANALYSIS_PATH"

echo 'request_cpus	=5'
echo 'queue'

} > 5_vsearch_cluster_denovo_submit.condor
```

```bash
condor_submit 5_vsearch_cluster_denovo_submit.condor
```

6) Alignment (MAFFT)

```bash
{
echo '#!/bin/bash'
printf 'export TMPDIR=%s\n' "$TMPDIR"

echo 'qiime alignment mafft \'
printf -- '--i-sequences %s/2_resultados/5_vsearch_output/clustered_sequences.qza \\\n' "$ANALYSIS_PATH"
echo '--p-n-threads 1 \'
printf -- '--output-dir %s/2_resultados/6_alignment_output \\\n' "$ANALYSIS_PATH"
echo '--verbose'

} > 6_alignment_mafft.sh
```

```bash
{
printf 'executable 	= %s/3_ejecutables/6_alignment_mafft.sh\n' "$ANALYSIS_PATH"
echo 'getenv		= True'

printf 'output		= %s/4_outs/6_alignment_mafft$(Process).out\n' "$ANALYSIS_PATH"
printf 'log  		= %s/5_logs/6_alignment_mafft$(Process).log\n' "$ANALYSIS_PATH"
printf 'error		= %s/6_errores/6_alignment_mafft$(Process).error\n' "$ANALYSIS_PATH"

echo 'request_cpus	= 5'
echo 'queue'

} > 6_alignment_mafft_submit.condor
```

```bash
condor_submit 6_alignment_mafft_submit.condor
```

7) Taxonomic classification (`qiime feature-classifier classify-sklearn`)

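```bash
# Download the pre-trained Greengenes 13_8 (99% OTUs) naive Bayes classifier for QIIME 2 2018.8
# into 7_databases, which is where the step 7 script looks for it
wget -P "$ANALYSIS_PATH/7_databases" 'https://data.qiime2.org/2018.8/common/gg-13-8-99-nb-classifier.qza'
```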

```bash
{
echo '#!/bin/bash'
printf 'export TMPDIR=%s\n' "$TMPDIR"

echo 'qiime feature-classifier classify-sklearn \'
printf -- '--i-reads %s/2_resultados/5_vsearch_output/clustered_sequences.qza \\\n' "$ANALYSIS_PATH"
printf -- '--i-classifier %s/7_databases/gg-13-8-99-nb-classifier.qza \\\n' "$ANALYSIS_PATH"
echo '--p-n-jobs -1 \'
echo '--p-confidence 0.8 \'
printf -- '--output-dir %s/2_resultados/7_taxonomy_output \\\n' "$ANALYSIS_PATH"
echo '--verbose'

} > 7_taxonomy_gg_sk.sh
```

```bash
{
printf 'executable 	= %s/3_ejecutables/7_taxonomy_gg_sk.sh\n' "$ANALYSIS_PATH"
echo 'getenv		= True'
printf 'output		= %s/4_outs/7_taxonomy$(Process).out\n' "$ANALYSIS_PATH"
printf 'log  		= %s/5_logs/7_taxonomy$(Process).log\n' "$ANALYSIS_PATH"
printf 'error		= %s/6_errores/7_taxonomy$(Process).error\n' "$ANALYSIS_PATH"

echo 'request_cpus	= 5'
echo 'queue'
} > 7_taxonomy_gg_sk_submit.condor
```

```bash
condor_submit 7_taxonomy_gg_sk_submit.condor
```

8) Exporting the clustered feature table to a BIOM file

```bash
{
echo '#!/bin/bash'
printf 'export TMPDIR=%s\n' "$TMPDIR"

echo 'qiime tools export \'
printf -- '--input-path %s/2_resultados/5_vsearch_output/clustered_table.qza \\\n' "$ANALYSIS_PATH"
printf -- '--output-path %s/2_resultados/8_biom/' "$ANALYSIS_PATH"
} > 8_export_biom.sh
```

```bash
{
printf 'executable 	= %s/3_ejecutables/8_export_biom.sh\n' "$ANALYSIS_PATH"
echo 'getenv		= True'

printf 'output		= %s/4_outs/8_biom$(Process).out\n' "$ANALYSIS_PATH"
printf 'log  		= %s/5_logs/8_biom$(Process).log\n' "$ANALYSIS_PATH"
printf 'error		= %s/6_errores/8_biom$(Process).error\n' "$ANALYSIS_PATH"

echo 'request_cpus	= 5'

echo 'queue'
} > 8_export_biom.condor
```

```bash
condor_submit 8_export_biom.condor
```

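```bash
# Summarize the exported table (run this once the step 8 job has finished)
biom summarize-table -i "$ANALYSIS_PATH/2_resultados/8_biom/feature-table.biom" -o "$ANALYSIS_PATH/2_resultados/8_biom/biom_sum.txt"
```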

9) Exporting the taxonomy classification to a TSV file

```bash
{
echo '#!/bin/bash'
printf 'export TMPDIR=%s\n' "$TMPDIR"

echo 'qiime tools export \'
printf -- '--input-path %s/2_resultados/7_taxonomy_output/classification.qza \\\n' "$ANALYSIS_PATH"
printf -- '--output-path %s/2_resultados/9_tsv_gg_sk' "$ANALYSIS_PATH"
} > 9_export_to_tsv_gg_sk.sh
```

```bash
{
printf 'executable 	= %s/3_ejecutables/9_export_to_tsv_gg_sk.sh\n' "$ANALYSIS_PATH"
echo 'getenv		= True'
printf 'output		= %s/4_outs/9_export_tsv$(Process).out\n' "$ANALYSIS_PATH"
printf 'log  		= %s/5_logs/9_export_tsv$(Process).log\n' "$ANALYSIS_PATH"
printf 'error		= %s/6_errores/9_export_tsv$(Process).error\n' "$ANALYSIS_PATH"

echo 'request_cpus	= 5'

echo 'queue'
} > 9_export_to_tsv_condor.condor
```

```bash
condor_submit 9_export_to_tsv_condor.condor
```
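Step 10 reads `9_tsv_gg_sk/3_taxonomy_updated.tsv`, but the export above writes `taxonomy.tsv`, whose header is `Feature ID`, `Taxon`, `Confidence`. The tutorial does not show how the updated file is produced. A common approach (an assumption here, not necessarily what the author did) is to copy the file and rename the header to `#OTUID`, `taxonomy`, `confidence` so that `biom add-metadata` can match the columns. `update_taxonomy_header` is a hypothetical name:

```bash
# Hypothetical helper: copy QIIME 2's exported taxonomy.tsv, renaming the
# header line only if it is the default "Feature ID / Taxon / Confidence".
update_taxonomy_header() {
    local in="$1" out="$2"
    awk 'NR == 1 && /^Feature ID\t/ { $0 = "#OTUID\ttaxonomy\tconfidence" }
         { print }' "$in" > "$out"
}
```

Usage: `update_taxonomy_header "$ANALYSIS_PATH/2_resultados/9_tsv_gg_sk/taxonomy.tsv" "$ANALYSIS_PATH/2_resultados/9_tsv_gg_sk/3_taxonomy_updated.tsv"`. Compare the new header with the `--observation-header` value used in step 10.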

10) Joining the BIOM table with the taxonomy and the sample metadata.

```bash
{
echo '#!/bin/bash'
printf 'export TMPDIR=%s\n' "$TMPDIR"

echo 'biom add-metadata \'
printf -- '-i %s/2_resultados/8_biom/feature-table.biom  \\\n' "$ANALYSIS_PATH"
printf -- '-o %s/2_resultados/10_biomtable_w_tax.biom  \\\n' "$ANALYSIS_PATH"
printf -- '--observation-metadata-fp %s/2_resultados/9_tsv_gg_sk/3_taxonomy_updated.tsv \\\n' "$ANALYSIS_PATH"
echo '--observation-header OTUID,taxonomy \'
echo '--sc-separated taxonomy  \'
printf -- '--sample-metadata-fp %s/1_secuencias/2_metadata_agave.txt  \\\n' "$ANALYSIS_PATH"
echo '--sample-header SampleID,acetone,consumed_acetate,propionic_acid,butyric,consumed_carbs,hydrogen'
} > 10_join_biom_tax.sh
```

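```bash
# Step 10 has no Condor submit file: run the script directly
# (or write a submit file as in the previous steps), then summarize the result
bash 10_join_biom_tax.sh
biom summarize-table -i "$ANALYSIS_PATH/2_resultados/10_biomtable_w_tax.biom" -o "$ANALYSIS_PATH/2_resultados/10_biom_sum.txt"
```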

11) Exporting the BIOM table as a tab-separated OTU table, useful for other analyses.

```bash
{
echo '#!/bin/bash'
printf 'export TMPDIR=%s\n' "$TMPDIR"

echo 'biom convert \'
printf -- '-i %s/2_resultados/10_biomtable_w_tax.biom \\\n' "$ANALYSIS_PATH"
printf -- '-o %s/2_resultados/11_table.from_biom_w_taxonomy.txt \\\n' "$ANALYSIS_PATH"
echo '--to-tsv \'
echo '--header-key taxonomy'
} > 11_biom_to_otu_table_gg_sk.sh

# Like step 10, this script has no submit file: run it directly
bash 11_biom_to_otu_table_gg_sk.sh
```
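The TSV written by `biom convert --to-tsv` starts with a comment line (`# Constructed from biom file`) before the `#OTU ID` header, and some tools read that line as data or treat the `#` header as a comment. A hypothetical helper, `strip_biom_comment`, that drops the comment line and the leading `#` of the header so the table loads as a plain TSV (for example in R or pandas); the output file name in the usage line is also hypothetical:

```bash
# Hypothetical helper: remove the "# Constructed from biom file" line and the
# leading "#" of the "#OTU ID" header written by `biom convert --to-tsv`.
strip_biom_comment() {
    local in="$1" out="$2"
    awk 'NR == 1 && /^# Constructed from biom file/ { next }
         !header_done { sub(/^#/, ""); header_done = 1 }
         { print }' "$in" > "$out"
}
```

Usage: `strip_biom_comment "$ANALYSIS_PATH/2_resultados/11_table.from_biom_w_taxonomy.txt" "$ANALYSIS_PATH/2_resultados/11_table_clean.tsv"`.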

12) Phylogeny with FastTree

```bash
{
echo '#!/bin/bash'
printf 'export TMPDIR=%s\n' "$TMPDIR"

echo 'qiime phylogeny fasttree \'
printf -- '--i-alignment %s/2_resultados/6_alignment_output/alignment.qza \\\n' "$ANALYSIS_PATH"
echo '--p-n-threads 1 \'
# QIIME adds .qza anyway; writing it out makes the step 13 input name explicit
printf -- '--o-tree %s/2_resultados/12_tree_output.qza \\\n' "$ANALYSIS_PATH"
echo '--verbose'
} > 12_fasttree.sh
```

```bash
{
printf 'executable 	= %s/3_ejecutables/12_fasttree.sh\n' "$ANALYSIS_PATH"
echo 'getenv		= True'
printf 'output		= %s/4_outs/12_fasttree$(Process).out\n' "$ANALYSIS_PATH"
printf 'log  		= %s/5_logs/12_fasttree$(Process).log\n' "$ANALYSIS_PATH"
printf 'error		= %s/6_errores/12_fasttree$(Process).error\n' "$ANALYSIS_PATH"

echo 'request_cpus	= 5'

echo 'queue'
} > 12_fasttree_submit.condor
```

```bash
condor_submit 12_fasttree_submit.condor
```

13) Exporting the .qza tree to Newick (.nwk)

```bash
{
echo '#!/bin/bash'
printf 'export TMPDIR=%s\n' "$TMPDIR"

echo 'qiime tools export \'
printf -- '--input-path %s/2_resultados/12_tree_output.qza \\\n' "$ANALYSIS_PATH"
printf -- '--output-path %s/2_resultados/13_exported-tree' "$ANALYSIS_PATH"
} > 13_export_to_nwk.sh
```

```bash
{
printf 'executable 	= %s/3_ejecutables/13_export_to_nwk.sh\n' "$ANALYSIS_PATH"
echo 'getenv		= True'
printf 'output		= %s/4_outs/13_export_to_nwk$(Process).out\n' "$ANALYSIS_PATH"
printf 'log  		= %s/5_logs/13_export_to_nwk$(Process).log\n' "$ANALYSIS_PATH"
printf 'error		= %s/6_errores/13_export_to_nwk$(Process).error\n' "$ANALYSIS_PATH"

echo 'request_cpus	= 5'

echo 'queue'
} > 13_export_to_nwk_condor.condor
```

```bash
condor_submit 13_export_to_nwk_condor.condor
```
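Because each Condor job depends on the output of the previous one, the steps above have to be submitted by hand, in order. HTCondor's DAGMan can encode that order: a DAG file declares each submit file with a `JOB` line and each dependency with a `PARENT ... CHILD ...` line, and the whole chain is submitted with `condor_submit_dag`. A minimal sketch that writes a strictly linear DAG from a list of submit files (the function name and the `pipeline.dag` file name are hypothetical; steps 10 and 11, which have no submit file here, would still be run by hand):

```bash
# Hypothetical helper: write a DAGMan file that runs the given Condor submit
# files one after another (each job is the parent of the next one).
write_linear_dag() {
    local dag="$1"; shift
    local i=0 prev="" submit_file
    : > "$dag"
    for submit_file in "$@"; do
        i=$((i + 1))
        echo "JOB step$i $submit_file" >> "$dag"
        if [ -n "$prev" ]; then
            echo "PARENT $prev CHILD step$i" >> "$dag"
        fi
        prev="step$i"
    done
}
```

Usage: from `3_ejecutables`, call `write_linear_dag pipeline.dag 1_import_seqs_condor.condor 2_vsearch_join-pairs_submit.condor` followed by the remaining submit files in step order, then `condor_submit_dag pipeline.dag`. A linear chain is the simplest choice; independent steps such as 6 and 7 could run in parallel with a more detailed DAG.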