## Alignments
The following commands execute a series of Nextflow workflows that generate and evaluate the Multiple Sequence Alignments (MSAs).

The command lines used for the regressive alignments can be found in the `templates/dpa_align` directory of the repository.

The command lines used for the standard alignments (with input guide-trees provided) can be found in the `templates/std_align` directory of the repository.

The command lines used for the default alignments (internal guide-trees used) can be found in the `templates/default_align` directory of the repository.

The command lines used to evaluate the alignments with T-Coffee can be found in the `evaluate` process of the Nextflow workflow script (`main.nf`).
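The scoring itself is done by T-Coffee in that process. To build intuition for what a reference-based comparison measures, here is a minimal Python sketch of two widely used scores. The sum-of-pairs (SP) score is the fraction of residue pairs aligned in the reference that are also aligned in the test alignment. The total-column (TC) score is the fraction of reference columns reproduced exactly. The function names, the gap characters, and the choice to ignore test sequences that are absent from the reference are our assumptions. T-Coffee's exact definitions and options may differ.

```python
from itertools import combinations

GAP_CHARS = set("-.")  # assumption: '-' and '.' are the only gap symbols


def columns(aln, names):
    """Return one frozenset of (sequence, residue_index) per alignment column,
    keeping only the rows listed in `names` (every name must be in `aln`)."""
    length = len(aln[names[0]])
    if any(len(aln[n]) != length for n in names):
        raise ValueError("all rows must have the same length")
    position = dict.fromkeys(names, 0)  # next residue index in each row
    cols = []
    for i in range(length):
        cells = set()
        for n in names:
            if aln[n][i] not in GAP_CHARS:
                cells.add((n, position[n]))
                position[n] += 1
        cols.append(frozenset(cells))
    return cols


def aligned_pairs(cols):
    """All pairs of residues that share a column."""
    return {pair for cells in cols for pair in combinations(sorted(cells), 2)}


def sp_score(test, ref):
    """Fraction of reference residue pairs that the test alignment recovers."""
    names = sorted(ref)
    ref_pairs = aligned_pairs(columns(ref, names))
    test_pairs = aligned_pairs(columns(test, names))
    return len(ref_pairs & test_pairs) / len(ref_pairs) if ref_pairs else 0.0


def tc_score(test, ref):
    """Fraction of reference columns (2+ residues) found intact in the test."""
    names = sorted(ref)
    ref_cols = [c for c in columns(ref, names) if len(c) > 1]
    test_cols = set(columns(test, names))
    return sum(c in test_cols for c in ref_cols) / len(ref_cols) if ref_cols else 0.0


ref = {"a": "AC-G", "b": "A-TG"}
test = {"a": "ACG-", "b": "A-TG", "x": "AAAA"}  # "x" is not in the reference and is ignored
print(sp_score(test, ref), tc_score(test, ref))  # 0.5 0.5
```

The test alignment is projected onto the reference sequences before scoring. This mirrors the situation in the commands here, where the input sequence sets (`data/combined_seqs/*.fa`) are aligned but only the reference sequences (`data/refs/*.ref`) can be scored.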

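```python
import os

# The notebook runs from a subdirectory; move up to the repository root,
# where main.nf and data/ live, so the relative paths below resolve
pwd = os.getcwd()
work_dir = os.path.join(pwd, "..")
os.chdir(work_dir)
os.getcwd()
```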

```
'/nfs/users2/cn/efloden/projects/dpa-analysis'
```

```python
!nextflow run main.nf \
             --align_method='CLUSTALO,MAFFT-FFTNS1' \
             --trees='data/trees/*.{CLUSTALO,MAFFT-FFTNS1,MAFFT_PARTTREE}.dnd' \
             --tree_method='none' \
             --refs='data/refs/*.ref' \
             --seqs='data/combined_seqs/*.fa' \
             --regressive_align=true \
             --standard_align=true \
             --default_align=false \
             --output results \
             -profile crg \
             -with-singularity \
             -resume
```

```
N E X T F L O W  ~  version 0.32.0
Launching `main.nf` [friendly_koch] - revision: 1e21f76d69
R E G R E S S I V E   M S A   A n a l y s i s  ~  version 0.1"
Input sequences (FASTA)                        : data/combined_seqs/*.fa
Input references (Aligned FASTA)               : data/refs/*.ref
Input trees (NEWICK)                           : data/trees/*.{CLUSTALO,MAFFT-FFTNS1,MAFFT_PARTTREE}.dnd
Output directory (DIRECTORY)                   : results
Alignment methods                              : CLUSTALO,MAFFT-FFTNS1
Tree methods                                   : none
Generate default alignments                    : false
Generate standard alignments                   : true
Generate regressive alignments (DPA)           : true
Bucket Sizes for regressive alignments         : 1000
Perform evaluation? Requires reference         : true
Output directory (DIRECTORY)                   : results

[warm up] executor > crg
[6a/a7a7ae] Submitted process > regressive_alignment (cys.CLUSTA
```
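In this run the guide trees are supplied rather than computed (`--tree_method='none'`). The `--trees` glob implies that tree files are named `<family>.<tree method>.dnd` and sit alongside `<family>.fa` sequence files. This section does not show how `main.nf` actually joins the two inputs. The hypothetical `pair_inputs` helper below is only a Python sketch of that matching. It assumes the family name is everything before the last dot of the tree file name once `.dnd` is removed. The family names in the example (`cys`, `hpr`) are taken from the task tags in the logs.

```python
from collections import defaultdict
from pathlib import PurePath

# Tree methods listed in the --trees brace glob
TREE_METHODS = ("CLUSTALO", "MAFFT-FFTNS1", "MAFFT_PARTTREE")


def pair_inputs(seq_files, tree_files, tree_methods=TREE_METHODS):
    """Return (family, tree_method, seq_path, tree_path) tuples, assuming
    trees are named <family>.<tree_method>.dnd and sequences <family>.fa."""
    trees = defaultdict(dict)  # family -> {tree_method: tree_path}
    for path in tree_files:
        name = PurePath(path).name
        if not name.endswith(".dnd"):
            continue
        family, sep, method = name[: -len(".dnd")].rpartition(".")
        if sep and method in tree_methods:
            trees[family][method] = path
    pairs = []
    for path in seq_files:
        family = PurePath(path).stem  # "data/combined_seqs/cys.fa" -> "cys"
        for method, tree in sorted(trees.get(family, {}).items()):
            pairs.append((family, method, path, tree))
    return pairs


for row in pair_inputs(
    ["data/combined_seqs/cys.fa"],
    ["data/trees/cys.CLUSTALO.dnd", "data/trees/cys.MAFFT_PARTTREE.dnd"],
):
    print(row)
```

Each sequence file is paired with every matching tree. So, under this naming assumption, one family yields up to three tree-dependent alignments per alignment method.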

```python
!nextflow run main.nf \
             --align_method='MAFFT-GINSI' \
             --trees='data/trees/*.{CLUSTALO,MAFFT-FFTNS1,MAFFT_PARTTREE}.dnd' \
             --tree_method='none' \
             --refs='data/refs/*.ref' \
             --seqs='data/combined_seqs/*.fa' \
             --regressive_align=true \
             --standard_align=false \
             --default_align=false \
             --output results \
             -profile crg \
             -with-singularity \
             -resume
```

```
N E X T F L O W  ~  version 0.32.0
Launching `main.nf` [scruffy_stallman] - revision: 1e21f76d69
R E G R E S S I V E   M S A   A n a l y s i s  ~  version 0.1"
Input sequences (FASTA)                        : data/combined_seqs/*.fa
Input references (Aligned FASTA)               : data/refs/*.ref
Input trees (NEWICK)                           : data/trees/*.{CLUSTALO,MAFFT-FFTNS1,MAFFT_PARTTREE}.dnd
Output directory (DIRECTORY)                   : results
Alignment methods                              : MAFFT-GINSI
Tree methods                                   : none
Generate default alignments                    : false
Generate standard alignments                   : false
Generate regressive alignments (DPA)           : true
Bucket Sizes for regressive alignments         : 1000
Perform evaluation? Requires reference         : true
Output directory (DIRECTORY)                   : results

[warm up] executor > crg
[3b/a4edf6] Submitted process > regressive_alignment (blmb.MAFFT-GINSI
```
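The runs in this section differ in only a handful of parameters: the alignment methods, which alignment modes are enabled, the input sequences, and the output directory. A hypothetical Python helper such as `nextflow_command` can build each invocation as an argument list, which removes the need for shell line-continuation backslashes. It emits only the flags already used in these cells, and its defaults are assumptions chosen to match them.

```python
import shlex


def _flag(value):
    """Render a Python bool as the 'true'/'false' strings used on the CLI."""
    return "true" if value else "false"


def nextflow_command(align_methods, *, trees=None, tree_methods=None,
                     refs="data/refs/*.ref", seqs="data/combined_seqs/*.fa",
                     regressive=True, standard=False, default=False,
                     output="results", profile="crg"):
    """Build the argv list for one `nextflow run main.nf` invocation.

    Pass either `trees` (a glob of precomputed guide trees, in which case the
    tree method is 'none') or `tree_methods` (trees computed by the workflow).
    """
    args = ["nextflow", "run", "main.nf",
            "--align_method=" + ",".join(align_methods)]
    if trees is not None:
        args.append("--trees=" + trees)
    args.append("--tree_method=" + (",".join(tree_methods) if tree_methods else "none"))
    args += [
        "--refs=" + refs,
        "--seqs=" + seqs,
        "--regressive_align=" + _flag(regressive),
        "--standard_align=" + _flag(standard),
        "--default_align=" + _flag(default),
        "--output", output,
        "-profile", profile,
        "-with-singularity",
        "-resume",
    ]
    return args


cmd = nextflow_command(
    ["MAFFT-GINSI"],
    trees="data/trees/*.{CLUSTALO,MAFFT-FFTNS1,MAFFT_PARTTREE}.dnd",
)
print(shlex.join(cmd))
```

`shlex.join` (Python 3.8+) renders the list as a copy-pasteable shell command. The glob patterns come out single-quoted, so Nextflow, not the shell, expands them, as in the original cells. The same list can be passed directly to `subprocess.run(cmd, check=True)` from the repository root.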

```python
!nextflow run main.nf \
             --align_method='UPP,MAFFT-SPARSECORE' \
             --trees='data/trees/*.{CLUSTALO,MAFFT-FFTNS1,MAFFT_PARTTREE}.dnd' \
             --tree_method='none' \
             --refs='data/refs/*.ref' \
             --seqs='data/combined_seqs/*.fa' \
             --regressive_align=true \
             --standard_align=false \
             --default_align=true \
             --output results \
             -profile crg \
             -with-singularity \
             -resume
```

```
N E X T F L O W  ~  version 0.32.0
Launching `main.nf` [festering_mayer] - revision: 1e21f76d69
R E G R E S S I V E   M S A   A n a l y s i s  ~  version 0.1"
Input sequences (FASTA)                        : data/combined_seqs/*.fa
Input references (Aligned FASTA)               : data/refs/*.ref
Input trees (NEWICK)                           : data/trees/*.{CLUSTALO,MAFFT-FFTNS1,MAFFT_PARTTREE}.dnd
Output directory (DIRECTORY)                   : results
Alignment methods                              : UPP,MAFFT-SPARSECORE
Tree methods                                   : none
Generate default alignments                    : true
Generate standard alignments                   : false
Generate regressive alignments (DPA)           : true
Bucket Sizes for regressive alignments         : 1000
Perform evaluation? Requires reference         : true
Output directory (DIRECTORY)                   : results

[warm up] executor > crg
[dc/8e812c] Submitted process > default_alignment (hpr.MAFFT-SP
```
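This run enables default alignments (`--default_align=true`) alongside regressive ones and leaves standard alignments off. A hypothetical helper can cross the inputs to estimate how many alignments a run should produce. It assumes that regressive and standard alignments are computed once per family, alignment method, and guide tree. Default alignments, which use the aligner's internal guide tree, are assumed to be computed once per family and alignment method. The actual task structure is defined in `main.nf`, and the family names are placeholders taken from the log tags.

```python
from itertools import product


def planned_alignments(families, align_methods, tree_methods,
                       regressive=True, standard=False, default=False):
    """List (mode, family, align_method, tree_method) tuples for one run.

    Assumption: regressive and standard alignments need an input guide tree,
    so they are crossed with every tree method; default alignments build
    their own tree internally, so their tree_method is None.
    """
    jobs = []
    for mode, enabled in (("regressive", regressive), ("standard", standard)):
        if enabled:
            jobs += [(mode, fam, aln, tree)
                     for fam, aln, tree in product(families, align_methods, tree_methods)]
    if default:
        jobs += [("default", fam, aln, None)
                 for fam, aln in product(families, align_methods)]
    return jobs


jobs = planned_alignments(["cys", "hpr"], ["UPP", "MAFFT-SPARSECORE"],
                          ["CLUSTALO", "MAFFT-FFTNS1", "MAFFT_PARTTREE"],
                          default=True)
print(len(jobs))  # 16: 12 regressive + 4 default
```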

```python
!nextflow run main.nf \
             --align_method='CLUSTALO,MAFFT-FFTNS1,MAFFT-GINSI' \
             --tree_method='CLUSTALO,MAFFT-FFTNS1,MAFFT_PARTTREE' \
             --refs='data/refs/*.ref' \
             --seqs='data/refs_fasta/*.ref' \
             --regressive_align=true \
             --standard_align=true \
             --default_align=false \
             --output results_reference \
             -profile crg \
             -with-singularity \
             -resume
```

```
N E X T F L O W  ~  version 0.32.0
Launching `main.nf` [berserk_rosalind] - revision: 1e21f76d69
R E G R E S S I V E   M S A   A n a l y s i s  ~  version 0.1"
Input sequences (FASTA)                        : data/refs_fasta/*.ref
Input references (Aligned FASTA)               : data/refs/*.ref
Input trees (NEWICK)                           : false
Output directory (DIRECTORY)                   : results_reference
Alignment methods                              : CLUSTALO,MAFFT-FFTNS1,MAFFT-GINSI
Tree methods                                   : CLUSTALO,MAFFT-FFTNS1,MAFFT_PARTTREE
Generate default alignments                    : false
Generate standard alignments                   : true
Generate regressive alignments (DPA)           : true
Bucket Sizes for regressive alignments         : 1000
Perform evaluation? Requires reference         : true
Output directory (DIRECTORY)                   : results_reference

[warm up] executor > crg
[90/a0e28a] Submitted process > guide_trees (cys.
```
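Each task line in the Nextflow output begins with a short hash prefix such as `[90/a0e28a]`, which is the start of that task's work directory path. Because every run here uses `-resume`, tasks whose results are reused are reported as `Cached process` rather than `Submitted process`. The sketch below parses such lines to count submitted and cached tasks per process. The regular expression is written against the log format shown in these cells. `work_dir_glob` assumes the default `work/` directory in the launch directory, although the `crg` profile may configure a different one.

```python
import re
from collections import Counter

# Matches e.g. "[90/a0e28a] Submitted process > guide_trees (cys.CLUSTALO)".
# The tag in parentheses is optional and may be truncated (no closing ")").
TASK_LINE = re.compile(
    r"^\[(?P<hash>[0-9a-f]{2}/[0-9a-f]{6})\]\s+"
    r"(?P<status>Submitted|Cached)\s+process\s+>\s+"
    r"(?P<process>\w+)"
    r"(?:\s+\((?P<tag>[^)]*)\)?)?"
)


def parse_log(lines):
    """Return one dict (hash, status, process, tag) per task line; skip the rest."""
    return [m.groupdict() for m in map(TASK_LINE.match, lines) if m]


def status_counts(tasks):
    """Count tasks per (process, status) pair."""
    return Counter((t["process"], t["status"]) for t in tasks)


def work_dir_glob(short_hash, work_root="work"):
    """Glob pattern for a task's work directory (assumes the default work root)."""
    return f"{work_root}/{short_hash}*"


log = """[warm up] executor > crg
[d2/aa0677] Cached process > guide_trees (hla.MAFFT-FFTNS1)
[90/a0e28a] Submitted process > guide_trees (cys.""".splitlines()
tasks = parse_log(log)
print(status_counts(tasks))
print(work_dir_glob(tasks[0]["hash"]))  # work/d2/aa0677*
```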

```python
!nextflow run main.nf \
             --align_method='UPP,MAFFT-SPARSECORE' \
             --tree_method='CLUSTALO,MAFFT-FFTNS1,MAFFT_PARTTREE' \
             --refs='data/refs/*.ref' \
             --seqs='data/refs_fasta/*.ref' \
             --regressive_align=true \
             --standard_align=false \
             --default_align=true \
             --output results_reference \
             -profile crg \
             -with-singularity \
             -resume
```

```
N E X T F L O W  ~  version 0.32.0
Launching `main.nf` [mad_mercator] - revision: 1e21f76d69
R E G R E S S I V E   M S A   A n a l y s i s  ~  version 0.1"
Input sequences (FASTA)                        : data/refs_fasta/*.ref
Input references (Aligned FASTA)               : data/refs/*.ref
Input trees (NEWICK)                           : false
Output directory (DIRECTORY)                   : results_reference
Alignment methods                              : UPP,MAFFT-SPARSECORE
Tree methods                                   : CLUSTALO,MAFFT-FFTNS1,MAFFT_PARTTREE
Generate default alignments                    : true
Generate standard alignments                   : false
Generate regressive alignments (DPA)           : true
Bucket Sizes for regressive alignments         : 1000
Perform evaluation? Requires reference         : true
Output directory (DIRECTORY)                   : results_reference

[warm up] executor > crg
[d2/aa0677] Cached process > guide_trees (hla.MAFFT-FFTNS1)
[9b/a5
```