## FastaAligner
Class for aligning a directory of FASTA files and generating a phylogenetic tree using Parsnp.
This class expects FASTA files with one sequence per file and will raise
an error if given a multi-FASTA file.
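
To check in advance whether a file holds a single sequence, you can count its header lines. The snippet below is a minimal sketch, not the check that FastaAligner itself performs: count_fasta_records is a hypothetical helper, and it assumes every record starts with a line beginning with '>'.

```python
import io


def count_fasta_records(handle):
    """Count FASTA records by counting header lines (lines starting with '>').

    Hypothetical helper for illustration; not part of transmission_toolkit.
    """
    return sum(1 for line in handle if line.startswith(">"))


# In-memory examples stand in for real files here.
single = io.StringIO(">genome_a\nATGC\nGGTA\n")
multi = io.StringIO(">genome_a\nATGC\n>genome_b\nATGA\n")

print(count_fasta_records(single))  # one header line
print(count_fasta_records(multi))   # two header lines
```

With a real file, pass an open file handle instead of the io.StringIO objects. A count greater than one indicates a multi-FASTA file.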

This feature can be imported in the following way:

In [2]:
from transmission_toolkit.FASTAtools import FastaAligner

__Aligning a directory of genomes__

Align sequences with the align() method. This method uses Parsnp to align the sequences and to generate a phylogenetic tree in Newick format.

In [None]:
fasta = 'path/to/fasta'
reference = 'path/to/reference/genome'

sequence = FastaAligner(fasta)
sequence.align(reference, output_dir='output_directory', threads=2)

Note that the align() method accepts the keyword arguments output_dir and threads, which specify the output directory and the number of threads to use for the alignment and phylogeny steps.
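
For orientation, a direct Parsnp call with the same inputs might look roughly like the command built below. This is only a sketch: how FastaAligner actually invokes Parsnp is not shown in this tutorial, build_parsnp_command is a hypothetical helper, and the mapping of the align() arguments onto Parsnp's -r (reference), -d (genome directory), -o (output directory), and -p (threads) options is an assumption.

```python
import shlex


def build_parsnp_command(genome_dir, reference, output_dir, threads=1):
    """Build an argument list for a Parsnp run (hypothetical helper).

    Assumes the parsnp executable is on PATH; nothing is executed here.
    """
    return [
        "parsnp",
        "-r", reference,     # reference genome
        "-d", genome_dir,    # directory of genomes to align
        "-o", output_dir,    # output directory
        "-p", str(threads),  # number of threads
    ]


cmd = build_parsnp_command(
    "path/to/fasta",
    "path/to/reference/genome",
    "output_directory",
    threads=2,
)
print(shlex.join(cmd))  # shell-quoted form of the command, for inspection
```

Building the command as a list keeps each argument separate, so it could be passed to subprocess.run without shell quoting concerns.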

## MultiFastaParser
This class parses a file in multi-FASTA format and makes it easier to access records in the file.

This feature can be imported in the following way:

In [3]:
from transmission_toolkit.FASTAtools import MultiFastaParser

__Accessing records in a multi-FASTA file__

MultiFastaParser parses the multi-FASTA file for you and exposes the records through its records attribute. The records are always in the order in which they appear in the multi-FASTA file.

In [4]:
mfasta = MultiFastaParser('tutorial_seqs/parsnp/parsnp.mfa')

In [5]:
for record in mfasta.records:
    print(record.name)

sequence.fasta.ref
record1.fna
record10.fna
record3.fna
record2.fna
record5.fna
record4.fna
record6.fna
record7.fna
record8.fna
record9.fna
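
To illustrate what order-preserving parsing involves, here is a minimal sketch of a multi-FASTA reader. It is not MultiFastaParser's actual implementation: Record and parse_multi_fasta are hypothetical names, and the sketch assumes that a record's name is the full header line after '>'.

```python
import io
from collections import namedtuple

# Hypothetical stand-in for the toolkit's record objects.
Record = namedtuple("Record", ["name", "seq"])


def parse_multi_fasta(handle):
    """Return records in the order in which they appear in the file."""
    records = []
    name, chunks = None, []
    for line in handle:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            # A new header closes the previous record, if there was one.
            if name is not None:
                records.append(Record(name, "".join(chunks)))
            name, chunks = line[1:], []
        elif name is not None:
            # Sequence lines are joined into one string per record.
            chunks.append(line)
    if name is not None:
        records.append(Record(name, "".join(chunks)))
    return records


example = io.StringIO(">first\nATTA\nAAGG\n>second\nTTTA\n")
for record in parse_multi_fasta(example):
    print(record.name, record.seq)
```

Because records are appended as their headers are encountered, the returned list mirrors the file order without any sorting step.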


In [7]:
# Picking first record in multi-FASTA
first_record = mfasta.records[0]

# Accessing the record's sequence
first_record.seq

'ATTAAAGGTTTATACCTTCCCAGGTAACAAACCAACCAACTTTCGATCTCTTGTAGATCTGTTCTCTAAACGAACTTTAAAATCTGTGTGGCTGTCACTCGGCTGCATGCTTAGTGCACTCACGCAGTATAATTAATAACTAATTACTGTCGTTGACAGGACACGAGTAACTCGTCTATCTTCTGCAGGCTGCTTACGGTTTCGTCCGTGTTGCAGCCGATCATCAGCACATCTAGGTTTCGTCCGGGTGTGACCGAAAGGTAAGATGGAGAGCCTTGTCCCTGGTTTCAACGAGAAAACACACGTCCAACTCAGTTTGCCTGTTTTACAGGTTCGCGACGTGCTCGTACGTGGCTTTGGAGACTCCGTGGAGGAGGTCTTATCAGAGGCACGTCAACATCTTAAAGATGGCACTTGTGGCTTAGTAGAAGTTGAAAAAGGCGTTTTGCCTCAACTTGAACAGCCCTATGTGTTCATCAAACGTTCGGATGCTCGAACTGCACCTCATGGTCATGTTATGGTTGAGCTGGTAGCAGAACTCGAAGGCATTCAGTACGGTCGTAGTGGTGAGACACTTGGTGTCCTTGTCCCTCATGTGGGCGAAATACCAGTGGCTTACCGCAAGGTTCTTCTTCGTAAGAACGGTAATAAAGGAGCTGGTGGCCATAGTTACGGCGCCGATCTAAAGTCATTTGACTTAGGCGACGAGCTTGGCACTGATCCTTATGAAGATTTTCAAGAAAACTGGAACACTAAACATAGCAGTGGTGTTACCCGTGAACTCATGCGTGAGCTTAACGGAGGGGCATACACTCGCTATGTCGATAACAACTTCTGTGGCCCTGATGGCTACCCTCTTGAGTGCATTAAAGACCTTCTAGCACGTGCTGGTAAAGCTTCATGCACTTTGTCCGAACAACTGGACTTTATTGACACTAAGAGGGGTGTATACTGCTGCCGTGAACATGAGCATGAAATTGCTTGGTACACGGAACGTTC

__Tagging an ID to a record__

Sometimes it is easier to work with an ID than with a record's filename. The code below uses set_id() to tag each record with an ID based on the order in which it appears in the multi-FASTA file.

In [8]:
for idx, record in enumerate(mfasta.records):
    record.set_id(idx)
    print(record.id)

0
1
2
3
4
5
6
7
8
9
10


__Grouping records with identical sequences__

To group records by sequence, use the get_groups() method of this class. It returns a list of sets, with each set containing the names of records that share an identical sequence.

In [10]:
mfasta.get_groups()

[{'record9.fna', 'sequence.fasta.ref'},
 {'record1.fna', 'record2.fna'},
 {'record10.fna'},
 {'record3.fna'},
 {'record5.fna'},
 {'record4.fna', 'record6.fna'},
 {'record7.fna'},
 {'record8.fna'}]
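
The grouping idea can be sketched with a dictionary keyed by sequence. This is an illustration under stated assumptions rather than the code behind get_groups(): group_identical is a hypothetical helper, it takes plain (name, sequence) pairs, and it treats sequences as identical only when the strings match exactly (no case or gap normalization).

```python
def group_identical(records):
    """Group record names by identical sequence.

    records: iterable of (name, sequence) pairs.
    Returns a list of sets, ordered by the first appearance of each sequence.
    Hypothetical helper for illustration only.
    """
    groups = {}
    for name, seq in records:
        # Records with the same sequence string land in the same set.
        groups.setdefault(seq, set()).add(name)
    return list(groups.values())


records = [
    ("ref", "ATGC"),
    ("sample1", "ATGA"),
    ("sample2", "ATGC"),
]
print(group_identical(records))
```

Using the sequence as a dictionary key makes the grouping a single pass over the records, and Python dictionaries keep insertion order, so groups come out in the order their sequences first appear.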

__Inferring phylogenies with RAxML__

To generate a phylogenetic tree using RAxML, use the infer_phylogeny() method. The following code runs RAxML with default arguments, specifying only the number of threads and the output directory.

In [None]:
mfasta.infer_phylogeny(output_dir='output_directory', threads=2)

However, if you want RAxML to run a specific command, you can pass it via the custom_cmd keyword argument.

In [None]:
mfasta.infer_phylogeny(custom_cmd='This is a custom RAxML command')