These are some of the Python functions I created in Prof. Rachel Trana's CS 327, Computational Methods in Biology, course at Northeastern Illinois University, in the Spring 2019 semester.
They implement various algorithms that have been historically useful in bioinformatics. The problems they solve are accessible on the Rosalind bioinformatics algorithms learning site. Each algorithm has a link to the relevant Rosalind page in its description.
Please do not steal these and submit them as your own work for the course.
Download the repository, then access the functions inside it from your own Python file using import statements. I wrote these in Python 2.7. As such, they're not guaranteed to work with other versions of Python.
To test the functions, you'll need a text input in the correct format. Please check the Rosalind page for each algorithm to see a sample of the expected input. You'll then need to parse that input into the parameter types each function expects. I have left the parsing code out of this repository.
Finds the alignment of two strings (of amino acids) that is highest-scoring according to the BLOSUM62 scoring matrix. Uses a dynamic programming algorithm to choose the highest-scoring path through a matrix of scores representing possible alignments.
Parameters: string (first string), string (second string)
Returns: string, string (the two input strings, with gaps inserted to form the highest-scoring alignment)
Rosalind page: http://rosalind.info/problems/ba5e/
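The dynamic programming idea described above can be sketched as follows. This is a standalone illustration, not the code from this repository. The name `global_alignment`, the `score` and `indel` parameters, and the `-` gap symbol are assumptions made for the example. To keep the sketch short, the default scoring function is a simple match/mismatch stand-in rather than BLOSUM62; a BLOSUM62 lookup could be passed in as `score` instead. The default indel penalty of 5 is the value used on the Rosalind page.

```python
def simple_score(a, b):
    # Stand-in scoring function (NOT BLOSUM62): +1 for a match, -1 otherwise.
    return 1 if a == b else -1


def global_alignment(v, w, score=simple_score, indel=5):
    """Return one highest-scoring global alignment of v and w as two strings."""
    n, m = len(v), len(w)
    # best[i][j] = best score for aligning v[:i] with w[:j]
    best = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        best[i][0] = -indel * i
    for j in range(1, m + 1):
        best[0][j] = -indel * j
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            best[i][j] = max(
                best[i - 1][j - 1] + score(v[i - 1], w[j - 1]),  # match/mismatch
                best[i - 1][j] - indel,                          # gap in w
                best[i][j - 1] - indel,                          # gap in v
            )
    # Walk back from the bottom-right corner to recover one optimal path.
    top, bottom = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and best[i][j] == best[i - 1][j - 1] + score(v[i - 1], w[j - 1]):
            top.append(v[i - 1])
            bottom.append(w[j - 1])
            i -= 1
            j -= 1
        elif i > 0 and best[i][j] == best[i - 1][j] - indel:
            top.append(v[i - 1])
            bottom.append("-")
            i -= 1
        else:
            top.append("-")
            bottom.append(w[j - 1])
            j -= 1
    return "".join(reversed(top)), "".join(reversed(bottom))
```

Both returned strings have the same length, and removing the `-` characters from each gives back the original inputs.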
Finds a longest common subsequence: the longest string of letters (nucleotides, in the original problem) that appear in the same order, though not necessarily contiguously, in both input strings.
Parameters: string (first string), string (second string)
Returns: string (longest common subsequence)
Imports: numpy
Rosalind page: http://rosalind.info/problems/ba5c/
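A minimal sketch of the usual dynamic programming approach to this problem is shown below. It is not the repository's implementation: it uses plain nested lists instead of numpy so that it has no dependencies, and the function name `longest_common_subsequence` is chosen for the example.

```python
def longest_common_subsequence(v, w):
    """Return one longest common subsequence of the strings v and w."""
    n, m = len(v), len(w)
    # length[i][j] = length of an LCS of v[:i] and w[:j]
    length = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            if v[i - 1] == w[j - 1]:
                length[i][j] = length[i - 1][j - 1] + 1
            else:
                length[i][j] = max(length[i - 1][j], length[i][j - 1])
    # Backtrack from the bottom-right corner, collecting matched letters.
    letters = []
    i, j = n, m
    while i > 0 and j > 0:
        if v[i - 1] == w[j - 1]:
            letters.append(v[i - 1])
            i -= 1
            j -= 1
        elif length[i - 1][j] >= length[i][j - 1]:
            i -= 1
        else:
            j -= 1
    return "".join(reversed(letters))
```

Two strings can have several longest common subsequences of the same length; this sketch returns one of them.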
Given a directed acyclic graph, output the nodes in topological order. In other words, start with any node that has no incoming edges. It becomes first in the order. Then, remove it and all of its outgoing edges. Repeat this process until all nodes have been removed. The resulting order is a topological order (a single graph may have more than one).
Takes a list of edges as a parameter. Each element in the list is formatted as follows, where 0 is the node the edge leaves and 1 is the node it enters:
'0 -> 1'
Parameters: list
Returns: list (numbered nodes in topological order)
Rosalind page: http://rosalind.info/problems/ba5n/
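The remove-a-source-node procedure described above can be sketched as follows. This is an illustrative version, not the repository's code. The name `topological_order` is an assumption, node labels are kept as strings, and each list element is assumed to hold exactly one edge in the `'0 -> 1'` form. Where the description picks any node with no incoming edges, this sketch takes them in first-in, first-out order so the result is reproducible.

```python
from collections import deque


def topological_order(edges):
    """Return the nodes of a directed acyclic graph in one topological order."""
    successors = {}
    indegree = {}
    for edge in edges:
        source, target = [part.strip() for part in edge.split("->")]
        successors.setdefault(source, []).append(target)
        successors.setdefault(target, [])
        indegree.setdefault(source, 0)
        indegree[target] = indegree.get(target, 0) + 1

    # Nodes with no incoming edges are ready to be placed in the order.
    ready = deque(sorted(node for node in indegree if indegree[node] == 0))
    order = []
    while ready:
        node = ready.popleft()
        order.append(node)
        # "Remove" the node by discounting each of its outgoing edges.
        for nxt in successors[node]:
            indegree[nxt] -= 1
            if indegree[nxt] == 0:
                ready.append(nxt)

    if len(order) != len(indegree):
        raise ValueError("graph contains a cycle, so no topological order exists")
    return order
```

For example, `topological_order(['1 -> 2', '2 -> 3', '4 -> 2', '5 -> 3'])` returns `['1', '4', '5', '2', '3']`, which is one of several valid orders for that graph.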
This is a tool for comparing two files. I use it to compare my algorithms' output with a sample output, to ensure correctness.
Parameters: string filename1, string filename2
Returns: string (multiple lines). Each line reports whether the corresponding line is identical in both files.
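A line-by-line comparison of this kind might look like the sketch below. The function names `compare_lines` and `compare_files` and the exact wording of the report are assumptions for illustration; the tool in this repository may format its output differently.

```python
def compare_lines(lines1, lines2):
    """Build a report with one line per compared pair of input lines."""
    report = []
    for i in range(max(len(lines1), len(lines2))):
        # A line that is missing from the shorter input counts as a mismatch.
        a = lines1[i] if i < len(lines1) else None
        b = lines2[i] if i < len(lines2) else None
        status = "match" if a == b else "MISMATCH"
        report.append("line %d: %s" % (i + 1, status))
    return "\n".join(report)


def compare_files(filename1, filename2):
    """Read both files and compare them line by line."""
    with open(filename1) as f1:
        lines1 = f1.read().splitlines()
    with open(filename2) as f2:
        lines2 = f2.read().splitlines()
    return compare_lines(lines1, lines2)
```

Splitting the comparison from the file reading keeps the core logic easy to check without creating files.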
Takes a given string of nucleotides, reverses it, then returns the complement (replacing all A nucleotides with T, G with C, and vice versa).
Parameters: string
Returns: string
Rosalind page: http://rosalind.info/problems/ba1c/
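As a short illustration of the description above (a sketch, not the repository's code; the name `reverse_complement` is assumed), the reversal and the base substitution can be done in one pass:

```python
# Each nucleotide maps to its complementary base.
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}


def reverse_complement(pattern):
    """Return the reverse complement of a DNA string."""
    return "".join(COMPLEMENT[base] for base in reversed(pattern))
```

For example, `reverse_complement("AAAACCCGGT")` returns `"ACCGGGTTTT"`. A character other than A, C, G or T raises a `KeyError` in this sketch.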
Converts a pattern of DNA nucleotides to a base 10 integer representation.
Parameters: string (a given k-mer)
Returns: integer (base 10 index of given k-mer)
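One way to picture this conversion is to read the k-mer as a base 4 number. The sketch below assumes the common convention A = 0, C = 1, G = 2, T = 3, which may or may not match the ordering used in this repository, and the name `pattern_to_number` is chosen for the example.

```python
# Assumed digit assignment: lexicographic order of the nucleotides.
SYMBOL_TO_DIGIT = {"A": 0, "C": 1, "G": 2, "T": 3}


def pattern_to_number(pattern):
    """Interpret a k-mer as a base 4 number and return it as an integer."""
    number = 0
    for symbol in pattern:
        # Shift the digits read so far one place left, then add the new digit.
        number = 4 * number + SYMBOL_TO_DIGIT[symbol]
    return number
```

Under that convention, `pattern_to_number("ATGCAA")` returns `912`.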
Converts a base 10 integer representation of a DNA nucleotide pattern into a string representation.
Parameters: integer (base 10 index of given k-mer), integer (value of k)
Returns: string
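The reverse conversion can be sketched by repeatedly dividing by 4 and reading off the remainders. As before, this assumes the convention A = 0, C = 1, G = 2, T = 3 and a hypothetical function name, `number_to_pattern`; it is not the repository's code.

```python
def number_to_pattern(index, k):
    """Return the k-mer whose base 4 value is index."""
    symbols = []
    for _ in range(k):
        # The remainder is the last base 4 digit; the quotient holds the rest.
        index, digit = divmod(index, 4)
        symbols.append("ACGT"[digit])
    # Digits were produced from last to first, so reverse them.
    return "".join(reversed(symbols))
```

The value of k is needed because leading A nucleotides correspond to leading zeros: `number_to_pattern(5437, 8)` returns `"ACCCATTC"`, while `number_to_pattern(5437, 7)` returns `"CCCATTC"`.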
Uses the Sorted-Faster-Frequent-Words Algorithm. Returns the k-mer or k-mers (that is, substrings of length k, given a value for k) that occur most frequently in a nucleotide string.
Parameters: string (nucleotide string), integer (value of k)
Returns: list (most frequent k-mer(s))
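A sorting-based approach to this problem can be sketched as follows: convert every k-mer in the text to its integer index, sort the indices so that equal k-mers sit next to each other, then count the length of each run. This is an illustration under assumptions, not the repository's implementation. The helper names are hypothetical, and the A = 0, C = 1, G = 2, T = 3 encoding is assumed.

```python
def _to_number(pattern):
    # Read the k-mer as a base 4 number (assumed order: A, C, G, T).
    number = 0
    for symbol in pattern:
        number = 4 * number + "ACGT".index(symbol)
    return number


def _to_pattern(index, k):
    # Inverse of _to_number for a k-mer of known length k.
    symbols = []
    for _ in range(k):
        index, digit = divmod(index, 4)
        symbols.append("ACGT"[digit])
    return "".join(reversed(symbols))


def frequent_words_by_sorting(text, k):
    """Return the most frequent k-mer(s) in text."""
    indices = sorted(_to_number(text[i:i + k]) for i in range(len(text) - k + 1))
    if not indices:
        return []
    # counts[i] = how many times indices[i] has appeared in its run so far.
    counts = [1] * len(indices)
    for i in range(1, len(indices)):
        if indices[i] == indices[i - 1]:
            counts[i] = counts[i - 1] + 1
    max_count = max(counts)
    # The maximum is reached only at the last element of a longest run.
    return [_to_pattern(indices[i], k)
            for i in range(len(indices)) if counts[i] == max_count]
```

For the text `ACGTTGCATGTCGCATGATGCATGAGAGCT` with k = 4, this returns `['CATG', 'GCAT']`.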