Bioinformatics Challenges

This project solves common bioinformatics challenges using Python.

Many of the functions and scripts here are inspired by challenges on Rosalind.

bioinfo_toolbox.py

This script contains core bioinformatics functions. The module is standalone and is useful for basic bioinformatics tasks.

Many of the lower-level Rosalind challenges can be solved using this simple module.

The table below shows the available functions with descriptions:

Function	Description	Arguments
convert_phred()	Takes a Phred-encoded quality character and returns the corresponding quality score (Q score)	letter, val=33
qual_score()	Takes a string of quality scores and returns the average quality score	phred_score
validate_base_seq()	Takes a sequence and returns a bool indicating whether it is valid DNA or RNA	seq, RNAflag=False
gc_content()	Takes a DNA or RNA sequence and returns the proportion of the sequence that is 'G' or 'C'	seq
calc_median()	Takes a sorted numerical list and returns the median value from that list	sortedlist
oneline_fasta()	Takes a FASTA file and outputs a FASTA file where each sequence is only one line	filer, filew='oneline.fa'
rev_compliment()	Takes a sequence of DNA or RNA and returns the reverse complement sequence	seq
dna_to_aa()	Takes a sequence of DNA and returns a list of peptides encoded by that DNA sequence	seq
permutation_calc()	Takes n and r and returns number of permutations and an optional list of numeric permutations	n, r, perm_out=True
transition_transversion()	Takes two DNA sequences and returns the transition to transversion ratio R(s₁, s₂)	seq1, seq2
kmerize()	Takes a sequence and breaks it into k-mers of length k	seq, k
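
To make a few of the table's descriptions concrete, here is a minimal sketch of what three of these operations compute. It is an illustration only, not the code in bioinfo_toolbox.py: the `_sketch` function names are hypothetical, the Phred offset of 33 is taken from the `val=33` default listed above, and the reverse complement assumes a DNA sequence containing only A, C, G, and T.

```python
def convert_phred_sketch(letter, val=33):
    """Convert one Phred-encoded character to its quality score.

    The score is the character's ASCII code minus the encoding offset
    (33 is assumed here, matching the val=33 default in the table).
    """
    return ord(letter) - val


def gc_content_sketch(seq):
    """Return the fraction of bases in seq that are G or C (case-insensitive)."""
    seq = seq.upper()
    if not seq:
        return 0.0  # avoid dividing by zero on an empty sequence
    return (seq.count("G") + seq.count("C")) / len(seq)


def reverse_complement_sketch(seq):
    """Return the reverse complement of a DNA sequence (A, C, G, T only)."""
    complement = {"A": "T", "T": "A", "C": "G", "G": "C"}
    # Walk the sequence backwards and swap each base for its partner.
    return "".join(complement[base] for base in reversed(seq.upper()))


print(convert_phred_sketch("I"))                 # 40
print(gc_content_sketch("AGCTATAG"))             # 0.375
print(reverse_complement_sketch("AAAACCCGGT"))   # ACCGGGTTTT
```

The actual functions in the module may handle RNA, invalid characters, or other edge cases differently.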

Rosalind Challenges

Rosalind contains a large set of bioinformatic challenges that are free to attempt and learn from.

The rosalind folder contains scripts that solve some of Rosalind's challenges.

Script	Description	Problem Title
point_mutations.py	Calculates the Hamming distance between two sequences	Counting Point Mutations
open_reading_f.py	Finds all possible polypeptides from a DNA sequence	Open Reading Frames
restriction_sites.py	Locates restriction sites by finding reverse palindromes in DNA	Locating Restriction Sites
mRNA_poss.py	Calculates number of mRNA sequences a protein sequence could have been derived from	Inferring mRNA from Protein
shared_motif.py	Finds the longest shared motif in a DNA FASTA file	Finding a Shared Motif
rna_splicing.py	Takes a DNA sequence and its intron sequences, splices out the introns, and returns the resulting protein string	RNA Splicing
transition_transversion.py	Takes a FASTA file with two sequences and returns the transition to transversion ratio between the two	Transitions and Transversions
overlap_graphs.py	Takes a FASTA file and compares all sequences to find overlapping suffixes and prefixes	Overlap Graphs
fibonacci_recurrence.py	Given n (generations) and k (offspring per generation), returns the number of breeding pairs after n generations	Rabbits and Recurrence Relations
enumerating_kmers.py	From a string of letters, returns all combinations of length r and sorts lexicographically	Enumerating k-mers Lexicographically
kmer_composition.py	Takes a single sequence in FASTA format and returns the 4-mer composition of the sequence in lexicographic order	k-Mer Composition
consensus_seq.py	Takes a FASTA file of sequences that are all the same length and returns the consensus sequence and the nucleotide makeup at each position	Consensus and Profile
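
As a rough illustration of the kind of computation behind three of the scripts above, the sketch below works on plain strings and integers rather than FASTA files. It is not the repository's code: the `_sketch` function names are hypothetical, the sequences are assumed to be uppercase DNA of equal length, and the rabbit recurrence assumes one starting pair with each breeding pair producing k new pairs per generation.

```python
def hamming_distance_sketch(seq1, seq2):
    """Count the positions at which two equal-length sequences differ."""
    if len(seq1) != len(seq2):
        raise ValueError("sequences must be the same length")
    return sum(a != b for a, b in zip(seq1, seq2))


def ts_tv_ratio_sketch(seq1, seq2):
    """Return the transition to transversion ratio between two DNA sequences.

    A transition swaps purine for purine (A<->G) or pyrimidine for
    pyrimidine (C<->T); any other mismatch is a transversion.
    Raises ZeroDivisionError if there are no transversions.
    """
    purines = {"A", "G"}
    transitions = 0
    transversions = 0
    for a, b in zip(seq1, seq2):
        if a == b:
            continue
        if (a in purines) == (b in purines):
            transitions += 1
        else:
            transversions += 1
    return transitions / transversions


def rabbit_pairs_sketch(n, k):
    """Return the number of rabbit pairs after n generations.

    Uses the recurrence F(n) = F(n-1) + k * F(n-2) with F(1) = F(2) = 1.
    """
    prev, curr = 1, 1
    for _ in range(n - 2):
        prev, curr = curr, curr + k * prev
    return curr


print(hamming_distance_sketch("GAGCCTACTAACGGGAT", "CATCGTAATGACGGCCT"))  # 7
print(ts_tv_ratio_sketch("AGCT", "GACA"))                                 # 2.0
print(rabbit_pairs_sketch(5, 3))                                          # 19
```

The scripts themselves read their input from files as described in the Usage section, so their interfaces will differ from these bare functions.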

Usage

To use bioinfo_toolbox.py or the Rosalind scripts, follow this brief guide.

Required Packages

sys
math
random

NOTE: All Rosalind scripts must be executed from the rosalind directory.

First clone the repository:

git clone https://github.com/ivango17/Bioinformatics_Challenges.git

Navigate to the debug directory:

cd Bioinformatics_Challenges/debug

Run the test script to ensure that bioinfo_toolbox.py is working properly:

./debug.py

or

python debug.py

To use this repository to solve Rosalind problems, navigate to the rosalind directory:

cd ../rosalind

Each script has a help option on how to input data from Rosalind:

./<script.py> -h

or

python <script.py> -h
