Repository dedicated to the software project on Bioinformatics in Python of the course Advanced Programming in Bioinformatics.
The "Bioinformatics with Python" suite was developed to show the potential of Python solving Bioinformatics problems with a couple of lines of code.
This suite contains 7 modules with the most basic functions used in Bioinformatics.
This module provides the function generate_random_dna()
which, given a desired length, returns a random dna string.
Example:
>>> generate_random_dna(10)
'ACGACGTACG'
This module computes the 'G' and 'C' nucleotides percentage in a DNA sequence with gc_content()
function.
Example:
>>> seq = 'ACGT'
>>> gc_content(seq)
0.5
This modules contains the reverse_complement()
function, which, given a DNA sequence, returns its reverse complementary sequence.
Example:
>>> seq = 'GTCATCCG'
>>> revese_complement()
'CGGATGAC'
The term 'hamming distance' is an indicator that shows the number of positions where two sequences are different, one of the implementations is as follows: initialize a counter, run through the sequences and count the times when the bases do not match.
This module provides the hamming_dist()
function, which computes the number of positions in which two sequences differ.
Example:
>>> lhs = 'ACGT'
>>> rhs = 'TCGA'
>>> hamming_dist(lhs, rhs)
2
The k-numbers are subsequences of length k within a sequence.
This module provides the k_mer()
function, which, given a DNA sequence and a k parameter, returns all existing k-mers that contains that sequence.
Example:
>>> seq = 'ACGT'
>>> k_mer(seq, 3)
['ACG','CGT']
Sequence motifs are short, recurring patterns in a DNA sequence that may have a specific biological function.
This module provides the motifs
function which, given a DNA sequence, k parameter and number of results (optional) reutrns the number of occurrences of the different k-mer in the sequence.
Example:
>>> seq = 'ACGACTAC'
>>> k = 2
>>> n_results = 4
>>> motifs(seq, k, n_results)
[('AC', 3), ('CG', 1), ('GA', 1), ('CT', 1)]
A very traditional problem in bioinformatics is the calculation of rabbit populations, which follow a function derived from the Fibonacci sequence.
This module provides the fibonacci_rabbits()
function, which, given a month, returns the number of pairs of rabbits that will exist after that time.
>>> fibonacci_rabbits(20)
10946