# Biopython projects

In the coming days I will be learning some useful Biopython features for performing different bioinformatics tasks. If you want to learn more, I encourage you to read the Biopython **Tutorial and Cookbook** written by Jeff Chang, Brad Chapman, Iddo Friedberg, Thomas Hamelryck, Michiel de Hoon, Peter Cock, Tiago Antao, Eric Talevich and Bartek Wilczynski (December 2019). I learnt most of what I show you in this project by reading it.

```python
import Bio
print(Bio.__version__)
```

```
1.77
```


I think one of the simplest examples of Biopython functionality is counting the amino-acid frequencies in a protein such as TRPA1.

In short, TRPA1 is a polymodal sensor that detects danger signals. It is a promiscuous chemical nocisensor that seems to be involved in the sensation of noxious cold and mechanical stimuli, itching and other physiological processes. It is a non-selective cation channel belonging to the TRP channel family, and it is associated with pathological states such as chemotherapy-induced neuropathy (caused by oxaliplatin and paclitaxel), osteoarthritis and postoperative pain.

Its amino-acid composition can be extracted easily:

```python
from Bio.Seq import Seq
import requests

# We get the FASTA sequence from UniProt (the UniProt accession for human TRPA1 is O75762)
response = requests.get("https://www.uniprot.org/uniprot/O75762.fasta")
response.raise_for_status()  # stop early if the download failed
TRPA1_fasta = response.text.strip()
print("This is the Uniprot fasta file for TRPA1:\n{}\n".format(TRPA1_fasta))

# We split the entry by newline characters
TRPA1_split = TRPA1_fasta.split("\n")

# We discard the header line and join the TRPA1 sequence lines into a single string
TRPA1_seq = "".join(TRPA1_split[1:])
print("This is TRPA1 sequence:\n{}".format(TRPA1_seq))

# Bio.Alphabet (and IUPAC.protein) was removed in Biopython 1.78, so we build
# the Seq object from the string alone; this also works in Biopython 1.77
TRPA1 = Seq(TRPA1_seq)

print("\nNow let's analyze TRPA1 amino-acid composition.")
# Percentage of each of the 20 standard amino acids, rounded to one decimal
for aminoacid in "ACDEFGHIKLMNPQRSTVWY":
    print("There is {}% of {} in TRPA1 sequence.".format(
        round(100 * TRPA1.count(aminoacid) / len(TRPA1), 1), aminoacid))
```

```
This is the Uniprot fasta file for TRPA1:
>sp|O75762|TRPA1_HUMAN Transient receptor potential cation channel subfamily A member 1 OS=Homo sapiens OX=9606 GN=TRPA1 PE=1 SV=3
MKRSLRKMWRPGEKKEPQGVVYEDVPDDTEDFKESLKVVFEGSAYGLQNFNKQKKLKRCD
DMDTFFLHYAAAEGQIELMEKITRDSSLEVLHEMDDYGNTPLHCAVEKNQIESVKFLLSR
GANPNLRNFNMMAPLHIAVQGMNNEVMKVLLEHRTIDVNLEGENGNTAVIIACTTNNSEA
LQILLKKGAKPCKSNKWGCFPIHQAAFSGSKECMEIILRFGEEHGYSRQLHINFMNNGKA
TPLHLAVQNGDLEMIKMCLDNGAQIDPVEKGRCTAIHFAATQGATEIVKLMISSYSGSVD
IVNTTDGCHETMLHRASLFDHHELADYLISVGADINKIDSEGRSPLILATASASWNIVNL
LLSKGAQVDIKDNFGRNFLHLTVQQPYGLKNLRPEFMQMQQIKELVMDEDNDGCTPLHYA
CRQGGPGSVNNLLGFNVSIHSKSKDKKSPLHFAASYGRINTCQRLLQDISDTRLLNEGDL
HGMTPLHLAAKNGHDKVVQLLLKKGALFLSDHNGWTALHHASMGGYTQTMKVILDTNLKC
TDRLDEDGNTALHFAAREGHAKAVALLLSHNADIVLNKQQASFLHLALHNKRKEVVLTII
RSKRWDECLKIFSHNSPGNKCPITEMIEYLPECMKVLLDFCMLHSTEDKSCRDYYIEYNF
KYLQCPLEFTKKTPTQDVIYEPLTALNAMVQNNRIELLNHPVCKEYLLMKWLAYGFRAHM
MNLGSYCLGLIPMTILVVNIKPGMAFNSTGIINETSDHSEILDTTNSYLIKTCMILVFLS
SIFGYCKEAGQIFQQKRNYFMDISNVLEWIIYTT
```
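Splitting on newlines and dropping the first line works for a single-record file like this one. A minimal plain-Python sketch that also copes with multi-record FASTA text might look like the following; `parse_fasta` is a hypothetical helper name, and the two short records are made-up placeholders, not real UniProt entries. In practice, Biopython's own `Bio.SeqIO` module is the usual tool for reading FASTA files.

```python
def parse_fasta(text):
    """Return a dict mapping each FASTA header (without '>') to its sequence."""
    records = {}
    header = None
    chunks = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            # A new header closes the previous record, if there is one
            if header is not None:
                records[header] = "".join(chunks)
            header = line[1:]
            chunks = []
        elif header is not None:
            # Sequence lines before the first header are ignored
            chunks.append(line)
    if header is not None:
        records[header] = "".join(chunks)
    return records


# Made-up example with two records; the first one spans two lines
example = """>seq1 made-up example
ACDEFGHIKL
MNPQRSTVWY
>seq2 another made-up example
GGGG
"""
for name, seq in parse_fasta(example).items():
    print(name, len(seq), seq)
```

Because every sequence line is joined into one string per header, the same function works whether the file wraps sequences at 60 characters (as UniProt does) or not.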

We see that the most abundant amino acids are leucine and isoleucine (around 20% of TRPA1 residues combined). The abundance of these two non-polar amino acids makes sense: TRPA1 is a transmembrane channel, so part of the protein is embedded in the plasma membrane, whose inner core is hydrophobic.
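Rather than reading the most abundant residues off the printed list, we can rank them directly. Below is a minimal sketch using only the standard library's `collections.Counter`; `top_residues` is a hypothetical helper name, and the sequence in the usage line is a short made-up string, so substitute `TRPA1_seq` to rank the real protein.

```python
from collections import Counter


def top_residues(sequence, n=5):
    """Return the n most common residues as (residue, percentage) pairs."""
    counts = Counter(str(sequence).upper())
    total = sum(counts.values())
    return [(residue, round(100 * count / total, 1))
            for residue, count in counts.most_common(n)]


# Made-up example sequence; replace with TRPA1_seq for the real protein
print(top_residues("LLLLIIIKKA", n=3))
# → [('L', 40.0), ('I', 30.0), ('K', 20.0)]
```

`Counter.most_common` already returns the residues sorted by decreasing count, so no manual sorting is needed.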

We can also summarize TRPA1 composition by grouping its residues into hydrophobic, polar, positively charged and negatively charged amino acids:

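```python
# Running percentages of hydrophobic, polar, negatively and positively charged residues
hydrophobic = polar = negative = positive = 0

for aminoacid in "ACDEFGHIKLMNPQRSTVWY":
    # Percentage of this residue in TRPA1, rounded to one decimal
    percentage = round(100 * TRPA1.count(aminoacid) / len(TRPA1), 1)
    if aminoacid in "AFGILMPVW":    # hydrophobic side chains (including Gly and Pro)
        hydrophobic += percentage
    elif aminoacid in "CNQSTY":     # polar, uncharged side chains
        polar += percentage
    elif aminoacid in "HKR":        # positively charged (basic) side chains
        positive += percentage
    elif aminoacid in "DE":         # negatively charged (acidic) side chains
        negative += percentage

# Each per-residue percentage is rounded before summing, so the four totals
# may not add up to exactly 100
print("There is {}% hydrophobic, {}% polar, {}% positively-charged and {}% \
negatively-charged amino-acids".format(hydrophobic, polar, positive, negative))
```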

```
There is 48.8% hydrophobic, 25.9% polar, 14.3% positively-charged and 11.1% negatively-charged amino-acids
```
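Because the loop above rounds each residue's percentage before adding it up, the four totals sum to 100.1% rather than 100%. A minimal plain-Python sketch that counts the residues in each group first and rounds only once per group keeps each group's rounding error below 0.05 percentage points. The grouping is the same one used above; `GROUPS` and `group_composition` are hypothetical names introduced for illustration.

```python
from collections import Counter

# Same residue classes as in the loop above
GROUPS = {
    "hydrophobic": "AFGILMPVW",
    "polar": "CNQSTY",
    "positive": "HKR",
    "negative": "DE",
}


def group_composition(sequence):
    """Return the percentage of residues in each group, rounded once per group."""
    counts = Counter(str(sequence).upper())
    total = len(sequence)
    return {group: round(100 * sum(counts[aa] for aa in residues) / total, 1)
            for group, residues in GROUPS.items()}


# Made-up example; pass TRPA1 (or TRPA1_seq) to analyze the real protein
print(group_composition("AAAACCCKKD"))
# → {'hydrophobic': 40.0, 'polar': 30.0, 'positive': 20.0, 'negative': 10.0}
```

Keeping the groups in a dictionary also makes it easy to try a different classification (for example, moving His out of the positive group) without touching the counting logic.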


Hydrophobic residues account for almost half of the protein (48.8%), making them by far the largest of the four groups.
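Composition alone does not tell us where along the chain the hydrophobic residues sit. A common way to look for candidate membrane-spanning stretches is a sliding-window hydropathy profile based on the Kyte-Doolittle scale. The sketch below is a minimal plain-Python version; the window of 19 residues and the 1.6 threshold are commonly cited defaults that we assume here, and `hydropathy_profile` and `hydrophobic_windows` are hypothetical names.

```python
# Kyte-Doolittle hydropathy values for the 20 standard amino acids
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydropathy_profile(sequence, window=19):
    """Average hydropathy of each window; entry i covers residues i..i+window-1.

    Raises KeyError on non-standard residues such as X or U.
    """
    values = [KYTE_DOOLITTLE[aa] for aa in str(sequence).upper()]
    return [sum(values[i:i + window]) / window
            for i in range(len(values) - window + 1)]


def hydrophobic_windows(sequence, window=19, threshold=1.6):
    """Return 1-based start positions of windows whose average exceeds threshold."""
    profile = hydropathy_profile(sequence, window)
    return [i + 1 for i, score in enumerate(profile) if score > threshold]


# Made-up test sequence: a charged stretch, 20 leucines, then another charged stretch
test_seq = "KDEKDEKDEK" + "L" * 20 + "KDEKDEKDEK"
print(hydrophobic_windows(test_seq))
```

Calling `hydrophobic_windows(TRPA1_seq)` would list windows that look membrane-embedded; this is only a rough heuristic, and runs of consecutive start positions usually correspond to a single hydrophobic segment.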

This simple task could easily have been done with plain Python, but I wanted to practice a little with the `Bio.Seq.Seq` object before moving on to more complex projects.