Analysis of the human (Homo sapiens) p53 tumor suppressor gene.



*   Install BioPython
*   Import p53 gene sequence in FASTA format
*   Count number of nucleotides
*   Count number of purines and pyrimidines
*   Calculate GC percentage in sequence










In [2]:
# Install biopython
!pip install biopython

Collecting biopython
  Downloading biopython-1.79-cp37-cp37m-manylinux_2_5_x86_64.manylinux1_x86_64.whl (2.3 MB)
Installing collected packages: biopython
Successfully installed biopython-1.79


In [3]:
# Import Biopython and its sequence module
import Bio
from Bio import Seq

In [4]:
# import Codon table
from Bio.Data import CodonTable

In [5]:
# print codon table
print(CodonTable.unambiguous_dna_by_name['Standard'])

Table 1 Standard, SGC0

  |  T      |  C      |  A      |  G      |
--+---------+---------+---------+---------+--
T | TTT F   | TCT S   | TAT Y   | TGT C   | T
T | TTC F   | TCC S   | TAC Y   | TGC C   | C
T | TTA L   | TCA S   | TAA Stop| TGA Stop| A
T | TTG L(s)| TCG S   | TAG Stop| TGG W   | G
--+---------+---------+---------+---------+--
C | CTT L   | CCT P   | CAT H   | CGT R   | T
C | CTC L   | CCC P   | CAC H   | CGC R   | C
C | CTA L   | CCA P   | CAA Q   | CGA R   | A
C | CTG L(s)| CCG P   | CAG Q   | CGG R   | G
--+---------+---------+---------+---------+--
A | ATT I   | ACT T   | AAT N   | AGT S   | T
A | ATC I   | ACC T   | AAC N   | AGC S   | C
A | ATA I   | ACA T   | AAA K   | AGA R   | A
A | ATG M(s)| ACG T   | AAG K   | AGG R   | G
--+---------+---------+---------+---------+--
G | GTT V   | GCT A   | GAT D   | GGT G   | T
G | GTC V   | GCC A   | GAC D   | GGC G   | C
G | GTA V   | GCA A   | GAA E   | GGA G   | A
G | GTG V   | GCG A   | GAG E   | GGG G   | G
--+---------+---------+---------+---------+--
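
The table above can also be explored programmatically. The following is a minimal plain-Python sketch, not Biopython's own implementation: it rebuilds the standard genetic code in the same T, C, A, G order as the printed table and groups codons by amino acid. The names `codon_to_aa`, `aa_to_codons` and `START_CODONS` are illustrative assumptions; the start codons are simply the entries marked `(s)` in the printed table.

```python
from collections import defaultdict

# Standard genetic code, listed in the same T, C, A, G order as the printed table.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"

codons = [a + b + c for a in BASES for b in BASES for c in BASES]
codon_to_aa = dict(zip(codons, AMINO_ACIDS))

# Group codons by the amino acid they encode ('*' marks a stop codon).
aa_to_codons = defaultdict(list)
for codon, aa in codon_to_aa.items():
    aa_to_codons[aa].append(codon)

# Codons flagged "(s)" in the printed table may also act as start codons.
START_CODONS = ["TTG", "CTG", "ATG"]

print("Stop codons:", aa_to_codons["*"])
print("Leucine codons:", aa_to_codons["L"])
for codon in START_CODONS:
    print(codon, "encodes", codon_to_aa[codon], "and may initiate translation")
```

Grouping the codons this way makes the redundancy of the code visible: leucine is encoded by six codons, while methionine and tryptophan each have only one.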

Import p53 gene sequence.

---


Sequence taken from NCBI Gene Database, in FASTA format



In [6]:
# Homo sapiens tumor protein p53 (TP53), downloaded from the NCBI Gene database
# as a FASTA file; SeqIO is used to load the file
from Bio import SeqIO

In [7]:
for record in SeqIO.parse("p53-sequence.fasta",'fasta'):
  print(record)

ID: NG_017013.2:5001-24149
Name: NG_017013.2:5001-24149
Description: NG_017013.2:5001-24149 Homo sapiens tumor protein p53 (TP53), RefSeqGene (LRG_321) on chromosome 17
Number of features: 0
Seq('GATGGGATTGGGGTTTTCCCCTCCCATGTGCTCAAGACTGGCGCTAAAAGTTTT...GTG')


In [8]:
# Read the single FASTA record into a variable
p53 = SeqIO.read("p53-sequence.fasta", "fasta")
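
`SeqIO.parse` iterates over every record in a file, while `SeqIO.read` expects exactly one record. To illustrate what reading a FASTA file involves, here is a minimal plain-Python sketch. It is not how Biopython is implemented; `read_fasta` and `_make_record` are hypothetical helpers, and the toy input holds only the first 54 bases shown in the record output above, with a shortened header.

```python
from io import StringIO


def _make_record(header, chunks):
    """Build an (identifier, description, sequence) tuple from raw pieces."""
    identifier = header.split(maxsplit=1)[0] if header else ""
    return identifier, header, "".join(chunks).upper()


def read_fasta(handle):
    """Yield (identifier, description, sequence) for each FASTA record."""
    header, chunks = None, []
    for line in handle:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            # A new header closes the previous record, if there was one.
            if header is not None:
                yield _make_record(header, chunks)
            header, chunks = line[1:], []
        else:
            chunks.append(line)  # sequence lines are joined later
    if header is not None:
        yield _make_record(header, chunks)


# Toy input: only the first 54 bases of the record, split over two lines.
toy_fasta = StringIO(
    ">NG_017013.2:5001-24149 Homo sapiens tumor protein p53 (TP53)\n"
    "GATGGGATTGGGGTTTTCCCCTCCCATGTG\n"
    "CTCAAGACTGGCGCTAAAAGTTTT\n"
)
records = list(read_fasta(toy_fasta))
identifier, description, sequence = records[0]
print(identifier, len(sequence))
```

As in the Biopython record, the identifier is the first whitespace-delimited token of the header line, and the sequence lines are concatenated into one string.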

In [16]:
gene_p53=p53.seq

Finding length of the gene

In [19]:
# print length of the sequence
nucleotide_length=len(gene_p53)
print(f'The length of the gene is {nucleotide_length} nucleotides.')

The length of the gene is 19149 nucleotides.


Find GC content in the sequence

In [20]:
# count G and C nucleotides (gc_content holds a count, not a percentage)
gc_content = gene_p53.count('G')+gene_p53.count('C')
gc_content

9458

In [30]:
# GC percentage
gc_percentage=gc_content/nucleotide_length * 100
print("GC percentage is: %.2f%%"% gc_percentage)

GC percentage is: 49.39%
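
The same calculation can be wrapped in a small reusable function. This is a plain-Python sketch with one deliberate difference from the cell above, stated here as an assumption: it is case-insensitive and divides by the number of unambiguous bases (A, C, G, T) rather than by the full sequence length. For this sequence the two approaches agree, because every base is A, C, G or T. The name `gc_percentage_of` is hypothetical.

```python
def gc_percentage_of(sequence):
    """Return the percentage of G and C among the unambiguous bases A, C, G, T."""
    seq = str(sequence).upper()
    gc = seq.count("G") + seq.count("C")
    at = seq.count("A") + seq.count("T")
    total = gc + at
    if total == 0:
        raise ValueError("sequence contains no A, C, G or T")
    return 100 * gc / total


# Small check on a sequence with a known composition (4 of 8 bases are G or C).
print(f"{gc_percentage_of('GGCCAATT'):.2f}%")

# Reproduce the notebook's figure from the counts reported above.
print(f"{100 * 9458 / 19149:.2f}%")
```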


Find number of purines and pyrimidines

In [31]:
# number of purines (A and G)
purines=gene_p53.count('A')+gene_p53.count('G')
print("Number of purines is:", purines)

Number of purines is: 9782


In [32]:
# number of pyrimidines (T and C)
pyrimidines=gene_p53.count("T")+gene_p53.count("C")
print('Number of pyrimidines is:', pyrimidines)

Number of pyrimidines is: 9367


In [33]:
# Validate calculation
# gene length should equal purines + pyrimidines
# (this holds only if the sequence has no ambiguous bases such as N)
nucleotide_length==purines+pyrimidines

True
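
The check above passes because the sequence contains only A, C, G and T. A slightly more general sketch, assuming a hypothetical helper named `base_composition`, counts every symbol once with `collections.Counter` and reports anything that is neither a purine nor a pyrimidine. The input string is a toy example, not part of the p53 sequence.

```python
from collections import Counter


def base_composition(sequence):
    """Count purines, pyrimidines and any other symbols in a DNA sequence."""
    counts = Counter(str(sequence).upper())
    purines = counts["A"] + counts["G"]
    pyrimidines = counts["C"] + counts["T"]
    other = sum(counts.values()) - purines - pyrimidines
    return {"purines": purines, "pyrimidines": pyrimidines, "other": other}


# Toy sequence containing one ambiguous base (N).
composition = base_composition("GATTACAN")
print(composition)

# The three categories always add up to the sequence length.
print(sum(composition.values()) == len("GATTACAN"))
```

With this layout a non-zero `other` count explains immediately why purines plus pyrimidines would fall short of the sequence length.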

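RNA transcript and translation of the genomic sequence

The RefSeqGene record is genomic DNA, so the transcript below still contains introns and is not the mature mRNA.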

In [34]:
# Transcription: transcribe() only replaces T with U; introns are not removed
rna_p53=gene_p53.transcribe()
print(rna_p53)

GAUGGGAUUGGGGUUUUCCCCUCCCAUGUGCUCAAGACUGGCGCUAAAAGUUUUGAGCUUCUCAAAAGUCUAGAGCCACCGUCCAGGGAGCAGGUAGCUGCUGGGCUCCGGGGACACUUUGCGUUCGGGCUGGGAGCGUGCUUUCCACGACGGUGACACGCUUCCCUGGAUUGGGUAAGCUCCUGACUGAACUUGAUGAGUCCUCUCUGAGUCACGGGCUCUCGGCUCCGUGUAUUUUCAGCUCGGGAAAAUCGCUGGGGCUGGGGGUGGGGCAGUGGGGACUUAGCGAGUUUGGGGGUGAGUGGGAUGGAAGCUUGGCUAGAGGGAUCAUCAUAGGAGUUGCAUUGUUGGGAGACCUGGGUGUAGAUGAUGGGGAUGUUAGGACCAUCCGAACUCAAAGUUGAACGCCUAGGCAGAGGAGUGGAGCUUUGGGGAACCUUGAGCCGGCCUAAAGCGUACUUCUUUGCACAUCCACCCGGUGCUGGGCGUAGGGAAUCCCUGAAAUAAAAGAUGCACAAAGCAUUGAGGUCUGAGACUUUUGGAUCUCGAAACAUUGAGAACUCAUAGCUGUAUAUUUUAGAGCCCAUGGCAUCCUAGUGAAAACUGGGGCUCCAUUCCGAAAUGAUCAUUUGGGGGUGAUCCGGGGAGCCCAAGCUGCUAAGGUCCCACAACUUCCGGACCUUUGUCCUUCCUGGAGCGAUCUUUCCAGGCAGCCCCCGGCUCCGCUAGAUGGAGAAAAUCCAAUUGAAGGCUGUCAGUCGUGGAAGUGAGAAGUGCUAAACCAGGGGUUUGCCCGCCAGGCCGAGGAGGACCGUCGCAAUCUGAGAGGCCCGGCAGCCCUGUUAUUGUUUGGCUCCACAUUUACAUUUCUGCCUCUUGCAGCAGCAUUUCCGGUUUCUUUUUGCCGGAGCAGCUCACUAUUCACCCGAUGAGAGGGGAGGAGAGAGAGAGAAAAUGUCCUUUAGGCCGGUUCCUCUUACUUGGCAGAGGGAGGCUGCUA
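
`transcribe()` treats the sequence as the coding strand, so the RNA is the same string with T replaced by U. The sketch below shows that step in plain Python and, for comparison, how the same RNA would be derived from the template strand. Both function names are hypothetical, and the input is only the first 12 bases of the sequence.

```python
# Maps each DNA base to the RNA base that pairs with it.
DNA_TO_RNA_COMPLEMENT = str.maketrans("ACGT", "UGCA")


def transcribe_coding_strand(dna):
    """RNA with the same sequence as the coding strand (T replaced by U)."""
    return dna.upper().replace("T", "U")


def transcribe_template_strand(dna):
    """RNA synthesized from a template strand that is written 5' to 3'."""
    return dna.upper().translate(DNA_TO_RNA_COMPLEMENT)[::-1]


coding = "GATGGGATTGGG"    # first 12 bases of the p53 sequence
template = "CCCAATCCCATC"  # reverse complement of the coding strand

print(transcribe_coding_strand(coding))
print(transcribe_template_strand(template))
```

Both calls print the same RNA, which is why working directly from the coding strand is a convenient shortcut.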

In [37]:
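# Translation of the unspliced sequence in reading frame 1; '*' marks a stop codon,
# so the result is not the p53 protein sequence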
protein_p53=rna_p53.translate()
protein_p53

Seq('DGIGVFPSHVLKTGAKSFELLKSLEPPSREQVAAGLRGHFAFGLGACFPRR*HA...*GV')

In [38]:
# number of translated codons (19149 / 3), stop codons included
len(protein_p53)

6383
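
The length of 6383 is simply 19149 / 3: every codon of the genomic sequence is translated, and stop codons appear as `*`. The following plain-Python sketch illustrates frame-by-frame translation; it is not Biopython's implementation. The helpers `translate_frame` and `longest_open_segment` are hypothetical, mapping unknown codons to `X` is an assumption of this sketch, and the input is only the first 54 bases of the sequence.

```python
from itertools import product

# Standard genetic code in T, C, A, G order; '*' marks a stop codon.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate_frame(dna, frame=0):
    """Translate a DNA or RNA string codon by codon, starting at offset `frame`."""
    seq = dna.upper().replace("U", "T")
    codons = (seq[i:i + 3] for i in range(frame, len(seq) - 2, 3))
    # Codons with ambiguous bases are shown as 'X' (an assumption of this sketch).
    return "".join(CODON_TABLE.get(codon, "X") for codon in codons)


def longest_open_segment(protein):
    """Return the longest stretch of amino acids between stop codons."""
    return max(protein.split("*"), key=len)


first_54 = "GATGGGATTGGGGTTTTCCCCTCCCATGTGCTCAAGACTGGCGCTAAAAGTTTT"

# Compare the three forward reading frames of the same DNA.
for frame in range(3):
    print(frame, translate_frame(first_54, frame))

print(longest_open_segment(translate_frame("ATGAAATAAGGGCCCTTTTAG")))
```

Frame 0 reproduces the start of the translation shown above. Obtaining the actual p53 protein would instead require translating the spliced coding sequence from its start codon.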