# Biopython's CodonTable module

More information: https://biopython.org/DIST/docs/api/Bio.Data.CodonTable-module.html and https://biopython.org/docs/1.74/api/Bio.Data.CodonTable.html

The Bio.Data package holds reference data such as codon tables and the IUPAC nucleotide and protein alphabets.
Other Biopython features depend on it, such as the translate method of Seq and SeqRecord objects.

In [5]:
from Bio.Data import CodonTable
from Bio import SeqIO

In [6]:
record = SeqIO.read("Cebus_albifrons_NC_002763.1.gb", "genbank")
help(record.translate)  # help() prints the documentation itself; wrapping it in print() would only add "None"

Help on method translate in module Bio.SeqRecord:

translate(table='Standard', stop_symbol='*', to_stop=False, cds=False, gap=None, id=False, name=False, description=False, features=False, annotations=False, letter_annotations=False, dbxrefs=False) method of Bio.SeqRecord.SeqRecord instance
    Return new SeqRecord with translated sequence.
    
    This calls the record's .seq.translate() method (which describes
    the translation related arguments, like table for the genetic code),
    
    By default the new record does NOT preserve the sequence identifier,
    name, description, general annotation or database cross-references -
    these are unlikely to apply to the translated sequence.
    
    You can specify the returned record's id, name and description as
    strings, or True to keep that of the parent, or False for a default.
    
    You can specify the returned record's features with a list of
    SeqFeature objects, or False (default) to omit them.
    
    You can also s

As we can see, the translate method takes a table argument (default: 'Standard') that selects the genetic code used to convert a DNA sequence to protein. The available codes are based on the [NCBI genetic code tables](ftp://ftp.ncbi.nih.gov/entrez/misc/data/gc.prt). For instance, we could translate the complete CDS of the ND1 gene of ***Cebus albifrons*** (a primate) using the "Vertebrate Mitochondrial" table (table number 2):

In [7]:
for feature in record.features:
    if feature.qualifiers.get('gene') == ['ND1']:
        cds = feature.location.extract(record.seq)  # nucleotide sequence of the feature
        print(cds.translate(2))  # translate by table number
        print()
        print(cds.translate('Vertebrate Mitochondrial'))  # translate by table name
        break

MFMINLLLLITPALVAMAFLTLTERKILGYMQLRKGPNIVGPYGVLQPIADAMKLFTKEPLLPITSTTTLYMIAPTLALTISLLLWSPLPMPYSLVNFNLGLLFMLATSSLAVYSTLWSGWASNSNYALIGALRAVAQTISYEVTLAIILLSTLLMSGSFNLQSLITTQEQSWLLLPSWPLTMMWFISTLAETNRAPFDLTEGESELVSGFNIEYAAGSFALFFMAEYMNIIMMNALTTTIFTATSYNMITTELYTLNFTTKTLLLTTLFLWIRTAYPRFRYDQLMYLLWKKFLPLTLALCMWYISMPMLLSGIPPQT*

MFMINLLLLITPALVAMAFLTLTERKILGYMQLRKGPNIVGPYGVLQPIADAMKLFTKEPLLPITSTTTLYMIAPTLALTISLLLWSPLPMPYSLVNFNLGLLFMLATSSLAVYSTLWSGWASNSNYALIGALRAVAQTISYEVTLAIILLSTLLMSGSFNLQSLITTQEQSWLLLPSWPLTMMWFISTLAETNRAPFDLTEGESELVSGFNIEYAAGSFALFFMAEYMNIIMMNALTTTIFTATSYNMITTELYTLNFTTKTLLLTTLFLWIRTAYPRFRYDQLMYLLWKKFLPLTLALCMWYISMPMLLSGIPPQT*
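
To see why the choice of table matters, here is a minimal pure-Python sketch of codon-by-codon translation. It is not how Biopython implements translate; the two small dictionaries are hand-copied excerpts of NCBI tables 1 and 2, and the function name and the toy sequence are hypothetical.

```python
# Hand-copied excerpts of two NCBI genetic codes (assumption: only the
# codons needed for the toy sequence below are included).
STANDARD_EXCERPT = {"ATG": "M", "ATA": "I", "TGG": "W", "AGA": "R", "TTT": "F"}
STANDARD_STOPS = {"TAA", "TAG", "TGA"}

MITO_EXCERPT = {"ATG": "M", "ATA": "M", "TGG": "W", "TGA": "W", "TTT": "F"}
MITO_STOPS = {"TAA", "TAG", "AGA", "AGG"}


def translate_with_table(dna, forward_table, stop_codons, stop_symbol="*"):
    """Translate in-frame codons using a codon -> amino acid dictionary."""
    dna = dna.upper()
    protein = []
    # Walk the sequence three bases at a time, ignoring a trailing partial codon.
    for i in range(0, len(dna) - len(dna) % 3, 3):
        codon = dna[i:i + 3]
        if codon in stop_codons:
            protein.append(stop_symbol)
        else:
            protein.append(forward_table[codon])
    return "".join(protein)


# Hypothetical toy sequence (not from the GenBank record): ATG ATA TGA TTT AGA
toy = "ATGATATGATTTAGA"
print(translate_with_table(toy, STANDARD_EXCERPT, STANDARD_STOPS))  # MI*FR
print(translate_with_table(toy, MITO_EXCERPT, MITO_STOPS))          # MMWF*
```

The same fifteen bases give two different peptides: ATA, TGA and AGA are read differently by the two codes, which is why a mitochondrial gene such as ND1 should be translated with table 2.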


----

These tables (available for RNA or DNA, ambiguous or unambiguous) can be accessed by id or name through the CodonTable module from Bio.Data: 

In [8]:
dir(CodonTable)

['Alphabet',
 'AmbiguousCodonTable',
 'AmbiguousForwardTable',
 'CodonTable',
 'IUPAC',
 'IUPACData',
 'NCBICodonTable',
 'NCBICodonTableDNA',
 'NCBICodonTableRNA',
 'TranslationError',
 '__builtins__',
 '__cached__',
 '__doc__',
 '__file__',
 '__loader__',
 '__name__',
 '__package__',
 '__spec__',
 'ambiguous_dna_by_id',
 'ambiguous_dna_by_name',
 'ambiguous_generic_by_id',
 'ambiguous_generic_by_name',
 'ambiguous_rna_by_id',
 'ambiguous_rna_by_name',
 'generic_by_id',
 'generic_by_name',
 'list_ambiguous_codons',
 'list_possible_proteins',
 'make_back_table',
 'register_ncbi_table',
 'standard_dna_table',
 'standard_rna_table',
 'unambiguous_dna_by_id',
 'unambiguous_dna_by_name',
 'unambiguous_rna_by_id',
 'unambiguous_rna_by_name']

All available genetic tables can be seen by name or id:

In [9]:
print("GENETIC TABLES BY NAME:")
for i in CodonTable.unambiguous_dna_by_name:
    print(i)

print()
    
print("GENETIC TABLES BY ID:")
for i in CodonTable.unambiguous_dna_by_id:
    print(i)

GENETIC TABLES BY NAME:
Standard
SGC0
Vertebrate Mitochondrial
SGC1
Yeast Mitochondrial
SGC2
Mold Mitochondrial
Protozoan Mitochondrial
Coelenterate Mitochondrial
Mycoplasma
Spiroplasma
SGC3
Invertebrate Mitochondrial
SGC4
Ciliate Nuclear
Dasycladacean Nuclear
Hexamita Nuclear
SGC5
Echinoderm Mitochondrial
Flatworm Mitochondrial
SGC8
Euplotid Nuclear
SGC9
Bacterial
Archaeal
Plant Plastid
Alternative Yeast Nuclear
Ascidian Mitochondrial
Alternative Flatworm Mitochondrial
Blepharisma Macronuclear
Chlorophycean Mitochondrial
Trematode Mitochondrial
Scenedesmus obliquus Mitochondrial
Thraustochytrium Mitochondrial
Pterobranchia Mitochondrial
Candidate Division SR1
Gracilibacteria
Pachysolen tannophilus Nuclear
Karyorelict Nuclear
Condylostoma Nuclear
Mesodinium Nuclear
Peritrich Nuclear
Blastocrithidia Nuclear
Balanophoraceae Plastid

GENETIC TABLES BY ID:
1
2
3
4
5
6
9
10
11
12
13
14
15
16
21
22
23
24
25
26
27
28
29
30
31
32


Each table can be retrieved by name or id:

In [10]:
print(CodonTable.unambiguous_dna_by_name["SGC0"])
print()
print(CodonTable.unambiguous_dna_by_id[2])

Table 1 Standard, SGC0

  |  T      |  C      |  A      |  G      |
--+---------+---------+---------+---------+--
T | TTT F   | TCT S   | TAT Y   | TGT C   | T
T | TTC F   | TCC S   | TAC Y   | TGC C   | C
T | TTA L   | TCA S   | TAA Stop| TGA Stop| A
T | TTG L(s)| TCG S   | TAG Stop| TGG W   | G
--+---------+---------+---------+---------+--
C | CTT L   | CCT P   | CAT H   | CGT R   | T
C | CTC L   | CCC P   | CAC H   | CGC R   | C
C | CTA L   | CCA P   | CAA Q   | CGA R   | A
C | CTG L(s)| CCG P   | CAG Q   | CGG R   | G
--+---------+---------+---------+---------+--
A | ATT I   | ACT T   | AAT N   | AGT S   | T
A | ATC I   | ACC T   | AAC N   | AGC S   | C
A | ATA I   | ACA T   | AAA K   | AGA R   | A
A | ATG M(s)| ACG T   | AAG K   | AGG R   | G
--+---------+---------+---------+---------+--
G | GTT V   | GCT A   | GAT D   | GGT G   | T
G | GTC V   | GCC A   | GAC D   | GGC G   | C
G | GTA V   | GCA A   | GAA E   | GGA G   | A
G | GTG V   | GCG A   | GAG E   | GGG G   | G
--+---------

**Note:** SGC0 is an alternative name for the Standard genetic code (Table 1). Likewise, the Vertebrate Mitochondrial table (Table 2) is also called SGC1.

All tables and their numbers can be viewed [here](https://www.ncbi.nlm.nih.gov/Taxonomy/Utils/wprintgc.cgi).

----

Each table's codon-to-amino-acid mapping can be retrieved as a dictionary (codons as keys, amino acids as values), which is useful in scripts that, for example, count codons or calculate codon usage. Stop codons are not included in this dictionary.

In [14]:
print(CodonTable.unambiguous_dna_by_id[2].forward_table)

{'TTT': 'F', 'TTC': 'F', 'TTA': 'L', 'TTG': 'L', 'TCT': 'S', 'TCC': 'S', 'TCA': 'S', 'TCG': 'S', 'TAT': 'Y', 'TAC': 'Y', 'TGT': 'C', 'TGC': 'C', 'TGA': 'W', 'TGG': 'W', 'CTT': 'L', 'CTC': 'L', 'CTA': 'L', 'CTG': 'L', 'CCT': 'P', 'CCC': 'P', 'CCA': 'P', 'CCG': 'P', 'CAT': 'H', 'CAC': 'H', 'CAA': 'Q', 'CAG': 'Q', 'CGT': 'R', 'CGC': 'R', 'CGA': 'R', 'CGG': 'R', 'ATT': 'I', 'ATC': 'I', 'ATA': 'M', 'ATG': 'M', 'ACT': 'T', 'ACC': 'T', 'ACA': 'T', 'ACG': 'T', 'AAT': 'N', 'AAC': 'N', 'AAA': 'K', 'AAG': 'K', 'AGT': 'S', 'AGC': 'S', 'GTT': 'V', 'GTC': 'V', 'GTA': 'V', 'GTG': 'V', 'GCT': 'A', 'GCC': 'A', 'GCA': 'A', 'GCG': 'A', 'GAT': 'D', 'GAC': 'D', 'GAA': 'E', 'GAG': 'E', 'GGT': 'G', 'GGC': 'G', 'GGA': 'G', 'GGG': 'G'}
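
As a rough illustration of the codon-counting use case, the sketch below tallies codons and the amino acids they encode. It assumes a hard-coded excerpt of the dictionary printed above so that it runs without Biopython; with Biopython installed, the full `forward_table` could be passed instead. The function name and the toy sequence are hypothetical.

```python
from collections import Counter

# Excerpt of the table 2 forward_table printed above (hard-coded so the
# sketch is self-contained).
FORWARD_EXCERPT = {
    "ATG": "M", "ATA": "M", "TTT": "F", "TTC": "F",
    "CTA": "L", "CTT": "L", "TGA": "W", "TGG": "W",
}


def count_codons(dna, forward_table):
    """Return (codon counts, amino acid counts) for the in-frame codons."""
    dna = dna.upper()
    # Split into triplets, ignoring a trailing partial codon.
    codons = [dna[i:i + 3] for i in range(0, len(dna) - len(dna) % 3, 3)]
    codon_counts = Counter(codons)
    # Codons missing from the table (stop codons, ambiguous bases) are skipped.
    aa_counts = Counter(forward_table[c] for c in codons if c in forward_table)
    return codon_counts, aa_counts


# Hypothetical toy sequence: ATG TTT ATA CTA TGA TTT
codon_counts, aa_counts = count_codons("ATGTTTATACTATGATTT", FORWARD_EXCERPT)
print(codon_counts)
print(aa_counts)
```

Dividing each codon count by the total for its amino acid would give a simple relative codon usage.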


----

The RNA tables can be obtained in the same way:

In [15]:
print(CodonTable.unambiguous_rna_by_id[2].forward_table.items())

dict_items([('UUU', 'F'), ('UUC', 'F'), ('UUA', 'L'), ('UUG', 'L'), ('UCU', 'S'), ('UCC', 'S'), ('UCA', 'S'), ('UCG', 'S'), ('UAU', 'Y'), ('UAC', 'Y'), ('UGU', 'C'), ('UGC', 'C'), ('UGA', 'W'), ('UGG', 'W'), ('CUU', 'L'), ('CUC', 'L'), ('CUA', 'L'), ('CUG', 'L'), ('CCU', 'P'), ('CCC', 'P'), ('CCA', 'P'), ('CCG', 'P'), ('CAU', 'H'), ('CAC', 'H'), ('CAA', 'Q'), ('CAG', 'Q'), ('CGU', 'R'), ('CGC', 'R'), ('CGA', 'R'), ('CGG', 'R'), ('AUU', 'I'), ('AUC', 'I'), ('AUA', 'M'), ('AUG', 'M'), ('ACU', 'T'), ('ACC', 'T'), ('ACA', 'T'), ('ACG', 'T'), ('AAU', 'N'), ('AAC', 'N'), ('AAA', 'K'), ('AAG', 'K'), ('AGU', 'S'), ('AGC', 'S'), ('GUU', 'V'), ('GUC', 'V'), ('GUA', 'V'), ('GUG', 'V'), ('GCU', 'A'), ('GCC', 'A'), ('GCA', 'A'), ('GCG', 'A'), ('GAU', 'D'), ('GAC', 'D'), ('GAA', 'E'), ('GAG', 'E'), ('GGU', 'G'), ('GGC', 'G'), ('GGA', 'G'), ('GGG', 'G')])
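
Comparing the two outputs suggests that the RNA dictionary is the DNA dictionary with every T written as U. A small sketch of that relationship, using a hard-coded excerpt and a hypothetical helper name (Biopython ships both tables, so this conversion is not needed in practice):

```python
# Excerpt of the table 2 DNA forward_table shown earlier.
DNA_EXCERPT = {"TTT": "F", "TGA": "W", "ATA": "M", "ATG": "M"}


def dna_table_to_rna(forward_table):
    """Rewrite each DNA codon key in the RNA alphabet (T -> U)."""
    return {codon.replace("T", "U"): aa for codon, aa in forward_table.items()}


rna_excerpt = dna_table_to_rna(DNA_EXCERPT)
print(rna_excerpt)  # {'UUU': 'F', 'UGA': 'W', 'AUA': 'M', 'AUG': 'M'}
```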


The table objects provided by the CodonTable module are instances of subclasses of the 'CodonTable' class (here, NCBICodonTableDNA):

In [17]:
mito = CodonTable.unambiguous_dna_by_id[2]
print(type(mito))

<class 'Bio.Data.CodonTable.NCBICodonTableDNA'>


The attributes of these objects expand their usability:

In [18]:
print(dir(mito))

['__class__', '__delattr__', '__dict__', '__dir__', '__doc__', '__eq__', '__format__', '__ge__', '__getattribute__', '__gt__', '__hash__', '__init__', '__init_subclass__', '__le__', '__lt__', '__module__', '__ne__', '__new__', '__reduce__', '__reduce_ex__', '__repr__', '__setattr__', '__sizeof__', '__str__', '__subclasshook__', '__weakref__', 'back_table', 'forward_table', 'id', 'names', 'nucleotide_alphabet', 'protein_alphabet', 'start_codons', 'stop_codons']


You can, for instance, get a reverse table with amino acids as keys and a **single codon** as the value (a naive codon mapping; the stop signal is stored under the key None). A dictionary listing all codons for each amino acid would be more useful, but back_table does not provide one.

In [19]:
mito.back_table

{'K': 'AAG',
 'N': 'AAT',
 'T': 'ACT',
 'S': 'TCT',
 'M': 'ATG',
 'I': 'ATT',
 'Q': 'CAG',
 'H': 'CAT',
 'P': 'CCT',
 'R': 'CGT',
 'L': 'TTG',
 'E': 'GAG',
 'D': 'GAT',
 'A': 'GCT',
 'G': 'GGT',
 'V': 'GTT',
 'Y': 'TAT',
 'W': 'TGG',
 'C': 'TGT',
 'F': 'TTT',
 None: 'TAA'}
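
A dictionary with all synonymous codons per amino acid can be built by inverting the forward table. The sketch below is one possible way to do it, not part of Biopython; it uses a hard-coded excerpt of the table 2 forward_table so it runs on its own, and the function name is hypothetical. With Biopython installed, `mito.forward_table` could be passed instead.

```python
from collections import defaultdict

# Excerpt of the table 2 forward_table (codon -> amino acid).
FORWARD_EXCERPT = {
    "TTT": "F", "TTC": "F",
    "ATA": "M", "ATG": "M",
    "TGA": "W", "TGG": "W",
    "AAA": "K", "AAG": "K",
}


def full_back_table(forward_table):
    """Map each amino acid to the sorted list of all codons encoding it."""
    back = defaultdict(list)
    for codon, amino_acid in forward_table.items():
        back[amino_acid].append(codon)
    return {aa: sorted(codons) for aa, codons in back.items()}


synonymous = full_back_table(FORWARD_EXCERPT)
print(synonymous["M"])  # ['ATA', 'ATG']
print(synonymous["W"])  # ['TGA', 'TGG']
```

Since stop codons are absent from the forward table, they would have to be added separately (for example from the table's stop_codons list) if they are needed.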

The "forward_table" attribute, already used above, holds the default dictionary (codons as keys, amino acids as values):

In [20]:
mito.forward_table

{'TTT': 'F',
 'TTC': 'F',
 'TTA': 'L',
 'TTG': 'L',
 'TCT': 'S',
 'TCC': 'S',
 'TCA': 'S',
 'TCG': 'S',
 'TAT': 'Y',
 'TAC': 'Y',
 'TGT': 'C',
 'TGC': 'C',
 'TGA': 'W',
 'TGG': 'W',
 'CTT': 'L',
 'CTC': 'L',
 'CTA': 'L',
 'CTG': 'L',
 'CCT': 'P',
 'CCC': 'P',
 'CCA': 'P',
 'CCG': 'P',
 'CAT': 'H',
 'CAC': 'H',
 'CAA': 'Q',
 'CAG': 'Q',
 'CGT': 'R',
 'CGC': 'R',
 'CGA': 'R',
 'CGG': 'R',
 'ATT': 'I',
 'ATC': 'I',
 'ATA': 'M',
 'ATG': 'M',
 'ACT': 'T',
 'ACC': 'T',
 'ACA': 'T',
 'ACG': 'T',
 'AAT': 'N',
 'AAC': 'N',
 'AAA': 'K',
 'AAG': 'K',
 'AGT': 'S',
 'AGC': 'S',
 'GTT': 'V',
 'GTC': 'V',
 'GTA': 'V',
 'GTG': 'V',
 'GCT': 'A',
 'GCC': 'A',
 'GCA': 'A',
 'GCG': 'A',
 'GAT': 'D',
 'GAC': 'D',
 'GAA': 'E',
 'GAG': 'E',
 'GGT': 'G',
 'GGC': 'G',
 'GGA': 'G',
 'GGG': 'G'}

The other attributes give the table's names, id, alphabets and start/stop codons:

In [21]:
print("This table's names are: {}".format(mito.names))
print("This table's id is: {}".format(mito.id))
print("This table's nucleotide alphabet is: {}".format(mito.nucleotide_alphabet))
print("This table's protein alphabet is: {}".format(mito.protein_alphabet))
print("This table's start codons are: {}".format(mito.start_codons))
print("This table's stop codons are: {}".format(mito.stop_codons))


This table's names are: ['Vertebrate Mitochondrial', 'SGC1']
This table's id is: 2
This table's nucleotide alphabet is: IUPACUnambiguousDNA()
This table's protein alphabet is: IUPACProtein()
This table's start codons are: ['ATT', 'ATC', 'ATA', 'ATG', 'GTG']
This table's stop codons are: ['TAA', 'TAG', 'AGA', 'AGG']
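
The start and stop codon lists can be used for simple sanity checks on a candidate coding sequence. The following is a minimal sketch under the assumption that the lists are copied from the output above; the function name and the toy sequences are hypothetical, and with Biopython installed `mito.start_codons` and `mito.stop_codons` could be passed instead.

```python
# Values copied from the output above for table 2 (Vertebrate Mitochondrial).
START_CODONS = ["ATT", "ATC", "ATA", "ATG", "GTG"]
STOP_CODONS = ["TAA", "TAG", "AGA", "AGG"]


def check_cds(dna, start_codons, stop_codons):
    """Return a list of problems found in a candidate CDS (empty if none)."""
    dna = dna.upper()
    problems = []
    if len(dna) % 3 != 0:
        problems.append("length is not a multiple of three")
    if dna[:3] not in start_codons:
        problems.append("does not begin with a start codon")
    if dna[-3:] not in stop_codons:
        problems.append("does not end with a stop codon")
    # Codons strictly between the first and the last one.
    internal = [dna[i:i + 3] for i in range(3, len(dna) - 3, 3)]
    if any(codon in stop_codons for codon in internal):
        problems.append("contains an in-frame internal stop codon")
    return problems


# Hypothetical toy sequences, not taken from the GenBank record.
print(check_cds("ATATTTTGAAGA", START_CODONS, STOP_CODONS))  # []
print(check_cds("ATGTTTTGA", START_CODONS, STOP_CODONS))     # ['does not end with a stop codon']
```

The second sequence would be a valid CDS under the Standard table, but TGA is not a stop codon in table 2. The translate method shown earlier performs comparable checks when called with cds=True.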


This is only a basic overview of the CodonTable module. With it, scripts that rely on genetic code tables can be written in a fast, concise and readable way.