# **How to: optimize codons with the Poly package and the friendzymes toolkit**

Codon optimization is a very common task in part design. Here we show how to create a custom codon table and how to use it to codon-optimize a given coding sequence (CDS).

---

Authors: [**@GiovannaMaklouf**](https://twitter.com/giomaklouf), [**@IsaacG**](https://twitter.com/IsaacGuerreiro9)


# Configurations for this tutorial




First, run the setup steps below so that the rest of this tutorial works.

Colab notebooks run each cell with a Python kernel. Because ***Poly*** is written in **Go (golang)**, we need to install Go and a Go kernel in Colab before we can run Go code here.

### **1. To set up the Go environment, run the cell below:**

In [None]:
# this process may take a few minutes
!add-apt-repository ppa:longsleep/golang-backports -y
!apt update
!apt install golang-go
%env GOPATH=/root/go
!go get -u github.com/gopherdata/gophernotes
!cp ~/go/bin/gophernotes /usr/bin/
!npx degit gopherdata/gophernotes/kernel \
     /usr/local/share/jupyter/kernels/gophernotes


### **2. Download important data to run this tutorial**

In [None]:
!rm -rf $GOPATH/pkg/mod/github.com/!open-!science-!global
!rm -rf $GOPATH/pkg/mod/cache/download/github.com/!open-!science-!global
!go get -u github.com/Open-Science-Global/poly@e3e1c61

go: downloading github.com/Open-Science-Global/poly v0.11.3


In [None]:
!wget https://raw.githubusercontent.com/Open-Science-Global/friendzymes-cookbook/main/data/codon-optimization/ecoli-k12-cdss.fasta
!wget https://raw.githubusercontent.com/Open-Science-Global/friendzymes-cookbook/main/data/codon-optimization/enzymes.fasta


### **3. Connect the Colab notebook to your Google Drive (optional)**

The previous cell downloads the files only into the temporary Colab session, so they are lost when the runtime is reset. If you want to keep them for later analysis, or if you are using this notebook with your own files, connect your Google Drive to Colab. You can then read and save files permanently.
If you prefer this option, run the cell below:



In [None]:
from google.colab import drive
drive.mount('/content/drive')

After running these steps, click **Runtime** in the menu bar, choose **Change runtime type**, and select Go if it is not already selected.

This makes Colab use gophernotes, an open-source Go kernel.

Now we are ready to work.

## **Importing packages and pre-requisites**

In [None]:
package main

import (
	"fmt"
	"os"
	"path/filepath"
	"strings"
	"sync"

	"github.com/Open-Science-Global/poly"
	"github.com/Open-Science-Global/poly/checks"
	"github.com/Open-Science-Global/poly/io/fasta"
	"github.com/Open-Science-Global/poly/io/genbank"
	"github.com/Open-Science-Global/poly/linearfold"
	"github.com/Open-Science-Global/poly/synthesis"
	"github.com/Open-Science-Global/poly/transform"
	"github.com/Open-Science-Global/poly/transform/codon"
)

## **Preparations for codon optimization: codon table**

Codon optimization relies on a codon usage table for the target organism. For each amino acid, this table records how often the organism uses each of its synonymous codons.

Run the cell below to define the function that builds this table for the organism of your choice. You don't need to change anything here unless you want to edit the function.

In [None]:
// run this to define the function

// GenerateCodonTable builds a codon usage table for a target organism.
// input: path to a FASTA file with the CDSs (DNA) of the target organism
// output: a codon.Table based on genetic code 11 whose codon weights reflect codon usage in those CDSs
func GenerateCodonTable(file string) codon.Table {
	fmt.Printf("Reading file %s...\n", file)
	cdsSequences := fasta.Read(file)

	// Create a single big string with all the CDSs
	var allCdssFromFile strings.Builder
	for _, cds := range cdsSequences {
		allCdssFromFile.WriteString(cds.Sequence)
	}
	codingRegions := allCdssFromFile.String()

	// Start from the bacterial genetic code (NCBI table 11)
	fmt.Printf("Creating table for %s...\n", file)
	codonTable := codon.GetCodonTable(11)

	// Set the codon weights from the codon usage observed in the CDSs
	fmt.Printf("Optimizing table for %s...\n", file)
	optimizationTable := codonTable.OptimizeTable(codingRegions)

	fmt.Println("Table created and optimized!")
	fmt.Println()
	return optimizationTable
}
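`GenerateCodonTable` joins all CDSs into one string before passing it to `OptimizeTable`. If `OptimizeTable` reads that string in triplets from the start, which we assume here, frames stay aligned only when every CDS length is a multiple of 3. The standalone, stdlib-only sketch below shows the kind of in-frame codon counting a usage table is built from. It counts each CDS separately, which sidesteps that assumption. `countCodons` is a hypothetical helper for illustration, not poly's implementation.

```go
package main

import (
	"fmt"
	"strings"
)

// countCodons tallies in-frame codons across a set of CDSs.
// Each CDS is read from its own first base, so a CDS whose length is not a
// multiple of 3 cannot shift the frame of the next one; leftover bases are ignored.
func countCodons(cdss []string) map[string]int {
	counts := make(map[string]int)
	for _, cds := range cdss {
		cds = strings.ToUpper(cds)
		for i := 0; i+3 <= len(cds); i += 3 {
			counts[cds[i:i+3]]++
		}
	}
	return counts
}

func main() {
	// Two toy CDSs (made-up data, not taken from ecoli-k12-cdss.fasta)
	counts := countCodons([]string{"ATGAAATAA", "atgaagtga"})
	fmt.Println(counts["ATG"], counts["AAA"], counts["AAG"], counts["TAA"], counts["TGA"]) // prints 2 1 1 1 1
}
```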

The input of this function is a FASTA file with the CDSs of the target organism.
For this tutorial, we already downloaded one during setup (in "Download important data to run this tutorial"): the *E. coli* K-12 CDS list, used here as an example.

In [None]:
// replace the filename below with the CDS FASTA file of your target organism

codonTable := GenerateCodonTable("ecoli-k12-cdss.fasta")

Reading file ecoli-k12-cdss.fasta...
Creating table for ecoli-k12-cdss.fasta...
Optimizing table for ecoli-k12-cdss.fasta...
Table created and optimized!



In [None]:
codonTable

{[TTG CTG ATT ATC ATA ATG GTG] [TAA TAG TGA] [{R [{CGT 20797} {CGC 35121} {CGA 22508} {CGG 28314} {AGA 15603} {AGG 16634}]} {V [{GTT 21109} {GTC 16913} {GTA 16529} {GTG 24718}]} {E [{GAA 24516} {GAG 10173}]} {A [{GCT 23803} {GCC 24249} {GCA 27397} {GCG 39059}]} {F [{TTT 23860} {TTC 20190}]} {Y [{TAT 17276} {TAC 15339}]} {P [{CCT 14497} {CCC 12097} {CCA 19831} {CCG 29316}]} {H [{CAT 15805} {CAC 16251}]} {I [{ATT 22930} {ATC 24709} {ATA 17623}]} {L [{TTA 23007} {TTG 29359} {CTT 13512} {CTC 11962} {CTA 9545} {CTG 35433}]} {C [{TGT 18078} {TGC 33106}]} {Q [{CAA 19977} {CAG 24702}]} {D [{GAT 21572} {GAC 13527}]} {N [{AAT 19710} {AAC 26010}]} {K [{AAA 36208} {AAG 26833}]} {G [{GGT 19634} {GGC 27826} {GGA 16247} {GGG 15448}]} {S [{TCT 16992} {TCC 14752} {TCA 21666} {TCG 24158} {AGT 13050} {AGC 22780}]} {* [{TAA 15518} {TAG 6586} {TGA 32914}]} {W [{TGG 41128}]} {M [{ATG 29974}]} {T [{ACT 15165} {ACC 20931} {ACA 17599} {ACG 23965}]}]}
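Each entry of the printed table pairs an amino acid letter with its synonymous codons and a weight. In this output, the weights appear to be raw codon counts from the input CDSs; this is our reading of the output. Normalizing the weights within each amino acid turns them into relative usage, which is easier to compare. The standalone, stdlib-only sketch below does this for the glutamate (E) entry above. `relativeUsage` is a hypothetical helper, not part of poly.

```go
package main

import "fmt"

// relativeUsage turns raw codon counts for one amino acid into fractions
// that sum to 1. It returns an empty map when every count is zero.
func relativeUsage(counts map[string]int) map[string]float64 {
	total := 0
	for _, n := range counts {
		total += n
	}
	usage := make(map[string]float64, len(counts))
	if total == 0 {
		return usage
	}
	for triplet, n := range counts {
		usage[triplet] = float64(n) / float64(total)
	}
	return usage
}

func main() {
	// Glutamate (E) weights copied from the table printed above
	glu := map[string]int{"GAA": 24516, "GAG": 10173}
	usage := relativeUsage(glu)
	fmt.Printf("GAA %.2f, GAG %.2f\n", usage["GAA"], usage["GAG"]) // prints GAA 0.71, GAG 0.29
}
```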

You can save the codon table as a JSON file for later use:

In [None]:
codon.WriteCodonJSON(codonTable, "ecoli-k12-codontable.json") // rename the output file as you wish

> Reminder: to check that the file was saved, open the files tab in the left menu, right-click the folder where you saved it (`content` or `sample_data`), and choose **Refresh**.



## **Optimization**

Keep in mind that the input for codon optimization has to be:

1.   the protein encoded by a coding sequence (CDS);
2.   written as amino acids, i.e. already translated;
3.   stored in a FASTA file.
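The next two subsections depend on whether your FASTA file holds DNA or protein, so a quick check can help you choose between them. Below is a minimal heuristic sketch written as a standalone, stdlib-only program. `looksLikeDNA` is a hypothetical helper, not part of poly.

```go
package main

import (
	"fmt"
	"strings"
)

// looksLikeDNA reports whether seq is non-empty and contains only A, C, G and T
// (case-insensitive). It is a heuristic: a short peptide made only of alanine (A),
// cysteine (C), glycine (G) and threonine (T) would also pass.
func looksLikeDNA(seq string) bool {
	if seq == "" {
		return false
	}
	for _, r := range strings.ToUpper(seq) {
		switch r {
		case 'A', 'C', 'G', 'T':
			// nucleotide letter, keep checking
		default:
			return false
		}
	}
	return true
}

func main() {
	fmt.Println(looksLikeDNA("ATGATCCTGGATGTAGAC")) // prints true
	fmt.Println(looksLikeDNA("MILDVDYITEEGKPV"))    // prints false
}
```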


#### If you have a **CDS (DNA sequence)**, you can translate it with the steps below:

In [None]:
// Upload your CDS as a FASTA file (here "CDS.fasta") and read its first record,
// OR paste your DNA sequence directly as a string instead: CDSsequence := "ATG..."
CDSsequence := fasta.Read("CDS.fasta")[0].Sequence

In [None]:
// Poly's codon optimization takes protein sequences, but here we have a CDS,
// so we first translate it. We use NCBI genetic code 11 (bacterial, archaeal and plant plastid);
// see https://www.ncbi.nlm.nih.gov/Taxonomy/Utils/wprintgc.cgi

translatedSeq, err := codon.Translate(CDSsequence, codon.GetCodonTable(11))
if err != nil {
	panic(err)
}
translatedSeq
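The standalone, stdlib-only sketch below makes the translation step concrete. Table 11 assigns amino acids exactly as the standard genetic code does; it differs only in which codons may act as alternative start codons. The sketch therefore uses the standard 64-letter codon string in NCBI's T, C, A, G order and ignores start-codon handling. It writes stop codons as `*`. `translate` is a hypothetical helper, and we do not assume it matches every detail of poly's `codon.Translate`.

```go
package main

import (
	"fmt"
	"strings"
)

// Standard genetic code in NCBI order: first, second and third base each cycle T, C, A, G.
// Table 11 uses the same amino acid assignments (it only adds alternative start codons).
const bases = "TCAG"
const aminoAcids = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"

// translate converts a CDS into a protein string, writing stop codons as '*'.
func translate(cds string) (string, error) {
	cds = strings.ToUpper(cds)
	if len(cds)%3 != 0 {
		return "", fmt.Errorf("length %d is not a multiple of 3", len(cds))
	}
	var protein strings.Builder
	for i := 0; i < len(cds); i += 3 {
		index := 0 // becomes first*16 + second*4 + third
		for _, b := range cds[i : i+3] {
			pos := strings.IndexRune(bases, b)
			if pos < 0 {
				return "", fmt.Errorf("invalid base %q in codon starting at position %d", b, i)
			}
			index = index*4 + pos
		}
		protein.WriteByte(aminoAcids[index])
	}
	return protein.String(), nil
}

func main() {
	protein, err := translate("ATGGCTTAA")
	fmt.Println(protein, err) // prints MA* <nil>
}
```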

#### If you already have the amino acid sequence you can just continue:

The enzyme sequences used below were downloaded during setup (in "Download important data to run this tutorial") as an example for this tutorial.

In [None]:
// Read the enzymes (protein sequences) to codon-optimize
enzymes := fasta.Read("enzymes.fasta")

In [None]:
// Count the enzymes in the file (useful if it contains more than one)
fmt.Print(len(enzymes))

5



In [None]:
// In the case of the example file, we have 5. We can check them one by one, as follows: 
fmt.Println(enzymes[0])
fmt.Println(enzymes[1])
fmt.Println(enzymes[2])
fmt.Println(enzymes[3])
fmt.Println(enzymes[4])

{Pfu-Sso7d MILDVDYITEEGKPVIRLFKKENGKFKIEHDRTFRPYIYALLRDDSKIEEVKKITGERHGKIVRIVDVEKVEKKFLGKPITVWKLYLEHPQDVPTIREKVREHPAVVDIFEYDIPFAKRYLIDKGLIPMEGEEELKILAFDIETLYHEGEEFGKGPIIMISYADENEAKVITWKNIDLPYVEVVSSEREMIKRFLRIIREKDPDIIVTYNGDSFDFPYLAKRAEKLGIKLTIGRDGSEPKMQRIGDMTAVEVKGRIHFDLYHVITRTINLPTYTLEAVYEAIFGKPKEKVYADEIAKAWESGENLERVAKYSMEDAKATYELGKEFLPMEIQLSRLVGQPLWDVSRSSTGNLVEWFLLRKAYERNEVAPNKPSEEEYQRRLRESYTGGFVKEPEKGLWENIVYLDFRALYPSIIITHNVSPDTLNLEGCKNYDIAPQVGHKFCKDIPGFIPSLLGHLLEERQKIKTKMKETQDPIEKILLDYRQKAIKLLANSFYGYYGYAKARWYCKECAESVTAWGRKYIELVWKELEEKFGFKVLYIDTDGLYATIPGGESEEIKKKALEFVKYINSKLPGLLELEYEGFYKRGFFVTKKRYAVIDEEGKVITRGLEIVRRDWSEIAKETQARVLETILKHGDVEEAVRIVKEVIQKLANYEIPPEKLAIYEQITRPLHEYKAIGPHVAVAKKLAAKGVKIKPGMVIGYIVLRGDGPISNRAILAEEYDPKKHKYDAEYYIENQVLPAVLRILEGFGYRKEDLRYQKTRQVGLTSWLNIKKSGTGGGGATVKFKYKGEEKEVDISKIKKVWRVGKMISFTYDEGGGKTGRGAVSEKDAPKELLQMLEKQKKGGGSGGGSENLYFQGGGGSMVSSGEDIFSGLVPILIELEGDVNGHRFSVRGEGYGDASNGKLEIKFICTTGRLPVPWPTLVTTLSYGVQCFAKYPEHMRQNDFFKSAMPDGYVQERTISFKEDGTYKTRAEVKFEGEALVNRIDL


In [None]:
// run this to define the codon optimization function
// input: a protein (amino acid) sequence and the codon table of the target organism
// output: a DNA sequence that encodes the same protein, with codons chosen from the table
func CodonOptimization(enzymeSequence string, codonTable codon.Table) string {
	// Optimize the sequence using the protein sequence and the codon table
	optimizedSequence, err := codon.Optimize(enzymeSequence, codonTable)
	if err != nil {
		panic(fmt.Sprintf("codon optimization failed: %v", err))
	}

	// Check that the optimization worked: translating the optimized DNA back must give
	// the original protein, because codon optimization never changes an amino acid
	protein, err := codon.Translate(optimizedSequence, codon.GetCodonTable(11))
	if err != nil {
		panic(fmt.Sprintf("translating the optimized sequence failed: %v", err))
	}
	if protein != enzymeSequence {
		panic("the translated optimized sequence differs from the input protein; codon optimization should not change any amino acid")
	}
	return optimizedSequence
}

Now that the function is defined, the cell below uses a `for` loop to run each enzyme in our list through `CodonOptimization()` and appends each optimized sequence to the `enzymesCodonOptimized` variable.

In [None]:
var enzymesCodonOptimized []string
for _, enzyme := range enzymes {  
  enzymesCodonOptimized = append(enzymesCodonOptimized, CodonOptimization(enzyme.Sequence, codonTable))
}

fmt.Print(len(enzymesCodonOptimized))

5



In [None]:
for _, sequence := range enzymesCodonOptimized {
  fmt.Println(sequence)
}

ATGATCCTGGATGTAGACTATATCACTGAAGAAGGGAAGCCTGTTATTCGCTTGTTTAAAAAGGAAAATGGTAAATTTAAAATTGAACACGATCGAACATTTCGGCCCTACATTTATGCCCTTTTACGGGACGACAGCAAGATCGAGGAAGTCAAGAAGATTACCGGCGAACGGCACGGAAAAATAGTTCGTATCGTGGACGTTGAAAAAGTAGAAAAAAAGTTTTTAGGCAAACCGATTACAGTCTGGAAATTGTATCTTGAACATCCTCAGGACGTGCCAACCATACGGGAAAAAGTGCGTGAACACCCTGCCGTAGTAGACATCTTTGAATACGATATCCCATTTGCAAAGCGTTATTTGATTGACAAAGGTTTGATACCTATGGAAGGTGAAGAGGAACTGAAAATTTTGGCGTTTGACATCGAAACGTTATACCACGAAGGAGAAGAATTCGGCAAGGGACCGATCATTATGATTTCATACGCGGATGAGAATGAAGCCAAGGTTATAACCTGGAAGAACATCGACTTACCATATGTCGAGGTTGTATCATCGGAGCGCGAAATGATAAAACGTTTTCTGCGAATCATTCGGGAAAAAGATCCCGATATCATTGTGACGTATAACGGTGATTCCTTTGACTTTCCATATTTGGCTAAACGAGCGGAAAAATTGGGAATCAAACTTACTATAGGCCGTGATGGTAGTGAACCAAAGATGCAACGAATAGGAGATATGACGGCCGTTGAAGTGAAAGGCAGAATTCATTTCGACTTATACCATGTCATAACCAGGACGATTAACTTGCCTACATACACGCTGGAAGCGGTTTACGAGGCAATCTTCGGGAAACCAAAAGAAAAGGTATACGCCGATGAGATCGCGAAAGCCTGGGAGTCAGGCGAGAACCTGGAACGTGTCGCAAAGTACAGTATGGAAGACGCAAAGGCCACGTACGAATTAGGAAAAGAATTCTTACCGATGGAAATCCAGCTGT

Finally, we can **save** the sequences optimized for *E. coli* in a FASTA file.
You can then use them as you like, e.g. to update your complete construct in Benchling, SnapGene, etc.

In [None]:
// run this to define the function that exports your optimized sequences to a FASTA file
// inputs: the optimized sequences (output of CodonOptimization()), the original enzyme records
//         (their names are reused as FASTA headers, in the same order) and the output file name
// output: one FASTA file with all optimized sequences

func exportSequencesAsFasta(sequences []string, enzymes []fasta.Fasta, outputFilename string) {
	var fastas []fasta.Fasta
	for index, sequence := range sequences {
		data := fasta.Fasta{Name: enzymes[index].Name, Sequence: sequence}
		fmt.Println(data)
		fastas = append(fastas, data)
	}
	fasta.Write(fastas, outputFilename)
}
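For reference, a FASTA record is a header line starting with `>` followed by the sequence. The sequence is often wrapped at a fixed width such as 60 or 80 characters. The standalone, stdlib-only sketch below shows that layout. `formatFasta` is a hypothetical helper, and we do not assume how poly's `fasta.Write` wraps lines.

```go
package main

import (
	"fmt"
	"strings"
)

// formatFasta renders one FASTA record: a ">" header line with the name, then the
// sequence wrapped at width characters per line (width <= 0 means no wrapping).
func formatFasta(name, sequence string, width int) string {
	var b strings.Builder
	b.WriteString(">" + name + "\n")
	if width <= 0 {
		width = len(sequence)
	}
	for start := 0; start < len(sequence); start += width {
		end := start + width
		if end > len(sequence) {
			end = len(sequence)
		}
		b.WriteString(sequence[start:end] + "\n")
	}
	return b.String()
}

func main() {
	// First 18 bases of the optimized Pfu-Sso7d sequence shown above, wrapped at 6 for display
	fmt.Print(formatFasta("Pfu-Sso7d", "ATGATCCTGGATGTAGAC", 6))
}
```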

In [None]:
exportSequencesAsFasta(enzymesCodonOptimized, enzymes, "OptimizedSequences.fasta")

{Pfu-Sso7d ATGATCCTGGATGTAGACTATATCACTGAAGAAGGGAAGCCTGTTATTCGCTTGTTTAAAAAGGAAAATGGTAAATTTAAAATTGAACACGATCGAACATTTCGGCCCTACATTTATGCCCTTTTACGGGACGACAGCAAGATCGAGGAAGTCAAGAAGATTACCGGCGAACGGCACGGAAAAATAGTTCGTATCGTGGACGTTGAAAAAGTAGAAAAAAAGTTTTTAGGCAAACCGATTACAGTCTGGAAATTGTATCTTGAACATCCTCAGGACGTGCCAACCATACGGGAAAAAGTGCGTGAACACCCTGCCGTAGTAGACATCTTTGAATACGATATCCCATTTGCAAAGCGTTATTTGATTGACAAAGGTTTGATACCTATGGAAGGTGAAGAGGAACTGAAAATTTTGGCGTTTGACATCGAAACGTTATACCACGAAGGAGAAGAATTCGGCAAGGGACCGATCATTATGATTTCATACGCGGATGAGAATGAAGCCAAGGTTATAACCTGGAAGAACATCGACTTACCATATGTCGAGGTTGTATCATCGGAGCGCGAAATGATAAAACGTTTTCTGCGAATCATTCGGGAAAAAGATCCCGATATCATTGTGACGTATAACGGTGATTCCTTTGACTTTCCATATTTGGCTAAACGAGCGGAAAAATTGGGAATCAAACTTACTATAGGCCGTGATGGTAGTGAACCAAAGATGCAACGAATAGGAGATATGACGGCCGTTGAAGTGAAAGGCAGAATTCATTTCGACTTATACCATGTCATAACCAGGACGATTAACTTGCCTACATACACGCTGGAAGCGGTTTACGAGGCAATCTTCGGGAAACCAAAAGAAAAGGTATACGCCGATGAGATCGCGAAAGCCTGGGAGTCAGGCGAGAACCTGGAACGTGTCGCAAAGTACAGTATGGAAGACGCAAAGGCCACGTACGAATTAGGAAAAGAATTCTTACCGATGGA