A program that attempts to recode a gene by considering the relative usage of each codon in its host and selecting the codon with the nearest relative usage in a target organism.
In typical codon optimizers, each codon of a gene of interest is converted to the "best" codon for a target organism. Yet wild-type sequences don't always use the "best" codon in their host organism. This code accounts for that by selecting, for each codon, the synonymous codon in the target organism whose relative usage most closely approximates the original codon's usage in the source organism.
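The selection rule described above can be sketched in a few lines of Python. This is a minimal illustration, not the code in this repository: the function names `relative_usage` and `harmonize`, the two-amino-acid codon table, and the toy counts are all assumptions made for the example, and the real tool may normalize usage or break ties differently.

```python
from collections import defaultdict

# Partial codon table for illustration only (leucine and lysine).
CODON_TO_AA = {
    "CTG": "L", "CTC": "L", "CTT": "L", "CTA": "L", "TTA": "L", "TTG": "L",
    "AAA": "K", "AAG": "K",
}

def relative_usage(counts):
    """Convert raw codon counts to usage relative to synonymous codons."""
    totals = defaultdict(int)
    for codon, n in counts.items():
        totals[CODON_TO_AA[codon]] += n
    # Assumes every amino acid present has a nonzero total count.
    return {c: n / totals[CODON_TO_AA[c]] for c, n in counts.items()}

def harmonize(seq, source_usage, target_usage):
    """Swap each codon for the synonymous target codon of nearest usage."""
    out = []
    for i in range(0, len(seq) - len(seq) % 3, 3):
        codon = seq[i:i + 3]
        aa = CODON_TO_AA[codon]
        synonyms = [c for c in target_usage if CODON_TO_AA[c] == aa]
        # Nearest relative usage; ties go to the first codon encountered.
        best = min(synonyms,
                   key=lambda c: abs(target_usage[c] - source_usage[codon]))
        out.append(best)
    return "".join(out)

# Hypothetical counts, chosen only to make the arithmetic easy to follow.
source = relative_usage({"CTG": 10, "CTC": 40, "CTT": 20, "CTA": 5,
                         "TTA": 15, "TTG": 10, "AAA": 30, "AAG": 70})
target = relative_usage({"CTG": 50, "CTC": 10, "CTT": 10, "CTA": 4,
                         "TTA": 13, "TTG": 13, "AAA": 75, "AAG": 25})

print(harmonize("CTCAAG", source, target))  # CTGAAA
```

In this toy data, CTC accounts for 40% of leucine codons in the source, so it maps to CTG (50% in the target) rather than staying CTC (10% in the target). AAG (70% of lysine codons in the source) maps to AAA (75% in the target).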
This code was intended as a fork of Bart Nijsse's codon harmonizer but ended up a wholesale rewrite.
Where referenced in academic work, you may cite this repository and may also consider referencing the manuscript discussing Nijsse's work.
pip install 'git+https://github.com/smsaladi/codon-harmonizer.git#master'
Count codon usage in source and target organisms
wget -O - https://ftp.ncbi.nlm.nih.gov/genomes/all/GCF/000/011/385/GCF_000011385.1_ASM1138v1/GCF_000011385.1_ASM1138v1_cds_from_genomic.fna.gz | gzip -cd > Gvio.cds.fna
wget -O - https://ftp.ncbi.nlm.nih.gov/genomes/all/GCF/000/005/845/GCF_000005845.2_ASM584v2/GCF_000005845.2_ASM584v2_cds_from_genomic.fna.gz | gzip -cd > Ecol_MG1655.cds.fna
codonharmonizer Gvio.cds.fna --write_freqs > Gvio.freq.csv
codonharmonizer Ecol_MG1655.cds.fna --write_freqs > Ecol_MG1655.freq.csv
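For readers curious what counting codon usage over a CDS file involves, here is a rough standard-library sketch. It is an assumption-laden illustration rather than what `--write_freqs` actually does: the helpers `read_fasta` and `count_codons` are hypothetical, and the choice to skip sequences whose length is not a multiple of three or that contain ambiguous bases is a simplification made for this example.

```python
from collections import Counter

def read_fasta(lines):
    """Yield (header, sequence) pairs from an iterable of FASTA lines."""
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif line:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)

def count_codons(records):
    """Tally in-frame codons across all coding sequences."""
    counts = Counter()
    for _, seq in records:
        seq = seq.upper()
        if len(seq) % 3:
            continue  # skip sequences that are not a whole number of codons
        for i in range(0, len(seq), 3):
            codon = seq[i:i + 3]
            if set(codon) <= set("ACGT"):  # ignore codons with ambiguous bases
                counts[codon] += 1
    return counts

# Tiny in-memory example; the third record is skipped (length 4).
example = [">gene1", "ATGAAA", "AAGTAA", ">gene2", "ATGCTGTAA", ">bad", "ATGA"]
counts = count_codons(read_fasta(example))
print(counts["ATG"], counts["TAA"], counts["CTG"])  # 2 2 1
```

The same functions accept an open file handle in place of the list, for example one opened on Gvio.cds.fna, since `read_fasta` only iterates over lines.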
Use these reference sets to recode genes of interest
codonharmonizer test/example_gene.fasta --target Ecol_MG1655.freq.csv --source Gvio.freq.csv
For additional information on the source and target frequencies for the wild-type and recoded sequences, specify the --stats FILENAME.json switch.