CodonTranslator

A conditional codon sequence optimizer that generates DNA conditioned on species and protein prefixes.

Installation

Requirements

  • PyTorch and transformers
  • esm (required at runtime for protein prefixing). Note that the esm package conflicts with the newest transformers release, so install esm first and then install or upgrade transformers.
pip install -e ./
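The note above implies an installation order. A minimal sketch, assuming the runtime package is published on PyPI under the name `esm` (as the text suggests):

```shell
# Install esm before transformers to avoid the reported dependency conflict.
pip install esm
# Then bring in (or upgrade to) the newest transformers release.
pip install -U transformers
# Finally, install CodonTranslator itself from the repository root.
pip install -e ./
```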

Pretrained model

The final checkpoint is available on Google Drive. Please download the whole folder and pass the folder path when loading the model.

Usage examples

Basic usage

from CodonTranslator import CodonTranslator

model = CodonTranslator.from_pretrained(
    model_path="/final_model", # model.safetensors, vocab.json, trainer_config.json
    device="cuda"
)

dna = model.sampling(
    species="Homo sapiens",
    protein_seq="MSEQUENCEA",
    enforce_mapping=True,
    temperature=1,
    top_k=50,
    top_p=1,
)
print(dna)
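With enforce_mapping=True the generated DNA should translate back to the input protein. A quick, dependency-free sanity check is sketched below; the table is the standard genetic code built from the canonical TCAG codon ordering (in practice a library such as Biopython's `Seq.translate` would be used instead):

```python
# Standard genetic code, built compactly from the canonical TCAG ordering.
bases = "TCAG"
amino_acids = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
codon_table = {
    a + b + c: amino_acids[16 * i + 4 * j + k]
    for i, a in enumerate(bases)
    for j, b in enumerate(bases)
    for k, c in enumerate(bases)
}

def translate(dna: str) -> str:
    """Translate a coding DNA sequence (frame 0, no introns) to protein."""
    return "".join(codon_table[dna[i:i + 3].upper()] for i in range(0, len(dna) - 2, 3))

# Compare translate(dna) against the protein_seq you passed to the model, e.g.:
assert translate("ATGGGA") == "MG"
```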

Batch inference

seqs = model.batch_inference(
    species=["Homo sapiens", "Homo sapiens"],
    protein_seqs=["MSEQUENCEA", "MSEQUENCEA"],
    enforce_mapping=True,
    temperature=1,
    top_k=50,
    top_p=1,
)
print(seqs)
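The sampling controls above behave like standard language-model decoding knobs: temperature rescales the logits, top_k keeps only the k most probable codons, and top_p (nucleus sampling) keeps the smallest set of codons whose cumulative probability reaches p. A plain-Python sketch of that filtering, not the model's actual implementation:

```python
import math

def filter_probs(logits, temperature=1.0, top_k=50, top_p=1.0):
    """Turn raw logits into a filtered, renormalized sampling distribution."""
    # Temperature: softmax over rescaled logits (max-subtracted for stability).
    scaled = [l / temperature for l in logits]
    m = max(scaled)
    exp = [math.exp(l - m) for l in scaled]
    total = sum(exp)
    probs = [e / total for e in exp]

    # Top-k: keep the k most probable tokens.
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    keep = set(order[:top_k])

    # Top-p (nucleus): keep the smallest prefix whose mass reaches top_p.
    cum, nucleus = 0.0, set()
    for i in order:
        nucleus.add(i)
        cum += probs[i]
        if cum >= top_p:
            break
    keep &= nucleus

    # Zero out everything else and renormalize.
    filtered = [p if i in keep else 0.0 for i, p in enumerate(probs)]
    z = sum(filtered)
    return [p / z for p in filtered]

# With top_k=1 the distribution collapses to greedy decoding:
print(filter_probs([2.0, 1.0, 0.5], top_k=1))  # [1.0, 0.0, 0.0]
```

Lower temperature or smaller top_k/top_p makes the generated codon usage more deterministic; the defaults in the examples above (temperature=1, top_k=50, top_p=1) sample fairly freely.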

Issues

Please feel free to open issues and ask questions. The pretraining code will be released once the paper is accepted. The pretraining dataset can be shared on request; please contact me by email.

About

A decoder-only codon language model for codon optimization across all domains of life.
