We release our fine-tuned genome inference model on Hugging Face under the model ID shuaimin4588/GenoVerse.
You can load it in Python as follows:
from transformers import AutoTokenizer, AutoModelForCausalLM

# Model ID on the Hugging Face Hub
model_name = "shuaimin4588/GenoVerse"
tokenizer = AutoTokenizer.from_pretrained(model_name)
# device_map="auto" places the weights on the available GPU(s) automatically
model = AutoModelForCausalLM.from_pretrained(model_name, device_map="auto")

A single RTX 4090 GPU is sufficient for model deployment and inference.
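Once the model is loaded, text can be generated with the standard `transformers` generation API. The following is a minimal sketch, not the repository's own inference code: the helper names `load_genoverse` and `generate_text` are hypothetical, and the prompt format each task expects is an assumption that should be checked against the task scripts in the repository.

```python
def load_genoverse(model_name="shuaimin4588/GenoVerse"):
    """Load tokenizer and model exactly as in the snippet above."""
    # Imported lazily so the helper below can be used without transformers
    from transformers import AutoTokenizer, AutoModelForCausalLM

    tokenizer = AutoTokenizer.from_pretrained(model_name)
    model = AutoModelForCausalLM.from_pretrained(model_name, device_map="auto")
    return tokenizer, model


def generate_text(tokenizer, model, prompt, max_new_tokens=64):
    """Tokenize a prompt, run generation, and decode the full output sequence.

    The returned string contains the prompt followed by the generated tokens.
    """
    # Move the encoded inputs to the device the model was placed on
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    output_ids = model.generate(**inputs, max_new_tokens=max_new_tokens)
    return tokenizer.decode(output_ids[0], skip_special_tokens=True)


# Example usage (downloads the model; the prompt is a placeholder, not a
# documented input format):
#   tokenizer, model = load_genoverse()
#   print(generate_text(tokenizer, model, "<your task prompt here>"))
```

Keeping loading and generation in separate functions means the model is loaded once and reused across prompts.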
We release our data on Hugging Face:
👉 Datasets on Hugging Face (includes the complete test sets for each task, as well as the training and test data for cell phenotype prediction).
Clone this repository and install the required dependencies:
git clone https://github.com/your-repo/GenSyntax.git GenoVerse
cd GenoVerse
pip install -r requirements.txt
Run the inference script for each task on its test set:

python Plasmid_host_identification.py \
--model checkpoint \
--input-json-paths test_data/gene_task1_test_1000_format.json

python Gene_function_prediction.py \
--model checkpoint \
--input-json-paths test_data/gene_task2_test_500_opts.json

python Genome_assembly.py \
--model checkpoint \
--input-json-paths test_data/gene_task3_test_500_contig3_format.json

python Gene_essentiality_prediction.py \
--model checkpoint \
--input-json-paths test_data/gene_task4_test_1000_format.json

python minimal_genome_inference.py \
--model checkpoint \
--input-json-paths test_data/bacteria_chromosomes_9-mini.json
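To run all five tasks in sequence, the commands above can be wrapped in a small driver script. This is a sketch under stated assumptions: the script names, flags, and test-set paths are copied from the commands above, while the helper names `build_commands` and `run_all` are hypothetical and not part of the repository. It assumes it is run from the repository root.

```python
import subprocess
import sys

# (script, test set) pairs taken from the per-task commands above
TASKS = [
    ("Plasmid_host_identification.py", "test_data/gene_task1_test_1000_format.json"),
    ("Gene_function_prediction.py", "test_data/gene_task2_test_500_opts.json"),
    ("Genome_assembly.py", "test_data/gene_task3_test_500_contig3_format.json"),
    ("Gene_essentiality_prediction.py", "test_data/gene_task4_test_1000_format.json"),
    ("minimal_genome_inference.py", "test_data/bacteria_chromosomes_9-mini.json"),
]


def build_commands(model="checkpoint", python=sys.executable):
    """Return one argv list per task, mirroring the shell commands above."""
    return [
        [python, script, "--model", model, "--input-json-paths", data]
        for script, data in TASKS
    ]


def run_all(model="checkpoint"):
    """Run every task in order; stop at the first script that exits non-zero."""
    for cmd in build_commands(model):
        print("Running:", " ".join(cmd))
        subprocess.run(cmd, check=True)


# Example usage, from the repository root:
#   run_all(model="checkpoint")
```

Passing each command as an argument list avoids shell quoting issues, and `check=True` makes a failing task raise `subprocess.CalledProcessError` instead of being silently skipped.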