**General Template for Prompt**

The general prompt below is used for the zero-shot runs of all models in this notebook; it includes a single in-context (ICL) example. API keys can be stored in Colab's Secrets tab (left sidebar) and read with `userdata.get()`.

Models prompted with this template: GPT-4, Claude-3 Opus, MistralAI, Code Llama, Facebook Llama-2, Galactica, and ChemLLM.

In [None]:
general_template = "You are an expert chemist. Given the SMILES representation of reactants and reagents, your task is to predict the potential product using your chemical reaction knowledge."
task_specific_template = (
    "The input contains both reactants and reagents, and different reactants and reagents are separated by \".\". "
    "Your reply should contain only the SMILES representation of the predicted product and no other text. "
    "Your reply must be valid and chemically reasonable."
)
icl_example = (
    "Reactants and reagents SMILES: C1COC1.CCN(CC)CC.CS(=O)(=O)Cl.CS(C)=O.N[C@@H]1CC2=CC=C(CN3C=C(CO)C(C(F)(F)F)=N3)C=C2C1\n"
    "Product SMILES: CS(=O)(=O)N[C@@H]1CC2=CC=C(CN3C=C(CO)C(C(F)(F)F)=N3)C=C2C1"
)
question = "Reactants and reagents SMILES: CCN.CN1C=CC=C1C=O\nProduct SMILES:"

prompt = f"{general_template}\n\n{task_specific_template}\n\nICL Example:\n{icl_example}\n\n{question}"
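To send the same template for many reactions, the prompt assembly can be wrapped in a function. This is a minimal sketch; `build_prompt` is a hypothetical helper name, and it reproduces the exact layout of `prompt` above with only the reactant SMILES varying.

```python
def build_prompt(general: str, task: str, icl_example: str, reactants_smiles: str) -> str:
    """Assemble a prompt with the same layout as `prompt` for an arbitrary reaction."""
    question = f"Reactants and reagents SMILES: {reactants_smiles}\nProduct SMILES:"
    return f"{general}\n\n{task}\n\nICL Example:\n{icl_example}\n\n{question}"
```

Usage: `build_prompt(general_template, task_specific_template, icl_example, "CCN.CN1C=CC=C1C=O")` returns the same string as `prompt`.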

In [None]:
from google.colab import drive
drive.mount('/content/drive')
# Access to API keys stored in the Secrets tab, e.g. userdata.get('OPENAI_API_KEY')
from google.colab import userdata
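The `google.colab.userdata` module only exists inside Colab. To run the same cells locally, a hypothetical helper `get_api_key` could try Colab Secrets first and fall back to an environment variable with the same name. This is a sketch under that assumption, not part of the original workflow.

```python
import os


def get_api_key(name: str) -> str:
    """Return an API key from Colab Secrets, falling back to an environment variable."""
    try:
        from google.colab import userdata  # only importable inside Colab
        try:
            value = userdata.get(name)
        except Exception:  # secret missing or notebook access not granted
            value = None
        if value:
            return value
    except ImportError:
        pass  # not running in Colab
    value = os.environ.get(name)
    if not value:
        raise KeyError(f"API key {name!r} not found in Colab Secrets or the environment")
    return value
```

Usage: `client = OpenAI(api_key=get_api_key("OPENAI_API_KEY"))`.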

**GPT-4 Zero-Shot**

In [None]:
!pip install openai
import os
from openai import OpenAI

In [None]:
from google.colab import userdata

client = OpenAI(
    api_key=userdata.get("OPENAI_API_KEY"),
)

In [None]:
response = client.chat.completions.create(
    model="gpt-4",
    messages=[{"role": "user", "content": prompt}],
)

In [None]:
print(response)
print(response.choices[0].message.content)
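Although the prompt asks for the SMILES only, chat models sometimes wrap it in a code fence or echo the "Product SMILES:" label. A heuristic cleanup sketch (the name `extract_smiles` is hypothetical) that returns the first SMILES-like token; it cannot recover an answer buried in free prose, where it would return the first word instead.

```python
import re

# Language tags that may follow an opening ``` fence
FENCE_TAGS = {"smiles", "text", "plaintext"}


def extract_smiles(reply: str) -> str:
    """Heuristically pull a single SMILES string out of a chat-model reply."""
    # Turn code fences into line breaks so fenced and inline answers look alike
    text = reply.replace("```", "\n")
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.lower() in FENCE_TAGS:
            continue
        # Remove an echoed label such as "Product SMILES:"
        line = re.sub(r"^(product\s+)?smiles\s*:\s*", "", line, flags=re.IGNORECASE)
        if line:
            # SMILES contain no whitespace, so keep the first token
            return line.split()[0]
    return ""
```

Usage: `extract_smiles(response.choices[0].message.content)`.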

**Claude-3 Opus Zero-Shot**

In [None]:
!pip install anthropic
import os
from anthropic import Anthropic

In [None]:

from google.colab import userdata

client = Anthropic(
    api_key=userdata.get("ANTHROPIC_API_KEY"),
)

In [None]:
message = client.messages.create(
    model="claude-3-opus-20240229",
    max_tokens=1024,
    messages=[
        {"role": "user", "content": prompt},
    ],
)
# message.content is a list of content blocks; the reply text is in the first one
print(message.content[0].text)
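If a response ever contains more than one content block, joining every text block is more robust than reading only the first. A small sketch; `claude_text` is a hypothetical helper that relies only on each block exposing `.type` and, for text blocks, `.text`.

```python
def claude_text(content_blocks) -> str:
    """Concatenate the text of all text blocks in a Messages API response."""
    return "".join(
        block.text for block in content_blocks if getattr(block, "type", None) == "text"
    )
```

Usage: `reply = claude_text(message.content)`.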

**MistralAI Zero-Shot**

In [None]:
! pip install mistralai
from mistralai import Mistral
from google.colab import userdata
api_key = userdata.get("MISTRAL_API_KEY")

In [None]:
model = "mistral-large-latest"

client = Mistral(api_key=api_key)

chat_response = client.chat.complete(
    model=model,
    messages=[{"role":"user", "content":prompt}]
)

print(chat_response.choices[0].message.content)
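Each API section above sends one prompt. To query several prompts with any of these clients without one failed request (for example a rate-limit error) stopping the loop, the call can be wrapped in a function and passed to a small driver. A sketch; `run_prompts` and `ask_mistral` are hypothetical names.

```python
from typing import Callable, List


def run_prompts(ask: Callable[[str], str], prompts: List[str]) -> List[str]:
    """Send each prompt through `ask`, recording errors instead of stopping."""
    replies = []
    for p in prompts:
        try:
            replies.append(ask(p))
        except Exception as exc:  # e.g. rate limits or network errors
            replies.append(f"ERROR: {exc}")
    return replies

# Example with the Mistral client above (not executed here):
# def ask_mistral(p):
#     r = client.chat.complete(model=model, messages=[{"role": "user", "content": p}])
#     return r.choices[0].message.content
# replies = run_prompts(ask_mistral, [prompt])
```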

**Code Llama Zero-Shot**

In [None]:
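!pip install replicate
import os
import replicate
from google.colab import userdata
# The replicate client reads its token from this environment variable
os.environ["REPLICATE_API_TOKEN"] = userdata.get("REPLICATE_API_KEY")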

In [None]:
output = replicate.run(
    "meta/codellama-70b-instruct:a279116fe47a0f65701a8817188601e2fe8f4b9e04a518789655ea7b995851bf",
    input={
        "prompt": prompt,
    },
)
# The output arrives as a sequence of text chunks; join them into one string
print("".join(output))

**Facebook Llama-2 Zero-Shot**

In [None]:
from huggingface_hub import InferenceClient
from google.colab import userdata

client = InferenceClient(
    "meta-llama/Llama-2-7b-chat-hf",
    # Hugging Face access token: request access to the gated model at
    # https://huggingface.co/meta-llama/Llama-2-7b-chat-hf, then create a token at
    # https://huggingface.co/settings/tokens
    token=userdata.get("LLAMA2_API_KEY"),
)

In [None]:
for chunk in client.chat_completion(
    messages=[{"role": "user", "content": prompt}],
    max_tokens=500,
    stream=True,
):
    # A streamed chunk may carry no text (None), so print an empty string instead
    print(chunk.choices[0].delta.content or "", end="")
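Printing the stream is convenient interactively, but post-processing needs the full reply as one string. A sketch (`collect_stream` is a hypothetical name) that accumulates the streamed deltas and skips empty ones, assuming the chunk shape used in the loop above (`chunk.choices[0].delta.content`).

```python
def collect_stream(chunks) -> str:
    """Join the text deltas of a streamed chat completion into one string."""
    parts = []
    for chunk in chunks:
        delta = chunk.choices[0].delta.content
        if delta:  # skip chunks that carry no text
            parts.append(delta)
    return "".join(parts)
```

Usage: `reply = collect_stream(client.chat_completion(messages=[{"role": "user", "content": prompt}], max_tokens=500, stream=True))`.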

**Galactica Zero-Shot**

In [None]:
!pip install galai
import galai as gal
from galai.notebook_utils import *

In [None]:
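# "huge" is the 120B-parameter checkpoint and needs several large GPUs;
# smaller sizes include "mini", "base", "standard" and "large"
model = gal.load_model("huge", parallelize=True)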

In [None]:
input_prompt = prompt

# Plain text generation; generate_reference() is intended for citation prediction
output = model.generate(input_prompt, max_length=1024)
display_markdown(f"**Prompt**: {input_prompt}\n\n**Output**: {output}")
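According to the Galactica paper, SMILES strings were wrapped in `[START_I_SMILES]` and `[END_I_SMILES]` markers during training. Whether using these markers improves reaction prediction here is an untested assumption. The sketch below (hypothetical helpers `galactica_question` and `parse_galactica_smiles`) tags the reactants, ends the prompt with an open marker so the model continues with a SMILES string, and then reads back the text up to the closing marker.

```python
START, END = "[START_I_SMILES]", "[END_I_SMILES]"


def galactica_question(reactants_smiles: str) -> str:
    """Question in the notebook's format, with SMILES wrapped in Galactica's markers."""
    return (f"Reactants and reagents SMILES: {START}{reactants_smiles}{END}\n"
            f"Product SMILES: {START}")


def parse_galactica_smiles(generated: str) -> str:
    """Return the text after the last opening marker, up to the closing marker if present."""
    tail = generated.rsplit(START, 1)[-1]
    return tail.split(END, 1)[0].strip()
```

Usage: swap `question` for `galactica_question("CCN.CN1C=CC=C1C=O")` when building the prompt, then apply `parse_galactica_smiles` to the generated text.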

**ChemLLM Zero-Shot**

In [None]:
# Website Available for Zero-Shot: https://chemllm.org/
# https://huggingface.co/AI4Chem/ChemLLM-7B-Chat
!pip install transformers
from transformers import AutoModelForCausalLM, AutoTokenizer, GenerationConfig
import torch

model_name_or_id = "AI4Chem/ChemLLM-7B-Chat"

model = AutoModelForCausalLM.from_pretrained(model_name_or_id, torch_dtype=torch.float16, device_map="auto", trust_remote_code=True)
tokenizer = AutoTokenizer.from_pretrained(model_name_or_id, trust_remote_code=True)

input_prompt = prompt

# Send the inputs to the device that device_map="auto" placed the model on
inputs = tokenizer(input_prompt, return_tensors="pt").to(model.device)

In [None]:
generation_config = GenerationConfig(
    do_sample=True,
    top_k=1,  # with top_k=1, sampling always picks the most likely token (effectively greedy)
    temperature=0.9,
    max_new_tokens=500,
    repetition_penalty=1.5,
    pad_token_id=tokenizer.eos_token_id
)

outputs = model.generate(**inputs, generation_config=generation_config)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))

**Mol-Instructions LLM**

In [None]:
# Needs to be run locally (requires a GPU and the cloned repository)

# STEP 1
# >> git clone https://github.com/zjunlp/Mol-Instructions
# >> cd Mol-Instructions/demo

# STEP 2
# >> pip install gradio

# STEP 3
# Specify the local parameters in the generate.sh file:
# BASE_MODEL_PATH points to the base model weights, and
# FINETUNED_MODEL_PATH should be set to 'zjunlp/llama-molinst-molecule-7b'

# >> CUDA_VISIBLE_DEVICES=0 python generate.py \
#     --CLI False \
#     --protein False \
#     --load_8bit \
#     --base_model $BASE_MODEL_PATH \
#     --share_gradio True \
#     --lora_weights $FINETUNED_MODEL_PATH

# STEP 4
# >> sh generate.sh

# STEP 5
# >> python generate.py --CLI True

# Command-line interaction with the model is now available
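The STEP 3 command can also be assembled from Python, for example to launch it with `subprocess`. This sketch reuses only the flags listed above; `build_generate_cmd` is a hypothetical helper and `/path/to/base_model` is a placeholder for your local base weights.

```python
import shlex


def build_generate_cmd(base_model_path: str,
                       lora_weights: str = "zjunlp/llama-molinst-molecule-7b",
                       cli: bool = True) -> list:
    """Return the argv list for generate.py using the flags from the steps above."""
    return [
        "python", "generate.py",
        "--CLI", str(cli),
        "--protein", "False",
        "--load_8bit",
        "--base_model", base_model_path,
        "--share_gradio", "True",
        "--lora_weights", lora_weights,
    ]


cmd = build_generate_cmd("/path/to/base_model")  # placeholder path
print(shlex.join(cmd))
# To launch from the demo directory (with CUDA_VISIBLE_DEVICES set as needed):
# import subprocess; subprocess.run(cmd, check=True)
```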

**LlaSMol-Galactica, LlaSMol-Llama2, LlaSMol-CodeLlama, LlaSMol-Mistral**
Needs to be run locally.

In [None]:
# Clone https://github.com/OSU-NLP-Group/LLM4Chem to a local directory
# and cd into it

# from generation import LlaSMolGeneration

# generator = LlaSMolGeneration('osunlp/LlaSMol-Mistral-7B')
# generator.generate('Can you tell me the IUPAC name of <SMILES> C1CCOC1 </SMILES> ?')

# The model can be swapped for any of these four: 'osunlp/LlaSMol-Mistral-7B', 'osunlp/LlaSMol-CodeLlama-7B',
# 'osunlp/LlaSMol-Llama2-7B', 'osunlp/LlaSMol-Galactica-6.7B'
# SMILES strings must be wrapped in tags: <SMILES> ... </SMILES>
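Since LlaSMol expects SMILES inside `<SMILES> ... </SMILES>` tags, a small helper avoids forgetting them when reusing this notebook's reactions. A sketch: `tag_smiles` and `product_query` are hypothetical names, and the question wording is an assumption rather than LlaSMol's training phrasing.

```python
def tag_smiles(smiles: str) -> str:
    """Wrap a SMILES string in the tags used in the LlaSMol example above."""
    return f"<SMILES> {smiles} </SMILES>"


def product_query(reactants_smiles: str) -> str:
    # Hypothetical wording; adjust it to match the phrasing LlaSMol responds to best
    return ("What is the product of a reaction with these reactants and reagents? "
            + tag_smiles(reactants_smiles))
```

Usage: `generator.generate(product_query("CCN.CN1C=CC=C1C=O"))`.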

**STOUT SOTA Model**

In [None]:
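# Option 1 (Colab or any pip environment)
!pip install STOUT-pypi

# Option 2 (local conda environment; `conda activate` does not persist between `!` commands in Colab)
# conda create --name STOUT python=3.10
# conda activate STOUT
# conda install -c decimer stout-pypi

# Option 3 (latest version from source)
# !pip install git+https://github.com/Kohulan/Smiles-TO-iUpac-Translator.git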

In [None]:
from STOUT import translate_forward, translate_reverse

# SMILES to IUPAC name translation
SMILES = "CN1C=NC2=C1C(=O)N(C(=O)N2C)C"
IUPAC_name = translate_forward(SMILES)
print(f"🧪 IUPAC name of {SMILES} is: {IUPAC_name}")

# IUPAC name to SMILES translation
IUPAC_name = "1,3,7-trimethylpurine-2,6-dione"
SMILES = translate_reverse(IUPAC_name)
print(f"🔬 SMILES of {IUPAC_name} is: {SMILES}")

**Uni-Mol SOTA**

In [None]:
# Option 1: install the released package from PyPI
!pip install unimol_tools
!pip install huggingface_hub

# Option 2: install from source (clone first, then install the requirements)
# !git clone https://github.com/deepmodeling/Uni-Mol.git
# %cd Uni-Mol/unimol_tools
# !pip install -r requirements.txt
# !python setup.py install

In [None]:
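# `!export` only affects a throwaway subshell; set the variables in Python
# before importing unimol_tools
import os
# Optional: download weights through a Hugging Face mirror
os.environ["HF_ENDPOINT"] = "https://hf-mirror.com"
# Directory where the Uni-Mol weights are stored (placeholder path)
os.environ["UNIMOL_WEIGHT_DIR"] = "/path/to/your/weights/dir/"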
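The training cell reads a SMILES-based CSV file (as its comment notes). A sketch for writing one; it assumes unimol_tools' default column names, a `SMILES` column plus a label column whose name starts with `TARGET`, and `write_unimol_csv` is a hypothetical helper.

```python
import csv


def write_unimol_csv(path, smiles, targets):
    """Write a two-column CSV (SMILES, TARGET) for Uni-Mol training."""
    if len(smiles) != len(targets):
        raise ValueError("smiles and targets must have the same length")
    with open(path, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["SMILES", "TARGET"])  # assumed default column names
        for s, t in zip(smiles, targets):
            writer.writerow([s, t])
```

Usage: `write_unimol_csv("train.csv", smiles_list, label_list)` with your own molecules and labels, then pass the file path as `data`.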

In [None]:
from unimol_tools import MolTrain, MolPredict

# Currently supported data: a SMILES-based csv/txt file, or a custom dict such as
# {'atoms': [['C','C'], ['C','H','O']], 'coordinates': [coordinates_1, coordinates_2]}
# that specifies each molecule's atoms and 3D coordinates
data = "train.csv"  # hypothetical path to your training file

clf = MolTrain(task='classification',
               data_type='molecule',
               epochs=10,
               batch_size=16,
               metrics='auc',
               save_path='./exp',  # directory where the trained model is written
               )
clf.fit(data=data)

# Load the trained model from save_path and predict on new molecules
clf = MolPredict(load_model='./exp')
res = clf.predict(data="test.csv")  # hypothetical path to the molecules to score
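The custom-dict input described in the comment above pairs a list of atom symbols with a coordinate array for each molecule. A sketch that builds it from `(symbol, x, y, z)` tuples; it assumes one `(n_atoms, 3)` NumPy array per molecule, and `to_unimol_dict` is a hypothetical helper. The water geometry is approximate and only illustrative.

```python
import numpy as np


def to_unimol_dict(molecules):
    """Convert molecules given as lists of (symbol, x, y, z) tuples into
    the {'atoms': ..., 'coordinates': ...} layout."""
    atoms, coordinates = [], []
    for mol in molecules:
        atoms.append([symbol for symbol, *_ in mol])
        coordinates.append(np.array([xyz for _, *xyz in mol], dtype=np.float32))
    return {"atoms": atoms, "coordinates": coordinates}


# Water with approximate geometry (O-H about 0.96 Å, H-O-H about 104.5°), in Å
water = [("O", 0.0, 0.0, 0.0), ("H", 0.96, 0.0, 0.0), ("H", -0.24, 0.93, 0.0)]
example = to_unimol_dict([water])
# Possible use (assumption): clf.predict(data=example)
```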

**MolT5 SOTA**

In [None]:
from transformers import T5Tokenizer, T5ForConditionalGeneration

tokenizer = T5Tokenizer.from_pretrained("laituan245/molt5-large-smiles2caption", model_max_length=512)
model = T5ForConditionalGeneration.from_pretrained('laituan245/molt5-large-smiles2caption')

input_text = 'C1=CC2=C(C(=C1)[O-])NC(=CC2=O)C(=O)O'
input_ids = tokenizer(input_text, return_tensors="pt").input_ids

outputs = model.generate(input_ids, num_beams=5, max_length=512)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))