In [1]:
import os, sys
cwd = os.getcwd()
print("Current working directory:", cwd)
home_dir = os.path.dirname(os.path.dirname(cwd))
print("Home directory:", home_dir)
sys.path.append(home_dir)

Current working directory: /home/hoon/dd-agent/llm_dd/examples/testrun
Home directory: /home/hoon/dd-agent/llm_dd
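The first cell walks two directories up from the working directory (`examples/testrun`) to reach the project root so that `agentD` is importable. A minimal `pathlib` sketch of the same idea follows, assuming the same `<root>/examples/testrun` layout. `add_project_root` is a hypothetical helper name. Unlike a bare `sys.path.append`, it skips the append when the path is already present, so re-running the cell does not keep adding duplicates.

```python
import sys
from pathlib import Path


def add_project_root(cwd: Path, levels_up: int = 2) -> Path:
    """Append the directory `levels_up` levels above `cwd` to sys.path and return it."""
    # parents[0] is the immediate parent, so the n-th ancestor is parents[n - 1].
    root = cwd.parents[levels_up - 1]
    # Only append once, so re-running the cell leaves sys.path unchanged.
    if str(root) not in sys.path:
        sys.path.append(str(root))
    return root
```

In the notebook this would be called as `home_dir = add_project_root(Path.cwd())`.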


In [2]:
import json
from agentD.agents import agentD
from agentD.utils import get_tool_decorated_functions, custom_serializer
from agentD.analysis.analysis_helper import save_existing_drug_smiles, extract_random_samples
from agentD.prompts.property_prediction import PREFIX, SUFFIX, FORMAT_INSTRUCTIONS

In [3]:
ret_tools = get_tool_decorated_functions(os.path.join(home_dir, "agentD/tools/retrieval.py"))
pred_tools = get_tool_decorated_functions(os.path.join(home_dir, "agentD/tools/prediction.py"))
tools = ret_tools + pred_tools
tool_names = [tool.name for tool in tools]  
tool_desc = [tool.description for tool in tools]
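`get_tool_decorated_functions` comes from `agentD.utils`, and its implementation is not shown in this notebook. The hypothetical sketch below shows one way such a helper could work. It imports a source file by path with `importlib`, then keeps module-level objects that expose a string `name`, a string `description`, and a callable `invoke`, which are attributes LangChain tool objects have. The duck-typed filter is an assumption, not the agentD logic.

```python
import importlib.util
from pathlib import Path
from types import ModuleType


def load_module(path: str) -> ModuleType:
    """Import a Python source file by its path, without it being on sys.path."""
    spec = importlib.util.spec_from_file_location(Path(path).stem, path)
    if spec is None or spec.loader is None:
        raise ImportError(f"cannot load {path}")
    module = importlib.util.module_from_spec(spec)
    spec.loader.exec_module(module)
    return module


def collect_tools(path: str) -> list:
    """Return module-level objects that look like tools (hypothetical duck-typed check)."""
    module = load_module(path)
    return [
        obj
        for obj in vars(module).values()
        if not isinstance(obj, type)  # skip classes, e.g. an imported tool base class
        and isinstance(getattr(obj, "name", None), str)
        and isinstance(getattr(obj, "description", None), str)
        and callable(getattr(obj, "invoke", None))
    ]
```

With this sketch, `tool_names` and `tool_desc` would be built from the returned list exactly as in the cell above.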

# Preliminary Input File Set-up

In [4]:
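# Write the existing drug SMILES from extraction.json to the pool/ directory, then keep 50 random rows of each generator's CSV.
# Note: source and destination paths are identical, so each CSV is overwritten in place.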
save_existing_drug_smiles("extraction.json", "pool")
extract_random_samples("pool/Mol2Mol_sampling.csv", "pool/Mol2Mol_sampling.csv", num_samples=50)
extract_random_samples("pool/Reinvent_sampling.csv", "pool/Reinvent_sampling.csv", num_samples=50)

Random samples saved to pool/Mol2Mol_sampling.csv
Random samples saved to pool/Reinvent_sampling.csv


Unnamed: 0,SMILES
0,CCc1cc2c(cc1-c1cnn(C)c1)C(C)(C)c1[nH]c3cc(C#N)...
1,Cc1noc(C)c1CSc1nc2sc3c(c2c(=O)n1-c1cccc(Cl)c1)...
2,CC1(c2cc(NC(=O)c3ccc(C#N)nc3)ccc2F)N=C(N)OCC1F
3,CCCCc1ccc(Cc2c(CCC)n3ncnc3n(C3CCCCC3)c2=O)nc1
4,Cc1ccc2c(Oc3nc(C4CCCCC4)ccc3C)cccc2n1
5,CC(=O)NC(CSC(=O)C1CCC2c3ccccc3N(C(=O)C3CCCCC3)...
6,Cc1nnc(Sc2cc(=O)oc3ccc(NCc4cccc([N+](=O)[O-])c...
7,CN1C(=O)CSc2ccc(NC(=O)N3CC=C(c4ncc(CO)cc4Cl)CC...
8,CC(C)(NCC(=O)N1CCCC1C#N)c1nc2ccccc2o1
9,CCC(NC(=O)CCCc1csc2nc(-c3cccs3)nn12)C(=O)NC1CC...
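The cell above passes the same path as source and destination to `extract_random_samples`, so the subsample replaces the original pool. Re-running the cell therefore samples from the already-reduced file. In pandas, a column named `Unnamed: 0`, like the one in the output above, typically appears when a CSV that was written together with its index is read back without `index_col`. The following standard-library sketch shows subsampling that is safe in place and writes no index column. `sample_csv_rows` and its `seed` parameter are hypothetical and are not the agentD helper.

```python
import csv
import random
from typing import Optional


def sample_csv_rows(src: str, dst: str, num_samples: int, seed: Optional[int] = None) -> int:
    """Write `num_samples` random data rows of `src` (header kept) to `dst`; return the count."""
    # Read everything before opening the output, so src == dst is safe.
    with open(src, newline="", encoding="utf-8") as f:
        reader = csv.reader(f)
        header = next(reader)
        rows = list(reader)
    # A seeded generator makes the subsample reproducible across runs.
    rng = random.Random(seed)
    sampled = rng.sample(rows, min(num_samples, len(rows)))
    with open(dst, "w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        writer.writerow(header)  # only the original columns; no index column is added
        writer.writerows(sampled)
    return len(sampled)
```

Asking for more rows than the file holds simply returns every row, in shuffled order.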


# Property Prediction

In [4]:
protein = "BCL-2"
disease = "Lymphocytic leukemia"
file_name = "Reinvent_sampling"
candidate_smiles_file = f"pool/{file_name}.csv"
Human_prompt = f"Make predictions for the potential drug molecules for {disease} targeting the protein {protein}. The candidate SMILES file is '{candidate_smiles_file}'."

input_data = {
    "input": Human_prompt,
    "tools": tools,
    "tool_names": tool_names,
    "tool_desc": tool_desc,
}

prediction_agent = agentD(tools, 
                          model="gpt-4o",
                          prefix=PREFIX, 
                          suffix=SUFFIX,
                          format_instructions=FORMAT_INSTRUCTIONS).agent

`embedding_function` is expected to be an Embeddings object, support for passing in a function will soon be removed.
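`json.dump` calls its `default` argument only for objects it cannot encode natively. Because `input_data` carries the tool objects, and the dict returned by `invoke` likely echoes those inputs back alongside the output, the next cell needs such a fallback. `custom_serializer` lives in `agentD.utils` and is not shown here. The following is a hypothetical stand-in, assuming tool-like objects expose a string `name`.

```python
from typing import Any


def fallback_serializer(obj: Any) -> Any:
    """Hypothetical `default=` hook for json.dump; called only for objects JSON cannot encode."""
    # Sets have no JSON counterpart; emit a list sorted by repr so output is stable.
    if isinstance(obj, (set, frozenset)):
        return sorted(obj, key=repr)
    # Tool-like objects: record their identity instead of dumping their internals.
    name = getattr(obj, "name", None)
    if isinstance(name, str):
        return {
            "type": type(obj).__name__,
            "name": name,
            "description": getattr(obj, "description", None),
        }
    # Last resort: a readable string, so the dump never raises TypeError.
    return repr(obj)
```

It would be used as `json.dump(response, f, indent=2, default=fallback_serializer)`. The `repr` fallback keeps the dump from failing, at the cost of producing strings that cannot be loaded back into objects.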


In [5]:
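# Create the output directory if needed, then save the full agent response as JSON.
os.makedirs("response", exist_ok=True)
response = prediction_agent.invoke(input_data)
with open(f"response/prediction_{file_name}.json", "w", encoding="utf-8") as f:
    json.dump(response, f, indent=2, default=custom_serializer)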



> Entering new AgentExecutor chain...
The task is to make predictions for potential drug molecules targeting the BCL-2 protein, which is associated with Lymphocytic leukemia. The predictions will include binding affinity and ADMET properties for the candidate molecules provided in the SMILES file 'pool/Reinvent_sampling.csv'.

To achieve this, I will follow these steps:
1. Retrieve the UniProt IDs for the BCL-2 protein.
2. Select the most relevant UniProt ID for our target.
3. Retrieve the FASTA sequence for the selected UniProt ID.
4. Use the FASTA sequence and the SMILES file to predict binding affinity.
5. Use the SMILES file to predict ADMET properties.

Let's begin with the first step.

Action: get_uniprot_ids
Action Input: BCL-2[0m
Observation: [33;1m[1;3m[('P10415', 'Apoptosis regulator Bcl-2')][0m
Thought:[32;1m[1;3mThe UniProt ID for the BCL-2 protein is P10415, which corresponds to the "Apoptosis regulator Bcl-2". This is the relevant protein for

You are using the default legacy behaviour of the <class 'transformers.models.t5.tokenization_t5.T5Tokenizer'>. This is expected, and simply means that the `legacy` (previous) behavior will be used so nothing changes for you. If you want to use the new behaviour, set `legacy=False`. This should only be set if you understand what it means, and thoroughly read the reason why this was added as explained in https://github.com/huggingface/transformers/pull/24565
100%|██████████| 50/50 [00:08<00:00,  5.89it/s]



Observation: [38;5;200m[1;3m('Affinity predictions saved to ', 'property/affinity_Reinvent_sampling.csv')[0m
Thought:[32;1m[1;3mThe binding affinity predictions have been successfully saved to 'property/affinity_Reinvent_sampling.csv'. Now, I will proceed to predict the ADMET properties using the SMILES file 'pool/Reinvent_sampling.csv'.

Action: get_admet_predictions
Action Input: pool/Reinvent_sampling.csv[0m
Observation: [36;1m[1;3mADMET predictions saved to property/admet_Reinvent_sampling.csv[0m
Thought:[32;1m[1;3mThe ADMET predictions have been successfully saved to 'property/admet_Reinvent_sampling.csv'.

Here's your final answer:
Final Answer: The binding affinity predictions for the potential drug molecules targeting the BCL-2 protein have been saved to 'property/affinity_Reinvent_sampling.csv'. The ADMET properties for these molecules have been saved to 'property/admet_Reinvent_sampling.csv'.

[1m> Finished chain.[0m
