In [1]:
import os, sys
cwd = os.getcwd()
print("Current working directory:", cwd)
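# Two levels above examples/testrun is the project root (named home_dir here),
# which must be on sys.path for the agentD package imports below to resolve.
home_dir = os.path.dirname(os.path.dirname(cwd))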
print("Home directory:", home_dir)
sys.path.append(home_dir)

Current working directory: /home/hoon/dd-agent/llm_dd/examples/testrun
Home directory: /home/hoon/dd-agent/llm_dd
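
The path setup above can also be written with `pathlib`. The following is a minimal sketch, not part of the original notebook: the helper names `ancestor` and `add_to_sys_path` are hypothetical, and the example path is simply the working directory printed above.

```python
import sys
from pathlib import PurePath, PurePosixPath


def ancestor(path: PurePath, levels_up: int) -> PurePath:
    """Return the directory `levels_up` levels above `path` (hypothetical helper)."""
    if levels_up < 1:
        raise ValueError("levels_up must be at least 1")
    # parents[0] is the immediate parent, parents[1] the grandparent, and so on.
    return path.parents[levels_up - 1]


def add_to_sys_path(directory: PurePath) -> bool:
    """Append `directory` to sys.path once; return True only if it was added."""
    entry = str(directory)
    if entry in sys.path:
        return False
    sys.path.append(entry)
    return True


# The working directory printed by the first cell, used here as a fixed example.
example_cwd = PurePosixPath("/home/hoon/dd-agent/llm_dd/examples/testrun")
root = ancestor(example_cwd, 2)
print(root)
```

In a live notebook, `Path.cwd()` would replace the fixed example path. Guarding against duplicate entries keeps `sys.path` from growing each time the cell is re-run.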


In [2]:
import json
import pandas as pd
from agentD.agents import agentD
from agentD.utils import get_tool_decorated_functions, custom_serializer
from agentD.prompts.structure_generation import PREFIX, SUFFIX, FORMAT_INSTRUCTIONS

In [3]:
ret_tools = get_tool_decorated_functions(os.path.join(home_dir, "agentD/tools/retrieval.py"))
gen_tools = get_tool_decorated_functions(os.path.join(home_dir, "agentD/tools/generation.py"))

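# Combine retrieval and generation tools, then collect their names and
# descriptions for the agent prompt.
tools = ret_tools + gen_tools
tool_names = [tool.name for tool in tools]
tool_desc = [tool.description for tool in tools]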

# Complex Structure Generation

In [4]:
protein = "BCL-2" 
disease = "Lymphocytic leukemia"
smiles_list = pd.read_csv("analysis_result/filtered_r2_q0.5_a6.csv")['SMILES'].to_list()

Human_prompt = f"Generate the ligand-protein complex structure for potential drug molecules in the given list {smiles_list}, targeting the {protein} protein associated with {disease}."
input_data = {
    "input": Human_prompt,
    "tools": tools,
    "tool_names": tool_names,
    "tool_desc": tool_desc,
}

structure_agent = agentD(tools, 
                         model="gpt-4o",
                         prefix=PREFIX, 
                         suffix=SUFFIX,
                         format_instructions=FORMAT_INSTRUCTIONS).agent

`embedding_function` is expected to be an Embeddings object, support for passing in a function will soon be removed.


In [5]:
response = structure_agent.invoke(input_data)
os.makedirs("response", exist_ok=True)  # make sure the output directory exists
with open("response/generation.json", "w", encoding="utf-8") as f:
    json.dump(response, f, indent=2, default=custom_serializer)
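
The `default=` argument is what lets `json.dump` write objects that are not natively JSON-serializable, such as the agent's intermediate step objects. The sketch below illustrates the mechanism only: `fallback_serializer` and `Step` are hypothetical stand-ins, and the actual behavior of `custom_serializer` in `agentD.utils` is not shown in this notebook.

```python
import json
from datetime import datetime
from pathlib import Path


def fallback_serializer(obj):
    """Hypothetical stand-in for custom_serializer.

    json calls this only for objects it cannot encode on its own.
    """
    if isinstance(obj, (datetime, Path)):
        return str(obj)
    if hasattr(obj, "__dict__"):
        # Plain Python objects: serialize their attribute dictionary.
        return vars(obj)
    # Last resort: keep a readable representation instead of raising TypeError.
    return repr(obj)


class Step:
    """Hypothetical record of one agent action, used only for this example."""

    def __init__(self, tool, tool_input):
        self.tool = tool
        self.tool_input = tool_input


record = {"output": "done", "steps": [Step("get_uniprot_ids", "BCL-2")]}
encoded = json.dumps(record, indent=2, default=fallback_serializer)
decoded = json.loads(encoded)
print(decoded["steps"][0])
```

Without a `default` hook, `json.dumps(record)` would raise `TypeError` on the `Step` instance.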



> Entering new AgentExecutor chain...
The task is to generate the protein-ligand complex structure for potential drug molecules targeting the BCL-2 protein, which is associated with Lymphocytic leukemia. I will follow the protocol to achieve this:

1. Retrieve the UniProt IDs for the BCL-2 protein.
2. Decide the most relevant protein for our target.
3. Retrieve the FASTA sequence of the BCL-2 protein.
4. Search for an existing drug molecule for the BCL-2 protein.
5. Retrieve the SMILES from the given list of SMILES.
6. Generate the protein-ligand complex structure for the final optimized molecule and protein.

Let's begin with the first step.

Action: get_uniprot_ids
Action Input: "BCL-2"
Observation: [('P10415', 'Apoptosis regulator Bcl-2')]
Thought: The UniProt ID for the BCL-2 protein is P10415, which corresponds to the Apoptosis regulator Bcl-2. This is the relevant protein for our target. Now, I will proceed to retrieve the F