## Example 5 - WDR5 Ligand Inverse Design

In this example, we identify molecules with high binding affinity for the WDR5 protein target. WDR5 is a key protein in chromatin remodeling: it is a core component of the COMPASS complex, which regulates histone methylation and plays a crucial role in gene expression. This makes WDR5 a significant target for cancer and epigenetic therapies.
We set the inverse design criterion to a docking score of -9 ± 0.5 kcal/mol.
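
The acceptance criterion amounts to an interval around the target score. A minimal sketch of that check follows; the function name `within_target`, its defaults, and the choice to treat the interval bounds as inclusive are assumptions made for illustration and are not part of dZiner.

```python
def within_target(score, target=-9.0, tolerance=0.5):
    """Return True if a docking score (kcal/mol) lies within target ± tolerance."""
    if score is None:  # the docking tool may fail to produce a score
        return False
    return abs(score - target) <= tolerance


print(within_target(-8.7))   # inside the window [-9.5, -8.5]
print(within_target(-10.2))  # more negative than the window allows
```

Note that a candidate can miss the window by binding too strongly as well as too weakly, which is what distinguishes this inverse design target from plain score minimization.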

In [1]:
import warnings
warnings.filterwarnings('ignore')
from langchain._api import LangChainDeprecationWarning
warnings.simplefilter("ignore", category=LangChainDeprecationWarning)
import sys
import dziner
import os

os.environ["SERPER_API_KEY"] = "YOUR_SERPER_API_KEY"
os.environ["ANTHROPIC_API_KEY"] = "YOUR_ANTHROPIC_API_KEY"
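
Before running the agent, it can help to confirm that the placeholder values above were replaced with real keys. The helper below is a hypothetical convenience, not something dZiner provides; it only inspects a mapping such as `os.environ`.

```python
import os

REQUIRED_KEYS = ("SERPER_API_KEY", "ANTHROPIC_API_KEY")


def missing_keys(env=os.environ, required=REQUIRED_KEYS):
    """Return the required keys that are unset or still hold a YOUR_... placeholder."""
    missing = []
    for name in required:
        value = env.get(name, "")
        if not value or value.startswith("YOUR_"):
            missing.append(name)
    return missing


# Example with an explicit mapping instead of the real environment
example_env = {"SERPER_API_KEY": "abc123", "ANTHROPIC_API_KEY": "YOUR_ANTHROPIC_API_KEY"}
print(missing_keys(example_env))
```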

### Defining Agent Tools

The ReAct agent can be augmented with many different tools depending on your problem. Here, we load a Google search tool backed by the Serper API and a docking score predictor based on [GNINA](https://github.com/gnina/gnina). GNINA can be downloaded as a Linux binary and made executable (see this [notebook](https://colab.research.google.com/drive/1QYo5QLUE80N_G28PlpYs6OKGddhhd931?usp=sharing) for installation). The live Google search tool lets the model look up information on WDR5 binding and use it to guide possible modifications to the core structure of the molecule. You can also define your own custom tools for the agent (see other examples [here](https://mehradans92.github.io/dZiner/)).

In [12]:
from dziner.tools import search
print(f'Search tool:\n{search.description}\n')

from langchain.agents import tool
from rdkit import Chem
from rdkit.Chem import AllChem
import subprocess

@tool
def predictGNINA_docking(smiles):
    '''
    This tool predicts the docking score to WDR5 for a SMILES molecule. Lower docking score means a larger binding affinity.
    This model is based on GNINA from this paper: https://jcheminf.biomedcentral.com/articles/10.1186/s13321-021-00522-2
    '''
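    mol = Chem.MolFromSmiles(smiles)
    if mol is None:
        # RDKit could not parse the SMILES string
        return "Invalid SMILES string"
    mol = Chem.AddHs(mol)
    
    # Generate 3D coordinates (EmbedMolecule returns -1 if embedding fails)
    if AllChem.EmbedMolecule(mol) == -1:
        return "Could not generate 3D coordinates"
    AllChem.UFFOptimizeMolecule(mol)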
    
    # Save to a PDB file, which is passed to GNINA as the ligand
    output_file = "molecule.pdb"
    Chem.MolToPDBFile(mol, output_file)

    # Command to run GNINA
    command = "../dziner/surrogates/gnina -r ../dziner/surrogates/Binding/WDR5_target.pdbqt -l molecule.pdb --autobox_ligand ../dziner/surrogates/Binding/WDR5_target.pdbqt --seed 0"
    
    # Run the command and capture the output
    result = subprocess.getoutput(command)
    
    # Initialize a list to store affinity values
    affinities = []
    
    # Loop through the output lines to find affinity scores
    for line in result.splitlines():
        # Skip lines that don't start with a mode index (i.e., a number)
        if line.strip() and line.split()[0].isdigit():
            # Split the line and extract the affinity (2nd column)
            try:
                affinity = float(line.split()[1])
                affinities.append(affinity)
            except (ValueError, IndexError):
                # Skip lines with a missing or non-numeric second column
                continue
    
    # Find the minimum affinity score
    if affinities:
        best_docking_score = min(affinities)
        # print(f"Best docking score: {best_docking_score}")
    else:
        # print("No valid docking scores found.")
        best_docking_score = None
    return best_docking_score

print(f'WDR5 docking eval tool:\n{predictGNINA_docking.description}')

Search tool:
This tool performs a search on Google using SerpAPI and returns relevant information for the query.
It is useful for retrieving information from the web or when you need to look up current data.

WDR5 docking eval tool:
This tool predicts the docking score to WDR5 for a SMILES molecule. Lower docking score means a larger binding affinity.
This model is based on GNINA from this paper: https://jcheminf.biomedcentral.com/articles/10.1186/s13321-021-00522-2
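
The tool above does two separable things: it assembles a GNINA command and it extracts the best score from the text that comes back. The sketch below isolates both steps so they can be checked without running GNINA. The helper names `gnina_command` and `best_affinity` are hypothetical, the flags are the ones already used in the tool, and the sample table is invented text in the shape the parsing loop expects (an integer mode index in the first column, the affinity in the second); it is not real docking output.

```python
import shlex


def gnina_command(gnina_bin, receptor, ligand, seed=0):
    """Build the GNINA call as an argument list, using the flags from the tool above."""
    return [
        gnina_bin,
        "-r", receptor,
        "-l", ligand,
        "--autobox_ligand", receptor,  # same file as the receptor, as in the original command
        "--seed", str(seed),
    ]


def best_affinity(output_text):
    """Return the lowest affinity among rows that start with an integer mode index."""
    affinities = []
    for line in output_text.splitlines():
        fields = line.split()
        # Keep only rows such as "1  -8.9 ..." and ignore headers and log lines
        if len(fields) >= 2 and fields[0].isdigit():
            try:
                affinities.append(float(fields[1]))
            except ValueError:
                continue
    return min(affinities) if affinities else None


cmd = gnina_command(
    "../dziner/surrogates/gnina",
    "../dziner/surrogates/Binding/WDR5_target.pdbqt",
    "molecule.pdb",
)
print(shlex.join(cmd))  # the equivalent single shell string

# Invented sample text, for illustration only
sample = """\
mode  affinity
1     -8.9
2     -8.1
3     -7.4
"""
print(best_affinity(sample))
print(best_affinity("no table here"))
```

With the list form, the call could be made as `subprocess.run(cmd, capture_output=True, text=True)`. Unlike `subprocess.getoutput`, this keeps stdout and stderr separate, so the score table would be read from `.stdout`.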


### Defining custom system messages for the agent

In [13]:
FORMAT_INSTRUCTIONS = """Use the following format:

Question: the input question you must answer
Thought: you should always think about what to do
Action: the action to take, should be one of [{tool_names}]
Action Input: the input to the action
Observation: the result of the action
... (this Thought/Action/Action Input/Observation can repeat N times)
When you have a response to say to the Human, or if you do not need to use a tool, you MUST use the format:
'''
Thought: Here's your final answer:
Final Answer: [your response here]
'''

Use the exact sequence without any "\n" or any extra "()".
"""

SUFFIX = """You should always follow these steps:
    1. Lookup and summarize design guidelines on what functional groups can help with better binding.
    Make sure to add citations (paper DOI) for design guidelines if you use them, in your final response.
    Your final response should also contain the source for the tools used from their summary in description in {tool_desc}.
    2. Evaluate the docking of the initial molecule you start with.
    3. Start making changes to the molecule based on guidelines you found in step 1. Try to use different changes at each iteration.
    4. Evaluate the docking score for the new molecule
     If the docking score does not change properly, revert change to the best previous sequence and try something else by
    redoing steps 3 and 4. Iterate until you have 2 new candidates that exactly follow the acceptance docking score criteria and then stop.

    Start by describing the problem: \n\nBegin!\n \n\nQuestion: {input}
Thought:{agent_scratchpad}\n"""
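
To make the requested layout concrete, the sketch below parses a reply written in the format that `FORMAT_INSTRUCTIONS` asks for. It is a simplified stand-in for illustration, not LangChain's own output parser, and the function name `parse_step` is hypothetical.

```python
import re


def parse_step(text):
    """Split one model reply into either a tool call or a final answer."""
    final = re.search(r"Final Answer:\s*(.*)", text, re.DOTALL)
    if final:
        return {"type": "final", "answer": final.group(1).strip()}
    action = re.search(r"Action:\s*(.*?)\s*\nAction Input:\s*(.*)", text, re.DOTALL)
    if action:
        return {
            "type": "action",
            "tool": action.group(1).strip(),
            "tool_input": action.group(2).strip(),
        }
    raise ValueError("Reply does not follow the expected format")


reply = (
    "Thought: I need design guidelines first.\n"
    "Action: search\n"
    "Action Input: WDR5 inhibitor design guidelines"
)
print(parse_step(reply))
```

A reply that contains neither an `Action`/`Action Input` pair nor a `Final Answer` raises an error here, which mirrors why the prompt insists on the exact sequence.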

We use Claude 3.5 Sonnet to drive our agent.

In [14]:
from dziner.agents import dZiner
from langchain_anthropic import ChatAnthropic

tools = [predictGNINA_docking, search]
tool_names = [tool.name for tool in tools]  
tool_desc = [tool.description for tool in tools]

agent_model = ChatAnthropic(model="claude-3-5-sonnet-20240620", api_key=os.environ["ANTHROPIC_API_KEY"], temperature=0.2, max_tokens=8192)
agent = dZiner(tools, property="Binding affinity",
               model=agent_model, verbose=True, n_design_iterations=2,
              suffix=SUFFIX, format_instructions=FORMAT_INSTRUCTIONS).agent
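
Steps 2 to 4 of `SUFFIX` ask the agent to evaluate, modify, and revert until it has two acceptable candidates. The sketch below shows that control flow in plain Python under loud assumptions: `toy_score` and `toy_modify` are made-up stand-ins for the docking tool and for the model's proposed edits, the scores are invented, and the source does not state that the agent internally follows exactly this logic.

```python
def design_loop(start, score, modify, target=-9.0, tol=0.5, n_candidates=2, max_iters=20):
    """Greedy search: keep the molecule closest to the target score, revert otherwise."""
    best, best_score = start, score(start)        # step 2: evaluate the starting molecule
    accepted = []
    for step in range(max_iters):
        candidate = modify(best, step)            # step 3: change the current best
        cand_score = score(candidate)             # step 4: evaluate the change
        if abs(cand_score - target) <= tol and candidate not in accepted:
            accepted.append(candidate)
            if len(accepted) == n_candidates:
                break
        if abs(cand_score - target) < abs(best_score - target):
            best, best_score = candidate, cand_score  # keep the improvement
        # otherwise "revert": best stays unchanged for the next attempt
    return accepted


# Toy stand-ins (assumptions): the score depends only on the string length
def toy_score(mol):
    return -6.0 - 0.5 * len(mol)


def toy_modify(mol, step):
    # Alternate between a helpful edit (append) and an unhelpful one (truncate)
    return mol + "C" if step % 2 == 0 else mol[:-1]


print(design_loop("CC", toy_score, toy_modify))
```

In the toy run, every truncation moves the score away from the target and is discarded, so the search continues from the previous best, which is the revert behavior the prompt describes.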

In [16]:
Human_prompt = (
    "Suggest two molecules (SMILES) with a docking score of -9±0.5 against the WDR5 protein target. "
    "Make sure both candidates do not violate this docking score acceptance criterion."
)

input_data = {
            "input": Human_prompt,
            "tools": tools,
            "tool_names": tool_names,
            "tool_desc": tool_desc
        }

result = agent.invoke(input_data)



> Entering new AgentExecutor chain...
Thought: To begin this task, I need to first research the design guidelines for molecules that bind well to WDR5. Then, I'll evaluate an initial molecule and make iterative changes to optimize its binding affinity.

Action: search
Action Input: WDR5 inhibitor design guidelines
Observation: WDR5 is a critical chromatin cofactor of MYC. WDR5 interacts with MYC through the WBM pocket and is hypothesized to anchor MYC to chromatin ... Missing: guidelines | Show results with:guidelines. WDR5 inhibitors were originally conceived as epigenetic modulators, proposed to inhibit cancer cells by reversing oncogenic patterns of histone H3 lysine 4 ... WDR5 small molecule inhibition reduces the interaction between WDR5 and WRAD complex members and broadly diminishes histone 3 lysine 4 trimethylation (H3K4me3). We have discovered WDR5 WIN-site inhibitors that feature bicyclic heteroaryl P7 units through structure-based des