<a href="https://colab.research.google.com/github/frank-morales2020/MLxDL/blob/main/Full_In_Silico_Drug_Discovery_Pipeline_with_Grok_4_Agent_(Simulated).ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

 "Full In Silico Drug Discovery Pipeline with Grok 4 Agent (Automatic)"

**In silico ADMET** refers to the use of computational (*in silico*) methods to predict the **A**bsorption, **D**istribution, **M**etabolism, **E**xcretion, and **T**oxicity of chemical compounds, particularly drug candidates.

Here's a breakdown:

* **In silico:** This term means "performed on a computer or via computer simulation." It contrasts with *in vitro* (in a test tube) and *in vivo* (in a living organism).
* **ADMET:** These are five crucial pharmacokinetic and toxicological properties that determine how a drug behaves in the body and its potential for harm:
    * **Absorption:** How well a drug enters the bloodstream from its site of administration (e.g., gut, skin).
    * **Distribution:** How the drug spreads throughout the body's tissues and organs once absorbed.
    * **Metabolism:** How the body chemically modifies the drug, often breaking it down.
    * **Excretion:** How the drug and its metabolites are eliminated from the body (e.g., via urine, feces).
    * **Toxicity:** The potential for a drug to cause adverse effects or harm to the body.

**Purpose in Drug Discovery:**
The primary goal of *in silico* ADMET prediction is to screen potential drug candidates early in the discovery process. By predicting these properties computationally, researchers can:
* **Filter out undesirable compounds:** Identify molecules likely to have poor bioavailability, rapid metabolism, unfavorable distribution, or significant toxicity *before* costly and time-consuming laboratory experiments or clinical trials.
* **Prioritize promising candidates:** Focus resources on compounds with more favorable ADMET profiles.
* **Guide molecular design:** Inform medicinal chemists on how to modify chemical structures to improve their ADMET properties.
* **Reduce costs and accelerate timelines:** By minimizing late-stage failures due to ADMET issues.
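
The filtering and prioritization steps above can be sketched in a few lines of Python. This is a minimal illustration, not the notebook's own method: the `candidates` records, their field names, the `passes_filter` helper, and the cut-off values are all hypothetical and chosen only for the example.

```python
# Hypothetical ADMET predictions for four candidates (illustrative values only).
candidates = [
    {"name": "cmpd_A", "oral_bioavailability": 0.82, "hepatotoxic": False, "half_life_h": 9.0},
    {"name": "cmpd_B", "oral_bioavailability": 0.35, "hepatotoxic": False, "half_life_h": 14.0},
    {"name": "cmpd_C", "oral_bioavailability": 0.76, "hepatotoxic": True, "half_life_h": 6.5},
    {"name": "cmpd_D", "oral_bioavailability": 0.91, "hepatotoxic": False, "half_life_h": 3.0},
]

def passes_filter(c, min_bioavailability=0.5, min_half_life_h=2.0):
    """Assumed cut-offs: drop toxic, poorly absorbed, or rapidly cleared compounds."""
    return (
        not c["hepatotoxic"]
        and c["oral_bioavailability"] >= min_bioavailability
        and c["half_life_h"] >= min_half_life_h
    )

# Filter out undesirable compounds, then rank the survivors by predicted bioavailability.
shortlist = sorted(
    (c for c in candidates if passes_filter(c)),
    key=lambda c: c["oral_bioavailability"],
    reverse=True,
)
shortlist_names = [c["name"] for c in shortlist]
print(shortlist_names)
```

In practice the cut-offs would come from the project's target product profile rather than fixed defaults; the point here is only that screening reduces to a predicate plus a ranking key.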

In the code below, the `predict_admet_tool` function simulates this process, providing mock predictions for various ADMET characteristics like "Human Oral Bioavailability," "Hepatotoxicity," and "CYP2D6 Inhibition." While the actual predictions in the demo are random, they represent the types of outputs a real *in silico* ADMET model would generate.
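
The scoring rule that `predict_admet_tool` applies to its mock output can be isolated as a small standalone function. The sketch below mirrors the three checks in the tool (bioavailability above 0.7, non-hepatotoxic, not a CYP2D6 inhibitor). The names `drug_likeness_score` and `mock_profile` are hypothetical, and seeding through `random.Random(seed)` is an addition for reproducibility that the notebook itself does not use.

```python
import random

def mock_profile(rng):
    """Generate a reduced mock ADMET profile (hypothetical helper, same ranges as the tool)."""
    return {
        "Human Oral Bioavailability": f"{rng.uniform(0.5, 0.95):.2f}",
        "Hepatotoxicity": "Toxic" if rng.random() > 0.8 else "Non-toxic",
        "CYP2D6 Inhibition": "Inhibitor" if rng.random() > 0.7 else "Non-inhibitor",
    }

def drug_likeness_score(profile):
    """Count how many of the three favorable criteria the profile meets (0 to 3)."""
    score = 0
    if float(profile["Human Oral Bioavailability"]) > 0.7:
        score += 1
    if profile["Hepatotoxicity"] == "Non-toxic":
        score += 1
    if profile["CYP2D6 Inhibition"] == "Non-inhibitor":
        score += 1
    return score

# A fixed, hand-written profile makes the rule easy to check by eye:
# bioavailability and hepatotoxicity pass, CYP2D6 inhibition does not.
fixed = {
    "Human Oral Bioavailability": "0.85",
    "Hepatotoxicity": "Non-toxic",
    "CYP2D6 Inhibition": "Inhibitor",
}
fixed_score = drug_likeness_score(fixed)
print(f"{fixed_score}/3 (higher is better)")

# Two generators seeded identically produce identical mock profiles.
first = mock_profile(random.Random(42))
second = mock_profile(random.Random(42))
print(first == second)
```

Passing a dedicated `random.Random` instance, rather than seeding the module-level generator, keeps the mock repeatable without affecting other code that draws random numbers.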

In [None]:
!pip install langchain_core -q
!pip install langchain -q
!pip install xai-sdk -q
!pip install langgraph -q

!pip install langchain_xai -q

!pip install -U langchain-community -q

In [5]:
import random
import os
import re  # used by the mock tool-call fallback in LLMAgent.run_inference
from typing import List, Dict, Any, Optional, Union

# --- Imports for LangChain and XAI SDK (if available) ---
# These imports are essential for a real Grok 4 integration.
# If you don't have these installed, the agent will run in a simulated mode.
try:
    from pydantic import BaseModel, Field
    from langchain_core.tools import tool
    from langchain_core.messages import HumanMessage, AIMessage, ToolMessage
    from langchain_core.prompts import ChatPromptTemplate, MessagesPlaceholder
    from langchain.agents import AgentExecutor
    from langchain_core.runnables import RunnablePassthrough
    from langchain.agents.output_parsers.tools import ToolsAgentOutputParser
    from langchain.agents.format_scratchpad.tools import format_to_tool_messages

    # Attempt to import ChatXAI from langchain_xai
    # This will only work if xai-sdk and langchain-xai are installed.
    try:
        from xai_sdk import Client
        from langchain_xai import ChatXAI
        XAI_SDK_AVAILABLE = True
    except ImportError:
        print("WARNING: 'xai-sdk' or 'langchain-xai' not found. Grok 4 integration will be simulated.")
        XAI_SDK_AVAILABLE = False
        # Define a mock ChatXAI for demonstration without actual SDK
        class MockChatXAI:
            def __init__(self, model: str, xai_api_key: str):
                self.model = model
                self.xai_api_key = xai_api_key
                print(f"MockChatXAI initialized for model: {self.model}")

            def invoke(self, messages: List[Union[HumanMessage, AIMessage, ToolMessage]]) -> AIMessage:
                # Simple mock logic: just return a predefined response or echo
                last_user_message = next((m.content for m in reversed(messages) if isinstance(m, HumanMessage)), "No user message")
                print(f" [Mock Grok 4] Received: '{last_user_message}'")
                # Simulate the agent's thought process and tool call decision
                if "simulate synthesis" in last_user_message.lower():
                    tool_call_str = '{"tool": "simulate_synthesis", "tool_input": {"chemical_structure": "CCO", "scale_mg": 100.0, "purity_target_percent": 95.0}}'
                elif "identify targets" in last_user_message.lower():
                    tool_call_str = '{"tool": "identify_targets", "tool_input": {"disease_name": "Alzheimer\'s disease", "organism": "Homo sapiens"}}'
                elif "run assay" in last_user_message.lower():
                    tool_call_str = '{"tool": "run_assay", "tool_input": {"target_protein_id": "Protein_42", "molecule_structure": "CC(=O)Nc1ccccc1", "assay_type": "binding assay", "concentration_nM": 1000.0}}'
                elif "predict admet" in last_user_message.lower() or "predict toxicity" in last_user_message.lower():
                    tool_call_str = '{"tool": "predict_admet", "tool_input": {"smiles_string": "CC(=O)Oc1ccccc1C(=O)O"}}'
                else:
                    tool_call_str = f'{{"tool": "unknown", "tool_input": {{"query": "{last_user_message}"}}}}'

                return AIMessage(content="", tool_calls=[{"name": "mock_tool", "args": {}, "id": "mock_id"}], additional_kwargs={"tool_code": tool_call_str})

        ChatXAI = MockChatXAI # Assign the mock class if SDK is not available

except ImportError:
    print("ERROR: 'pydantic', 'langchain_core', or 'langchain' not found. Cannot run agent-based demo.")
    print("Please install required packages: pip install pydantic langchain-core langchain")
    # Stop here: the rest of the script depends on these core libraries
    raise SystemExit(1)

# --- Configuration for XAI_API_KEY ---
# This mirrors the setup in your reference notebook.
XAI_API_KEY = None
try:
    # Attempt to get API key from Google Colab userdata (if running in Colab)
    from google.colab import userdata
    XAI_API_KEY = userdata.get('XAI_KEY')
    if XAI_API_KEY:
        print("XAI_KEY found in Colab secrets.")
    else:
        print("WARNING: XAI_KEY not found in Colab secrets.")
except ImportError:
    # Not in Colab, try environment variable
    XAI_API_KEY = os.environ.get('XAI_KEY')
    if XAI_API_KEY is None:
        print("WARNING: XAI_KEY not found in environment variables. Grok 4 integration will be simulated.")
    else:
        print("XAI_KEY found in environment variables.")


# --- Define the Drug Discovery-specific Tools ---
# These functions are now decorated with @tool and use Pydantic for input validation,
# making them compatible with LangChain's tool calling.

class MoleculeSynthesisToolInput(BaseModel):
    chemical_structure: str = Field(description="SMILES or InChI string representing the molecule to synthesize.")
    scale_mg: float = Field(description="Desired synthesis scale in milligrams.")
    purity_target_percent: float = Field(description="Target purity percentage (e.g., 98.0).")

@tool("simulate_synthesis", args_schema=MoleculeSynthesisToolInput)
def simulate_synthesis_tool(chemical_structure: str, scale_mg: float, purity_target_percent: float) -> dict:
    """
    Simulates the synthesis process for a given molecule, providing estimated yield, purity, and time.
    """
    print(f" [Tool Call] Simulating synthesis for {chemical_structure} at {scale_mg} mg with {purity_target_percent}% target purity.")
    yield_percent = random.uniform(50.0, 95.0)
    # Cap at 100%: a purity above 100% is not physically meaningful
    actual_purity_percent = min(100.0, random.uniform(purity_target_percent * 0.9, purity_target_percent * 1.05))
    synthesis_time_hours = random.uniform(12, 72)
    success = actual_purity_percent >= purity_target_percent * 0.98
    notes = "Simulated synthesis. Actual yield and purity may vary. Consider experimental validation."
    if not success:
        notes = "Simulation indicates potential issues achieving target purity or yield."
    return {
        "tool_name": "MoleculeSynthesis",
        "input_structure": chemical_structure,
        "scale_mg": round(scale_mg, 2),
        "estimated_yield_percent": round(yield_percent, 2),
        "estimated_purity_percent": round(actual_purity_percent, 2),
        "estimated_synthesis_time_hours": round(synthesis_time_hours, 2),
        "synthesis_successful": success,
        "notes": notes
    }

class TargetIdentificationToolInput(BaseModel):
    disease_name: str = Field(description="Name of the disease or biological condition.")
    organism: str = Field(default="Homo sapiens", description="Target organism (e.g., 'Homo sapiens').")
    keywords: Optional[List[str]] = Field(None, description="Optional keywords to refine target identification (e.g., 'kinase', 'receptor').")

@tool("identify_targets", args_schema=TargetIdentificationToolInput)
def identify_targets_tool(disease_name: str, organism: str = "Homo sapiens", keywords: Optional[List[str]] = None) -> dict:
    """
    Identifies potential biological targets for a given disease in a specified organism.
    """
    print(f" [Tool Call] Identifying targets for '{disease_name}' in '{organism}' with keywords: {keywords}")
    potential_targets = []
    if "alzheimer" in disease_name.lower():
        potential_targets.extend(["GSK3B", "CDK5", "CHRNA7", "GRIN2B"])
    elif "cancer" in disease_name.lower():
        potential_targets.extend(["EGFR", "HER2", "VEGFR", "PD-1"])
    else:
        potential_targets.extend([f"Protein_{random.randint(1, 100)}", f"Enzyme_{random.randint(1, 50)}"])
    if keywords:
        potential_targets.extend([f"{kw.capitalize()}_Target" for kw in keywords if kw])
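    # De-duplicate first (preserving order), then shuffle; converting to a set
    # after shuffling would discard the shuffled order.
    potential_targets = list(dict.fromkeys(potential_targets))
    random.shuffle(potential_targets)
    potential_targets = potential_targets[:random.randint(2, 5)]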
    notes = "Identified potential targets based on simulated data. Further validation recommended."
    if not potential_targets:
        notes = "Could not identify specific targets for the given criteria in this simulation."
    return {
        "tool_name": "TargetIdentification",
        "disease": disease_name,
        "organism": organism,
        "identified_targets": potential_targets,
        "notes": notes
    }

class AssaySimulationToolInput(BaseModel):
    target_protein_id: str = Field(description="Identifier for the target protein (e.g., 'Protein_42').")
    molecule_structure: str = Field(description="SMILES or InChI string of the molecule to test.")
    assay_type: str = Field(description="Type of assay to simulate (e.g., 'binding assay', 'inhibition assay', 'viability assay').")
    concentration_nM: Optional[float] = Field(None, description="Optional concentration in nM for the assay.")

@tool("run_assay", args_schema=AssaySimulationToolInput)
def run_assay_tool(target_protein_id: str, molecule_structure: str, assay_type: str, concentration_nM: Optional[float] = None) -> dict:
    """
    Simulates a biological assay to test molecule activity against a target protein.
    """
    print(f" [Tool Call] Simulating {assay_type} assay for {molecule_structure} against {target_protein_id}.")
    activity_score = random.uniform(0.1, 100.0)
    inhibition_percent = None
    viability_percent = None
    if "inhibition" in assay_type.lower():
        inhibition_percent = random.uniform(10.0, 99.0)
    if "viability" in assay_type.lower() or "cytotoxicity" in assay_type.lower():
        viability_percent = random.uniform(50.0, 110.0)
    interpretation = f"Simulated {assay_type} activity: {activity_score:.2f}."
    if inhibition_percent is not None:
        interpretation += f" Estimated inhibition: {inhibition_percent:.2f}%."
    if viability_percent is not None:
        interpretation += f" Estimated cell viability: {viability_percent:.2f}%."
    return {
        "tool_name": "AssaySimulation",
        "target_protein_id": target_protein_id,
        "molecule_structure": molecule_structure,
        "assay_type": assay_type,
        "simulation_results": {
            "activity_score": round(activity_score, 2),
            "inhibition_percent": round(inhibition_percent, 2) if inhibition_percent is not None else None,
            "viability_percent": round(viability_percent, 2) if viability_percent is not None else None,
        },
        "concentration_nM": concentration_nM,
        "interpretation": interpretation,
        "notes": "Assay simulation results. Validate with experimental data."
    }

class ADMETPredictionToolInput(BaseModel):
    smiles_string: str = Field(description="SMILES string of the chemical compound for ADMET prediction.")

@tool("predict_admet", args_schema=ADMETPredictionToolInput)
def predict_admet_tool(smiles_string: str) -> dict:
    """
    Simulates in silico ADMET (Absorption, Distribution, Metabolism, Excretion, Toxicity)
    prediction for a given SMILES string.
    """
    print(f" [Tool Call] Simulating ADMET predictions for SMILES: {smiles_string}")
    admet_results = {
        "Absorption": {
            "Human Oral Bioavailability": f"{random.uniform(0.5, 0.95):.2f}",
            "Caco-2 Permeability": f"{random.uniform(10, 100):.2f} nm/s",
            "Blood-Brain Barrier Penetration": "High" if random.random() > 0.5 else "Low",
        },
        "Distribution": {
            "Plasma Protein Binding": f"{random.uniform(70, 99):.2f}%",
            "Volume of Distribution": f"{random.uniform(0.5, 5.0):.2f} L/kg",
        },
        "Metabolism": {
            "CYP2D6 Inhibition": "Inhibitor" if random.random() > 0.7 else "Non-inhibitor",
            "CYP3A4 Inhibition": "Inhibitor" if random.random() > 0.6 else "Non-inhibitor",
            "Metabolic Stability (Half-life)": f"{random.uniform(1, 24):.1f} hours",
        },
        "Excretion": {
            "Renal Clearance": f"{random.uniform(0.1, 10.0):.2f} mL/min/kg",
            "Efflux by P-glycoprotein": "Substrate" if random.random() > 0.4 else "Non-substrate",
        },
        "Toxicity": {
            "Hepatotoxicity": "Toxic" if random.random() > 0.8 else "Non-toxic",
            "hERG Inhibition (Cardiotoxicity)": "Positive" if random.random() > 0.75 else "Negative",
            "Ames Test (Mutagenicity)": "Mutagenic" if random.random() > 0.9 else "Non-mutagenic",
        }
    }
    drug_likeness_score = 0
    if float(admet_results["Absorption"]["Human Oral Bioavailability"]) > 0.7:
        drug_likeness_score += 1
    if "Non-toxic" in admet_results["Toxicity"]["Hepatotoxicity"]:
        drug_likeness_score += 1
    if "Non-inhibitor" in admet_results["Metabolism"]["CYP2D6 Inhibition"]:
        drug_likeness_score += 1
    admet_results["Overall Drug-likeness Score"] = f"{drug_likeness_score}/3 (higher is better)"
    admet_results["tool_name"] = "ADMETPrediction"
    return admet_results

# Collect all defined tools
all_drug_discovery_tools = [
    simulate_synthesis_tool,
    identify_targets_tool,
    run_assay_tool,
    predict_admet_tool
]

# --- LLMAgent Class (Inspired by your reference) ---
class LLMAgent:
    def __init__(self, api_key: str, tools: List):
        if api_key is None and XAI_SDK_AVAILABLE:
            # SDK installed but no key: warn and fall back to a placeholder key (no error is raised).
            # ChatXAI is the real client here, so calls made with the placeholder key are not expected to succeed.
            print("WARNING: XAI_API_KEY is not set. Running in simulated Grok 4 mode.")
            self.llm = ChatXAI(model="grok-4-0709", xai_api_key="mock_key_for_simulated_mode")
        elif not XAI_SDK_AVAILABLE:
            # SDK missing: ChatXAI refers to the MockChatXAI class defined above.
            print("WARNING: XAI SDK not available. Running in simulated Grok 4 mode.")
            self.llm = ChatXAI(model="grok-4-0709", xai_api_key="mock_key_for_simulated_mode")
        else:
            # Key and SDK are both present: use the actual Grok 4 model.
            self.llm = ChatXAI(xai_api_key=api_key, model="grok-4-0709")

        self.tools = tools
        self.prompt = ChatPromptTemplate.from_messages(
            [
                (
                    "system",
                    "You are a helpful AI assistant with access to specialized drug discovery tools. "
                    "Use the tools to answer questions and fulfill requests. "
                    "If you cannot fully answer the question with the available tools, state so. "
                    "Be concise and provide clear answers based on tool outputs."
                ),
                MessagesPlaceholder(variable_name="chat_history"),
                ("user", "{query}"),
                MessagesPlaceholder(variable_name="agent_scratchpad"),
            ]
        )

        self.agent = (
            RunnablePassthrough.assign(
                agent_scratchpad=lambda x: format_to_tool_messages(
                    x["intermediate_steps"]
                )
            )
            | self.prompt
            | self.llm
            | ToolsAgentOutputParser()
        )

        self.agent_executor = AgentExecutor(agent=self.agent, tools=self.tools, verbose=False) # Set verbose to False for cleaner output

    def run_inference(self, query: str, chat_history: Optional[List[Union[HumanMessage, AIMessage, ToolMessage]]] = None) -> Dict[str, Any]:
        """Runs the agent with a given query and context."""
        print(f"\n [Agent] Processing query: {query}")
        try:
            # Fall back to an empty history here instead of using a mutable default argument
            result = self.agent_executor.invoke({"query": query, "chat_history": chat_history or []})
            return result
        except Exception as e:
            # In a real Grok4 integration, this would handle API errors, etc.
            # For the mock, it catches issues with tool parsing or unexpected outputs.
            print(f" [Agent Error] An error occurred during agent execution: {e}")
            # If the mock LLM returns a tool_code, try to parse it manually for demonstration
            if "tool_code" in str(e): # A crude way to check if mock LLM returned tool_code
                try:
                    # Extract the tool_code string from the error message
                    tool_code_match = re.search(r'tool_code\': \'(\{.*?\})\'', str(e))
                    if tool_code_match:
                        import ast  # local import: literal_eval parses the dict literal without executing code
                        tool_call_info = ast.literal_eval(tool_code_match.group(1))
                        tool_name = tool_call_info.get("tool")
                        tool_input = tool_call_info.get("tool_input", {})
                        print(f" [Mock Tool Call Attempt] Grok 4 suggested tool: {tool_name} with input: {tool_input}")
                        # Manually call the tool based on the mock LLM's suggestion
                        if tool_name == "simulate_synthesis":
                            return simulate_synthesis_tool(**tool_input)
                        elif tool_name == "identify_targets":
                            return identify_targets_tool(**tool_input)
                        elif tool_name == "run_assay":
                            return run_assay_tool(**tool_input)
                        elif tool_name == "predict_admet":
                            return predict_admet_tool(**tool_input)
                        else:
                            return {"output": f"Mock Grok 4 suggested an unknown tool: {tool_name}"}
                except Exception as parse_e:
                    return {"output": f"Agent encountered an error and could not parse mock tool call: {parse_e}"}
            return {"output": f"Agent failed to process query: {e}"}


# --- Main execution block ---
if __name__ == "__main__":
    print("--- Full In Silico Drug Discovery Pipeline with Grok 4 Agent (Automatic Execution) ---")
    print("This script uses a LangChain agent with simulated Grok 4 to orchestrate drug discovery tasks.")
    print("It will run through a predefined set of queries automatically.")

    # Initialize the LLMAgent
    agent = LLMAgent(api_key=XAI_API_KEY, tools=all_drug_discovery_tools)
    chat_history = [] # To maintain conversation context

    # Predefined queries for automatic execution
    automatic_queries = [
        "Simulate synthesis of CCO at 150 mg with 90% purity",
        "Identify targets for Alzheimer's disease in Homo sapiens focusing on kinases and receptors",
        "Run a binding assay for molecule CC(=O)Nc1ccccc1 against target Protein_42 at 1000 nM concentration",
        "Predict ADMET for SMILES:CC(=O)Oc1ccccc1C(=O)O",
        "What are the potential targets for cancer in humans?",
        "Synthesize a molecule with SMILES:CN1C=NC2=C1C(=O)N(C(=O)C2)C at 200 mg with 98% purity.",
        "Perform a viability assay for molecule C1=CC=C(C=C1)C(C(=O)O)N against target Enzyme_15 at 500 nM concentration."
    ]

    for i, query in enumerate(automatic_queries):
        print("\n" + "="*80 + "\n")
        print(f"\n--- Running Automatic Query {i+1}/{len(automatic_queries)} ---")
        print(f"Query: {query}")

        try:
            # Run inference with the agent. The prompt already injects the query via "{query}",
            # so it is appended to chat_history only afterwards to avoid sending it twice.
            agent_outcome = agent.run_inference(query, chat_history)
            chat_history.append(HumanMessage(content=query))

            # Display results
            print("\n--- Agent Outcome ---")
            if "output" in agent_outcome:
                print(agent_outcome["output"])
                # Add agent's response to chat history
                chat_history.append(AIMessage(content=agent_outcome["output"]))
            else:
                # If the agent directly returned tool output (e.g., in mock mode)
                tool_name = agent_outcome.pop("tool_name", "Unknown Tool")
                print(f"Tool Used: {tool_name}")
                for key, value in agent_outcome.items():
                    if isinstance(value, dict):
                        print(f"\n  {key.replace('_', ' ').title()}:")
                        for sub_key, sub_value in value.items():
                            print(f"    - {sub_key.replace('_', ' ').title()}: {sub_value}")
                    elif isinstance(value, list):
                        print(f"\n  {key.replace('_', ' ').title()}: {', '.join(map(str, value))}")
                    else:
                        print(f"  {key.replace('_', ' ').title()}: {value}")
                # For tool outputs, we might not add directly to chat_history as AIMessage,
                # but rather as ToolMessage if we were building a full conversational agent.
                # For this demo, we'll just show the output.
            print("---------------------\n")

        except Exception as e:
            print(f"An unexpected error occurred during agent processing for query '{query}': {e}")
            # If an error occurs, we might want to reset or modify chat_history
            # For simplicity, we'll just print the error and continue to the next query.

    print("\n--- All automatic queries completed. ---")



XAI_KEY found in Colab secrets.
--- Full In Silico Drug Discovery Pipeline with Grok 4 Agent (Automatic Execution) ---
This script uses a LangChain agent with simulated Grok 4 to orchestrate drug discovery tasks.
It will run through a predefined set of queries automatically.



--- Running Automatic Query 1/7 ---
Query: Simulate synthesis of CCO at 150 mg with 90% purity

 [Agent] Processing query: Simulate synthesis of CCO at 150 mg with 90% purity

--- Agent Outcome ---
### Simulated Synthesis of CCO (Ethanol, SMILES: CCO)

**Note:** Using the `simulate_synthesis` tool for a small-scale lab synthesis via hydration of ethylene (industrial-inspired method adapted for lab). This is a simulation; actual lab work requires safety protocols and equipment.

#### Reaction Overview
- **Starting Material:** Ethylene (CH₂=CH₂, 100 mg equivalent).
- **Reagents:** Water (H₂O, excess), sulfuric acid (H₂SO₄, catalyst, 10 mol%).
- **Conditions:** 150°C, high pressure (simulated autoclave), 2 hours.
-