<a href="https://colab.research.google.com/github/frank-morales2020/MLxDL/blob/main/DEEPSEEKV3DOT1_proteinfold_AIAGENT_DEMO.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

DeepSeek API usage dashboard: https://platform.deepseek.com/usage

In [None]:
!pip install -q openai pydantic

In [2]:
from openai import OpenAI
from google.colab import userdata


api_key = userdata.get('DEEPSEEK_API_KEY')

client = OpenAI(api_key=api_key, base_url="https://api.deepseek.com")


def deepseek_inference(prompt: str, model: str):
    """Send a single-turn prompt to a DeepSeek model and return its answer.

    The returned dict also carries `reasoning`, which is None when the
    response message has no `reasoning_content` attribute.
    """
    response = client.chat.completions.create(
        model=model,
        messages=[
            {"role": "system", "content": "You are a helpful assistant."},
            {"role": "user", "content": prompt}
        ]
    )

    message = response.choices[0].message
    reasoning = getattr(message, "reasoning_content", None)
    final_answer = message.content

    return {"model": model, "answer": final_answer, "reasoning": reasoning}


Q0 = "Can you explain the field of convex optimization in simple terms, including what types of problems it is used to solve?"
res = deepseek_inference(Q0, "deepseek-reasoner")["answer"]
print("Assistant:", res)

Assistant: Of course! Let's break down convex optimization in simple terms.

### The Big Idea: The "No Bad Local Decisions" Problem Solver

Imagine you're in a smooth, bowl-shaped valley (like one you'd see at the bottom of a playground slide). Your only goal is to find the very lowest point in this valley.

*   No matter where you stand, if you just walk downhill, you will *always* eventually reach that single lowest point.
*   There are no other small holes or dips (local minima) to trick you into thinking you've found the bottom when you haven't.
*   The entire valley is predictably shaped—it has no sudden cliffs or weird, jagged rocks.

**Convex optimization is the field of math and computer science that provides tools to efficiently find the lowest point in these "bowl-shaped" problems.**

---

### What Makes a Problem "Convex"?

For a problem to be convex, two things need to be true:

1.  **The "Bowl" (The Feasible Set):** The set of all possible answers or choices you're allowed
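
The bowl analogy in the answer above can be checked numerically. The following is a minimal sketch and not part of the original notebook: the helper `gradient_descent`, the example objective f(x) = (x - 3)^2, the step size, and the starting points are all illustrative assumptions. It shows that, for this convex function, walking downhill from very different starting points ends at the same lowest point.

```python
def gradient_descent(grad, x0, lr=0.1, steps=200):
    """Plain gradient descent: repeatedly take a small step against the gradient."""
    x = x0
    for _ in range(steps):
        x = x - lr * grad(x)
    return x


def grad_f(x):
    """Gradient of the convex example f(x) = (x - 3)^2, whose only minimum is x = 3."""
    return 2.0 * (x - 3.0)


# Hypothetical starting points on both sides of the minimum.
starts = [-10.0, 0.0, 25.0]
minima = [gradient_descent(grad_f, x0) for x0 in starts]

for x0, x_min in zip(starts, minima):
    print(f"start={x0:6.1f} -> x_min={x_min:.6f}")
```

With a step size of 0.1, each iteration shrinks the distance to the minimum by a factor of 0.8, so every run settles at x = 3 regardless of where it started. A non-convex function would not offer that guarantee, since a run could stall in a local dip instead.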

In [1]:
# -*- coding: utf-8 -*-
"""Copy of proteinfold_AIAGENT_DEMO.ipynb"""
import json
from pydantic import BaseModel
from openai import OpenAI
from google.colab import userdata  # Reads secrets stored in Google Colab

# Pydantic model for Protein Folding Prediction Result
class ProteinFoldResult(BaseModel):
    protein_id: str
    predicted_structure_url: str
    confidence_score: float
    folding_time_estimate_ns: float
    potential_misfolding_regions: list[str]
    relevant_chaperone_types: list[str]

# Read the API key from Colab secrets; raise a clear error instead of calling exit(),
# which does not reliably stop execution inside a notebook
try:
    api_key = userdata.get('DEEPSEEK_API_KEY')
except Exception as e:
    raise RuntimeError(
        "Error retrieving API key from Colab secrets. "
        "Please ensure your 'DEEPSEEK_API_KEY' is set in Google Colab's secret manager."
    ) from e

if not api_key:
    raise RuntimeError(
        "DEEPSEEK_API_KEY not found in Google Colab's secret manager. "
        "Please set your DeepSeek API key before running this script."
    )

# Initialize the OpenAI client with the DeepSeek API base URL
client = OpenAI(api_key=api_key, base_url="https://api.deepseek.com")

print("Creating AI agents for Protein Folding Prediction Domain...")

# 1. Protein Sequence Data Agent
protein_data_agent_config = {
    "model": "deepseek-chat",
    "description": "Agent for retrieving protein sequences, experimental structures, and associated biological data.",
    "name": "protein-sequence-data-agent",
    "tools": [
        {
            "type": "function",
            "function": {
                "name": "fetch_protein_sequence_and_metadata",
                "description": "Fetches amino acid sequence and basic metadata for a given protein ID (e.g., UniProt ID).",
                "parameters": {
                    "type": "object",
                    "properties": {
                        "protein_id": {"type": "string", "description": "Identifier for the protein (e.g., UniProt: P0DTD1)."},
                    },
                    "required": ["protein_id"]
                }
            }
        },
        {
            "type": "function",
            "function": {
                "name": "retrieve_experimental_structure",
                "description": "Retrieves the PDB ID and basic info for experimental 3D structures if available for a protein.",
                "parameters": {
                    "type": "object",
                    "properties": {
                        "protein_id": {"type": "string", "description": "Identifier for the protein."},
                    },
                    "required": ["protein_id"]
                }
            }
        }
    ]
}
print(f"Protein Sequence Data Agent '{protein_data_agent_config['name']}' created.")

# 2. Folding Prediction & Simulation Agent
folding_simulation_agent_config = {
    "model": "deepseek-chat",
    "description": "Agent for conceptually simulating protein folding dynamics and predicting 3D structures.",
    "name": "folding-prediction-simulation-agent",
    "tools": [
        {
            "type": "function",
            "function": {
                "name": "predict_initial_3d_structure",
                "description": "Predicts an initial plausible 3D structure for a protein based on its sequence using advanced ML models.",
                "parameters": {
                    "type": "object",
                    "properties": {
                        "amino_acid_sequence": {"type": "string", "description": "The full amino acid sequence of the protein."},
                        "parameters_json": {"type": "string", "description": "JSON string of prediction parameters (e.g., '{\"model\": \"alphafold\"}')."}
                    },
                    "required": ["amino_acid_sequence"]
                }
            }
        },
        {
            "type": "function",
            "function": {
                "name": "run_molecular_dynamics_simulation_conceptual",
                "description": "Conceptually simulates short-scale molecular dynamics to refine or explore folding pathways.",
                "parameters": {
                    "type": "object",
                    "properties": {
                        "initial_3d_coordinates": {"type": "string", "description": "JSON string of initial atomic coordinates (PDB format excerpt)."},
                        "simulation_time_ns": {"type": "number", "description": "Duration of conceptual simulation in nanoseconds."},
                        "environment_conditions": {"type": "string", "description": "JSON string of environmental factors (e.g., '{\"pH\": 7.4, \"temperature_K\": 310}')."}
                    },
                    "required": ["initial_3d_coordinates", "simulation_time_ns"]
                }
            }
        }
    ]
}
print(f"Folding Prediction & Simulation Agent '{folding_simulation_agent_config['name']}' created.")

# 3. Misfolding Analysis & Intervention Agent
misfolding_analysis_agent_config = {
    "model": "deepseek-chat",
    "description": "Agent for identifying misfolding risks, analyzing aggregation pathways, and proposing intervention strategies.",
    "name": "misfolding-analysis-intervention-agent",
    "tools": [
        {
            "type": "function",
            "function": {
                "name": "identify_misfolding_hotspots",
                "description": "Analyzes a protein sequence and predicted structure for regions prone to misfolding or aggregation.",
                "parameters": {
                    "type": "object",
                    "properties": {
                        "amino_acid_sequence": {"type": "string", "description": "The full amino acid sequence."},
                        "predicted_3d_structure": {"type": "string", "description": "URL or ID of the predicted 3D structure."},
                    },
                    "required": ["amino_acid_sequence", "predicted_3d_structure"]
                }
            }
        },
        {
            "type": "function",
            "function": {
                "name": "suggest_chaperone_or_mutation",
                "description": "Suggests types of molecular chaperones or specific mutations to prevent misfolding or promote correct folding.",
                "parameters": {
                    "type": "object",
                    "properties": {
                        "protein_id": {"type": "string", "description": "ID of the protein in question."},
                        "misfolding_context": {"type": "string", "description": "Description of the misfolding issue or aggregation type."},
                    },
                    "required": ["protein_id", "misfolding_context"]
                }
            }
        }
    ]
}
print(f"Misfolding Analysis & Intervention Agent '{misfolding_analysis_agent_config['name']}' created.")

# 4. Result Synthesis & Interpretation Agent
result_synthesis_agent_config = {
    "model": "deepseek-reasoner",
    "name": "result-synthesis-interpretation-agent",
    "description": "Agent for synthesizing data from folding predictions and misfolding analyses into a comprehensive report.",
    "tools": [
        {
            "type": "function",
            "function": {
                "name": "synthesize_folding_report",
                "description": "Combines predicted structure data, folding dynamics insights, and misfolding analysis into a unified report.",
                "parameters": {
                    "type": "object",
                    "properties": {
                        "prediction_results_json": {"type": "string", "description": "JSON string of protein structure prediction outputs."},
                        "misfolding_analysis_json": {"type": "string", "description": "JSON string of misfolding analysis results."},
                        "protein_context": {"type": "string", "description": "Biological context of the protein."},
                    },
                    "required": ["prediction_results_json", "misfolding_analysis_json"]
                }
            }
        }
    ]
}
print(f"Result Synthesis & Interpretation Agent '{result_synthesis_agent_config['name']}' created.")

# 5. Historical & Ethical Context Agent
historical_ethical_agent_config = {
    "model": "deepseek-reasoner",
    "description": "Agent for providing historical context of protein science and ethical considerations in bioscience research.",
    "name": "historical-ethical-context-agent",
    "tools": [
        {
            "type": "function",
            "function": {
                "name": "get_protein_science_milestones",
                "description": "Retrieves key historical milestones and figures in protein science and folding research.",
                "parameters": {
                    "type": "object",
                    "properties": {
                        "topic": {"type": "string", "description": "Specific topic in protein science (e.g., 'Levinthal's Paradox', 'Chaperone discovery')."},
                    },
                    "required": ["topic"]
                }
            }
        },
        {
            "type": "function",
            "function": {
                "name": "analyze_ethical_implications",
                "description": "Analyzes ethical implications related to protein engineering, drug development, or genetic interventions for proteinopathies.",
                "parameters": {
                    "type": "object",
                    "properties": {
                        "bioscience_application": {"type": "string", "description": "Specific bioscience application (e.g., 'CRISPR for proteinopathies', 'AI drug design')."},
                    },
                    "required": ["bioscience_application"]
                }
            }
        }
    ]
}
print(f"Historical & Ethical Context Agent '{historical_ethical_agent_config['name']}' created.")

print("\nAll new Protein Folding-related agents have been defined.")

# Mock functions for Protein Folding domain tools (Conceptual Implementations)
def fetch_protein_sequence_and_metadata(protein_id: str):
    """MOCK function to fetch amino acid sequence and basic metadata."""
    print(f"\n [DEBUG] MOCK CALL: fetch_protein_sequence_and_metadata ID: '{protein_id}'")
    if "P0DTD1" in protein_id:
        return {"status": "success", "sequence": "MFVFLVLLPLVSSQCVN...", "name": "SARS-CoV-2 Spike Glycoprotein", "length": 1273}
    return {"status": "not_found", "message": "Protein ID not found."}

def retrieve_experimental_structure(protein_id: str):
    """MOCK function to retrieve experimental 3D structures."""
    print(f"\n [DEBUG] MOCK CALL: retrieve_experimental_structure ID: '{protein_id}'")
    if "P0DTD1" in protein_id:
        return {"status": "success", "pdb_ids": ["6VSB", "6M0J"], "description": "Known structures available in PDB."}
    return {"status": "no_experimental_structure", "message": "No experimental structure found for this protein."}

def predict_initial_3d_structure(amino_acid_sequence: str, parameters_json: str = "{}"):
    """MOCK function to predict an initial plausible 3D structure."""
    print(f"\n [DEBUG] MOCK CALL: predict_initial_3d_structure Sequence length: {len(amino_acid_sequence)}, Params: '{parameters_json}'")
    if len(amino_acid_sequence) > 100:
        return {"status": "prediction_started", "predicted_structure_url": "mock_alphafold_result_url_P0DTD1.pdb", "confidence_score": 0.92, "message": "High confidence prediction generated."}
    return {"status": "prediction_failed", "message": "Sequence too short for meaningful prediction or other issue."}

def run_molecular_dynamics_simulation_conceptual(initial_3d_coordinates: str, simulation_time_ns: float, environment_conditions: str = "{}"):
    """MOCK function to conceptually simulate short-scale molecular dynamics."""
    print(f"\n [DEBUG] MOCK CALL: run_molecular_dynamics_simulation_conceptual Time: {simulation_time_ns}ns, Env: '{environment_conditions}'")
    return {"status": "simulation_complete", "observed_changes": "Minor structural fluctuations, no major folding event observed in short time.", "final_conformation_excerpt": "ATOM   1  N   ALA ...", "simulation_duration_s": 0.5}

def identify_misfolding_hotspots(amino_acid_sequence: str, predicted_3d_structure: str):
    """MOCK function to analyze misfolding risks."""
    print(f"\n [DEBUG] MOCK CALL: identify_misfolding_hotspots Sequence length: {len(amino_acid_sequence)}, Structure: '{predicted_3d_structure}'")
    if "P0DTD1" in predicted_3d_structure:
        return {"status": "analysis_complete", "hotspots": ["residues 600-610", "residues 980-990"], "propensity_score": 0.75, "details": "Hydrophobic patches identified, prone to aggregation."}
    return {"status": "no_hotspots_found", "hotspots": [], "message": "No significant misfolding hotspots identified."}

def suggest_chaperone_or_mutation(protein_id: str, misfolding_context: str):
    """MOCK function to suggest interventions for misfolding."""
    print(f"\n [DEBUG] MOCK CALL: suggest_chaperone_or_mutation Protein: '{protein_id}', Context: '{misfolding_context}'")
    if "aggregation" in misfolding_context:
        return {"status": "suggestions_made", "interventions": ["Hsp70 chaperone", "point mutation at Leu605 to Pro"], "rationale": "To prevent hydrophobic collapse and aggregation."}
    return {"status": "no_specific_suggestion", "interventions": [], "message": "Further context needed for specific intervention."}

def synthesize_folding_report(prediction_results_json: str, misfolding_analysis_json: str, protein_context: str = "{}"):
    """MOCK function to synthesize a comprehensive protein folding report."""
    print(f"\n [DEBUG] MOCK CALL: synthesize_folding_report Pred Results: '{prediction_results_json}', Misfold Analysis: '{misfolding_analysis_json}'")
    try:
        pred_res = json.loads(prediction_results_json)
        misfold_res = json.loads(misfolding_analysis_json)
    except json.JSONDecodeError as e:
        print(f" [ERROR] JSON decode failed: {e}")
        pred_res = {}
        misfold_res = {}

    # Fall back to an empty list so a missing 'hotspots' key never yields None
    hotspots = misfold_res.get('hotspots') or []

    report_summary = f"Predicted structure URL: {pred_res.get('predicted_structure_url', 'N/A')} with confidence {pred_res.get('confidence_score', 'N/A')}. "
    if hotspots:
        report_summary += f"Identified misfolding regions: {', '.join(hotspots)}. "
    else:
        report_summary += "No significant misfolding regions detected. "

    return {
        "protein_id": "P0DTD1_mock",
        "predicted_structure_url": pred_res.get('predicted_structure_url'),
        "confidence_score": pred_res.get('confidence_score'),
        "folding_time_estimate_ns": 1000.0,
        "potential_misfolding_regions": hotspots,
        "relevant_chaperone_types": ["Hsp70"] if hotspots else [],
        # Previously built but never returned; include it so the caller can see it
        "summary": report_summary,
    }

def get_protein_science_milestones(topic: str):
    """MOCK function to retrieve historical milestones in protein science."""
    print(f"\n [DEBUG] MOCK CALL: get_protein_science_milestones Topic: '{topic}'")
    if "Levinthal's Paradox" in topic:
        return {"status": "success", "milestones": ["Cyrus Levinthal proposes paradox (1969)"], "era": "Early computational biology"}
    elif "Chaperone discovery" in topic:
        return {"status": "success", "milestones": ["Heat shock proteins discovered (1960s)", "Protein folding chaperones identified (1980s)"], "era": "Molecular Biology"}
    return {"status": "not_found", "milestones": [], "message": "No relevant milestones found."}

def analyze_ethical_implications(bioscience_application: str):
    """MOCK function to analyze ethical implications in bioscience."""
    print(f"\n [DEBUG] MOCK CALL: analyze_ethical_implications Application: '{bioscience_application}'")
    if "CRISPR for proteinopathies" in bioscience_application:
        return {"status": "analysis_complete", "implications": ["Germline editing concerns", "Accessibility and equity issues", "Off-target effects"], "details": "Gene editing for misfolding diseases raises complex ethical questions."}
    return {"status": "no_specific_implications", "implications": [], "message": "No specific ethical implications for this application."}

# Master Tool Executor Mapping for Protein Folding Domain
tool_executor = {
    "fetch_protein_sequence_and_metadata": fetch_protein_sequence_and_metadata,
    "retrieve_experimental_structure": retrieve_experimental_structure,
    "predict_initial_3d_structure": predict_initial_3d_structure,
    "run_molecular_dynamics_simulation_conceptual": run_molecular_dynamics_simulation_conceptual,
    "identify_misfolding_hotspots": identify_misfolding_hotspots,
    "suggest_chaperone_or_mutation": suggest_chaperone_or_mutation,
    "synthesize_folding_report": synthesize_folding_report,
    "get_protein_science_milestones": get_protein_science_milestones,
    "analyze_ethical_implications": analyze_ethical_implications,
}

# Test Case Execution for Protein Folding Agents
print("\n--- Executing Test Cases for Protein Folding Agents (via chat completions) ---")

test_cases = [
    {
        "agent_config": protein_data_agent_config,
        "name": "Protein Sequence Data Agent",
        "query": "Fetch the amino acid sequence and metadata for UniProt ID P0DTD1.",
    },
    {
        "agent_config": protein_data_agent_config,
        "name": "Protein Sequence Data Agent",
        "query": "Are there any experimental 3D structures available for protein P0DTD1?",
    },
    {
        "agent_config": folding_simulation_agent_config,
        "name": "Folding Prediction & Simulation Agent",
        "query": "Predict an initial 3D structure for a protein with the sequence 'MVLSPADKTNVKAAWGKVGAHAG...' (partial hemoglobin alpha).",
    },
    {
        "agent_config": folding_simulation_agent_config,
        "name": "Folding Prediction & Simulation Agent",
        "query": "Run a conceptual molecular dynamics simulation for 10 nanoseconds on an initial structure starting with ATOM 1 N GLY ...",
    },
    {
        "agent_config": misfolding_analysis_agent_config,
        "name": "Misfolding Analysis & Intervention Agent",
        "query": "Identify misfolding hotspots in the SARS-CoV-2 Spike protein (sequence 'MFVFLVLLPLVSSQCVN...' and predicted structure at 'mock_alphafold_result_url_P0DTD1.pdb').",
    },
    {
        "agent_config": misfolding_analysis_agent_config,
        "name": "Misfolding Analysis & Intervention Agent",
        "query": "Suggest interventions for a protein that is prone to aggregation.",
    },
    {
        "agent_config": result_synthesis_agent_config,
        "name": "Result Synthesis & Interpretation Agent",
        "query": "Synthesize a report on the folding and misfolding characteristics given prediction results: {'predicted_structure_url': 'mock_url.pdb', 'confidence_score': 0.9} and misfolding analysis: {'hotspots': ['H1', 'H2'], 'propensity_score': 0.8}.",
    },
    {
        "agent_config": historical_ethical_agent_config,
        "name": "Historical & Ethical Context Agent",
        "query": "What are the key milestones related to Levinthal's Paradox in protein science?",
    },
    {
        "agent_config": historical_ethical_agent_config,
        "name": "Historical & Ethical Context Agent",
        "query": "Analyze the ethical implications of using CRISPR for treating proteinopathies.",
    },
]

for test_case in test_cases:
    agent_config_to_test = test_case["agent_config"]
    agent_name = test_case["name"]
    user_query = test_case["query"]

    print(f"\n--- Executing Test Case for the {agent_name} ---")
    print(f"User: {user_query}")
    conversation_history = [{"role": "user", "content": user_query}]

    try:
        # Only send tool definitions to models that accept them; this notebook
        # treats 'deepseek-chat' as the only tool-capable model
        is_tool_compatible = (agent_config_to_test['model'] == 'deepseek-chat')

        request_kwargs = {
            "model": agent_config_to_test['model'],
            "messages": conversation_history,
        }
        if is_tool_compatible:
            # Otherwise omit both keys entirely rather than sending tools=None / tool_choice="none"
            request_kwargs["tools"] = agent_config_to_test['tools']
            request_kwargs["tool_choice"] = "auto"

        response_turn1 = client.chat.completions.create(**request_kwargs)
        assistant_message_turn1 = response_turn1.choices[0].message
        conversation_history.append(assistant_message_turn1)

        if assistant_message_turn1.tool_calls:
            print(f"\n{agent_name} proposed tool calls (Turn 1):")
            for tool_call in assistant_message_turn1.tool_calls:
                print(f" Tool Name: {tool_call.function.name}")
                print(f" Tool Arguments (JSON string): {tool_call.function.arguments}")

                tool_output_content = None
                if tool_call.function.name in tool_executor:
                    try:
                        args = json.loads(tool_call.function.arguments)
                        tool_output = tool_executor[tool_call.function.name](**args)
                        tool_output_content = json.dumps(tool_output)
                        print(f" [DEBUG] Local MOCK {tool_call.function.name} executed. Output: {tool_output_content}")
                    except json.JSONDecodeError as e:
                        print(f" [ERROR] Failed to parse tool arguments for {tool_call.function.name}: {e}")
                        tool_output_content = json.dumps({"error": f"Failed to parse arguments: {e}"})
                    except Exception as e:
                        print(f" [ERROR] Error executing local mock {tool_call.function.name}: {e}")
                        tool_output_content = json.dumps({"error": f"Tool execution failed: {e}"})
                else:
                    print(f" [DEBUG] Unhandled tool call: {tool_call.function.name}")
                    tool_output_content = json.dumps({"error": "Tool not handled by client-side executor"})

                conversation_history.append(
                    {
                        "tool_call_id": tool_call.id,
                        "role": "tool",
                        "name": tool_call.function.name,
                        "content": tool_output_content,
                    }
                )

            final_response = client.chat.completions.create(
                model=agent_config_to_test['model'],
                messages=conversation_history,
            )
            final_assistant_message = final_response.choices[0].message
            print(f"\n{agent_name}'s Final Response:")
            print(final_assistant_message.content)

        else:
            print(f"\n{agent_name}'s initial response (no tool calls proposed):")
            print(assistant_message_turn1.content)

    except Exception as e:
        print(f"\nAn error occurred during {agent_name} interaction: {e}")
        print("Please check your API key, model availability, network connection, or SDK version.")

print("\n--- All test cases execution complete. ---")

Creating AI agents for Protein Folding Prediction Domain...
Protein Sequence Data Agent 'protein-sequence-data-agent' created.
Folding Prediction & Simulation Agent 'folding-prediction-simulation-agent' created.
Misfolding Analysis & Intervention Agent 'misfolding-analysis-intervention-agent' created.
Result Synthesis & Interpretation Agent 'result-synthesis-interpretation-agent' created.
Historical & Ethical Context Agent 'historical-ethical-context-agent' created.

All new Protein Folding-related agents have been defined.

--- Executing Test Cases for Protein Folding Agents (via chat completions) ---

--- Executing Test Case for the Protein Sequence Data Agent ---
User: Fetch the amino acid sequence and metadata for UniProt ID P0DTD1.

Protein Sequence Data Agent proposed tool calls (Turn 1):
 Tool Name: fetch_protein_sequence_and_metadata
 Tool Arguments (JSON string): {"protein_id": "P0DTD1"}

 [DEBUG] MOCK CALL: fetch_protein_sequence_and_metadata ID: 'P0DTD1'
 [DEBUG] Local MOCK 
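
The tool-dispatch step inside the test loop above can be isolated and exercised without calling the API. The following is a minimal sketch under stated assumptions: the helper name `dispatch_tool_call` and the `registry` argument are hypothetical (the notebook uses an inline block and the `tool_executor` dict), and the mock tool here is a trimmed stand-in for the notebook's version. It shows the same three outcomes the loop handles: a successful call, unparseable arguments, and an unknown tool name.

```python
import json


def dispatch_tool_call(name, arguments_json, registry):
    """Run one tool call and always return a JSON string usable as a tool message."""
    func = registry.get(name)
    if func is None:
        return json.dumps({"error": "Tool not handled by client-side executor"})
    try:
        args = json.loads(arguments_json)
    except json.JSONDecodeError as exc:
        return json.dumps({"error": f"Failed to parse arguments: {exc}"})
    if not isinstance(args, dict):
        return json.dumps({"error": "Tool arguments must be a JSON object"})
    try:
        return json.dumps(func(**args))
    except Exception as exc:  # e.g. unexpected or missing keyword arguments
        return json.dumps({"error": f"Tool execution failed: {exc}"})


# Trimmed stand-in for the notebook's mock tool (illustrative only).
def fetch_protein_sequence_and_metadata(protein_id: str):
    if "P0DTD1" in protein_id:
        return {"status": "success", "name": "SARS-CoV-2 Spike Glycoprotein", "length": 1273}
    return {"status": "not_found", "message": "Protein ID not found."}


registry = {"fetch_protein_sequence_and_metadata": fetch_protein_sequence_and_metadata}

ok = dispatch_tool_call("fetch_protein_sequence_and_metadata", '{"protein_id": "P0DTD1"}', registry)
bad_json = dispatch_tool_call("fetch_protein_sequence_and_metadata", '{"protein_id": ', registry)
unknown = dispatch_tool_call("no_such_tool", "{}", registry)

print(ok)
print(bad_json)
print(unknown)
```

Returning an error payload instead of raising keeps the conversation well-formed: every proposed tool call still gets a matching tool message, so the model can react to the failure in the follow-up turn.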