# Databricks: Author and deploy an MCP tool-calling LangGraph agent

This notebook shows how to author a LangGraph agent that connects to MCP servers hosted on Databricks. LangGraph's graph-based architecture gives you complete control over agent behavior, making it the right choice when you need custom workflows or multi-step reasoning patterns.

Connect your agent to data and tools through MCP servers. Databricks provides managed MCP servers for Unity Catalog functions, vector search, and Genie spaces. You can also connect to custom MCP servers that you host as Databricks Apps. See [MCP on Databricks](https://docs.databricks.com/aws/en/generative-ai/mcp/).

In this notebook, you:

- Author a LangGraph agent
- Connect the agent to MCP servers to access Databricks-hosted tools
- Test the agent and evaluate its responses using MLflow Evaluation
- Log the agent with MLflow and deploy it to a model serving endpoint

This notebook uses the [`ResponsesAgent`](https://mlflow.org/docs/latest/api_reference/python_api/mlflow.pyfunc.html#mlflow.pyfunc.ResponsesAgent) interface for Databricks compatibility.

To learn more about authoring an agent using Mosaic AI Agent Framework, see Databricks documentation ([AWS](https://docs.databricks.com/aws/generative-ai/agent-framework/author-agent) | [Azure](https://learn.microsoft.com/azure/databricks/generative-ai/agent-framework/create-chat-model)).

## Prerequisites

- Address all `TODO`s in this notebook.

In [0]:
%pip install -U -qqqq --force-reinstall databricks-langchain databricks-agents uv

In [0]:
dbutils.library.restartPython()

### Define the agent code

Define the agent code in a single cell below. This lets you easily write the agent
code to a local Python file, using the `%%writefile` magic command, for subsequent
logging and deployment.

**What this code does at a high level:**

1. **Connect to MCP servers using adapters**
    The `DatabricksMCPServer` and `DatabricksMultiServerMCPClient` from `databricks_langchain` handle:
    - Connections to Databricks MCP servers
    - Authentication
    - Automatic tool discovery and conversion to LangChain-compatible format

2. **Build a LangGraph agent workflow using LangGraph `StateGraph`**

3. **Handle streaming responses**
    The `LangGraphResponsesAgent` class wraps the LangGraph workflow to:
    - Process streaming events from the agent graph in real-time
    - Convert LangChain message formats to Mosaic AI-compatible format
    - Enable MLflow tracing for each step of the agent workflow

4. **Wrap with ResponsesAgent**
    The agent is wrapped using `ResponsesAgent` for compatibility with Databricks
    features like evaluation, deployment, and feedback collection.

5. **MLflow autotracing**
    Enable MLflow autologging to automatically trace LLM calls, tool invocations,
    and agent state transitions.
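
The control flow behind step 2 (the model node, the conditional edge, and the tools node) can be sketched without LangGraph. The following is a minimal, dependency-free illustration, not the notebook's implementation: `run_agent_loop`, `fake_model`, and the dict-based message shape are hypothetical stand-ins for the `StateGraph` nodes and LangChain message classes.

```python
from typing import Callable

# Hypothetical stand-ins: a "model" is any function that maps the message
# history to an assistant message, and a tool is any function of keyword args.
Message = dict
Model = Callable[[list], Message]


def should_continue(messages: list) -> str:
    """Route to the tools node if the last assistant message requests tools."""
    last = messages[-1]
    if last.get("role") == "assistant" and last.get("tool_calls"):
        return "continue"
    return "end"


def run_agent_loop(model: Model, tools: dict, messages: list, max_steps: int = 10) -> list:
    """Alternate between the model and the tools until no tool call remains."""
    messages = list(messages)
    for _ in range(max_steps):
        messages.append(model(messages))  # "agent" node
        if should_continue(messages) == "end":
            break
        for call in messages[-1]["tool_calls"]:  # "tools" node
            result = tools[call["name"]](**call["args"])
            messages.append({"role": "tool", "name": call["name"], "content": str(result)})
    return messages


def fake_model(messages: list) -> Message:
    """Toy model: request the `add` tool once, then answer with its result."""
    tool_results = [m for m in messages if m["role"] == "tool"]
    if not tool_results:
        return {
            "role": "assistant",
            "content": "",
            "tool_calls": [{"name": "add", "args": {"a": 2, "b": 3}}],
        }
    return {"role": "assistant", "content": f"The sum is {tool_results[-1]['content']}."}


history = run_agent_loop(
    fake_model,
    {"add": lambda a, b: a + b},
    [{"role": "user", "content": "What is 2 + 3?"}],
)
print(history[-1]["content"])  # The sum is 5.
```

The `max_steps` cap is an addition for this sketch; it guards against a model that keeps requesting tools indefinitely.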

#### Agent tools

This example connects to two managed MCP servers:
- A vector search MCP server (for semantic search over the product documentation)
- A Genie space MCP server (for data-related questions)

Both connections use on-behalf-of (OBO) user credentials when they are available
and fall back to the default workspace client otherwise.
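
Managed MCP server URLs are built from the workspace host plus a per-service path. The helper below is a small sketch of that pattern: the `vector-search` and `genie` paths match the URLs used in this notebook, while the `functions` path is an assumption based on the managed MCP documentation. The `managed_mcp_url` name, the example host, and the catalog, schema, and space ID values are hypothetical.

```python
def managed_mcp_url(host: str, service: str, *path_parts: str) -> str:
    """Build a managed MCP server URL such as {host}/api/2.0/mcp/genie/{space_id}."""
    allowed = {"functions", "vector-search", "genie"}
    if service not in allowed:
        raise ValueError(f"Unknown managed MCP service: {service!r}")
    if not path_parts or not all(path_parts):
        raise ValueError("At least one non-empty path part is required")
    return "/".join([host.rstrip("/"), "api/2.0/mcp", service, *path_parts])


# Placeholder values for illustration only
example_host = "https://example.cloud.databricks.com"
vector_search_url = managed_mcp_url(example_host, "vector-search", "my_catalog", "my_schema")
genie_url = managed_mcp_url(example_host, "genie", "my_space_id")
print(vector_search_url)
print(genie_url)
```

Validating the service name up front surfaces a typo as a `ValueError` at startup instead of as a failed HTTP request at tool-discovery time.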


In [0]:
%%writefile agent_mcp.py

import asyncio
from typing import Annotated, Any, AsyncGenerator, Generator, Optional, Sequence, TypedDict, Union

import mlflow
import nest_asyncio
from databricks.sdk import WorkspaceClient
from databricks_langchain import (
    ChatDatabricks,
    DatabricksMCPServer,
    DatabricksMultiServerMCPClient,
)
from langchain.messages import AIMessage, AIMessageChunk, AnyMessage
from langchain_core.language_models import LanguageModelLike
from langchain_core.runnables import RunnableConfig, RunnableLambda
from langchain_core.tools import BaseTool
from langgraph.graph import END, StateGraph
from langgraph.graph.message import add_messages
from langgraph.prebuilt.tool_node import ToolNode
from mlflow.pyfunc import ResponsesAgent
from mlflow.types.responses import (
    ResponsesAgentRequest,
    ResponsesAgentResponse,
    ResponsesAgentStreamEvent,
    output_to_responses_items_stream,
    to_chat_completions_input,
)
from langchain_core.messages.tool import ToolMessage
import json

nest_asyncio.apply()
############################################
## Define your LLM endpoint and system prompt
############################################
LLM_ENDPOINT_NAME = "databricks-claude-sonnet-4-5"
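# TODO: Update with your Genie space ID and vector search catalog/schema
GENIE_SPACE_ID = "01f101086a1711319ead12a273bb07f9"
VECTOR_SEARCH_SCHEMA = "mkr_gcp_sandbox_euw3/default"  # format: <catalog>/<schema>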
llm = ChatDatabricks(endpoint=LLM_ENDPOINT_NAME)

# TODO: Update with your system prompt
system_prompt = """
You are a helpful assistant for our sports data customers.
Use the vector search, which contains product documentation, to answer questions about our product.
Use Genie to answer data-related questions.
If you don't know the answer, say 'I don't know', don't make up answers.
"""


workspace_client = WorkspaceClient()
host = workspace_client.config.host


## There are three connection types:
## 1. Managed MCP servers — fully managed by Databricks
## 2. External MCP servers — hosted outside Databricks but proxied through a
##    Managed MCP server proxy
## 3. Custom MCP servers — MCP servers hosted as Databricks Apps
##
###############################################################################

def create_tools():
    # Import OBO credentials inside the function to ensure it's available at runtime
    try:
        from databricks_ai_bridge import ModelServingUserCredentials
        obo_workspace_client = WorkspaceClient(credentials_strategy=ModelServingUserCredentials())
    except Exception as e:
        print(f"Warning: Could not create OBO workspace client: {e}")
        # Fallback to default workspace client if OBO fails
        obo_workspace_client = workspace_client

    databricks_mcp_client = DatabricksMultiServerMCPClient(
        [
            DatabricksMCPServer(
                name="obo_vs_client",
                url=f"{host}/api/2.0/mcp/vector-search/{VECTOR_SEARCH_SCHEMA}",
                workspace_client=obo_workspace_client
            ),
            DatabricksMCPServer(
                name="obo_genie_client",
                url=f"{host}/api/2.0/mcp/genie/{GENIE_SPACE_ID}",
                workspace_client=obo_workspace_client
            )
        ]
    )
    # get_tools() returns a coroutine; the caller runs it with asyncio.run()
    return databricks_mcp_client.get_tools()



# The state for the agent workflow, including the conversation and any custom data
class AgentState(TypedDict):
    messages: Annotated[Sequence[AnyMessage], add_messages]
    custom_inputs: Optional[dict[str, Any]]
    custom_outputs: Optional[dict[str, Any]]


def create_tool_calling_agent(
    model: LanguageModelLike,
    system_prompt: Optional[str] = None,
):
    tools = asyncio.run(create_tools())
    model = model.bind_tools(tools)  # Bind tools to the model

    # Function to check if agent should continue or finish based on last message
    def should_continue(state: AgentState):
        messages = state["messages"]
        last_message = messages[-1]
        # If function (tool) calls are present, continue; otherwise, end
        if isinstance(last_message, AIMessage) and last_message.tool_calls:
            return "continue"
        else:
            return "end"

    # Preprocess: optionally prepend a system prompt to the conversation history
    if system_prompt:
        preprocessor = RunnableLambda(
            lambda state: [{"role": "system", "content": system_prompt}] + state["messages"]
        )
    else:
        preprocessor = RunnableLambda(lambda state: state["messages"])

    model_runnable = preprocessor | model  # Chain the preprocessor and the model

    # The function to invoke the model within the workflow
    def call_model(
        state: AgentState,
        config: RunnableConfig,
    ):
        response = model_runnable.invoke(state, config)
        return {"messages": [response]}

    workflow = StateGraph(AgentState)  # Create the agent's state machine

    workflow.add_node("agent", RunnableLambda(call_model))  # Agent node (LLM)
    workflow.add_node("tools", ToolNode(tools))  # Tools node

    workflow.set_entry_point("agent")  # Start at agent node
    workflow.add_conditional_edges(
        "agent",
        should_continue,
        {
            "continue": "tools",  # If the model requests a tool call, move to tools node
            "end": END,  # Otherwise, end the workflow
        },
    )
    workflow.add_edge("tools", "agent")  # After tools are called, return to agent node

    # Compile and return the tool-calling agent workflow
    return workflow.compile()


# ResponsesAgent class to wrap the compiled agent and make it compatible with Mosaic AI Responses API
class LangGraphResponsesAgent(ResponsesAgent):
    # def __init__(self, agent):
    #     self.agent = agent

    # Make a prediction (single-step) for the agent
    def predict(self, request: ResponsesAgentRequest) -> ResponsesAgentResponse:
        outputs = [
            event.item
            for event in self.predict_stream(request)
            if event.type == "response.output_item.done" or event.type == "error"
        ]
        return ResponsesAgentResponse(output=outputs, custom_outputs=request.custom_inputs)

    async def _predict_stream_async(
        self,
        request: ResponsesAgentRequest,
    ) -> AsyncGenerator[ResponsesAgentStreamEvent, None]:
        agent = create_tool_calling_agent(model=llm, system_prompt=system_prompt)
        cc_msgs = to_chat_completions_input([i.model_dump() for i in request.input])
        # Stream events from the agent graph
        async for event in agent.astream(
            {"messages": cc_msgs}, stream_mode=["updates", "messages"]
        ):
            if event[0] == "updates":
                # Stream updated messages from the workflow nodes
                for node_data in event[1].values():
                    if len(node_data.get("messages", [])) > 0:
                        all_messages = []
                        for msg in node_data["messages"]:
                            if isinstance(msg, ToolMessage) and not isinstance(msg.content, str):
                                msg.content = json.dumps(msg.content)
                            all_messages.append(msg)
                        for item in output_to_responses_items_stream(all_messages):
                            yield item
            elif event[0] == "messages":
                # Stream generated text message chunks
                try:
                    chunk = event[1][0]
                    if isinstance(chunk, AIMessageChunk) and (content := chunk.content):
                        yield ResponsesAgentStreamEvent(
                            **self.create_text_delta(delta=content, item_id=chunk.id),
                        )
                except Exception:
                    # Skip malformed chunks rather than aborting the stream
                    pass

    # Stream predictions for the agent, yielding output as it's generated
    def predict_stream(
        self, request: ResponsesAgentRequest
    ) -> Generator[ResponsesAgentStreamEvent, None, None]:
        agen = self._predict_stream_async(request)

        try:
            loop = asyncio.get_event_loop()
        except RuntimeError:
            loop = asyncio.new_event_loop()
            asyncio.set_event_loop(loop)

        ait = agen.__aiter__()

        while True:
            try:
                item = loop.run_until_complete(ait.__anext__())
            except StopAsyncIteration:
                break
            else:
                yield item


# Initialize the entire agent, including MCP tools and workflow
def initialize_agent():
    return LangGraphResponsesAgent()


mlflow.langchain.autolog()
AGENT = initialize_agent()
mlflow.models.set_model(AGENT)

## Test the agent

Interact with the agent to test its output and tool-calling abilities. Because `agent_mcp.py` calls `mlflow.langchain.autolog()` when it is imported, you can view the trace for each step the agent takes.
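
`predict_stream` is a synchronous generator even though the graph streams asynchronously, and `predict` keeps only the completed items from that stream. The sketch below illustrates both ideas with the standard library alone. It is a simplified stand-in, not the agent's code: `stream_events`, `iterate_sync`, and `collect_final_items` are hypothetical names, and the events are plain dicts rather than `ResponsesAgentStreamEvent` objects. It also assumes no event loop is already running; inside a notebook, the agent relies on `nest_asyncio` for that case.

```python
import asyncio
from typing import AsyncGenerator, Generator


async def stream_events() -> AsyncGenerator[dict, None]:
    """Hypothetical async event source standing in for the agent graph stream."""
    yield {"type": "response.output_text.delta", "delta": "Hel"}
    yield {"type": "response.output_text.delta", "delta": "lo"}
    yield {"type": "response.output_item.done", "item": {"text": "Hello"}}


def iterate_sync(agen: AsyncGenerator[dict, None]) -> Generator[dict, None, None]:
    """Drive an async generator from synchronous code, one item at a time."""
    loop = asyncio.new_event_loop()
    try:
        while True:
            try:
                yield loop.run_until_complete(agen.__anext__())
            except StopAsyncIteration:
                break
    finally:
        # Close the async generator and the loop even if the consumer stops early
        loop.run_until_complete(agen.aclose())
        loop.close()


def collect_final_items(events) -> list:
    """Keep only completed output items, as `predict` does with the stream."""
    return [e["item"] for e in events if e["type"] == "response.output_item.done"]


events = list(iterate_sync(stream_events()))
final_items = collect_final_items(events)
print(len(events), final_items)  # 3 [{'text': 'Hello'}]
```

Pulling one item per `run_until_complete` call lets the caller receive each event as soon as it is produced instead of waiting for the whole stream to finish.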

In [0]:
dbutils.library.restartPython()

In [0]:
from agent_mcp import AGENT

AGENT.predict({"input": [{"role": "user", "content": "What can I do with fixtures in the product?"}]})

In [0]:
for chunk in AGENT.predict_stream(
    {"input": [{"role": "user", "content": "What can I do with fixtures in the product?"}]}
):
    print(chunk, "-----------\n")

## Log the agent as an MLflow model

Log the agent as code from the `agent_mcp.py` file. See [Deploy an agent that connects to Databricks MCP servers](https://docs.databricks.com/aws/en/generative-ai/mcp/managed-mcp#deploy-your-agent).

In [0]:
import mlflow
from mlflow.models.auth_policy import AuthPolicy, SystemAuthPolicy, UserAuthPolicy
from mlflow.models.resources import DatabricksServingEndpoint
from importlib.metadata import version


LLM_ENDPOINT_NAME = "databricks-claude-sonnet-4-5"
# System policy: resources accessed with system credentials
system_policy = SystemAuthPolicy(
    resources=[DatabricksServingEndpoint(endpoint_name=LLM_ENDPOINT_NAME)]
)

# User policy: API scopes for OBO access
user_policy = UserAuthPolicy(api_scopes=[
    "serving.serving-endpoints",
    "mcp.vectorsearch",
    "mcp.genie"
])

# Log the agent with both policies
with mlflow.start_run():
    logged_agent_info = mlflow.pyfunc.log_model(
        name="agent",
        python_model="agent_mcp.py",
        auth_policy=AuthPolicy(
            system_auth_policy=system_policy,
            user_auth_policy=user_policy,
        ),
        pip_requirements=[
            # Pin each dependency to the version installed in this notebook
            # (importlib.metadata replaces the deprecated pkg_resources API)
            f"langgraph=={version('langgraph')}",
            f"mcp=={version('mcp')}",
            f"databricks-mcp=={version('databricks-mcp')}",
            f"databricks-langchain=={version('databricks-langchain')}",
        ]
    )

## Evaluate the agent with [Agent Evaluation](https://docs.databricks.com/mlflow3/genai/eval-monitor)

You can edit the requests or expected responses in your evaluation dataset and rerun the evaluation as you iterate on your agent, using MLflow to track the computed quality metrics.

Evaluate your agent with one of our [predefined LLM scorers](https://docs.databricks.com/mlflow3/genai/eval-monitor/predefined-judge-scorers), or try adding [custom metrics](https://docs.databricks.com/mlflow3/genai/eval-monitor/custom-scorers).
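
Each row's `inputs` dictionary is passed to `predict_fn` as keyword arguments, so the parameter names of `predict_fn` need to match the keys in `inputs` (here, `input`). The sketch below mimics that calling convention without MLflow, under that assumption about the harness. `call_predict_fn`, `fake_predict`, and `wrong_name` are hypothetical helpers for illustration, and the echo response is a placeholder for the real agent output.

```python
def call_predict_fn(predict_fn, row: dict):
    """Mimic how an evaluation harness calls predict_fn for one dataset row."""
    return predict_fn(**row["inputs"])


def fake_predict(input: list) -> dict:
    """Stub agent: echo the last user message in a Responses-style payload."""
    last_user = input[-1]["content"]
    return {"output": [{"type": "message", "content": f"echo: {last_user}"}]}


def wrong_name(messages: list) -> dict:
    """Parameter name does not match the `input` key, so the call fails."""
    return {"output": []}


row = {
    "inputs": {"input": [{"role": "user", "content": "ping"}]},
    "expected_response": "echo: ping",
}

result = call_predict_fn(fake_predict, row)
print(result["output"][0]["content"])  # echo: ping

try:
    call_predict_fn(wrong_name, row)
    mismatch_detected = False
except TypeError:
    mismatch_detected = True
print(mismatch_detected)  # True
```

A mismatch between dataset keys and function parameters shows up as a `TypeError` on the first row, which makes it worth checking with a single row before running a larger evaluation.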

In [0]:
import mlflow
from mlflow.genai.scorers import RelevanceToQuery, Safety, RetrievalRelevance, RetrievalGroundedness

eval_dataset = [
    {
        "inputs": {
            "input": [
                {
                    "role": "user",
                    "content": "What can I do with fixtures in the product?"
                }
            ]
        },
        "expected_response": "Based on the product documentation, here's what you can do with fixtures in the product:\n\n## Fixture Ordering\n- **Order fixtures** for InPlay or PreMatch events across different hierarchy levels (sport, location, competition, and individual fixtures)\n- **View fixtures** with various statuses:\n  - **InPlay**: NSY, About to Start, In Progress, Lost Coverage, Interrupted (max 5 days from start date), Postponed\n  - **PreMatch**: NSY, About to Start, Postponed\n- **Configure market settings** with a hierarchy: Package configuration → Sport → Location → Competition → Fixture\n\n## Fixture Management\n- **Remove orders/subscriptions** from sport/location/competition levels\n  - When removing an ordered component, all related fixtures are automatically removed\n  - For fixtures already in progress, all relevant markets are suspended and end-of-event messages are sent\n\n## Trading Floor Operations\n- **Suspend/Unsuspend odds** at different levels:\n  - **Suspend all** - suspend all fixture's markets\n  - **Suspend market** - suspend only a specific market\n  - **Suspend line** - suspend only a specific line\n- **Manual suspension control** - easily suspend/unsuspend with visual indicators (red button and red frame around suspended markets)\n\n## Fixture Logs\n- **View comprehensive logs** of all your ordered fixtures\n- **Access fixture-specific logs** by selecting individual fixtures from the list\n\nThe system provides flexibility to manage fixtures at various levels of granularity, from broad sport-level ordering down to specific fixture and market control."
    }
]

eval_results = mlflow.genai.evaluate(
    data=eval_dataset,
    predict_fn=lambda input: AGENT.predict({"input": input}),
    scorers=[RelevanceToQuery(), Safety()], # add more scorers here if they're applicable
)

# Review the evaluation results in the MLflow UI (see console output)

In [0]:
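# Pre-deployment check: load the logged model in an isolated environment
# and run a single prediction before registering and deploying it
mlflow.models.predict(
    model_uri=f"runs:/{logged_agent_info.run_id}/agent",
    input_data={"input": [{"role": "user", "content": "What can I do with fixtures in the product?"}]},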
    env_manager="uv",
)

## Register the model to Unity Catalog

Before you deploy the agent, you must register the agent to Unity Catalog.

- **TODO** Update the `catalog`, `schema`, and `model_name` below to register the MLflow model to Unity Catalog.

In [0]:
mlflow.set_registry_uri("databricks-uc")

# TODO: define the catalog, schema, and model name for your UC model
catalog = "mkr_gcp_sandbox_euw3"
schema = "default"
model_name = "agent_mcp"
UC_MODEL_NAME = f"{catalog}.{schema}.{model_name}"

# register the model to UC
uc_registered_model_info = mlflow.register_model(
    model_uri=logged_agent_info.model_uri, name=UC_MODEL_NAME
)

## Deploy the agent

In [0]:
from databricks import agents

agents.deploy(
    UC_MODEL_NAME, 
    uc_registered_model_info.version,
    # ==============================================================================
    # TODO: ONLY UNCOMMENT AND CONFIGURE THE ENVIRONMENT_VARS SECTION BELOW
    #       IF YOU ARE USING OAUTH/SERVICE PRINCIPAL FOR CUSTOM MCP SERVERS.
    #       For managed MCP (the default), LEAVE THIS SECTION COMMENTED OUT.
    # ==============================================================================
    # environment_vars={
    #     "DATABRICKS_CLIENT_ID": DATABRICKS_CLIENT_ID,
    #     "DATABRICKS_CLIENT_SECRET": f"{{{{secrets/{client_secret_scope_name}/{client_secret_key_name}}}}}"
    # },
    tags={"endpointSource": "docs"},
    deploy_feedback_model=False
)
