# Neo4j MCP Agent: Test, Evaluate & Deploy

This notebook tests, evaluates, and deploys a tool-calling LangGraph agent that queries a Neo4j graph database through an external MCP server hosted on Azure Container Apps.

## What This Notebook Does

1. **Test the agent** - Verify the agent can query Neo4j via MCP tools
2. **Log as MLflow model** - Package the agent for deployment
3. **Evaluate with Agent Evaluation** - Assess quality with MLflow scorers
4. **Register to Unity Catalog** - Store the model in UC for governance
5. **Deploy to Model Serving** - Create a serving endpoint

## Prerequisites

- **HTTP Connection**: `neo4j_azure_beta_mcp` created (see `neo4j-mcp-http-connection.ipynb`)
- **MCP Flag Enabled**: "Is MCP connection" checkbox checked in Catalog Explorer
- **Secrets Configured**: Run `scripts/setup_databricks_secrets.sh` first
- **Neo4j MCP Server**: Running on Azure Container Apps

## Architecture

```
┌─────────────────┐     ┌──────────────────────┐     ┌─────────────────┐     ┌─────────────────┐
│  LangGraph      │────▶│ Unity Catalog HTTP   │────▶│ Neo4j MCP Server│────▶│ Neo4j Database  │
│  Agent          │     │ Connection Proxy     │     │ (Azure)         │     │                 │
│                 │     │ /api/2.0/mcp/external│     │                 │     │                 │
└─────────────────┘     └──────────────────────┘     └─────────────────┘     └─────────────────┘
       MCP tool calls           Bearer Token              Cypher Queries
       (JSON-RPC)               from Secrets              (read-cypher tool)
```
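
To make the "MCP tool calls (JSON-RPC)" hop in the diagram concrete, here is a minimal sketch of the kind of request an MCP client sends when it invokes a tool. The `tools/call` method and the `name`/`arguments` parameters come from the MCP specification; the `query` argument name for `read-cypher` and the helper `build_tool_call` are assumptions made for illustration, not taken from this project.

```python
import json


def build_tool_call(tool_name: str, arguments: dict, request_id: int = 1) -> dict:
    """Build an MCP `tools/call` request in JSON-RPC 2.0 form."""
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": tool_name, "arguments": arguments},
    }


# Assumption: the read-cypher tool accepts its Cypher text in a "query" argument.
request = build_tool_call(
    "read-cypher",
    {"query": "MATCH (n) RETURN labels(n) AS labels, count(*) AS total"},
)
print(json.dumps(request, indent=2))
```

In the notebook itself the agent and its MCP client build these messages, so this payload is only useful for understanding or debugging what crosses the proxy.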

## Setup

**Important:** Before running this notebook, ensure your cluster is configured with the required libraries.

See the [README.md](./README.md#cluster-setup) for cluster setup instructions, including:
- Enabling **Machine Learning** runtime (17.3 LTS ML or later)
- Installing required PyPI packages: `databricks-agents`, `databricks-langchain`, `langgraph`, `mcp`, `databricks-mcp`

The cluster must be restarted after installing libraries.
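
A quick way to confirm the libraries are visible after the restart is to look up their installed versions. This is a minimal sketch using only the standard library; the helper name `installed_versions` is hypothetical.

```python
from importlib.metadata import PackageNotFoundError, version

# Package names taken from the list above.
REQUIRED_PACKAGES = [
    "databricks-agents",
    "databricks-langchain",
    "langgraph",
    "mcp",
    "databricks-mcp",
]


def installed_versions(packages):
    """Map each package name to its installed version, or None if it is missing."""
    found = {}
    for name in packages:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = None
    return found


versions = installed_versions(REQUIRED_PACKAGES)
for name, ver in versions.items():
    print(f"{name}: {ver or 'NOT INSTALLED'}")
```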

## Test the Agent

Import the agent from `neo4j_mcp_agent.py` and test its tool-calling abilities.
Since the agent uses `mlflow.langchain.autolog()`, you can view traces in the MLflow UI.

In [None]:
from neo4j_mcp_agent import AGENT, CONNECTION_NAME, SECRET_SCOPE

print("Agent loaded successfully!")
print(f"Using HTTP connection: {CONNECTION_NAME}")
print(f"Using secrets scope: {SECRET_SCOPE}")

### Test 1: Get Database Schema

Ask the agent to retrieve the Neo4j database schema using the `get-schema` tool.

In [None]:
# Test get-schema tool
response = AGENT.predict({
    "input": [{"role": "user", "content": "What is the schema of the Neo4j database? Show me the node labels and relationship types."}]
})
print("Schema Response:")
print(response)

### Test 2: Execute a Cypher Query

Ask the agent to count nodes by label using the `read-cypher` tool.

In [None]:
# Test read-cypher tool
response = AGENT.predict({
    "input": [{"role": "user", "content": "How many nodes are there in the database? Break it down by node label."}]
})
print("Query Response:")
print(response)
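
Printing the raw response shows every output item, including tool calls. The sketch below pulls out only the assistant's text. It assumes the response has been converted to a plain dict (for a pydantic response object, something like `response.model_dump()`) and that it follows the Responses-style layout of `output` items containing `output_text` blocks; the helper `extract_text` and the sample data are made up for illustration.

```python
def extract_text(response: dict) -> str:
    """Join the text of all assistant message items in a Responses-style dict."""
    parts = []
    for item in response.get("output", []):
        if item.get("type") != "message":
            continue  # skip tool-call and tool-output items
        for block in item.get("content", []):
            if block.get("type") == "output_text":
                parts.append(block.get("text", ""))
    return "\n".join(parts)


# Made-up sample shaped like a tool call followed by a final answer.
sample = {
    "output": [
        {"type": "function_call", "name": "read-cypher", "arguments": "{}"},
        {
            "type": "message",
            "role": "assistant",
            "content": [{"type": "output_text", "text": "There are 42 nodes."}],
        },
    ]
}
print(extract_text(sample))
```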

### Test 3: Streaming Response

Test the streaming capability to see the agent's intermediate steps in real time.

In [None]:
# Test streaming
print("Streaming response:")
print("=" * 60)

for chunk in AGENT.predict_stream(
    {"input": [{"role": "user", "content": "What relationships exist in the database? List the top 5 by count."}]}
):
    print(chunk)
    print("-" * 40)
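
Streamed output can be verbose. If only the completed items matter, the events can be filtered by type. This is a sketch under assumptions: it expects each chunk as a plain dict and assumes the event types follow Responses API naming (`response.output_text.delta`, `response.output_item.done`). The helper `completed_items` and the sample events are hypothetical.

```python
def completed_items(events):
    """Return the output items carried by 'response.output_item.done' events."""
    return [
        event["item"]
        for event in events
        if event.get("type") == "response.output_item.done" and "item" in event
    ]


# Made-up events: two text deltas followed by one completed message item.
sample_events = [
    {"type": "response.output_text.delta", "delta": "There "},
    {"type": "response.output_text.delta", "delta": "are..."},
    {"type": "response.output_item.done", "item": {"type": "message", "role": "assistant"}},
]
for item in completed_items(sample_events):
    print(item["type"])
```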

## Log the Agent as an MLflow Model

Log the agent as code from `neo4j_mcp_agent.py`. This packages the agent with its dependencies for deployment.

See [Deploy an agent that connects to Databricks MCP servers](https://docs.databricks.com/aws/en/generative-ai/mcp/managed-mcp#deploy-your-agent).

In [None]:
import mlflow
from importlib.metadata import version
from neo4j_mcp_agent import LLM_ENDPOINT_NAME, CONNECTION_NAME
from mlflow.models.resources import DatabricksServingEndpoint, DatabricksUCConnection

# Define resources the agent depends on
resources = [
    DatabricksServingEndpoint(endpoint_name=LLM_ENDPOINT_NAME),
    DatabricksUCConnection(connection_name=CONNECTION_NAME),
]

# Log the agent model, pinning each dependency to the version installed on this cluster.
# importlib.metadata is used instead of the deprecated pkg_resources API.
with mlflow.start_run():
    logged_agent_info = mlflow.pyfunc.log_model(
        name="neo4j-mcp-agent",
        python_model="neo4j_mcp_agent.py",
        resources=resources,
        pip_requirements=[
            f"langgraph=={version('langgraph')}",
            f"mcp=={version('mcp')}",
            f"databricks-mcp=={version('databricks-mcp')}",
            f"databricks-langchain=={version('databricks-langchain')}",
        ],
    )

print("Model logged successfully!")
print(f"Run ID: {logged_agent_info.run_id}")
print(f"Model URI: {logged_agent_info.model_uri}")

## Evaluate the Agent with Agent Evaluation

Evaluate the agent using [MLflow Agent Evaluation](https://docs.databricks.com/mlflow3/genai/eval-monitor).
This uses predefined LLM scorers to assess response quality and safety.

You can customize the evaluation dataset and add [custom scorers](https://docs.databricks.com/mlflow3/genai/eval-monitor/custom-scorers).

In [None]:
import mlflow
from mlflow.genai.scorers import RelevanceToQuery, Safety

# Define evaluation dataset with Neo4j-specific queries
eval_dataset = [
    {
        "inputs": {
            "input": [
                {
                    "role": "user",
                    "content": "What node labels exist in the Neo4j database?"
                }
            ]
        },
        "expected_response": "The agent should call get-schema and return a list of node labels from the database."
    },
    {
        "inputs": {
            "input": [
                {
                    "role": "user",
                    "content": "Count all nodes in the database"
                }
            ]
        },
        "expected_response": "The agent should execute a Cypher query like MATCH (n) RETURN count(n) and return the total count."
    },
    {
        "inputs": {
            "input": [
                {
                    "role": "user",
                    "content": "What are the most common relationship types?"
                }
            ]
        },
        "expected_response": "The agent should query relationship types and their counts, returning the most common ones."
    }
]

print(f"Evaluation dataset contains {len(eval_dataset)} test cases")
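
Every record above has the same shape, so adding more test cases is easier with a small builder. A minimal sketch; `make_case` is a hypothetical helper, and the record layout simply mirrors the dataset in the previous cell.

```python
def make_case(question: str, expected_response: str) -> dict:
    """Build one evaluation record in the same shape as the dataset above."""
    return {
        "inputs": {"input": [{"role": "user", "content": question}]},
        "expected_response": expected_response,
    }


extra_cases = [
    make_case(
        "What node labels exist in the Neo4j database?",
        "The agent should call get-schema and return a list of node labels from the database.",
    ),
]
print(extra_cases[0]["inputs"]["input"][0]["content"])
```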

In [None]:
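# Run evaluation
# Each record's "inputs" dict is passed to predict_fn as keyword arguments,
# so the lambda parameter must be named "input" to match the dataset key.
eval_results = mlflow.genai.evaluate(
    data=eval_dataset,
    predict_fn=lambda input: AGENT.predict({"input": input}),
    scorers=[RelevanceToQuery(), Safety()],
)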

print("Evaluation complete! Review results in the MLflow UI.")
print(f"Results: {eval_results}")
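
The predefined scorers are LLM-based. A deterministic check can complement them, for example confirming that a "count the nodes" answer actually contains a number. The sketch below shows only the checking logic as a plain function; `reports_a_count` is a hypothetical name, and it assumes the agent output has already been reduced to plain text. Wrapping it as an MLflow scorer is described in the custom scorers documentation linked above and is not shown here.

```python
import re


def reports_a_count(outputs: str) -> bool:
    """Return True if the response text contains at least one run of digits."""
    return re.search(r"\d+", outputs) is not None


# Made-up response strings for illustration.
print(reports_a_count("There are 1,204 nodes in the database."))
print(reports_a_count("I could not reach the database."))
```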

### Test the Logged Model

Verify the logged model works by running a prediction using `mlflow.models.predict`.

In [None]:
# Test the logged model
result = mlflow.models.predict(
    model_uri=f"runs:/{logged_agent_info.run_id}/neo4j-mcp-agent",
    input_data={"input": [{"role": "user", "content": "How many nodes are in the database?"}]},
    env_manager="uv",
)

print("Logged model prediction:")
print(result)

## Register the Model to Unity Catalog

Before deploying, register the agent to Unity Catalog for governance and versioning.

See [README.md](./README.md#usage) for instructions on creating the catalog and schema if needed.

In [None]:
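import mlflow

# Set Unity Catalog as the model registry
mlflow.set_registry_uri("databricks-uc")

# TODO: replace these placeholders with your own catalog, schema, and model name
catalog = "<catalog>"
schema = "<schema>"
model_name = "<model_name>"
UC_MODEL_NAME = f"{catalog}.{schema}.{model_name}"

# Register the model
uc_registered_model_info = mlflow.register_model(
    model_uri=logged_agent_info.model_uri,
    name=UC_MODEL_NAME
)

print("Model registered successfully!")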
print(f"Name: {uc_registered_model_info.name}")
print(f"Version: {uc_registered_model_info.version}")
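
Unity Catalog model names use three levels: catalog, schema, and model. A small helper can catch a malformed name before registration is attempted. This is a minimal sketch; `uc_model_name` and the example names `main`, `agents`, and `neo4j_mcp_agent` are hypothetical.

```python
def uc_model_name(catalog: str, schema: str, model: str) -> str:
    """Join the three levels into a Unity Catalog name, rejecting bad components."""
    parts = [catalog, schema, model]
    for part in parts:
        if not part or "." in part:
            raise ValueError(f"Invalid name component: {part!r}")
    return ".".join(parts)


# Hypothetical names; substitute your own catalog, schema, and model name.
print(uc_model_name("main", "agents", "neo4j_mcp_agent"))
```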

## Deploy the Agent

Deploy the registered model to a Databricks Model Serving endpoint.

This creates a REST API endpoint that can be called to interact with the Neo4j MCP agent.

In [None]:
from databricks import agents

# Deploy the agent
deployment = agents.deploy(
    UC_MODEL_NAME,
    uc_registered_model_info.version,
    tags={"endpointSource": "neo4j-mcp", "connection": CONNECTION_NAME},
    deploy_feedback_model=False
)

print("Deployment initiated!")
print(f"Endpoint: {deployment}")
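
The deployment call returns before the endpoint is ready to serve traffic, so a polling loop is a common follow-up. The sketch below is deliberately generic: `wait_until_ready` is a hypothetical helper that takes any zero-argument readiness check, and a stand-in check is used here so the example runs anywhere.

```python
import time
from typing import Callable


def wait_until_ready(
    is_ready: Callable[[], bool],
    timeout_s: float = 1200.0,
    interval_s: float = 30.0,
) -> bool:
    """Poll is_ready() until it returns True or the timeout elapses."""
    deadline = time.monotonic() + timeout_s
    while True:
        if is_ready():
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval_s)


# Stand-in readiness check for demonstration; replace with a real check.
fake_states = iter([False, False, True])
print(wait_until_ready(lambda: next(fake_states), timeout_s=5, interval_s=0.1))
```

In a workspace, `is_ready` would presumably wrap a Databricks SDK lookup of the endpoint's state; the exact attribute to inspect is an assumption to verify against the SDK documentation.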

## Usage Examples

Once deployed, you can call the agent endpoint using the Databricks SDK or REST API.

In [None]:
# Example: Query the deployed endpoint
from databricks.sdk import WorkspaceClient

w = WorkspaceClient()

# Get the endpoint name from deployment
# endpoint_name = deployment.endpoint_name  # Uncomment after deployment

# Example query structure
example_query = {
    "input": [
        {"role": "user", "content": "What is the structure of the Neo4j database?"}
    ]
}

print("Example query format:")
print(example_query)
print("")
print("To query the endpoint:")
print("")
print("from databricks.sdk import WorkspaceClient")
print("w = WorkspaceClient()")
print("response = w.serving_endpoints.query(name='<endpoint_name>', inputs=example_query)")
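
For the REST API route, the request can be assembled with the standard library. This sketch only builds the request and does not send it. The host, endpoint name, and token are placeholders, `build_invocation_request` is a hypothetical helper, and the `/serving-endpoints/<name>/invocations` path reflects the usual Model Serving invocation URL, which should be confirmed for your workspace.

```python
import json
import urllib.request


def build_invocation_request(
    host: str, endpoint_name: str, token: str, payload: dict
) -> urllib.request.Request:
    """Build (but do not send) a POST request to a serving endpoint."""
    url = f"{host.rstrip('/')}/serving-endpoints/{endpoint_name}/invocations"
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


# Placeholder values; none of these refer to a real workspace or credential.
payload = {"input": [{"role": "user", "content": "What is the structure of the Neo4j database?"}]}
req = build_invocation_request(
    "https://example.cloud.databricks.com", "my-neo4j-agent-endpoint", "dapi-EXAMPLE", payload
)
print(req.get_method(), req.full_url)
```

Sending it would be a matter of `urllib.request.urlopen(req)` with a real host and token, ideally read from a secret rather than hard-coded.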

## Cleanup

Uncomment the code in the cell below to delete the serving endpoint when it is no longer needed.

In [None]:
# Uncomment to delete the serving endpoint
# from databricks.sdk import WorkspaceClient
# w = WorkspaceClient()
# w.serving_endpoints.delete(name="<endpoint_name>")
# print("Endpoint deleted.")

## Next Steps

1. **Customize the system prompt** - Edit `neo4j_mcp_agent.py` to tailor the agent's behavior
2. **Add more evaluation cases** - Expand the eval dataset with domain-specific queries
3. **Monitor in production** - Use MLflow to track agent performance and latency
4. **Share access** - Grant permissions to the serving endpoint for other users

## Resources

- [External MCP Servers Documentation](https://docs.databricks.com/aws/en/generative-ai/mcp/external-mcp)
- [MLflow Agent Evaluation](https://docs.databricks.com/mlflow3/genai/eval-monitor)
- [Databricks Model Serving](https://docs.databricks.com/aws/en/machine-learning/model-serving/)
- [Neo4j Cypher Manual](https://neo4j.com/docs/cypher-manual/current/)