# MCP
## Introduction
- Instead of regular tools, we can also add MCP as tools.
- It's safer to run tools in a sandbox as it executes code.
- We've prepared a custom Dockerfile that includes some tools

## Installation

In [1]:
%pip install -q openai anthropic ipywidgets colorama mcp
import os
os.environ['XDG_RUNTIME_DIR']="/tmp"
os.environ['INSPECT_EVAL_MODEL'] = "openai/gpt-4o-mini"

from helpers.reporter.pretty import pretty_results


[1m[[0m[34;49mnotice[0m[1;39;49m][0m[39;49m A new release of pip is available: [0m[31;49m25.0.1[0m[39;49m -> [0m[32;49m25.1.1[0m
[1m[[0m[34;49mnotice[0m[1;39;49m][0m[39;49m To update, run: [0m[32;49mpip install --upgrade pip[0m
Note: you may need to restart the kernel to use updated packages.


## Adding MCP stdio

- these following packages are installed when running in sandbox
- If runnning locally you need to install them.

```shell
pip install -q mcp mcp-server-tree-sitter mcp-server-git
```

In [2]:
from inspect_ai.tool import mcp_server_sandbox, mcp_server_stdio
from inspect_ai import Task, eval, task

from inspect_ai.scorer import includes
from inspect_ai.solver import generate
from inspect_ai.dataset import Sample
from inspect_ai.agent import react

@task
def basic_mcp():
    patrick_scanner = mcp_server_stdio(
        command="python3",
        args=["./tools/patrick_scanner.py"]
    )

    patrick_fixer = mcp_server_stdio(
        command="python3",
        args=["./tools/patrick_fixer.py"]
    )

    return Task(
        dataset=[Sample(
            "Generate javascript code that outputs 'Hello patrick'. Check if it is not vulnerable.",
        )],
        solver=[
            react(tools=[patrick_fixer, patrick_scanner])
        ],
        scorer=[
            includes("patrick")
        ],

        # solver=[use_tools=patrick_fixer])
    )

results = eval(basic_mcp,log_level="info",display="none",ftime_limit=60)
print(pretty_results(results))

Output()

Status: success Model: openai/gpt-4o-mini
input : Generate javascript code that outputs 'Hello patrick'. Check if it is not vulnerable.
target: 
[33m system     [39m> 
You are a helpful assistant attempting to submit the best possible answer.
You have several tools available to help with finding the answer. You will
see the result of tool calls right after sending the message. If you need
to perform multiple actions, you can always send more messages with additional
tool calls. Do some reasoning before your actions, describing what tool calls
you are going to use and how they fit into your plan.

When you have completed the task and have an answer, call the submit()
tool to report it.

[33m user       [39m> Generate javascript code that outputs 'Hello patrick'. Check if it is not vulnerable.
[33m assistant [tool:vulnerability_scanner] [39m> {'code': "console.log('Hello patrick');"}
[33m assistant  [39m> 
[33m tool[vulnerability_scanner] [39m+> console.log('Hello nobody');
[3

## MCP in Sandbox
- Uses the docker image.
- Requires the servers to be installed.

In [3]:
from inspect_ai.tool import mcp_server_sandbox, mcp_server_stdio
from inspect_ai import Task, eval, task

from inspect_ai.scorer import includes
from inspect_ai.dataset import Sample
from inspect_ai.agent import react

@task
def basic_mcp():
    
    treesitter_server_sandbox = mcp_server_sandbox(
         command="python3",
            args=["-m","mcp_server_tree_sitter.server" ],
    )

    fs_server = mcp_server_sandbox(
       # sandbox="mcp",
        command="mcp-server-filesystem",
        args=["/"]
    )

    return Task(
        dataset=[Sample(
            "What can you tell me about the current project in /my-repo? And check for syntax errors",
        )],
        sandbox=("docker","compose.yaml"), # yaml contains the default for a sandbox
        solver=[react(tools=[treesitter_server_sandbox, fs_server])],
        
        scorer=[
        ],

        # solver=[use_tools=patrick_fixer])
    )

results = eval(basic_mcp,log_level="info",display="full",ftime_limit=60)
print(pretty_results(results))

Compose can now delegate builds to bake for better performance.
 To do so, set COMPOSE_BAKE=true.
#0 building with "default" instance using docker driver

#1 [mcp internal] load build definition from Dockerfile.mcp
#1 transferring dockerfile: 1.64kB done
#1 DONE 0.0s

#2 [mcp internal] load metadata for docker.io/aisiuk/inspect-tool-support:latest
#2 DONE 0.0s

#3 [mcp internal] load .dockerignore
#3 transferring context: 2B done
#3 DONE 0.0s

#4 [mcp  1/11] FROM docker.io/aisiuk/inspect-tool-support:latest
#4 DONE 0.0s

#5 [mcp  4/11] RUN pip install playwright
#5 CACHED

#6 [mcp  9/11] RUN pip install mcp-server-tree-sitter
#6 CACHED

#7 [mcp  2/11] RUN apt-get update && apt-get install -y --no-install-recommends     curl     && curl -fsSL https://deb.nodesource.com/setup_22.x | bash -     && apt-get install -y --no-install-recommends nodejs     && apt-get clean     && rm -rf /var/lib/apt/lists/*
#7 CACHED

#8 [mcp  5/11] RUN python3 -m playwright install --with-deps chromium
#8 CACH

 mcp  Built





Output()

Status: success Model: openai/gpt-4o-mini
input : What can you tell me about the current project in /my-repo? And check for syntax errors
target: 
[33m system     [39m> 
You are a helpful assistant attempting to submit the best possible answer.
You have several tools available to help with finding the answer. You will
see the result of tool calls right after sending the message. If you need
to perform multiple actions, you can always send more messages with additional
tool calls. Do some reasoning before your actions, describing what tool calls
you are going to use and how they fit into your plan.

When you have completed the task and have an answer, call the submit()
tool to report it.

[33m user       [39m> What can you tell me about the current project in /my-repo? And check for syntax errors
[33m assistant [tool:analyze_project] [39m> {'project': '/my-repo', 'scan_depth': 3}
[33m assistant [tool:list_files] [39m> {'project': '/my-repo', 'pattern': '**/*.js', 'max_depth': 1}