# Shell Tool with OpenAI Responses API

Author: [Priyanshu Deshmukh](https://github.com/priyansh4320)

This notebook demonstrates how to use the shell tool with OpenAI's Responses API. The shell tool allows models to execute shell commands through your integration, enabling them to interact with your local computer through a controlled command-line interface.

**Warning: Running arbitrary shell commands can be dangerous. Always sandbox execution or add strict allow-/deny-lists before forwarding commands to the system shell in production.**

## Install AG2 and dependencies

To run this notebook, install AG2 with the `openai` extra.
````{=mdx}
:::info Requirements
Install `ag2` with the `openai` extra:

```bash
pip install ag2[openai]
```

For more information, please refer to the [installation guide](https://docs.ag2.ai/latest/docs/user-guide/basic-concepts/installing-ag2).
:::
````

## Setup

First, let's configure the LLM with the Responses API and enable the shell tool.

In [None]:
import os

from autogen import ConversableAgent, LLMConfig

# Configure the LLM with Responses API and shell tool
llm_config = LLMConfig(
    config_list={
        "api_type": "responses",
        "model": "gpt-5.1",
        "api_key": os.getenv("OPENAI_API_KEY"),
        "built_in_tools": ["shell"],
    },
)

# Create the assistant agent
assistant = ConversableAgent(
    name="Assistant",
    system_message="""You are a helpful assistant with access to shell commands.
    You can use the shell tool to execute commands and interact with the filesystem.
    The local shell environment is on Mac/Linux.
    Keep your responses concise and include command output when helpful.
    """,
    llm_config=llm_config,
    human_input_mode="NEVER",
)

## Example 1: Automating Filesystem Diagnostics

The shell tool is well suited to automating filesystem and process diagnostics. In this example, we'll list the files in the current directory and show information about running Python processes.

In [None]:
# Example 1: List files in the current directory and show running Python processes
result = assistant.run(
    message="""
    Please help me with the following tasks:
    1. ls to show files in current directory
    2. Show me information about running Python processes
    """,
    max_turns=2,
).process()

## Example 2: Extending Model Capabilities with UNIX Utilities

The shell tool extends the model's capabilities by letting it use built-in UNIX utilities, the Python runtime, and other CLIs available in your environment. This enables the model to perform tasks that require system-level operations.

In [None]:
# Example 2: Use UNIX utilities and Python CLI
result = assistant.run(
    message="""
    Please help me:
    1. Check the current Python version using the python CLI
    2. Get system information like disk usage and memory
    3. Create a simple text file and then use grep to search within it
    """,
    max_turns=6,
).process()

## Example 3: Multi-Step Build and Test Flows

The shell tool excels at running multi-step build and test flows, chaining commands together to complete complex workflows. In this example, we'll set up a Python project, install dependencies, and run tests.

In [None]:
# Example 3: Multi-step build and test flow
result = assistant.run(
    message="""
    Please help me set up a simple Python project:
    1. Create a directory called 'test_project'
    2. Create a simple Python module with a function to test
    3. Create a test file using pytest format
    4. Install pytest if needed
    5. Run the tests and show me the results
    """,
    max_turns=3,
).process()

## Dangerous Pattern Guide

The `dangerous_patterns` option in `LLMConfig` sets custom dangerous command patterns to check before a command is run. Each pattern is a tuple of `(regex_pattern, error_message)`. If `dangerous_patterns` is `None`, `DEFAULT_DANGEROUS_PATTERNS` is used. The list of default patterns is given below.

In [None]:
# Dangerous command patterns to block by default
DEFAULT_DANGEROUS_PATTERNS = [
    # Critical: Root filesystem deletion
    (r"\brm\s+-rf\s+/\s*$", "Deletion of root filesystem (rm -rf /) is not allowed."),
    (r"\brm\s+-rf\s+/\s+", "Deletion starting from root (rm -rf / ...) is not allowed."),
    # Critical: Home directory deletion
    (r"\brm\s+-rf\s+~\s*$", "Deletion of entire home directory (rm -rf ~) is not allowed."),
    (r"\brm\s+-rf\s+~\s+", "Deletion starting from home (rm -rf ~ ...) is not allowed."),
    # Critical system directories - block deletion
    (
        r"\brm\s+-rf\s+/(?:etc|usr|bin|sbin|lib|lib64|boot|root|sys|proc|dev)\b",
        "Deletion of critical system directories is not allowed.",
    ),
    # Critical: Direct disk block device operations
    (r">\s*/dev/sd[a-z][0-9]*\s*$", "Direct disk block device overwrite is not allowed."),
    (r">\s*/dev/hd[a-z][0-9]*\s*$", "Direct disk block device overwrite is not allowed."),
    (r">\s*/dev/nvme\d+n\d+p\d+\s*$", "Direct NVMe disk overwrite is not allowed."),
    # Critical: dd to disk devices
    (r"\bdd\b.*\bof=/dev/(?:sd|hd|nvme)", "Writing to disk devices with dd is not allowed."),
    # Critical: Fork bombs
    (r":\(\)\s*\{\s*:\s*\|\s*:\s*&\s*\}\s*;", "Fork bombs are not allowed."),
    # Critical: Filesystem formatting
    (r"\bmkfs\.(?:ext[234]|xfs|btrfs|ntfs|vfat|fat)\s+/dev/", "Formatting filesystems is not allowed."),
    # Windows: Format drives
    (r"\bformat\s+[A-Z]:\s*$", "Formatting Windows drives is not allowed."),
    (r"\bformat\s+[A-Z]:\s+/", "Formatting Windows drives is not allowed."),
    # Windows: System directory deletion
    (r"\bdel\s+/[sS]\s+C:\\Windows", "Deletion of Windows system directory is not allowed."),
    (r"\bdel\s+/[sS]\s+C:\\Program\s+Files", "Deletion of Windows Program Files is not allowed."),
    (r"\brmdir\s+/[sS]\s+C:\\Windows", "Deletion of Windows system directory is not allowed."),
    # Dangerous: Mass deletion with wildcards in system paths
    (r"\brm\s+-rf\s+/\*\s*$", "Mass deletion of root directory contents is not allowed."),
    (r"\brm\s+-rf\s+~\*\s*$", "Mass deletion of home directory contents is not allowed."),
    # Dangerous: Overwriting critical system files
    (r">\s*/etc/(?:passwd|shadow|hosts|fstab)", "Overwriting critical system files is not allowed."),
    (r">\s*/boot/", "Overwriting boot files is not allowed."),
]
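
To make the `(regex_pattern, error_message)` format concrete, here is a minimal sketch of how such tuples could be applied to a command string. The helper `find_violation` and the `PATTERNS` subset are hypothetical names introduced for illustration; this is not AG2's actual checking code, which is not shown in this notebook.

```python
import re

# A small subset of the default patterns listed above, in the same
# (regex_pattern, error_message) form. PATTERNS is an illustrative name.
PATTERNS = [
    (r"\brm\s+-rf\s+/\s*$", "Deletion of root filesystem (rm -rf /) is not allowed."),
    (r"\brm\s+-rf\s+/\s+", "Deletion starting from root (rm -rf / ...) is not allowed."),
    (r"\bdd\b.*\bof=/dev/(?:sd|hd|nvme)", "Writing to disk devices with dd is not allowed."),
]


def find_violation(command, patterns=PATTERNS):
    """Return the message of the first pattern that matches, or None.

    Hypothetical helper: assumes a pattern counts as a hit when the regex
    is found anywhere in the command string.
    """
    for regex, message in patterns:
        if re.search(regex, command):
            return message
    return None


for cmd in ["ls -la", "rm -rf /", "rm -rf /tmp/build", "dd if=/dev/zero of=/dev/sda"]:
    print(f"{cmd!r}: {find_violation(cmd) or 'allowed'}")
```

Regex-based checks are narrow by design. For instance, `rm -fr /` matches neither `rm` pattern above because the flags are written in a different order, which is one reason to combine patterns with sandboxing rather than rely on them alone.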

In [None]:
# Custom dangerous patterns, passed to the LLM config via `dangerous_patterns`
dangerous_patterns = [
    (r"\brm\s+-rf\s+/\s*$", "Deletion of root filesystem (rm -rf /) is not allowed."),
    (r"\brm\s+-rf\s+/\s+", "Deletion starting from root (rm -rf / ...) is not allowed."),
]

llm_config = LLMConfig(
    config_list={
        "api_type": "responses",
        "model": "gpt-5.1",
        "api_key": os.getenv("OPENAI_API_KEY"),
        "built_in_tools": ["shell"],
        "dangerous_patterns": dangerous_patterns,
    },
)

assistant = ConversableAgent(
    name="Assistant",
    system_message="""You are a helpful assistant with access to shell commands.
    You can use the shell tool to execute commands and interact with the filesystem.
    The local shell environment is on Mac/Linux.
    Keep your responses concise and include command output when helpful.
    """,
    llm_config=llm_config,
)
assistant.run(
    message="""
    Please help me with the following tasks:
    1. ls to show files in current directory
    2. Show me information about running Python processes
    3. rm -rf /
    """,
    max_turns=2,
).process()

## Allowed and Denied Commands

- `allowed_commands`: Allow-list of commands (e.g., `["ls", "cat", "grep"]`). If provided, only these commands can be executed. `None` allows all commands.
- `denied_commands`: Deny-list of commands (e.g., `["rm", "dd"]`). These commands are blocked. `None` falls back to the default dangerous patterns.
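
As a rough illustration of how an allow-list and a deny-list can interact, the sketch below checks only the first token of a command. The function `is_command_permitted` is a hypothetical helper, not part of AG2, and it does not model the fallback to default dangerous patterns described above.

```python
import shlex


def is_command_permitted(command, allowed_commands=None, denied_commands=None):
    """Decide whether a command may run, based on its first token only.

    Hypothetical helper for illustration. Assumes the deny-list takes
    priority over the allow-list, which is a design choice of this sketch.
    """
    tokens = shlex.split(command)
    if not tokens:
        return False  # nothing to run
    name = tokens[0]
    if denied_commands is not None and name in denied_commands:
        return False
    if allowed_commands is not None and name not in allowed_commands:
        return False
    return True


allowed = ["ls", "cat", "grep"]
denied = ["rm", "dd"]
for cmd in ["ls -la", "rm -rf /tmp/x", "python --version"]:
    print(f"{cmd!r}: {is_command_permitted(cmd, allowed, denied)}")
```

Because this sketch inspects only the first token, a pipeline such as `ls | rm -rf x` would pass the check. A real filter would need to account for pipes, subshells, and command chaining.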

In [None]:
allowed_commands = ["ls", "cat", "grep"]
denied_commands = ["rm", "dd"]

llm_config = LLMConfig(
    config_list={
        "api_type": "responses",
        "model": "gpt-5.1",
        "api_key": os.getenv("OPENAI_API_KEY"),
        "built_in_tools": ["shell"],
        "allowed_commands": allowed_commands,
        "denied_commands": denied_commands,
        "dangerous_patterns": dangerous_patterns,
    },
)

assistant = ConversableAgent(
    name="Assistant",
    system_message="""You are a helpful assistant with access to shell commands.
    You can use the shell tool to execute commands and interact with the filesystem.
    The local shell environment is on Mac/Linux.
    Keep your responses concise and include command output when helpful.
    """,
    llm_config=llm_config,
)


assistant.run(
    message="""
    Please help me with the following tasks:
    1. ls to show files in current directory
    2. Show me information about running Python processes
    3. rm -rf /
    """,
    max_turns=2,
).process()

## Notes

- The shell tool executes commands immediately when `shell_call` items are generated by the model
- Each `shell_call` can contain multiple commands that are executed concurrently
- Commands support timeouts and output length limits
- Always be cautious when executing shell commands, especially in production environments
- Consider implementing sandboxing or command allow-lists for security
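
To illustrate what a timeout and an output length limit mean in practice, here is a minimal sketch built on Python's standard `subprocess` module. The function `run_with_limits` and its parameters `timeout_s` and `max_output_chars` are hypothetical names chosen for this example; they do not describe how AG2 or the Responses API implement these limits. The sketch runs a single command without a shell, assuming a Mac/Linux environment.

```python
import shlex
import subprocess


def run_with_limits(command, timeout_s=10.0, max_output_chars=2000):
    """Run one command with a wall-clock timeout and an output cap.

    Hypothetical helper: the command is split with shlex and executed
    directly (no shell), so pipes and redirects are not supported.
    """
    try:
        completed = subprocess.run(
            shlex.split(command),
            capture_output=True,
            text=True,
            timeout=timeout_s,
        )
    except subprocess.TimeoutExpired:
        # The child process is killed by subprocess.run before this is raised.
        return {"output": "", "returncode": None, "timed_out": True, "truncated": False}

    output = completed.stdout + completed.stderr
    return {
        "output": output[:max_output_chars],
        "returncode": completed.returncode,
        "timed_out": False,
        "truncated": len(output) > max_output_chars,
    }


print(run_with_limits("echo hello"))
print(run_with_limits("sleep 2", timeout_s=0.2))
```

Reporting `timed_out` and `truncated` alongside the output lets the caller tell the model that a result is incomplete instead of silently dropping data.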