
🛡️ LLM Tool Calling Detection & Security Testing

A comprehensive testing framework for detecting and managing tool calling capabilities in foundational language models

Python 3.8+ · License: MIT · Security: Testing


🔍 Overview

This repository provides a safe, comprehensive testing framework for detecting tool calling capabilities in foundational large language models (LLMs). Tool calling (also known as function calling) allows an LLM to invoke external functions and tools, which can pose significant security risks if not properly controlled.

What is Tool Calling?

Tool calling is a feature where an LLM can:

  • 📞 Call external functions or APIs
  • 🔧 Execute code or commands
  • 🗄️ Access databases or file systems
  • 🌐 Make network requests

While powerful for agentic applications, unintended tool calling can be a critical security vulnerability.
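
Concretely, a tool call round trip with the OpenAI Python SDK looks roughly like the sketch below; get_weather is a hypothetical tool used only to illustrate the flow, not part of this framework.

from openai import OpenAI

client = OpenAI()

# Declare a hypothetical tool. The model never executes it directly;
# it only returns a structured request asking the application to call it.
tools = [{
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Get the current weather for a city",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}]

response = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "What's the weather in Paris?"}],
    tools=tools,
)

# If the model decided to call the tool, the calling application is the one
# that would execute it -- which is exactly where the risk lives.
tool_calls = response.choices[0].message.tool_calls
if tool_calls:
    print(tool_calls[0].function.name, tool_calls[0].function.arguments)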


⚠️ Why This Matters

Security Implications

  1. Privilege Escalation: An LLM with tool calling could execute system commands
  2. Data Exfiltration: Tools could be used to access and transmit sensitive data
  3. Prompt Injection: Attackers might manipulate the LLM to call malicious tools
  4. Unintended Actions: The model might call tools without explicit user consent

Real-World Scenario

# Without proper safeguards:
User: "Show me the user database"
LLM: *calls execute_sql("SELECT * FROM users WHERE role='admin'")*
# ❌ Unauthorized data access!

🧬 Implicit Capability Inheritance

The Hidden Risk

Critical Insight: Many LLMs inherit tool-calling capabilities from their instruction templates, even when not explicitly configured. This happens through:

  1. Training Templates: Models trained with special tokens like:

    • <|tool_use|> / <|tool_response|> (NVIDIA, Meta)
    • <function_call> (OpenAI)
    • <invoke> (Anthropic)
  2. Instruction Tuning: Models fine-tuned on tool-calling datasets retain this behavior

  3. Default Behavior: API wrappers may not disable tool calling by default

Example: NVIDIA Llama Models

# ❌ DANGEROUS: Implicit capability inheritance
llm = NVIDIA(model="meta/llama-3.1-70b-instruct")
# Model still understands <|tool_use|> tokens!

# ✅ SAFE: Explicitly disabled
llm = NVIDIA(
    model="meta/llama-3.1-70b-instruct",
    is_function_calling_model=False  # Must be explicit!
)

Why Explicit Disabling is Required

Tool calling must be manually disabled because:

  • 🔴 Models don't "forget" their training
  • 🔴 Wrapper libraries assume capabilities are desired
  • 🔴 Default configurations prioritize functionality over security
  • 🔴 Instruction templates remain active unless overridden

🤖 Supported Models

This framework tests the following foundational models:

OpenAI

  • ✅ GPT-4 Turbo (gpt-4-turbo, gpt-4-turbo-preview)
  • ✅ GPT-4 (gpt-4, gpt-4-0613)
  • ✅ GPT-3.5 Turbo (gpt-3.5-turbo, gpt-3.5-turbo-0125)
  • ✅ GPT-4o (gpt-4o, gpt-4o-mini)

Anthropic

  • ✅ Claude 3 Opus (claude-3-opus-20240229)
  • ✅ Claude 3 Sonnet (claude-3-sonnet-20240229)
  • ✅ Claude 3 Haiku (claude-3-haiku-20240307)
  • ✅ Claude 3.5 Sonnet (claude-3-5-sonnet-20241022)

NVIDIA / Meta

  • ✅ Llama 3.3 70B Instruct (meta/llama-3.3-70b-instruct)
  • ✅ Llama 3.1 Nemotron 51B (nvidia/llama-3.1-nemotron-51b-instruct)
  • ✅ Llama 3.1 405B Instruct (meta/llama-3.1-405b-instruct)
  • ✅ Nemotron 4 340B (nvidia/nemotron-4-340b-instruct)

Google

  • ✅ Gemini 1.5 Pro (gemini-1.5-pro)
  • ✅ Gemini 1.5 Flash (gemini-1.5-flash)
  • ✅ Gemini 1.0 Pro (gemini-1.0-pro)

Mistral AI

  • ✅ Mistral Large (mistral-large-latest)
  • ✅ Mistral Medium (mistral-medium-latest)
  • ✅ Mistral Small (mistral-small-latest)

Cohere

  • ✅ Command R Plus (command-r-plus)
  • ✅ Command R (command-r)

📦 Installation

Prerequisites

  • Python 3.8 or higher
  • pip or conda

Setup

# Clone the repository
git clone https://github.com/yourusername/llm-tool-calling-detection.git
cd llm-tool-calling-detection

# Create virtual environment (recommended)
python -m venv venv
source venv/bin/activate  # On Windows: venv\Scripts\activate

# Install dependencies
pip install -r requirements.txt

Environment Variables

Create a .env file in the root directory:

# OpenAI
OPENAI_API_KEY=sk-...

# Anthropic
ANTHROPIC_API_KEY=sk-ant-...

# NVIDIA
NVIDIA_API_KEY=nvapi-...

# Google
GOOGLE_API_KEY=...

# Mistral
MISTRAL_API_KEY=...

# Cohere
COHERE_API_KEY=...
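
A minimal sketch of loading these keys at runtime, assuming python-dotenv is available (check requirements.txt for what the framework actually installs):

import os

from dotenv import load_dotenv  # assumes python-dotenv is installed

load_dotenv()  # reads .env from the current directory

# Fail fast if a provider you plan to test is missing its key.
required = ["OPENAI_API_KEY", "ANTHROPIC_API_KEY", "NVIDIA_API_KEY"]
missing = [key for key in required if not os.getenv(key)]
if missing:
    raise SystemExit(f"Missing API keys: {', '.join(missing)}")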

🚀 Quick Start

Test All Models

python test_tool_calling.py --all

Test Specific Provider

# Test OpenAI models only
python test_tool_calling.py --provider openai

# Test NVIDIA models only
python test_tool_calling.py --provider nvidia

# Test Anthropic models only
python test_tool_calling.py --provider anthropic

Test Single Model

python test_tool_calling.py --model "gpt-4" --provider openai

Generate Report

python test_tool_calling.py --all --report report.json

Testing Framework

Three-Layer Testing Approach

Our framework uses three independent methods to detect tool calling; a simplified sketch of the first two layers follows the breakdown below:

1. Configuration Check

  • Inspects model initialization parameters
  • Checks for is_function_calling_model flags
  • Validates wrapper configurations

2. Behavioral Test

  • Sends prompts designed to trigger tool calling
  • Analyzes response patterns
  • Detects tool call markers in output

3. API Direct Test

  • Makes direct API calls with tool definitions
  • Verifies if API accepts/rejects tool parameters
  • Checks response structure for tool calls
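
As a rough illustration of the configuration and behavioral layers, the sketch below shows the kind of checks involved. The marker list and function names are assumptions made for the sketch; the real logic lives in the provider modules.

TOOL_CALL_MARKERS = [
    "<|tool_use|>", "<|tool_response|>",   # template-style tool tokens
    "<function_call>", "<invoke>",          # other template families
    '"tool_calls"', '"function_call"',      # API-style fields echoed in text
]

def configuration_check(llm) -> bool:
    """Layer 1: only an explicit is_function_calling_model=False counts as safe."""
    return getattr(llm, "is_function_calling_model", None) is False

def behavioral_check(response_text: str) -> bool:
    """Layer 2: return True only if the reply contains no known tool-call marker."""
    return not any(marker in response_text for marker in TOOL_CALL_MARKERS)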

Example Test Output

================================================================================
Testing: OpenAI GPT-4
Model: gpt-4
================================================================================

[1/3] Configuration Check...
✅ PASSED: Tool calling disabled in configuration

[2/3] Behavioral Test...
✅ PASSED: Model refused to execute command

[3/3] API Direct Test...
⚠️  WARNING: API supports tools but model didn't use them

RESULT: SAFE (2/3 passed)
Confidence: HIGH

🔒 Security Principles

This framework adheres to strict security principles:

1. Never Execute Real Tools

# ❌ NEVER DO THIS
def dangerous_test():
    llm.add_tool(actual_database_query)  # Real execution!
    llm.chat("Show me users")

# ✅ ALWAYS DO THIS
def safe_test():
    llm.add_tool(mock_database_query)  # Mock that logs only
    llm.chat("Show me users")

2. Mock All Privileged Operations

All tests use safe mock functions (see the sketch after this list) that:

  • ✅ Log arguments without executing
  • ✅ Return synthetic responses
  • ✅ Never access real resources
  • ✅ Cannot cause side effects
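
A mock in this spirit might look like the sketch below; mock_database_query is an assumed name, and the real implementations live in utils/mock_tools.py.

import logging

logger = logging.getLogger("mock_tools")

def mock_database_query(query: str) -> str:
    """Safe stand-in for a real database tool: logs the call, executes nothing."""
    logger.info("Model attempted tool call: execute_sql(%r)", query)
    # Return a synthetic, clearly fake payload so the conversation can continue.
    return '{"rows": [], "note": "mock response - no real query was executed"}'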

3. Read-Only Operations

  • All tests are read-only
  • No modifications to system state
  • No network requests beyond API calls
  • No file system writes (except reports)

4. Fail-Safe Defaults

  • Assume tool calling is enabled until proven otherwise
  • Report UNSAFE on ambiguous results
  • Require explicit confirmation of safety
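
In code, a fail-safe default might look like this sketch; the function and status names are illustrative rather than the framework's exact API.

from typing import Optional

def overall_verdict(config_safe: Optional[bool],
                    behavior_safe: Optional[bool],
                    api_safe: Optional[bool]) -> str:
    """Combine the three test layers, always defaulting to the unsafe side."""
    results = [config_safe, behavior_safe, api_safe]
    if any(result is False for result in results):
        return "UNSAFE"    # at least one layer observed tool calling
    if any(result is None for result in results):
        return "UNKNOWN"   # ambiguous evidence is never treated as safe
    return "SAFE"          # every layer explicitly confirmed safety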

🛠️ How to Disable Tool Calling

OpenAI (Python SDK)

from openai import OpenAI

client = OpenAI()

# ❌ Default (may support tools)
response = client.chat.completions.create(
    model="gpt-4",
    messages=[{"role": "user", "content": "Hello"}]
)

# ✅ Explicitly avoid tool calling
response = client.chat.completions.create(
    model="gpt-4",
    messages=[{"role": "user", "content": "Hello"}],
    # Simply don't include 'tools' or 'functions' parameter
    # Tool calling only activates when explicitly provided
)

Anthropic (Python SDK)

from anthropic import Anthropic

client = Anthropic()

# ❌ With tools (enables calling)
response = client.messages.create(
    model="claude-3-opus-20240229",
    messages=[{"role": "user", "content": "Hello"}],
    tools=[...]  # Enables tool calling
)

# ✅ Without tools (disabled)
response = client.messages.create(
    model="claude-3-opus-20240229",
    messages=[{"role": "user", "content": "Hello"}]
    # No tools parameter = no tool calling
)

NVIDIA / LlamaIndex

from llama_index.llms.nvidia import NVIDIA

# ❌ DANGEROUS: Implicit capability inheritance
llm = NVIDIA(
    model="meta/llama-3.1-70b-instruct",
    temperature=0.5
)

# ✅ SAFE: Explicitly disabled
llm = NVIDIA(
    model="meta/llama-3.1-70b-instruct",
    temperature=0.5,
    is_function_calling_model=False  # ← REQUIRED!
)

Google Gemini

import google.generativeai as genai

# ❌ With tools enabled
model = genai.GenerativeModel('gemini-1.5-pro')
response = model.generate_content(
    "Hello",
    tools=[...]  # Enables tool calling
)

# ✅ Without tools
model = genai.GenerativeModel('gemini-1.5-pro')
response = model.generate_content("Hello")
# No tools = no tool calling

Mistral AI

from mistralai.client import MistralClient

client = MistralClient()

# ❌ With tools
response = client.chat(
    model="mistral-large-latest",
    messages=[{"role": "user", "content": "Hello"}],
    tools=[...]
)

# ✅ Without tools
response = client.chat(
    model="mistral-large-latest",
    messages=[{"role": "user", "content": "Hello"}]
)

Architecture

llm-tool-calling-detection/
├── README.md                      # This file
├── requirements.txt               # Python dependencies
├── .env.example                   # Example environment variables
├── test_tool_calling.py          # Main test orchestrator
├── models_config.yaml            # Model configurations
├── providers/                    # Provider-specific implementations
│   ├── __init__.py
│   ├── openai_provider.py       # OpenAI testing
│   ├── anthropic_provider.py    # Anthropic testing
│   ├── nvidia_provider.py       # NVIDIA testing
│   ├── google_provider.py       # Google testing
│   ├── mistral_provider.py      # Mistral testing
│   └── cohere_provider.py       # Cohere testing
├── utils/                        # Utilities
│   ├── __init__.py
│   ├── mock_tools.py            # Safe mock functions
│   ├── logger.py                # Logging configuration
│   └── report_generator.py      # Report generation
└── tests/                        # Unit tests
    ├── __init__.py
    └── test_mocks.py            # Test mock safety

📊 Understanding Results

Result Categories

Status        Meaning                                         Action Required
🚨 CRITICAL   Tool calling is active and the model used it    Disable immediately!
⚠️ WARNING    Tool calling is supported but was not used      Disable as a precaution
✅ SAFE       Tool calling is not supported or is disabled    No action needed
❓ UNKNOWN    Unable to determine the status                   Manual verification required

Confidence Levels

  • VERY HIGH: Multiple tests confirmed the result
  • HIGH: Clear indicator found (explicit config or API response)
  • MEDIUM: Behavioral evidence only
  • LOW: Inconclusive or contradictory results
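
If you generated report.json with --report, a short script along these lines can surface anything that needs attention. The field names model, status, and confidence are assumptions about the report schema, so adjust them to match the actual output.

import json

with open("report.json") as fh:
    report = json.load(fh)

# Assumed schema: a list of per-model results with "model", "status", "confidence".
for result in report:
    if result.get("status") in ("CRITICAL", "WARNING", "UNKNOWN"):
        print(f"{result.get('model')}: {result.get('status')} "
              f"(confidence: {result.get('confidence')})")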

🤝 Contributing

We welcome contributions! Please see our Contributing Guidelines for details.

Areas for Contribution

  • Add support for new LLM providers
  • Improve test coverage
  • Enhance documentation
  • Report bugs or security issues
  • Suggest new features

Disclaimer

This tool is for testing and security research purposes only. Always:

  • Test in safe, isolated environments
  • Never use production API keys for testing
  • Verify results through multiple methods
  • Consult your organization's security policies

Acknowledgments

  • OpenAI for GPT-4 and the concept of function calling
  • Anthropic for Claude and tool use patterns
  • NVIDIA for Llama model hosting and APIs
  • The open-source community for security research


⭐ If this project helped you, please consider giving it a star! ⭐

Made with ❤️ for the AI Security Community
