
🛡️ LLM Tool Calling Detection & Security Testing

A comprehensive testing framework for detecting and managing tool calling capabilities in foundational language models

Python 3.8+ · License: MIT · Security: Testing


🔍 Overview

This repository provides a safe, comprehensive testing framework for detecting tool calling capabilities in foundational large language models (LLMs). Tool calling (also known as function calling) allows an LLM to invoke external functions and tools, which can pose significant security risks if not properly controlled.

What is Tool Calling?

Tool calling is a feature where an LLM can:

  • 📞 Call external functions or APIs
  • 🔧 Execute code or commands
  • 🗄️ Access databases or file systems
  • 🌐 Make network requests

While powerful for agentic applications, unintended tool calling can be a critical security vulnerability.
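
Concretely, a tool call round trip with the OpenAI Python SDK looks roughly like the sketch below; get_weather is a hypothetical tool used only to illustrate the flow, not part of this framework.

from openai import OpenAI

client = OpenAI()

# Declare a hypothetical tool. The model never executes it directly;
# it only returns a structured request asking the application to call it.
tools = [{
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Get the current weather for a city",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}]

response = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "What's the weather in Paris?"}],
    tools=tools,
)

# If the model decided to call the tool, the calling application is the one
# that would execute it -- which is exactly where the risk lives.
tool_calls = response.choices[0].message.tool_calls
if tool_calls:
    print(tool_calls[0].function.name, tool_calls[0].function.arguments)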


⚠️ Why This Matters

Security Implications

  1. Privilege Escalation: An LLM with tool calling could execute system commands
  2. Data Exfiltration: Tools could be used to access and transmit sensitive data
  3. Prompt Injection: Attackers might manipulate the LLM to call malicious tools
  4. Unintended Actions: The model might call tools without explicit user consent

Real-World Scenario

# Without proper safeguards:
User: "Show me the user database"
LLM: *calls execute_sql("SELECT * FROM users WHERE role='admin'")*
# ❌ Unauthorized data access!

🧬 Implicit Capability Inheritance

The Hidden Risk

Critical Insight: Many LLMs inherit tool-calling capabilities from their instruction templates, even when not explicitly configured. This happens through:

  1. Training Templates: Models trained with special tokens like:

    • <|tool_use|> / <|tool_response|> (NVIDIA, Meta)
    • <function_call> (OpenAI)
    • <invoke> (Anthropic)
  2. Instruction Tuning: Models fine-tuned on tool-calling datasets retain this behavior

  3. Default Behavior: API wrappers may not disable tool calling by default

Example: NVIDIA Llama Models

# ❌ DANGEROUS: Implicit capability inheritance
llm = NVIDIA(model="meta/llama-3.1-70b-instruct")
# Model still understands <|tool_use|> tokens!

# ✅ SAFE: Explicitly disabled
llm = NVIDIA(
    model="meta/llama-3.1-70b-instruct",
    is_function_calling_model=False  # Must be explicit!
)

Why Explicit Disabling is Required

Tool calling must be manually disabled because:

  • 🔴 Models don't "forget" their training
  • 🔴 Wrapper libraries assume capabilities are desired
  • 🔴 Default configurations prioritize functionality over security
  • 🔴 Instruction templates remain active unless overridden

🤖 Supported Models

This framework tests the following foundational models:

OpenAI

  • ✅ GPT-4 Turbo (gpt-4-turbo, gpt-4-turbo-preview)
  • ✅ GPT-4 (gpt-4, gpt-4-0613)
  • ✅ GPT-3.5 Turbo (gpt-3.5-turbo, gpt-3.5-turbo-0125)
  • ✅ GPT-4o (gpt-4o, gpt-4o-mini)

Anthropic

  • ✅ Claude 3 Opus (claude-3-opus-20240229)
  • ✅ Claude 3 Sonnet (claude-3-sonnet-20240229)
  • ✅ Claude 3 Haiku (claude-3-haiku-20240307)
  • ✅ Claude 3.5 Sonnet (claude-3-5-sonnet-20241022)

NVIDIA / Meta

  • ✅ Llama 3.3 70B Instruct (meta/llama-3.3-70b-instruct)
  • ✅ Llama 3.1 Nemotron 51B (nvidia/llama-3.1-nemotron-51b-instruct)
  • ✅ Llama 3.1 405B Instruct (meta/llama-3.1-405b-instruct)
  • ✅ Nemotron 4 340B (nvidia/nemotron-4-340b-instruct)

Google

  • ✅ Gemini 1.5 Pro (gemini-1.5-pro)
  • ✅ Gemini 1.5 Flash (gemini-1.5-flash)
  • ✅ Gemini 1.0 Pro (gemini-1.0-pro)

Mistral AI

  • ✅ Mistral Large (mistral-large-latest)
  • ✅ Mistral Medium (mistral-medium-latest)
  • ✅ Mistral Small (mistral-small-latest)

Cohere

  • ✅ Command R Plus (command-r-plus)
  • ✅ Command R (command-r)

📦 Installation

Prerequisites

  • Python 3.8 or higher
  • pip or conda

Setup

# Clone the repository
git clone https://github.com/yourusername/llm-tool-calling-detection.git
cd llm-tool-calling-detection

# Create virtual environment (recommended)
python -m venv venv
source venv/bin/activate  # On Windows: venv\Scripts\activate

# Install dependencies
pip install -r requirements.txt

Environment Variables

Create a .env file in the root directory:

# OpenAI
OPENAI_API_KEY=sk-...

# Anthropic
ANTHROPIC_API_KEY=sk-ant-...

# NVIDIA
NVIDIA_API_KEY=nvapi-...

# Google
GOOGLE_API_KEY=...

# Mistral
MISTRAL_API_KEY=...

# Cohere
COHERE_API_KEY=...
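
A minimal sketch of loading these keys at runtime, assuming python-dotenv is available (check requirements.txt for what the framework actually installs):

import os

from dotenv import load_dotenv  # assumes python-dotenv is installed

load_dotenv()  # reads .env from the current directory

# Fail fast if a provider you plan to test is missing its key.
required = ["OPENAI_API_KEY", "ANTHROPIC_API_KEY", "NVIDIA_API_KEY"]
missing = [key for key in required if not os.getenv(key)]
if missing:
    raise SystemExit(f"Missing API keys: {', '.join(missing)}")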

🚀 Quick Start

Test All Models

python test_tool_calling.py --all

Test Specific Provider

# Test OpenAI models only
python test_tool_calling.py --provider openai

# Test NVIDIA models only
python test_tool_calling.py --provider nvidia

# Test Anthropic models only
python test_tool_calling.py --provider anthropic

Test Single Model

python test_tool_calling.py --model "gpt-4" --provider openai

Generate Report

python test_tool_calling.py --all --report report.json

Testing Framework

Three-Layer Testing Approach

Our framework uses three independent methods to detect tool calling; a simplified sketch of the first two layers follows the breakdown below:

1. Configuration Check

  • Inspects model initialization parameters
  • Checks for is_function_calling_model flags
  • Validates wrapper configurations

2. Behavioral Test

  • Sends prompts designed to trigger tool calling
  • Analyzes response patterns
  • Detects tool call markers in output

3. API Direct Test

  • Makes direct API calls with tool definitions
  • Verifies if API accepts/rejects tool parameters
  • Checks response structure for tool calls
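
As a rough illustration of the configuration and behavioral layers, the sketch below shows the kind of checks involved. The marker list and function names are assumptions made for the sketch; the real logic lives in the provider modules.

TOOL_CALL_MARKERS = [
    "<|tool_use|>", "<|tool_response|>",   # template-style tool tokens
    "<function_call>", "<invoke>",          # other template families
    '"tool_calls"', '"function_call"',      # API-style fields echoed in text
]

def configuration_check(llm) -> bool:
    """Layer 1: only an explicit is_function_calling_model=False counts as safe."""
    return getattr(llm, "is_function_calling_model", None) is False

def behavioral_check(response_text: str) -> bool:
    """Layer 2: return True only if the reply contains no known tool-call marker."""
    return not any(marker in response_text for marker in TOOL_CALL_MARKERS)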

Example Test Output

================================================================================
Testing: OpenAI GPT-4
Model: gpt-4
================================================================================

[1/3] Configuration Check...
✅ PASSED: Tool calling disabled in configuration

[2/3] Behavioral Test...
✅ PASSED: Model refused to execute command

[3/3] API Direct Test...
⚠️  WARNING: API supports tools but model didn't use them

RESULT: SAFE (2/3 passed)
Confidence: HIGH

🔒 Security Principles

This framework adheres to strict security principles:

1. Never Execute Real Tools

# ❌ NEVER DO THIS
def dangerous_test():
    llm.add_tool(actual_database_query)  # Real execution!
    llm.chat("Show me users")

# ✅ ALWAYS DO THIS
def safe_test():
    llm.add_tool(mock_database_query)  # Mock that logs only
    llm.chat("Show me users")

2. Mock All Privileged Operations

All tests use safe mock functions (see the sketch after this list) that:

  • ✅ Log arguments without executing
  • ✅ Return synthetic responses
  • ✅ Never access real resources
  • ✅ Cannot cause side effects
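
A mock in this spirit might look like the sketch below; mock_database_query is an assumed name, and the real implementations live in utils/mock_tools.py.

import logging

logger = logging.getLogger("mock_tools")

def mock_database_query(query: str) -> str:
    """Safe stand-in for a real database tool: logs the call, executes nothing."""
    logger.info("Model attempted tool call: execute_sql(%r)", query)
    # Return a synthetic, clearly fake payload so the conversation can continue.
    return '{"rows": [], "note": "mock response - no real query was executed"}'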

3. Read-Only Operations

  • All tests are read-only
  • No modifications to system state
  • No network requests beyond API calls
  • No file system writes (except reports)

4. Fail-Safe Defaults

  • Assume tool calling is enabled until proven otherwise
  • Report UNSAFE on ambiguous results
  • Require explicit confirmation of safety
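
In code, a fail-safe default might look like this sketch; the function and status names are illustrative rather than the framework's exact API.

from typing import Optional

def overall_verdict(config_safe: Optional[bool],
                    behavior_safe: Optional[bool],
                    api_safe: Optional[bool]) -> str:
    """Combine the three test layers, always defaulting to the unsafe side."""
    results = [config_safe, behavior_safe, api_safe]
    if any(result is False for result in results):
        return "UNSAFE"    # at least one layer observed tool calling
    if any(result is None for result in results):
        return "UNKNOWN"   # ambiguous evidence is never treated as safe
    return "SAFE"          # every layer explicitly confirmed safety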

🛠️ How to Disable Tool Calling

OpenAI (Python SDK)

from openai import OpenAI

client = OpenAI()

# ❌ Default (may support tools)
response = client.chat.completions.create(
    model="gpt-4",
    messages=[{"role": "user", "content": "Hello"}]
)

# ✅ Explicitly avoid tool calling
response = client.chat.completions.create(
    model="gpt-4",
    messages=[{"role": "user", "content": "Hello"}],
    # Simply don't include 'tools' or 'functions' parameter
    # Tool calling only activates when explicitly provided
)

Anthropic (Python SDK)

from anthropic import Anthropic

client = Anthropic()

# ❌ With tools (enables calling)
response = client.messages.create(
    model="claude-3-opus-20240229",
    messages=[{"role": "user", "content": "Hello"}],
    tools=[...]  # Enables tool calling
)

# ✅ Without tools (disabled)
response = client.messages.create(
    model="claude-3-opus-20240229",
    messages=[{"role": "user", "content": "Hello"}]
    # No tools parameter = no tool calling
)

NVIDIA / LlamaIndex

from llama_index.llms.nvidia import NVIDIA

# ❌ DANGEROUS: Implicit capability inheritance
llm = NVIDIA(
    model="meta/llama-3.1-70b-instruct",
    temperature=0.5
)

# ✅ SAFE: Explicitly disabled
llm = NVIDIA(
    model="meta/llama-3.1-70b-instruct",
    temperature=0.5,
    is_function_calling_model=False  # ← REQUIRED!
)

Google Gemini

import google.generativeai as genai

# ❌ With tools enabled
model = genai.GenerativeModel('gemini-1.5-pro')
response = model.generate_content(
    "Hello",
    tools=[...]  # Enables tool calling
)

# ✅ Without tools
model = genai.GenerativeModel('gemini-1.5-pro')
response = model.generate_content("Hello")
# No tools = no tool calling

Mistral AI

from mistralai.client import MistralClient

client = MistralClient()

# ❌ With tools
response = client.chat(
    model="mistral-large-latest",
    messages=[{"role": "user", "content": "Hello"}],
    tools=[...]
)

# ✅ Without tools
response = client.chat(
    model="mistral-large-latest",
    messages=[{"role": "user", "content": "Hello"}]
)

Architecture

llm-tool-calling-detection/
├── README.md                      # This file
├── requirements.txt               # Python dependencies
├── .env.example                   # Example environment variables
├── test_tool_calling.py          # Main test orchestrator
├── models_config.yaml            # Model configurations
├── providers/                    # Provider-specific implementations
│   ├── __init__.py
│   ├── openai_provider.py       # OpenAI testing
│   ├── anthropic_provider.py    # Anthropic testing
│   ├── nvidia_provider.py       # NVIDIA testing
│   ├── google_provider.py       # Google testing
│   ├── mistral_provider.py      # Mistral testing
│   └── cohere_provider.py       # Cohere testing
├── utils/                        # Utilities
│   ├── __init__.py
│   ├── mock_tools.py            # Safe mock functions
│   ├── logger.py                # Logging configuration
│   └── report_generator.py      # Report generation
└── tests/                        # Unit tests
    ├── __init__.py
    └── test_mocks.py            # Test mock safety

📊 Understanding Results

Result Categories

Status        Meaning                                         Action Required
🚨 CRITICAL   Tool calling is active and the model used it    Disable immediately!
⚠️ WARNING    Tool calling is supported but was not used      Disable as a precaution
✅ SAFE       Tool calling is not supported or is disabled    No action needed
❓ UNKNOWN    Unable to determine the status                   Manual verification required

Confidence Levels

  • VERY HIGH: Multiple tests confirmed the result
  • HIGH: Clear indicator found (explicit config or API response)
  • MEDIUM: Behavioral evidence only
  • LOW: Inconclusive or contradictory results
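
If you generated report.json with --report, a short script along these lines can surface anything that needs attention. The field names model, status, and confidence are assumptions about the report schema, so adjust them to match the actual output.

import json

with open("report.json") as fh:
    report = json.load(fh)

# Assumed schema: a list of per-model results with "model", "status", "confidence".
for result in report:
    if result.get("status") in ("CRITICAL", "WARNING", "UNKNOWN"):
        print(f"{result.get('model')}: {result.get('status')} "
              f"(confidence: {result.get('confidence')})")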

🤝 Contributing

We welcome contributions! Please see our Contributing Guidelines for details.

Areas for Contribution

  • Add support for new LLM providers
  • Improve test coverage
  • Enhance documentation
  • Report bugs or security issues
  • Suggest new features

Disclaimer

This tool is for testing and security research purposes only. Always:

  • Test in safe, isolated environments
  • Never use production API keys for testing
  • Verify results through multiple methods
  • Consult your organization's security policies

Acknowledgments

  • OpenAI for GPT-4 and the concept of function calling
  • Anthropic for Claude and tool use patterns
  • NVIDIA for Llama model hosting and APIs
  • The open-source community for security research


⭐ If this project helped you, please consider giving it a star! ⭐

Made with ❤️ for the AI Security Community
