
ToolCommander: Adversarial Tool Scheduling Framework

This repository contains the official implementation of the paper, "From Allies to Adversaries: Manipulating LLM Tool Scheduling through Adversarial Injection," submitted to NAACL 2025. The paper introduces ToolCommander, a novel framework that identifies and exploits vulnerabilities in the tool scheduling mechanisms of Large Language Model (LLM) agents. By leveraging adversarial tool injection, ToolCommander can lead to privacy theft, denial-of-service (DoS) attacks, and the manipulation of tool-calling behaviors.

Table of Contents

  • Data
  • Prerequisites
  • Usage
  • Baselines

Data

The dataset used in this project is located in the data directory. The files follow this naming convention:

g1_<train/eval>_<a/b/c>.json

Where:

  • g1 refers to the original category from the ToolBench dataset.
  • train and eval denote the training and evaluation sets, respectively.
  • a, b, and c represent different keywords used to generate the data:
    • a: YouTube
    • b: Email
    • c: Stock
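The naming convention above can be captured in a small helper. This is a hypothetical convenience function, not part of the repository:

```python
# Keyword codes from the naming convention (hypothetical helper, not in the repo).
KEYWORDS = {"a": "YouTube", "b": "Email", "c": "Stock"}


def dataset_path(split: str, keyword: str, data_dir: str = "data") -> str:
    """Build a dataset path such as 'data/g1_train_a.json' from the convention."""
    assert split in ("train", "eval"), f"unknown split: {split}"
    assert keyword in KEYWORDS, f"unknown keyword code: {keyword}"
    return f"{data_dir}/g1_{split}_{keyword}.json"
```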

ToolBench Dataset

In addition to the provided data, you will need to download the ToolBench dataset from its official repository. Specifically, you will need the following components:

  • corpus.tsv
  • tools folder

Once downloaded, place the dataset in the data/toolbench directory. The final directory structure should look like this:

/data
├── toolbench
│   ├── corpus.tsv
│   └── tools
│       ├── ...
├── g1_train_a.json
├── g1_train_b.json
├── g1_train_c.json
├── g1_eval_a.json
├── g1_eval_b.json
├── g1_eval_c.json
└── ...

Prerequisites

To set up the environment, first install the required dependencies:

pip install -r requirements.txt

OpenAI API Setup

For evaluation using OpenAI's models, you need to set the OPENAI_API_KEY environment variable with your OpenAI API key. Detailed instructions can be found in the OpenAI API documentation.
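Setting the variable might look like this (the key value shown is a placeholder, not a real key):

```shell
export OPENAI_API_KEY="your-api-key-here"
```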


Usage

We provide several scripts to help reproduce the results presented in the paper.

Running the Adversarial Attack

To execute the adversarial injection attack and evaluate the results, use the following command:

bash attack_all.sh && bash eval_all.sh

  • attack_all.sh: Executes the adversarial injection attack across all retrievers and datasets.
  • eval_all.sh: Evaluates the performance of the retrievers after the attack.

The results will be printed directly in the console.


Baselines

We compare ToolCommander against the PoisonedRAG baseline. For more details, visit the PoisonedRAG repository.

Baseline Data

The attack results generated by PoisonedRAG are provided in the data directory as:

g1_train_{a/b/c}_poisonedRAG_generated.pkl

Baseline Evaluation

To evaluate the baseline performance, run the following command:

python evaluate.py --data_path data/g1_train_{a/b/c}.json --attack_path data/g1_train_{a/b/c}_poisonedRAG_generated.pkl
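For example, to evaluate the YouTube keyword set (a), the command would be:

python evaluate.py --data_path data/g1_train_a.json --attack_path data/g1_train_a_poisonedRAG_generated.pkl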
