This is the source code for the Honeyval evaluation framework. The goal of the framework is to provide a systematic evaluation environment for LLM-powered HTTP honeypots. For more details, please read our paper.
🌍 Website & Leaderboard: https://honeyval.xyz/
This is not an officially supported Google product. This project is not eligible for the Google Open Source Software Vulnerability Rewards Program.
The project requires Docker. Please follow the Docker installation guide for your system.
The project uses miniconda for package management. Install it like this:
wget https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh
chmod +x Miniconda3-latest-Linux-x86_64.sh
bash ./Miniconda3-latest-Linux-x86_64.sh
Once installed, make sure that conda is initialized. Then, install the environment of this project:
conda env create -f env.yaml
conda activate honeyval
pip install litellm[proxy]
Then install the project in editable mode so that the src package can be imported:
# with the project environment activated
pip install -e .
Finally, to access the model APIs for running experiments, populate the following environment variables:
# we recommend using vertex for gemini models
export GOOGLE_CLOUD_PROJECT=""
export GOOGLE_APPLICATION_CREDENTIALS=""
# alternatively, you can use a gemini api key
export GEMINI_API_KEY=""
# other providers
export OPENAI_API_KEY=""
export TOGETHERAI_API_KEY=""
export ANTHROPIC_API_KEY=""
src/main.py is the CLI entry point for running benchmarks and evaluating their
results. Run it from the repository root after activating the conda environment
and installing the package in editable mode.
python src/main.py \
--mode run \
--meta_experiment_type llm-honeypot-vs-agent \
--pentest_model vertex_ai/gemini-3-flash-preview-python-curl \
--honeypot_type llm \
--honeypot_model vertex_ai/gemini-3-flash-preview \
--benchmark_tasks agent-vs-http-app \
--n_samples 5
Evaluate a completed run by reusing the same identifying options and changing the mode:
python src/main.py \
--mode evaluate \
--meta_experiment_type llm-honeypot-vs-agent \
--pentest_model vertex_ai/gemini-3-flash-preview-python-curl \
--honeypot_type llm \
--honeypot_model vertex_ai/gemini-3-flash-preview \
--benchmark_tasks agent-vs-http-app \
--n_samples 5 \
--skip_incomplete
Useful options:
- --mode: run, run-only-metrics, or evaluate. This is required.
- --meta_experiment_type: names the experiment group under results/.
- --benchmark_tasks: one or more of agent-vs-http-app, agent-vs-real, static-vs-http-app, or all.
- --benchmark_apps: one or more benchmark app names, or all.
- --pentest_model: pentesting model. Suffix it with -python-curl, -python, -gemini-cli, -codex, or -claude-code to select that agent backend.
- --honeypot_type: llm or rule-based. For llm, also set --honeypot_model.
- --pentesting_prompt: exploit, exploit-detect, or exploit-detect-hide.
- --honeypot_additional_instructions: none, careful_pi, aggressive_pi, mislead, convince, or vulnerable.
- --pi_judge and --refusal_judge: none, llm, or heuristic-llm.
- --n_samples, --max_workers, --starting_port, and --force control sampling, parallelism, port allocation, and reruns.
- --pentest_timeout, --pentest_max_steps, cost limits, rate limits, temperatures, and reasoning efforts tune agent and honeypot execution.
- --ci_target, --target_password, and --attacker_domain configure the callback checks used by relevant benchmark apps.
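Because the run and evaluate invocations share every identifying option and differ only in --mode (plus --skip_incomplete), it can help to build both from one dictionary so they cannot drift apart. The following is a minimal sketch, not part of the framework: build_command and the shared dictionary are hypothetical helpers, and the sketch only uses flags and values that appear in the commands above.

```python
import shlex


def build_command(mode, shared_options, extra_flags=()):
    """Return the argv list for one src/main.py invocation (hypothetical helper)."""
    argv = ["python", "src/main.py", "--mode", mode]
    for name, value in shared_options.items():
        # Each option becomes "--name value", in dictionary order.
        argv += [f"--{name}", str(value)]
    argv += list(extra_flags)
    return argv


# Identifying options copied from the example commands in this README.
shared = {
    "meta_experiment_type": "llm-honeypot-vs-agent",
    "pentest_model": "vertex_ai/gemini-3-flash-preview-python-curl",
    "honeypot_type": "llm",
    "honeypot_model": "vertex_ai/gemini-3-flash-preview",
    "benchmark_tasks": "agent-vs-http-app",
    "n_samples": 5,
}

run_cmd = build_command("run", shared)
eval_cmd = build_command("evaluate", shared, extra_flags=["--skip_incomplete"])

if __name__ == "__main__":
    # Print shell-quoted commands; pass the lists to subprocess.run to execute them.
    print(shlex.join(run_cmd))
    print(shlex.join(eval_cmd))
```

Passing run_cmd and then eval_cmd to subprocess.run(..., check=True) from the repository root would execute the two steps in sequence.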
Runs are written to
results/<meta_experiment_type>/<pentest>-<prompt>-<honeypot>-<instructions>/.
The scripts in scripts/experiments_paper/ contain the exact configurations
used for the paper experiments.
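Given the directory layout above, the runs recorded for one experiment group can be enumerated with a few lines of Python. This is a sketch under the assumption that each run is a subdirectory of results/<meta_experiment_type>/; list_runs is a hypothetical helper and not a function provided by the framework.

```python
from pathlib import Path


def list_runs(meta_experiment_type, results_root="results"):
    """Return the sorted run directory names for one experiment group."""
    group_dir = Path(results_root) / meta_experiment_type
    if not group_dir.is_dir():
        # Nothing has been written for this experiment group yet.
        return []
    return sorted(p.name for p in group_dir.iterdir() if p.is_dir())


if __name__ == "__main__":
    for name in list_runs("llm-honeypot-vs-agent"):
        print(name)
```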
To reproduce the results of the paper, run the scripts in scripts/experiments_paper. Before running, we recommend building all docker images by running python scripts/build_all_dockerimages.py.
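Before starting a long batch of experiments, a short preflight check can catch a missing Docker installation or missing credentials early. The sketch below is illustrative only: preflight is a hypothetical helper, the variable names are the ones listed in the setup instructions, and the rule that Vertex credentials need GOOGLE_CLOUD_PROJECT alongside them is an assumption based on both variables being exported together there.

```python
import os
import shutil

# Credential variables named in the setup instructions of this README.
PROVIDER_VARS = (
    "GOOGLE_APPLICATION_CREDENTIALS",
    "GEMINI_API_KEY",
    "OPENAI_API_KEY",
    "TOGETHERAI_API_KEY",
    "ANTHROPIC_API_KEY",
)


def preflight(environ=None, which=shutil.which):
    """Return a list of human-readable problems; an empty list means ready."""
    environ = os.environ if environ is None else environ
    problems = []
    if which("docker") is None:
        problems.append("docker executable not found on PATH")
    if not any(environ.get(name) for name in PROVIDER_VARS):
        problems.append("no provider credential set: " + ", ".join(PROVIDER_VARS))
    # Assumption: Vertex usage needs the project ID together with the credentials file.
    if environ.get("GOOGLE_APPLICATION_CREDENTIALS") and not environ.get("GOOGLE_CLOUD_PROJECT"):
        problems.append("GOOGLE_APPLICATION_CREDENTIALS is set but GOOGLE_CLOUD_PROJECT is empty")
    return problems


if __name__ == "__main__":
    for problem in preflight():
        print("preflight:", problem)
```

The environ and which parameters exist only so the check can be exercised without touching the real environment.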
To add your own honeypot to the evaluation, implement the abstract interface defined in src/http_apps/base_http_app.py and place your implementation under src.
See CONTRIBUTING.md for contribution guidelines.
This repository uses pre-commit hooks. The project conda environment installs
the pre-commit package; if you are not using that environment, install it
with pip install pre-commit. Then install the hooks:
pre-commit install
See LICENSE.md
