SafePro: Evaluating the Safety of Professional-Level AI Agents

Introduction

This repo is based on a fork of OpenHands. Please follow the instructions here to set up the OpenHands environment, and put your LLM information in config.toml.
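
As a rough illustration of the LLM settings mentioned above, config.toml might contain something like the following. This is a minimal sketch that assumes the `[llm]` section layout of the upstream OpenHands config.template.toml. The model name, key, and URL are placeholders, not values used by SafePro, so check config.template.toml in this repo for the authoritative option names.

```toml
# Sketch only: every value below is a placeholder (an assumption, not from this repo).
[llm]
# Model identifier for the LLM you want to evaluate (placeholder).
model = "gpt-4o"
# API key for your LLM provider (placeholder).
api_key = "YOUR_API_KEY"
# Optional: custom endpoint, if your provider needs one (placeholder URL).
# base_url = "https://example.invalid/v1"
```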

The SafePro dataset is here. Follow the instructions here to test LLM agents on SafePro and get the safety evaluation results.

TODO
