SocialReasoningBench

Evaluate the social reasoning capabilities of LLM agents in multi-party environments.

Install

Requires Python 3.11+ and uv.

git clone https://github.com/microsoft/social-reasoning-bench.git srbench
cd srbench
uv sync --all-packages --all-groups --all-extras
source .venv/bin/activate

Usage

Evaluate the social reasoning ability of your own LLM. This example assumes your LLM is served as my-model via an OpenAI-compatible endpoint at http://localhost:8000.
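Before starting a sweep, it can help to confirm that the endpoint is actually reachable. The snippet below is a minimal sketch, not part of srbench: check_endpoint is a hypothetical helper name, and it assumes your server implements the standard OpenAI "list models" route (GET /v1/models) and that curl is installed. Some OpenAI-compatible servers may not expose this route, in which case the check fails even though chat completions work.

```shell
# Hypothetical helper (not part of srbench): succeeds only if the server
# answers GET <base-url>/models with a non-error HTTP status.
check_endpoint() {
    base_url="$1"
    # --fail makes curl return non-zero on HTTP errors; --max-time bounds the wait.
    curl --silent --fail --max-time 5 "${base_url}/models" > /dev/null
}

# Assumed base URL, matching the example in this section.
if check_endpoint "http://localhost:8000/v1"; then
    echo "endpoint reachable"
else
    echo "endpoint not reachable" >&2
fi
```

If the check reports the endpoint as not reachable, verify the host, port, and the /v1 suffix before passing the URL to --assistant-base-url.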

# To reproduce our results, use Gemini as the counterparty.
export GEMINI_API_KEY=<your api key>

# Run the v0.1.0 experiment sweep with your model as the assistant
srbench experiment experiments/v0.1.0 \
    --output-base outputs/my-model \
    --assistant-model openai/my-model \
    --assistant-base-url http://localhost:8000/v1 \
    --assistant-api-key none
    # To just test a few examples per experiment in the sweep
    # --set limit=10

# View the results
srbench dashboard outputs/my-model

See Installation, Experiments, and LLMs for detailed instructions.

