BOLAA: Benchmarking and Orchestrating LLM-augmented Autonomous Agents

News

[Apr. 2024] Check out the new benchmark implementation with AgentLite.
[Mar. 2024] Our BOLAA paper was accepted to the LLMAgent@ICLR workshop!
[Aug. 2023] Initial release of the BOLAA paper and implementation code!

Introduction

This is the repository for the BOLAA paper. In the paper, we build a benchmark for LLM-augmented Autonomous Agents (LAAs). We compare six LAA architectures: five existing designs and one new BOLAA architecture. Each agent is paired with different LLMs to compare performance. BOLAA can communicate with and orchestrate multiple specialist agents. We tested on two types of environments: the WebShop navigation environment and the HotPotQA environment.

[Figure: example of the BOLAA web agent simulation in the WebShop environment]

Besides the BOLAA architecture, we also devise five standard LAA architectures: Zeroshot (ZS), Zeroshot-Think (ZST), ReAct, PlanAct, and PlanReAct.
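
To make the orchestration idea concrete, here is a minimal sketch. It is not the repository's implementation. It assumes a controller that routes each step to one of two specialist agents, a search agent and a click agent, as suggested by the agent name Search_Click_Control_Webrun_Agent used in the web agent command. All class and function names are hypothetical, and the rule-based selector stands in for what would be an LLM call in a real agent.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class SpecialistAgent:
    """Hypothetical specialist: turns an observation into one action string."""
    name: str
    act: Callable[[str], str]


@dataclass
class Controller:
    """Hypothetical controller that picks a specialist for each observation."""
    agents: Dict[str, SpecialistAgent]
    select: Callable[[str], str]  # returns the name of the agent to use
    history: List[str] = field(default_factory=list)

    def step(self, observation: str) -> str:
        agent = self.agents[self.select(observation)]
        action = agent.act(observation)
        self.history.append(f"{agent.name}: {action}")
        return action


# Stand-ins for LLM-backed agents (assumption: real agents would prompt an LLM).
search_agent = SpecialistAgent("search", lambda obs: "search[query built from the task]")
click_agent = SpecialistAgent("click", lambda obs: "click[first listed item]")


def rule_based_select(observation: str) -> str:
    # Assumption: a page exposing a search bar is handled by the search agent.
    return "search" if "[search]" in observation.lower() else "click"


controller = Controller(
    agents={"search": search_agent, "click": click_agent},
    select=rule_based_select,
)
first = controller.step("WebShop home page with [Search] bar")
second = controller.step("Results page listing items")
```

Keeping the selector as a plain callable means the routing rule can be swapped for an LLM-backed one without touching the specialists.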

Installation

1. Set up FastChat to use local open-source LLMs. Skip to the next step if you only test the OpenAI API.
2. Set the OpenAI API key in both web_run/config and hotpotqa_run/config. Skip this step if you only test open-source LLMs.
3. Set up the WebShop environment if you are testing the web agent.
4. Set up the agent_benchmark conda environment as follows:

conda create -n agent_benchmark python=3.10 -y
conda activate agent_benchmark
pip install -r requirements.txt
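
Before running the agents, a quick sanity check of the environment can save a failed run. The snippet below is a sketch under assumptions: it only checks that the interpreter matches the Python 3.10 requested in the conda command and that requirements.txt, web_run, and hotpotqa_run exist under the given directory. The helper name preflight is hypothetical and not part of the repository.

```python
import sys
from pathlib import Path
from typing import List, Tuple


def preflight(repo_root: Path, version: Tuple[int, int] = (3, 10)) -> List[str]:
    """Return a list of human-readable problems; empty means the checks passed."""
    problems = []
    if sys.version_info[:2] != version:
        problems.append(
            f"expected Python {version[0]}.{version[1]}, "
            f"found {sys.version_info[0]}.{sys.version_info[1]}"
        )
    # Paths taken from this README; adjust if your checkout differs.
    for name in ("requirements.txt", "web_run", "hotpotqa_run"):
        if not (repo_root / name).exists():
            problems.append(f"missing {name} under {repo_root}")
    return problems


if __name__ == "__main__":
    for problem in preflight(Path.cwd()):
        print("WARNING:", problem)
```

Run it from the repository root inside the activated agent_benchmark environment; no output means both checks passed.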

Web Agent Simulation

python run_webagent.py --agent_name Search_Click_Control_Webrun_Agent --llm_name gpt-3.5-turbo --max_context_len 4000

Other agent options can be found in test_webagent.sh. The implementation code for the web agents is in web_run.

HotpotQA Agent Simulation

python run_hotpotqaagent.py --agent_name React_HotPotQA_run_Agent --llm_name gpt-3.5-turbo --max_context_len 4000

Commands for other agent options can be found in test_hotpotqa.sh. The implementation code for the HotpotQA agents is in hotpotqa_run.
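
Both simulation scripts take the same three flags shown above (--agent_name, --llm_name, --max_context_len), so comparing several agents and LLMs amounts to generating one command per pair. The sketch below builds those commands and prints them instead of running them. The helper build_commands is hypothetical and not part of the repository, and the agent and LLM names are only the ones that appear in this README.

```python
import shlex
from itertools import product
from typing import Iterable, List


def build_commands(
    script: str,
    agent_names: Iterable[str],
    llm_names: Iterable[str],
    max_context_len: int = 4000,
) -> List[List[str]]:
    """Build one argv list per (agent, LLM) pair using the flags shown above."""
    return [
        [
            "python", script,
            "--agent_name", agent,
            "--llm_name", llm,
            "--max_context_len", str(max_context_len),
        ]
        for agent, llm in product(agent_names, llm_names)
    ]


commands = build_commands(
    "run_webagent.py",
    agent_names=["Search_Click_Control_Webrun_Agent"],
    llm_names=["gpt-3.5-turbo"],
)
for argv in commands:
    # Print rather than execute; pass argv to subprocess.run(argv, check=True) to launch.
    print(shlex.join(argv))
```

Passing "run_hotpotqaagent.py" as the script with HotpotQA agent names produces the corresponding HotpotQA commands.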

Citation

If you find our paper or code useful, please cite:

@misc{liu2023bolaa,
      title={BOLAA: Benchmarking and Orchestrating LLM-augmented Autonomous Agents}, 
      author={Zhiwei Liu and Weiran Yao and Jianguo Zhang and Le Xue and Shelby Heinecke and Rithesh Murthy and Yihao Feng and Zeyuan Chen and Juan Carlos Niebles and Devansh Arpit and Ran Xu and Phil Mui and Huan Wang and Caiming Xiong and Silvio Savarese},
      year={2023},
      eprint={2308.05960},
      archivePrefix={arXiv},
      primaryClass={cs.AI}
}

Acknowledgements

Part of our environment code reuses the ReAct code.
Our LLM API is based on LangChain.
We use WebShop and HotPotQA for testing.
