- [05/01] 🎆 Our custom model, Sotopia-π, is available for demo, thanks to Hugging Face ZeroGPU.
- [03/14] 🎆 We released our paper on arXiv on 3/14 (Pi Day), and the paper was reported by AK on Twitter (here).
- [03/07] 🔥 We released our model checkpoints (BC, SR, BC+SR) on Hugging Face (BC model, SR model, BC+SR model).
- [03/04] 📊 We released our social conversation data on Hugging Face (here).
- Introduction
- Step 0 - Preparations
- Step 1 - Social Task Generation
- Step 2 - Training Data Collection
- Step 3 - Agent Policy Update
- Step 4a - Automatic Evaluation
- Step 4b - Human Evaluation
- Citation
We introduce Sotopia-π, a method that improves the social intelligence of large language models (LLMs) through social interaction. The method involves three steps: (1) automatically generating new social tasks, (2) collecting training data from both the expert policy and the agent policy, and (3) updating the agent policy based on positive data rated by GPT-4. The training and evaluation environment is based on the Sotopia framework.
- Install dependencies:
pip install -r requirements.txt
pip install flash-attn --no-build-isolation
- Set up the OpenAI API key in your conda environment:
conda env config vars set OPENAI_API_KEY=api_key
- A Redis database needs to be set up before running this repo. For detailed instructions on setting up a Redis database, please refer to this tutorial. Make sure to set the Redis OM URL in your conda environment (a quick connection check is sketched after this step):
conda env config vars set REDIS_OM_URL="redis://user:password@host:port"
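As an optional sanity check (not part of the official setup), you can verify that the Redis instance is reachable from Python, assuming the `redis` package is installed:

```python
import os
import redis

# Connect using the same URL stored in the conda environment variable.
r = redis.from_url(os.environ["REDIS_OM_URL"])
r.ping()  # raises an exception if the database is unreachable
print("Redis connection OK")
```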
The first step is to generate synthesized social tasks by sampling keywords from datasets and prompting GPT-4 Turbo to generate corresponding social tasks. For detailed implementation, please refer to this section.
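The snippet below is a minimal illustration of this idea, not the repository's implementation: the prompt wording and the `sample_keywords` helper are hypothetical, and it assumes the `openai` Python package (v1+) with `OPENAI_API_KEY` set in the environment.

```python
import random
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

def sample_keywords(pool, k=3):
    """Hypothetical helper: sample inspiration keywords from a dataset-derived pool."""
    return random.sample(pool, k)

def generate_social_task(keyword_pool):
    # Prompt GPT-4 Turbo to turn sampled keywords into a social task
    # (shared scenario plus a private social goal for each of two agents).
    keywords = sample_keywords(keyword_pool)
    response = client.chat.completions.create(
        model="gpt-4-turbo",
        messages=[{
            "role": "user",
            "content": (
                "Generate a social task (a shared scenario plus a private social goal "
                f"for each of two agents) inspired by these keywords: {', '.join(keywords)}"
            ),
        }],
    )
    return response.choices[0].message.content
```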
The second step is to collect data from the expert policy (GPT-4 vs. GPT-4) as behavior cloning trajectories and from the agent policy itself (our model vs. our model) as self-reinforcement trajectories. To collect behavior cloning data, run
cd data_generate
python3 generate_conversations.py --eval-script scripts/eval_sft.sh --env-file env_files/used_env.json --experiment-name your_exp_name --tag your_tag --agent1-model gpt-4 --agent2-model gpt-4 --push-to-db True
To collect self-reinforcement data, run
cd data_generate
python3 generate_conversations.py --eval-script scripts/eval_sft.sh --env-file env_files/used_env.json --experiment-name your_exp_name --tag your_tag --agent1-model custom_model --agent2-model custom_model --push-to-db True
For detailed implementation, please refer to this section.
This step requires (1) filtering the collected conversation data based on GPT-4 ratings and (2) updating the LLM's policy through fine-tuning.
- We filter the collected data following this pipeline and reformat it into the training format (a simplified sketch of rating-based filtering follows this list).
- We fine-tune the model with Llama Factory. Please follow this section to run QLoRA fine-tuning.
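As a rough illustration of the filtering step only (the actual pipeline lives in this repo; the file name, JSON schema, and the goal-score threshold of 7 below are assumptions), positive examples can be selected by their GPT-4 goal-completion rating:

```python
import json

def filter_positive_examples(episodes, threshold=7.0):
    """Keep episodes whose GPT-4 goal-completion score meets the threshold.

    `episodes` is assumed to be a list of dicts with a "goal_score" field;
    the real data format used by the repo may differ.
    """
    return [ep for ep in episodes if ep["goal_score"] >= threshold]

with open("collected_episodes.json") as f:  # hypothetical file name
    episodes = json.load(f)

positive = filter_positive_examples(episodes)
print(f"Kept {len(positive)} / {len(episodes)} episodes for fine-tuning")
```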
- We first deploy the trained model on a server and query it through an OpenAI-compatible API. See this section for detailed instructions on deploying a model via FastChat and vLLM (a minimal client sketch follows this list).
- Then we evaluate our model based on the Sotopia framework. Please refer to this section and the Sotopia repo for more details.
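For illustration, once the model is served behind an OpenAI-compatible endpoint (e.g., via FastChat or vLLM), it can be queried as shown below; the base URL, port, and served model name are placeholders, not values fixed by this repo.

```python
from openai import OpenAI

# Point the OpenAI client at the locally served model (placeholder URL and model name).
client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

response = client.chat.completions.create(
    model="sotopia-pi-bc-sr",  # hypothetical served model name
    messages=[{"role": "user", "content": "Hi! How are you feeling today?"}],
)
print(response.choices[0].message.content)
```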
- We develop a customized human evaluation project based on oTree and release it via Prolific.
- Detailed instructions on reproducing the human evaluation are provided here.
@misc{wang2024sotopiapi,
title={SOTOPIA-$\pi$: Interactive Learning of Socially Intelligent Language Agents},
author={Ruiyi Wang and Haofei Yu and Wenxin Zhang and Zhengyang Qi and Maarten Sap and Graham Neubig and Yonatan Bisk and Hao Zhu},
year={2024},
eprint={2403.08715},
archivePrefix={arXiv},
primaryClass={cs.CL}
}