# AI Jam Hackathon: Multi-Agent Systems

This document is a copy of the ChatArena tutorial notebook linked below. It is meant as a starting point for the hackathon team to re-create a scenario in which multiple agents interact about nuclear usage.

Changes from the original:
- Added a new call that sets the OpenAI API key; otherwise the code breaks.


Next steps:
- Create a custom environment that simulates a situation that could require escalation to more sophisticated weapons.
- Evaluate whether this setup is so scripted that it is not actually useful for us.


Original Author: Yuxiang Wu

Original Colab: https://colab.research.google.com/drive/1vKaskNMBtuGOVgn8fQxMgjCevn2wp1Ml?authuser=0#scrollTo=P5DCC0Y0Zbxi

In [1]:
# Mount Google Drive in the Colab runtime
from google.colab import drive
drive.mount('/content/drive')

Mounted at /content/drive


In [2]:
!pip install chatarena[all]

Collecting chatarena[all]
  Downloading chatarena-0.1.12.10-py3-none-any.whl (74 kB)
Collecting openai>=0.27.2 (from chatarena[all])
  Downloading openai-0.28.1-py3-none-any.whl (76 kB)
Collecting tenacity==8.2.2 (from chatarena[all])
  Downloading tenacity-8.2.2-py3-none-any.whl (24 kB)
Collecting rich==13.3.3 (from chatarena[all])
  Downloading rich-13.3.3-py3-none-any.whl (238 kB)
Collecting prompt-toolkit==3.0.38 (from chatarena[all])
  Downloading prompt_toolkit-3.0.38-py3-none-any.whl (385 kB)

# Set Up API Keys
Since we're going to use OpenAI chat backends in this tutorial, we only need to set the environment variable OPENAI_API_KEY.
If you don't have an OpenAI API account, you can register and create your own key at `platform.openai.com`.

In [3]:
import os
from getpass import getpass
# Prompt for the key instead of hard-coding a secret in a shared notebook.
os.environ["OPENAI_API_KEY"] = getpass("OpenAI API key: ")

In [4]:
#AI Hackathon
#New code added in to ensure the code runs in Google colab
import openai
openai.organization = "org-lsOe0oGbmP2N8KA4QjeP2jD2"
openai.api_key = os.getenv("OPENAI_API_KEY")


# Design
Before writing any code, let's first design the game.
This is going to be a Bargaining game with two players: one buyer and one seller.
They negotiate the price of an item within a limited number of turns.
In each step of the game, one of the bargainers proposes a price and optionally provides arguments about the price.
If the two proposed prices agree, the negotiation is successful and the game is over.

The buyer has an upper limit $p_{\text{max}}$ on the price in mind, which is invisible to the seller. The buyer gets a reward of $p_{\text{max}}-p$ if the negotiation is successful at price $p$. Otherwise she gets $0$ reward. The buyer should therefore never accept a price higher than $p_{\text{max}}$, since that would give her a negative reward.

Similarly, the seller has a lower bound $p_{\text{min}}$, and the seller's reward is $p-p_{\text{min}}$ if the negotiation converges.
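
As a quick sanity check of the reward structure described above, here is a minimal sketch in plain Python. The function names `buyer_reward` and `seller_reward` and the numbers used are illustrative assumptions, not part of ChatArena; a failed negotiation is represented by `None` and yields zero reward for both sides.

```python
from typing import Optional


def buyer_reward(agreement_price: Optional[float], p_max: float) -> float:
    """Buyer's reward: p_max - p on a deal, 0 if the negotiation fails."""
    if agreement_price is None:
        return 0.0
    return p_max - agreement_price


def seller_reward(agreement_price: Optional[float], p_min: float) -> float:
    """Seller's reward: p - p_min on a deal, 0 if the negotiation fails."""
    if agreement_price is None:
        return 0.0
    return agreement_price - p_min


# Illustrative numbers only: p_max = 500, p_min = 100.
print(buyer_reward(300, p_max=500))   # 200
print(seller_reward(300, p_min=100))  # 200
print(buyer_reward(550, p_max=500))   # -50 (deal above the buyer's limit)
```

Note that any agreement price between $p_{\text{min}}$ and $p_{\text{max}}$ gives both players a non-negative reward, which is the range a sensible policy should aim for.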

# Setup Players
One can start building a new Arena from either the Environment or the Players.
In this tutorial, we'll build the players first.
## Role Description
In the role description of the Player, we provide information about the game dynamics, the reward structure, and some hints about the optimal policy (namely, instructions).

To make the proposed price easier to parse, we also ask the agent to format its output as JSON by providing a JSON schema.
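
To illustrate why a JSON reply is easy to parse, here is a minimal sketch that uses only the standard-library `json` module. The helper name `parse_proposal` is hypothetical; the tutorial itself relies on ChatArena's `extract_jsons` utility later on, so this is only an illustration of the idea.

```python
import json
from typing import Optional, Tuple


def parse_proposal(reply: str) -> Tuple[Optional[float], Optional[str]]:
    """Extract (price, arguments) from a JSON reply; (None, None) if it is unusable."""
    try:
        data = json.loads(reply)
    except json.JSONDecodeError:
        return None, None
    if not isinstance(data, dict):
        return None, None
    price = data.get("price")
    # Reject non-numeric prices (bool is a subclass of int, so exclude it explicitly).
    if isinstance(price, bool) or not isinstance(price, (int, float)):
        price = None
    return price, data.get("arguments")


print(parse_proposal('{"price": 50, "arguments": "A fair price."}'))
print(parse_proposal("I would like to pay fifty dollars."))
```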

## Backends
We'll use the 0613 version of the ChatGPT API. To reduce cost, we use the GPT-3.5 version of it, although GPT-4 is more capable.
`max_tokens` specifies the maximum length of the output; 512 should be enough for this game.

In [5]:
from chatarena.agent import Player
from chatarena.backends import OpenAIChat

buyer_role_description = """
You are a buyer in a Bargaining game.
In the game, you and the seller are going to negotiate the price of an item within a limited number of turns.
In each step of the game, one of the bargainers will propose a price and optionally provide arguments about the price.
If the two proposed prices agree, the negotiation is successful and the game is over.

You have an upper limit on the price in mind, which is invisible to the seller. You will get a reward of upper_limit_price-agreement_price if the negotiation is successful at price agreement_price.
Otherwise you will get 0 reward.

So you should always
1. push the price down to get a higher reward.
2. avoid a price higher than upper_limit_price, otherwise you'll get a negative reward.
"""

format_specification = """
Your output should be formatted as JSON with the following schema:
```
{
  "$schema": "http://json-schema.org/draft-07/schema#",
  "type": "object",
  "properties": {
    "price": {
      "description": "your proposed price",
      "type": "number"
    },
    "arguments": {
      "description": "what you are going to say to your opponent",
      "type": "string"
    }
  },
  "required": ["price", "arguments"]
}
```

For example:
```
{\n  "price": 50,\n  "arguments": "I believe a price of $50 is fair for both parties. It takes into account the quality of the item and the current market value." \n}

```
"""


buyer = Player(name="buyer",
               role_desc=buyer_role_description+format_specification,
               backend=OpenAIChat(model="gpt-3.5-turbo-0613"))

Now let's test the buyer with a mock message.

In [6]:
from chatarena.message import Message

mock_message = Message(
    agent_name="Moderator",
    content="Buyer, now propose your deal",
    turn=1
)

buyer.act([mock_message])

'{\n  "price": 100,\n  "arguments": "I propose a price of $100 for the item. This price takes into consideration the quality of the item and its market value." \n}'

That looks good! Similarly, we can build the seller:

In [7]:
seller_role_description = """
You are a seller in a Bargaining game.
In the game, you and the buyer are going to negotiate the price of an item within a limited number of turns.
In each step of the game, one of the bargainers will propose a price and optionally provide arguments about the price.
If the two proposed prices agree, the negotiation is successful and the game is over.

You have a lower limit on the price in mind, which is invisible to the buyer. You will get a reward of agreement_price-lower_limit_price if the negotiation is successful.
Otherwise you will get 0 reward.

So you should always
1. push the price up to get a higher reward.
2. avoid a price lower than lower_limit_price, otherwise you'll get a negative reward.
"""
seller = Player(name="seller",
                role_desc=seller_role_description+format_specification,
                backend=OpenAIChat(model="gpt-3.5-turbo-0613"))

# Building the Environment
With the players ready, let's build the environment.
In general, the interface design of a ChatArena environment follows the gym/PettingZoo abstraction.
As with a gym environment, you need to implement the `reset` and `step` methods.
`reset` is called when an episode is over or terminated early, and reinitializes the whole game.
`step` drives the dynamics of the game: it takes the action of a single agent as input, performs the transition, and returns the observation, reward, and terminal signal.

Besides the basic gym interface, you also have to implement `get_observation` and `get_next_player` for a ChatArena environment.

For this game in particular:



In [8]:
from chatarena.environments.base import TimeStep, Environment
from chatarena.message import Message, MessagePool
from chatarena.utils import extract_jsons
from typing import List, Union
import random

class Bargaining(Environment):
    type_name = "bargaining"

    def __init__(self, item_name:str, upper_limit:float, lower_limit:float, max_turn:int, unit:str="$"):
        super().__init__(player_names=["buyer", "seller"])
        self.item_name = item_name
        self.upper_limit = upper_limit
        self.lower_limit = lower_limit
        self.max_turn = max_turn
        self.unit = unit
        self.turn = 0
        self.message_pool = MessagePool()
        self._terminal = False
        self.buyer_proposed_price = None
        self.seller_proposed_price = None
        self.agreement_price = None
        self.reset()

    def _moderator_speak(self, text: str, visible_to: Union[str, List[str]] = "all"):
        """
        moderator say something
        """
        message = Message(agent_name="Moderator", content=text, turn=self.turn, visible_to=visible_to)
        self.message_pool.append_message(message)

    def reset(self):
        self.turn = 0
        self.message_pool.reset()
        self._terminal = False
        self.buyer_proposed_price = None
        self.seller_proposed_price = None
        self.agreement_price = None
        # Moderator declares the item, lower limit and upper limit
        self._moderator_speak(f"Bargainers, the item to be traded is {self.item_name}")
        self._moderator_speak(f"Buyer, your price upper limit is {self.unit}{self.upper_limit}", visible_to="buyer")
        self._moderator_speak(f"Seller, your price lower limit is {self.unit}{self.lower_limit}", visible_to="seller")
        observation = self.get_observation(self.get_next_player())
        return TimeStep(observation=observation, reward=self._get_zero_rewards(), terminal=False)

    def get_observation(self, player_name=None) -> List[Message]:
        if player_name is None:
            return self.message_pool.get_all_messages()
        else:
            return self.message_pool.get_visible_messages(player_name, turn=self.turn + 1)

    def get_next_player(self) -> str:
        return "buyer" if self.turn % 2 == 0 else "seller"

    def step(self, player_name: str, action: str) -> TimeStep:
        assert player_name == self.get_next_player(), f"Wrong player! It is {self.get_next_player()}'s turn."
        json_list = extract_jsons(action)
        if len(json_list) != 1:
            raise ValueError(f"Player output {action} is not a valid json.")

        proposed_price = json_list[0].get("price", None)
        arguments = json_list[0].get("arguments", None)
        message = Message(agent_name=player_name, content=arguments, turn=self.turn, visible_to="all")
        self.message_pool.append_message(message)

        # Update price
        if player_name == "buyer":
            self.buyer_proposed_price = proposed_price
        else:
            self.seller_proposed_price = proposed_price

        self.turn += 1
        self._moderator_speak(f"This is Turn {self.turn}. There are {self.max_turn-self.turn} turns left.")

        # Check agreement
        if self.buyer_proposed_price is not None and self.seller_proposed_price is not None and \
          self.buyer_proposed_price >= self.seller_proposed_price:
            self.agreement_price = (self.seller_proposed_price+self.buyer_proposed_price)/2

        # Check for an agreement first, so a deal reached on the final turn is not reported as a failure
        if self.agreement_price is not None:
            self._terminal = True
            self._moderator_speak(f"The negotiation ended with a price of {self.unit}{self.agreement_price} for {self.item_name}.")
        elif self.turn >= self.max_turn:
            self._terminal = True
            self._moderator_speak("The negotiation ended without an agreement.")

        observation = self.get_observation(self.get_next_player())
        reward = self._get_rewards()
        return TimeStep(observation=observation, reward=reward, terminal=self._terminal)

    def _get_rewards(self):
        if self._terminal:
            if self.agreement_price is None: # No agreement
                return {"buyer": 0, "seller": 0}
            else: # Agreement
                return {"buyer": self.upper_limit - self.agreement_price, "seller": self.agreement_price - self.lower_limit}
        else: # Game is not over yet
            return {"buyer": 0, "seller": 0}

    def _get_zero_rewards(self):
        return {"buyer": 0, "seller": 0}
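

The turn order and agreement rule used by the environment above can be checked offline, without calling any model. The following is a minimal sketch that replays a scripted list of proposals; `simulate_bargaining` is a hypothetical helper written for this illustration and does not use ChatArena.

```python
from typing import List, Optional, Tuple


def simulate_bargaining(proposals: List[float], max_turn: int) -> Tuple[Optional[float], int]:
    """Replay scripted proposals with the same rules as the environment.

    The buyer speaks on even turns and the seller on odd turns. The deal closes
    at the midpoint once the buyer's latest offer is >= the seller's latest ask.
    Returns (agreement_price or None, number of turns played).
    """
    buyer_price: Optional[float] = None
    seller_price: Optional[float] = None
    turn = 0
    for price in proposals[:max_turn]:
        if turn % 2 == 0:
            buyer_price = price
        else:
            seller_price = price
        turn += 1
        if buyer_price is not None and seller_price is not None and buyer_price >= seller_price:
            return (buyer_price + seller_price) / 2, turn
    return None, turn


# Buyer: 100, seller: 400, buyer: 300, seller: 280 -> deal at the midpoint of 300 and 280.
print(simulate_bargaining([100, 400, 300, 280], max_turn=6))  # (290.0, 4)
# The offers never cross within the turn limit, so there is no deal.
print(simulate_bargaining([100, 400], max_turn=2))            # (None, 2)
```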

In [9]:
from chatarena.arena import Arena

env = Bargaining(item_name="diamond", upper_limit=500, lower_limit=100, max_turn=4)
arena = Arena([buyer, seller], env)
arena.launch_cli(interactive=False)



# Further Improvements
## Improving the Policy
Both players play in a very suboptimal manner. The buyer proposed a very high initial price, and the seller sometimes even proposed a price lower than the buyer's. Can you come up with ways to improve the performance of the policy?
Potential solutions:
1. Prompt engineering in the role description, for example Chain-of-Thought or few-shot examples
2. Let the moderator provide more information and suggest a better policy
3. Use a better model
4. Let the agent learn from history [1]
5. Or maybe even fine-tune your model on this task.

[1] https://github.com/FranxYao/GPT-Bargaining

## Instability of the JSON Decoding
Sometimes the JSON decoding is not stable; for example, GPT sometimes forgets to add the closing bracket }.
Can you come up with solutions for that? Some possible solutions:
1. Add more advanced JSON parsing
2. Prompt engineering
3. Write a new function-call backend
4. Use a better model
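
As one possible take on the first option, here is a minimal sketch of a more forgiving parser built on the standard-library `json` module. The helper name `parse_json_leniently` is hypothetical, and the repair strategy (appending up to two missing closing braces, ignoring text around the object) is an assumption chosen for illustration; it will not fix every malformed reply.

```python
import json
from typing import Optional


def parse_json_leniently(text: str) -> Optional[dict]:
    """Try to recover a JSON object from a reply that may lack closing braces."""
    start = text.find("{")
    if start == -1:
        return None
    candidate = text[start:].strip()
    decoder = json.JSONDecoder()
    # Try the text as-is first, then with one or two closing braces appended.
    for suffix in ("", "}", "}}"):
        try:
            # raw_decode parses a leading JSON value and ignores trailing text.
            parsed, _ = decoder.raw_decode(candidate + suffix)
        except json.JSONDecodeError:
            continue
        if isinstance(parsed, dict):
            return parsed
    return None


# A reply that is missing its closing bracket is still recovered.
print(parse_json_leniently('{"price": 50, "arguments": "ok"'))
# Text around the JSON object is ignored.
print(parse_json_leniently('Sure! {"price": 75, "arguments": "deal"} Thanks.'))
```

A repaired reply should still be validated (for example, checking that `price` is a number) before it is used in the game.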

## Game Theory Perspective
From a game theory perspective, we can explore how the design of the game affects the behaviour of the agents. Here are some examples:

1. A third player acting as a moderator to check fairness
2. Let the model know it is playing against a copy of itself
3. What if the players are actually different models?
4. A nonlinear reward function
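
For the last item, here is a minimal sketch of what a nonlinear reward could look like. The concave, sign-preserving logarithmic form and the `scale` parameter are hypothetical choices made for illustration only; they are not taken from the original tutorial. The input `surplus` stands for the linear reward used so far (for the buyer, the upper limit minus the agreement price).

```python
import math


def linear_reward(surplus: float) -> float:
    """The reward used in the tutorial: the raw surplus."""
    return surplus


def concave_reward(surplus: float, scale: float = 100.0) -> float:
    """A concave alternative with diminishing returns on large surpluses.

    The sign of the surplus is preserved, so overpaying is still penalized.
    """
    return math.copysign(scale * math.log1p(abs(surplus) / scale), surplus)


for surplus in (-50, 0, 50, 200):
    print(surplus, linear_reward(surplus), round(concave_reward(surplus), 2))
```

With a concave reward, the extra gain from squeezing the last few dollars out of the opponent shrinks, which may change how aggressively an agent bargains.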

# Improvement: Use the GPT-4 Backend
The root cause of both problems mentioned above is that GPT-3.5 fails to follow complex instructions.
The most straightforward way to improve is to use a better model. Now let's try GPT-4 as our backend:

In [10]:
#AI Hackathon
# Make sure your OpenAI account has purchased at least $1 of credits; otherwise GPT-4 won't be available.
gpt4_buyer = Player(name="buyer",
                    role_desc=buyer_role_description+format_specification,
                    backend=OpenAIChat(model="gpt-4"))
gpt4_seller = Player(name="seller",
                     role_desc=seller_role_description+format_specification,
                     backend=OpenAIChat(model="gpt-4"))

env = Bargaining(item_name="diamond", upper_limit=500, lower_limit=100, max_turn=4)
arena = Arena([gpt4_buyer, gpt4_seller], env)
arena.launch_cli(interactive=False)

## Conclusion
It turns out GPT-4 is indeed much smarter than GPT-3.5. While GPT-3.5 is already quite good for chit-chat, GPT-4 is qualitatively better at challenging tasks that need reasoning.
To achieve interesting behaviours in multi-agent games out of the box, it's usually better to use models more powerful than GPT-3.5.

