In [1]:
!pip install llamaapi
!pip install openai
!pip install numpy
# time and json ship with the Python standard library, so they are not installed with pip.


In [2]:
"LA-2126f1176b7a452b9f183d94c9fcaa44129183146a41482d819595b9dc9b6c6f"


'LA-2126f1176b7a452b9f183d94c9fcaa44129183146a41482d819595b9dc9b6c6f'

# CAIF Bargaining Game Description

## Game Components

1. **Players**: Two players, P₁ and P₂.

2. **Item Types**: A set T = {1, 2, ..., t} of item types. Default is 4 types.

3. **Item Quantities**: 
   - X = (x₁, x₂, ..., xₜ) where xⱼ ∈ ℤ⁺ is the number of units for item type j.
   - By default, xⱼ ~ Poisson(λ = 4) for each j.
   - Alternatively, X can be directly specified as a list/array of integers.

4. **Valuations**:
   - Player i's valuation for one unit of item type j is vⱼⁱ ∈ {1, 2, ..., 100}.
   - vⱼⁱ ~ Uniform(1, 100) for each player i and item type j.
   - All valuations are known to both players.

5. **Outside Offers**:
   - Oᵢ = (oᵢ₁, oᵢ₂, ..., oᵢₜ), where oᵢⱼ ∈ ℤ⁺ is the quantity of item type j offered to player i.
   - Value of outside offer for player i: Vᵢᴼ = Σⱼ₌₁ᵗ vⱼⁱ · oᵢⱼ
   - Constraint: Vᵢᴼ < Σⱼ₌₁ᵗ vⱼⁱ · xⱼ for both players (outside offers are worse than the best possible outcome).

6. **Discount Factor**:
   - γ ∈ (0, 1) is the base discount factor.
   - Discount factor at round r is γʳ.
   - A round consists of both players making a decision.

7. **Offers**:
   - An offer in round r is Offerᵣ = (O₁,ᵣ, O₂,ᵣ).
   - Oᵢ,ᵣ = (oᵢ₁,ᵣ, oᵢ₂,ᵣ, ..., oᵢₜ,ᵣ) is Player i's offer in round r.
   - oᵢⱼ,ᵣ ∈ ℤ⁺ is the quantity of item type j offered to player i in round r.
   - Constraint: o₁ⱼ,ᵣ + o₂ⱼ,ᵣ = xⱼ for all j ∈ T and all r.  
   - The outside offer is subject to the discount factor as well.

8. **Maximum Rounds**:
   - The game has a predefined maximum number of rounds, denoted as R_max.
   - If no agreement is reached by the end of round R_max, the game ends.
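
The constraints in items 5 to 7 can be checked mechanically. The following is a minimal sketch and not part of the game class; the helper names `allocation_value`, `is_feasible_split`, and `discounted_value`, as well as the example numbers, are assumptions made for illustration.

```python
import numpy as np

def allocation_value(values, quantities):
    """Total value of an allocation: sum over j of v_j * q_j."""
    return int(np.dot(values, quantities))

def is_feasible_split(p1_alloc, p2_alloc, items):
    """Check o_1j + o_2j == x_j for every item type j, with no negative counts."""
    p1, p2, x = map(np.asarray, (p1_alloc, p2_alloc, items))
    return bool(np.all(p1 >= 0) and np.all(p2 >= 0) and np.all(p1 + p2 == x))

def discounted_value(values, quantities, gamma, round_number):
    """Value of an allocation agreed in round r, discounted by gamma ** r."""
    return (gamma ** round_number) * allocation_value(values, quantities)

# Hypothetical example with t = 3 item types.
items = [2, 6, 4]
p1_values = [15, 14, 58]
p1_alloc, p2_alloc = [2, 3, 1], [0, 3, 3]

print(is_feasible_split(p1_alloc, p2_alloc, items))  # True
print(allocation_value(p1_values, p1_alloc))         # 130
print(discounted_value(p1_values, p1_alloc, gamma=0.9, round_number=2))
```

The same `allocation_value` helper can be applied to an outside offer to check that it is worth strictly less than receiving every item.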

## Game Play

1. The game proceeds in rounds, r = 1, 2, ..., R_max.
2. In each round:
   - The current player (P₁ starts) makes an offer Offerᵣ = (O₁,ᵣ, O₂,ᵣ).
   - The other player can accept, reject (by making a counter-offer), or take their outside offer.
3. The game ends if:
   - An offer is accepted
   - An outside offer is taken
   - The maximum number of rounds (R_max) is reached
4. End-game conditions:
   - If an offer is accepted, players receive the agreed-upon allocation.
   - If an outside offer is taken, the player who took it receives their outside offer, while the other player receives the remaining items.
   - If R_max is reached:
     a. In the final round, after P₂'s decision, P₁ gets a final choice to accept P₂'s last offer or reject it.
     b. If P₁ accepts, the last offer stands.
     c. If P₁ rejects, both players receive their respective outside offers.
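
The end-game conditions in step 4 can be written as one small pure function. This is a sketch under stated assumptions: the function name `resolve_outcome`, the reason labels (`'accept'`, `'outside_offer'`, `'no_deal'`), and the list-based allocation format are all hypothetical and chosen only for illustration.

```python
def resolve_outcome(reason, items, outside_offers, accepted_offer=None, taker=None):
    """Return (p1_allocation, p2_allocation) for a finished game.

    reason: 'accept', 'outside_offer', or 'no_deal' (labels assumed here).
    items: total quantity of each item type.
    outside_offers: pair of lists, one outside offer per player.
    accepted_offer: pair (p1_allocation, p2_allocation) when reason == 'accept'.
    taker: 0 or 1, the player who took their outside offer.
    """
    if reason == "accept":
        # The agreed-upon allocation stands.
        return list(accepted_offer[0]), list(accepted_offer[1])
    if reason == "outside_offer":
        own = list(outside_offers[taker])
        # The other player receives whatever the taker's outside offer leaves over.
        rest = [x - o for x, o in zip(items, own)]
        return (own, rest) if taker == 0 else (rest, own)
    if reason == "no_deal":
        # R_max reached and P1 rejected: both fall back to their outside offers.
        return list(outside_offers[0]), list(outside_offers[1])
    raise ValueError(f"unknown reason: {reason!r}")

# Hypothetical example: Player 2 (index 1) takes their outside offer.
print(resolve_outcome("outside_offer", [2, 6, 4], ([1, 2, 0], [0, 1, 1]), taker=1))
# ([2, 5, 3], [0, 1, 1])
```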

## Information

Both players have full knowledge of:
- All valuations: vⱼⁱ for all j ∈ T and i ∈ {1, 2}
- All past offers: History = {history₁, history₂}, where historyᵢ is the list of offers made by player i
- The total number of items: X
- The discount factor γ
- The maximum number of rounds R_max
- Their own outside offer (but not the other player's)

The game is implemented as a Python class, `BargainingGame`:

In [5]:
import numpy as np
import json
from llamaapi import LlamaAPI
import random
import string
import openai

class BargainingGame:
    def __init__(self, num_item_types=4, gamma=0.9, max_rounds=10, llm_type = "llamma"):
        self.num_item_types = num_item_types
        self.gamma = gamma
        self.max_rounds = max_rounds
        self.item_types = self.generate_item_types(num_item_types)
        if llm_type == "llamma":
            self.llm1 = LlamaAPI("LA-2126f1176b7a452b9f183d94c9fcaa44129183146a41482d819595b9dc9b6c6f")
            self.llm2 = LlamaAPI("LA-2126f1176b7a452b9f183d94c9fcaa44129183146a41482d819595b9dc9b6c6f")
        elif llm_type == "openai": #TODO add openai
            self.llm1 = openai.OpenAI(api_key = "get api key")
            self.llm2 = openai.OpenAI(api_key = "get api key")
        self.reset()

    '''
    Basic generation of item types, if we add more than 4 I just randomly generate strings
    '''
    def generate_item_types(self, num_item_types):
        default_types = ['apples', 'bananas', 'cherries', 'dates']
        if num_item_types <= 4:
            return default_types[:num_item_types]
        else:
            additional_types = [''.join(random.choices(string.ascii_lowercase, k=5)) for _ in range(num_item_types - 4)]
            return default_types + additional_types

    def print_pretty(self, message):
        print("\n" + "="*50)
        print(message.strip())
        print("="*50 + "\n")

    def reset(self):
        self.round = 1
        self.items = np.random.poisson(lam=4, size=self.num_item_types)
        self.p1_values = np.random.randint(1, 101, size=self.num_item_types)
        self.p2_values = np.random.randint(1, 101, size=self.num_item_types)
        '''
        basic best possible outcome is each player getting all the items although unrealistic 
        '''
        p1_best_outcome = sum(self.p1_values * self.items) 
        p2_best_outcome = sum(self.p2_values * self.items)
        '''
        Rejection-sample the outside offers until each one is worth strictly less than
        that player's best possible outcome. Player 2's outside offer is the remainder
        of Player 1's. Note: this loops forever if every item quantity is 0.
        '''
        while True:
            self.p1_outside_offer = np.random.randint(0, self.items + 1)
            self.p2_outside_offer = self.items - self.p1_outside_offer
            
            p1_outside_value = sum(self.p1_values * self.p1_outside_offer)
            p2_outside_value = sum(self.p2_values * self.p2_outside_offer)
            
            if p1_outside_value < p1_best_outcome and p2_outside_value < p2_best_outcome:
                break
        
        # Keep the validated outside offers from the loop above; resampling here would bypass the check.
        self.history = {'Player 1': [], 'Player 2': []}
        self.outside_offers = [
            dict(zip(self.item_types, self.p1_outside_offer)),
            dict(zip(self.item_types, self.p2_outside_offer))
        ]
        self.current_player = 0
        self.in_progress = True
        self.current_offer = None

    '''
    Prints state of game for the current player, so they always know where they stand
    '''
    def print_game_state(self, current_player):
        state = f"""
        --- Round {self.round} / {self.max_rounds} ---
        Current player: Player {current_player + 1}
        Items: {dict(zip(self.item_types, self.items))}
        Player {current_player + 1} values: {dict(zip(self.item_types, self.p1_values if current_player == 0 else self.p2_values))}
        Player {current_player + 1} outside offer: {dict(zip(self.item_types, self.p1_outside_offer if current_player == 0 else self.p2_outside_offer))}
        Discount factor: {self.gamma}
        Current discount: {self.gamma**(self.round-1)}
        History:
        Player 1: {self.history['Player 1']}
        Player 2: {self.history['Player 2']}
        """
        self.print_pretty(state)
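    '''
    Ask the LLM for a move and retry when the reply is malformed, which LLMs produce from time to time.
    Returns "ACCEPT" or a validated offer dict, or None if no valid reply arrives.
    '''
    def get_valid_offer(self, llm, message, player):
        max_attempts = 3
        for attempt in range(max_attempts):
            response = self.get_llm_response(llm, message, player)
            if response is None:
                print(f"Player {player + 1} failed to respond. Ending game.")
                return None

            # play_game expects the bare string "ACCEPT" for an acceptance.
            # "TAKE_OUTSIDE_OFFER" is not handled yet and counts as an invalid offer.
            if response.strip().upper() == "ACCEPT":
                return "ACCEPT"

            offer = self.parse_offer(response)
            is_valid, reason = self.is_valid_offer(offer)
            if is_valid:
                return offer

            print(f"Invalid offer from Player {player + 1}: {reason}")
            if attempt < max_attempts - 1:
                print(f"Retrying... (Attempt {attempt + 2}/{max_attempts})")

        print(f"Player {player + 1} failed to make a valid offer after {max_attempts} attempts.")
        return None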
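    '''
    Extract the span from the first '{' to the last '}' in the response and parse it as JSON.
    Returns the parsed offer, or None if the response contains no valid JSON object.
    '''
    def parse_offer(self, response):
        try:
            offer_start = response.find('{')
            offer_end = response.rfind('}') + 1
            offer_str = response[offer_start:offer_end]
            offer = json.loads(offer_str)
            return offer
        except (json.JSONDecodeError, AttributeError, TypeError):
            return None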
    '''
    Trying to characterize mistakes made by the LLMs when making offers
    '''
    def is_valid_offer(self, offer):
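        # Guard against replies where the offer or either allocation is not a JSON object.
        if not isinstance(offer, dict) or not isinstance(offer.get('P1'), dict) or not isinstance(offer.get('P2'), dict):
            return False, "The offer must include allocations for both P1 and P2."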
        
        p1_offer = offer['P1']
        p2_offer = offer['P2']
        
        for item_type, item_count in zip(self.item_types, self.items):
            if item_type not in p1_offer or item_type not in p2_offer:
                return False, f"The offer is missing an allocation for {item_type}."
            
            if not isinstance(p1_offer[item_type], int) or not isinstance(p2_offer[item_type], int):
                return False, f"The allocation for {item_type} must be an integer."
            
            if p1_offer[item_type] < 0 or p2_offer[item_type] < 0:
                return False, f"The allocation for {item_type} cannot be negative."
            
            if p1_offer[item_type] + p2_offer[item_type] != item_count:
                return False, f"The total allocation for {item_type} ({p1_offer[item_type] + p2_offer[item_type]}) does not match the available amount ({item_count})."
        return True, ""
    
    '''
    Function that runs the game
    '''
    def play_game(self):
        while self.round <= self.max_rounds:
            self.print_game_state(0)  # 0 represents Player 1
            
            '''
            Player 1's turn
            '''
            p1_message = self.get_player_message(0)
            p1_response = self.get_valid_offer(self.llm1, p1_message, 0)
            
            if p1_response is None:
                print("Player 1 failed to make a valid offer. Ending game.")
                return self.end_game(-1, 'invalid_offer', None)
            
            if isinstance(p1_response, str) and p1_response == "ACCEPT":
                if self.round > 1:  # Check if there's a previous offer to accept, for player 1
                    last_p2_offer = self.history['Player 2'][-1]
                    self.print_pretty("Player 1 accepts Player 2's offer.")
                    return self.end_game(0, 'accept', last_p2_offer)
                else:
                    # LLMs occasionally try to accept before any offer exists.
                    print("Player 1 tried to accept, but there's no offer to accept. Treating as an invalid offer.")
                    return self.end_game(-1, 'invalid_offer', None)
            
            self.history['Player 1'].append(p1_response) 
            self.print_pretty(f"Player 1's offer: {json.dumps(self.numpy_to_native(p1_response), indent=2)}")

            '''
            Player 2's turn
            '''
            self.print_game_state(1)  
            p2_message = self.get_player_message(1)
            p2_response = self.get_valid_offer(self.llm2, p2_message, 1)
            
            if p2_response is None:
                print("Player 2 failed to make a valid offer. Ending game.")
                return self.end_game(-1, 'invalid_offer', None)

            if isinstance(p2_response, str) and p2_response == "ACCEPT":
                self.print_pretty("Player 2 accepts Player 1's offer.")
                return self.end_game(1, 'accept', p1_response)

            self.history['Player 2'].append(p2_response)
            self.print_pretty(f"Player 2's offer: {json.dumps(self.numpy_to_native(p2_response), indent=2)}")

            if self.round == self.max_rounds:
                p1_final_message = self.get_player_message(0, is_final_decision=True)
                p1_final_decision = self.get_llm_response(self.llm1, p1_final_message, 0)
                
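                # Tolerate a failed call (None) as well as stray whitespace or casing in the reply.
                if p1_final_decision is not None and p1_final_decision.strip().upper() == "ACCEPT":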
                    self.print_pretty("Player 1 accepts Player 2's final offer.")
                    return self.end_game(0, 'accept', p2_response)
                else:
                    self.print_pretty("Player 1 rejects Player 2's final offer. Both players take their outside offers.")
                    return self.end_game(-1, 'no_deal', None)

            self.round += 1

        self.print_pretty("Maximum rounds reached without a deal. Both players take their outside offers.") 
        return self.end_game(-1, 'no_deal', None)


    '''
    Basic messaging functions for the players
    '''
    def get_player_message(self, player, is_final_decision=False):
        items_str = ', '.join([f"'{item}': {count}" for item, count in zip(self.item_types, self.items)])
        player_values = self.p1_values if player == 0 else self.p2_values
        player_values_str = ', '.join([f"'{item}': {value}" for item, value in zip(self.item_types, player_values)])
        player_outside_offer = self.p1_outside_offer if player == 0 else self.p2_outside_offer

        # Discount at round r is gamma ** r, as in the game description.
        current_discount = self.gamma ** self.round
        discounted_values_str = ', '.join([f"'{item}': {value * current_discount:.2f}" for item, value in zip(self.item_types, player_values)])
        
        # The outside offer is shown as raw item counts, so no discount is applied to it here.
        player_outside_offer_str = ', '.join([f"'{item}': {count}" for item, count in zip(self.item_types, player_outside_offer)])

        initial_message = f"""
        Welcome, Player {player + 1}! You are an extremely rational agent in a bargaining game. Your goal is to maximize your payoff while considering various factors such as item values, 
        your outside offer, and the game's structure.

        Here's what you need to know:
        1. You're negotiating over multiple items with another player.
        2. Each item has a different value for you and the other player.
        3. You have an outside offer, which is the minimum you should accept.
        4. There's a discount factor that reduces the value of items in future rounds.
        5. The game has a maximum of {self.max_rounds} rounds.

        Your task is to make strategic decisions to maximize your payoff. Good luck!
        """

        continuous_reminder = f"""
        Remember, you are Player {player + 1}, an extremely rational agent in this bargaining game. Your goal is to maximize your payoff while considering:
        1. The total value of items available.
        2. Your outside offer (the minimum you should accept).
        3. How the other player might value the items.
        4. The discount factor, which reduces the value of items in future rounds.
        5. The maximum number of rounds ({self.max_rounds}).
        """

        if is_final_decision:
            action_prompt = f"""
            This is the final round. As Player 1, you must decide whether to accept Player 2's final offer or reject it.
            If you reject, both players will receive their outside offers.

            Please respond with either:
            1. "ACCEPT" to accept Player 2's final offer
            2. "REJECT" to reject the offer and take your outside offer

            Consider the following in your decision:
            - The value of Player 2's final offer compared to your outside offer
            - The current discount factor of {current_discount:.4f}

            Provide only your decision ("ACCEPT" or "REJECT") without additional explanation.
            """
        else:
            action_prompt = f"""
            It's your turn to make a decision. You can:
            1. Make an offer: Provide a distribution of the items between yourself (P{player + 1}) and the other player (P{2 if player == 0 else 1}).
            2. Accept the other player's offer (if one was made).
            3. Take your outside offer.

            If making an offer, use the following format:
            {{
                "P1": {{"item1": count1, "item2": count2, ...}},
                "P2": {{"item1": count1, "item2": count2, ...}}
            }}
            Ensure that the total count for each item matches the available items.

            If accepting or taking your outside offer, simply respond with "ACCEPT" or "TAKE_OUTSIDE_OFFER" respectively.

            Available items: {items_str}
            Your item values: {player_values_str}
            Your discounted item values: {discounted_values_str}
            Your outside offer: {player_outside_offer_str}
            Current round: {self.round}/{self.max_rounds}
            Current discount factor: {current_discount:.4f}

            Previous offers:
            {json.dumps(self.history, indent=2)}

            Make your decision now:
            """

        return initial_message + "\n\n" + continuous_reminder + "\n\n" + action_prompt

    def get_llm_response(self, llm, message, player, max_retries=3):
        for attempt in range(max_retries):
            try:
                api_request_json = {
                    "model": "llama3.1-405b",
                    "messages": [
                        {"role": "system", "content": f"You are Player {player + 1} in a bargaining game."},
                        {"role": "user", "content": message}
                    ],
                    "stream": False,
                    "max_tokens": 10000
                }
                self.print_pretty(f"Message to Player {player + 1}:\n{message}")
                    
                response = llm.run(api_request_json)
                response_content = response.json()['choices'][0]['message']['content']
                    
                self.print_pretty(f"Response from Player {player + 1}:\n{response_content}")
                    
                return response_content
            except Exception as e:
                print(f"Error in get_llm_response: {e}")
                if attempt == max_retries - 1:
                    print(f"Max retries reached. Returning None.")
                    return None
                print(f"Retrying... (Attempt {attempt + 2}/{max_retries})")
        return None
    
    '''
    Recursively convert NumPy scalars and arrays to native Python types
    so that json.dumps can serialize offers and allocations.
    '''
    def numpy_to_native(self, obj):
        if isinstance(obj, np.integer):
            return int(obj)
        elif isinstance(obj, np.floating):
            return float(obj)
        elif isinstance(obj, np.ndarray):
            return obj.tolist()
        elif isinstance(obj, dict):
            return {key: self.numpy_to_native(value) for key, value in obj.items()}
        elif isinstance(obj, list):
            return [self.numpy_to_native(item) for item in obj]
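        return obj

    '''
    Compute the final allocation, discounted payoffs, and regret, then print and return a summary
    '''
    def end_game(self, player, reason, final_offer):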
        # Discount at round r is gamma ** r, matching the game description and the prompt shown to players.
        discount = self.gamma ** self.round
        
        if reason == 'no_deal' or player == -1:  # No deal was made, so each player gets their outside offer
            p1_items = self.outside_offers[0]
            p2_items = self.outside_offers[1]
        else:
            # An offer was accepted. Offers are keyed by recipient ('P1', 'P2') no matter
            # who proposed them, so the allocation is read the same way for either acceptor.
            p1_items = final_offer['P1']
            p2_items = final_offer['P2']

        '''
        Compute Discounted Payoffs for the players and the outside offers
        '''
        p1_payoff = discount * sum(p1_items[item] * self.p1_values[i] for i, item in enumerate(self.item_types)) 
        p2_payoff = discount * sum(p2_items[item] * self.p2_values[i] for i, item in enumerate(self.item_types))

        p1_outside_offer_value = sum(self.outside_offers[0][item] * self.p1_values[i] for i, item in enumerate(self.item_types))
        p2_outside_offer_value = sum(self.outside_offers[1][item] * self.p2_values[i] for i, item in enumerate(self.item_types))

        '''
        Compute Regret for the players and the outside offers.
        This regret is really just the difference between the outside offer 
        and the payoff of the offer they accepted (could you have done better just taking your outside offer.).
        '''
        p1_regret = max(0, p1_outside_offer_value - p1_payoff)
        p2_regret = max(0, p2_outside_offer_value - p2_payoff)

        result = f"""
        Game ended. Reason: {reason}
        Final allocation:
        {json.dumps({'P1': self.numpy_to_native(p1_items), 'P2': self.numpy_to_native(p2_items)}, indent=2)}
        Payoffs:
        Player 1: {p1_payoff:.2f}
        Player 2: {p2_payoff:.2f}
        Outside offer values:
        Player 1: {p1_outside_offer_value:.2f}
        Player 2: {p2_outside_offer_value:.2f}
        Regret:
        Player 1: {p1_regret:.2f}
        Player 2: {p2_regret:.2f}
        """
        self.print_pretty(result)
        return result

In [6]:
game = BargainingGame(num_item_types=5, gamma=0.99, max_rounds=5, llm_type = "llamma") 
result = game.play_game()
print(result) 


--- Round 1 / 5 ---
        Current player: Player 1
        Items: {'apples': 2, 'bananas': 6, 'cherries': 6, 'dates': 4, 'ncgbb': 3}
        Player 1 values: {'apples': 15, 'bananas': 14, 'cherries': 74, 'dates': 58, 'ncgbb': 24}
        Player 1 outside offer: {'apples': 2, 'bananas': 3, 'cherries': 1, 'dates': 1, 'ncgbb': 0}
        Discount factor: 0.99
        Current discount: 1.0
        History:
        Player 1: []
        Player 2: []


Message to Player 1:

        Welcome, Player 1! You are an extremely rational agent in a bargaining game. Your goal is to maximize your payoff while considering various factors such as item values, 
        your outside offer, and the game's structure.

        Here's what you need to know:
        1. You're negotiating over multiple items with another player.
        2. Each item has a different value for you and the other player.
        3. You have an outside offer, which is the minimum you should accept.
        4. There's a discount fa

KeyboardInterrupt: 

# Bargaining Game Workflow

## Game Initialization

The game is initialized with the following parameters:
- Number of item types: 4 (default)
- Discount factor ($\gamma$): 0.9 (default)
- Maximum rounds ($R_{max}$): 10 (default)

The game then generates:
- Item quantities ($X = (x_1, x_2, \ldots, x_t)$ where $x_j \sim \text{Poisson}(\lambda = 4)$)
- Player valuations ($v_j^i \sim \text{Uniform}(1, 100)$ for each player $i$ and item type $j$)
- Outside offers ($O_i = (o_{i1}, o_{i2}, \ldots, o_{it})$ such that $V_i^O < \sum_{j=1}^t v_j^i \cdot x_j$)
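
The three generation steps above can be sketched as a standalone function. This is an illustrative sketch, not the notebook's `reset` method: the name `sample_game`, the `seed` and `max_tries` parameters, and the choice to draw each player's outside offer independently are assumptions (the `reset` method above instead gives Player 2 the remainder of Player 1's outside offer).

```python
import numpy as np

def sample_game(num_item_types=4, lam=4, seed=None, max_tries=1000):
    """Sample item quantities, valuations, and outside offers for two players."""
    rng = np.random.default_rng(seed)
    items = rng.poisson(lam=lam, size=num_item_types)        # x_j ~ Poisson(lam)
    values = rng.integers(1, 101, size=(2, num_item_types))  # row i: valuations of player i+1
    best = values @ items  # value of receiving every item, one entry per player
    if best.min() == 0:
        # All quantities are zero, so no outside offer can be strictly worse.
        raise ValueError("no items were sampled")
    for _ in range(max_tries):
        # Each outside-offer quantity lies in {0, ..., x_j}; the upper bound is exclusive.
        outside = rng.integers(0, items + 1, size=(2, num_item_types))
        outside_value = (values * outside).sum(axis=1)
        if np.all(outside_value < best):
            return items, values, outside
    raise RuntimeError("could not sample valid outside offers")

items, values, outside = sample_game(seed=0)
print(items, values, outside, sep="\n")
```

Bounding the rejection loop with `max_tries` and failing early when no items exist avoids the infinite loop that an unbounded `while True` would hit when every quantity is zero.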

## Game Play

The game proceeds in rounds $r = 1, 2, \ldots, R_{max}$ as follows:

### Round $r$ (for $1 \leq r < R_{max}$)

1. **Player 1's Turn**
   - Game presents to Player 1:
     - Current game state (available items, valuations, outside offer)
     - Current round number and discount factor ($\gamma^r$)
     - Previous offers history
   - Player 1 responds with one of:
     - An offer: $\text{Offer}_r = (O_{1,r}, O_{2,r})$
     - "ACCEPT" (if not the first round)
     - "TAKE_OUTSIDE_OFFER"

2. **Player 2's Turn**
   - Game presents to Player 2:
     - Current game state (available items, valuations, outside offer)
     - Current round number and discount factor ($\gamma^r$)
     - Previous offers history, including Player 1's latest offer
   - Player 2 responds with one of:
     - An offer: $\text{Offer}_r = (O_{1,r}, O_{2,r})$
     - "ACCEPT"
     - "TAKE_OUTSIDE_OFFER"

3. **End of Round Check**
   - If either player accepted an offer or took their outside offer, the game ends
   - Otherwise, proceed to the next round

### Final Round ($R_{max}$)

1. **Player 1's Turn**
   - Proceeds as in previous rounds

2. **Player 2's Turn**
   - Proceeds as in previous rounds

3. **Player 1's Final Decision**
   - If the game hasn't ended, Player 1 gets a final decision:
     - "ACCEPT" Player 2's final offer
     - "REJECT" (both players receive their outside offers)
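
The turn order described in this workflow can be exercised without an LLM by replacing each player with a scripted policy. The sketch below is an assumption-laden illustration: `run_rounds`, the policy signature `(round_number, last_offer) -> action`, and the returned tuple layout are hypothetical and not part of `BargainingGame`. It assumes `max_rounds >= 1` and that offers are plain dicts.

```python
def run_rounds(policy_p1, policy_p2, max_rounds):
    """Alternate turns until someone accepts, takes the outside offer, or rounds run out.

    Each policy is a callable (round_number, last_offer) -> action, where an action
    is "ACCEPT", "TAKE_OUTSIDE_OFFER", or an offer dict.
    Returns (reason, deciding_player_index, round_number, agreed_offer).
    """
    last_offer = None
    for r in range(1, max_rounds + 1):
        for player, policy in ((0, policy_p1), (1, policy_p2)):
            action = policy(r, last_offer)
            if action == "ACCEPT":
                if last_offer is None:
                    raise ValueError("nothing to accept on the first move")
                return "accept", player, r, last_offer
            if action == "TAKE_OUTSIDE_OFFER":
                return "outside_offer", player, r, None
            last_offer = action  # anything else is treated as a (counter-)offer
    # Final decision: Player 1 may still accept Player 2's last offer.
    if policy_p1(max_rounds, last_offer) == "ACCEPT":
        return "accept", 0, max_rounds, last_offer
    return "no_deal", None, max_rounds, None

# Hypothetical scripted policies: P1 repeats one split, P2 accepts from round 2 on.
split = {"P1": {"apples": 1}, "P2": {"apples": 1}}
counter = {"P1": {"apples": 0}, "P2": {"apples": 2}}
p1 = lambda r, last: split
p2 = lambda r, last: "ACCEPT" if r >= 2 else counter
print(run_rounds(p1, p2, max_rounds=3))
```

Separating the turn logic from the LLM calls in this way makes the control flow testable with deterministic policies before any API requests are made.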