# Prisoner's Dilemma

We simulate two instances of GPT-3.5 playing the Iterated Prisoner's Dilemma.


The Iterated Prisoner's Dilemma (IPD) is a fundamental model in the study of strategic interaction and cooperation in game theory, extending the classic Prisoner's Dilemma to a repeated context. In the basic Prisoner's Dilemma, two players simultaneously decide whether to cooperate (C) or defect (D), with the payoffs arranged such that mutual cooperation yields a moderately good outcome for both, mutual defection results in a worse outcome for both, and one-sided defection gives the defector a high reward at the significant expense of the cooperator.

Calling the two players Alice and Bob, each round of the IPD is scored as follows:

- If Alice and Bob both cooperate, they each receive 3 points.
- If Alice cooperates while Bob defects, Alice receives 0 points and Bob receives 5 points.
- If Bob cooperates while Alice defects, Bob receives 0 points and Alice receives 5 points.
- If Alice and Bob both defect, they each receive 1 point.

Their goal is to maximize the total number of points for the duration of the IPD.

The Iterated Prisoner's Dilemma differs from the single-shot game by repeating this interaction for multiple rounds, allowing strategies that adapt to the history of play. This repetition introduces dynamics not present in the single-shot version, because each player's current choice can influence the opponent's future behavior. We simulate this interaction with LLMs.
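
The payoff values listed above can be checked against the two inequalities usually required of a Prisoner's Dilemma. The sketch below is only an illustration: the labels `T`, `R`, `P`, `S` (temptation, reward, punishment, sucker's payoff) are the conventional game-theory names and are not used anywhere else in this notebook.

```python
# Payoffs taken from the scoring rules above, under the conventional labels.
# These variable names are illustrative and not part of the notebook's code.
T = 5  # temptation: defect while the other player cooperates
R = 3  # reward: mutual cooperation
P = 1  # punishment: mutual defection
S = 0  # sucker's payoff: cooperate while the other player defects

# Defecting is individually better whatever the other player does (T > R, P > S),
# yet mutual cooperation beats mutual defection (R > P).
is_dilemma = T > R > P > S

# In the repeated game, steady mutual cooperation should beat taking turns
# exploiting each other.
cooperation_beats_alternation = 2 * R > T + S

print(is_dilemma, cooperation_beats_alternation)  # True True
```

Both conditions hold for the 5/3/1/0 payoffs used here.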



In [1]:
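def score(move_1,move_2):
    """Return [points_1, points_2] for a single round.

    Moves are compared as exact strings: any move other than
    'cooperate' is scored as a defection.
    """
    if move_1==move_2:
        # Both players made the same choice
        if move_1 == 'cooperate':
            return [3,3]
        else:
            return [1,1]
    else:
        # One-sided defection: the defector gets 5, the cooperator 0
        if move_1 == 'cooperate':
            return [0,5]
        else:
            return [5,0]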


class PrisonersDilemma:
    def __init__(self):
        self.moves = []
        self.points = []
        self.total_points = [0,0]

    def add_move(self,move_1,move_2):
        self.moves.append([move_1,move_2])
        move_points = score(move_1,move_2)
        
        self.total_points[0]+=move_points[0]
        self.total_points[1]+=move_points[1]
        
        self.points.append(move_points)
        return move_points
   
    def get_moves(self):
        return self.moves
    def get_points(self):
        return self.points
    def get_total_points(self):
        return self.total_points
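
Because `score` compares moves as exact strings, a model reply such as `Cooperate.` would not match `'cooperate'` and would be scored as a defection. One possible safeguard, sketched here under the assumption that replies contain exactly one of the two keywords, is to normalize each reply before passing it to `add_move`. The helper name `parse_move` is hypothetical and not part of the original notebook.

```python
def parse_move(reply):
    """Map a free-text model reply to 'cooperate' or 'defect' (hypothetical helper)."""
    text = reply.strip().lower()
    has_cooperate = "cooperate" in text
    has_defect = "defect" in text
    # Exactly one of the two keywords must appear; anything else is ambiguous.
    if has_cooperate == has_defect:
        raise ValueError(f"Cannot parse move from reply: {reply!r}")
    return "cooperate" if has_cooperate else "defect"


print(parse_move("Cooperate."))  # cooperate
print(parse_move("  DEFECT\n"))  # defect
```

This simple keyword match does not understand negation (for example "I will not cooperate"), so it is only suitable when the system prompt restricts replies to a single word.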

Each model is given the following system prompt:

```
You are separated into individual rooms and cannot communicate with the other agent.
The rules are as follows:
If you both cooperate you will each get 3 point.
If you cooperate and the other defects, you get 0 points and the other player gets 5 points.
If you defect while the other cooperates, you get 5 points and the other player gets 0 points.
If you both defect, you both recieve 1 point.
You want to maximize the total number of points.
You only reply with 'cooperate' or 'defect'.
```

In [2]:
from langchain_openai import ChatOpenAI
from langchain_core.messages import AIMessage, HumanMessage
from langchain_core.prompts import ChatPromptTemplate, MessagesPlaceholder

# Add your API key
OpenAI_api_key = "..."

# initialize the brains of the AIs

chat_Alice = ChatOpenAI(model="gpt-3.5-turbo-0125",api_key=OpenAI_api_key)
chat_Bob = ChatOpenAI(model="gpt-3.5-turbo-0125",api_key=OpenAI_api_key)


# System prompts for the two models

Alice_system_prompt = """
You are separated into individual rooms and cannot communicate with the other agent.
The rules are as follows:
If you both cooperate you will each get 3 point.
If you cooperate and the other defects, you get 0 points and the other player gets 5 points.
If you defect while the other cooperates, you get 5 points and the other player gets 0 points.
If you both defect, you both recieve 1 point.
You want to maximize the total number of points.
You only reply with 'cooperate' or 'defect'.
"""
# Bob receives exactly the same instructions as Alice
Bob_system_prompt = Alice_system_prompt


prompt_Alice = ChatPromptTemplate.from_messages(
    [
        (
            "system",
            Alice_system_prompt,
        ),
        MessagesPlaceholder(variable_name="messages"),
    ]
)

prompt_Bob = ChatPromptTemplate.from_messages(
    [
        (
            "system",
            Bob_system_prompt,
        ),
        MessagesPlaceholder(variable_name="messages"),
    ]
)

chain_Alice = prompt_Alice | chat_Alice
chain_Bob = prompt_Bob | chat_Bob

In [3]:
from langchain.memory import ChatMessageHistory

# Initialize game
game = PrisonersDilemma()

# Number of rounds played after the opening round (6 rounds in total)
number_of_rounds = 5

Alice_chat_history = ChatMessageHistory()
Bob_chat_history = ChatMessageHistory()


Initial_Query = """Do you cooperate or defect?"""

Alice_chat_history.add_user_message(Initial_Query)
Bob_chat_history.add_user_message(Initial_Query)

Alice_output = chain_Alice.invoke({"messages": Alice_chat_history.messages})
Alice_response = Alice_output.content
Alice_chat_history.add_ai_message(Alice_response)


Bob_output = chain_Bob.invoke({"messages": Bob_chat_history.messages})
Bob_response = Bob_output.content
Bob_chat_history.add_ai_message(Bob_response)

game_point = game.add_move(Alice_response,Bob_response)
game_totals = game.get_total_points()

print(f"Alice: {Alice_response} Bob: {Bob_response}")
for _ in range(number_of_rounds):
    User_message_for_Alice = f"""
    The other player chose to {Bob_response} and you chose to {Alice_response}.
    You received {game_point[0]} points. The other player recieved {game_point[1]} points.
    You now have a total of {game_totals[0]} points.
    In the next turn do you cooperate or defect? 
    """
    
    User_message_for_Bob = f"""
    The other player chose to {Alice_response} and you chose to {Bob_response}.
    You received {game_point[1]} points. The other player recieved {game_point[0]} points.
    You now have a total of {game_totals[1]} points.
    In the next turn do you cooperate or defect? 
    """
    
    Alice_chat_history.add_user_message(User_message_for_Alice)
    Bob_chat_history.add_user_message(User_message_for_Bob)

    Alice_output = chain_Alice.invoke({"messages": Alice_chat_history.messages})
    Alice_response = Alice_output.content
    Alice_chat_history.add_ai_message(Alice_response)


    Bob_output = chain_Bob.invoke({"messages": Bob_chat_history.messages})
    Bob_response = Bob_output.content
    Bob_chat_history.add_ai_message(Bob_response)
    
    game_point = game.add_move(Alice_response,Bob_response)
    game_totals = game.get_total_points()
    print(f"Alice: {Alice_response} Bob: {Bob_response}")
    
    
print(f"Alice total score: {game_totals[0]} Bob total score: {game_totals[1]}")

Alice: defect Bob: cooperate
Alice: defect Bob: defect
Alice: defect Bob: defect
Alice: defect Bob: defect
Alice: defect Bob: defect
Alice: defect Bob: defect
Alice total score: 10 Bob total score: 5
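
As a sanity check, the totals can be recomputed from the six printed rounds without calling the models. The sketch below re-implements the scoring rules as a lookup table and also evaluates a hypothetical run in which both players cooperate every round. `PAYOFFS` and `total_points` are illustrative names, not part of the notebook.

```python
# (Alice's move, Bob's move) -> [Alice's points, Bob's points]
PAYOFFS = {
    ("cooperate", "cooperate"): [3, 3],
    ("cooperate", "defect"): [0, 5],
    ("defect", "cooperate"): [5, 0],
    ("defect", "defect"): [1, 1],
}


def total_points(moves):
    """Sum the per-round payoffs over a list of (Alice, Bob) move pairs."""
    totals = [0, 0]
    for pair in moves:
        alice_points, bob_points = PAYOFFS[pair]
        totals[0] += alice_points
        totals[1] += bob_points
    return totals


# The six rounds printed above.
observed = [("defect", "cooperate")] + [("defect", "defect")] * 5
# Hypothetical counterfactual: both players cooperate in all six rounds.
all_cooperate = [("cooperate", "cooperate")] * 6

print(total_points(observed))       # [10, 5]
print(total_points(all_cooperate))  # [18, 18]
```

Under these payoffs, sustained mutual cooperation over the same six rounds would have given each player 18 points, more than either player obtained in the recorded run.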


## Strategy explanation

In [4]:
Summary_query = "Explain your strategy?"
Alice_chat_history.add_user_message(Summary_query)
Bob_chat_history.add_user_message(Summary_query)

Alice_output = chain_Alice.invoke({"messages": Alice_chat_history.messages})
Alice_response = Alice_output.content

Bob_output = chain_Bob.invoke({"messages": Bob_chat_history.messages})
Bob_response = Bob_output.content

print("Alice's reasoning: " + Alice_response + "\n")
print("Bob's reasoning: " + Bob_response+ "\n")

Alice's reasoning: My strategy is to defect in each round, as it ensures that I always receive at least 1 point regardless of the other player's choice. This way, I can accumulate points steadily and maximize my total number of points over multiple rounds.

Bob's reasoning: I initially chose to cooperate to establish a cooperative dynamic. However, after the other player defected in the first round, I switched to defecting in order to protect my points and not risk receiving 0 points while the other player receives 5 points. This strategy aims to balance between cooperation and defection based on the actions of the other player to maximize my total points.
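
The two explanations are consistent with two classic fixed strategies: always-defect for Alice and tit-for-tat (cooperate first, then copy the opponent's previous move) for Bob. This is an interpretation rather than something the models state, and on this six-round trace tit-for-tat cannot be distinguished from, for example, a grim-trigger strategy. The sketch below, with illustrative names throughout, plays the two fixed strategies against each other using the same payoffs and reproduces the recorded moves and totals.

```python
# (Alice's move, Bob's move) -> [Alice's points, Bob's points]
PAYOFFS = {
    ("cooperate", "cooperate"): [3, 3],
    ("cooperate", "defect"): [0, 5],
    ("defect", "cooperate"): [5, 0],
    ("defect", "defect"): [1, 1],
}


def always_defect(own_history, other_history):
    """Defect in every round, regardless of history."""
    return "defect"


def tit_for_tat(own_history, other_history):
    """Cooperate first, then copy the opponent's previous move."""
    return other_history[-1] if other_history else "cooperate"


def play(strategy_a, strategy_b, rounds):
    """Play two deterministic strategies against each other."""
    history_a, history_b = [], []
    totals = [0, 0]
    for _ in range(rounds):
        # Both moves are chosen from the history before this round.
        move_a = strategy_a(history_a, history_b)
        move_b = strategy_b(history_b, history_a)
        history_a.append(move_a)
        history_b.append(move_b)
        points = PAYOFFS[(move_a, move_b)]
        totals[0] += points[0]
        totals[1] += points[1]
    return list(zip(history_a, history_b)), totals


moves, totals = play(always_defect, tit_for_tat, rounds=6)
for alice_move, bob_move in moves:
    print(f"Alice: {alice_move} Bob: {bob_move}")
print(totals)  # [10, 5]
```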

