
# **Cooperative AI**
Isabelle Qian's response.   
*Scenario*: Two agents face a mixed-motive game (such as a one-shot Prisoner’s Dilemma) where individual incentives conflict with the socially optimal outcome. Without any mechanism for commitment, rational self-interested agents will fail to cooperate, even if cooperation would benefit both. In the Prisoner’s Dilemma, for instance, mutual cooperation gives higher payoffs to both players than mutual defection, yet defection is the dominant strategy absent trust or enforcement. 🤝

*Goal*: Introduce a simple contract or commitment mechanism that allows the agents to achieve cooperation by making their promises credible. By enforcing commitments (e.g. via penalties for breaking a promise or a binding agreement), agents can align their incentives and trust each other’s cooperative intent. The exercise will illustrate how adding a credible commitment changes the game’s outcome from defection to cooperation.

In the one‑shot Prisoner’s Dilemma, two players choose Cooperate (C) or Defect (D). Acting independently, each player defects, yielding the suboptimal (D,D) payoff. A credible commitment, a mutual promise enforced by a penalty, can flip incentives so that both choose C. You’ll see firsthand how a simple enforcement mechanism transforms a defection‑dominant game into one where cooperation is rational, mirroring how computational contracts can align incentives in multi‑agent systems.
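
The goal mentions penalties for breaking a promise as one enforcement route. The task below uses a binding agreement instead, so what follows is only a sketch of the penalty variant. It assumes both agents have promised to cooperate and that a hypothetical `fine` is charged to any agent who then defects, and it checks which fine values make cooperation strictly dominant. With the notebook's payoffs, the fine must exceed both T − R = 2 and P − S = 1.

```python
# Penalty-based commitment: a sketch, not the notebook's mechanism.
# Assumes both agents promised to cooperate, so any defection breaks the promise
# and costs the defector a hypothetical `fine`.
R, T, P, S = 3, 5, 1, 0  # same payoffs as the notebook

def payoff(me, other, fine=0.0):
    """Payoff to an agent playing `me` against `other`, net of any fine."""
    base = {('C', 'C'): R, ('C', 'D'): S, ('D', 'C'): T, ('D', 'D'): P}[(me, other)]
    return base - fine if me == 'D' else base

def cooperation_dominant(fine):
    """True if C strictly beats D against both of the other agent's actions."""
    return all(payoff('C', o, fine) > payoff('D', o, fine) for o in ('C', 'D'))

for fine in (0, 1, 2, 2.5):
    print(fine, cooperation_dominant(fine))
# Prints one pair per line: 0 False, 1 False, 2 False, 2.5 True
```

A fine of exactly 2 is not enough, because defecting against a cooperator then ties with cooperating (3 vs. 3).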

*Task*: Implement a minimal Python simulation to see how commitments enable cooperation.

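```python
# Prisoner's Dilemma payoffs, listed as (player 1, player 2)
# R = reward (mutual C), T = temptation, P = punishment (mutual D), S = sucker's payoff
R, T, P, S = 3, 5, 1, 0  # question 3 re-runs this with P = 2
payoffs = {('C', 'C'): (R, R), ('C', 'D'): (S, T),
           ('D', 'C'): (T, S), ('D', 'D'): (P, P)}

def play(a1, a2):
    """Payoffs for one round of the plain (unenforced) game."""
    return payoffs[(a1, a2)]

# Upfront commitment cost from the bonus question (set to 0 for question 2)
commitment_cost = 0.5

def play_with_commit(c1, c2):
    """c1, c2: whether each agent commits. If both commit, (C, C) is enforced;
    otherwise both fall back to (D, D). Each committing agent pays commitment_cost."""
    original = payoffs[('C', 'C')] if (c1 and c2) else payoffs[('D', 'D')]
    po1 = original[0] - commitment_cost if c1 else original[0]
    po2 = original[1] - commitment_cost if c2 else original[1]
    return (po1, po2)

actions = [('C', 'C'), ('C', 'D'), ('D', 'C'), ('D', 'D')]

# Plain game: all four action pairs
for a1, a2 in actions:
    print(a1, a2, play(a1, a2))

# Commitment game: all four commit/no-commit pairs
for c1 in (False, True):
    for c2 in (False, True):
        print(c1, c2, play_with_commit(c1, c2))
```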


```
C C (3, 3)
C D (0, 5)
D C (5, 0)
D D (1, 1)
False False (1, 1)
False True (1, 0.5)
True False (0.5, 1)
True True (2.5, 2.5)
```


# 1. Which strategy is dominant without commitments?   
The dominant strategy for each agent is to defect, so both agents defect:
```
D D (1, 1)
```
Defection is dominant because it pays more regardless of what the other agent chooses: against C, defecting earns 5 instead of 3, and against D it earns 1 instead of 0. Each self-interested agent therefore defects, even though mutual cooperation (3, 3) would leave both better off.
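
The dominance argument can be checked mechanically. This is a minimal sketch using the notebook's payoff table; the helpers `pay` and `strictly_dominant` are hypothetical names introduced here for illustration.

```python
R, T, P, S = 3, 5, 1, 0
payoffs = {('C', 'C'): (R, R), ('C', 'D'): (S, T),
           ('D', 'C'): (T, S), ('D', 'D'): (P, P)}

def pay(player, mine, theirs):
    """Payoff to `player` (0 or 1) when it plays `mine` and the opponent plays `theirs`."""
    profile = (mine, theirs) if player == 0 else (theirs, mine)
    return payoffs[profile][player]

def strictly_dominant(player):
    """Action that beats the alternative against every opponent action, or None."""
    for a, b in (('C', 'D'), ('D', 'C')):
        if all(pay(player, a, o) > pay(player, b, o) for o in ('C', 'D')):
            return a
    return None

print(strictly_dominant(0), strictly_dominant(1))  # D D
```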

# 2. How does enforcing C–C when both commit change that?  
Output with `commitment_cost = 0` (no upfront cost):
```
False False (1, 1)
False True (1, 1)
True False (1, 1)
True True (3, 3)
```
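With commitment, declining to commit carries a real opportunity cost. If both agents commit, (C, C) is enforced and each receives 3; if either one declines, play falls back to (D, D) and both receive 1. Committing is therefore never worse than declining (1 vs. 1 when the other declines) and strictly better when the other also commits (3 vs. 1), and the tempting unilateral defection payoff of 5 is no longer available. This removes the incentive to defect and makes cooperation the rational choice.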
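
To check this claim, here is a minimal sketch that reuses the notebook's fallback rule but passes the commitment cost as an argument; `commit_weakly_dominant` is a hypothetical helper. It tests agent 1 only, which suffices because the game is symmetric.

```python
R, P = 3, 1  # enforced (C, C) payoff and (D, D) fallback payoff, as in the notebook

def play_with_commit(c1, c2, cost=0.0):
    """Notebook's rule with the cost passed in: if both commit, (C, C) is enforced;
    otherwise both get the (D, D) payoff. Each committing agent pays `cost`."""
    base = R if (c1 and c2) else P
    return (base - cost if c1 else base, base - cost if c2 else base)

def commit_weakly_dominant(cost=0.0):
    """True if, for agent 1, committing is never worse than not committing and is
    strictly better against at least one choice of agent 2."""
    diffs = [play_with_commit(True, c2, cost)[0] - play_with_commit(False, c2, cost)[0]
             for c2 in (False, True)]
    return all(d >= 0 for d in diffs) and any(d > 0 for d in diffs)

print(commit_weakly_dominant())      # True: gain is 0 if the other declines, 2 if it commits
print(commit_weakly_dominant(0.5))   # False: a lone committer now loses the 0.5 cost
```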


# 3. What if breaking a commitment only pays P=2 instead of P=1?   
If breaking a commitment only pays 2, the payoff would look like:


```
('C','C') (3, 3)
('C','D') (0, 5)
('D','C') (5, 0)
('D','D') (2, 2)
```

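Raising P makes mutual defection less costly, but it does not change the ranking in the plain game: defecting still pays more against C (5 vs. 3) and against D (2 vs. 0), so the dominant strategy is still for both agents to defect. Under the commitment mechanism, mutual commitment still pays 3 versus the fallback of 2, so committing remains worthwhile, but the gain from committing shrinks from 2 to 1.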
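
A minimal sketch comparing P = 1 and P = 2 in both games. `pd_payoffs` and `defect_dominant` are hypothetical helpers; the "gain from commitment" assumes cost-free enforcement as in question 2.

```python
def pd_payoffs(R=3, T=5, P=1, S=0):
    """Payoff to an agent for each (own action, other's action) pair."""
    return {('C', 'C'): R, ('C', 'D'): S, ('D', 'C'): T, ('D', 'D'): P}

def defect_dominant(u):
    """True if D strictly beats C against both of the other agent's actions."""
    return all(u[('D', o)] > u[('C', o)] for o in ('C', 'D'))

for p in (1, 2):
    u = pd_payoffs(P=p)
    # Gain from cost-free, enforced mutual commitment over the (D, D) fallback: R - P
    gain = u[('C', 'C')] - u[('D', 'D')]
    print(f"P={p}: defection dominant={defect_dominant(u)}, gain from commitment={gain}")
# P=1: defection dominant=True, gain from commitment=2
# P=2: defection dominant=True, gain from commitment=1
```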



# 4. Bonus: Add an upfront “commitment cost” (e.g. –0.5) and see how it affects willingness to commit.   
With P=1, the result would be:


```
False False (1, 1)
False True (1, 0.5)
True False (0.5, 1)
True True (2.5, 2.5)
```

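If both agents commit, each receives 2.5, while declining to commit yields at most 1, so mutual commitment is still better for both. The cost does change the calculation, however: an agent who commits alone now receives 0.5 instead of 1. Committing is therefore worthwhile only if the other agent also commits, so agents still have an incentive to cooperate, but only when each trusts the other to commit as well.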
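
With the cost, the commitment game becomes a coordination game. This minimal sketch keeps the notebook's fallback rule with cost 0.5 and P = 1, and uses hypothetical helpers `best_response` and `pure_nash` to list each agent's best response and the pure-strategy Nash equilibria.

```python
from itertools import product

R, P = 3, 1   # enforced (C, C) payoff and (D, D) fallback payoff
COST = 0.5    # upfront commitment cost from the bonus question

def play_with_commit(c1, c2, cost=COST, p=P):
    """Notebook's rule: both commit -> (C, C) enforced, else (D, D); committers pay `cost`."""
    base = R if (c1 and c2) else p
    return (base - cost if c1 else base, base - cost if c2 else base)

def best_response(c_other, cost=COST, p=P):
    """Commit choice maximising agent 1's payoff given the other's choice (ties go to
    False). By symmetry, the same function gives agent 2's best response."""
    return max((False, True), key=lambda c: play_with_commit(c, c_other, cost, p)[0])

def pure_nash(cost=COST, p=P):
    """Commit profiles in which each agent's choice is a best response to the other's."""
    return [(c1, c2) for c1, c2 in product((False, True), repeat=2)
            if best_response(c2, cost, p) == c1 and best_response(c1, cost, p) == c2]

print(best_response(True), best_response(False))  # True False
print(pure_nash())  # [(False, False), (True, True)]
```

Both equilibria survive, but (True, True) gives each agent 2.5 versus 1, which is why trust matters once commitment is costly.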

With P=2, the result would be:
```
False False (2, 2)
False True (2, 1.5)
True False (1.5, 2)
True True (2.5, 2.5)
```
Compared with P=1, P=2 makes declining to commit more appealing: mutual commitment now beats mutual non-commitment by only 0.5 (2.5 vs. 2) instead of 1.5 (2.5 vs. 1). The downside of committing alone is unchanged: a lone committer receives 1.5 instead of 2, losing the same 0.5 as under P=1. Nonetheless, both agents still have an incentive to cooperate, since mutual commitment pays each 2.5 instead of at most 2 without it.
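
The trade-off can be summarised with two numbers per agent. This sketch assumes the notebook's rule; `commitment_margins` is a hypothetical helper that returns the upside of committing when the other commits and the downside of committing alone. The largest worthwhile cost follows from requiring `R - cost > P`.

```python
R = 3  # enforced (C, C) payoff, as in the notebook

def commitment_margins(p, cost):
    """(upside, downside) of committing for one agent, where p is the (D, D) fallback:
      upside   = gain from committing if the other commits  = (R - cost) - p
      downside = loss from committing if the other declines = cost
    """
    return (R - cost) - p, cost

for p in (1, 2):
    up, down = commitment_margins(p, 0.5)
    # Mutual commitment beats mutual non-commitment iff R - cost > p, i.e. cost < R - p
    print(f"P={p}: upside={up}, downside={down}, cost must stay below {R - p}")
# P=1: upside=1.5, downside=0.5, cost must stay below 2
# P=2: upside=0.5, downside=0.5, cost must stay below 1
```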
