# The Investment Game

**Inspired by:** Berg, J., Dickhaut, J., & McCabe, K. (1995). _Trust, reciprocity, and social history._ Games and Economic Behavior, 10(1), 122-142.

and

Gmytrasiewicz, P. J., & Doshi, P. (2005). _A framework for sequential planning in multi-agent settings._ Journal of Artificial Intelligence Research, 24, 49-79.

The investor is endowed with \$1 and chooses a fraction `fi` to send to the trustee, keeping the rest. The investment is multiplied by the factor `mult` before it reaches the trustee. The trustee then chooses a fraction `ft` of the multiplied investment to send back to the investor, keeping the rest. What should they do?
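
To make the payoff structure concrete, here is a minimal sketch in plain Python that works through one round by hand. The helper name `one_round` is hypothetical; the multiplier 3.0 and the fractions 0.2 and 0.8 are the values used in the notebook's own cells.

```python
def one_round(fi, ft, mult=3.0, endowment=1.0):
    """Return (investor payout, trustee payout) for a single round."""
    sent = fi * endowment        # amount the investor sends
    received = mult * sent       # amount that reaches the trustee
    returned = ft * received     # amount the trustee sends back
    investor = endowment - sent + returned
    trustee = received - returned
    return investor, trustee

for fi in (0.2, 0.8):
    for ft in (0.2, 0.8):
        inv, tru = one_round(fi, ft)
        print(f"fi={fi:.1f} ft={ft:.1f} -> investor {inv:.2f}, trustee {tru:.2f}")
```

The total to be shared is `1 + (mult - 1) * fi` regardless of `ft`, so investing more enlarges the pie while `ft` only determines how it is split.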

How does their behavior change if they each have a hidden "guilt" parameter that modulates a cost they incur for inequitable outcomes?
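
One way to read the guilt parameter, matching the `reludiff` term in the notebook's Q-functions, is as a penalty on advantageous inequity: an agent pays `guilt` times the amount by which its own payout exceeds the other player's, and nothing when it is behind. The sketch below is plain Python with a hypothetical helper name `guilt_utility`, shown for the trustee in a single round with no future rounds taken into account.

```python
def guilt_utility(own, other, guilt):
    """Own payout minus guilt times how far ahead of the other player we are."""
    return own - guilt * max(own - other, 0.0)

mult = 3.0
fi = 0.8  # the investor sent the high fraction
for guilt in (0.0, 0.7):
    for ft in (0.2, 0.8):
        trustee = mult * fi * (1 - ft)
        investor = (1 - fi) + mult * fi * ft
        u = guilt_utility(trustee, investor, guilt)
        print(f"guilt={guilt:.1f} ft={ft:.1f} -> trustee utility {u:.3f}")
```

In this one-shot calculation guilt narrows the gap between keeping and returning (1.052 versus 0.48 at guilt 0.7, compared with 1.92 versus 0.48 at guilt 0) but does not reverse it.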

We can model this as an IPOMDP, a multi-agent extension of POMDPs where agents model their uncertainty about each other…

In [1]:
from memo import memo, domain
import jax
import jax.numpy as np

In [2]:
mult = 3.0
F = np.arange(2)  # actions are indices into Fractions
Fractions = np.array([0.2, 0.8])  # fraction sent for each action index

G = np.array([0.0, 0.7])  # guilts

@jax.jit
def payout_investor(fi, ft):  # fi = fraction chosen by investor, ft = fraction chosen by trustee
    return (1 - Fractions[fi]) + mult * Fractions[fi] * Fractions[ft]

@jax.jit
def payout_trustee(fi, ft):
    return mult * Fractions[fi] * (1 - Fractions[ft])

H = domain(  # histories - for simplicity, just 2 rounds of history (i.e. 3 rounds of game)
    i1=len(F), t1=len(F), # 1st round, investor + trustee (i, t)
    i2=len(F), t2=len(F)  # 2nd round, investor + trustee (i, t)
)
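
A history here is a tuple `(i1, t1, i2, t2)` of action indices, and the transition function `Tr` in the next cell overwrites the two slots that belong to the current round. The plain-Python sketch below mimics that with an explicit mixed-radix encoding. The encoding order and the helper names `encode`, `decode`, and `step` are assumptions for illustration only; memo's `domain` may pack indices differently, although the all-zero tuple maps to 0 under any such ordering, which is consistent with `is_init` checking `h == 0`.

```python
SIZES = (2, 2, 2, 2)  # i1, t1, i2, t2: two actions per slot

def encode(z):
    """Pack a tuple of action indices into one integer (first slot most significant)."""
    h = 0
    for value, size in zip(z, SIZES):
        h = h * size + value
    return h

def decode(h):
    """Unpack an integer history back into a tuple of action indices."""
    z = []
    for size in reversed(SIZES):
        z.append(h % size)
        h //= size
    return tuple(reversed(z))

def step(r, h, fi, ft):
    """Write the round-r moves (fi, ft) into slots 2r and 2r+1 of history h."""
    z = list(decode(h))
    z[2 * r] = fi
    z[2 * r + 1] = ft
    return encode(z)

h1 = step(0, 0, fi=1, ft=0)   # round 0: high investment, low return
h2 = step(1, h1, fi=0, ft=1)  # round 1: low investment, high return
print(decode(h1), decode(h2))
```

Four binary slots give 16 possible histories, so every table indexed by `h` has 16 entries along that axis.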

In [3]:
@jax.jit
def reludiff(x, y):
    return np.maximum(x - y, 0)

@jax.jit
def is_init(h):
    return h == 0

@jax.jit
def Tr(r, h, fi, ft, h_):
    # step game: update h to h_ with moves (fi, ft) at round r
    z = H._tuple(h)
    z = np.array(z)
    z = z.at[r * 2].set(fi)
    z = z.at[r * 2 + 1].set(ft)
    return h_ == H(*z)

@memo(cache=True)
def hist[ig: G, tg: G, h: H](r, level, beta):
    # prior over histories h at start of round r >= 1 conditioned on guilts ig, tg
    world: knows(ig, tg)
    # start with h at round r - 1
    world: chooses(h in H, wpp=hist[ig, tg, h](r - 1, level, beta) if r > 1 else is_init(h))
    # investor and trustee make moves
    world: chooses(fi in F, wpp=exp(beta * investor[h, ig, fi](r - 1, level, beta)))
    world: chooses(ft in F, wpp=exp(beta * trustee[h, tg, fi, ft](r - 1, level, beta)))
    # h gets updated
    world: chooses(h_ in H, wpp=Tr(r - 1, h, fi, ft, h_))
    return Pr[world.h_ == h]


@memo(cache=True)
def investor[h: H, ig: G, fi: F](r, level, beta):
    # Q-function for investor conditioned on h, own guilt
    investor: knows(ig)
    investor: thinks[
        trustee: knows(ig),
        trustee: chooses(tg in G, wpp=1),
        trustee: chooses(h in H, wpp=(hist[ig, tg, h](r, level - 1, beta) if r > 0 and level > 0 else is_init(h)) + 1e-3)
    ]
    investor: observes [trustee.h] is h  # now investor has posterior belief over trustee.tg
    investor: knows(fi)
    return investor[
        imagine[  # Q-value having chosen fi
            trustee: knows(fi),
            trustee: chooses(ft in F, wpp=exp(beta * trustee[h, tg, fi, ft](r, level - 1, beta)) if level > 0 else 1),
            trustee: chooses(h_ in H, wpp=Tr(r, h, fi, ft, h_)),
            trustee: chooses(fi_ in F, wpp=exp(beta * investor[h_, ig, fi_](r + 1, level, beta)) if r < 2 and level > 0 else 1),
            E[
                payout_investor(fi, trustee.ft)
                - ig * reludiff(
                    payout_investor(fi, trustee.ft),
                    payout_trustee(fi, trustee.ft)
                )
                + (investor[trustee.h_, ig, trustee.fi_](r + 1, level, beta) if r < 2 and level > 0 else 0)
            ]
        ]
    ]

@memo(cache=True)
def trustee[h: H, tg: G, fi: F, ft: F](r, level, beta):
    # Q-function for trustee conditioned on h, own guilt, investor's most recent move
    trustee: knows(tg)
    trustee: thinks[
        investor: knows(tg),
        investor: chooses(ig in G, wpp=1),
        investor: chooses(h in H, wpp=(hist[ig, tg, h](r, level - 1, beta) if r > 0 and level > 0 else is_init(h)) + 1e-3),
        investor: chooses(fi in F, wpp=exp(beta * investor[h, ig, fi](r, level - 1, beta)) if level > 0 else 1)
    ]
    trustee: observes [investor.h] is h
    trustee: observes [investor.fi] is fi
    trustee: knows(ft)
    return trustee[
        imagine[  # Q-value having chosen ft
            investor: knows(ft),
            investor: chooses(h_ in H, wpp=Tr(r, h, fi, ft, h_)),
            investor: chooses(fi_ in F, wpp=exp(beta * investor[h_, ig, fi_](r + 1, level, beta)) if r < 2 and level > 0 else 1),
            investor: chooses(ft_ in F, wpp=exp(beta * trustee[h_, tg, fi_, ft_](r + 1, level, beta)) if r < 2 and level > 0 else 1),
            E[
            payout_trustee(investor.fi, ft)
            - tg * reludiff(
                payout_trustee(investor.fi, ft),
                payout_investor(investor.fi, ft)
            )
            + (trustee[investor.h_, tg, investor.fi_, investor.ft_](r + 1, level, beta) if r < 2 and level > 0 else 0)
           ]
        ]
    ]

In [4]:
%%time

beta = 5.0
level = 5

inv = investor(r=0, level=level, beta=beta)
inv = np.exp(beta * inv) / np.exp(beta * inv).sum(axis=-1, keepdims=True)

h = 0
print('Investor round 1:')
for gi, g in enumerate(G):
    print(f'  For guilt {g:.02f} offer:')
    for f in F:
        print(f'    {Fractions[f]:.02f} with probability {inv[h, gi, f]:.02f}')

Investor round 1:
  For guilt 0.00 offer:
    0.20 with probability 0.78
    0.80 with probability 0.22
  For guilt 0.70 offer:
    0.20 with probability 0.38
    0.80 with probability 0.62
CPU times: user 1.24 s, sys: 41 ms, total: 1.28 s
Wall time: 746 ms
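
The normalization in the cell above turns Q-values into choice probabilities with a softmax whose inverse temperature is `beta`. As a minimal sketch of the same computation, the plain-Python function below subtracts the maximum before exponentiating, a common trick for numerical stability. The name `softmax_policy` and the Q-values passed to it are made up for illustration and are not outputs of the model.

```python
import math

def softmax_policy(q_values, beta):
    """Return probabilities proportional to exp(beta * q) for each action."""
    m = max(q_values)  # subtracting the max leaves the result unchanged
    weights = [math.exp(beta * (q - m)) for q in q_values]
    total = sum(weights)
    return [w / total for w in weights]

q = [1.0, 1.25]  # hypothetical Q-values for two actions
for beta in (0.0, 5.0, 50.0):
    probs = softmax_policy(q, beta)
    print(f"beta={beta:>4}: " + ", ".join(f"{p:.3f}" for p in probs))
```

At `beta = 0` the choice is uniform, and as `beta` grows the policy concentrates on the action with the higher Q-value.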


In [5]:
%%time
tr = trustee(r=0, level=level, beta=beta)
tr = np.exp(beta * tr) / np.exp(beta * tr).sum(axis=-1, keepdims=True)

h = 0
print('Trustee round 1, after receiving low offer:')
for gi, g in enumerate(G):
    print(f'  For guilt {g:.02f} offer:')
    for f in F:
        print(f'    {Fractions[f]:.02f} with probability {tr[h, gi, 0, f]:.02f}')

print('Trustee round 1, after receiving high offer:')
for gi, g in enumerate(G):
    print(f'  For guilt {g:.02f} offer:')
    for f in F:
        print(f'    {Fractions[f]:.02f} with probability {tr[h, gi, 1, f]:.02f}')

Trustee round 1, after receiving low offer:
  For guilt 0.00 offer:
    0.20 with probability 0.86
    0.80 with probability 0.14
  For guilt 0.70 offer:
    0.20 with probability 0.86
    0.80 with probability 0.14
Trustee round 1, after receiving high offer:
  For guilt 0.00 offer:
    0.20 with probability 1.00
    0.80 with probability 0.00
  For guilt 0.70 offer:
    0.20 with probability 0.94
    0.80 with probability 0.06
CPU times: user 104 ms, sys: 1.92 ms, total: 106 ms
Wall time: 105 ms
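
Because the two guilt types respond differently to a high offer, an observer can update beliefs about the trustee's guilt from a single return. The sketch below applies Bayes' rule in plain Python to the rounded probabilities printed above, starting from a uniform prior over guilt. It only illustrates the inference step, using a hypothetical helper name `posterior`: the model itself does this conditioning inside `investor` via `observes`, using `level - 1` policies and a `1e-3` smoothing term, not these rounded numbers.

```python
def posterior(prior, likelihood):
    """Bayes' rule over a discrete set of types."""
    joint = [p * l for p, l in zip(prior, likelihood)]
    total = sum(joint)
    return [j / total for j in joint]

guilts = [0.0, 0.7]
prior = [0.5, 0.5]  # uniform prior over the two guilt values

# P(trustee returns this fraction | guilt) after a high offer, as printed above
likelihood = {
    0.2: [1.00, 0.94],
    0.8: [0.00, 0.06],
}

for returned, lik in likelihood.items():
    post = posterior(prior, lik)
    print(f"returned {returned:.1f}: " +
          ", ".join(f"P(guilt={g:.1f})={p:.2f}" for g, p in zip(guilts, post)))
```

A low return is only weakly informative, while a high return points almost entirely to the guilt 0.7 type. The printed 0.00 is rounded, so the true posterior is close to, but not exactly, 1.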
