# ZI Agent Equilibrium Experiment

Find Nash equilibrium strategies among zero-intelligence (ZI) agents by testing all 10×10 (background, deviator) strategy pairs.

**Setup:**
- **16 agents total:** 15 background agents (all playing the same strategy) + 1 deviator
- **10 strategies** parameterized by `shade` (pricing offset range) and `eta` (order improvement aggressiveness)
- **100 runs × 1,000 time steps** per (background, deviator) strategy pair → 10 × 10 × 100 = 10,000 total simulations
- **Output:** 10×10 advantage matrix where cell [BG=i, Dev=j] = mean(deviator profit) − mean(background profit)

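A strategy **S\*** is a **Nash equilibrium candidate** if, when all 15 background agents play S\*, the deviator's best response (the column with the largest advantage in row S\*) is S\* itself; that is, no deviation earns a higher advantage than playing S\*. It is only a *candidate* because every cell is a sample mean over 100 noisy runs.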
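Written out with the advantage matrix from the setup (the symbols below are our own notation, not the paper's), the candidate test and the "gap" reported in Section 6c are:

$$
A_{ij} = \overline{\pi}^{\,\text{dev}}_{ij} - \overline{\pi}^{\,\text{bg}}_{ij}, \qquad
S_i \text{ is a candidate} \iff A_{ii} \ge A_{ij} \;\; \forall j, \qquad
\varepsilon_i = \max_j A_{ij} - A_{ii} \;\ge\; 0 .
$$

When the deviator plays the background strategy, all 16 agents are identical in distribution, so $\mathbb{E}[A_{ii}] = 0$; in practice the test asks whether every deviation's advantage is at or below the (noisy, near-zero) diagonal entry.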

## 1. Imports

In [1]:
import numpy as np
import pandas as pd
import multiprocessing as mp
import platform
import warnings
warnings.filterwarnings('ignore')

from tqdm.auto import tqdm
from IPython.display import display

from marketsim.simulator.simulator import Simulator
from marketsim.agent.zero_intelligence_agent import ZIAgent

## 2. Strategy Definitions

Each strategy is a `(shade, eta)` pair:
- **`shade = [lo, hi]`**: the agent draws a uniform offset from `[lo, hi]` and places its limit order that far from its valuation (fair-value estimate plus private value), below it when buying and above it when selling. Larger shade values → more conservative (the agent demands more surplus per trade); a wider range only makes the demanded surplus more variable.
- **`eta`**: order-taking threshold. If `eta < 1`, the agent takes the best counterparty quote whenever trading at that quote still yields at least `eta × offset` of surplus relative to its valuation. `eta = 1.0` means no improvement (strict limit orders).
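The pricing rule described above can be sketched as a single function. This is a hypothetical illustration of the shade/eta logic under the assumptions stated in the list, not `marketsim`'s `ZIAgent` code; the name `zi_order_price` and its arguments are invented for this example.

```python
import random

def zi_order_price(valuation, is_buy, shade, eta, best_opposite=None, rng=random):
    """Hypothetical sketch of a ZI agent's order price (not marketsim's implementation).

    valuation     -- fair-value estimate plus private value
    shade         -- [lo, hi] range from which the demanded surplus is drawn
    eta           -- take the opposite quote if it still gives >= eta * offset surplus
    best_opposite -- best ask (for a buyer) or best bid (for a seller), or None
    """
    offset = rng.uniform(shade[0], shade[1])  # surplus demanded on this order
    if eta < 1.0 and best_opposite is not None:
        # Surplus the agent would capture by trading at the existing quote
        surplus = valuation - best_opposite if is_buy else best_opposite - valuation
        if surplus >= eta * offset:
            return best_opposite  # take the quote instead of posting a new order
    # Otherwise post a limit order shaded away from the valuation
    return valuation - offset if is_buy else valuation + offset

# Buyer with valuation 100,000 and a fixed offset of 100 (shade=[100, 100])
print(zi_order_price(100_000, True, [100, 100], 0.5, best_opposite=99_940))  # takes the ask
print(zi_order_price(100_000, True, [100, 100], 1.0, best_opposite=99_940))  # posts its own bid
```

With `eta = 0.5`, the ask at 99,940 gives the buyer 60 of surplus, which clears the 50 threshold, so the buyer takes it; with `eta = 1.0` the check is skipped and the buyer posts at 99,900.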

In [2]:
STRATEGIES = {
    0: {'shade': [0,   450],  'eta': 0.5},
    1: {'shade': [0,   600],  'eta': 0.5},
    2: {'shade': [90,  110],  'eta': 0.5},
    3: {'shade': [140, 160],  'eta': 0.5},
    4: {'shade': [190, 210],  'eta': 0.5},
    5: {'shade': [280, 320],  'eta': 0.5},
    6: {'shade': [380, 420],  'eta': 0.5},
    7: {'shade': [380, 420],  'eta': 1.0},
    8: {'shade': [460, 540],  'eta': 0.5},
    9: {'shade': [950, 1050], 'eta': 0.5},
}

# Human-readable labels used in summary tables
DESC = {
    0: 'shade=[0,450]   η=0.5',
    1: 'shade=[0,600]   η=0.5',
    2: 'shade=[90,110]  η=0.5',
    3: 'shade=[140,160] η=0.5',
    4: 'shade=[190,210] η=0.5',
    5: 'shade=[280,320] η=0.5',
    6: 'shade=[380,420] η=0.5',
    7: 'shade=[380,420] η=1.0',
    8: 'shade=[460,540] η=0.5',
    9: 'shade=[950,1050] η=0.5',
}

N_STRATEGIES = len(STRATEGIES)
print(f"{N_STRATEGIES} strategies defined:")
for k, v in STRATEGIES.items():
    print(f"  S{k}: shade={v['shade']}  eta={v['eta']}")

10 strategies defined:
  S0: shade=[0, 450]  eta=0.5
  S1: shade=[0, 600]  eta=0.5
  S2: shade=[90, 110]  eta=0.5
  S3: shade=[140, 160]  eta=0.5
  S4: shade=[190, 210]  eta=0.5
  S5: shade=[280, 320]  eta=0.5
  S6: shade=[380, 420]  eta=0.5
  S7: shade=[380, 420]  eta=1.0
  S8: shade=[460, 540]  eta=0.5
  S9: shade=[950, 1050]  eta=0.5


## 3. Market Environment Parameters

Environment B from the paper. Modify these to test different market conditions.

In [3]:
ENV = {
    'lam':       0.005,   # arrival intensity — prob. each agent acts per time step
    'mean':      1e5,     # long-run fundamental value
    'r':         0.01,    # mean-reversion speed (kappa)
    'shock_var': 1e6,     # variance of fundamental shocks
    'pv_var':    5e6,     # private value variance
    'q_max':     10,      # maximum agent position
    'sim_time':  1000,    # time steps per simulation run
    'n_bg':      15,      # number of background agents
}

NUM_RUNS = 100  # repetitions per (bg, dev) cell

total_sims = N_STRATEGIES ** 2 * NUM_RUNS
print(f"Environment : lam={ENV['lam']}, mean={ENV['mean']:.0f}, r={ENV['r']}")
print(f"Agents      : 1 deviator + {ENV['n_bg']} background = {ENV['n_bg']+1} total")
print(f"Simulations : {N_STRATEGIES}x{N_STRATEGIES} cells × {NUM_RUNS} runs = {total_sims:,} total")

Environment : lam=0.005, mean=100000, r=0.01
Agents      : 1 deviator + 15 background = 16 total
Simulations : 10x10 cells × 100 runs = 10,000 total
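To get a feel for what `mean`, `r`, `shock_var`, and `lam` control, here is a minimal sketch of a mean-reverting fundamental. It *assumes* the common form $f_t = \max(0,\ r \cdot \text{mean} + (1-r) f_{t-1} + u_t)$ with $u_t \sim \mathcal{N}(0, \text{shock\_var})$; `simulate_fundamental` is a hypothetical helper, not part of `marketsim`, and the simulator's exact process may differ.

```python
import numpy as np

def simulate_fundamental(mean, r, shock_var, sim_time, f0=None, rng=None):
    """Hypothetical mean-reverting fundamental path (assumed form, not marketsim's code).

    f_t = max(0, r * mean + (1 - r) * f_{t-1} + u_t),  u_t ~ N(0, shock_var)
    """
    rng = np.random.default_rng() if rng is None else rng
    f = np.empty(sim_time + 1)
    f[0] = mean if f0 is None else f0
    shocks = rng.normal(0.0, np.sqrt(shock_var), size=sim_time)
    for t in range(1, sim_time + 1):
        # Pull a fraction r of the way back toward the long-run mean, then add a shock
        f[t] = max(0.0, r * mean + (1 - r) * f[t - 1] + shocks[t - 1])
    return f

# Environment B values from ENV above
path = simulate_fundamental(mean=1e5, r=0.01, shock_var=1e6, sim_time=1000,
                            rng=np.random.default_rng(0))
print(f"final fundamental: {path[-1]:,.0f}")

# If each agent acts with probability lam per step, it expects lam * sim_time arrivals
print(f"expected arrivals per agent per run: {0.005 * 1000:.1f}")
```

Under the arrival interpretation in the `lam` comment, each agent acts only about five times per 1,000-step run, which is one reason per-run profits are noisy.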


## 4. Simulation Functions

`_build_and_run_sim` runs a single simulation and returns the deviator profit and mean background profit.

`_run_cell` repeats this `num_runs` times for one (bg, dev) cell and returns the means.

> **Note on multiprocessing:** on macOS, this uses `fork`-based multiprocessing so the worker function can be defined here in the notebook. On other platforms it falls back to sequential execution.
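One caveat with `fork`: every worker starts with a copy of the parent's NumPy global RNG state, so if the simulator draws from `np.random`, cells handled by different workers can replay the same random stream. A reproducible alternative is to attach an independent seed to each cell with NumPy's `SeedSequence.spawn`. The sketch below is hypothetical (`make_seeded_tasks` and `seed_worker_rng` are not in the notebook) and assumes the simulator uses NumPy's global RNG.

```python
import numpy as np

def make_seeded_tasks(n_strategies, num_runs, base_seed=2024):
    """Hypothetical task list with one independent, reproducible seed per cell."""
    children = np.random.SeedSequence(base_seed).spawn(n_strategies * n_strategies)
    tasks = []
    for k, child in enumerate(children):
        bg, dev = divmod(k, n_strategies)
        # One uint32 word from the child sequence, usable as a legacy np.random seed
        seed = int(child.generate_state(1)[0])
        tasks.append((bg, dev, num_runs, seed))
    return tasks

def seed_worker_rng(seed):
    """Call at the top of a (hypothetical) seeded _run_cell before any simulation."""
    np.random.seed(seed)  # assumes the simulator draws from NumPy's global RNG

tasks = make_seeded_tasks(n_strategies=10, num_runs=100)
print(len(tasks), tasks[0][:3], tasks[-1][:3])
```

Because each seed is tied to its cell rather than to a worker, results no longer depend on which process happens to pick up which cell, and rerunning with the same `base_seed` reproduces the sweep.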

In [4]:
def _build_and_run_sim(bg_shade, bg_eta, dev_shade, dev_eta):
    """Build and run a single simulation. Returns (dev_profit, mean_bg_profit)."""
    sim = Simulator(
        num_background_agents=0,
        sim_time=ENV['sim_time'],
        num_assets=1,
        lam=ENV['lam'],
        mean=ENV['mean'],
        r=ENV['r'],
        shock_var=ENV['shock_var'],
        q_max=ENV['q_max'],
        pv_var=ENV['pv_var'],
    )
    sim.agents = {}

    # Agent 0: deviator
    sim.agents[0] = ZIAgent(
        agent_id=0, market=sim.markets[0],
        q_max=ENV['q_max'], shade=dev_shade, eta=dev_eta, pv_var=ENV['pv_var'],
    )
    # Agents 1–n_bg: background (all same strategy)
    for i in range(1, ENV['n_bg'] + 1):
        sim.agents[i] = ZIAgent(
            agent_id=i, market=sim.markets[0],
            q_max=ENV['q_max'], shade=bg_shade, eta=bg_eta, pv_var=ENV['pv_var'],
        )

    sim.run()
    fv = sim.markets[0].get_final_fundamental()

    def profit(agent):
        return agent.get_pos_value() + agent.position * fv + agent.cash

    dev_p = profit(sim.agents[0])
    bg_p  = float(np.mean([profit(sim.agents[i]) for i in range(1, ENV['n_bg'] + 1)]))
    return dev_p, bg_p


def _run_cell(args):
    """Worker: run num_runs simulations for one (bg_idx, dev_idx) cell."""
    bg_idx, dev_idx, num_runs = args
    bg  = STRATEGIES[bg_idx]
    dev = STRATEGIES[dev_idx]
    dev_profits, bg_profits = [], []
    for _ in range(num_runs):
        d, b = _build_and_run_sim(bg['shade'], bg['eta'], dev['shade'], dev['eta'])
        dev_profits.append(d)
        bg_profits.append(b)
    return bg_idx, dev_idx, float(np.mean(dev_profits)), float(np.mean(bg_profits))


print('Simulation functions defined.')

Simulation functions defined.


In [5]:
def run_equilibrium_experiment(num_runs=NUM_RUNS, n_processes=None):
    """
    Sweep all N×N (background, deviator) strategy pairs.

    Returns
    -------
    pd.DataFrame of shape (N_STRATEGIES, N_STRATEGIES)
        df.loc['Si', 'Sj'] = mean(deviator profit) - mean(background profit)
        when background plays strategy i and deviator plays strategy j.
    """
    n = N_STRATEGIES
    tasks = [(bg, dev, num_runs) for bg in range(n) for dev in range(n)]
    advantage = np.zeros((n, n))

    on_mac = platform.system() == 'Darwin'
    if on_mac and n_processes != 1:
        if n_processes is None:
            n_processes = mp.cpu_count()
        print(f"Running on {n_processes} cores (fork) — {len(tasks) * num_runs:,} simulations...")
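        ctx = mp.get_context('fork')
        # Forked workers inherit the parent's NumPy global RNG state. Reseed each
        # worker from OS entropy (np.random.seed() with no argument) so cells run
        # by different workers do not replay identical random streams; this
        # assumes the simulator draws from NumPy's global RNG.
        with ctx.Pool(n_processes, initializer=np.random.seed) as pool: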
            for bg_idx, dev_idx, dev_mean, bg_mean in tqdm(
                pool.imap_unordered(_run_cell, tasks),
                total=len(tasks), desc='cells',
            ):
                advantage[bg_idx, dev_idx] = dev_mean - bg_mean
    else:
        print(f"Running sequentially — {len(tasks) * num_runs:,} simulations...")
        for args in tqdm(tasks, desc='cells'):
            bg_idx, dev_idx, dev_mean, bg_mean = _run_cell(args)
            advantage[bg_idx, dev_idx] = dev_mean - bg_mean

    labels = [f'S{i}' for i in range(n)]
    return pd.DataFrame(
        advantage,
        index=pd.Index(labels, name='BG Strategy'),
        columns=pd.Index(labels, name='Deviator'),
    )


print('Experiment runner defined.')

Experiment runner defined.


## 5. Run the Experiment

This is the slow cell. Run it once and use the display cells below to explore results without repeating the computation.

In [6]:
df = run_equilibrium_experiment()
print(f"\nComplete. Result shape: {df.shape}")

Running on 12 cores (fork) — 10,000 simulations...




Complete. Result shape: (10, 10)


## 6. Results

### 6a. Profit Advantage Matrix

Each cell shows **mean(deviator profit) − mean(background profit)**.
- **Green** → deviator earns more than the background agents
- **Red** → deviator earns less
- **Bold + underline** → best deviation strategy for that row (what a rational deviator would choose)

Rows are the background strategy; columns are the deviator's strategy.
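Each cell is a mean over 100 runs, so a small green or red value may just be noise. Because the deviator's profit and the background mean come from the same run, they can be differenced per run: the mean of those differences equals the cell value, and their standard error gives a rough yardstick. A sketch, assuming the per-run lists built inside `_run_cell` (`dev_profits`, `bg_profits`) were returned rather than only their means; `advantage_with_se` is a hypothetical helper.

```python
import numpy as np

def advantage_with_se(dev_profits, bg_profits):
    """Mean advantage and its standard error from per-run paired differences."""
    diff = np.asarray(dev_profits, dtype=float) - np.asarray(bg_profits, dtype=float)
    mean = diff.mean()                             # same as mean(dev) - mean(bg)
    se = diff.std(ddof=1) / np.sqrt(len(diff))     # sample SD of differences / sqrt(n)
    return mean, se

# Illustrative numbers only (not experiment output)
mean, se = advantage_with_se([10, 12, 14, 16], [10, 10, 10, 10])
print(f"advantage = {mean:.2f} ± {se:.2f} (1 SE)")
```

As a rough rule, a cell whose advantage is within about two standard errors of zero is hard to distinguish from "no advantage", which matters when the best deviation in a row beats the diagonal by only a small margin.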

In [None]:
# ── Results Setup ─────────────────────────────────────────────────────────────
# Makes section 6 independently runnable after a kernel restart.
# Loads df from the saved CSV if it is not already in memory.
import os
import pandas as pd
import numpy as np

if 'df' not in vars():
    csv_path = 'equilibrium_results.csv'
    if not os.path.exists(csv_path):
        raise FileNotFoundError(f'{csv_path} not found. Run Section 5, then save the results with Section 7.')
    df = pd.read_csv(csv_path, index_col=0)
    df.index.name = 'BG Strategy'
    df.columns.name = 'Deviator'
    print(f'Loaded df from {csv_path}')
else:
    print('Using df already in memory.')

if 'DESC' not in vars():
    DESC = {
        0: 'shade=[0,450]   η=0.5', 1: 'shade=[0,600]   η=0.5',
        2: 'shade=[90,110]  η=0.5', 3: 'shade=[140,160] η=0.5',
        4: 'shade=[190,210] η=0.5', 5: 'shade=[280,320] η=0.5',
        6: 'shade=[380,420] η=0.5', 7: 'shade=[380,420] η=1.0',
        8: 'shade=[460,540] η=0.5', 9: 'shade=[950,1050] η=0.5',
    }

if 'STRATEGIES' not in vars():
    STRATEGIES = {
        0: {'shade': [0,   450], 'eta': 0.5}, 1: {'shade': [0,   600], 'eta': 0.5},
        2: {'shade': [90,  110], 'eta': 0.5}, 3: {'shade': [140, 160], 'eta': 0.5},
        4: {'shade': [190, 210], 'eta': 0.5}, 5: {'shade': [280, 320], 'eta': 0.5},
        6: {'shade': [380, 420], 'eta': 0.5}, 7: {'shade': [380, 420], 'eta': 1.0},
        8: {'shade': [460, 540], 'eta': 0.5}, 9: {'shade': [950, 1050], 'eta': 0.5},
    }

print(f'df shape: {df.shape}')


In [8]:
def style_advantage_table(df):
    max_val = df.abs().max().max()
    caption = (
        "Profit Advantage Matrix — mean(deviator profit) \u2212 mean(background profit)  |  "
        "Green\u2009=\u2009deviator earns more  \u00b7  Red\u2009=\u2009deviator earns less  \u00b7  "
        "Bold+underline\u2009=\u2009best deviation per row"
    )
    return (
        df.style
        .background_gradient(cmap='RdYlGn', axis=None, vmin=-max_val, vmax=max_val)
        .highlight_max(axis=1, props='font-weight: bold; text-decoration: underline')
        .format('{:.0f}')
        .set_caption(caption)
        .set_table_styles([
            {'selector': 'caption', 'props': [
                ('font-size', '13px'), ('font-weight', 'bold'),
                ('text-align', 'left'), ('margin-bottom', '10px'), ('color', '#333'),
            ]},
            {'selector': 'th', 'props': [
                ('font-size', '11px'), ('text-align', 'center'),
                ('padding', '5px 10px'), ('border', '1px solid #ccc'),
                ('background-color', '#f5f5f5'),
            ]},
            {'selector': 'td', 'props': [
                ('text-align', 'center'), ('padding', '5px 10px'),
                ('font-size', '11px'), ('border', '1px solid #e0e0e0'),
            ]},
        ])
    )

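# pandas' Styler requires the optional jinja2 package (pip install jinja2);
# fall back to a plain rounded table when it is missing.
try:
    from IPython.display import display
    display(style_advantage_table(df))
except (AttributeError, ImportError):
    print(df.round(0).to_string())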

### 6b. Best Deviation Summary

For each background strategy, which strategy does the deviator prefer? Rows highlighted in green are Nash equilibrium candidates.
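The row-wise logic behind this table and Section 6c can be checked on a small made-up matrix. `ne_candidates` below is a hypothetical helper that mirrors what the notebook does with `idxmax`, `max`, and the diagonal; the 3×3 numbers are invented for illustration.

```python
import numpy as np
import pandas as pd

def ne_candidates(adv):
    """Labels whose row maximum sits on the diagonal, plus each row's gap.

    gap_i = max_j A[i, j] - A[i, i]; a gap of 0 means no deviation beats playing S_i.
    Note: idxmax returns the first column on ties.
    """
    best = adv.idxmax(axis=1)
    diag = pd.Series(np.diag(adv.to_numpy()), index=adv.index)
    gap = adv.max(axis=1) - diag
    return [lbl for lbl in adv.index if best[lbl] == lbl], gap

# Toy advantage matrix (rows = background, columns = deviator); made-up values
toy = pd.DataFrame(
    [[0.0, -5.0, 3.0],
     [2.0, 0.0, -1.0],
     [-4.0, -2.0, 0.0]],
    index=['S0', 'S1', 'S2'], columns=['S0', 'S1', 'S2'],
)
cands, gap = ne_candidates(toy)
print(cands)           # ['S2']
print(gap.to_dict())   # {'S0': 3.0, 'S1': 2.0, 'S2': 0.0}
```

Here S0 and S1 each have a profitable deviation (to S2 and S0 respectively), while in row S2 every deviation earns a negative advantage, so S2 is the only candidate.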

In [None]:
def make_summary_table(df):
    best_label = df.idxmax(axis=1)   # e.g. 'S3' — best deviator strategy per row
    best_adv   = df.max(axis=1)      # profit advantage of that best deviation

    rows = []
    for bg_label in df.index:
        bg_i  = int(bg_label[1:])
        dev_label = best_label[bg_label]
        dev_i = int(dev_label[1:])
        adv   = best_adv[bg_label]
        is_ne = (bg_label == dev_label)
        rows.append({
            'BG Strategy': bg_label,
            'BG Params':   DESC[bg_i],
            'Best Deviation': dev_label,
            'Dev Params':  DESC[dev_i],
            'Advantage':   round(adv, 1),
            'Nash Eq?':    '\u2713 NE' if is_ne else '',
        })

    summary = pd.DataFrame(rows).set_index('BG Strategy')

    def highlight_ne(row):
        style = 'background-color: #c8f5c8; font-weight: bold' if row['Nash Eq?'] else ''
        return [style] * len(row)

    return (
        summary.style
        .apply(highlight_ne, axis=1)
        .format({'Advantage': '{:.1f}'})
        .set_caption('Best Deviator Strategy per Background Strategy  (green = Nash equilibrium candidate)')
        .set_table_styles([
            {'selector': 'caption', 'props': [
                ('font-size', '13px'), ('font-weight', 'bold'),
                ('text-align', 'left'), ('margin-bottom', '10px'),
            ]},
            {'selector': 'th', 'props': [
                ('font-size', '11px'), ('text-align', 'center'),
                ('padding', '5px 12px'), ('border', '1px solid #ccc'),
                ('background-color', '#f5f5f5'),
            ]},
            {'selector': 'td', 'props': [
                ('font-size', '11px'), ('text-align', 'center'),
                ('padding', '5px 12px'), ('border', '1px solid #e0e0e0'),
            ]},
        ])
    )

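# Styled output needs jinja2; otherwise print an unstyled best-deviation summary.
try:
    from IPython.display import display
    display(make_summary_table(df))
except (AttributeError, ImportError):
    print(pd.DataFrame({
        'Best Deviation': df.idxmax(axis=1),
        'Advantage': df.max(axis=1).round(1),
    }).to_string())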

### 6c. Nash Equilibrium Analysis

In [None]:
best_label = df.idxmax(axis=1)
ne_candidates = [label for label in df.index if best_label[label] == label]

print('=' * 60)
print('NASH EQUILIBRIUM ANALYSIS')
print('=' * 60)

if ne_candidates:
    print(f'\n{len(ne_candidates)} Nash equilibrium candidate(s) found:\n')
    for label in ne_candidates:
        i = int(label[1:])
        s = STRATEGIES[i]
        adv = df.loc[label, label]
        print(f'  {label}  shade={s["shade"]}  eta={s["eta"]}')
        print(f'       Best deviation = {label} (same strategy, advantage = {adv:.1f})')
else:
    print('\nNo pure-strategy Nash equilibrium found among the 10 strategies.')
    print('\nStrategies ranked by how close they are to being equilibria')
    print('(smallest gap between best-deviation advantage and own-strategy advantage):\n')
    best_adv  = df.max(axis=1)
    diag_adv  = pd.Series({label: df.loc[label, label] for label in df.index})
    gap = best_adv - diag_adv
    gap_df = pd.DataFrame({
        'Strategy':      gap.index,
        'Params':        [DESC[int(l[1:])] for l in gap.index],
        'Own Adv':       [round(diag_adv[l], 1) for l in gap.index],
        'Best Dev Adv':  [round(best_adv[l], 1)  for l in gap.index],
        'Gap':           [round(g, 1) for g in gap.values],
    }).sort_values('Gap').reset_index(drop=True)
    print(gap_df.to_string(index=False))

print('\n' + '=' * 60)

## 7. Save Results

In [None]:
out_path = 'equilibrium_results.csv'
df.to_csv(out_path)
print(f'Advantage matrix saved to {out_path}')
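The results-setup cell in Section 6 reloads this CSV and re-applies the axis names, because the CSV keeps the row labels but not the column-axis name (`'Deviator'`). A minimal round-trip sketch of that save/load pair, with hypothetical helper names and a toy 2×2 matrix:

```python
import os
import tempfile
import pandas as pd

def save_advantage(df, path):
    """Write the advantage matrix; row labels (S0, S1, ...) become the first column."""
    df.to_csv(path)

def load_advantage(path):
    """Reload a saved advantage matrix and restore the axis names."""
    loaded = pd.read_csv(path, index_col=0)
    loaded.index.name = 'BG Strategy'
    loaded.columns.name = 'Deviator'
    return loaded

labels = ['S0', 'S1']
toy = pd.DataFrame(
    [[0.0, 1.5], [-2.25, 0.0]],
    index=pd.Index(labels, name='BG Strategy'),
    columns=pd.Index(labels, name='Deviator'),
)
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, 'equilibrium_results.csv')
    save_advantage(toy, path)
    restored = load_advantage(path)
print(restored)
```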