# LQ Game: Basics

This notebook demonstrates how to build a two-player LQ game from scratch and points to the available factory methods along the way.

## 1. Imports
To make the modules in `src/` importable from this Jupyter notebook, we add the root path of the repository to the Python path.  
Furthermore, we use `%matplotlib inline` so that plots are displayed inside the notebook instead of in separate windows.

In [None]:
import numpy as np
import sys
import os
import matplotlib.pyplot as plt
# Get the absolute root path of this repository
root_path = os.path.abspath(os.path.join(os.getcwd(), "../.."))
# Add it to the Python path
if root_path not in sys.path:
    sys.path.insert(0, root_path)

from src.gamecore import (
    LinearSystem,
    QuadraticCost,
    LinearStrategy,
    LQPlayer,
    LQGame,
    SystemTrajectory,
    feedback_nash_equilibrium
)
%matplotlib inline

## 2. Define the system dynamics
A linear system is defined by:
- the state matrix A (shared among all players),
- and one input matrix B_i per player.
 
The constructor `LinearSystem` expects the input matrices to be given as a list:
    Bs = [B_1, B_2, ..., B_N]
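
As a minimal sketch of what these matrices describe, the snippet below evaluates the state derivative, assuming the standard continuous-time form ẋ = A x + Σ_i B_i u_i. The helper `state_derivative` and the input values are illustrative only and are not part of `src.gamecore`.

```python
import numpy as np

# Same matrices as used later in this notebook
A = np.array([[0.0, 1.0], [-1.0, -0.5]])
Bs = [np.array([[0.0], [1.0]]), np.array([[0.0], [0.5]])]


def state_derivative(x, us):
    """Hypothetical helper: x_dot = A x + sum_i B_i u_i (assumed standard form)."""
    return A @ x + sum(B_i @ u_i for B_i, u_i in zip(Bs, us))


x = np.array([1.0, 0.0])                    # current state
us = [np.array([0.2]), np.array([-0.4])]    # one input vector per player
x_dot = state_derivative(x, us)
print(x_dot)
```

Each entry of `Bs` must have as many rows as `A`, while the number of columns may differ between players.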

This codebase uses the naming conventions:
- a trailing 's' (like `Bs`, `costs`, `players`) indicates a list of objects of the respective singular form (`B`, `cost`, `player`).
- a trailing '_i' (like `B_i`, `Q_i`, `R_i`) indicates a quantity belonging to the player under consideration.
- a trailing '_j' or '_k' (like `B_j`, `Q_j`, `R_ijk`) refers to the *other* players from the perspective of player `i`.
- indices start at 0, so in general i, j, k ∈ {0, 1, ..., N-1}.

**Tip**:  
For convenience (or randomized experiments), you can use several factory methods to generate a system.
Have a look at [src/utils/core_factories/system_factory.py](../../src/utils/core_factories/system_factory.py)

In [None]:
A = np.array([[0, 1], [-1, -0.5]])
B0, B1 = np.array([[0], [1]]), np.array([[0], [0.5]])
Bs = [B0, B1]

system = LinearSystem(A=A, Bs=Bs)

## 3. Define the cost functions
Each player i seeks to minimize an infinite-horizon quadratic cost of the form:

    J_i = ∫ [ xᵀ Q_i x + ∑_{j,k} u_jᵀ R_ijk u_k ] dt

Here, Q_i penalizes state deviations, and R_ijk weights the product of the control inputs of players j and k in the cost of player i.  
The R_ijk matrices are passed as a dictionary R_i = {(j, k): R_ijk}, whose keys use zero-based player indices.  
The game constructor later checks that R_iii is positive definite, that Q_i is positive semidefinite, and that Q_i and all R_ijj are symmetric (the cross terms R_ijk with j != k are unrestricted).
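
To make the dictionary convention concrete, the following sketch evaluates the integrand of J_0 for one state/input sample and performs the kind of definiteness checks described above. The helpers `running_cost`, `is_symmetric` and `is_positive_definite` are hypothetical and only illustrate the idea; they are not the checks implemented in the game constructor.

```python
import numpy as np

Q0 = np.eye(2)
R0 = {(0, 0): 1.0 * np.eye(1), (1, 1): 0.1 * np.eye(1)}  # keys are (j, k)


def running_cost(x, us, Q, R):
    """Integrand of J_i: x^T Q x + sum over (j, k) of u_j^T R_jk u_k."""
    cost = x @ Q @ x
    for (j, k), R_jk in R.items():
        cost += us[j] @ R_jk @ us[k]
    return float(cost)


def is_symmetric(M):
    return np.allclose(M, M.T)


def is_positive_definite(M):
    # eigvalsh assumes a symmetric matrix, so check symmetry first
    return is_symmetric(M) and bool(np.all(np.linalg.eigvalsh(M) > 0))


x = np.array([1.0, 2.0])
us = [np.array([0.5]), np.array([-1.0])]
stage_cost = running_cost(x, us, Q0, R0)

print(stage_cost)
print(is_symmetric(Q0), is_positive_definite(R0[(0, 0)]))
```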

**Tip**:  
For convenience (or randomized experiments), you can use several factory methods to generate cost functions.
Have a look at [src/utils/core_factories/cost_factory.py](../../src/utils/core_factories/cost_factory.py)

In [None]:
Q0 = np.eye(2)
Q1 = np.diag([10.0, 1.0])

R000 = 1.0 * np.eye(1)
R011 = 0.1 * np.eye(1)
R100 = 0.2 * np.eye(1)
R111 = 1.0 * np.eye(1)

R0 = {(0, 0): R000, (1, 1): R011}
R1 = {(0, 0): R100, (1, 1): R111}

cost0 = QuadraticCost(Q=Q0, R=R0)
cost1 = QuadraticCost(Q=Q1, R=R1)

## 4. Define policies for both players
Each player’s strategy is a linear state-feedback controller:
    u_i(t) = -K_i x(t)

The gain matrix K_i is wrapped in a `LinearStrategy` object.
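
With u_i(t) = -K_i x(t) for every player, the closed-loop dynamics become ẋ = (A - Σ_i B_i K_i) x. The sketch below builds this matrix with plain NumPy for the gains used in this notebook and inspects its eigenvalues; it does not use `LinearStrategy` and only illustrates the algebra.

```python
import numpy as np

A = np.array([[0.0, 1.0], [-1.0, -0.5]])
Bs = [np.array([[0.0], [1.0]]), np.array([[0.0], [0.5]])]
Ks = [np.array([[1.0, 0.0]]), np.array([[0.0, 1.0]])]

# Closed-loop matrix A_cl = A - sum_i B_i K_i
A_cl = A - sum(B_i @ K_i for B_i, K_i in zip(Bs, Ks))

# The closed loop is asymptotically stable if all eigenvalues
# have strictly negative real parts
eigenvalues = np.linalg.eigvals(A_cl)
is_stable = bool(np.all(eigenvalues.real < 0))

print(A_cl)
print(eigenvalues, is_stable)
```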

**Tip**:  
For convenience (or randomized experiments), you can use several factory methods to generate policies.
Have a look at [src/utils/core_factories/strategy_factory.py](../../src/utils/core_factories/strategy_factory.py)

In [None]:
K0 = np.array([[1.0, 0.0]])
K1 = np.array([[0.0, 1.0]])

strategy0 = LinearStrategy(K=K0)
strategy1 = LinearStrategy(K=K1)

## 5. Create player objects
Each player combines a strategy and a cost function. In addition, it records its strategies in a `StrategyTrajectory`, initialized with the given strategy, so that changes are tracked when the player adapts or learns.  
You can create players directly or use one of the factory methods in [src/utils/core_factories/player_factory.py](../../src/utils/core_factories/player_factory.py), which also handle the cost and strategy creation.

In [None]:
player0 = LQPlayer(strategy=strategy0, cost=cost0, player_idx=0)
player1 = LQPlayer(strategy=strategy1, cost=cost1, player_idx=1)

## 6. Create the game
The game couples the system dynamics with all player objects. The players are expected to be given as a list. The game supports continuous-time system dynamics (`type="differential"`) and discrete-time system dynamics (`type="dynamic"`).

Of course, there are also factory methods for the game object, covering all previous steps: [src/utils/core_factories/game_factory.py](../../src/utils/core_factories/game_factory.py)

In [None]:
game = LQGame(system=system, players=[player0, player1], type="differential")

## 7. Simulate the game from an initial state
The game integrates the system forward in time using the strategies of the players.  
It returns a [`SystemTrajectory`](../src/core/trajectory.py) object containing x(t), u(t), and a list of cost values, one per player, each representing the total cost induced by these trajectories.
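
For intuition, a trajectory-based cost can be sketched with plain NumPy: integrate the closed loop with a fixed-step Runge-Kutta scheme and accumulate each player's running cost as an extra state. This is an illustrative assumption about how such a cost can be obtained, not a description of how `simulate_system` is implemented; the helpers `rhs` and `rk4_step` are hypothetical.

```python
import numpy as np

A = np.array([[0.0, 1.0], [-1.0, -0.5]])
Bs = [np.array([[0.0], [1.0]]), np.array([[0.0], [0.5]])]
Ks = [np.array([[1.0, 0.0]]), np.array([[0.0, 1.0]])]
Qs = [np.eye(2), np.diag([10.0, 1.0])]
Rs = [
    {(0, 0): 1.0 * np.eye(1), (1, 1): 0.1 * np.eye(1)},
    {(0, 0): 0.2 * np.eye(1), (1, 1): 1.0 * np.eye(1)},
]

A_cl = A - sum(B @ K for B, K in zip(Bs, Ks))
# With u_j = -K_j x, the running cost of player i is x^T M_i x,
# where M_i = Q_i + sum_{j,k} K_j^T R_ijk K_k
Ms = [
    Q + sum(Ks[j].T @ R_jk @ Ks[k] for (j, k), R_jk in R.items())
    for Q, R in zip(Qs, Rs)
]


def rhs(z):
    """Augmented dynamics: state x followed by one accumulated cost per player."""
    x = z[:2]
    return np.concatenate([A_cl @ x, [x @ M @ x for M in Ms]])


def rk4_step(z, dt):
    k1 = rhs(z)
    k2 = rhs(z + 0.5 * dt * k1)
    k3 = rhs(z + 0.5 * dt * k2)
    k4 = rhs(z + dt * k3)
    return z + dt / 6.0 * (k1 + 2 * k2 + 2 * k3 + k4)


x0 = np.array([1.0, 0.0])
T, dt = 10.0, 0.01
z = np.concatenate([x0, np.zeros(len(Ms))])
for _ in range(int(round(T / dt))):
    z = rk4_step(z, dt)

x_T, costs = z[:2], z[2:]
print(x_T, costs)
```

Because the integration stops at a finite T, the accumulated values slightly underestimate the infinite-horizon cost; the gap shrinks as the state decays towards zero.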

In [None]:
def print_player_costs(costs: list[float]):
    """
    Print the cost for each player in the game.
    """

    print("=== Costs of all players ===")
    for i, cost in enumerate(costs):
        print(f"Player {i} cost: {cost:.2f}")

def plot_system_trajectories(
    traj: SystemTrajectory,
    state_labels: list[str] = None,
    control_labels: list[str] = None,
    skip_plt_show: bool = False
) -> tuple[plt.Figure, np.ndarray]:
    """
    Plots the state and control trajectories of a simulated LQ game using matplotlib.

    Parameters
    ----------
    traj : SystemTrajectory
        Contains the time vector, state and control trajectories.
    state_labels : list of str, optional
        Labels for each state variable. If None, generic labels ["x_0", "x_1", ...] are used.
    control_labels : list of str, optional
        Labels for each control input. If None, generic labels ["u_0", "u_1", ...] are used.
    skip_plt_show : bool, optional
        Skips the plt.show() call at the end. Useful if more figures are displayed after this function.
        (Only the last plt.show() should be executed; otherwise the figures block each other.)

    Returns
    -------
    tuple[plt.Figure, np.ndarray]
        The figure and the array of axes objects containing the plots.
    """
    t = traj.t
    x, us = traj.x, traj.us

    n = x.shape[1]
    N = len(us)

    if state_labels is None:
        state_labels = [f"$x_{{{i}}}$" for i in range(n)]
    if control_labels is None:
        control_labels = [f"$u_{{{i}}}$" for i in range(N)]

    # Plot setup
    fig, axs = plt.subplots(1, 2, figsize=(8, 4))

    # --- State plot ---
    for i in range(n):
        axs[0].plot(t, x[:, i], label=state_labels[i], linewidth=2.5)
    axs[0].set_title("State Trajectory", fontsize=14)
    axs[0].set_xlabel("Time", fontsize=12)
    axs[0].set_ylabel("State", fontsize=12)
    axs[0].grid(True)
    axs[0].legend(fontsize=10)
    axs[0].set_xlim([t[0], t[-1]])

    # --- Control plot ---
    for i, u_i in enumerate(us):
        for j in range(u_i.shape[1]):
            label = f"{control_labels[i]}$_{{{j}}}$" if u_i.shape[1] > 1 else control_labels[i]
            axs[1].plot(t, u_i[:, j], label=label, linewidth=2.5)
    axs[1].set_title("Control Inputs", fontsize=14)
    axs[1].set_xlabel("Time", fontsize=12)
    axs[1].set_ylabel("Input", fontsize=12)
    axs[1].grid(True)
    axs[1].legend(fontsize=10)
    axs[1].set_xlim([t[0], t[-1]])

    plt.tight_layout()
    if not skip_plt_show:
        plt.show()
    return fig, axs
        
x0 = np.array([1.0, 0.0])
traj = game.simulate_system(x0, T=10.0)
print_player_costs(traj.costs)
fig, axs = plot_system_trajectories(traj)

## 8. Lyapunov-based cost calculation
In LQ games, the cost can also be computed analytically using the Lyapunov matrix.  
Note that these values differ from the trajectory-based costs above. They are independent of the initial condition and unaffected by numerical issues such as a simulation horizon that is chosen too short, which might go unnoticed in experiments.  
Hence, this cost should be preferred.

Note: The Lyapunov-based cost is only defined for stable, linear time-invariant systems, whereas the trajectory-based cost is fully flexible.
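
As a sketch of the underlying idea: for a stable closed loop A_cl and the weight M_i = Q_i + Σ_{j,k} K_jᵀ R_ijk K_k, the matrix P_i solving A_clᵀ P_i + P_i A_cl + M_i = 0 characterizes the infinite-horizon cost, and P_i itself does not depend on the initial state. The code below solves this equation with plain NumPy via Kronecker products. The helper `solve_lyapunov` is hypothetical, and how `game.strategies_costs()` reduces P_i to a scalar is not shown in this notebook; evaluating x0ᵀ P_i x0 is used here only to compare against a simulated trajectory.

```python
import numpy as np


def solve_lyapunov(A_cl, M):
    """Solve A_cl^T P + P A_cl + M = 0 by vectorizing the equation."""
    n = A_cl.shape[0]
    I = np.eye(n)
    lhs = np.kron(I, A_cl.T) + np.kron(A_cl.T, I)
    return np.linalg.solve(lhs, -M.reshape(-1)).reshape(n, n)


A = np.array([[0.0, 1.0], [-1.0, -0.5]])
Bs = [np.array([[0.0], [1.0]]), np.array([[0.0], [0.5]])]
Ks = [np.array([[1.0, 0.0]]), np.array([[0.0, 1.0]])]
Qs = [np.eye(2), np.diag([10.0, 1.0])]
Rs = [
    {(0, 0): 1.0 * np.eye(1), (1, 1): 0.1 * np.eye(1)},
    {(0, 0): 0.2 * np.eye(1), (1, 1): 1.0 * np.eye(1)},
]

A_cl = A - sum(B @ K for B, K in zip(Bs, Ks))
Ms = [
    Q + sum(Ks[j].T @ R_jk @ Ks[k] for (j, k), R_jk in R.items())
    for Q, R in zip(Qs, Rs)
]
Ps = [solve_lyapunov(A_cl, M) for M in Ms]

# Cost of starting in x0 and following the fixed strategies forever
x0 = np.array([1.0, 0.0])
costs_from_x0 = [float(x0 @ P @ x0) for P in Ps]
print(Ps[0], Ps[1], costs_from_x0, sep="\n")
```

If SciPy is available, `scipy.linalg.solve_continuous_lyapunov(A_cl.T, -M)` solves the same equation.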

In [None]:
print_player_costs(game.strategies_costs())

## 9. Nash Equilibrium
The LQGame class features an implementation of "Algorithm 6" from Engwerda, "Algorithms for computing Nash equilibria in deterministic LQ games" to iteratively compute a Nash Equilibrium of the LQ Game, using the system dynamics and the cost functions.  
Two points are important to note:
- There may be an arbitrary number of equilibria in infinite-horizon LQ games, ranging from zero to infinitely many. If the algorithm converges, it yields only one of them and makes no statement about its uniqueness.
- As an iterative algorithm, it needs an initial set of policies that yields a stable closed-loop system. This set can be passed as an argument; otherwise, the current policies of the players are used. Either way, stability is checked.
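
To illustrate what an iterative scheme started from stabilizing policies can look like, here is a simple Lyapunov-type fixed-point iteration in plain NumPy. It is not Engwerda's Algorithm 6 and not necessarily what `feedback_nash_equilibrium` does; it is a swapped-in, simpler technique that is not guaranteed to converge. The sketch assumes costs with only the diagonal terms R_ijj (as in this notebook) and the standard feedback-Nash gain K_i = R_iii⁻¹ B_iᵀ P_i; all helper names are hypothetical.

```python
import numpy as np

A = np.array([[0.0, 1.0], [-1.0, -0.5]])
Bs = [np.array([[0.0], [1.0]]), np.array([[0.0], [0.5]])]
Qs = [np.eye(2), np.diag([10.0, 1.0])]
Rs = [
    {(0, 0): 1.0 * np.eye(1), (1, 1): 0.1 * np.eye(1)},
    {(0, 0): 0.2 * np.eye(1), (1, 1): 1.0 * np.eye(1)},
]
N = len(Bs)


def solve_lyapunov(A_cl, M):
    """Solve A_cl^T P + P A_cl + M = 0 via Kronecker products."""
    n = A_cl.shape[0]
    I = np.eye(n)
    lhs = np.kron(I, A_cl.T) + np.kron(A_cl.T, I)
    return np.linalg.solve(lhs, -M.reshape(-1)).reshape(n, n)


def closed_loop(Ks):
    return A - sum(B @ K for B, K in zip(Bs, Ks))


# Initial gains must stabilize the closed loop
Ks = [np.array([[1.0, 0.0]]), np.array([[0.0, 1.0]])]
converged = False
for iteration in range(500):
    A_cl = closed_loop(Ks)
    if np.max(np.linalg.eigvals(A_cl).real) >= 0:
        raise RuntimeError("closed loop became unstable; iteration aborted")
    new_Ks = []
    for i in range(N):
        # Evaluate player i's cost matrix for the current gains ...
        M_i = Qs[i] + sum(Ks[j].T @ Rs[i][(j, j)] @ Ks[j] for j in range(N))
        P_i = solve_lyapunov(A_cl, M_i)
        # ... and update the gain: K_i = R_iii^{-1} B_i^T P_i
        new_Ks.append(np.linalg.solve(Rs[i][(i, i)], Bs[i].T @ P_i))
    max_change = max(np.max(np.abs(Kn - K)) for Kn, K in zip(new_Ks, Ks))
    Ks = new_Ks
    if max_change < 1e-9:
        converged = True
        break

print(converged, iteration)
print(Ks)
```

A fixed point of this update satisfies the coupled equations that characterize a feedback Nash equilibrium under the stated assumptions; like the algorithm used in this notebook, it says nothing about uniqueness.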

In [None]:
nash_strategies = feedback_nash_equilibrium(game)
game.adopt_strategies(nash_strategies)
traj = game.simulate_system(x0, T=10)
print_player_costs(game.strategies_costs())
fig, axs = plot_system_trajectories(traj)