<center><img src="Fig/UGA.png" width="30%" height="30%"></center>
<center><h3>Master of Science in Industrial and Applied Mathematics (MSIAM)  - 1st year</h3></center>
<hr>
<center><h1>Numerical Optimization</h1></center>
<center><h2>Lab 9: Min-Max problem and Zero-sum games</h2></center>

Let us consider a game with 2 players, both having $n$ possible actions.


They play against each other: whenever Player 1 (P1) plays action \#i and Player 2 (P2) plays action \#j, P1 gets a reward of $g_{ij}\in\mathbb{R}$ while P2 gets $-g_{ij}$ (hence the name zero-sum: the two rewards always add up to zero).


The goal for both players is to reach a Nash Equilibrium (NE), that is, a probability distribution over the actions of each player such that neither player can increase its expected reward by unilaterally deviating from its strategy.

## Formulation of the Nash Equilibrium as the solution of a Min-Max problem

Let us denote by $x$ the probability distribution over the actions of P1 (its "strategy"), and by $y$ that of P2.

Both $x$ and $y$ are probability distributions over $n$ possible actions, so they both belong to the probability simplex of $\mathbb{R}^n$:
$$ \Delta_n = \left\{ p \in \mathbb{R}^n : p\geq 0 , \sum_{i=1}^n  p_i = 1 \right\} . $$


Then, it can be shown that the NE is achieved by the pair $(x^\star,y^\star)$, where $x^\star$ solves
\begin{align}
\tag{P1}
 x^\star = \arg\max_{x\in\Delta_n} \min_{y\in\Delta_n} x^\top A y
\end{align}
where $A$ is the $n\times n$ matrix such that $A_{ij} = g_{ij}$, the reward of P1 for actions $i$ and $j$.

Similarly, we have
\begin{align}
\tag{P2}
 y^\star = \arg\min_{y\in\Delta_n} \max_{x\in\Delta_n} x^\top A y
\end{align}
with the same matrix $A$.

# Numerical computation of constrained Min-Max problems

In this lab, we first consider the zero-sum game characterized by the matrix $A=\left[\begin{array}{cc} -6 & 9 \\  4 & -6 \end{array}\right]$.

In [None]:
import numpy as np
import scipy.optimize as scopt

In [None]:
n, m = 2, 2  # dimensions: numbers of actions of P1 and P2
A = np.array([[-6,9],[4,-6]])
A

In [None]:
n,m = A.shape

# Method 1: Linear Programming

### Optimal strategy for x

We begin by finding the optimal $x^\star$.

> **1.** Reformulate the problem (P1) into a linear program and solve it using a LP solver.
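The following is a sketch of one standard reformulation, offered as a possible answer. A linear function on the simplex attains its minimum at a vertex, so $\min_{y\in\Delta_n} x^\top A y = \min_j (A^\top x)_j$. Problem (P1) is therefore equivalent to maximizing an auxiliary scalar $t$ subject to $t \le (A^\top x)_j$ for every $j$ and $x\in\Delta_n$. The code stacks the unknowns as $z=(x,t)$; the names `z`, `t` and `val_P1` are chosen here and are not part of the lab. Since `scipy.optimize.linprog` minimizes, the objective is $-t$.

```python
import numpy as np
import scipy.optimize as scopt

A = np.array([[-6., 9.], [4., -6.]])
n, m = A.shape

# Unknowns z = (x_1, ..., x_n, t); maximize t  <=>  minimize -t
c = np.zeros(n + 1)
c[-1] = -1.0
# t <= (A^T x)_j for every action j of P2, written as -(A^T x)_j + t <= 0
A_ub = np.hstack((-A.T, np.ones((m, 1))))
b_ub = np.zeros(m)
# x sums to one (t does not enter this constraint)
A_eq = np.hstack((np.ones((1, n)), np.zeros((1, 1))))
b_eq = np.ones(1)
# x >= 0, t is free
bounds = [(0, None)] * n + [(None, None)]

res = scopt.linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=b_eq, bounds=bounds)
x_star, val_P1 = res.x[:n], res.x[-1]
print("x* =", x_star, "| value of (P1) =", val_P1)
```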

### Optimal strategy for y

> **2.** Do the same thing with (P2) to find $y^\star$.
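Symmetrically, here is a sketch of one possible answer. Since $\max_{x\in\Delta_n} x^\top A y = \max_i (A y)_i$, problem (P2) becomes: minimize an auxiliary scalar $s$ (a name introduced here) subject to $(Ay)_i \le s$ for every $i$ and $y\in\Delta_n$.

```python
import numpy as np
import scipy.optimize as scopt

A = np.array([[-6., 9.], [4., -6.]])
n, m = A.shape

# Unknowns z = (y_1, ..., y_m, s); minimize s
c = np.zeros(m + 1)
c[-1] = 1.0
# (A y)_i <= s for every action i of P1, written as (A y)_i - s <= 0
A_ub = np.hstack((A, -np.ones((n, 1))))
b_ub = np.zeros(n)
# y sums to one (s does not enter this constraint)
A_eq = np.hstack((np.ones((1, m)), np.zeros((1, 1))))
b_eq = np.ones(1)
# y >= 0, s is free
bounds = [(0, None)] * m + [(None, None)]

res = scopt.linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=b_eq, bounds=bounds)
y_star, val_P2 = res.x[:m], res.x[-1]
print("y* =", y_star, "| value of (P2) =", val_P2)
```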

### Value of the game

> **3.** Compare the values of problems (P1) and (P2). What is remarkable about $A y^\star$? About $A^\top x^\star$? 
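The LP results can be cross-checked independently, outside the lab's method. A $2\times 2$ game without a pure saddle point can be solved in closed form, and this matrix has none: the max of the row minima is $-6$, while the min of the column maxima is $4$. In that case, each player's equilibrium strategy makes the opponent indifferent between its two actions. The sketch below assumes exactly this setting, with $A=\left[\begin{array}{cc} a & b \\ c & d \end{array}\right]$. It then prints $Ay^\star$ and $A^\top x^\star$, whose entries all coincide with the value of the game for this interior equilibrium.

```python
import numpy as np

A = np.array([[-6., 9.], [4., -6.]])
(a, b), (c, d) = A
den = a - b - c + d        # nonzero when the 2x2 game has no pure saddle point

# P1's mix makes P2 indifferent: (A^T x)_1 = (A^T x)_2
x_star = np.array([(d - c) / den, (a - b) / den])
# P2's mix makes P1 indifferent: (A y)_1 = (A y)_2
y_star = np.array([(d - b) / den, (a - c) / den])
value = (a * d - b * c) / den

print("x* =", x_star, "| y* =", y_star, "| value =", value)
print("A y*   =", A @ y_star)    # all entries equal the value
print("A^T x* =", A.T @ x_star)  # all entries equal the value
```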

# Method 2: Optimization 

Finding the solution of a min-max optimization problem is, in general, harder than solving a simple minimization problem. Nevertheless, it can still be achieved by first-order, gradient-like methods. This kind of setup has attracted a lot of interest in recent years, notably for the training of Generative Adversarial Networks (GANs).

To do so, we define $X=(x,y)\in \Delta_n\times\Delta_n$ and the vector field $v(X) = (-A y, A^\top x)$. To solve the problem
\begin{align}
\tag{P}
\max_{x\in\Delta_n} \min_{y\in\Delta_n}  x^\top A y ,
\end{align}
we move in the direction opposite to $v$, i.e. we perform a gradient ascent on $x\mapsto x^\top A y$ and a gradient descent on $y\mapsto x^\top A y$, followed by a projection onto the constraints:
\begin{align}
    \tag{Gradient Descent Ascent}
    X_{k+1} = \mathrm{proj}_{\Delta_n\times\Delta_n} (X_k-\gamma_k v(X_k)).
\end{align}


We first define the vector field $v$:

In [None]:
def v(X):
    x = X[0:n]
    y = X[n:]
    return np.concatenate((-A.dot(y),A.T.dot(x)))

We also define the dimension of the variable space.

In [None]:
N = 2*n

We also need the projection onto the constraint set $\Delta_n\times\Delta_n$.

> **4.** Implement a function that projects a vector onto $\Delta_n\times\Delta_n$

In [None]:
def proj_simplex(v):
    ## TODO
    return v

def proj_2simplex(X):
    x = X[0:n]
    y = X[n:]
    return np.concatenate((proj_simplex(x),proj_simplex(y)))
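Below is one possible implementation of `proj_simplex`, using the classical sort-and-threshold algorithm; this choice of algorithm is ours, and any correct Euclidean projection would do. The projection of $v$ onto $\Delta_n$ has the form $\max(v-\theta,0)$, where $\theta$ is the unique threshold that makes the entries sum to one. That threshold can be read off from the partial sums of the entries of $v$ sorted in decreasing order.

```python
import numpy as np

def proj_simplex(v):
    """Euclidean projection of v onto the probability simplex (sort-based method)."""
    v = np.asarray(v, dtype=float)
    u = np.sort(v)[::-1]                 # entries in decreasing order
    css = np.cumsum(u)                   # partial sums of the sorted entries
    k = np.arange(1, v.size + 1)
    # largest (0-based) index rho with u[rho] - (css[rho] - 1) / (rho + 1) > 0
    rho = np.nonzero(u - (css - 1.0) / k > 0)[0][-1]
    theta = (css[rho] - 1.0) / (rho + 1)  # threshold so that the output sums to 1
    return np.maximum(v - theta, 0.0)

print(proj_simplex(np.array([0.3, -1.2, 2.5, 0.1])))
```

This function can replace the `## TODO` stub as is; `proj_2simplex` then works unchanged.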


#### Gradient Descent-Ascent

> **5.** Run Gradient Descent Ascent by completing the code below.

In [None]:
X = proj_2simplex(np.ones(N))
K = 1000
step = 0.01

X_tab_GDA = np.copy(X)

for k in range(1,K):
    X = X ## Step to fill
    if k%5==0:
        if k%25==0: print("ite. {:3d} : x= [{:.3f},{:.3f}] | y= [{:.3f},{:.3f}]".format(k,X[0],X[1],X[2],X[3]))
        X_tab_GDA = np.vstack((X_tab_GDA,X))
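Here is a minimal standalone sketch of the projected GDA step $X_{k+1} = \mathrm{proj}_{\Delta_n\times\Delta_n}(X_k-\gamma v(X_k))$. To avoid depending on earlier cells, it redefines `A`, `v` and a sort-based `proj_simplex`. It also records every iterate, whereas the template above keeps one in five.

```python
import numpy as np

A = np.array([[-6., 9.], [4., -6.]])
n = A.shape[0]

def proj_simplex(w):
    # sort-based Euclidean projection onto the probability simplex
    u = np.sort(w)[::-1]
    css = np.cumsum(u)
    rho = np.nonzero(u - (css - 1.0) / np.arange(1, w.size + 1) > 0)[0][-1]
    return np.maximum(w - (css[rho] - 1.0) / (rho + 1), 0.0)

def proj_2simplex(X):
    # project the x-block and the y-block separately
    return np.concatenate((proj_simplex(X[:n]), proj_simplex(X[n:])))

def v(X):
    # vector field v(X) = (-A y, A^T x)
    return np.concatenate((-A @ X[n:], A.T @ X[:n]))

X = proj_2simplex(np.ones(2 * n))   # uniform strategies
step = 0.01
X_tab_GDA = [X.copy()]
for k in range(1, 1000):
    X = proj_2simplex(X - step * v(X))   # one projected gradient descent-ascent step
    X_tab_GDA.append(X.copy())
X_tab_GDA = np.array(X_tab_GDA)
print("last iterate:", X_tab_GDA[-1])
```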

> **6.** What do you observe in terms of convergence?
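One way to anticipate what happens is a back-of-the-envelope computation. It is valid only for this $2\times 2$ matrix, and only while the iterates stay in the interior of the simplices. There, the Euclidean projection of $(a,b)$ onto $\Delta_2$ is the affine map $(a,b)\mapsto\big(\tfrac{1+a-b}{2},\tfrac{1-a+b}{2}\big)$. Writing $x_k=(p_k,1-p_k)$ and $y_k=(q_k,1-q_k)$, one GDA step with constant step $\gamma$ reads

\begin{align}
p_{k+1} &= p_k + \tfrac{\gamma}{2}\big((Ay_k)_1-(Ay_k)_2\big) = p_k - 12.5\,\gamma\,(q_k-0.6),\\
q_{k+1} &= q_k - \tfrac{\gamma}{2}\big((A^\top x_k)_1-(A^\top x_k)_2\big) = q_k + 12.5\,\gamma\,(p_k-0.4).
\end{align}

Let $\delta_k=(p_k-0.4,\,q_k-0.6)$ be the deviation from the equilibrium, and set $c=12.5\gamma$ and $J=\left[\begin{array}{cc} 0 & -1 \\ 1 & 0 \end{array}\right]$. Then

\begin{align}
\delta_{k+1} = (I + cJ)\,\delta_k, \qquad |1\pm ic| = \sqrt{1+c^2} > 1 .
\end{align}

For every step size $\gamma>0$, GDA therefore spirals away from $(x^\star,y^\star)$ until the iterates reach the boundary of the simplices. The same computation for ExtraGradient gives $\delta_{k+1} = (I + cJ + c^2J^2)\,\delta_k = \big((1-c^2)I + cJ\big)\delta_k$. Its eigenvalues have modulus $\sqrt{1-c^2+c^4}<1$ when $0<c<1$; for $\gamma=0.01$, $c=0.125$ and the modulus is about $0.992$, which gives a contracting spiral.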

#### Extragradient

To overcome the issues with gradient descent-ascent, the ExtraGradient method was proposed:
\begin{align}
    \tag{ExtraGradient}
    \left\{ 
        \begin{array}{l}
            X_{k+1/2} = \mathrm{proj}_{\Delta_n\times\Delta_n} (X_k-\gamma_k v(X_k) ) \\
        X_{k+1} =  \mathrm{proj}_{\Delta_n\times\Delta_n} (X_k-\gamma_k v(X_{k+1/2}))
        \end{array}
    \right. 
\end{align}
Intuitively, this method generates a leading point $X_{k+1/2}$, evaluates the field there (it "looks ahead"), and applies this value from the base point $X_k$. This way, the rotational (cycling) effects can be damped and convergence can be restored.


> **7.** Run ExtraGradient by completing the code below.

In [None]:
X = proj_2simplex(np.ones(N))
K = 1000
step = 0.01

X_tab_EG = np.copy(X)

for k in range(1,K):
    X_lead = X ## Step to fill
    X      = X ## Step to fill
    if k%5==0:
        if k%25==0: print("ite. {:3d} : x= [{:.3f},{:.3f}] | y= [{:.3f},{:.3f}]".format(k,X_lead[0],X_lead[1],X_lead[2],X_lead[3]))
        X_tab_EG = np.vstack((X_tab_EG,X_lead))
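Here is a minimal standalone sketch of the two ExtraGradient steps. It uses the same step size as the template and redefines `A`, `v` and a sort-based `proj_simplex` so that it runs on its own. This sketch records the base point `X`, while the template above records `X_lead`.

```python
import numpy as np

A = np.array([[-6., 9.], [4., -6.]])
n = A.shape[0]

def proj_simplex(w):
    # sort-based Euclidean projection onto the probability simplex
    u = np.sort(w)[::-1]
    css = np.cumsum(u)
    rho = np.nonzero(u - (css - 1.0) / np.arange(1, w.size + 1) > 0)[0][-1]
    return np.maximum(w - (css[rho] - 1.0) / (rho + 1), 0.0)

def proj_2simplex(X):
    return np.concatenate((proj_simplex(X[:n]), proj_simplex(X[n:])))

def v(X):
    # vector field v(X) = (-A y, A^T x)
    return np.concatenate((-A @ X[n:], A.T @ X[:n]))

X = proj_2simplex(np.ones(2 * n))
step = 0.01
X_tab_EG = [X.copy()]
for k in range(1, 1000):
    X_lead = proj_2simplex(X - step * v(X))    # leading point X_{k+1/2}
    X = proj_2simplex(X - step * v(X_lead))    # base point moved with the field at X_lead
    X_tab_EG.append(X.copy())
X_tab_EG = np.array(X_tab_EG)
print("last iterate:", X_tab_EG[-1])
```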
        

#### Comparison

> **8.** Compare Gradient Descent Ascent and ExtraGradient on the plot below.


In [None]:
import matplotlib.pyplot as plt

plt.figure()
plt.plot(X_tab_GDA[:,0],X_tab_GDA[:,2],color="red",label="GDA")
plt.plot(X_tab_EG[:,0],X_tab_EG[:,2],color="blue",label="EG")
plt.title("Behavior of x[1] and y[1]")
plt.legend()
plt.show()

#### Mirror Prox

One way to make the projections above easier to compute is to replace the (implicit) Euclidean metric.
For the simplex, an efficient choice is the *Kullback-Leibler* divergence $D(x,y) = \sum_{i=1}^n x_i\log(x_i/y_i) - \sum_{i=1}^n (x_i-y_i)$. It is not a metric in the strict sense (it is not symmetric), but it plays the role of a distance between strictly positive vectors.

With this divergence, for any positive vector $y$,
\begin{align}
    \mathrm{proj}^{KL}_{\Delta_n} (y) = \arg\min_{u\in\Delta_n} D(u,y)  = \frac{y}{ \sum_{i=1}^n y_i} = \frac{y}{ \|y\|_1},
\end{align}
which is much easier to compute than the Euclidean projection.

Changing the metric of the ExtraGradient algorithm gives the Mirror-Prox method. Concretely, the step $X_{k+1}=\arg\min_X\{ \gamma\langle v(X_k),X\rangle + \frac{1}{2} \|X-X_k\|^2 \}$ is replaced by $X_{k+1}=\arg\min_X\{ \gamma\langle v(X_k),X\rangle + D(X,X_k) \}$, each followed by the corresponding projection onto $\Delta_n\times\Delta_n$.


> **9.** Show that, for positive vectors $X$,
> $$ \arg\min_X\{ \gamma\langle v(X_k),X\rangle + D(X,X_k) \} = X_k \exp(-\gamma v(X_k)), $$
> where the product and the exponential are taken componentwise.
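Below is a sketch of the computation for strictly positive $X$. It uses the sign convention $+\gamma\langle v(X_k),X\rangle$, which matches the projected steps $X_k-\gamma v(X_k)$ used so far and is the one consistent with $\exp(-\gamma v(X_k))$. The objective separates across coordinates:

\begin{align}
\varphi(X) = \sum_{i}\Big( \gamma\, v_i(X_k)\, X_i + X_i\log\frac{X_i}{(X_k)_i} - X_i + (X_k)_i \Big),
\qquad
\frac{\partial \varphi}{\partial X_i} = \gamma\, v_i(X_k) + \log\frac{X_i}{(X_k)_i}.
\end{align}

Since $\partial^2\varphi/\partial X_i^2 = 1/X_i>0$, $\varphi$ is strictly convex. Setting the gradient to zero gives the unique minimizer $X_i = (X_k)_i\exp(-\gamma\, v_i(X_k))$ for every $i$, that is, $X = X_k\exp(-\gamma v(X_k))$ componentwise.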


The Mirror Prox algorithm then reads:
\begin{align}
    \tag{Mirror Prox}
    \left\{ 
        \begin{array}{l}
        (a_{k+1/2},b_{k+1/2}) = X_k \exp(-\gamma v(X_k)) \\
        X_{k+1/2} = \left(\frac{a_{k+1/2}}{\|a_{k+1/2}\|_1},\frac{b_{k+1/2}}{\|b_{k+1/2}\|_1}\right) \\
        (a_{k+1},b_{k+1}) = X_k \exp(-\gamma v(X_{k+1/2})) \\
        X_{k+1} = \left(\frac{a_{k+1}}{\|a_{k+1}\|_1},\frac{b_{k+1}}{\|b_{k+1}\|_1}\right)
        \end{array}
    \right. .
\end{align}


This is ExtraGradient, run in the geometry induced by $D$ instead of the Euclidean one.


> **10.** Run Mirror Prox by completing the code below and compare its behavior with the previous methods.

In [None]:
X = proj_2simplex(np.ones(N))
K = 1000
step = 0.05

X_tab_MP = np.copy(X)

for k in range(1,K):
    X_lead = X ## Step to fill
    X      = X ## Step to fill
    if k%1==0:
        if k%25==0: print("ite. {:3d} : x= [{:.3f},{:.3f}] | y= [{:.3f},{:.3f}]".format(k,X_lead[0],X_lead[1],X_lead[2],X_lead[3]))
        X_tab_MP = np.vstack((X_tab_MP,X_lead))
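Here is a minimal standalone sketch of Mirror Prox with the step size of the template. Each KL step multiplies componentwise by $\exp(-\gamma v)$ and then normalizes each block by its $\ell_1$ norm; `kl_normalize` is a helper name introduced here. The sketch also records the KL divergence to $(x^\star,y^\star)=((0.4,0.6),(0.6,0.4))$, the equilibrium obtained with the LP. For this matrix, the standard Mirror-Prox analysis (step size times Lipschitz constant $\gamma\cdot\max_{ij}|A_{ij}| = 0.45 < 1$) suggests that this divergence should not increase along the iterations.

```python
import numpy as np

A = np.array([[-6., 9.], [4., -6.]])
n = A.shape[0]
X_star = np.array([0.4, 0.6, 0.6, 0.4])   # equilibrium (x*, y*) found with the LP

def v(X):
    # vector field v(X) = (-A y, A^T x)
    return np.concatenate((-A @ X[n:], A.T @ X[:n]))

def kl_normalize(Z):
    # KL projection onto the product of two simplices: normalize each block
    return np.concatenate((Z[:n] / Z[:n].sum(), Z[n:] / Z[n:].sum()))

X = np.full(2 * n, 1.0 / n)   # uniform strategies
step = 0.05
X_tab_MP = [X.copy()]
for k in range(1, 1000):
    X_lead = kl_normalize(X * np.exp(-step * v(X)))    # leading point X_{k+1/2}
    X = kl_normalize(X * np.exp(-step * v(X_lead)))    # base point update
    X_tab_MP.append(X.copy())
X_tab_MP = np.array(X_tab_MP)

# KL divergence D(X*, X_k) for every recorded iterate
kl = np.sum(X_star * np.log(X_star / X_tab_MP), axis=1)
print("last iterate:", X_tab_MP[-1], "| final KL:", kl[-1])
```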
        

In [None]:
import matplotlib.pyplot as plt

plt.figure()
plt.plot(X_tab_GDA[:,0],X_tab_GDA[:,2],color="red",label="GDA")
plt.plot(X_tab_EG[:,0],X_tab_EG[:,2],color="blue",label="EG")
plt.plot(X_tab_MP[:,0],X_tab_MP[:,2],color="green",label="MP")
plt.title("Behavior of x[1] and y[1]")
plt.legend()
plt.show()