**The laning phase** is insanely complex; in my opinion, only teamfighting has a higher skill ceiling.

> How is it that even among pros we can see consistent mid gaps?


*How does Chovy have higher CS in this matchup? Sylas vs Pantheon (KR Challenger)*

**INTUITION:** Since League is a PvP game, it makes sense to use some game theory to build intuition for how we should act. The actual game is far more complex than what this brief introduction will model, but I'm hoping it will at least give some understanding of how we *should* play.


*   We can think of waves as a **repeated game**, i.e. each time our waves meet, it's like a turn for each player
*   Then, at each turn, what should we do? Shove? Slow-push? Freeze?

**ASSUMPTIONS:** For now, we just want to look at the macro-decisions we can make:

1.   We will ignore the *micro* behind trading. Assume both players are equally skilled and trade well enough that this is not a factor.

2.   We will ignore champion-specific details such as item spikes and level-up timers. They are too difficult to model and unnecessary for our purposes.
<br>
<br>

# **MODEL:** We model laning as a **repeated 2-player game** played over discrete turns (waves)

$$
t = 0,1,\dots,T-1.
$$

Each turn, both players choose an action simultaneously. The lane state updates, and each player receives a reward.

---

## 1) **State Variables**

At turn $t$, the lane state is:

$$
s_t = (w_t,\; m_t,\; v_t,\; g_t)
$$

where:

- $w_t \in \{-2,-1,0,1,2\}$: **wave position**  
  (negative = closer to you, positive = closer to enemy)

- $m_t \in \{0,1,2\}$: **wave stack size**  
  (0 = normal wave, 2 = large stacked wave)

- $v_t \in \{0,1\}$: **vision indicator**  
  ($v_t=1$ = you have vision, $v_t=0$ = no vision)

- $g_t \in \mathbb{R}$: **advantage / gold difference proxy** (you − enemy)
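
To get a feel for how small this state space is, here is a minimal sketch that enumerates the discrete part of $s_t$. The name `LaneState` and the choice to leave $g_t$ out of the enumeration (since it is continuous) are my assumptions, not part of the model definition.

```python
from itertools import product
from typing import NamedTuple


class LaneState(NamedTuple):
    """Discrete part of the lane state (hypothetical helper type)."""
    w: int  # wave position: -2 (closest to you) .. 2 (closest to enemy)
    m: int  # wave stack size: 0 (normal) .. 2 (large stacked wave)
    v: int  # vision indicator: 1 = you have vision, 0 = no vision


WAVE_POSITIONS = (-2, -1, 0, 1, 2)
STACK_SIZES = (0, 1, 2)
VISION = (0, 1)

# g_t is real-valued, so it is tracked separately rather than enumerated.
DISCRETE_STATES = [
    LaneState(w, m, v)
    for w, m, v in product(WAVE_POSITIONS, STACK_SIZES, VISION)
]

print(len(DISCRETE_STATES))  # 5 * 3 * 2 = 30
```

With only 30 discrete states and 3 actions, a plain lookup table is enough to store a value for every state-action pair.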

---

## 2) **Actions**

Action set:

$$
\mathcal{A}=\{\text{SP},\text{SH},\text{F}\}
$$

- $\text{SP}$: slow-push (build a stacked wave)  
- $\text{SH}$: shove (fast push / crash now)  
- $\text{F}$: freeze (hold wave on your side; deny/safe)

Let your action be $a_t$, and the opponent's action be $b_t$.

---

## 3) **Vision Update** (Random)

Vision refreshes each turn as a Bernoulli random variable:

$$
v_{t+1} \sim \mathrm{Bernoulli}(p_v)
$$

so $\Pr(v_{t+1}=1)=p_v$.

---

## 4) **Transition Model**

### 4.1 *Action pressure*

Define lane pressure:

$$
\Delta w(x)=
\begin{cases}
+1 & x=\text{SP}\\
+2 & x=\text{SH}\\
-1 & x=\text{F}
\end{cases}
$$

### 4.2 *Wave position update*

$$
w_{t+1}=\min\Big(\max\big(w_t+\Delta w(a_t)-\Delta w(b_t),\,-2\big),\,2\Big)
$$


### 4.3 *Wave stack update*

$$
m_{t+1}=
\begin{cases}
\min(2,\;m_t+1) & a_t=\text{SP}\ \text{and}\ w_{t+1}<2\\[4pt]
0 & a_t=\text{SH}\ \text{and}\ w_{t+1}=2\\[4pt]
\max(0,\;m_t-1) & a_t=\text{F}\\[4pt]
m_t & \text{otherwise}
\end{cases}
$$

Interpretation:

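- $\text{SP}$ tries to stack your wave over time: the stack grows by one (up to 2) as long as the wave has not yet reached the enemy end of the lane
- $\text{SH}$ tries to crash the wave: the stack resets to 0 once the wave reaches $w_{t+1}=2$
- $\text{F}$ tries to freeze: the stack shrinks by one each turn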
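
The two update rules above translate almost line for line into code. This is a minimal sketch; the function names `delta_w`, `next_wave` and `next_stack` are mine, and actions are written as the strings `"SP"`, `"SH"`, `"F"`.

```python
def delta_w(action: str) -> int:
    """Lane pressure of an action (section 4.1)."""
    return {"SP": 1, "SH": 2, "F": -1}[action]


def next_wave(w: int, a: str, b: str) -> int:
    """Wave position update, clamped to [-2, 2] (section 4.2)."""
    return min(max(w + delta_w(a) - delta_w(b), -2), 2)


def next_stack(m: int, a: str, w_next: int) -> int:
    """Wave stack update for your own action a (section 4.3)."""
    if a == "SP" and w_next < 2:
        return min(2, m + 1)
    if a == "SH" and w_next == 2:
        return 0
    if a == "F":
        return max(0, m - 1)
    return m


# Example: even wave, you shove while the opponent freezes.
w_next = next_wave(0, "SH", "F")  # 0 + 2 - (-1) = 3, clamped to 2
m_next = next_stack(2, "SH", w_next)  # the stacked wave crashes, so the stack resets
print(w_next, m_next)  # 2 0
```

Note that the opponent's action only enters through the wave position; the stack update depends on your own action and where the wave ends up.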

---

## 5) **Interaction Payoff** $M[a,b]$

We include a small "denial" payoff based on the action matchup (rock-paper-scissors style). E.g. if we freeze when our opponent shoves, we are rewarded:

- $\text{SH}$ counters $\text{SP}$  
- $\text{F}$ counters $\text{SH}$  
- $\text{SP}$ counters $\text{F}$  

The size of this payoff is a constant $\epsilon>0$ (e.g., $\epsilon=0.3$):

$$
M[a,b]=
\begin{cases}
+\epsilon & a \text{ counters } b\\
-\epsilon & b \text{ counters } a\\
0 & a=b
\end{cases}
$$

Payoff table (to the row player):

$$
\begin{array}{c|ccc}
M[a,b] & b=\text{SP} & b=\text{SH} & b=\text{F}\\ \hline
a=\text{SP} & 0 & -\epsilon & +\epsilon\\
a=\text{SH} & +\epsilon & 0 & -\epsilon\\
a=\text{F}  & -\epsilon & +\epsilon & 0
\end{array}
$$
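
As a sanity check, we can build the table from the three "counters" rules instead of typing it by hand. This is a sketch using NumPy with the example value $\epsilon=0.3$; the names `COUNTERS` and `interaction` are mine.

```python
import numpy as np

ACTIONS = ("SP", "SH", "F")
EPS = 0.3  # example value from the text

# key counters value: SH counters SP, F counters SH, SP counters F
COUNTERS = {"SH": "SP", "F": "SH", "SP": "F"}


def interaction(a: str, b: str, eps: float = EPS) -> float:
    """Interaction payoff M[a, b] to the player choosing a."""
    if COUNTERS[a] == b:
        return eps
    if COUNTERS[b] == a:
        return -eps
    return 0.0


# Rows are your action a, columns are the opponent's action b.
M = np.array([[interaction(a, b) for b in ACTIONS] for a in ACTIONS])
print(M)

# Zero-sum check: what you gain from the matchup, the opponent loses.
print(np.allclose(M, -M.T))
```

Since this is plain rock-paper-scissors, mixing all three actions equally earns exactly zero from $M$ against any opponent action. Any preference between actions therefore has to come from the other reward terms, not from this matrix alone.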

---

## 6) **Reward Function**

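Per-turn reward to you, where $L$ is the size of the gank penalty. Indicator functions are written $\mathbf{1}[\text{condition}]$: they equal 1 when the condition holds and 0 otherwise.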

$$
r(s_t,a_t,b_t)=M[a_t,b_t] + B_{\text{plates}} + B_{\text{deny}} + B_{\text{crash}} - L\cdot \text{gank}_t
$$

### 6.1 *Plates / pressure bonus (shove on enemy side)*

$$
B_{\text{plates}}=\beta_{\text{plates}}\;\mathbf{1}[a_t=\text{SH}]\mathbf{1}[w_t\ge 1]
$$


### 6.2 *Deny / safety bonus (freeze on your side)*

$$
B_{\text{deny}}=
\beta_{\text{deny}}\mathbf{1}[a_t=\text{F}]\mathbf{1}[w_t\le -1]
+
\beta_{\text{deny,bonus}}\mathbf{1}[a_t=\text{F}]\mathbf{1}[w_t\le -1]\mathbf{1}[b_t\in\{\text{SP},\text{SH}\}]
$$

Explanation: There are two terms here, which may be unintuitive at first glance. The first rewards holding the wave on our side, since this is safer for us and more dangerous for the enemy. The second adds an extra reward when the enemy is trying to push the wave out (e.g. because they want to move to a play) and we deny it.

### 6.3 *Crash bonus (convert a slow-push into a crash)*

$$
B_{\text{crash}}=\beta_{\text{crash}}\;\mathbf{1}[w_{t+1}=2]\mathbf{1}[m_t=2]
$$

### 6.4 *Gank penalty (risk when extended with no vision)*

$$
\text{gank}_t \sim \mathrm{Bernoulli}(p_{\text{gank}}(s_t,a_t))
$$

$$
p_{\text{gank}}(s_t,a_t)=(1-v_t)\cdot \min\Big(\max\big(q_0 + q_1\mathbf{1}[w_t=2] + q_2\mathbf{1}[a_t=\text{SH}],\,0\big),\,1\Big)
$$


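Explanation: We want a way to punish "reckless" gameplay, so this term adds a penalty if you get ganked in a bad spot (e.g. while shoving on their side). Here $q_0$ is the baseline gank risk, $q_1$ is the extra risk when the wave is at the enemy end ($w_t=2$), and $q_2$ is the extra risk from shoving. The factor $(1-v_t)$ means that having vision removes the gank risk entirely.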
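
Putting sections 5 and 6 together, the full per-turn reward can be sketched as below. Only $\epsilon=0.3$ comes from the text. Every other constant (`BETA_*`, `Q0`, `Q1`, `Q2`, `GANK_LOSS`) is a placeholder value I picked for illustration, and the function names are mine.

```python
import random

EPS = 0.3  # example value from the text
# Placeholder values for illustration only; they are not tuned or from the text.
BETA_PLATES, BETA_DENY, BETA_DENY_BONUS, BETA_CRASH = 0.2, 0.2, 0.1, 0.5
Q0, Q1, Q2 = 0.05, 0.15, 0.10
GANK_LOSS = 1.0  # the constant L in the reward

COUNTERS = {"SH": "SP", "F": "SH", "SP": "F"}  # key counters value


def interaction(a: str, b: str) -> float:
    """Rock-paper-scissors style payoff M[a, b]."""
    if COUNTERS[a] == b:
        return EPS
    if COUNTERS[b] == a:
        return -EPS
    return 0.0


def p_gank(w: int, v: int, a: str) -> float:
    """Gank probability: zero with vision, otherwise clamped to [0, 1]."""
    raw = Q0 + Q1 * (w == 2) + Q2 * (a == "SH")
    return (1 - v) * min(max(raw, 0.0), 1.0)


def reward(w, m, v, a, b, w_next, rng):
    """Per-turn reward r(s_t, a_t, b_t); w_next is the wave position after the turn."""
    plates = BETA_PLATES * (a == "SH" and w >= 1)
    freeze_on_our_side = a == "F" and w <= -1
    deny = BETA_DENY * freeze_on_our_side
    deny += BETA_DENY_BONUS * (freeze_on_our_side and b in ("SP", "SH"))
    crash = BETA_CRASH * (w_next == 2 and m == 2)
    ganked = rng.random() < p_gank(w, v, a)  # Bernoulli draw
    return interaction(a, b) + plates + deny + crash - GANK_LOSS * ganked


rng = random.Random(0)
# Freezing near our tower (with vision) while the enemy shoves into us.
print(reward(w=-1, m=0, v=1, a="F", b="SH", w_next=-2, rng=rng))
```

With vision the gank draw can never succeed, so the example is deterministic: it is the interaction payoff plus both deny terms.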

---

## 7) **Gold-Difference Update**

Let $g_t$ track gold difference. Update it using both players' per-turn rewards, where $r_t$ is your reward and $r'_t$ is the opponent's:

$$
g_{t+1}=g_t + \big(r_t - r'_t\big).
$$

---

## 8) **Objective**

A policy $\pi$ maps states to actions (deterministic or probabilistic):

$$
\pi(a\mid s)=\Pr(a_t=a\mid s_t=s).
$$

Each player aims to maximize expected discounted return over $T$ turns:

$$
\max_\pi \mathbb{E}\left[\sum_{t=0}^{T-1}\gamma^t \, r(s_t,a_t,b_t)\right],
\qquad 0<\gamma\le 1.
$$

The expectation is taken over randomness in the environment (vision and gank events) and the opponent's behavior.
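
For a single played-out lane, the quantity inside the expectation is just a discounted sum of the per-turn rewards. A minimal sketch follows; the reward sequence and $\gamma=0.5$ are made-up numbers chosen so the arithmetic is easy to follow.

```python
def discounted_return(rewards, gamma=0.95):
    """Sum of gamma**t * r_t over one sequence of per-turn rewards."""
    return sum(gamma ** t * r for t, r in enumerate(rewards))


# Made-up rewards for three waves: win the matchup, break even, lose the matchup.
rewards = [0.3, 0.0, -0.3]
print(discounted_return(rewards, gamma=0.5))  # 0.3 + 0 - 0.25 * 0.3, i.e. about 0.225
```

A smaller $\gamma$ makes early waves count for more than late ones, while $\gamma=1$ weights every wave equally.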









**Reinforcement-Learning Set-Up:**
We can now use some RL techniques to try to derive some (hopefully) interpretable results. We will use a **two-agent** set-up that reflects reality (or at least my interpretation of it): we train one side against a *frozen* version of the other for a set period, then switch and train the other side. I think this makes sense, since in a real game players don't adjust immediately. We will use a **Q-learning** approach because it matches my style of coaching: we want to evaluate each decision by how well it sets us up to act optimally in future plays. Networks like DQN are overkill for a state space this small, although they could be useful if the model were extended to incorporate *micro* details.
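
Here is a rough sketch of what that alternating loop could look like with tabular Q-learning. To keep it short I make several simplifying assumptions that are not part of the model: the state is reduced to wave position only, the reward keeps only the interaction, plates and deny terms, the frozen side plays greedily from its own table, and all hyperparameters (`alpha`, `gamma`, `explore`, `episodes`, `horizon`) and the `BETA_*` values are placeholders.

```python
import random
from collections import defaultdict

ACTIONS = ("SP", "SH", "F")
DELTA_W = {"SP": 1, "SH": 2, "F": -1}
COUNTERS = {"SH": "SP", "F": "SH", "SP": "F"}  # key counters value
EPS, BETA_PLATES, BETA_DENY = 0.3, 0.2, 0.2  # betas are placeholder values


def reward(w, a, b):
    """Reduced reward from the acting player's perspective (w > 0 = enemy side)."""
    matchup = EPS if COUNTERS[a] == b else -EPS if COUNTERS[b] == a else 0.0
    plates = BETA_PLATES * (a == "SH" and w >= 1)
    deny = BETA_DENY * (a == "F" and w <= -1)
    return matchup + plates + deny


def step(w, a, b):
    """Wave position update, clamped to [-2, 2]."""
    return min(max(w + DELTA_W[a] - DELTA_W[b], -2), 2)


def greedy(q, s):
    """Best action in state s under table q (ties go to the first action)."""
    return max(ACTIONS, key=lambda x: q[(s, x)])


def train_phase(q_learn, q_frozen, learner_is_p1, rng,
                episodes=200, horizon=20, alpha=0.1, gamma=0.9, explore=0.1):
    """Q-learning for one side while the other side's table stays frozen."""
    for _ in range(episodes):
        w = 0  # lane position from player 1's point of view
        for t in range(horizon):
            s = w if learner_is_p1 else -w  # each side sees the lane mirrored
            a = rng.choice(ACTIONS) if rng.random() < explore else greedy(q_learn, s)
            b = greedy(q_frozen, -s)
            r = reward(s, a, b)
            s_next = step(s, a, b)
            # No bootstrapping on the last wave of the episode.
            future = 0.0 if t == horizon - 1 else max(q_learn[(s_next, x)] for x in ACTIONS)
            q_learn[(s, a)] += alpha * (r + gamma * future - q_learn[(s, a)])
            w = s_next if learner_is_p1 else -s_next


def alternate(rounds=10, seed=0):
    """Alternate training: even rounds train player 1, odd rounds train player 2."""
    rng = random.Random(seed)
    q1, q2 = defaultdict(float), defaultdict(float)
    for k in range(rounds):
        if k % 2 == 0:
            train_phase(q1, q2, True, rng)
        else:
            train_phase(q2, q1, False, rng)
    return q1, q2


q1, q2 = alternate()
for s in (-2, -1, 0, 1, 2):
    print(s, greedy(q1, s))
```

Only the learner's table is updated in each phase. The frozen table is read but its values never change, which is what makes the opponent "slow to adjust".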

---
With this model, we can study:
- **Best responses** (what beats a given policy, e.g. should I freeze if they slow-push?),
- **Exploitability** (how punishable a strategy is, e.g. if they **always** look to freeze, what should I do?),
- **Optimal action split** (how often we should look to slow-push, shove, or freeze).
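
For the interaction payoff alone, best responses and exploitability can be computed directly. This is a sketch restricted to the matrix $M$ with $\epsilon=0.3$, so it ignores the position-dependent bonuses and the gank risk; the function name `best_response` is mine.

```python
import numpy as np

ACTIONS = ("SP", "SH", "F")
EPS = 0.3  # example value from the text

# Rows: your action (SP, SH, F). Columns: opponent's action (SP, SH, F).
M = EPS * np.array([[0, -1, 1],
                    [1, 0, -1],
                    [-1, 1, 0]], dtype=float)


def best_response(opp_mix):
    """Best pure action against an opponent mix over (SP, SH, F), and its expected payoff."""
    values = M @ np.asarray(opp_mix, dtype=float)
    return ACTIONS[int(np.argmax(values))], float(values.max())


# An opponent who always freezes is fully exploitable by slow-pushing.
print(best_response([0, 0, 1]))  # ('SP', 0.3)

# An opponent who mixes evenly cannot be exploited through M at all.
print(best_response([1 / 3, 1 / 3, 1 / 3])[1])
```

The second number returned is one way to measure exploitability in this reduced game: it is how much the best counter-strategy gains per wave, and it is zero only for the even mix.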

Training uses two phases for the policies:

1. Deterministic training phase: to reduce complexity and make use of our prior intuition, we first train with deterministic policies, e.g. always freeze when the opponent shoves.

2. Stochastic training phase: once behavior starts to look consistent, we introduce stochastic (mixed) policies for realism, which is also more consistent with game theory.

At the end of these phases we take the learned Q-table and interpret it, i.e. read off the best action in each state.
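
Reading results off a Q-table is mechanical: take the highest-valued action in each state, then count how often each action shows up. The sketch below assumes the table is a dict keyed by `(state, action)`. The numbers in `toy_q` are made up purely to show the mechanics and are not results.

```python
ACTIONS = ("SP", "SH", "F")


def greedy_policy(q_table):
    """Map each state in the table to its highest-valued action."""
    states = sorted({s for (s, _) in q_table})
    return {s: max(ACTIONS, key=lambda a: q_table.get((s, a), 0.0)) for s in states}


def action_split(policy):
    """Fraction of states in which each action is the greedy choice."""
    counts = {a: 0 for a in ACTIONS}
    for a in policy.values():
        counts[a] += 1
    return {a: c / len(policy) for a, c in counts.items()}


# Made-up Q-values for two wave positions (NOT learned results).
toy_q = {
    (-1, "SP"): 0.1, (-1, "SH"): 0.0, (-1, "F"): 0.4,
    (1, "SP"): 0.2, (1, "SH"): 0.5, (1, "F"): -0.1,
}

policy = greedy_policy(toy_q)
print(policy)                # {-1: 'F', 1: 'SH'}
print(action_split(policy))  # {'SP': 0.0, 'SH': 0.5, 'F': 0.5}
```

This split weights every state equally. Weighting by how often each state is actually visited during play would give a more faithful picture of how often to slow-push, shove, or freeze.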