## 📘 Centralized Multi-Agent Value Iteration (Two Agents on a Grid)

We consider a grid-based environment with two agents. Each agent starts at a given location and must reach its own goal location without colliding with the other agent.

---

### 🔹 1. Grid World

Let the environment be a 2D grid of size $N_r \times N_c$, where:

- $N_r = N_c = 20$
- The set of valid positions is:

$$
\mathcal{G} = \{0, 1, \dots, N_r - 1\} \times \{0, 1, \dots, N_c - 1\}
$$

Each agent's individual state space is:

$$
\mathcal{S}_1 = \mathcal{S}_2 = \mathcal{G}
$$

The **joint state space** is:

$$
\mathcal{S} = \mathcal{S}_1 \times \mathcal{S}_2 = \mathcal{G} \times \mathcal{G}
$$
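As a rough illustration of the sizes involved, the sketch below enumerates $\mathcal{G}$ and counts the joint states. The helper name `make_grid` is hypothetical and not part of the original formulation. Note that $\mathcal{S}$ as defined still contains the colliding states with $s_1 = s_2$; these are only excluded later by the validity checks.

```python
from itertools import product

def make_grid(n_rows, n_cols):
    """Return every cell (row, col) of an n_rows x n_cols grid."""
    return list(product(range(n_rows), range(n_cols)))

grid = make_grid(20, 20)

# The joint state space is the Cartesian product G x G.
# It is kept lazy here because it has |G|^2 elements.
joint_states = product(grid, grid)

num_cells = len(grid)        # |G|
num_joint = num_cells ** 2   # |S| = |G|^2
print(num_cells, num_joint)
```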

---

### 🔹 2. Agent Start and Goal States

Let:

$$
\begin{aligned}
s_1^{\text{start}} &= (0, 0), \quad &s_1^{\text{goal}} &= (19, 19) \\
s_2^{\text{start}} &= (0, 19), \quad &s_2^{\text{goal}} &= (19, 0)
\end{aligned}
$$

The **joint goal state** is:

$$
s^* = (s_1^{\text{goal}},\ s_2^{\text{goal}})
$$

---

### 🔹 3. Action Space

Each agent may move to any of its eight neighboring cells (orthogonal or diagonal) or stay in place, which gives nine individual actions. Define:

$$
\mathcal{A}_{\text{ind}} = \{(a_r, a_c) \mid a_r, a_c \in \{-1, 0, 1\}\}
$$

The **joint action space**, with $9 \times 9 = 81$ joint actions, is:

$$
\mathcal{A} = \mathcal{A}_{\text{ind}} \times \mathcal{A}_{\text{ind}}
$$

A joint action $a = (a_1, a_2)$ applied in joint state $s = (s_1, s_2)$ moves the agents as:

$$
T(s, a) = \left(s_1 + a_1,\ s_2 + a_2\right)
$$
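A minimal sketch of the action sets and the transition $T$, assuming positions and actions are stored as `(row, col)` tuples. The names `IND_ACTIONS`, `JOINT_ACTIONS`, and `step` are illustrative choices, not taken from the original.

```python
from itertools import product

# Individual actions: every (a_r, a_c) with components in {-1, 0, 1}.
IND_ACTIONS = list(product((-1, 0, 1), repeat=2))

# Joint actions: one individual action per agent.
JOINT_ACTIONS = list(product(IND_ACTIONS, repeat=2))

def step(state, action):
    """Apply T(s, a) = (s_1 + a_1, s_2 + a_2).

    No bounds or collision checks are done here; those belong to the
    validity test applied separately.
    """
    (r1, c1), (r2, c2) = state
    (dr1, dc1), (dr2, dc2) = action
    return ((r1 + dr1, c1 + dc1), (r2 + dr2, c2 + dc2))

start = ((0, 0), (0, 19))
next_state = step(start, ((1, 1), (1, -1)))  # both agents move diagonally
print(len(IND_ACTIONS), len(JOINT_ACTIONS), next_state)
```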

---

### 🔹 4. Value Function

We define a centralized **cost-to-go function**:

$$
V: \mathcal{S} \to \mathbb{R}_{\geq 0} \cup \{\infty\}
$$

Initial conditions:

$$
V(s^*) = 0,\quad V(s) = \infty \text{ for all } s \neq s^*
$$

---

### 🔹 5. Backward Dynamic Programming

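Values are propagated backward from the joint goal $s^*$. For each state $s$ with finite $V(s)$ and every valid joint predecessor $s'$ of $s$, we apply the following update, repeating until no value changes: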

$$
V(s') \leftarrow \min\left(V(s'),\ V(s) + 1\right)
$$

where:

$$
s' = T^{-1}(s, a) = (s_1 - a_1,\ s_2 - a_2), \quad a \in \mathcal{A}
$$

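A predecessor $s' = (s_1', s_2')$ of $s = (s_1, s_2)$ is valid if all of the following hold:
- Each agent's position in $s'$ is in bounds, i.e. $s_1', s_2' \in \mathcal{G}$
- $s_1' \ne s_2'$ (no collision)
- The move from $s'$ to $s$ is not a swap: $(s_1', s_2') \ne (s_2, s_1)$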
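The update and validity rules above can be sketched as a backward breadth-first search. This is one possible realization, not necessarily the original implementation: because every joint move costs 1, the first time a state is reached already gives its minimal cost, so the $\min$ update reduces to "assign once". The function name `backward_values` is hypothetical, and states missing from the returned dictionary are taken to have $V = \infty$.

```python
from collections import deque
from itertools import product

def backward_values(n_rows, n_cols, goal):
    """Compute V by BFS from the joint goal over valid predecessors."""
    ind_actions = list(product((-1, 0, 1), repeat=2))

    def in_bounds(p):
        return 0 <= p[0] < n_rows and 0 <= p[1] < n_cols

    V = {goal: 0}            # absent key means V = infinity
    queue = deque([goal])
    while queue:
        s1, s2 = queue.popleft()
        cost = V[(s1, s2)] + 1
        for a1 in ind_actions:
            p1 = (s1[0] - a1[0], s1[1] - a1[1])   # s_1' = s_1 - a_1
            if not in_bounds(p1):
                continue
            for a2 in ind_actions:
                p2 = (s2[0] - a2[0], s2[1] - a2[1])   # s_2' = s_2 - a_2
                if not in_bounds(p2):
                    continue
                if p1 == p2:                  # collision in predecessor
                    continue
                if p1 == s2 and p2 == s1:     # swap between s' and s
                    continue
                pred = (p1, p2)
                if pred not in V:             # first visit is the minimum
                    V[pred] = cost
                    queue.append(pred)
    return V

# Small 3x3 example so the sketch runs quickly.
V_small = backward_values(3, 3, goal=((2, 2), (2, 0)))
print(V_small[((0, 0), (0, 2))])
```

On the full $20 \times 20$ grid this loop performs on the order of 13 million predecessor checks (about 160,000 joint states times 81 joint actions), so a pure-Python run may take noticeably longer than the small example.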

---

### 🔹 6. Path Extraction

Starting from the initial joint state:

$$
s_0 = (s_1^{\text{start}},\ s_2^{\text{start}})
$$

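The joint path repeatedly moves to the valid successor with the smallest cost-to-go until $s^*$ is reached. Here a successor is valid if it is in bounds, collision-free, and not a swap relative to $s_t$: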

$$
s_{t+1} = \arg\min_{s'} \left\{ V(s') \mid s' = T(s_t, a),\ a \in \mathcal{A},\ s' \text{ valid} \right\}
$$

Writing each joint state as $s_t = (s_1^t, s_2^t)$, the individual agent paths are:

$$
\text{Path}_i = \{s_i^t\}_{t=0}^{T_{\text{end}}},\quad i \in \{1,2\}
$$
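A hedged sketch of the greedy extraction, assuming `V` is a dictionary from joint states to costs in which missing keys mean $\infty$. The names `extract_path` and `split_paths` are illustrative. Ties between equally good successors are broken by enumeration order, which is an arbitrary choice made here.

```python
import math
from itertools import product

IND_ACTIONS = list(product((-1, 0, 1), repeat=2))

def extract_path(V, start, goal, n_rows, n_cols):
    """Follow strictly decreasing V from start to goal.

    Returns the list of joint states, or None if the goal is unreachable.
    """
    def in_bounds(p):
        return 0 <= p[0] < n_rows and 0 <= p[1] < n_cols

    if V.get(start, math.inf) == math.inf:
        return None
    path, state = [start], start
    while state != goal:
        s1, s2 = state
        best, best_cost = None, V[state]
        for a1, a2 in product(IND_ACTIONS, repeat=2):
            n1 = (s1[0] + a1[0], s1[1] + a1[1])
            n2 = (s2[0] + a2[0], s2[1] + a2[1])
            if not (in_bounds(n1) and in_bounds(n2)):
                continue
            if n1 == n2 or (n1 == s2 and n2 == s1):   # collision or swap
                continue
            cost = V.get((n1, n2), math.inf)
            if cost < best_cost:                      # strict descent
                best, best_cost = (n1, n2), cost
        if best is None:
            return None
        state = best
        path.append(state)
    return path

def split_paths(joint_path):
    """Project a joint path onto the two per-agent paths."""
    return [s[0] for s in joint_path], [s[1] for s in joint_path]

# Tiny hand-built value table on a 1x4 strip, for illustration only.
V_demo = {((0, 1), (0, 2)): 0, ((0, 0), (0, 3)): 1}
joint = extract_path(V_demo, ((0, 0), (0, 3)), ((0, 1), (0, 2)), 1, 4)
path_1, path_2 = split_paths(joint)
print(path_1, path_2)
```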

---

### 🔹 7. Collision Constraints

For all timesteps $t$:

- No vertex conflict:
  $$
  s_1^t \ne s_2^t
  $$
- No edge conflict (swap):
  $$
  \neg\left(s_1^{t+1} = s_2^t \land s_2^{t+1} = s_1^t\right)
  $$
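The two constraints can be checked on a pair of extracted paths with a short helper. This is a sketch with a hypothetical function name, `find_conflicts`, which reports the kind and timestep of each violation so that an empty list means the paths are conflict-free under the definitions above.

```python
def find_conflicts(path_1, path_2):
    """Return a list of ("vertex" | "edge", t) tuples for every violation."""
    conflicts = []
    horizon = min(len(path_1), len(path_2))
    for t in range(horizon):
        p1, p2 = path_1[t], path_2[t]
        # Vertex conflict: both agents occupy the same cell at time t.
        if p1 == p2:
            conflicts.append(("vertex", t))
        # Edge conflict: agents exchange cells between t and t + 1.
        if t + 1 < horizon:
            if path_1[t + 1] == p2 and path_2[t + 1] == p1:
                conflicts.append(("edge", t))
    return conflicts

swap = find_conflicts([(0, 0), (0, 1)], [(0, 1), (0, 0)])
meet = find_conflicts([(0, 0), (1, 1)], [(2, 2), (1, 1)])
clean = find_conflicts([(0, 0), (0, 1)], [(1, 0), (1, 1)])
print(swap, meet, clean)
```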

---

### 🔹 8. Cost of Solution

Let $T_{\text{end}}$ be the last timestep. The total cost is:

$$
\text{Cost} = T_{\text{end}} = \max\left(\text{len}(\text{Path}_1),\ \text{len}(\text{Path}_2)\right) - 1
$$

If one agent reaches its goal before the other, it waits there, using the stay action $(0, 0)$, until the other agent has finished.
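If the per-agent paths are stored separately and have different lengths, the waiting rule and the cost formula can be sketched as follows. The helper name `pad_and_cost` is an assumption made for illustration; paths projected from a single joint path already have equal length, in which case no padding occurs.

```python
def pad_and_cost(path_1, path_2):
    """Pad the shorter path by waiting at its last cell; return cost too."""
    horizon = max(len(path_1), len(path_2))
    padded_1 = path_1 + [path_1[-1]] * (horizon - len(path_1))
    padded_2 = path_2 + [path_2[-1]] * (horizon - len(path_2))
    # Cost = T_end = number of timesteps after t = 0.
    return padded_1, padded_2, horizon - 1

p1, p2, cost = pad_and_cost(
    [(0, 0), (1, 1)],
    [(0, 2), (1, 2), (2, 1), (2, 0)],
)
print(p1, p2, cost)
```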

---
