## Centralized Multi-Agent Path Planning on a Grid

We solve the problem of two agents navigating a grid from their respective start to goal positions **without colliding**, using centralized dynamic programming (value iteration).

---

### 🔹 1. Environment

The environment is a 2D grid of size:

$$
N_r \times N_c = 20 \times 20
$$

Each grid cell is indexed as $(i, j)$ with:

$$
\mathcal{G} = \{0, 1, \dots, N_r - 1\} \times \{0, 1, \dots, N_c - 1\}
$$

Each agent occupies one cell at a time.

---

### 🔹 2. State Space

Each agent $i \in \{1, 2\}$ has its own state space:

$$
\mathcal{S}_i = \mathcal{G}
$$

The **joint state space** is:

$$
\mathcal{S} = \mathcal{S}_1 \times \mathcal{S}_2 = \mathcal{G} \times \mathcal{G}
$$

A state $s \in \mathcal{S}$ is written as:

$$
s = (s_1, s_2) = ((i_1, j_1), (i_2, j_2))
$$
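
To make the size of the joint space concrete, the following minimal sketch enumerates the grid and counts joint states. The names `N_R`, `N_C`, `cells`, and `flat_index` are illustrative and not taken from the original implementation; the flat index is only one possible way to store per-state values in an array.

```python
from itertools import product

N_R, N_C = 20, 20  # grid size from the Environment section

# All cells (i, j) of the grid G.
cells = list(product(range(N_R), range(N_C)))

# A joint state is a pair of cells ((i1, j1), (i2, j2)), so |S| = |G|^2.
# This count includes pairs where both agents share a cell.
num_joint_states = len(cells) ** 2


def flat_index(state, n_r=N_R, n_c=N_C):
    """Map a joint state to a single integer in [0, |S|) (hypothetical helper)."""
    (i1, j1), (i2, j2) = state
    return ((i1 * n_c + j1) * n_r + i2) * n_c + j2


print(num_joint_states)  # 160000
```

With $|\mathcal{G}| = 400$ cells, the joint space has $400^2 = 160{,}000$ states, which is small enough to enumerate exhaustively.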

---

### 🔹 3. Start and Goal States

Each agent has a predefined start and goal location:

$$
\begin{aligned}
\text{Agent 1:} \quad & s_1^{\text{start}} = (0, 0), \quad s_1^{\text{goal}} = (19, 19) \\
\text{Agent 2:} \quad & s_2^{\text{start}} = (0, 19), \quad s_2^{\text{goal}} = (19, 0)
\end{aligned}
$$

The joint start and goal states are:

$$
\begin{aligned}
s^{\text{start}} &= (s_1^{\text{start}},\ s_2^{\text{start}}) \\
s^* &= (s_1^{\text{goal}},\ s_2^{\text{goal}})
\end{aligned}
$$

---

### 🔹 4. Action Space

Each agent chooses one of 9 actions per step: a move to any of the 8 neighboring cells, or a wait $(0, 0)$:

$$
\mathcal{A}_{\text{ind}} = \{ (a_r, a_c) \mid a_r, a_c \in \{-1, 0, 1\} \}
$$

The **joint action space** is:

$$
\mathcal{A} = \mathcal{A}_{\text{ind}} \times \mathcal{A}_{\text{ind}}
$$

Each joint action $a = (a_1, a_2)$ updates the current state:

$$
T(s, a) = \left(s_1 + a_1,\ s_2 + a_2\right)
$$

where $+$ is element-wise.
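
As a rough illustration of the action sets and the transition $T$, here is a small sketch. The names `ACTIONS_IND`, `JOINT_ACTIONS`, and `step` are assumptions made for this example; `step` applies the element-wise addition only and does not check bounds or collisions.

```python
from itertools import product

# Individual actions: every (a_r, a_c) with components in {-1, 0, 1}.
ACTIONS_IND = list(product((-1, 0, 1), repeat=2))

# Joint actions: one individual action per agent.
JOINT_ACTIONS = list(product(ACTIONS_IND, repeat=2))


def step(state, action):
    """Apply T(s, a) = (s1 + a1, s2 + a2) element-wise, with no validity checks."""
    (s1, s2), (a1, a2) = state, action
    return (
        (s1[0] + a1[0], s1[1] + a1[1]),
        (s2[0] + a2[0], s2[1] + a2[1]),
    )


print(len(ACTIONS_IND), len(JOINT_ACTIONS))  # 9 81
```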

---

### 🔹 5. State Transition and Validity

We perform **backward value iteration**, propagating from the goal.

To reach a state $s = (s_1, s_2)$ via action $a = (a_1, a_2)$, the **predecessor state** is:

$$
s' = T^{-1}(s, a) = (s_1 - a_1,\ s_2 - a_2)
$$

This state is **valid** iff:

- Each agent stays within bounds:
  $$
  s_1', s_2' \in \mathcal{G}
  $$

- Agents do **not collide** or **swap positions** (see next section).
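
The predecessor computation and the bounds condition can be sketched as follows. The helper names `in_bounds` and `predecessor` are hypothetical, and the collision conditions are deliberately left out here because they are covered separately.

```python
def in_bounds(cell, n_r=20, n_c=20):
    """Return True if the cell lies inside the n_r x n_c grid."""
    return 0 <= cell[0] < n_r and 0 <= cell[1] < n_c


def predecessor(state, action, n_r=20, n_c=20):
    """Compute s' = (s1 - a1, s2 - a2); return None if either agent leaves the grid."""
    (s1, s2), (a1, a2) = state, action
    p1 = (s1[0] - a1[0], s1[1] - a1[1])
    p2 = (s2[0] - a2[0], s2[1] - a2[1])
    if not (in_bounds(p1, n_r, n_c) and in_bounds(p2, n_r, n_c)):
        return None
    return (p1, p2)
```

For example, undoing the joint action `((1, 1), (1, -1))` from the goal state `((19, 19), (19, 0))` gives `((18, 18), (18, 1))`.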

---

### 🔹 6. Collision Avoidance

#### ✅ No Vertex Collision:
Agents cannot occupy the same cell:
$$
s_1^t \ne s_2^t
$$

#### ✅ No Edge Collision (Swap):
Agents cannot exchange cells in a single step, i.e., each agent moving into the cell the other occupied at time $t$:
$$
\neg \left(s_1^{t+1} = s_2^t \land s_2^{t+1} = s_1^t \right)
$$

#### ✅ Combined Check in Code:
```python
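# Vertex collision: both agents would occupy the same predecessor cell.
if pred_pos1 == pred_pos2: continue
# Edge collision: the agents would swap cells between predecessor and current state.
if pred_pos1 == current_state[1] and pred_pos2 == current_state[0]: continue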
```
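
The fragment above sits inside a loop over candidate predecessors. As a standalone sketch, the same two checks can be wrapped in a function; `is_collision_free` is a hypothetical name, and the extra check on the current state is an assumption added for safety when the function is called outside the backward search.

```python
def is_collision_free(pred_state, current_state):
    """Check one transition pred_state -> current_state for vertex and swap conflicts."""
    p1, p2 = pred_state
    c1, c2 = current_state
    # Vertex collision: both agents in the same cell (before or after the step).
    if p1 == p2 or c1 == c2:
        return False
    # Edge collision: each agent ends up in the cell the other started from.
    if p1 == c2 and p2 == c1:
        return False
    return True
```

Note that this model allows one agent to enter a cell the other has just vacated, as long as the two do not swap.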

---

### 🔹 7. Cost and Reward
Each step incurs a unit cost:

$$
c(s, a) = 1
$$

There are no negative rewards or weights, so this is a pure shortest-path problem.

---

### 🔹 8. Value Function
We define the cost-to-go value function:

$$
V : \mathcal{S} \rightarrow \mathbb{R}_{\geq 0} \cup \{\infty\}
$$

initialized as

$$
V(s^*) = 0, \quad V(s) = \infty \ \text{ for all } s \neq s^*
$$

---

### 🔹 9. Value Iteration
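For each state $s$ taken from the queue and each valid predecessor $s'$ of $s$, we update:

$$
V(s') \leftarrow \min (V(s'),\ V(s) + 1)
$$

Because every step costs 1, this is equivalent to a backward breadth-first search (BFS) from the goal.

A FIFO queue propagates values outward from the goal. Each state is enqueued the first time its value becomes finite, and iteration stops when the queue is empty.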
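
The update and queue described above can be sketched as a backward BFS. This is a minimal illustration under stated assumptions, not the original implementation: `backward_bfs` is a hypothetical name, the grid size is a parameter so the example can run on a small grid, and states missing from the returned dictionary are treated as having $V = \infty$.

```python
from collections import deque
from itertools import product


def backward_bfs(goal, n_r, n_c):
    """Return a dict mapping every state that can reach `goal` to its cost-to-go."""
    actions = list(product((-1, 0, 1), repeat=2))
    V = {goal: 0}
    queue = deque([goal])
    while queue:
        current = queue.popleft()
        s1, s2 = current
        for a1, a2 in product(actions, repeat=2):
            # Predecessor s' = (s1 - a1, s2 - a2).
            p1 = (s1[0] - a1[0], s1[1] - a1[1])
            p2 = (s2[0] - a2[0], s2[1] - a2[1])
            if not (0 <= p1[0] < n_r and 0 <= p1[1] < n_c):
                continue
            if not (0 <= p2[0] < n_r and 0 <= p2[1] < n_c):
                continue
            if p1 == p2:  # vertex collision
                continue
            if p1 == s2 and p2 == s1:  # swap
                continue
            pred = (p1, p2)
            # With unit costs, the first value a state receives is already minimal.
            if pred not in V:
                V[pred] = V[current] + 1
                queue.append(pred)
    return V


# Small 3x3 example with crossing diagonals (start and goal chosen for illustration).
V = backward_bfs(((2, 2), (2, 0)), 3, 3)
print(V[((0, 0), (0, 2))])  # 3
```

In the 3x3 example, each agent alone needs 2 diagonal moves, but both shortest paths pass through the center cell at the same time, so one agent has to wait a step and the joint cost is 3.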

---

### 🔹 10. Path Extraction
From the start state $s^{\text{start}}$, we repeatedly move to the valid successor with the lowest value:

$$
s_{t+1} = \arg\min_{s'} \{ V(s') \mid s' = T(s_t, a),\ a \in \mathcal{A},\ s' \text{ valid} \}
$$

until $s_t = s^*$. Each agent's trajectory is:

$$
\text{Path}_i = \{s_i^t\}_{t=0}^{T_{\text{end}}}
$$
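
A greedy extraction along these lines might look like the sketch below. It assumes $V$ is available as a dictionary from joint states to cost-to-go, with missing states treated as $\infty$; the name `extract_paths` and this dictionary layout are assumptions made for the example.

```python
import math
from itertools import product


def extract_paths(V, start, goal, n_r, n_c):
    """Follow the lowest-valued valid successor from start until goal is reached."""
    if start not in V:
        raise ValueError("start state cannot reach the goal")
    actions = list(product((-1, 0, 1), repeat=2))
    joint_path = [start]
    current = start
    while current != goal:
        s1, s2 = current
        best, best_val = None, math.inf
        for a1, a2 in product(actions, repeat=2):
            n1 = (s1[0] + a1[0], s1[1] + a1[1])
            n2 = (s2[0] + a2[0], s2[1] + a2[1])
            if not (0 <= n1[0] < n_r and 0 <= n1[1] < n_c):
                continue
            if not (0 <= n2[0] < n_r and 0 <= n2[1] < n_c):
                continue
            if n1 == n2:  # vertex collision
                continue
            if n1 == s2 and n2 == s1:  # swap
                continue
            val = V.get((n1, n2), math.inf)
            if val < best_val:
                best, best_val = (n1, n2), val
        if best is None:
            raise ValueError("no valid successor with finite value")
        current = best
        joint_path.append(current)
    # Split the joint trajectory into one path per agent.
    return [s[0] for s in joint_path], [s[1] for s in joint_path]
```

Because the two paths are read off the same joint trajectory, they always have the same length, and an agent that arrives early simply repeats its goal cell.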

---

### 🔹 11. Final Cost
Total cost is the number of timesteps until both agents reach their goals:

$$
\text{Cost} = T_{\text{end}} = \max(\text{len}(\text{Path}_1),\ \text{len}(\text{Path}_2)) - 1
$$

Agents wait at their goals if needed to avoid collisions.
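
As a small illustration of the cost formula and of waiting at the goal, the sketch below pads a shorter path with repeated goal cells and computes the cost. The helper names `pad_with_waits` and `plan_cost` are hypothetical, and the example paths are chosen for illustration on a 3x3 grid.

```python
def pad_with_waits(path, length):
    """Extend a path to `length` entries by repeating its last cell (a wait)."""
    return path + [path[-1]] * (length - len(path))


def plan_cost(path1, path2):
    """Number of timesteps until both agents have reached their goals."""
    return max(len(path1), len(path2)) - 1


path1 = [(0, 0), (1, 1), (2, 2)]          # agent 1 arrives after 2 steps
path2 = [(0, 2), (0, 2), (1, 1), (2, 0)]  # agent 2 waits once, arrives after 3
path1 = pad_with_waits(path1, len(path2))
print(plan_cost(path1, path2))  # 3
```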
