# Finite-Horizon Dynamic Programming

- A **dynamic programming problem** is an optimization problem in which decisions have to be taken sequentially over several time periods.
- It is usually assumed that the periods are "linked", i.e., that actions taken in any particular period affect the decision environment (and reward possibilities) in all future periods.

### 11.1 - Finite-Horizon Dynamic Programming

A **Finite Horizon (Markovian) Dynamic Programming Problem** (FHDP) is defined by a tuple $\{S, A, T, (r_t, f_t, \Phi_t)_{t = 1}^T\}$ where
1. $S$ is the **state space** of the problem, with generic element $s$,
2. $A$ is the **action space** of the problem, with generic element $a$,
3. $T$, a positive integer, is the **horizon** of the problem,
4. For each $t \in \{1, \dots, T\}$,
   - $r_t : S \times A \to \mathbb{R}$ is the period-$t$ **reward function**,
   - $f_t: S \times A \to S$ is the period-$t$ **transition function**, and
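   - $\Phi_t: S \to P(A)$ is the period-$t$ **feasible action correspondence**, which assigns to each state $s$ the set $\Phi_t(s) \subset A$ of actions available at $s$ in period $t$.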

**Interpretation**: 
- The decision-maker begins from some initial state $s_1 = s \in S$. The set of actions available to the decision-maker at this state is given by $\Phi_1(s_1) \subset A$.
- When the decision-maker chooses an action $a_1 \in \Phi_1(s_1)$, two things happen:
  - First, the decision-maker receives an immediate reward of $r_1(s_1, a_1)$.
  - Second, the state $s_2$ at the beginning of period 2 is realized as $s_2 = f_1(s_1, a_1)$. At this new state, the set of feasible actions is now given by $\Phi_2(s_2) \subset A$.
- When an action $a_2 \in \Phi_2(s_2)$ is chosen, a reward $r_2(s_2, a_2)$ is received, and the period-3 state $s_3$ is realized as $s_3 = f_2(s_2, a_2)$, and so on until the terminal date $T$.
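
The sequence of events described above can be traced in code. The following is a minimal Python sketch, not part of the original notes: the helper `rollout` and the toy "cake-eating" problem (square-root rewards, integer cake sizes) are hypothetical choices made only for illustration. Lists are 0-indexed, so index `0` stands for period 1.

```python
import math

def rollout(s1, actions, r, f, Phi):
    """Follow a fixed action sequence starting from state s1.

    r, f, Phi are lists of length T holding the period-t reward function,
    transition function, and feasible action correspondence.
    Returns the visited states [s_1, ..., s_T] and the total reward.
    """
    T = len(actions)
    states, total, s = [s1], 0.0, s1
    for t, a in enumerate(actions):
        if a not in Phi[t](s):
            raise ValueError(f"action {a} is not feasible in period {t + 1} at state {s}")
        total += r[t](s, a)      # immediate reward r_t(s_t, a_t)
        if t < T - 1:            # the state after period T is never used
            s = f[t](s, a)       # next state s_{t+1} = f_t(s_t, a_t)
            states.append(s)
    return states, total

# Hypothetical toy problem: a cake of size 4 eaten over T = 3 periods.
T = 3
r = [lambda s, a: math.sqrt(a)] * T    # reward from eating a units
f = [lambda s, a: s - a] * T           # cake left for the next period
Phi = [lambda s: range(s + 1)] * T     # any whole amount from 0 to s

states, total = rollout(4, [1, 1, 2], r, f, Phi)
print(states)  # [4, 3, 2]
print(total)   # sum of sqrt(1), sqrt(1), sqrt(2)
```

Here the same functions are reused in every period for brevity; the definition allows $r_t$, $f_t$, and $\Phi_t$ to differ across periods, which the list-based representation also supports.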

**Objective**:
- Choose a plan for taking actions at each point in time in order to maximize the sum of per-period rewards over the horizon of the model.

That is, we want to solve
$$\text{Maximize } \sum_{t = 1}^T r_t(s_t, a_t) \qquad \text{subject to } \begin{cases} s_1 = s \in S, \\ s_t = f_{t - 1}(s_{t - 1}, a_{t - 1}), & t = 2, \dots, T, \\ a_t \in \Phi_t(s_t), & t = 1, \dots, T.\end{cases}$$
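
When the feasible action sets are finite and small, this maximization can be solved directly by exhaustive search over all feasible action sequences. The sketch below is illustrative only and is not a method taken from these notes: the function `solve_by_enumeration` and the toy cake-eating problem are hypothetical, and the search is exponential in $T$, so it is meant for checking small examples rather than for practical use.

```python
import math

def solve_by_enumeration(s1, T, r, f, Phi):
    """Return (best_value, best_actions) by trying every feasible plan.

    r, f, Phi are lists of length T (index 0 is period 1). Assumes each
    Phi[t](s) is a finite iterable of actions.
    """
    def best_from(t, s):
        best_value, best_plan = -math.inf, None
        for a in Phi[t](s):                  # constraint: a_t in Phi_t(s_t)
            value, plan = r[t](s, a), [a]
            if t < T - 1:
                # constraint: s_{t+1} = f_t(s_t, a_t)
                cont_value, cont_plan = best_from(t + 1, f[t](s, a))
                if cont_plan is None:        # no feasible continuation
                    continue
                value, plan = value + cont_value, plan + cont_plan
            if value > best_value:
                best_value, best_plan = value, plan
        return best_value, best_plan

    return best_from(0, s1)

# Hypothetical toy problem: a cake of size 3 eaten over T = 3 periods.
T = 3
r = [lambda s, a: math.sqrt(a)] * T    # reward from eating a units
f = [lambda s, a: s - a] * T           # cake left for the next period
Phi = [lambda s: range(s + 1)] * T     # any whole amount from 0 to s

best_value, best_actions = solve_by_enumeration(3, T, r, f, Phi)
print(best_actions)  # [1, 1, 1]
print(best_value)    # 3.0
```

Because the square-root reward is concave, spreading consumption evenly beats eating the whole cake at once, which is what the search finds in this example.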