# The Markov Property

The Markov property simplifies modeling: a decision-maker needs only the present state, not the entire history of states and actions.

## Markov Property
1. The future is independent of the past, given the present. This is what "memoryless" means.
2. In an MDP, the current state contains all the information relevant to decision-making.
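
Stated a little more formally, the first point says that conditioning on the full history gives the same distribution over the next state as conditioning on the current state and action alone. The following is a sketch that borrows the state $S_t$ and action $A_t$ notation introduced in the next paragraphs, with $s'$ standing for any candidate next state:

$$
\Pr(S_{t+1} = s' \mid S_t, A_t) = \Pr(S_{t+1} = s' \mid S_0, A_0, S_1, A_1, \ldots, S_t, A_t)
$$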

In an MDP, the agent interacts with the environment at discrete time steps $t = 0, 1, 2, \ldots$:

![MDP Diagram](../Resources/Images/RL/MDP.png)

1. At each time step, the agent receives a representation of the environment, known as the state ($S_t \in S$).
2. Based on that state, the agent selects an action ($A_t \in A(s)$, the set of actions available in state $s$).
3. At the next time step, partly as a consequence of its action, the agent receives a numerical reward ($R_{t+1} \in R$) and transitions to a new state ($S_{t+1}$).
4. Repeating these steps produces a trajectory:

   $$
   S_0, A_0, R_1, S_1, A_1, R_2, S_2, A_2, R_3, \ldots
   $$
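
The interaction loop above can be sketched in a few lines of Python. This is only an illustration: the two-state environment, its state and action names, the rewards, and the helpers `step` and `rollout` are all hypothetical and were invented for this example. The agent here simply picks a random valid action.

```python
import random

# Hypothetical two-state environment, invented for illustration only.
STATES = ["low", "high"]
ACTIONS = {"low": ["wait", "recharge"], "high": ["wait", "search"]}


def step(state, action, rng):
    """Return (reward, next_state) for one step of the toy environment."""
    if action == "recharge":
        return 0.0, "high"
    if action == "search":
        # Searching pays a reward but may move the agent to the "low" state.
        next_state = "low" if rng.random() < 0.5 else "high"
        return 1.0, next_state
    return 0.1, state  # "wait" leaves the state unchanged


def rollout(num_steps, seed=0):
    """Collect S_0, A_0, R_1, S_1, A_1, R_2, ... as a flat list."""
    rng = random.Random(seed)
    state = "high"  # S_0
    trajectory = [state]
    for _ in range(num_steps):
        action = rng.choice(ACTIONS[state])       # A_t is chosen from A(S_t)
        reward, state = step(state, action, rng)  # R_{t+1} and S_{t+1}
        trajectory += [action, reward, state]
    return trajectory


trajectory = rollout(3)
```

Note that `step` takes only the current state and action as input, never the earlier history, which is how the Markov property shows up in code.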

## Finite MDP

In a finite MDP, the sets of states, actions, and rewards all have a finite number of elements. In this case, the random variables $R_t$ and $S_t$ have well-defined discrete probability distributions that depend only on the preceding state and action.

For particular values of those random variables $s' \in S$ and $r' \in R$, there is a probability of those values occurring at time $t$, given the values of the previous state and action:

$$
p(s', r' \mid s, a) = \Pr(S_t = s', R_t = r' \mid S_{t-1} = s, A_{t-1} = a)
$$

This probability satisfies the following normalization condition:

$$
\sum_{s' \in S} \sum_{r' \in R} p(s', r' \mid s, a) = 1 \quad \forall s \in S, a \in A(s)
$$
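
For a finite MDP, the function $p$ can be stored as a lookup table, and the normalization condition becomes a simple check. The sketch below assumes a made-up two-state example; the table `P`, its probabilities, and the helpers `p` and `is_normalized` are hypothetical names chosen for illustration.

```python
import math

# Hypothetical dynamics p(s', r' | s, a), stored as
# (s, a) -> {(s', r'): probability}. The numbers are invented.
P = {
    ("high", "search"): {("high", 1.0): 0.5, ("low", 1.0): 0.5},
    ("high", "wait"): {("high", 0.1): 1.0},
    ("low", "recharge"): {("high", 0.0): 1.0},
    ("low", "wait"): {("low", 0.1): 1.0},
}


def p(s_next, r, s, a):
    """Look up p(s', r' | s, a); outcomes not listed have probability 0."""
    return P[(s, a)].get((s_next, r), 0.0)


def is_normalized(dynamics, tol=1e-9):
    """Check that the probabilities sum to 1 for every (s, a) pair."""
    return all(
        math.isclose(sum(outcomes.values()), 1.0, abs_tol=tol)
        for outcomes in dynamics.values()
    )
```

Here `is_normalized(P)` performs the double sum over $s'$ and $r'$ for each state-action pair in the table. A tolerance is used because floating-point sums are rarely exactly 1.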
