# Reinforcement Learning
### Markov Decision Process (MDP)
A **Markov decision process** (MDP) is a mathematical framework used for modeling decision-making in situations where outcomes are partly random and partly under the control of a decision-maker.
<br>MDPs are used in a wide range of applications, including:
- Robotics (path planning, control),
- Game AI (decision-making in games),
- Finance (portfolio management),
- Healthcare (treatment planning),
- Autonomous systems (self-driving cars).

The **Markov property** is the key assumption in MDPs: the next state and reward depend only on the current state and action, not on the history of earlier states and actions. This property simplifies the modeling and solving of sequential decision-making problems and is the foundation of many algorithms in reinforcement learning and dynamic programming.
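
In symbols, one common way to write this property is shown here. The notation is an assumption of this sketch and does not come from the text above: $S_t$ and $A_t$ stand for the state and action at step $t$, and $s'$ for a candidate next state.

$$
P(S_{t+1} = s' \mid S_t = s, A_t = a, S_{t-1}, A_{t-1}, \ldots, S_0, A_0) = P(S_{t+1} = s' \mid S_t = s, A_t = a)
$$

Conditioning on the full history gives the same distribution as conditioning on the current state and action alone.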
<hr> 

In a Markov Decision Process (MDP), the **agent-environment interaction** follows a sequential decision-making process. The agent (e.g., a robot) interacts with the environment (e.g., a grid world) by taking actions, transitioning between states, and receiving rewards.
<br> An example is given for a 2-by-2 grid world in which a robot can take one of four actions: *Up*, *Down*, *Left*, and *Right*. Based on the current state and the chosen action, the state changes and a reward is received. The state transition is governed by transition probabilities defined for our grid world (the transition probabilities are fictional). Moreover, the *goal state* is state 4 of the grid world.
<hr>
https://github.com/ostad-ai/Reinforcement-Learning
<br> Explanation: https://www.pinterest.com/HamedShahHosseini/Reinforcement-Learning

In [1]:
# Importing required module
import random

In [2]:
# Example
# Simulates the next state and reward based on the current state and action
# for the 2-by-2 grid world.
# The transition probabilities are fictional.
def grid_world_transition(state, action):
    """
    Args:
        state (int): Current state (1, 2, 3, or 4).
        actions (str): ("Up", "Down", "Left", or "Right").
    
    Returns:
        next_state (int): Next state.
        reward (int): Reward received.
    """
    if state == 1:
        if action == "Right":
            next_state = random.choices([2, 1, 3], weights=[0.7, 0.2, 0.1])[0]
            reward = -1
        elif action == "Down":
            next_state = 3
            reward = -1
        else:
            next_state = 1
            reward = -1
    
    elif state == 2:
        if action == "Down":
            next_state = random.choices([4, 2], weights=[0.8, 0.2])[0]
            reward = 10 if next_state == 4 else -1
        elif action == "Left":
            next_state = 1
            reward = -1
        else:
            next_state = 2
            reward = -1
    
    elif state == 3:
        if action == "Up":
            next_state = random.choices([1, 3], weights=[0.8, 0.2])[0]
            reward = -1
        elif action == "Right":
            next_state = 4
            reward = 10
        else:
            next_state = 3
            reward = -1
    
    elif state == 4:
        next_state = 4
        reward = 10
    
    else:
        raise ValueError("Invalid state. State must be 1, 2, 3, or 4.")
    
    return next_state, reward
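
The same transition model can also be written as a lookup table, which makes the probabilities easier to inspect and check. The sketch below is one possible arrangement, not part of the original notebook: the names `TRANSITIONS`, `outcomes`, `sample_transition`, and `expected_reward` are hypothetical, and the numbers are copied from the branches of `grid_world_transition` above.

```python
import math
import random

# (state, action) -> list of (next_state, probability, reward).
# Values mirror the branches of grid_world_transition; they are fictional.
TRANSITIONS = {
    (1, "Right"): [(2, 0.7, -1), (1, 0.2, -1), (3, 0.1, -1)],
    (1, "Down"): [(3, 1.0, -1)],
    (2, "Down"): [(4, 0.8, 10), (2, 0.2, -1)],
    (2, "Left"): [(1, 1.0, -1)],
    (3, "Up"): [(1, 0.8, -1), (3, 0.2, -1)],
    (3, "Right"): [(4, 1.0, 10)],
}
STATES = (1, 2, 3, 4)
ACTIONS = ("Up", "Down", "Left", "Right")
GOAL_STATE = 4


def outcomes(state, action):
    """Return every possible (next_state, probability, reward) for one step."""
    if state not in STATES:
        raise ValueError("Invalid state. State must be 1, 2, 3, or 4.")
    if state == GOAL_STATE:
        return [(GOAL_STATE, 1.0, 10)]  # the goal absorbs every action
    # Actions without a table entry leave the agent in place with reward -1.
    return TRANSITIONS.get((state, action), [(state, 1.0, -1)])


def sample_transition(state, action, rng=random):
    """Draw one (next_state, reward) pair according to the table."""
    options = outcomes(state, action)
    weights = [p for _, p, _ in options]
    next_state, _, reward = rng.choices(options, weights=weights)[0]
    return next_state, reward


def expected_reward(state, action):
    """Probability-weighted average of the one-step reward."""
    return sum(p * r for _, p, r in outcomes(state, action))


# Sanity check: the probabilities for each (state, action) pair sum to 1.
for s in STATES:
    for a in ACTIONS:
        total = sum(p for _, p, _ in outcomes(s, a))
        assert math.isclose(total, 1.0), (s, a, total)

print(sample_transition(2, "Down"))
print(expected_reward(2, "Down"))
```

Keeping the model as data means the same table can be sampled from or summed over to get expectations. The if/elif version above can only be sampled.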

In [3]:
# Example usage
current_state = random.choice([1, 2, 3, 4])
action = random.choice(["Left", "Right", "Up", "Down"])
next_state, reward = grid_world_transition(current_state, action)
print(f"Current State: {current_state}, Action: {action}")
print(f"Next State: {next_state}, Reward: {reward}")

Current State: 1, Action: Right
Next State: 2, Reward: -1
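
A single call covers only one step of the agent-environment interaction. Repeating it until the goal state is reached gives a full episode. The sketch below assumes a uniformly random policy and a step cap, neither of which is specified in the notebook. `run_episode` and `toy_step` are hypothetical names. `toy_step` is a deterministic stand-in so the block runs on its own. In the notebook, `grid_world_transition` can be passed in its place.

```python
import random

ACTIONS = ["Up", "Down", "Left", "Right"]
GOAL_STATE = 4


def run_episode(step_fn, start_state, max_steps=20, rng=None):
    """Roll out one episode under a uniformly random policy (an assumption).

    step_fn(state, action) must return (next_state, reward).
    Returns the list of (state, action, reward, next_state) tuples
    and the sum of rewards collected.
    """
    rng = rng or random.Random()
    state = start_state
    trajectory = []
    total_reward = 0
    for _ in range(max_steps):
        if state == GOAL_STATE:
            break  # stop once the goal has been reached
        action = rng.choice(ACTIONS)
        next_state, reward = step_fn(state, action)
        trajectory.append((state, action, reward, next_state))
        total_reward += reward
        state = next_state
    return trajectory, total_reward


def toy_step(state, action):
    """Hypothetical stand-in: ignores the action and moves to the next state."""
    next_state = min(state + 1, GOAL_STATE)
    reward = 10 if next_state == GOAL_STATE else -1
    return next_state, reward


trajectory, total_reward = run_episode(toy_step, start_state=1)
for state, action, reward, next_state in trajectory:
    print(f"State: {state}, Action: {action}, Reward: {reward}, Next State: {next_state}")
print(f"Total reward: {total_reward}")
```

With `grid_world_transition` as `step_fn`, the trajectory length and total reward vary from run to run because both the policy and the transitions are random.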
