# Reinforcement Learning: An Introduction

### Definition 1
Reinforcement Learning is learning what to do (how to map situations to actions) so as to
maximize a numerical reward signal.
### Definition 2
Reinforcement Learning (RL) is a type of machine learning paradigm in which an agent learns to
make a sequence of decisions by interacting with an environment. The agent receives feedback in
the form of rewards or penalties, enabling it to learn optimal strategies over time.
- Learning happens by trial and error, often with delayed rewards
- Actions are taken sequentially, so current decisions affect future outcomes


## Key Components of RL

![Reinforcement Learning Model](https://images.spiceworks.com/wp-content/uploads/2022/09/29100907/Reinforcement-Learning-Model.png)

### Environment
The external system with which the agent interacts. It responds to
the agent's actions with new states and rewards.
### Agent
The learner and decision-maker that interacts with the
environment.
### State (s)
A representation of the environment's current situation, as seen by the agent at a given time step.
### Action (a)
The decision the agent takes in a given state, which typically
moves the environment to another state.
### Reward (r)
A number that quantifies the immediate feedback the agent
receives after taking an action in a state.
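The way these five components fit together can be sketched as a simple interaction loop. The example below is a minimal illustration, not a reference implementation: `CorridorEnv`, `random_agent`, and `run_episode` are hypothetical names invented here, and the corridor task (walk right to reach a goal) is an assumed toy environment.

```python
import random


class CorridorEnv:
    """Hypothetical 1-D corridor: states 0..length-1, the last state is the goal."""

    def __init__(self, length=5):
        self.length = length
        self.state = 0

    def reset(self):
        """Start a new episode and return the initial state."""
        self.state = 0
        return self.state

    def step(self, action):
        """Apply an action (0 = left, 1 = right); return (state, reward, done)."""
        move = 1 if action == 1 else -1
        # Clip so the agent cannot leave the corridor.
        self.state = min(max(self.state + move, 0), self.length - 1)
        done = self.state == self.length - 1
        reward = 1.0 if done else 0.0  # reward only on reaching the goal
        return self.state, reward, done


def random_agent(state, rng):
    """Placeholder agent: ignores the state and picks an action at random."""
    return rng.choice([0, 1])


def run_episode(env, agent, rng, max_steps=100):
    """Run one episode and return (total reward, list of (s, a, r) tuples)."""
    state = env.reset()
    total_reward = 0.0
    trajectory = []
    for _ in range(max_steps):
        action = agent(state, rng)                    # agent chooses an action
        next_state, reward, done = env.step(action)   # environment responds
        trajectory.append((state, action, reward))
        total_reward += reward
        state = next_state
        if done:
            break
    return total_reward, trajectory


if __name__ == "__main__":
    rng = random.Random(0)
    total, traj = run_episode(CorridorEnv(), random_agent, rng)
    print(f"steps={len(traj)} total_reward={total}")
```

A learning agent would replace `random_agent` with a function that improves its choices based on the rewards it has observed.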


## Comparison of Learning Paradigms

| Learning Paradigm      | How They Learn                                           | Interaction with Environment | Feedback Mechanism          |
|------------------------|----------------------------------------------------------|------------------------------|-----------------------------|
| Supervised Learning    | Learns from labeled data with input-output pairs         | Passive: Receives predefined labels | Direct, Explicit            |
| Unsupervised Learning  | Learns patterns and structures in unlabeled data         | Passive: Learns patterns from data  | Indirect, Intrinsic Structures |
| Reinforcement Learning | Learns from interaction with the environment, receiving delayed feedback in the form of rewards | Active: Takes actions in the environment | Delayed, Scalar Rewards      |


## Advantages of RL

- Focuses on the long-term goal
- Requires no labeled dataset, since the agent gathers its own experience
- Learns directly from interaction
- Operates in evolving and uncertain environments
- Handles sequential decision-making

## The RL Problem

The Reinforcement Learning (RL) problem is a framework in machine learning where an agent interacts with an environment over a series of discrete time steps. The agent's goal is to learn a policy that maximizes the cumulative rewards it receives from the environment.
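"Cumulative reward" is often made precise as a return: the sum of the rewards collected from some time step onwards. A common variant, assumed here and not stated in the text above, weights later rewards by a discount factor `gamma` between 0 and 1. The function name `discounted_return` is illustrative only.

```python
def discounted_return(rewards, gamma=1.0):
    """Sum of rewards, where the reward k steps ahead is weighted by gamma**k."""
    total = 0.0
    # Work backwards so each step applies G = r + gamma * G_next.
    for reward in reversed(rewards):
        total = reward + gamma * total
    return total


rewards = [0.0, 0.0, 1.0]
print(discounted_return(rewards))        # gamma = 1: plain sum of rewards
print(discounted_return(rewards, 0.5))   # gamma < 1: later rewards count for less
```

With `gamma=1.0` the function reduces to an ordinary sum, which matches the undiscounted reading of "cumulative reward".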

RL explicitly considers the whole problem of a goal-directed agent interacting with an uncertain environment. This contrasts with many approaches that consider subproblems without addressing how they might fit into a larger picture.

### Key Components

- **Environment, Agent, State, Action, Reward**
- **Policy (π)**: The strategy, or mapping from states to actions, that the agent follows. The goal of RL is to learn an optimal policy that maximizes the expected cumulative reward.
- **Value Function (V)**: The cumulative reward an agent can expect to receive from a given state onwards when following a specific policy. It helps the agent evaluate the desirability of different states.
- **Model (Optional)**: A learned or known model of the environment's dynamics, which provides information about how the environment will respond to the agent's actions. Models are not always used, but when available, they can assist in planning and decision-making.
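To make the relationship between policy, value function, and model concrete, the sketch below evaluates a fixed policy on a tiny chain of states using a known model. Everything in it is assumed for illustration: the four-state chain, the discount factor `GAMMA`, and the names `model`, `policy`, and `evaluate_policy`. The update used is iterative policy evaluation for a deterministic environment, one standard way to compute V among several.

```python
# Hypothetical 4-state chain: states 0, 1, 2 are non-terminal, state 3 is terminal.
N_STATES = 4
TERMINAL = 3
GAMMA = 0.9  # assumed discount factor


def model(state, action):
    """Known dynamics: action is -1 (left) or +1 (right); returns (next_state, reward)."""
    next_state = min(max(state + action, 0), TERMINAL)
    reward = 1.0 if next_state == TERMINAL else 0.0
    return next_state, reward


def policy(state):
    """A fixed policy mapping every state to the action 'move right'."""
    return +1


def evaluate_policy(policy, model, sweeps=50):
    """Estimate V(s) for each state under the given policy by repeated sweeps."""
    values = [0.0] * N_STATES  # the terminal state's value stays 0
    for _ in range(sweeps):
        for s in range(N_STATES):
            if s == TERMINAL:
                continue
            next_state, reward = model(s, policy(s))
            # Value = immediate reward + discounted value of the next state.
            values[s] = reward + GAMMA * values[next_state]
    return values


print(evaluate_policy(policy, model))
```

States closer to the terminal state receive higher values, which is the sense in which V ranks how desirable each state is under this policy.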

## Exploration vs Exploitation

Exploration and exploitation are key aspects of RL. There is always a trade-off between the two, and the agent must balance:
- Taking new actions to explore the environment and gather information
- Greedily following the best-known strategy so far to maximize the reward
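One simple way to balance the two, shown here as an illustration and not as the only option, is an epsilon-greedy rule: with a small probability `epsilon` the agent picks a random action, and otherwise it picks the action with the highest estimated reward. The multi-armed bandit setup, the Gaussian reward noise, and the names `epsilon_greedy` and `run_bandit` are assumptions made for this sketch.

```python
import random


def epsilon_greedy(estimates, epsilon, rng):
    """Explore with probability epsilon, otherwise exploit the best estimate."""
    if rng.random() < epsilon:
        return rng.randrange(len(estimates))                        # explore
    return max(range(len(estimates)), key=lambda a: estimates[a])   # exploit


def run_bandit(true_means, epsilon=0.1, steps=2000, seed=0):
    """Play a bandit with noisy rewards; return (reward estimates, pull counts)."""
    rng = random.Random(seed)
    estimates = [0.0] * len(true_means)
    counts = [0] * len(true_means)
    for _ in range(steps):
        action = epsilon_greedy(estimates, epsilon, rng)
        reward = rng.gauss(true_means[action], 1.0)  # assumed unit-variance noise
        counts[action] += 1
        # Incremental update of the sample-average reward for this action.
        estimates[action] += (reward - estimates[action]) / counts[action]
    return estimates, counts


estimates, counts = run_bandit([0.2, 0.5, 0.8])
print("pull counts per action:", counts)
print("estimated mean rewards:", [round(e, 2) for e in estimates])
```

Setting `epsilon=0.0` gives a purely greedy agent that can lock onto a poor action early, while `epsilon=1.0` gives an agent that never uses what it has learned.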


## RL Applications

### Gaming
Teaching agents to play games such as chess, Go, Pong, or other video games.

### Robotics
Training robots to perform tasks in real-world environments, such as pick-and-place with robotic arms or path planning for ground vehicles.

### Finance
Optimizing investment strategies in financial markets.

### Healthcare
Personalizing treatment plans based on patient data.