# Introduction

Source: https://huggingface.co/learn/deep-rl-course/unit1/what-is-rl

Deep RL is a type of Machine Learning where an agent learns <b>how to behave</b> in an environment <b>by performing actions and seeing the results</b>.

# What is Reinforcement Learning?

## The big picture

### The formal definition

> Reinforcement learning is a framework for solving control tasks (also called decision problems) by building agents that learn from the environment by interacting with it through trial and error and receiving rewards (positive or negative) as unique feedback.

# The Reinforcement Learning Framework

Source: https://huggingface.co/learn/deep-rl-course/unit1/rl-framework

## The RL Process

<img src="https://huggingface.co/datasets/huggingface-deep-rl-course/course-images/resolve/main/en/unit1/RL_process.jpg" style="width:600px;" title="RL process">

A loop of state, action, reward, and next state.

<img src="https://huggingface.co/datasets/huggingface-deep-rl-course/course-images/resolve/main/en/unit1/sars.jpg" style="width:300px;" title="RL loop output sequence">

<span style="color:blue; font-size: large">The agent’s goal is to <i>maximize</i> its cumulative reward, called the <b>expected return</b>.</span>
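The loop above can be sketched in a few lines of Python. This is a minimal illustration, not the course's code: `CorridorEnv`, its reward of +1 at the goal, and `run_episode` are hypothetical names and choices made only for this example.

```python
import random


class CorridorEnv:
    """Hypothetical toy environment: walk along a corridor to reach the last cell."""

    def __init__(self, length=5, max_steps=20):
        self.length = length
        self.max_steps = max_steps
        self.position = 0
        self.steps = 0

    def reset(self):
        """Start a new episode and return the initial state."""
        self.position = 0
        self.steps = 0
        return self.position

    def step(self, action):
        """Apply an action (0 = left, 1 = right); return (next_state, reward, done)."""
        move = 1 if action == 1 else -1
        self.position = min(max(self.position + move, 0), self.length - 1)
        self.steps += 1
        reached_goal = self.position == self.length - 1
        reward = 1.0 if reached_goal else 0.0
        done = reached_goal or self.steps >= self.max_steps
        return self.position, reward, done


def run_episode(env, policy):
    """Run the state -> action -> reward -> next state loop until the episode ends."""
    state = env.reset()
    trajectory = []
    done = False
    while not done:
        action = policy(state)                       # agent picks an action
        next_state, reward, done = env.step(action)  # environment responds
        trajectory.append((state, action, reward, next_state))
        state = next_state
    return trajectory


# A random agent: it ignores the state and picks left or right uniformly.
trajectory = run_episode(CorridorEnv(), lambda state: random.choice([0, 1]))
cumulative_reward = sum(reward for _, _, reward, _ in trajectory)
```

Each tuple in `trajectory` is one pass through the loop, and `cumulative_reward` is the quantity the agent tries to maximize.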

## The reward hypothesis: the central idea of reinforcement learning

## Markov property

In papers, you’ll see that the RL process is called a <b>Markov Decision Process</b> (MDP).

The Markov Property implies that our agent needs <b>only the current state to decide</b> what action to take and <b>not the history of all the states and actions</b> it took before.

## Observations/States Space

Observations/States are the <b>information our agent gets from the environment</b>. Examples include a frame in a video game or the value of a certain stock in trading.

- State $s$: a complete description of the world (fully observed environment)
    - e.g., Chess
- Observation $o$: a partial description of the state (partially observed environment)
    - e.g., Super Mario Bros

<img src="https://huggingface.co/datasets/huggingface-deep-rl-course/course-images/resolve/main/en/unit1/obs_space_recap.jpg" style="width:600px;" title="Observation vs State space">
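The difference between a state and an observation can be illustrated with a small sketch. The `world` list, the `view_radius` parameter, and both function names are assumptions made for this example only.

```python
def full_state(world):
    """State: the agent receives the complete description of the world."""
    return list(world)


def partial_observation(world, agent_index, view_radius=1):
    """Observation: the agent only sees the cells within view_radius of itself."""
    start = max(agent_index - view_radius, 0)
    end = min(agent_index + view_radius + 1, len(world))
    return world[start:end]


# Hypothetical one-dimensional level layout.
world = ["wall", "agent", "coin", "enemy", "flag"]

state = full_state(world)                   # everything, as in chess
observation = partial_observation(world, 1)  # only what is near the agent
```

Here `observation` contains three cells while `state` contains all five, which mirrors a player who sees only part of a level at a time.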

## Action Space

The Action space is the set of <b>all possible actions in an environment</b>.

- Discrete space: the number of possible actions is finite.
    - e.g., Super Mario Bros
- Continuous space: the number of possible actions is infinite.
    - e.g., a self-driving car

<img src="https://huggingface.co/datasets/huggingface-deep-rl-course/course-images/resolve/main/en/unit1/action_space.jpg" style="width:600px;" title="Action space">
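As a rough sketch of the two kinds of action space, the snippet below samples one action from each. The action names and the steering range are illustrative assumptions, not values taken from the course.

```python
import random

# Discrete space: a finite set of actions (illustrative names).
DISCRETE_ACTIONS = ["left", "right", "jump", "crouch"]

# Continuous space: any real value in an interval, e.g. a steering command
# (the bounds are an assumption for this example).
STEERING_RANGE = (-1.0, 1.0)


def sample_discrete(rng):
    """Pick one of finitely many actions."""
    return rng.choice(DISCRETE_ACTIONS)


def sample_continuous(rng):
    """Pick a real-valued action; infinitely many values are possible."""
    low, high = STEERING_RANGE
    return rng.uniform(low, high)


rng = random.Random(0)
discrete_action = sample_discrete(rng)
continuous_action = sample_continuous(rng)
```

The distinction matters later when choosing an algorithm, since some methods only handle discrete action spaces.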

## Rewards and the discounting

<i>Cumulative reward = Sum of all rewards in the sequence</i>
    
The cumulative reward at each time step $t$ can be written as:

<img src="https://huggingface.co/datasets/huggingface-deep-rl-course/course-images/resolve/main/en/unit1/rewards_1.jpg" style="width:500px;" title="Cumulative reward">

<img src="https://huggingface.co/datasets/huggingface-deep-rl-course/course-images/resolve/main/en/unit1/rewards_2.jpg" style="width:200px;" title="Cumulative reward">
   

However, in practice, <b>we can’t just add them like that</b>. Rewards that come sooner (at the beginning of the game) <b>are more likely to happen</b>, since they are more predictable than long-term future rewards.

Our discounted expected cumulative reward is:

<img src="https://huggingface.co/datasets/huggingface-deep-rl-course/course-images/resolve/main/en/unit1/rewards_4.jpg" style="width:500px;" title="Discounted expected cumulative reward">


where
- $\gamma$ is the discount rate, <b>between 0 and 1</b>
    - The larger the $\gamma$, the smaller the discount => the agent <b>cares more about the long-term reward.</b>
    - The smaller the $\gamma$, the larger the discount => the agent <b>cares more about the short-term reward.</b>
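To make the effect of $\gamma$ concrete, here is a minimal sketch that computes a discounted return for a finite list of rewards, assuming the usual form $\sum_{k} \gamma^k r_{t+k+1}$. The function name `discounted_return` and the sample rewards are made up for this example.

```python
def discounted_return(rewards, gamma):
    """Sum the rewards, weighting the reward k steps ahead by gamma ** k."""
    if not 0.0 <= gamma <= 1.0:
        raise ValueError("gamma must be between 0 and 1")
    total = 0.0
    # Work backwards so each step applies one more factor of gamma
    # to everything that comes after it.
    for reward in reversed(rewards):
        total = reward + gamma * total
    return total


rewards = [1.0, 1.0, 1.0, 1.0]  # hypothetical reward sequence

undiscounted = discounted_return(rewards, 1.0)  # 4.0: every reward counts fully
balanced = discounted_return(rewards, 0.5)      # 1.875: 1 + 0.5 + 0.25 + 0.125
myopic = discounted_return(rewards, 0.0)        # 1.0: only the first reward counts
```

With $\gamma = 1$ the result is the plain cumulative reward, and as $\gamma$ shrinks toward 0 the later rewards contribute less and less.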

# The type of tasks

Source: https://huggingface.co/learn/deep-rl-course/unit1/tasks

A task is an <b>instance</b> of a Reinforcement Learning problem.

<img src="https://huggingface.co/datasets/huggingface-deep-rl-course/course-images/resolve/main/en/unit1/tasks.jpg" style="width:600px;" title="Type of tasks">

## Episodic task

- e.g., Super Mario Bros
    - An episode begins at the launch of a new level and ends <b>when we are killed or reach the end of the level</b>.

## Continuing task

- e.g., Automated Stock trading
    - There is no terminal state: the task has no natural end point. <b>The agent keeps running until we decide to stop it.</b>
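The two kinds of task differ mainly in what ends the interaction loop. The sketch below contrasts them under simplified assumptions: `level_step` and `market_step` are hypothetical stand-ins for an environment, and the stop condition for the second task is supplied from outside.

```python
def run_episodic(step, state):
    """Interact until the environment itself reports a terminal state."""
    total_reward = 0.0
    done = False
    while not done:
        state, reward, done = step(state)
        total_reward += reward
    return total_reward


def run_until_stopped(step, state, stop_requested):
    """Interact until an external stop signal; the task never ends on its own."""
    total_reward = 0.0
    steps = 0
    while not stop_requested(steps):
        state, reward = step(state)
        total_reward += reward
        steps += 1
    return total_reward


def level_step(position):
    """Hypothetical level: the episode ends when position 3 is reached."""
    next_position = position + 1
    done = next_position >= 3
    return next_position, (1.0 if done else 0.0), done


def market_step(price):
    """Hypothetical market: always returns a reward and never terminates."""
    return price, 0.5


episodic_total = run_episodic(level_step, 0)
stopped_total = run_until_stopped(market_step, 100.0, lambda steps: steps >= 10)
```

In the first function the environment decides when to stop; in the second, we do.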

# The Exploration/Exploitation tradeoff

Source: https://huggingface.co/learn/deep-rl-course/unit1/exp-exp-tradeoff

<img src="https://huggingface.co/datasets/huggingface-deep-rl-course/course-images/resolve/main/en/unit1/expexpltradeoff.jpg" style="width:600px;" title="Exploration Exploitation Tradeoff">

We need to balance how much we <b>explore the environment</b> and how much we <b>exploit what we know about the environment</b>.

Therefore, we must <b>define a rule that helps to handle this trade-off</b>.
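One common rule of this kind is epsilon-greedy, shown here as a sketch; the text above does not prescribe a specific rule, so treat this as one possible choice. The `action_values` list holds hypothetical value estimates, one per action.

```python
import random


def epsilon_greedy(action_values, epsilon, rng=random):
    """With probability epsilon explore (random action), otherwise exploit (best action)."""
    if rng.random() < epsilon:
        # Explore: try any action, regardless of its current estimate.
        return rng.randrange(len(action_values))
    # Exploit: pick the action with the highest estimated value.
    return max(range(len(action_values)), key=lambda a: action_values[a])


estimates = [0.1, 0.9, 0.3]  # hypothetical value estimates for three actions

always_exploit = epsilon_greedy(estimates, epsilon=0.0)
mostly_exploit = epsilon_greedy(estimates, epsilon=0.1)
```

A single parameter controls the balance: `epsilon=0.0` never explores, while `epsilon=1.0` acts completely at random.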

# The two main approaches for solving RL problems

Source: https://huggingface.co/learn/deep-rl-course/unit1/two-methods

## The policy π: the agent’s brain

The Policy <b>π</b> is the <b>brain of our Agent</b>: it is the <i>function that tells us what <b>action to take given the state</b></i>. So it <b>defines the agent’s behavior</b> at a given time.

<img src="https://huggingface.co/datasets/huggingface-deep-rl-course/course-images/resolve/main/en/unit1/policy_1.jpg" style="width:300px;" title="Policy - The brain of agent, a function that tells us the action to take given the state.">
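Viewed as code, a policy is simply a mapping from a state to an action. The sketch below shows a deterministic version and, as an added assumption beyond the text above, a stochastic version that returns a sample from a distribution over actions. All state and action names are hypothetical.

```python
import random

# Deterministic policy: each state maps to exactly one action.
deterministic_policy = {
    "enemy_ahead": "jump",
    "gap_ahead": "jump",
    "clear_path": "right",
}

# Stochastic policy: each state maps to a probability distribution over actions.
stochastic_policy = {
    "enemy_ahead": {"jump": 0.8, "right": 0.2},
    "clear_path": {"jump": 0.1, "right": 0.9},
}


def act_deterministic(state):
    """Return the single action the policy prescribes for this state."""
    return deterministic_policy[state]


def act_stochastic(state, rng=random):
    """Sample an action according to the policy's probabilities for this state."""
    distribution = stochastic_policy[state]
    actions = list(distribution)
    weights = list(distribution.values())
    return rng.choices(actions, weights=weights, k=1)[0]


action = act_deterministic("clear_path")
sampled_action = act_stochastic("enemy_ahead")
```

In deep RL the lookup table is replaced by a neural network, but the interface stays the same: state in, action out.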

| Policy-based | Value-based | 
| :-: | :-: |
| <img src="https://huggingface.co/datasets/huggingface-deep-rl-course/course-images/resolve/main/en/unit1/pbm_1.jpg" style="width:450px;" title="Policy-based methods"> <img src="https://huggingface.co/datasets/huggingface-deep-rl-course/course-images/resolve/main/en/unit1/pbm_2.jpg" style="width:450px;" title="Policy-based methods"> |  <img src="https://huggingface.co/datasets/huggingface-deep-rl-course/course-images/resolve/main/en/unit1/vbm_1.jpg" style="width:450px;" title="Value-based methods"> <img src="https://huggingface.co/datasets/huggingface-deep-rl-course/course-images/resolve/main/en/unit1/vbm_2.jpg" style="width:450px;" title="Value-based methods"> |
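As a rough illustration of the value-based column, the sketch below derives a greedy policy from a table of state values instead of storing the policy directly. The states, values, and transitions are hypothetical, and the dynamics are assumed to be known and deterministic to keep the example short.

```python
# Hypothetical state values V(s): how good it is to be in each state.
state_values = {"A": 0.0, "B": 0.5, "C": 1.0}

# Hypothetical deterministic transitions: state -> action -> next state.
transitions = {
    "A": {"left": "A", "right": "B"},
    "B": {"left": "A", "right": "C"},
}


def greedy_action(state):
    """Pick the action that leads to the next state with the highest value."""
    options = transitions[state]
    return max(options, key=lambda action: state_values[options[action]])


action_from_a = greedy_action("A")
action_from_b = greedy_action("B")
```

A policy-based method would instead learn the state-to-action mapping itself, without going through a value table.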

## Policy-Based methods

## Value-Based methods

# The “Deep” in Deep Reinforcement Learning

# Summary

# Glossary

# Hands-on

# Quiz

# Conclusion

# Additional Readings