### Run in Colab
<a href="https://colab.research.google.com/github/racousin/rl_introduction/blob/master/notebooks/1_Environment_Agent.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

In [None]:
%%capture
!pip install swig==4.2.1
!pip install box2d-py==2.3.8
!pip install gymnasium[box2d,atari,accept-rom-license]==0.29.1
!pip install pyvirtualdisplay==3.0
!pip install opencv-python-headless
!pip install imageio imageio-ffmpeg
!git clone https://github.com/racousin/rl_introduction.git > /dev/null 2>&1

In [None]:
import warnings
warnings.filterwarnings('ignore')
import seaborn as sns
import numpy as np
import matplotlib.pyplot as plt
import gymnasium
from time import time, sleep
sns.set_style("darkgrid")

# 1_Environment_and_Agent

### Introduction to Reinforcement Learning (RL)

In RL, we study the interaction between an **agent** and an **environment**. The agent takes actions to achieve a goal, guided by rewards from the environment. Our aim is to develop agents that can learn optimal behaviors through these interactions.



### Creating an Environment

An environment in RL defines the space in which the agent operates. It returns a new state and a reward for each action taken by the agent.

In [None]:
class Env:
    def __init__(self):
        self.state = np.random.randint(2)
        self.done = False

    def step(self, action):
        if action % 2 == self.state:
            reward = 1
        else:
            reward = -1
        self.state = np.random.randint(2)
        # Gymnasium-style 5-tuple: (state, reward, terminated, truncated, info)
        return self.state, reward, self.done, False, {}

    def reset(self):
        self.state = np.random.randint(2)
        self.done = False
        return self.state

### Building an Agent
Agents in RL decide which actions to take in an environment. A simple agent might act randomly or follow a predetermined policy.



In [None]:
class Agent:
    def __init__(self, env):
        pass

    def act(self, state):
        return np.random.randint(2)
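
As a contrast with the random agent, a predetermined policy can be sketched as an agent that ignores the state and always plays the same action. `ConstantAgent` and its `action` parameter are hypothetical names introduced here for illustration; they are not part of the original notebook.

```python
class ConstantAgent:
    """Hypothetical agent following a fixed, predetermined policy."""

    def __init__(self, env, action=0):
        # `env` is accepted only to mirror the Agent constructor above; it is unused
        self.action = action

    def act(self, state):
        # The state is ignored: the same action is returned at every step
        return self.action


agent = ConstantAgent(env=None, action=1)
print([agent.act(state) for state in (0, 1, 0)])  # prints [1, 1, 1]
```

Because it exposes the same `act(state)` method, such an agent can be passed anywhere the random `Agent` is used.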

### Running an Experiment

To evaluate our agent's performance, we generate trajectories of state-action-reward sequences and compute the total reward.


In [None]:
def run_experiment(env, agent, nb_steps):
    state = env.reset()
    res = [state]
    for _ in range(nb_steps):
        action = agent.act(state)
        # env.step returns (state, reward, terminated, truncated, info)
        state, reward, done, _, info = env.step(action)
        res += [action, reward, state]

    return res

## Understanding the Environment and Agent

**Question 1:** What is the **state space** in the provided `Env` class?


**Question 2:** What is the **action space** in the provided `Env` and `Agent` classes?


**Question 3:** What is the **Transition model** in the provided `Env` class?


**Question 4:** What is the **Policy** in the provided `Agent` class?


**Question 5:** What is the **Reward Function** in the provided `Env` class?


**Question 6:** What object does `run_experiment` return?


**Exercise 1:** Instantiate the `Agent` and `Env` classes and call `run_experiment` for **100 steps**.



**Exercise 2:** Compute the **cumulative reward** and the **discounted cumulative reward**, also known as the return. You can return more information from `run_experiment` to help.
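
One possible way to compute both quantities, assuming the rewards of a trajectory have already been collected into a list. The names `discounted_return`, `rewards`, and `gamma` are illustrative and not from the notebook. With rewards $r_0, \dots, r_{T-1}$ and discount factor $\gamma$, the discounted return is $G = \sum_{t=0}^{T-1} \gamma^t r_t$; setting $\gamma = 1$ gives the plain cumulative reward.

```python
def discounted_return(rewards, gamma=1.0):
    """Return sum_t gamma**t * rewards[t]; gamma=1.0 gives the cumulative reward."""
    total = 0.0
    # Iterate backwards so each step applies G_t = r_t + gamma * G_{t+1}
    for reward in reversed(rewards):
        total = reward + gamma * total
    return total


rewards = [1, -1, 1]  # made-up rewards, for illustration only
print(discounted_return(rewards))             # 1.0
print(discounted_return(rewards, gamma=0.5))  # 0.75
```

The backward loop avoids computing powers of `gamma` explicitly; a forward sum over `gamma**t * r` would give the same result.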


**Question 7:** In this `MDP`, what is the **Expected Return** when following the random policy of the `Agent`?


**Question 8:** What would be the **best policy** function for the `Env` environment?


**Exercise 3:** Implement the best policy function and use it to run the best agent. Compare its performance to the random agent.



## Start with Gymnasium's Environment


In this section, we explore the environments offered by Gymnasium, a widely used library that defines a standard API for reinforcement learning environments. Working through them shows how the dynamics of different systems can be modeled within the reinforcement learning framework.

Execute the code below to run and render experiments in four environments: **'FrozenLake-v1'**, **'CartPole-v1'**, **'LunarLanderContinuous-v2'**, and **'PongNoFrameskip-v4'**. While these experiments run, read the Gymnasium documentation to learn the characteristics of each environment.

In [None]:
from rl_introduction.rl_introduction.render_colab import exp_render
# Environments to run experiments on
env_render_configs = [
    {"name": 'FrozenLake-v1', "fps": 2, "nb_step": 30},
    {"name": 'CartPole-v1', "fps": 17, "nb_step": 120},
    {"name": 'LunarLanderContinuous-v2', "fps": 30, "nb_step": 300},
    {"name": 'PongNoFrameskip-v4', "fps": 40, "nb_step": 800},
]
for env_render_config in env_render_configs:
    exp_render(env_render_config)

### Questions on Environment Dynamics


**Question 1:** Actions and States
For each environment (FrozenLake-v1, CartPole-v1, LunarLanderContinuous-v2, PongNoFrameskip-v4), identify the action space and state space. Specify whether each is discrete or continuous, and provide their sizes.


**Question 2:** Transition Models


**Question 3:** Reward Functions

### Exercises on Agent Performance

**Exercise 1:** Running an Experiment

**Exercise 2:** Running experiments and computing the cumulative reward
Conduct 20 experiments for each environment using a random agent. For each environment, display the cumulative reward with a discount factor of 0.5 and without discounting (discount factor of 1).
