# Install dependencies

In [7]:
# !pip3 install tensorflow
# !pip3 install gym
# !pip3 install keras
# !pip3 install keras-rl2
# !pip3 install gym[classic_control]  # Invalid

# 1. Test Random Environment with OpenAI Gym

In [1]:
import gym
import random

## Get the environment from the gym library
#### https://www.gymlibrary.dev/api/core/#gym.Env.render
#### `gym.make` takes the following arguments:
- Name of the environment (CartPole-v1)
- `render_mode`: human (continuous visualization), rgb_array (get frames), and more...

In [2]:
env = gym.make("CartPole-v1", render_mode='human')

## Spaces
#### https://www.gymlibrary.dev/api/spaces/#spaces
Spaces are used in Gym to define the format of valid actions and observations. 

### Observation Space
https://www.gymlibrary.dev/api/core/#gym.Env.observation_space

Attribute of the `Env` class.  
This attribute gives the format of valid observations. There are different types of observation space. The most basic one is `Box`.  

A `Box` has a shape. For example, with shape `(4,)`, a valid observation is an array of 4 numbers.
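
To make the idea of per-element bounds concrete, here is a minimal pure-Python sketch of the kind of check a `Box` space represents. The helper `is_valid_observation` and the constants `LOW` and `HIGH` are hypothetical names used only for illustration; they are not part of Gym. The bounds are the CartPole values from the Gymnasium documentation, rounded. In practice, Gym spaces provide a `contains` method for this purpose.

```python
import math

# Per-element bounds of a 4-element observation (CartPole values, rounded).
# These constants are illustrative assumptions, not Gym objects.
LOW = [-4.8, -math.inf, -0.418, -math.inf]
HIGH = [4.8, math.inf, 0.418, math.inf]


def is_valid_observation(obs, low=LOW, high=HIGH):
    """Return True if obs has the right length and every element is within bounds."""
    if len(obs) != len(low):
        return False
    return all(lo <= x <= hi for x, lo, hi in zip(obs, low, high))


print(is_valid_observation([0.0, 1.5, 0.1, -2.0]))  # True
print(is_valid_observation([5.0, 0.0, 0.0, 0.0]))   # False: cart position out of range
print(is_valid_observation([0.0, 0.0, 0.0]))        # False: only 3 elements
```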

In [3]:
states = env.observation_space

In our case: https://gymnasium.farama.org/environments/classic_control/cart_pole/#observation-space

- 0: Cart Position = [-4.8, 4.8]
- 1: Cart Velocity = [-Inf, Inf]
- 2: Pole Angle = [~ -0.418, ~ 0.418] rad
- 3: Pole Angular Velocity = [-Inf, Inf]

In [4]:
states

Box([-4.8000002e+00 -3.4028235e+38 -4.1887903e-01 -3.4028235e+38], [4.8000002e+00 3.4028235e+38 4.1887903e-01 3.4028235e+38], (4,), float32)

Get the upper and lower bounds

In [5]:
print(states.high)
print(states.low)

[4.8000002e+00 3.4028235e+38 4.1887903e-01 3.4028235e+38]
[-4.8000002e+00 -3.4028235e+38 -4.1887903e-01 -3.4028235e+38]


#### Usually, we only need the number of elements that constitute our observation space (here 4)

In [6]:
states = env.observation_space.shape[0]
states

4

### Action Space
https://www.gymlibrary.dev/api/core/#gym.Env.action_space

Attribute of the `Env` class.  
This attribute gives the format of valid actions. Actions can be either discrete (`Discrete`) or continuous (typically a `Box`).  

If the action space is of type `Discrete` and has the value `Discrete(2)`, there are two valid discrete actions: 0 and 1.
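
As a rough illustration of what a discrete action space with `n` choices means, the following pure-Python sketch enumerates the valid actions and samples one at random. The helpers `valid_actions` and `sample_action` are hypothetical names, not Gym functions; with a real environment, `env.action_space.sample()` plays the role of the sampler.

```python
import random


def valid_actions(n):
    """List the valid actions of a discrete space with n choices: 0 .. n-1."""
    return list(range(n))


def sample_action(n, rng=random):
    """Pick one valid action uniformly at random."""
    return rng.randrange(n)


print(valid_actions(2))            # [0, 1]
action = sample_action(2)
print(action in valid_actions(2))  # True
```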

In [9]:
actions = env.action_space

In our case:
https://gymnasium.farama.org/environments/classic_control/cart_pole/#action-space

- 0: Push the cart to the left
- 1: Push the cart to the right

In [10]:
actions

Discrete(2)

#### Usually, we just need the number of different actions

In [11]:
actions = env.action_space.n
actions

2

## Render episodes with random actions

##### `env.reset`: Resets the environment to an initial state and returns the initial observation (an element of `observation_space`) together with an `info` dictionary.
Parameters:
- seed (optional int)
- options (optional dict): configuration to reset, depends on environment

##### `env.render`: Computes the render frames as specified by the `render_mode` attribute set when the environment is created (`gym.make`).

##### `env.step`: Runs one timestep of the environment’s dynamics. Takes an action (an element of `action_space`) as its parameter.

Return values:
- observation: the next observation, an element of `observation_space`
- reward: amount of reward returned as a result of taking the action
- terminated: whether a terminal state is reached.
- truncated: whether the episode should be stopped even though no terminal state was reached (e.g. time limit exceeded or out of bounds)
- info: dictionary containing various information useful for debugging, metrics...
- done (older Gym versions only, replaced by `terminated` and `truncated`): whether an episode has ended. A done signal may be emitted for different reasons: the task underlying the environment was solved successfully, a certain time limit was exceeded, or the physics simulation entered an invalid state.
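
The relationship between `terminated`, `truncated`, and an overall "episode ended" flag can be sketched without Gym. The class `ToyEnv` below is a hypothetical stand-in that only mimics the shape of the `reset` and `step` return values under the five-value `step` API; its rewards and dynamics are made up for illustration.

```python
class ToyEnv:
    """Hypothetical stand-in environment (not part of Gym) mimicking reset/step."""

    def __init__(self, fail_at=None, max_steps=5):
        self.fail_at = fail_at      # step at which a terminal state is reached
        self.max_steps = max_steps  # time limit that triggers truncation
        self.t = 0

    def reset(self, seed=None, options=None):
        self.t = 0
        return [0.0], {}            # (observation, info)

    def step(self, action):
        self.t += 1
        terminated = self.fail_at is not None and self.t >= self.fail_at
        truncated = not terminated and self.t >= self.max_steps
        return [float(self.t)], 1.0, terminated, truncated, {}


def run_episode(env):
    """Run one episode with a fixed action and return the total reward."""
    observation, info = env.reset()
    score, done = 0.0, False
    while not done:
        observation, reward, terminated, truncated, info = env.step(0)
        score += reward
        done = terminated or truncated  # the episode ends in either case
    return score


print(run_episode(ToyEnv(fail_at=3)))  # 3.0 (terminated)
print(run_episode(ToyEnv()))           # 5.0 (truncated by the time limit)
```

Checking both flags matters because an environment with a time limit may never set `terminated`, in which case a loop that tests only `terminated` would not stop.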

In [15]:
# Generate 10 episodes
episodes = 10
for episode in range(episodes):
    # Reset the environment at the beginning of each episode
    # (reset returns the initial observation and an info dictionary)
    state, info = env.reset()
    terminated = False
    truncated = False
    score = 0

    # While the episode has not ended
    while not (terminated or truncated):
        # Render animation
        env.render()
        # Choose left or right at random
        action = random.choice([0, 1])
        # Apply the action; step returns terminated and truncated instead of done
        state, reward, terminated, truncated, info = env.step(action)
        # Add reward to the score of the episode
        score += reward
    # Print the score of each episode
    print('Episode:{} Score:{}'.format(episode + 1, score))

Episode:1 Score:16.0
Episode:2 Score:19.0
Episode:3 Score:16.0
Episode:4 Score:46.0
Episode:5 Score:11.0
Episode:6 Score:15.0
Episode:7 Score:20.0
Episode:8 Score:12.0
Episode:9 Score:11.0
Episode:10 Score:26.0
