# Testing custom Gym environment

This experimental notebook verifies that the custom connect four Gym environment is set up correctly by running a few sanity checks and some basic sample Gym code.


## Table of Contents

- Contact information
- Checking requirements
  - Correct anaconda environment
  - Correct module access
- Testing connect four Gym setup
  - Connect four with random agent
    - Setting up the gym environment
    - Interacting with the environment
    - Visualising the environment in terminal
    - Letting random agent play the game in terminal
    - Visualising the environment in human mode
    - Letting random agent play the game in human mode

<hr><hr>

## Contact information

| Name             | Student ID | VUB mail                                                  | Personal mail                                               |
| ---------------- | ---------- | --------------------------------------------------------- | ----------------------------------------------------------- |
| Lennert Bontinck | 0568702    | [lennert.bontinck@vub.be](mailto:lennert.bontinck@vub.be) | [info@lennertbontinck.com](mailto:info@lennertbontinck.com) |



<hr><hr>

## Checking requirements

### Correct anaconda environment

The `rl-project` anaconda environment should be active to ensure proper support. Installation instructions are available on [the GitHub repository of the RL course project and homeworks](https://github.com/pikawika/vub-rl).

In [1]:
####################################################
# CHECKING FOR RIGHT ANACONDA ENVIRONMENT
####################################################

import os
from platform import python_version

print(f"Active environment: {os.environ['CONDA_DEFAULT_ENV']}")
print(f"Correct environment: {os.environ['CONDA_DEFAULT_ENV'] == 'rl-project'}")
print(f"\nPython version: {python_version()}")
print(f"Correct Python version: {python_version() == '3.8.10'}")

Active environment: rl-project
Correct environment: True

Python version: 3.8.10
Correct Python version: True
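
The check above raises a `KeyError` when `CONDA_DEFAULT_ENV` is not set, for example when the notebook runs outside of anaconda. A minimal sketch of a more tolerant variant follows. The function name `check_setup` and its `environ` parameter are assumptions made for illustration and are not part of the project.

```python
import os
from platform import python_version


def check_setup(expected_env="rl-project", expected_python="3.8.10", environ=None):
    """Return a dict describing whether the active setup matches expectations.

    `check_setup` is a hypothetical helper, not part of the project.
    """
    # `environ` can be injected so the check can be tried without anaconda.
    environ = os.environ if environ is None else environ

    # .get() returns None instead of raising a KeyError when the variable is unset.
    active_env = environ.get("CONDA_DEFAULT_ENV")

    return {
        "active_env": active_env,
        "env_ok": active_env == expected_env,
        "python_version": python_version(),
        "python_ok": python_version() == expected_python,
    }


print(check_setup())
```

Returning a dictionary instead of printing directly makes it possible to reuse the result, for instance to stop the notebook early when `env_ok` is `False`.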


<hr>

### Correct module access

The following code block loads all required modules and shows whether their versions match the recommended ones.

In [3]:

####################################################
# LOADING MODULES
####################################################

# Allow reloading of libraries
import importlib

# Plotting
import matplotlib; print(f"Matplotlib version (3.5.1 recommended): {matplotlib.__version__}")
import matplotlib.pyplot as plt

# Pygame
import pygame; print(f"Pygame version (2.1.2 recommended): {pygame.__version__}")

# Gym environment
import gym; print(f"Gym version (0.21.0 recommended): {gym.__version__}")

# Our custom connect four gym environment
import sys
sys.path.append('../')
import gym_connect4_pygame
importlib.invalidate_caches()
importlib.reload(gym_connect4_pygame)

# Time for pausing execution between frames
import time

# Used for updating notebook display
from IPython.display import clear_output

Matplotlib version (3.5.1 recommended): 3.5.1
Pygame version (2.1.2 recommended): 2.1.2
Gym version (0.21.0 recommended): 0.21.0
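
The version comparison above is done by eye. As a rough sketch, the same comparison could be automated with the standard library's `importlib.metadata`. The helper names `installed_versions` and `compare_versions` are assumptions for illustration; the recommended versions are the ones listed in the cell above.

```python
from importlib import metadata

# Recommended versions as listed in the cell above.
RECOMMENDED = {"matplotlib": "3.5.1", "pygame": "2.1.2", "gym": "0.21.0"}


def installed_versions(names):
    """Look up the installed version of each distribution, skipping missing ones."""
    found = {}
    for name in names:
        try:
            found[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            # Distribution is not installed in the active environment.
            pass
    return found


def compare_versions(installed, recommended=RECOMMENDED):
    """Map each recommended package to 'ok', 'mismatch' or 'missing'."""
    status = {}
    for name, wanted in recommended.items():
        version = installed.get(name)
        if version is None:
            status[name] = "missing"
        elif version == wanted:
            status[name] = "ok"
        else:
            status[name] = "mismatch"
    return status


print(compare_versions(installed_versions(RECOMMENDED)))
```

This uses exact string equality, which mirrors the "recommended" wording above; a looser comparison (for example on major and minor version only) would be a separate design choice.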


<hr><hr>

## Testing connect four Gym setup

### Connect four with random agent

We start by creating an instance of the connect four environment and analysing some of its properties.
This is based on the documentation from the [Gym tutorials](https://www.gymlibrary.ml/content/tutorials/), [this one](https://blog.paperspace.com/getting-started-with-openai-gym/) in particular, as well as the [mountain car documentation](https://www.gymlibrary.ml/environments/classic_control/mountain_car/).

#### Setting up the gym environment

The `observation_space` defines the structure as well as the legitimate values for the observation of the state of the environment.
The observation can be different things for different environments.
A common form is a screenshot of the game.
There can be other forms of observations as well, such as certain characteristics of the environment described in vector form.
- The observation for the connect four environment is a dictionary with a single `board` entry:
  - The board is a grid of 6 rows by 7 columns
  - Each cell holds 0 (empty), 1 (coin of player 1) or 2 (coin of player 2)

Similarly, the `Env` class also defines an attribute called the `action_space`, which describes the numerical structure of the legitimate actions that can be applied to the environment.
- We have seven discrete actions, one per column:
  - 0: Place a coin in the first column
  - 1 to 5: Place a coin in the second to sixth column
  - 6: Place a coin in the seventh column

In [4]:
####################################################
# SETTING UP THE GYM ENVIRONMENT
####################################################

# Create an instance of the environment to be used
env = gym.make('lennert_bontinck/ConnectFour-v1')

# Get information about the environment
print(f"Observation space: {env.observation_space}")
print(f"\nAction space: {env.action_space}")

# Reset the environment to start from a clean state, returns the initial observation
observation = env.reset()
print("\n Initial observation:")
print(observation)

# Clean unused variables
del observation

Observation space: Dict(board:Box([[0 0 0 0 0 0 0]
 [0 0 0 0 0 0 0]
 [0 0 0 0 0 0 0]
 [0 0 0 0 0 0 0]
 [0 0 0 0 0 0 0]
 [0 0 0 0 0 0 0]], [[2 2 2 2 2 2 2]
 [2 2 2 2 2 2 2]
 [2 2 2 2 2 2 2]
 [2 2 2 2 2 2 2]
 [2 2 2 2 2 2 2]
 [2 2 2 2 2 2 2]], (6, 7), int32))

Action space: Discrete(7)

 Initial observation:
{'board': array([[0., 0., 0., 0., 0., 0., 0.],
       [0., 0., 0., 0., 0., 0., 0.],
       [0., 0., 0., 0., 0., 0., 0.],
       [0., 0., 0., 0., 0., 0., 0.],
       [0., 0., 0., 0., 0., 0., 0.],
       [0., 0., 0., 0., 0., 0., 0.]])}
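
The printed spaces describe a 6 by 7 board with cell values 0 to 2 and seven discrete actions. The following plain Python sketch mimics that structure to make the relation between actions and board cells concrete. It is not the environment's implementation: the helper names are hypothetical, and it assumes that row 0 is the top row and that coins settle in the lowest free row, as the terminal rendering later in this notebook suggests.

```python
ROWS, COLUMNS = 6, 7
EMPTY = 0


def empty_board():
    """Create a 6x7 board filled with zeros (hypothetical helper)."""
    return [[EMPTY] * COLUMNS for _ in range(ROWS)]


def valid_actions(board):
    """Return the columns that can still take a coin.

    Assumption: row 0 is the top row, so a column is full once its
    top cell is occupied.
    """
    return [column for column in range(COLUMNS) if board[0][column] == EMPTY]


def drop_coin(board, column, player):
    """Place a coin of `player` in the lowest free row of `column`.

    Returns the row index where the coin landed.
    """
    for row in range(ROWS - 1, -1, -1):
        if board[row][column] == EMPTY:
            board[row][column] = player
            return row
    raise ValueError(f"Column {column} is full")


board = empty_board()
landed_row = drop_coin(board, column=0, player=1)
print(f"Coin landed in row {landed_row}, valid actions: {valid_actions(board)}")
```

A helper like `valid_actions` is useful for random agents, since sampling from the full action space can select a column that is already full.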


#### Interacting with the environment

Next, let's interact with the created environment.

In [5]:
####################################################
# INTERACTING WITH THE ENVIRONMENT
####################################################

# Place coin in the first column:
action = 0

# Take the action and get the new observation
new_observation, reward, done, info = env.step(action)
print(f"After taking action {action}, the new observation of the board is: \n {new_observation['board']}")
print(f"\nThis resulted in a reward of {reward} and a {done} done state")
print(f"\nOther information given is: {info}")

# Clean unused variables
del new_observation
del reward
del done
del info
del action

After taking action 0, the new observation of the board is: 
 [[1. 0. 0. 0. 0. 0. 0.]
 [0. 0. 0. 0. 0. 0. 0.]
 [0. 0. 0. 0. 0. 0. 0.]
 [0. 0. 0. 0. 0. 0. 0.]
 [0. 0. 0. 0. 0. 0. 0.]
 [0. 0. 0. 0. 0. 0. 0.]]

This resulted in a reward of 0 and a False done state

Other information given is: {'current_player': 2}
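
With the recommended Gym 0.21.0, `env.step` returns four values: observation, reward, done and info. Gym 0.26 and later return five values instead, splitting `done` into `terminated` and `truncated`. The sketch below normalises both shapes to the four-value form used in this notebook. The helper name `unpack_step` is an assumption for illustration.

```python
def unpack_step(result):
    """Normalise the result of `env.step` to (observation, reward, done, info).

    Hypothetical helper: accepts the 4-tuple of Gym 0.21 and the
    5-tuple of Gym 0.26 and later.
    """
    if len(result) == 4:
        observation, reward, done, info = result
    elif len(result) == 5:
        observation, reward, terminated, truncated, info = result
        # An episode is over when it either terminated or was truncated.
        done = bool(terminated or truncated)
    else:
        raise ValueError(f"Unexpected step result of length {len(result)}")
    return observation, reward, done, info


# Example with values shaped like the output above (board omitted for brevity).
old_style = ({"board": None}, 0, False, {"current_player": 2})
print(unpack_step(old_style))
```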


#### Visualising the environment in terminal

Let's now try to visualise the environment by printing the board to the terminal.

In [6]:
####################################################
# VISUALISING THE ENVIRONMENT
####################################################

# Visualise the environment by printing to the terminal
env.render(mode = 'terminal')

[[0. 0. 0. 0. 0. 0. 0.]
 [0. 0. 0. 0. 0. 0. 0.]
 [0. 0. 0. 0. 0. 0. 0.]
 [0. 0. 0. 0. 0. 0. 0.]
 [0. 0. 0. 0. 0. 0. 0.]
 [1. 0. 0. 0. 0. 0. 0.]]
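
The terminal rendering prints the raw board values. As a sketch of a more readable alternative, the board could be converted to symbols before printing. The function `format_board` and the chosen symbols are assumptions for illustration and are not part of the environment.

```python
# Hypothetical symbols: empty cell, coin of player 1, coin of player 2.
SYMBOLS = {0: ".", 1: "X", 2: "O"}


def format_board(board):
    """Return the board as text, one line per row plus a column index line."""
    # int() allows float cell values such as 1.0, as printed by the environment.
    lines = [" ".join(SYMBOLS[int(cell)] for cell in row) for row in board]
    lines.append(" ".join(str(column) for column in range(len(board[0]))))
    return "\n".join(lines)


example_board = [[0.0] * 7 for _ in range(5)] + [[1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]]
print(format_board(example_board))
```

The column index line at the bottom matches the action numbers, which makes it easier to relate a chosen action to the resulting board.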


#### Letting random agent play the game in terminal

Let two agents play a combined total of 2 moves at random in the game and visualise it by continuously updating the terminal output.

In [7]:
####################################################
# LETTING RANDOM AGENT PLAY IN TERMINAL
####################################################

# Reset environment
obs = env.reset()

# Show initial state of the environment
env.render(mode= "terminal")

for step in range(2):
    # take random action
    action = env.action_space.sample()
    
    # apply the action
    obs, reward, done, info = env.step(action)
    
    # clear the cell output and print the new board
    clear_output()
    print("Performed a move, new board: ")
    env.render(mode= "terminal")

    # Wait a bit before the next frame unless you want to see a crazy fast video
    time.sleep(0.5)
    
    # If the episode is up, then start another one
    if done:
        print("\n\n!!! Finished a game !!!")
        time.sleep(4)
        env.reset()

# Close the environment
env.close()

# Delete unused variables
del action
del done
del info
del obs
del reward
del step

Performed a move, new board: 
[[0. 0. 0. 0. 0. 0. 0.]
 [0. 0. 0. 0. 0. 0. 0.]
 [0. 0. 0. 0. 0. 0. 0.]
 [0. 0. 0. 0. 0. 0. 0.]
 [0. 0. 0. 0. 0. 0. 0.]
 [0. 1. 0. 0. 2. 0. 0.]]
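
The loop above follows the usual Gym pattern: sample an action, step, and reset once an episode is done. The sketch below isolates that pattern using a stand-in environment so it runs without Gym or pygame. `StubEnv` and `run_random_agent` are hypothetical names, and the stub simply ends an episode after a fixed number of steps instead of playing connect four.

```python
import random


class StubEnv:
    """Hypothetical stand-in environment: an episode ends after a fixed number of steps."""

    def __init__(self, episode_length=5, n_actions=7):
        self.episode_length = episode_length
        self.n_actions = n_actions
        self.steps_in_episode = 0

    def reset(self):
        self.steps_in_episode = 0
        return {"steps": 0}

    def step(self, action):
        self.steps_in_episode += 1
        done = self.steps_in_episode >= self.episode_length
        # Same 4-tuple shape as Gym 0.21: observation, reward, done, info.
        return {"steps": self.steps_in_episode}, 0, done, {}


def run_random_agent(env, n_steps, rng):
    """Take `n_steps` random actions and return (finished episodes, last observation)."""
    obs = env.reset()
    finished = 0
    for _ in range(n_steps):
        action = rng.randrange(env.n_actions)
        obs, reward, done, info = env.step(action)
        if done:
            finished += 1
            # Keep the observation in sync with the freshly reset environment.
            obs = env.reset()
    return finished, obs


finished, last_obs = run_random_agent(StubEnv(), n_steps=12, rng=random.Random(0))
print(f"Finished episodes: {finished}, last observation: {last_obs}")
```

Storing the return value of `env.reset()` inside the loop avoids working with a stale observation from the previous episode, which matters once an agent actually uses the observation to pick its action.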


#### Visualising the environment in human mode

Let's now try to visualize the environment in the pygame window.

In [10]:
####################################################
# VISUALISING THE ENVIRONMENT
####################################################

# Reset the environment
env.reset();

# Visualise the environment by opening pygame window
env.render(mode = 'human')

In [11]:
####################################################
# CLOSE PYGAME WINDOW
####################################################

# Close and reset the environment
env.close();
env.reset();

#### Letting random agent play the game in human mode

Let two agents play a combined total of 100 moves at random in the game and visualise it by continuously updating the pygame window.

In [12]:
####################################################
# LETTING RANDOM AGENT PLAY IN HUMAN MODE
####################################################

# Reset environment
obs = env.reset()

# Show initial state of the environment
env.render(mode= "human")

# Wait for popup to open
time.sleep(1.5)

for step in range(100):
    # take random action
    action = env.action_space.sample()
    
    # apply the action
    obs, reward, done, info = env.step(action)
    
    # update environment in new window
    clear_output()
    print("Performed a move, board updated ")
    env.render(mode= "human")

    # Wait a bit before the next frame unless you want to see a crazy fast video
    time.sleep(0.1)
    
    # If the episode is up, then start another one
    if done:
        print("\n\n!!! Finished a game !!!")
        time.sleep(2)
        env.reset()

# Close the environment and thus the popup
env.close()

# Delete unused variables
del action
del done
del info
del obs
del reward
del step

Performed a move, board updated 
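
If the loop above is interrupted or raises an error, `env.close()` is never reached and the pygame window may stay open. One possible way to guard against this is `contextlib.closing`, which calls `close()` on exit from the `with` block, also when an exception occurs. The sketch below demonstrates this with a stand-in object; `StubWindowEnv` and `render_frames` are hypothetical names, and only the presence of a `close()` method is assumed, which every Gym environment provides.

```python
from contextlib import closing


class StubWindowEnv:
    """Hypothetical stand-in for an environment that owns a window."""

    def __init__(self):
        self.closed = False

    def render(self):
        if self.closed:
            raise RuntimeError("Cannot render a closed environment")
        return "frame"

    def close(self):
        self.closed = True


def render_frames(env, n_frames, fail_at=None):
    """Render `n_frames` frames; `fail_at` simulates an error at that frame index."""
    frames = 0
    # closing() guarantees env.close() runs, even if the loop raises.
    with closing(env):
        for index in range(n_frames):
            if index == fail_at:
                raise RuntimeError("Simulated failure while rendering")
            env.render()
            frames += 1
    return frames


env_ok = StubWindowEnv()
print(f"Rendered {render_frames(env_ok, 3)} frames, closed: {env_ok.closed}")
```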
