# Connect four with random agents from Tianshou

In this notebook, our custom connect four Gym environment is played by two random agents.
We use the powerful [Tianshou library](https://github.com/thu-ml/tianshou) for this.
The notebook thus shows how the multi-agent environment can be configured so that Tianshou manages the agents.


## Table of Contents

- Contact information
- Checking requirements
  - Correct anaconda environment
  - Correct module access
  - Correct CUDA access
- Training two random agents on connect four Gym
  - Create the Gym environment
  - Create a Multi Agent Policy Manager

<hr><hr>

## Contact information

| Name             | Student ID | VUB mail                                                  | Personal mail                                               |
| ---------------- | ---------- | --------------------------------------------------------- | ----------------------------------------------------------- |
| Lennert Bontinck | 0568702    | [lennert.bontinck@vub.be](mailto:lennert.bontinck@vub.be) | [info@lennertbontinck.com](mailto:info@lennertbontinck.com) |



<hr><hr>

## Checking requirements

### Correct anaconda environment

The `rl-project` anaconda environment should be active to ensure proper support. Installation instructions are available on [the GitHub repository of the RL course project and homeworks](https://github.com/pikawika/vub-rl).

In [1]:
####################################################
# CHECKING FOR RIGHT ANACONDA ENVIRONMENT
####################################################

import os
from platform import python_version

# .get avoids a KeyError when no conda environment is active
active_env = os.environ.get('CONDA_DEFAULT_ENV')
print(f"Active environment: {active_env}")
print(f"Correct environment: {active_env == 'rl-project'}")
print(f"\nPython version: {python_version()}")
print(f"Correct Python version: {python_version() == '3.8.10'}")

Active environment: rl-project
Correct environment: True

Python version: 3.8.10
Correct Python version: True


<hr>

### Correct module access

The following code block loads all required modules and shows whether their versions match the recommended ones.

In [2]:
####################################################
# LOADING MODULES
####################################################

# Allow reloading of libraries
import importlib

# Plotting
import matplotlib; print(f"Matplotlib version (3.5.1 recommended): {matplotlib.__version__}")
import matplotlib.pyplot as plt

# Pygame
import pygame; print(f"Pygame version (2.1.2 recommended): {pygame.__version__}")

# Gym environment
import gym; print(f"Gym version (0.23.1 recommended): {gym.__version__}")

# Tianshou for RL algorithms
import tianshou as ts; print(f"Tianshou version (0.4.8 recommended): {ts.__version__}")

# Torch is a popular DL framework
import torch; print(f"Torch version (1.11.0 recommended): {torch.__version__}")

# Our custom connect four gym environment
import sys
sys.path.append('../')
import gym_connect4_pygame
importlib.invalidate_caches()
importlib.reload(gym_connect4_pygame)

# Time for allowing "freezes" in execution
import time

# Used for updating notebook display
from IPython.display import clear_output

Matplotlib version (3.5.1 recommended): 3.5.1
pygame 2.1.2 (SDL 2.0.18, Python 3.8.10)
Hello from the pygame community. https://www.pygame.org/contribute.html
Pygame version (2.1.2 recommended): 2.1.2
Gym version (0.23.1 recommended): 0.23.1
Tianshou version (0.4.8 recommended): 0.4.8
Torch version (1.11.0 recommended): 1.12.0.dev20220520+cu116

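The cell above only prints the versions and leaves the comparison to the reader. A minimal sketch of an automated check follows, assuming the recommended versions listed in that cell. The helper `compare_versions` and the `RECOMMENDED` dict are hypothetical names and are not part of the project code.

```python
from typing import Dict


def compare_versions(installed: Dict[str, str], recommended: Dict[str, str]) -> Dict[str, bool]:
    """Return, per recommended package, whether the installed version matches exactly."""
    return {
        name: installed.get(name) == wanted  # a missing package counts as a mismatch
        for name, wanted in recommended.items()
    }


# Recommended versions as listed in the module-loading cell
RECOMMENDED = {
    "matplotlib": "3.5.1",
    "pygame": "2.1.2",
    "gym": "0.23.1",
    "tianshou": "0.4.8",
    "torch": "1.11.0",
}

if __name__ == "__main__":
    # Installed versions as reported in the output above
    installed = {
        "matplotlib": "3.5.1",
        "pygame": "2.1.2",
        "gym": "0.23.1",
        "tianshou": "0.4.8",
        "torch": "1.12.0.dev20220520+cu116",
    }
    for name, ok in compare_versions(installed, RECOMMENDED).items():
        print(f"{name}: {'OK' if ok else 'MISMATCH'}")
```

Inside the notebook itself, `installed` could instead be built from the modules that are already imported, e.g. `{"gym": gym.__version__, "torch": torch.__version__}`. An exact string comparison is deliberately strict: it flags the development build of Torch shown above, which may or may not matter in practice.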

<hr>

### Correct CUDA access

The installation instructions specify how to install PyTorch with CUDA 11.6.
The following code block tests whether this was done successfully.

In [3]:
####################################################
# CUDA VALIDATION
####################################################

# Check whether CUDA is available
print(f"CUDA is available: {torch.cuda.is_available()}")

# Show the number of CUDA devices
print(f"Amount of connected devices supporting CUDA: {torch.cuda.device_count()}")

# Show the current CUDA device and its name (these calls raise an error without CUDA)
if torch.cuda.is_available():
    print(f"Current CUDA device: {torch.cuda.current_device()}")
    print(f"Cuda device 0 name: {torch.cuda.get_device_name(0)}")

CUDA is available: True
Amount of connected devices supporting CUDA: 1
Current CUDA device: 0
Cuda device 0 name: NVIDIA GeForce GTX 970


<hr><hr>

## Training two random agents on connect four Gym

Our connect four Gym setup requires two agents, one for each player.
To reduce complexity, each agent always plays as the same player, e.g. the first agent always plays as player 1.
It is important to note that connect four is a *solved game*.
According to [The Washington Post](https://www.washingtonpost.com/news/wonk/wp/2015/05/08/how-to-win-any-popular-game-according-to-data-scientists/):

> Connect Four is what mathematicians call a "solved game," meaning you can play it perfectly every time, no matter what your opponent does. You will need to get the first move, but as long as you do so, you can always win within 41 moves.
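
Reading "moves" in the quote as individual disc drops (plies), the numbers line up with the board size: a 6 by 7 board holds 42 discs, so a game lasts at most 42 plies, and with the first player on the odd plies, ply 41 is the first player's 21st disc. A small sketch of this bookkeeping follows, assuming the standard 6-row by 7-column board and the fixed seating described above. The function names are illustrative only and are not the environment's API.

```python
ROWS, COLS = 6, 7          # standard connect four board (assumed)
MAX_PLIES = ROWS * COLS    # every drop fills one cell, so at most 42 plies


def player_for_ply(ply: int) -> int:
    """Player (1 or 2) who drops the disc on a 1-indexed ply, with player 1 starting."""
    if not 1 <= ply <= MAX_PLIES:
        raise ValueError(f"ply must lie in 1..{MAX_PLIES}")
    return 1 if ply % 2 == 1 else 2


def discs_played_by(player: int, ply: int) -> int:
    """Number of discs `player` has dropped once `ply` plies have been played."""
    return (ply + 1) // 2 if player == 1 else ply // 2


print(MAX_PLIES)               # 42
print(player_for_ply(41))      # 1
print(discs_played_by(1, 41))  # 21
```

With fixed seating, `player_for_ply` also tells which agent is expected to act on a given ply, which is exactly what the multi-agent setup below has to keep track of.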

### Create the Gym environment

We start off by instantiating our previously created connect four Gym environment.

In [None]:
####################################################
# SETTING UP THE GYM ENVIRONMENT
####################################################

# Create an instance of the environment to be used
# V2 is used as this contains edits for Tianshou
env = gym.make('lennert_bontinck/ConnectFour-v2')

# Get information about the environment
print(f"Observation space: {env.observation_space}")
print(f"\nAction space: {env.action_space}")

# Reset the environment to start from a clean state, returns the initial observation
observation, info = env.reset(return_info=True)
print("\nInitial observation:")
print(observation)

print("\nInitial info:")
print(info)

# Clean unused variables
del observation
del info

### Create a Multi Agent Policy Manager

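Since we have a multi-agent environment, we need a multi-agent policy manager.
It holds one policy per agent and passes each observation to the policy of the agent whose turn it is.
Here, both agents use Tianshou's `RandomPolicy`.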
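
The dispatch idea can be illustrated without Tianshou. In Tianshou's multi-agent setup the observation carries an `agent_id`, and the manager uses it to select the policy that should act. The framework-free sketch below assumes a plain dict observation with `agent_id` and `mask` keys. `ToyPolicyManager`, the string agent ids and the policy signature are hypothetical simplifications, not Tianshou's actual classes or batch format. Two deterministic stand-in policies are used instead of random ones so that the routing is visible in the output.

```python
from typing import Callable, Dict, List

# Hypothetical policy interface: legal-action mask in, action (column index) out
Policy = Callable[[List[bool]], int]


def leftmost_legal(mask: List[bool]) -> int:
    """Always play the leftmost column that is not full."""
    return mask.index(True)


def rightmost_legal(mask: List[bool]) -> int:
    """Always play the rightmost column that is not full."""
    return len(mask) - 1 - mask[::-1].index(True)


class ToyPolicyManager:
    """Keeps one policy per agent and forwards each observation based on its agent_id."""

    def __init__(self, policies: Dict[str, Policy]) -> None:
        self.policies = policies

    def act(self, observation: dict) -> int:
        policy = self.policies[observation["agent_id"]]
        return policy(observation["mask"])


manager = ToyPolicyManager({"player_1": leftmost_legal, "player_2": rightmost_legal})
mask = [False, True, True, True, True, True, False]  # columns 0 and 6 are full
print(manager.act({"agent_id": "player_1", "mask": mask}))  # 1
print(manager.act({"agent_id": "player_2", "mask": mask}))  # 5
```

The same observation yields different actions depending only on `agent_id`, which is the job the real policy manager performs for the two random agents.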

In [None]:
####################################################
# SETTING UP THE MULTI AGENT POLICY MANAGER
####################################################

# Create the multi-agent policy manager, with a random policy for each of the two players
multi_agent_policy_manager = ts.policy.MultiAgentPolicyManager([ts.policy.RandomPolicy(), ts.policy.RandomPolicy()], env)

# The collector expects a vectorized environment, so wrap the single environment
vectorized_env = ts.env.DummyVectorEnv([lambda: env])

# Use a collector to play an episode with the policy manager
# In multi-agent settings the reward is a vector (one entry per agent),
# so a scalar metric only becomes necessary once training is monitored
collector = ts.data.Collector(multi_agent_policy_manager, vectorized_env)

# Play one episode, rendering it with a 0.1 second pause between frames
result = collector.collect(n_episode=1, render=0.1)
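
`collector.collect(n_episode=1, render=0.1)` plays one full game, with `render` setting the pause in seconds between rendered frames. What such an episode amounts to can be sketched in plain Python: two random agents alternate, each picking uniformly among the columns that are not yet full, until one of them connects four or the board is full. As far as we know, Tianshou's `RandomPolicy` likewise only samples actions flagged as legal in the observation's `mask`. The board layout, win check and function names below are our own illustration, not the code of `gym_connect4_pygame`.

```python
import random
from typing import List, Optional

ROWS, COLS = 6, 7
EMPTY = 0

Board = List[List[int]]  # board[row][col], row 0 is the top row


def legal_mask(board: Board) -> List[bool]:
    """A column is playable as long as its top cell is still empty."""
    return [board[0][col] == EMPTY for col in range(COLS)]


def drop(board: Board, col: int, player: int) -> int:
    """Let a disc fall to the lowest empty cell of `col` and return the row it lands in."""
    for row in range(ROWS - 1, -1, -1):
        if board[row][col] == EMPTY:
            board[row][col] = player
            return row
    raise ValueError(f"column {col} is full")


def is_win(board: Board, row: int, col: int) -> bool:
    """Check whether the disc at (row, col) completes a line of four or more."""
    player = board[row][col]
    for d_row, d_col in ((0, 1), (1, 0), (1, 1), (1, -1)):
        count = 1
        # Walk both ways along the direction and count the player's discs
        for sign in (1, -1):
            r, c = row + sign * d_row, col + sign * d_col
            while 0 <= r < ROWS and 0 <= c < COLS and board[r][c] == player:
                count += 1
                r, c = r + sign * d_row, c + sign * d_col
        if count >= 4:
            return True
    return False


def play_random_game(seed: Optional[int] = None) -> int:
    """Play random vs random and return the winner (1 or 2), or 0 for a draw."""
    rng = random.Random(seed)
    board = [[EMPTY] * COLS for _ in range(ROWS)]
    player = 1
    for _ in range(ROWS * COLS):  # at most 42 drops fill the board
        legal = [col for col, ok in enumerate(legal_mask(board)) if ok]
        col = rng.choice(legal)  # uniform choice among legal columns
        row = drop(board, col, player)
        if is_win(board, row, col):
            return player
        player = 3 - player  # alternate between player 1 and player 2
    return 0


if __name__ == "__main__":
    print(f"Winner: {play_random_game(seed=42)}")
```

Counting the winners over many seeded games, e.g. `collections.Counter(play_random_game(s) for s in range(1000))`, would give a rough random-vs-random baseline to compare the Tianshou runs against.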