# Part 1) Supervised Learning. Dataset generator.

- **GOAL:** generate and save new data to be used in '_src/train/part1_supervised_learning.ipynb_'
<br>
- **ABOUT THE DATA:**
    - the data represents the actions of a mid-level player of Connect4
    - our mid-level player is an _NStepLookaheadAgent_ instance with _n=1_ (a 1-step lookahead agent)
    - the data is a set of (_obs_, _action_) pairs:
         - _obs_ is a game board where 1 marks the active player's pieces and -1 the opponent's
         - _action_ is the column that our mid-level player would choose to play in _obs_
    - A Supervised Learning task: given an 'obs' (game board), predict its 'action' (classification)
    - Our mid-level player is deterministic (there is no randomness in the selection of the action). We achieve this by setting the '_prefer_central_columns_' attribute to True.
<br>
- **DATA GENERATION:**
    - initialize a non-terminal random board
        - a _random board_ is a game board where some moves have already been played by a random player 
    - use _self-play_ to let the 1-step lookahead agent finish the random game
        - the agent selects the actions for both players (i.e. it plays against itself)
    - store the new sequence of (obs, action) pairs
<br>
- **PREPROCESS DATA BEFORE SAVING:**
    - (1) replace -1 with 2 in the board
    - (2) flatten the board and turn it into string format
    - (3) append the action in string format
    - in the standard Connect4 6x7 board:
        - the '_board_' in string format is a sequence of 6x7=42 chars {'0','1','2'}
        - the '_action_' is a string number from '0' to '6'
        - the final sequence has 42+1=43 characters (what will be stored)
<br>
- **RESULTS:**
    - 200k unique data pairs (_obs, action_) generated by a mid-level player.
    - saved in 'src/data/part1_data/part1_supervised_learning_data.txt'
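
The preprocessing steps above can be sketched as a small encode/decode round trip. This is only an illustration, not the notebook's implementation: `encode_pair` and `decode_pair` are hypothetical helper names, and plain nested lists stand in for the NumPy board so that the snippet runs without dependencies.

```python
# Illustrative sketch: plain nested lists stand in for the NumPy board.
N_ROWS, N_COLS = 6, 7  # standard Connect4 board


def encode_pair(board, action):
    """Encode a (board, action) pair as a 43-character string."""
    # -1 (opponent) becomes 2; 0 and 1 are kept as they are
    cells = ''.join(str(2 if cell == -1 else cell) for row in board for cell in row)
    return cells + str(action)


def decode_pair(record):
    """Invert encode_pair: return (board, action)."""
    flat = [-1 if ch == '2' else int(ch) for ch in record[:-1]]
    board = [flat[r * N_COLS:(r + 1) * N_COLS] for r in range(N_ROWS)]
    return board, int(record[-1])


board = [[0] * N_COLS for _ in range(N_ROWS)]
board[5][3] = 1    # a piece of the active player
board[5][2] = -1   # a piece of the opponent
record = encode_pair(board, action=3)
print(len(record))                        # 43
print(decode_pair(record) == (board, 3))  # True
```

Because every cell and every action is a single character, each record has a fixed length, which is what makes the simple slicing in the decoder possible.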

## 1) Imports

In [None]:
import os

import numpy as np

In [None]:
### YOUR PATH HERE
code_dir = '/home/marc/Escritorio/RL-connect4/'

if os.path.isdir(code_dir):
    # local environment
    os.chdir(code_dir)
    print(f"directory -> '{code_dir}'")
else:
    # google colab environment (upload 'src.zip' and unzip it in the Colab environment)
    if os.path.isdir('./src'):
        print("'./src' dir already exists")
    else:  # not unzipped yet
        !unzip -q src.zip
        print("'./src.zip' file successfully unzipped")

In [None]:
from src.agents.agent import Agent
from src.environment.connect_game_env import ConnectGameEnv
from src.environment.env_utils import random_action
from src.agents.baselines.n_step_lookahead_agent import NStepLookaheadAgent

## 2) Hyper parameters

In [None]:
hparams = {
    'n_samples': 200000,
    'saved_data_filepath': './src/data/part1_data/part1_supervised_learning_data.txt'
}

## 3) The environment and our mid-level player

In [None]:
env = ConnectGameEnv()

agent = NStepLookaheadAgent(
    n=1, 
    prefer_central_columns=True
)

## 4) Generate data

In [None]:
def obs_action_to_string(o, a):
    """
    Turns an (obs, action) pair into string format
    
    :param o: observation (game board)
    :param a: action
    """
    # i % 3 leaves 0 and 1 unchanged and maps -1 to 2
    obs_str = ''.join(str(int(i % 3)) for i in o.flatten())
    action_str = str(a)
    return obs_str + action_str

example_obs = ConnectGameEnv.random_observation()
example_action = random_action(board=example_obs)
print('obs:\n', example_obs)
print('action:', example_action)
print('(obs, action) in string format:', obs_action_to_string(example_obs, example_action))

In [None]:
new_data = set()

In [None]:
log_every = 100
while len(new_data) < hparams['n_samples']:
    obs, _ = env.reset(init_random_obs=True)
    done = False
    while not done:
        action = agent.choose_action(obs=obs)
        obs_action_str = obs_action_to_string(o=obs, a=action)    
        # if that pair is already in the data, start a new episode
        if obs_action_str in new_data:
            done = True
        if not done:
            new_data.add(obs_action_str)
            # create its symmetry
            sym_obs = np.flip(obs, axis=1)
            sym_action = 6 - action
            sym_obs_action_str = obs_action_to_string(o=sym_obs, a=sym_action)
            new_data.add(sym_obs_action_str)
            obs, _, done, _ = env.step(action=action)
        if len(new_data) % log_every == 0:
            print(f'{len(new_data)}/{hparams["n_samples"]}')
print('done!')
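
The symmetry step in the loop above relies on the fact that a Connect4 board mirrored left to right is an equally valid position, with the chosen column mirrored as well. A minimal sketch of that idea, using a hypothetical `mirror_pair` helper and nested lists instead of NumPy arrays, might look like this. It assumes the agent would pick the mirrored column in the mirrored position, which this notebook does not verify.

```python
def mirror_pair(board, action):
    """Reflect a (board, action) pair across the vertical centre line."""
    n_cols = len(board[0])
    mirrored_board = [list(reversed(row)) for row in board]
    mirrored_action = n_cols - 1 - action  # equals 6 - action when n_cols == 7
    return mirrored_board, mirrored_action


board = [[0] * 7 for _ in range(6)]
board[5][0] = 1
board[5][1] = -1
m_board, m_action = mirror_pair(board, action=1)
print(m_board[5])  # [0, 0, 0, 0, 0, -1, 1]
print(m_action)    # 5
# mirroring twice restores the original pair
print(mirror_pair(m_board, m_action) == (board, 1))  # True
```

Writing the action as `n_cols - 1 - action` rather than `6 - action` keeps the sketch valid for boards with a different number of columns.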

## 5) Save the generated data

In [None]:
sep = ';'

# consider only the first hparams['n_samples'] pairs
# (new_data can hold more, since each step may add a pair and its mirror image)
lines = ''
for i, d in enumerate(new_data):
    if i == hparams['n_samples']:
        break
    lines += d + sep

print(len(lines))

In [None]:
# if hparams['saved_data_filepath'] already has samples, append the new ones
with open(hparams['saved_data_filepath'], 'a') as file:
    file.write(lines)
print(f"new samples saved in '{hparams['saved_data_filepath']}'")

## 6) Sanity check: try to load the data

In [None]:
sep = ';'
with open(hparams['saved_data_filepath'], 'r') as file:
    lines = file.read().split(sep)[:-1]  # last line is ''

# undo the changes made to save the (obs, action) pairs
loaded_obs, loaded_actions = [], []
for line in lines:
    flat_obs = [int(i) for i in line[:-1]]
    new_obs = np.array(flat_obs).reshape((6,7))
    new_obs[new_obs==2] = -1
    loaded_obs.append(new_obs)
    loaded_actions.append(int(line[-1]))
print(f"{len(loaded_obs)} (obs, action) pairs loaded from '{hparams['saved_data_filepath']}'")
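
Beyond loading, it can be useful to check that every stored record has the expected shape. The following is a sketch only: `is_valid_record` is a hypothetical helper that tests the three properties stated in the preprocessing description (43 characters, cells in {'0','1','2'}, action from '0' to '6').

```python
def is_valid_record(record, n_rows=6, n_cols=7):
    """Check the length and alphabet of one stored (obs, action) string."""
    if len(record) != n_rows * n_cols + 1:
        return False
    cells, action = record[:-1], record[-1]
    if any(ch not in '012' for ch in cells):
        return False
    # the action must name an existing column
    return action in {str(c) for c in range(n_cols)}


print(is_valid_record('0' * 42 + '3'))        # True
print(is_valid_record('0' * 42 + '7'))        # False: column out of range
print(is_valid_record('0' * 41 + '3'))        # False: wrong length
print(is_valid_record('3' + '0' * 41 + '3'))  # False: invalid cell value
```

With the loaded file, something like `all(is_valid_record(line) for line in lines)` would flag a corrupted or truncated write.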

In [None]:
last_n_samples = 10

for i in range(1, last_n_samples + 1):  # index -1 is the last sample; -0 would be the first
    print(loaded_obs[-i])
    print('action =', loaded_actions[-i])
    print('-'*50)