# Part 1 - Generating Training Data

To train our network to play Minesweeper we need training data, which we can generate by simply running a large number of Minesweeper games. For each game we store the game state after every move, then randomly select one of those states to include in the training set. This gives us a training set where each element is independent (taken from a different game) and where game states are well distributed from early game to late game.

The Minesweeper game itself is implemented entirely using numpy arrays (see `game.py`). This is partly for performance reasons, but also because numpy arrays map well to PyTorch tensors, which simplifies feeding data from the game to the PyTorch model.

In [22]:
import random
import numpy as np
from ais import MinesweeperAI, CheatingMinesweeperAI
from tqdm import tqdm
from game import MinesweeperGame

np.set_printoptions(precision=1, floatmode='fixed', linewidth=150)

In [2]:
def generate_training_data(width: int, height: int, mines: int, ai: MinesweeperAI, n_samples: int):
    """Play n_samples games and keep one randomly chosen state from each."""
    input_data = []
    output_data = []

    with tqdm(total=n_samples) as pbar:
        for _ in range(n_samples):
            game = MinesweeperGame(width, height, mines)
            game_input_data = []
            game_output_data = []

            while not game.is_over:
                guess_x, guess_y = ai.guess(game)
                game.guess(guess_x, guess_y)
                # Input: what the player sees, scaled by 1/10, wrapped in a list to add a channel axis.
                game_input_data.append([game.visible_gamestate.astype('float32')/10.0])
                # Target: 1.0 for cells that are neither mines nor already revealed.
                game_output_data.append([np.logical_not(np.logical_or(game.minefield, game.exposedfield)).astype('float32')])

            # Keep a single random state per game so the samples stay independent.
            idx = random.randint(0, len(game_input_data)-1)
            input_data.append(game_input_data[idx])
            output_data.append(game_output_data[idx])

            pbar.update(1)

    return input_data, output_data

To generate training data, we use a cheating AI, which avoids bombs, but otherwise guesses randomly. As it wins every game, this allows us to get data representing everything from early game to late game.

In [3]:
ai = CheatingMinesweeperAI()
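
The real `CheatingMinesweeperAI` lives in `ais.py` and isn't shown here. As a rough sketch of the idea only, assuming the game exposes boolean `minefield` and `exposedfield` arrays (as used in `generate_training_data`), a mine-avoiding random guess could look something like this. The helper name `pick_safe_cell` and the `(x, y)` index order are assumptions made for illustration.

```python
import numpy as np

def pick_safe_cell(minefield, exposedfield, rng):
    """Hypothetical helper: choose a random unrevealed cell that is not a mine."""
    # Candidate cells are neither mines nor already revealed.
    safe = np.logical_not(np.logical_or(minefield, exposedfield))
    candidates = np.argwhere(safe)  # one index pair per safe cell
    if len(candidates) == 0:
        return None  # nothing left to reveal
    x, y = candidates[rng.integers(len(candidates))]
    return int(x), int(y)

# Toy 3x3 board: one mine in the corner, one cell already revealed.
minefield = np.zeros((3, 3), dtype=bool)
minefield[0, 0] = True
exposedfield = np.zeros((3, 3), dtype=bool)
exposedfield[1, 1] = True

rng = np.random.default_rng(0)
guess = pick_safe_cell(minefield, exposedfield, rng)
```

Because the guess is drawn only from safe cells, an AI built this way never hits a mine, which is why every game it plays runs through to the late game.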

For the training data, we're using Expert difficulty games, which have 99 mines on a 30x16 grid. We could use any configuration here, but some of the more complex arrangements of mines are much more common on Expert difficulty, and we'd like to include them in the training data.

In [4]:
width = 30
height = 16
mines = 99

We're generating 50000 samples, which should be enough to train an effective network. As this means running 50000 games, it takes a few minutes.

In [5]:
n_samples = 50000
in_data, out_data = generate_training_data(width, height, mines, ai, n_samples)

100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████| 5000/5000 [01:05<00:00, 75.81it/s]


We want to split our dataset into training data and test (or validation) data, and we'll use 90% of it for training. As each element in our dataset is already independent, there's no need to sample randomly here; we can just take the first N elements as our training data.

In [6]:
training_ratio = 0.9
train_size = int(n_samples*training_ratio)
test_size = n_samples - train_size

In [8]:
train_data = np.stack(in_data[:train_size])
train_output = np.stack(out_data[:train_size])
test_data = np.stack(in_data[train_size:])
test_output = np.stack(out_data[train_size:])

We now have four sets of data. We have the input data for our training set, the corresponding expected output data we want the network to generate, and then the same for the test data.

In [9]:
train_data.shape

(4500, 1, 30, 16)

We can take a look at the data to see how it's representing the game. Here's the first element of the input data for our training set:

In [24]:
train_data[0][0]

array([[ 0.0,  0.1, -1.0, -1.0, -1.0, -1.0, -1.0, -1.0, -1.0, -1.0, -1.0, -1.0, -1.0,  0.1,  0.0,  0.0],
       [ 0.0,  0.1, -1.0, -1.0, -1.0, -1.0, -1.0, -1.0, -1.0, -1.0, -1.0, -1.0, -1.0,  0.2,  0.1,  0.1],
       [ 0.0,  0.1, -1.0,  0.4, -1.0, -1.0, -1.0, -1.0, -1.0,  0.2,  0.2,  0.2,  0.1,  0.2, -1.0, -1.0],
       [ 0.0,  0.1,  0.1, -1.0, -1.0, -1.0,  0.2, -1.0, -1.0, -1.0, -1.0,  0.1,  0.0,  0.1,  0.1,  0.1],
       [ 0.1,  0.2, -1.0, -1.0, -1.0,  0.2, -1.0, -1.0,  0.2, -1.0,  0.2,  0.1,  0.0,  0.0,  0.0,  0.0],
       [-1.0, -1.0, -1.0, -1.0, -1.0, -1.0,  0.1, -1.0,  0.2, -1.0,  0.1,  0.0,  0.1,  0.1,  0.2,  0.1],
       [-1.0,  0.2,  0.1,  0.1,  0.1,  0.2,  0.1, -1.0, -1.0, -1.0,  0.1,  0.0,  0.1, -1.0, -1.0, -1.0],
       [ 0.2,  0.2,  0.0,  0.0,  0.0,  0.1, -1.0,  0.2,  0.1, -1.0,  0.1,  0.2,  0.3, -1.0, -1.0,  0.1],
       [-1.0,  0.2,  0.0,  0.1,  0.2,  0.3, -1.0, -1.0, -1.0, -1.0, -1.0, -1.0, -1.0,  0.2,  0.1, -1.0],
       [-1.0,  0.2,  0.1,  0.2, -1.0, -1.0, -1.0, -1.0,

This is the visible gamestate of a Minesweeper game (i.e. what the player sees) represented as a numpy array, with each value divided by 10 so that the data falls in the range -1.0 to 0.8. The -1.0 values represent cells which haven't been revealed yet, and values from 0.0 up represent the number of bombs in adjacent cells. So 0.0 represents no adjacent bombs, 0.1 means 1 adjacent bomb, 0.4 means 4, etc.
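
To make the encoding concrete, here is a small sketch that maps a normalized array back to mine counts. It assumes, based on the `/10.0` scaling in `generate_training_data`, that any negative value marks an unrevealed cell; the helper name `decode_gamestate` is made up for this example.

```python
import numpy as np

def decode_gamestate(normalized):
    """Hypothetical helper: split a normalized state into a hidden-cell mask and counts."""
    # Undo the /10.0 scaling; rounding guards against float32 error.
    raw = np.rint(normalized * 10.0).astype(int)
    unrevealed = raw < 0                     # -1.0 in the normalized array
    counts = np.where(unrevealed, 0, raw)    # adjacent-bomb counts, 0 where hidden
    return unrevealed, counts

sample = np.array([[0.0, 0.1, -1.0],
                   [0.4, -1.0, 0.2]], dtype='float32')
unrevealed, counts = decode_gamestate(sample)
```

Here `counts` recovers the integers a player would see on the board, and `unrevealed` marks the cells that are still hidden.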

We can also look at the output data. This is the data we want the neural network to replicate:

In [25]:
train_output[0][0]

array([[0.0, 0.0, 1.0, 0.0, 1.0, 0.0, 1.0, 1.0, 1.0, 0.0, 1.0, 1.0, 1.0, 0.0, 0.0, 0.0],
       [0.0, 0.0, 0.0, 1.0, 0.0, 1.0, 1.0, 1.0, 1.0, 0.0, 1.0, 1.0, 0.0, 0.0, 0.0, 0.0],
       [0.0, 0.0, 1.0, 0.0, 0.0, 1.0, 1.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 1.0],
       [0.0, 0.0, 0.0, 1.0, 0.0, 1.0, 0.0, 1.0, 1.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0],
       [0.0, 0.0, 0.0, 1.0, 1.0, 0.0, 1.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0],
       [1.0, 0.0, 1.0, 1.0, 0.0, 1.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0],
       [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 1.0, 1.0, 1.0, 0.0, 0.0, 0.0, 0.0, 1.0, 0.0],
       [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 1.0, 1.0, 0.0],
       [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 1.0, 0.0, 1.0, 1.0, 1.0, 0.0, 0.0, 0.0, 0.0, 1.0],
       [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 1.0, 1.0, 0.0, 0.0, 1.0, 0.0, 0.0, 1.0, 1.0, 0.0],
       [0.0, 0.0, 1.0, 0.0, 1.0, 0.0, 1.0, 1.0, 0.0, 1.0, 1.0, 0.0, 1.0, 1.0, 1.0, 1.0],
       [0.0, 0.0, 0.0

This is an array of zeros and ones, where one represents a good guess (not already revealed and not a bomb), and zero represents a bad guess. Every cell that appears as revealed in the previous array is marked as zero here, as we don't want the network to guess a cell that's already been revealed.
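
The target mask comes from the same expression used in `generate_training_data`. A toy example, with made-up 3x3 `minefield` and `exposedfield` arrays standing in for the game's attributes, shows which cells end up as 1.0:

```python
import numpy as np

# Made-up boards for illustration; True marks a mine / a revealed cell.
minefield = np.array([[True,  False, False],
                      [False, False, False],
                      [False, False, True]])
exposedfield = np.array([[False, True,  True],
                         [False, True,  False],
                         [False, False, False]])

# A good guess is a cell that is neither a mine nor already revealed.
target = np.logical_not(np.logical_or(minefield, exposedfield)).astype('float32')
```

Only the four cells that are both hidden and mine-free are set to 1.0; mines and revealed cells are 0.0.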

We can then save the data as numpy `.npz` files, one for the training data and one for the testing data.

In [12]:
np.savez("training_data", input_data=train_data, output_data=train_output)
np.savez("testing_data", input_data=test_data, output_data=test_output)
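
`np.savez` appends the `.npz` extension when the filename lacks one, so the calls above produce `training_data.npz` and `testing_data.npz`. A minimal round-trip sketch, using small placeholder arrays and a temporary directory rather than the real dataset:

```python
import os
import tempfile
import numpy as np

# Placeholder arrays with the same layout as the real data: (samples, 1, width, height).
inputs = np.zeros((4, 1, 30, 16), dtype='float32')
targets = np.ones((4, 1, 30, 16), dtype='float32')

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "training_data")
    np.savez(path, input_data=inputs, output_data=targets)  # writes training_data.npz

    # np.load returns a lazy archive; index it by the keyword names passed to savez.
    with np.load(path + ".npz") as archive:
        loaded_inputs = archive["input_data"]
        loaded_targets = archive["output_data"]
```

The arrays are read back under the same `input_data` and `output_data` keys, which is how a later training step would access them.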