# Playing the game

In this chapter we'll explore the code that will run a game of Noughts and Crosses.

The goal is to create a framework that makes it easy to run MENACE with a variety of different player types.

Specifically, we'll implement
1. a `random` player function which will pick a legal move at random, and
1. a `menace` player that learns by using MENACE's strategy to choose which move to make.

Later we'll look at creating a `perfect` player who makes the best possible moves throughout the game.

I don't yet know how to capture user input from a program running in an APL Jupyter notebook, but it's easy to do when running a program using Dyalog's RIDE interface. Appendix B (when written) will describe how to run a version of MENACE that also supports human players.

Each of these players will be implemented as an APL function which will take the current MENACE configuration as an argument, make a move and return an updated configuration as its result.

We'll plug the players into a `game_runner` which we will implement as a *user-defined* operator.

Recall that in APL, a *function* returns data but an *operator* returns a function. Our `game_runner` will take a pair of player functions as its operands and return a function that will play an entire game.

That function will take an array that captures the MENACE configuration as an argument and return an updated configuration as its result.
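
To make the function/operator distinction concrete, here is a minimal sketch of a user-defined dyadic operator. The names `one_round`, `add_one` and `double` are hypothetical stand-ins, not part of MENACE; the real `game_runner` will need to do more than this.

```apl
⍝ A toy dyadic operator: ⍺⍺ and ⍵⍵ are its two function operands.
one_round ← {⍵⍵ ⍺⍺ ⍵}      ⍝ apply the left operand, then the right operand
add_one   ← {⍵+1}           ⍝ hypothetical stand-in for the first player
double    ← {⍵×2}           ⍝ hypothetical stand-in for the second player
add_one one_round double 5  ⍝ (5+1)×2 gives 12
```

The derived function `add_one one_round double` takes data and returns data, just as the function derived from `game_runner` will take a configuration and return an updated one.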

In this chapter we will assume that all the games are played in a single APL session. In the *next* chapter we will implement a simple way of retaining that data so that MENACE can learn across multiple sessions.

## The MENACE configuration

The MENACE configuration will contain the information needed by the players.

The original version of MENACE tracked the value of moves by using beads in a matchbox. The beads were adjusted when a game was over.

A player that applies MENACE's strategy needs that information so we will need to include it in the MENACE configuration. The function that implements a random player will ignore it.

MENACE also tracked the moves made during a game by requiring its human agent to leave open each matchbox used, with the selected bead visible. It needed that information for its adjustments, so we will include that too.

Throughout a game **all** player types need to know the current state of the board. If we track the positions reached during a game, the current board is simply the last entry in that history, so we don't need to store it separately.

Eventually we may want to examine MENACE's performance over time. For that reason the configuration will maintain a history of wins, draws and losses. The `game_runner` will maintain that information but the player functions will ignore it.

In summary, then, the MENACE configuration will contain three items:
1. An array of notional *bead counts*. The format of that data is described below.
1. A history of the current game, held as a matrix of position vectors, one row per move.
1. A vector of MENACE's game results. A win is represented by 1, a draw by 0 and a loss by ¯1.
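
As a rough illustration of that three-item structure, the sketch below builds a toy configuration. All the values are made up for the example, and the names `beads`, `history`, `results` and `config` are assumptions rather than the names used later in the book.

```apl
⎕IO ← 0
beads   ← (0 1 5) (3 9⍴4)                   ⍝ toy bead data: position numbers and a 3×9 count matrix
history ← 2 9⍴(9⍴0),(0 0 0 0 1 0 0 0 0)     ⍝ empty board, then a 1 in the centre cell
results ← 1 0 ¯1 1                          ⍝ win, draw, loss, win
config  ← beads history results             ⍝ the three items as one nested vector
current ← ,¯1↑history                       ⍝ the current board is the last row of the history
```

Holding everything in one nested vector means a single array can be passed to a player function and a single array returned.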

### Tracking MENACE's Beads

The original matchbox-based MENACE remembered the consequences of past games using matchboxes and beads.
1. It used one matchbox for each board position that MENACE might encounter.
1. Each matchbox was labelled. The label showed the position represented by the box, and showed each possible move from that position. Possible moves were shown in different colours.
1. Each matchbox contained beads of colours that matched those on the label. The beads determined the probability that MENACE would make the corresponding move.
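
One way to picture how bead counts turn into probabilities is to draw a single bead from a toy matchbox. This is only a sketch under assumed data; `box`, `bag` and `move` are hypothetical names, and the `menace` player may end up doing this differently.

```apl
⎕IO ← 0
box  ← 0 3 0 1 0 0 0 0 0   ⍝ toy matchbox: 3 beads for cell 1, 1 bead for cell 3
bag  ← box/⍳9              ⍝ one entry per bead: 1 1 1 3
move ← bag[?≢bag]          ⍝ draw one bead at random
```

Cell 1 is three times as likely to be chosen as cell 3, and cells with no beads can never be chosen.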

In our implementation we will maintain a two-element vector, with each element an array:
1. The first element will be a vector of decoded (numerical) canonical possible positions.
1. The second element will be a matrix of bead counts, with one row per possible position and 9 columns.

Each column will correspond to one of the cells in the game board, and will contain the number of virtual beads corresponding to that move. Some moves will not be possible from a given position, because the cell is already occupied. We will initialise those bead counts to zero, and they will never get updated, since the corresponding moves will never be made.

The values for possible moves will be initialised to some fixed number when a new MENACE configuration is created, and will then get adjusted at the end of each game by MENACE's learning rule.
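
The sketch below shows how the two elements might be used together to find the virtual matchbox for a position. The data is a toy stand-in and the names `positions`, `counts`, `row`, `box` and `legal` are assumptions made for the example.

```apl
⎕IO ← 0
positions ← 0 1 5             ⍝ toy stand-in for the vector of position numbers
counts    ← 3 9⍴20            ⍝ toy stand-in for the bead-count matrix
counts[2;7 8] ← 0             ⍝ pretend the last two cells of position 5 are filled
row   ← positions⍳5           ⍝ index of the matchbox for position 5
box   ← counts[row;]          ⍝ its nine bead counts
legal ← (box>0)/⍳9            ⍝ cells that still have beads
```

Keeping the position numbers in a plain vector makes the lookup a single use of `⍳`.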

### MENACE's Learning Rule

At the end of a game MENACE adjusted the bead counts.
1. If MENACE *won* it added three beads of the appropriate colour to each matchbox used.
1. If MENACE *drew* it added one bead of the appropriate colour to each matchbox used.
1. If MENACE *lost* it removed three beads of the appropriate colour from each matchbox used.

We'll do the same to our virtual bead counts.
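
As a sketch of how that rule could be applied to a bead matrix, the example below adjusts a toy matrix after a win. The names `rows`, `moves`, `result` and `delta` are hypothetical, and flooring the counts at zero is an extra assumption that the rule above does not state.

```apl
⎕IO ← 0
counts ← 3 9⍴4                 ⍝ toy bead matrix: 3 positions, 4 beads per cell
rows   ← 0 2                   ⍝ hypothetical: rows of the matchboxes MENACE used
moves  ← 4 8                   ⍝ hypothetical: the cell chosen from each of those boxes
result ← 1                     ⍝ 1 = win, 0 = draw, ¯1 = loss
delta  ← (¯3 1 3)[1+result]    ⍝ beads to add: ¯3 for a loss, 1 for a draw, 3 for a win
counts[rows,¨moves] +← delta   ⍝ adjust one cell in each matchbox used
counts ← 0⌈counts              ⍝ assumption: never let a count go negative
```

Here `rows,¨moves` pairs each matchbox with the move made from it, so only the beads that were actually picked are adjusted.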

Let's start by copying in our earlier work. Then we will look at how to initialise the virtual bead data. MENACE started with 20 beads in each matchbox, but we will make that a changeable parameter called `starting_count`.

In [1]:
)copy notebook6

In [2]:
⎕io ← 0

The bead data is a two-element vector.

The first is a vector of decoded plausible canonical positions. We saved that in a variable `ucpn`.

The second is a matrix of bead counts, with one row for each position in `ucpn`, and nine columns corresponding to the nine squares on the board. Each row should contain the initial bead count in the columns for squares that are empty, and zero in the columns for squares that have already been filled.

In [3]:
starting_count ← 20
bc ← ((⍴ucpn),9)⍴ starting_count

Now we need to set the bead counts to zero for each position that has been filled.

We'll do that in stages. First we'll encode the values in `ucpn`. That will give us a matrix `ucp` with one row per position and one column per board cell.

Next we'll look for the zeros in `ucp`, which mark the empty cells. We'll keep the bead counts for those cells unchanged and set the rest to zero.

In [4]:
ucp ← encode ucpn ⍝ a matrix with one board position per row
empty ← 0=ucp ⍝ a matrix with a 1 for each empty cell and a 0 for each filled one
bc ← bc × empty

In [5]:
list 5↑ucp

In [6]:
5↑bc

That looks correct. Let's turn that into a function `initial_counts` which will take the initial count as its argument.

In [7]:
initial_counts ← { bc ← ((⍴ucpn),9)⍴ ⍵ ⋄ bc × 0=encode ucpn}

In [8]:
5↑initial_counts 20
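
For readers without the earlier workspace to hand, here is a self-contained check of the masking idea on three toy positions. It assumes that positions are base-3 numbers with 0 marking an empty cell; `toy_encode` and `toy_positions` are hypothetical stand-ins for `encode` and `ucpn`, not the definitions from the earlier notebooks.

```apl
⎕IO ← 0
toy_encode    ← {⍉(9⍴3)⊤⍵}              ⍝ hypothetical: one row of nine base-3 digits per position
toy_positions ← 0 1 5                    ⍝ empty board, one filled cell, two filled cells
toy_ucp       ← toy_encode toy_positions ⍝ 3×9 matrix of cell values
toy_counts    ← 20 × 0=toy_ucp           ⍝ 20 beads for each empty cell, 0 for each filled one
+/toy_counts                             ⍝ beads per position: 180 160 140
```

Each filled cell removes 20 beads from its row, which is a quick way to confirm that the mask is the right way round.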

In [9]:
)save notebook7 -force