# Playing the game

In this chapter we'll explore the code that will run a game of Noughts and Crosses.

## Types Of Player

The ultimate goal is to create a framework that makes it easy to run MENACE with a variety of different player types.

Eventually we'll implement:
1. a `random_player` function that picks a legal move at random,
1. a `menace_player` that learns by using MENACE's strategy to choose which move to make,
1. a `human_player` in which a real person plays one side of the game, and
1. a `perfect_player` that makes the best possible move throughout the game.

In this chapter we will implement the `random_player`.

I don't yet know a way to capture user input from a program running in an APL Jupyter notebook, but it's easy to do when running a program using Dyalog's [RIDE](https://www.dyalog.com/dyalog/development-environment.htm) interface. Appendix B (currently under construction) will describe how to run a version of MENACE that also supports a `human_player`.

Each of these players will be implemented as an APL function which will take the current MENACE configuration as an argument, make a move and return an updated configuration as its result.

## The Game Runner Code

We'll plug the players into a `game_runner` which we will implement as a *user-defined* operator.

Recall that in APL, a *function* returns data but an *operator* returns a function. Our `game_runner` will take a pair of player functions as arguments (APL calls an operator's arguments *operands*) and return a function that will play an entire game.

That function will take an array that captures the MENACE configuration as an argument and return an updated configuration as its result.

In this chapter we will assume that all the games are played in a single session. In a later chapter we will implement a simple way of retaining that data so that MENACE can learn across multiple sessions.

## The MENACE configuration

The MENACE configuration will contain the information needed by the players and the `game_runner`.

Throughout a game *all* player types need to know the configuration for the current board. If we track the moves actually made during a game, the current board is simply the last of the moves in the current game, so the game history contains it.

Eventually we may want to examine MENACE's performance over time. For that reason the configuration will maintain a history of wins, draws and losses. The `game_runner` will maintain that information but the player functions will ignore it.

The original version of MENACE chose its moves and tracked their value by using beads in a matchbox. The content of the matchbox was adjusted when a game was over.

A player that applies MENACE's strategy needs that information, so we will eventually need to include it in the MENACE configuration. It's not needed by a `random_player`, so we will omit it for now.

In summary, then, we will start with a MENACE configuration that contains two elements:

1. A history of the current game, held as a matrix of position vectors, one row per move.
1. A vector of MENACE's game results. A win is represented by 1, a draw by 0 and a loss by ¯1.

In [1]:
)copy notebook6

In [2]:
⎕io ← 0

Now we can easily create the initial configuration.

The first element holds the game so far: a one-row matrix containing the current board position, which is initially an empty board.
The second element holds the results of all games played so far, which is initially an empty vector.

In [3]:
config ← (1 9⍴0) ⍬
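
As a quick check on that structure, here is a small sketch that builds a configuration of the same shape under a made-up name (`demo`, which is not part of the chapter's code) and inspects its two elements. The expected shapes are shown in the comments.

```apl
⎕io ← 0              ⍝ index origin 0, as in the rest of this chapter
demo ← (1 9⍴0) ⍬     ⍝ hypothetical name; same structure as config
⍴demo                ⍝ 2: the configuration has two elements
⍴0⊃demo              ⍝ 1 9: one row (the empty board) of nine squares
⍴1⊃demo              ⍝ 0: no game results recorded yet
```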

## Playing The Game

Now we can start work on the `game_runner`.

The `game_runner` will be something we haven't met before: a *user-defined operator*.

Recall that in APL a function returns data but an operator returns a function.

We've encountered two types of APL function: *primitive* functions like `+`, `⍴`, `⌽` which are built into the APL interpreter, and *user-defined* functions, like `initial_counts`, where we named APL code so that we could reuse it wherever it was needed without having to repeat the code.

We've also encountered a number of primitive operators, like reduction `/` and outer product `∘.`.

`game_runner` needs to cope with different types of player, and we will do that by providing player functions as arguments to `game_runner`.

A typical game would look like this:
`config ← menace_player game_runner random_player config`
where `config` is the MENACE configuration data we discussed above and `menace_player` and `random_player` are the functions we described at the beginning of this chapter.

Let's start by defining a `random_player` as it's simpler to implement.

We'll invoke the function by providing a MENACE configuration as its argument. That contains the current board position, and the `random_player` function should
1. pick a legal move at random, and then
1. update the configuration appropriately and return it as the result.

The second step is a little more complicated than it seems, because the *player doesn't know whether it's playing `×` or `○`*. Luckily it can find that out from the current board position. If the numbers of `×`s and `○`s are the same, it's `×` to play. If not, it's `○`'s turn.

Let's build the code step by step.

In [4]:
g ← 0⊃config ⍝ find the game so far
⊢cp ← ,¯1↑g ⍝ find current position

In [5]:
e ← 0=cp   ⍝ boolean vector of empty board positions - a 1 means the position is available
⊢c ← +/e ⍝ how many empty squares are there? (At the start of the game, all of them).

The player needs to pick an empty position at random.

Surprise, surprise! APL has a primitive function to do that.

*Roll* is represented by `?`. It returns a random integer in the range specified by its right argument.
So `?5` will return a number between zero and four (inclusive), assuming that `⎕io` is zero.
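
To make the range concrete, the sketch below rolls many times and checks that every result lies between zero and four. The name `rolls` is made up for this illustration. The individual rolls vary from run to run, but the final check does not.

```apl
⎕io ← 0                 ⍝ index origin 0, as in this chapter
rolls ← ?1000⍴5         ⍝ 1000 independent rolls (rolls is a hypothetical name)
∧/rolls∊⍳5              ⍝ 1: every roll is one of 0 1 2 3 4
```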

We'll use it to pick a move at random and then update the game.

In [6]:
i ← ⍸e ⍝ find indices of the empty positions - these are the legal moves
np ← {1+>/+/1 2∘.=⍵} ⍝ who is the next player? 1 for ×, 2 for ○
cp[i[?⍴i]] ← np cp ⍝ pick a legal move at random and update the current position
cp
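
The `np` function is compact, so it may help to see it applied to a few hand-written positions. In the sketch below `np` is repeated from the cell above so that the snippet runs on its own; the positions are invented for illustration.

```apl
np ← {1+>/+/1 2∘.=⍵}     ⍝ repeated from the cell above
np 0 0 0 0 0 0 0 0 0     ⍝ 1: empty board, × to play
np 1 0 0 0 0 0 0 0 0     ⍝ 2: × has moved once, ○ to play
np 1 0 2 0 0 0 0 0 0     ⍝ 1: one move each, × to play
1 2∘.=1 0 2 0 0 0 0 0 0  ⍝ Boolean matrix: first row marks the ×s, second row the ○s
```

The outer product builds one row per player, `+/` counts the marks in each row, and `>/` is 1 only when `×` has made more moves than `○`.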

Next, we'll update the current game by appending the new position to the matrix of moves and put the updated game into the configuration using [selective assignment](http://help.dyalog.com/18.0/#Language/Primitive%20Functions/Assignment%20Selective.htm?Highlight=selective%20assignment).

In [7]:
g ← g⍪ cp 
(0⊃config)←g

Now we'll create functions to access and update the configuration. These hide the internal structure of the configuration, making it easier to change later if we need to.

In [8]:
cg ← {0⊃⍵} ⍝ current game from configuration
ucg ← {c ← ⍺ ⋄ (0⊃c)←⍵ ⋄ c} ⍝ insert current game and return updated configuration
p ← {,¯1↑⍵} ⍝ current position from game
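
Here is a small sketch of how these helpers fit together. The three definitions are repeated so the snippet runs on its own; `demo` and `demo2` are hypothetical names used only for this illustration.

```apl
⎕io ← 0
cg ← {0⊃⍵}                          ⍝ repeated from the cell above
ucg ← {c ← ⍺ ⋄ (0⊃c)←⍵ ⋄ c}         ⍝ repeated from the cell above
p ← {,¯1↑⍵}                         ⍝ repeated from the cell above
demo ← (1 9⍴0) ⍬                    ⍝ a fresh configuration
demo2 ← demo ucg (cg demo)⍪1 0 0 0 0 0 0 0 0   ⍝ append a position with × in the first square
p cg demo2                          ⍝ 1 0 0 0 0 0 0 0 0: the latest position
⍴cg demo                            ⍝ 1 9: the original configuration is unchanged
```

Because `ucg` works on a copy of its left argument, the caller decides whether to keep the old configuration or overwrite it with the result.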

We can combine all those steps into the `random_player` function.

In [9]:
random_player ← {cf ← ⍵ ⋄ g ← cg ⍵ ⋄ cp ← p g ⋄ i ← ⍸e←0=cp ⋄ cp[i[?+/e]] ← np cp ⋄ g ← g⍪cp ⋄ cf ucg g }

Let's try it out.

In [10]:
config ← (1 9⍴0) ⍬ 
config ← random_player config
cg config

If we run `random_player` again it will make a move for `○`, since `×` has already played.

In [11]:
config ← random_player config
cg config

It would get rather tedious if we had to manually run `random_player` every time we wanted to make another move. APL has a power operator `⍣` which we can use to execute something repeatedly.

The code below creates a fresh configuration and then runs the `random_player` 9 times.

In [12]:
config ← (1 9⍴0) ⍬
cg config←(random_player ⍣ 9) config
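
The power operator is easier to follow on a simple numeric function. The sketch below uses a made-up helper, `double`, that is not part of the chapter's code.

```apl
double ← {2×⍵}        ⍝ hypothetical helper
(double⍣3) 1          ⍝ 8: double applied three times
(double⍣0) 1          ⍝ 1: applied zero times, so the argument comes back unchanged
```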

What happens if we try to play again?

In [13]:
config ← random_player config

DOMAIN ERROR
random_player[0] random_player←{cf←⍵ ⋄ g←cg ⍵ ⋄ cp←p g ⋄ i←⍸e←0=cp ⋄ cp[i[?+/e]
      ]←np cp ⋄ g←g⍪cp ⋄ cf ucg g}
                                                                         ∧


The board is full and our code falls over.

There's another problem with our design so far. There's nothing to stop the game when a player has won.

Luckily that is not hard to fix.

We can define a user-defined operator, `try`, that stops invoking the player as soon as the game is over. We saw above that our first attempt at play failed when we carried on beyond the 9th move.

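A user-defined operator is written like a user-defined function, except that its body refers to the functions it is given: `⍺⍺` stands for the operand on the left and `⍵⍵` for the operand on the right, while `⍵` is still the right argument of the function that the operator creates.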
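
As a minimal sketch of user-defined operator syntax, the two toy operators below show how `⍺⍺` and `⍵⍵` refer to the functions on the operator's left and right. The names `twice` and `then` are hypothetical and are not used elsewhere in this chapter.

```apl
twice ← {⍺⍺ ⍺⍺ ⍵}     ⍝ one operand: apply the function on the left two times
{⍵+1} twice 5         ⍝ 7
then ← {⍵⍵ ⍺⍺ ⍵}      ⍝ two operands: apply the left function, then the right one
{⍵+1} then {⍵×2} 5    ⍝ 12
```

In both cases the operator combines with its operand functions first, and the resulting function is then applied to the data on the right.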

In [14]:
draw ← {0=+/0=⍵} ⍝ is the board full? (no empty squares left)
gf ← wf ∨ draw ⍝ the game is finished if it's been won or the board is full
try ← {gf p cg ⍵:⍵ ⋄ ⍺⍺ ⍵} ⍝ if the game is over, do nothing; otherwise invoke the player

Let's repeat our new code 10 times and see if it copes OK.

In [15]:
config ← (1 9⍴0) ⍬
list 0⊃config←({random_player try ⍵} ⍣ 10) config

Great! Even when we run it ten times, the code stops adding moves as soon as the game is over.

At the moment we can only play games with `random_player`s, but later we will want to use other types of player. We'll write a `play_round` operator which will take two functions, one on the left, one on the right, and create a function that
1. takes a configuration as its argument,
1. asks the function on the left to make a move,
1. asks the function on the right to make a move, and
1. then returns the updated configuration.

In [16]:
play_round ← {cf ← ⍵ ⋄ cf ← ⍺⍺ try cf ⋄ cf ← ⍵⍵ try cf ⋄ cf}

In [17]:
{random_player play_round random_player ⍵} (1 9⍴0) ⍬

We need a helper function `uh` (*update history*) before we can write `game_runner`.

It will extract the game history from the configuration passed as its argument, append the result of the last game, and return the updated configuration.

We'll start by writing `result`, which `uh` will then use to find the result of the last game.
`result` will first check whether the game was won. If it wasn't, the game was drawn, so it will return a 0. Checking for a win first matters because the ninth move can complete a line, in which case the board is full but the game is not a draw.
If the game was won, it will count the empty squares. If the count is even, `×` played last and must have won, so it will return a 1. If the count is odd, `○` played last and the game was lost, so it will return a ¯1.

In [18]:
result ← {~wf p ⍵: 0 ⋄ 1 ¯1[2 | +/0=p ⍵]} ⍝ 0 for a draw, 1 if × won, ¯1 if ○ won
uh ← {cf ← ⍵ ⋄ h ← 1⊃cf ⋄ h←h, result cg cf ⋄ (1⊃cf) ← h ⋄ cf}
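
The parity rule can be tried in isolation. The sketch below pulls that one step out into a made-up function, `score`, and applies it to two invented finished positions, one won by each side.

```apl
⎕io ← 0                          ⍝ the indexing below relies on index origin 0
score ← {1 ¯1[2|+/0=⍵]}          ⍝ hypothetical name: only the parity step of result
score 1 1 1 2 2 0 0 0 0          ⍝ 1: four empty squares, so × moved last
score 1 1 0 2 2 2 1 0 0          ⍝ ¯1: three empty squares, so ○ moved last
```

This works because `×` always moves first: after an odd number of moves an even number of the nine squares remain empty.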

Now we can develop `game_runner`.

It will start by setting a new starting position in the configuration it's given.

It will use a different invocation of `⍣`, the power operator. Previously we used a numeric right argument,
and the power operator then invoked the function that number of times. Now we'll use a function as the right argument. That will be a stop function: the power operator applies it to each new result and the previous one, and repeats the function on its left until the function on its right returns a `1`. The function we'll use is *match*, represented by `≡`. It returns a 1 if its arguments are identical.

`game_runner` will repeatedly play a round until two successive configurations are the same. That will happen as soon as the game is over.
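
The same stopping idea can be seen on a simple number. In the sketch below, `step` is a made-up function that, like a player wrapped in `try`, eventually returns its argument unchanged.

```apl
step ← {10⌊⍵+1}       ⍝ hypothetical: add 1, but never go past 10
step⍣≡⊢0              ⍝ 10: repetition stops once step returns its argument unchanged
```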

Then it will invoke `uh` to update the configuration with the game result and return the updated configuration.

In [19]:
game_runner ← {cf ← ⍵ ucg 1 9⍴0 ⋄ cf ←(⍺⍺ play_round ⍵⍵)⍣≡⊢cf ⋄ uh cf}

In [20]:
config ← (1 9⍴0) ⍬
⊢config ← random_player game_runner random_player⊢config

In [21]:
⊢config ← random_player game_runner random_player⊢config

Let's play twenty more games and see the results.

In [22]:
{random_player game_runner random_player⊢⍵}⍣20⊢config
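
Once the history holds a number of results, a tally makes them easier to read. The sketch below uses a made-up results vector, `h`, rather than output from a real run; applying the same expression to `1⊃config` should summarise an actual session.

```apl
h ← 1 0 ¯1 1 1 ¯1 0 1      ⍝ invented results, not taken from a real run
+/1 0 ¯1∘.=h               ⍝ 4 2 2: wins, draws, losses
```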

In [23]:
)save notebook7 -force