# Implementing The MENACE Player

### Tracking MENACE's Beads

The original matchbox-based MENACE remembered the consequences of past games using matchboxes and beads.
1. It used one matchbox for each board position that MENACE might encounter.
1. Each matchbox was labelled. The label showed the position represented by the box, and showed each possible move from that position. Possible moves were shown in different colours.
1. Each matchbox contained beads of colours that matched those on the label. The beads determined the probability that MENACE would make the corresponding move.
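
To make the link between beads and probabilities concrete, here is a minimal sketch. The bead counts are made up, and `probs`, `pick` and `move` are hypothetical names used only for illustration; they are not part of the notebook's code.

```apl
⎕io ← 0                              ⍝ index origin 0, as used in this notebook

beads ← 4 0 3 0 0 2 0 0 1            ⍝ made-up bead counts for the nine cells
probs ← beads ÷ +/beads              ⍝ chance of each move: its share of the beads

⍝ pick: hypothetical helper. ⍺ is a vector of bead counts, ⍵ is the number of
⍝ the bead drawn (0 up to one less than the total). Returns the cell it belongs to.
pick ← {+/⍵≥+\⍺}

move ← beads pick ?+/beads           ⍝ random draw; a cell with no beads is never chosen
```

Drawing a bead number and counting how many running totals it has reached mimics shaking a bead out of the box: a move with more beads covers more of the range, so it is chosen more often.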

In our implementation we will maintain the bead information as a two-element vector, each element of which is an array.

The first element will be a vector of decoded (numerical) canonical possible positions.
The second element will be a matrix of bead counts, with one row per possible position and 9 columns.

Each column will correspond to one of the cells in the game board, and will contain the number of virtual beads for that move. Some moves will not be possible from a given position, because the cell is already filled. We will initialise those bead counts to zero, and they will never get updated, since the corresponding moves will never be made.

The values for possible moves will be initialised to some fixed number when a new MENACE configuration is created, and will then get adjusted at the end of each game by MENACE's learning rule.

The bead data will form a new third element of the MENACE configuration.
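
As a rough illustration of that layout, the sketch below builds a toy version of the bead data. The position numbers are made up, and `positions`, `counts`, `beads`, `p` and `c` are hypothetical names chosen for this example only.

```apl
⎕io ← 0

positions ← 0 1 5                    ⍝ made-up position numbers standing in for the real ones
counts ← 3 9⍴20                      ⍝ one row of nine bead counts per position
beads ← positions counts             ⍝ the two-element bead vector

(p c) ← beads                        ⍝ unpack the two elements
row ← c[p⍳5;]                        ⍝ the nine bead counts for position 5
```

Keeping the positions and the counts side by side means that finding the "matchbox" for a position is a single lookup with `⍳` followed by a row selection.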

### MENACE's Learning Rule

At the end of a game MENACE adjusted the bead counts.
1. If MENACE *won* it added three beads of the appropriate colour to each matchbox used.
1. If MENACE *drew* it added one bead of the appropriate colour to each matchbox used.
1. If MENACE *lost* it removed one bead of the appropriate colour from each matchbox used.

We'll do the same to our virtual bead counts.
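
One possible shape for such an adjustment is sketched below. It is not the notebook's implementation: `adjust` is a hypothetical helper, the size of the change is passed in as `delta` rather than fixed, and stopping the counts at zero is an assumption of this sketch.

```apl
⎕io ← 0

⍝ adjust: hypothetical helper.
⍝ ⍺ is (rows cols delta): the row of each matchbox used, the cell played from it,
⍝ and the number of beads to add (negative to remove).
⍝ ⍵ is the matrix of bead counts. The result is the adjusted matrix.
⍝ 0⌈ stops a count from going below zero (an assumption, not from the original).
adjust ← {(rows cols delta)←⍺ ⋄ 0⌈(delta∘+)@(rows,¨cols)⊢⍵}

counts ← 3 9⍴4                       ⍝ toy data: three positions, four beads per move
won ← ((0 2)(4 8) 3) adjust counts   ⍝ cell 4 of row 0 and cell 8 of row 2 each gain three
lost ← ((0 2)(4 8) ¯5) adjust counts ⍝ the same two counts drop, but no lower than zero
```

Only the counts for the moves actually played change; every other entry in the matrix is left as it was.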

Let's look at how to initialise the virtual bead data. MENACE started with 20 beads in each matchbox, but we will make that a changeable parameter called `starting_count`.

The bead data is a two-element vector.

The first element is a vector of decoded plausible canonical positions. We saved that earlier in a variable `ucpn`.

The second element is a matrix of bead counts, with one row for each position in `ucpn`, and nine columns corresponding to the nine squares on the board. Each column should contain the initial bead count for those squares that are empty, and zero for each square that has already been filled.

In [3]:
starting_count ← 20
bc ← ((⍴ucpn),9)⍴ starting_count

These definitions rely on our earlier work, including `ucpn`, so let's copy that in first.

In [None]:
)copy notebook7

In [None]:
⎕io ← 0

Now we need to set the bead counts to zero for each position that has been filled.

We'll do that in stages. First we'll encode the values in `ucpn`. That will give us a matrix `ucp` with one row per position and one column per board cell.

Next we'll look for the zeros (the empty cells) in `ucp`, keep the bead counts at those positions unchanged, and set the rest to zero.

In [4]:
ucp ← encode ucpn ⍝ a matrix with one board position per row
empty ← 0=ucp ⍝ a matrix with a 1 for each empty cell and a 0 for every filled one
bc ← bc × empty
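
The masking step can be checked by hand on a small example. The sketch below uses a made-up two-row matrix in place of `ucp`, and assumes that an encoded board holds 0 for an empty cell and a non-zero value for a filled one; `demo` and `demo_counts` are hypothetical names.

```apl
⎕io ← 0

⍝ two toy boards: an empty one, and one with cells 0 and 4 already filled
demo ← 2 9⍴0 0 0 0 0 0 0 0 0 1 0 0 0 2 0 0 0 0

⍝ 20 beads for every empty cell, none for a filled one
demo_counts ← 20 × 0=demo
```

The first row of `demo_counts` holds 20 in every column, while the second has zeros in columns 0 and 4, the two cells that are no longer available.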

In [5]:
list 5↑ucp

In [6]:
5↑bc

That looks correct. Let's turn that into a function `initial_counts` which will take the initial count as its argument.

In [7]:
initial_counts ← { bc ← ((⍴ucpn),9)⍴ ⍵ ⋄ bc × 0=encode ucpn} ⍝ beads only where the cell is empty

In [8]:
5↑initial_counts 20

Now we can easily create the extended initial configuration.

The third element contains `ucpn` and the initial counts.

We'll write a function called `initialise` which will take the initial bead count as its argument.

In [9]:
initialise ← {(1 9⍴0) ⍬ (ucpn (initial_counts ⍵))} ⍝ the bead data (positions, counts) is the third element
config ← initialise 20

In [22]:
)save notebook8 -force