# Reinforcement Learning

## Exploring MENACE in APL

Michie's MENACE used matchboxes to remember what had happened in previous games.
Michie used one matchbox for each possible board configuration, and the player picked
a coloured bead at random to decide what move MENACE would make.

How many relevant board configurations are there?

This isn't a trivial question, since many board configurations could never be reached in play.

There's an obvious upper limit, though. We can count all the board configurations without considering whether they could be reached.

Each square of the board can be empty, contain a nought or contain a cross.
There are three possibilities for each of the nine cells, so there are three to the power nine configurations in total.

Here's how we calculate that in APL.

```{note}
Exponentiation is represented by `*` in APL.
APL uses `×` for multiplication.
```

In [1]:
3*9
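
As a cross-check on the arithmetic, here is a small sketch that computes the same count by multiplying nine 3s together. It uses only primitives (`⍴` and the reduction `×/`); the comments show the values we expect.

```apl
×/9⍴3          ⍝ 3×3×3×3×3×3×3×3×3, which is 19683
(3*9)=×/9⍴3    ⍝ 1, so both expressions agree
```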

It's interesting to know how many positions there are, but we'll need to eliminate the ones that are impossible
or redundant.

We need some way of representing a board position.

We can show each board position as a 3 by 3 matrix of characters: a `×` for a cross, a `○` for a nought,
and a `.` for an unfilled square. That means that an empty board does not just look like white space.

Suppose the first player placed a `×` in the centre.
We could create that board like this:

In [2]:
3 3⍴'....×....'

In APL we create character literals by enclosing their characters in single quotation marks.
The result is a character *vector* (a list of characters).

We can create a matrix by asking APL to reshape the vector.
`⍴` stands for *reshape*; it reshapes the array on its right using the shape on its left.
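
Reshape reuses the elements of its right argument when there are too few of them to fill the requested shape. The sketch below illustrates that behaviour; the empty-board example is our own illustration rather than something used later in the notebook.

```apl
2 3⍴10 20 30 40 50 60   ⍝ a 2 by 3 matrix: 10 20 30 on the first row, 40 50 60 on the second
2 3⍴1 2                 ⍝ the elements are recycled: rows 1 2 1 and 2 1 2
3 3⍴'.'                 ⍝ an empty board: three rows of three dots
```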

That way of displaying a board position is great for humans, but there are better ways to represent a board
in an APL program. The way we'll do it is to represent a board as a numeric vector.
The vector will have a 0 for an unfilled position, a 1 for a `×` and a 2 for a `○`.

The board we saw above would be represented by the vector `0 0 0 0 1 0 0 0 0`.

How can we convert that to something we can visualise? We'll need to use the vector as an index to an array of
characters.

As you may know, the mathematical world is divided about how to index things.
Pure mathematicians tend to count from 0, but applied mathematicians often count from 1.

```{note}
APL supports both approaches. You tell APL which you want to use by setting the *index origin* to 0 or 1.

In APL, the index origin is a system variable called `⎕io`. The line below sets it to 0.
```

In [3]:
⎕io ← 0

As you can see, in APL, `←` is used for assignment. (The `=` symbol is only used to test for equality.)
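
To see what the index origin changes without disturbing the setting we just made, the sketch below assigns `⎕io` inside a pair of braces. It assumes Dyalog APL, where such an assignment is local to the braced function; `⍬` is just an empty argument to run it with.

```apl
{⎕io←0 ⋄ ⍳3}⍬          ⍝ 0 1 2
{⎕io←1 ⋄ ⍳3}⍬          ⍝ 1 2 3
{⎕io←0 ⋄ '.×○'[1]}⍬    ⍝ × (the second character)
{⎕io←1 ⋄ '.×○'[1]}⍬    ⍝ . (the first character)
```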

Here's the code to convert the vector `0 0 0 0 1 0 0 0 0` to a human-friendly board position.

In [4]:
3 3⍴'.×○'[0 0 0 0 1 0 0 0 0]

We're going to use that expression whenever we want to turn a board vector into a human-friendly diagram.

We can use APL's *direct definition* (a *dfn*, written between braces) to create a function which we can use repeatedly. Inside the braces, `⍵` stands for the function's right argument.

In [5]:
show ← {3 3⍴'.×○'[⍵]}
show 0 0 0 0 1 0 0 0 0 
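
Here is a sketch of the same idea applied to a board with three moves on it. The board vector is made up for illustration, and the function sets `⎕io` locally (assuming Dyalog APL) so that the example does not depend on the session's index origin.

```apl
⍝ crosses in two opposite corners, a nought in the centre
{⎕io←0 ⋄ 3 3⍴'.×○'[⍵]} 1 0 0 0 2 0 0 0 1
⍝ expected display:
⍝ ×..
⍝ .○.
⍝ ..×
```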

How can we generate all the 19683 board positions?

The first step is to generate all the numbers from 0 to 19682.

In APL we can do that using `⍳`. In the lines below, we'll store the list of numbers in a variable called `boards` and then display the first 6 numbers just to check that APL did what we wanted.

In [6]:
boards ← ⍳19683
boards[0 1 2 3 4 5]

That looks good.

We picked the first six elements of `boards` using indexing, but there's an easier way. We can use APL's *take* function.

In [7]:
6↑boards
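
The equivalence between take and indexing can be checked on a short vector. This is a minimal sketch; `v` is a throwaway name of our own, and `≡` (match) returns 1 when its two arguments are identical.

```apl
v ← 10 20 30 40 50
3↑v              ⍝ 10 20 30
(3↑v)≡v[⍳3]      ⍝ 1: take and indexing give the same result
```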

How can we convert those numbers to board positions?

We can use a *primitive* (built-in) APL function called encode `⊤`, which converts an integer to a representation in any number base we choose. We'll convert each number to its *ternary* (base 3) representation.

In [8]:
encoded ← (9⍴3)⊤boards

What's the result? We won't display it, since it is a large array, but we can find out its shape.

In [9]:
⍴encoded

Earlier we used `⍴` with a shape on the left and an array on the right to reshape a vector into a matrix. Here we used `⍴` with just an array on the right. Used that way, `⍴` returns the shape of the array.
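
The two uses of `⍴` can be compared side by side. The sketch below uses only expressions of the kind we have already seen; the comments show the shapes we expect.

```apl
⍴'....×....'        ⍝ 9: a vector of nine characters
⍴3 3⍴'....×....'    ⍝ 3 3: the same characters reshaped into a matrix
⍴(9⍴3)⊤⍳6           ⍝ 9 6: nine ternary digits for each of six numbers
```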

We can use `↑` to take 9 rows and 6 columns from the encoded matrix.

In [10]:
9 6↑encoded

Each column is a board position. It feels more natural to have the board positions as rows, and APL has a handy transpose function `⍉` which will do the job.

In [11]:
⍉9 6↑encoded

We're going to do a lot of encoding, so it makes sense to create an `encode` function we can use repeatedly.

In [12]:
encode ← {⍉(9⍴3)⊤⍵} ⍝ integer(s) to board vector/matrix

Let's try that out.

In [13]:
encode ⍳6
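
It can help to look at a few individual numbers. The sketch below applies `⊤` directly, so it does not depend on our `encode` function; the numbers are chosen for illustration.

```apl
(9⍴3)⊤5        ⍝ 0 0 0 0 0 0 0 1 2   since 5 is 12 in base 3
(9⍴3)⊤81       ⍝ 0 0 0 0 1 0 0 0 0   the board with a cross in the centre (81 is 3*4)
(9⍴3)⊤19682    ⍝ 2 2 2 2 2 2 2 2 2   the largest number, a board full of noughts
```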

Later on we will want to go in the opposite direction: from a board vector (or a matrix of board vectors) to the corresponding number or numbers.

The function `decode` does that.

In [14]:
decode ← {3⊥⍉⍵} ⍝ board vector/matrix to integer(s)
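
The primitive `⊥` (decode) treats a vector as the digits of a number in the given base. The sketch below applies it directly to single board vectors, which are our own examples.

```apl
3⊥0 0 0 0 1 0 0 0 0    ⍝ 81: the board with a cross in the centre
3⊥(9⍴3)⊤81             ⍝ 81: decoding undoes encoding
3⊥9⍴2                  ⍝ 19682: a board full of noughts
```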

In [15]:
decode encode ⍳6

What do these boards look like? Let's try to use the show function we wrote earlier.

In [16]:
show encode ⍳6

Oops! That just shows the first board position, which is rather boring, and not what we wanted. The current definition of `show` uses `3 3⍴` to generate its result, so we only ever get 9 characters arranged as a 3 by 3 matrix. We need to modify `show` so that it displays each board.

We'll use APL's *rank* operator `⍤`, which lets us control which sub-arrays of the argument a function is applied to.

In this case we want `show` to apply to each vector along the last dimension of the argument we provide.

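`3 3⍴⍤ 1` allows us to do that: it reshapes each row of its right argument separately. The `⊢` just serves to separate the `1` used with rank from the code that picks the characters to use as an argument; without it, APL would treat the `1` as part of the array that follows.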

In [17]:
show ← {3 3⍴⍤ 1 ⊢'.×○'[⍵]}
show encode ⍳6

Let's look at the shape of show's result.

In [18]:
⍴show encode ⍳6

`show` now converts each vector on its right to a matrix. If we apply `show` to a matrix with one board per row it returns a 3d array with one plane for each board configuration.
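
The effect of rank is easy to see on a small numeric array. The sketch below compares plain reshape with reshape applied at rank 1; the 2 by 9 matrix of zeros is only a stand-in for two boards.

```apl
⍴3 3⍴2 9⍴0        ⍝ 3 3: plain reshape keeps just the first nine elements
⍴3 3⍴⍤1⊢2 9⍴0     ⍝ 2 3 3: each row becomes its own 3 by 3 matrix
```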

If there aren't too many boards it can be easier to see the positions going *across* the page rather than down it.

There's an easy way to do that.

In [19]:
⊂⍤2 show encode ⍳6

The expression `⊂⍤2` converts the cube of characters returned by `show` into a vector (list) of matrices.
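
We can confirm that by looking at shapes. In the sketch below, a 6 by 3 by 3 array of dots stands in for the result of `show`.

```apl
⍴6 3 3⍴'.'          ⍝ 6 3 3: a cube of characters
⍴⊂⍤2⊢6 3 3⍴'.'      ⍝ 6: a vector of six enclosed matrices
```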

We've now created the tools we need to work on the next stage of our implementation of MENACE.

In the next notebook we will look at the relationship between symmetries (rotations and reflections) of board configurations.

We will want to use `encode` and `show` so we will save our work ready to reuse in the next notebook.

`)save notebook3` is an APL *system command*. It saves the functions and variables we've created in a *workspace* file called `notebook3.dws`.

In this case we need to use the `-force` option.
When we opened the notebook it silently loaded a workspace with a different name.
APL will not change that name unless we tell it to.

In [20]:
)save notebook3 -force