# MENACE: Machine Educable Noughts And Crosses Engine

## Introduction

MENACE (Machine Educable Noughts And Crosses Engine) is a mechanical learning machine designed by Donald Michie in 1961. It was created to demonstrate the concept of reinforcement learning, long before modern machine learning techniques were developed.

## Historical Context

Donald Michie (1923-2007) was a British researcher and a pioneer in artificial intelligence. He worked at Bletchley Park during World War II, where he was a colleague of Alan Turing and contributed to breaking the German Lorenz teleprinter cipher (codenamed "Tunny"). After the war, Michie pursued a career in biology before moving into artificial intelligence research.

MENACE was conceived during a time when computers were still in their infancy and not widely accessible. Michie's invention demonstrated that learning algorithms could be implemented without complex electronic computers, using simple mechanical means.

## How MENACE Works

MENACE uses a collection of matchboxes to "learn" how to play Tic-Tac-Toe (also known as Noughts and Crosses). Here's a brief overview of its operation:

1. **Representation**: Each board position that MENACE can face on its turn is represented by a matchbox.
2. **Decision Making**: Inside each matchbox are colored beads, each color corresponding to a possible move. To make a move, MENACE draws a bead at random from the matchbox for the current position and plays the move of that color.
3. **Learning**: After the game, every matchbox used during that game is adjusted according to the outcome:
   - Win: Add three beads of the color drawn from that box
   - Draw: Add one bead of the color drawn from that box
   - Loss: Remove one bead of the color drawn from that box

Over time, MENACE learns to make better moves by adjusting the probabilities of selecting each move in different game states.
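
The bead mechanics described above can be sketched for a single matchbox in plain Python. This is only an illustration, not Michie's original procedure in full: the helper names (`draw_bead`, `reinforce`, `move_probabilities`) and the three-color box are hypothetical, and the sketch assumes a box's bead count is simply floored at zero.

```python
import random


def draw_bead(box, rng):
    """Draw one bead at random; each color's chance is proportional to its count."""
    colors = list(box)
    return rng.choices(colors, weights=[box[c] for c in colors], k=1)[0]


def reinforce(box, color, outcome):
    """Apply the win/draw/loss adjustment to the color that was played."""
    delta = {"win": 3, "draw": 1, "loss": -1}[outcome]
    box[color] = max(box[color] + delta, 0)  # assumption: counts are floored at zero


def move_probabilities(box):
    """Return the current selection probability of each color."""
    total = sum(box.values())
    return {color: count / total for color, count in box.items()}


rng = random.Random(0)  # fixed seed so the sketch is repeatable
box = {"red": 4, "green": 4, "blue": 4}  # hypothetical matchbox with three legal moves

color = draw_bead(box, rng)     # one random draw, as MENACE would make on its turn
reinforce(box, "red", "win")    # red: 4 -> 7
reinforce(box, "blue", "loss")  # blue: 4 -> 3
probs = move_probabilities(box)
print(probs)
```

After these two adjustments the box holds 7 red, 4 green, and 3 blue beads, so red is now drawn half the time (7 of 14) where it started at one third.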

## Significance

MENACE is significant for several reasons:

1. **Early Reinforcement Learning**: It demonstrated a simple yet effective form of reinforcement learning, a concept that would become crucial in modern AI and machine learning.
2. **Mechanical Computation**: MENACE showed that learning algorithms could be implemented without electronic computers, emphasizing the underlying principles of machine learning.
3. **Interpretability**: The physical nature of MENACE made it easy to understand and visualize the learning process, a quality often lacking in modern, more complex machine learning models.
4. **Inspiration**: MENACE has inspired numerous recreations and adaptations, including digital implementations and educational tools.

## Modern Relevance

While MENACE itself is no longer used for practical applications, the principles it demonstrates remain relevant:

1. **Reinforcement Learning**: The core idea behind MENACE is still fundamental to modern reinforcement learning algorithms used in various fields, from game playing to robotics.
2. **Explainable AI**: As AI systems become more complex, there's a growing interest in making them more interpretable. MENACE's transparent decision-making process resonates with current efforts in explainable AI.
3. **Educational Tool**: MENACE continues to be an excellent educational tool for introducing concepts of machine learning and reinforcement learning in an accessible, hands-on manner.

## Conclusion

MENACE stands as a testament to the ingenuity of early AI researchers and the timeless nature of fundamental machine learning concepts. Its simplicity, elegance, and effectiveness continue to inspire and educate, bridging the gap between the early days of AI and the cutting-edge technologies of today.

In [None]:
import random

import matplotlib.pyplot as plt
import torch

In [None]:
class MENACE:
    def __init__(self, initial_beads=10):
        self.matchboxes = {}
        self.initial_beads = initial_beads
        self.moves = []
        self.first_box_beads_history = []
        
    def get_state(self, board):
        return str(board.tolist())

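    def get_move(self, board):
        state = self.get_state(board)
        if state not in self.matchboxes:
            # New position: start with initial_beads for every empty square
            self.matchboxes[state] = torch.full((9,), self.initial_beads, dtype=torch.float32)
            self.matchboxes[state][board != 0] = 0

        box = self.matchboxes[state]
        if box.sum() == 0:
            # Assumption: refill an emptied box rather than resign, so play can continue
            # (an all-zero box would otherwise make torch.multinomial fail)
            box[board == 0] = self.initial_beads

        if state == '[0, 0, 0, 0, 0, 0, 0, 0, 0]':
            # Record the opening box each time MENACE moves first
            self.first_box_beads_history.append(box.tolist())

        # Draw a move with probability proportional to its bead count
        probs = box / box.sum()
        move = torch.multinomial(probs, 1).item()
        self.moves.append((state, move))
        return move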

    def update(self, reward):
        # Reinforce every (state, move) pair played this game; counts never drop below zero
        for state, move in self.moves:
            box = self.matchboxes[state]
            box[move] = torch.clamp(box[move] + reward, min=0)
        self.moves = []

In [None]:
def play_game(menace, player_func):
    board = torch.zeros(9, dtype=torch.int)
    menace_turn = random.choice([True, False])

    while True:
        if menace_turn:
            move = menace.get_move(board)
            board[move] = 1
        else:
            move = player_func(board)
            board[move] = -1

        if check_win(board):
            return 1 if menace_turn else -1
        if torch.all(board != 0):
            return 0

        menace_turn = not menace_turn

In [None]:
# Return True if either player has three in a row (row, column, or diagonal).
# Marks are +1 and -1, so a completed line sums to +3 or -3.
def check_win(board):
    lines = [
        [0, 1, 2], [3, 4, 5], [6, 7, 8],  # Rows
        [0, 3, 6], [1, 4, 7], [2, 5, 8],  # Columns
        [0, 4, 8], [2, 4, 6]  # Diagonals
    ]
    for line in lines:
        if abs(board[line].sum()) == 3:
            return True
    return False

In [None]:
# Baseline opponent: picks uniformly at random among the empty squares.
def random_player(board):
    valid_moves = torch.where(board == 0)[0]
    # Convert to a plain list so random.choice never has to truth-test a tensor
    return random.choice(valid_moves.tolist())

In [None]:
# Initialise
menace = MENACE()
results = []
num_games = 10000

In [None]:
# If the result is a win (1) - add 3 beads.
# If the result is a draw (0) - add 1 bead.
# If the result is a loss (-1) - remove 1 bead.
for game in range(num_games):
    result = play_game(menace, random_player)
    if result == 1:
        menace.update(3)
    elif result == 0:
        menace.update(1)
    else:
        menace.update(-1)
    results.append(result)

In [None]:
cumulative_results = torch.tensor(results).float().cumsum(dim=0)
games_played = torch.arange(1, len(results) + 1).float()
running_average = cumulative_results / games_played

fig, (ax1, ax2) = plt.subplots(2, 1, figsize=(12, 16))

# Plot performance
ax1.plot(games_played, running_average)
ax1.set_title(f"MENACE Performance Over {num_games} Games")
ax1.set_xlabel("Games Played")
ax1.set_ylabel("Running Average Score")
ax1.grid(True)

# Plot bead counts in the opening (empty-board) matchbox
first_box_beads_history = torch.tensor(menace.first_box_beads_history).t()
for i in range(9):
    ax2.plot(range(len(first_box_beads_history[i])), first_box_beads_history[i], label=f'Position {i}')

ax2.set_title("Beads in First Matchbox for Each Position Over Time")
ax2.set_xlabel("Games in Which MENACE Moved First")
ax2.set_ylabel("Number of Beads")
ax2.legend()
ax2.grid(True)

win_rate = results.count(1) / len(results)
draw_rate = results.count(0) / len(results)
loss_rate = results.count(-1) / len(results)

print(f"Win rate: {win_rate:.2%}")
print(f"Draw rate: {draw_rate:.2%}")
print(f"Loss rate: {loss_rate:.2%}")