# Building and training the MENACE

The Matchbox Educable Noughts and Crosses Engine (MENACE) is a machine learning algorithm for playing tic-tac-toe, developed by Donald Michie in 1961. The original MENACE was a mechanical computer built from 304 matchboxes and a set of rules for learning. This is a Python implementation of the same logic, which I worked on as a problem-solving and learning exercise.

- The project was inspired by this blogpost: [Underfitted, The Most Ridiculous Tic-Tac-Toe Machine](https://underfitted.svpino.com/p/the-most-ridiculous-tic-tac-toe-machine)  
- More info on the original MENACE can be found on [Wikipedia](https://en.wikipedia.org/wiki/Matchbox_Educable_Noughts_and_Crosses_Engine).  
- Finding the possible states of the game is not trivial. The solution for this part was copied from the repo [vug/tic-tac-toe-game-states-graph](https://github.com/vug/tic-tac-toe-game-states-graph) and studied in the notebook [MENACE_understanding_possible_board_states.ipynb](MENACE_understanding_possible_board_states.ipynb)


## Imports

In [2]:
import import_ipynb
import random
import math
from itertools import chain

In [1]:
#%load_ext ask_ai.magics

In [3]:
# Setting flush=True in print() to flush the output to the console immediately
# This is needed to achieve expected behaviour in interactive mode
import functools
print = functools.partial(print, flush=True)

We import the functions for game state and symmetry group analysis defined in [MENACE_board_states_functions.ipynb](MENACE_board_states_functions.ipynb)

In [4]:
from MENACE_board_states_functions import *

importing Jupyter notebook from MENACE_board_states_functions.ipynb
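
The symmetry helpers themselves live in the imported notebook and are not reproduced here. As a rough illustration of the idea only, the eight symmetries of a 3x3 board (four rotations, each with and without a mirror flip) could be generated as in the sketch below. The names `rotate`, `mirror`, and `symmetries_with_duplicates` are hypothetical, and the ordering of the results is an assumption that need not match what `get_symmetries_with_duplicates` returns.

```python
def rotate(state):
    """Rotate a 3x3 board (tuple of row tuples) 90 degrees clockwise."""
    return tuple(zip(*state[::-1]))

def mirror(state):
    """Mirror a 3x3 board left to right."""
    return tuple(row[::-1] for row in state)

def symmetries_with_duplicates(state):
    """Return all 8 transformed boards; symmetric boards yield repeats."""
    result = []
    current = state
    for _ in range(4):
        result.append(current)
        result.append(mirror(current))
        current = rotate(current)
    return result

# A single mark in a corner has 8 transformed boards but only 4 distinct ones
corner = ((1, 0, 0), (0, 0, 0), (0, 0, 0))
variants = symmetries_with_duplicates(corner)
print(len(variants), len(set(variants)))
# 8 4
```

Keeping the duplicates means every board in a group has an index into a fixed list of eight transformations, which is what makes it possible to translate a move on one board into the matching move on another.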


## Class and game play function definitions

The TokenBag class represents the contents of one matchbox in the engine.
There are separate tokens for each of the nine positions on the 3x3 board. A new bag starts with
three tokens per position, and the token counts change as the engine learns.

In [5]:
class TokenBag:
    def __init__(self, start_tokens = 3):
        # Each board position is initialized with start_tokens numbers of tokens
        
        # dict keys represents board positions as below, values represent token count
        # 0, 1, 2
        # 3, 4, 5
        # 6, 7, 8
        
        board_dict = {}
        for i in range(9):
            board_dict[i] = start_tokens
    
        self.token_counts = board_dict

    def __str__(self):
        return str(self.token_counts)
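
A matchbox only needs tokens for squares that are still empty; the MatchBoxes class defined next sets the counts of occupied squares to zero. A minimal sketch of that starting state for a given board, using a hypothetical helper `initial_token_counts` that is not part of this notebook:

```python
def initial_token_counts(board_state, start_tokens=3):
    """Token counts for a board: start_tokens on empty squares, 0 elsewhere."""
    flat = [cell for row in board_state for cell in row]
    return {pos: (start_tokens if cell == 0 else 0) for pos, cell in enumerate(flat)}

# Positions 0 and 4 are already played, so they get no tokens
board = ((1, 0, 0), (0, 2, 0), (0, 0, 0))
print(initial_token_counts(board))
# {0: 0, 1: 3, 2: 3, 3: 3, 4: 0, 5: 3, 6: 3, 7: 3, 8: 3}
```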

The MatchBoxes class represents the engine's 765 matchboxes, one for each group of board
states that are equivalent under symmetry. Each matchbox holds a token bag and the list of
board states (the symmetry group) that the box represents.

The class has three methods. get_token draws a random token from the box that matches the current board state; that token determines the next move in the game. change_token_count updates the number of tokens in a box's TokenBag, and show_token_bag prints the bag.

In [6]:
class MatchBoxes:
    # Class to represent the matchboxes used by the MENACE engine
    # Dictionary of boxes, where the key is the box number and the values are
    # a dictionary of tokens for each position 0-8 and a list of all states in symmetry group including duplicates.
    
    # Function that takes all_states as input and creates a dict with symmetry group number as key and 
    # the first board state in the symmetry group as value. This state is used to calculate all other transformations.

    def __init__(self):
        def tuple_state_to_dict(st):
            # Nested state tuple to list
            st_list = []
            pos = 0
            for i in st:
                for j in i:
                    st_list.append(pos)
                    pos += 1
                    st_list.append(j)
            # State list to dictionary
            it = iter(st_list)
            st_dict = dict(zip(it, it))
            return st_dict
        
        def set_box_values():
            new_dict = {}
            all_states = all_states_and_groups() # Dictionary with all possible states as keys and symmetry group number as value
            for key, value in all_states.items():
                if value not in new_dict:
                    new_dict[value] = [TokenBag(), [key]]
            
            # Add all symmetry states including duplicates to the list of states in each box
            for key, value in new_dict.items():
                canonical_state = value[1][0] # First state in symmetry group
                new_dict[key][1] = get_symmetries_with_duplicates(canonical_state) 
                # Remove tokens from already played positions in canonical state
                board_state_dict = tuple_state_to_dict(new_dict[key][1][0])
                for pos in board_state_dict:
                    if board_state_dict[pos] != 0:
                        new_dict[key][0].token_counts[pos] = 0               
            return new_dict
    
        self.boxes = set_box_values()
    
    def get_token(self, board_state):
        board_position_symmetries = get_symmetries_with_duplicates(((0,1,2),(3,4,5),(6,7,8)))
        all_states = all_states_and_groups() # Dictionary with all possible states as keys and symmetry group number as value
        # If the board state is not in all_states, fail loudly.
        # Callers unpack the return value, so returning None would only defer the error.
        if board_state not in all_states:
            print_state(board_state)
            raise ValueError('Board state not among allowed states.')
        # Transform the board_state tuple to a dictionary
        def tuple_state_to_dict(st):
            # Nested state tuple to list
            st_list = []
            pos = 0
            for i in st:
                for j in i:
                    st_list.append(pos)
                    pos += 1
                    st_list.append(j)
            # State list to dictionary
            it = iter(st_list)
            st_dict = dict(zip(it, it))
            return st_dict

        # Get the box number corresponding to the given board state
        box_number = all_states[board_state]
        # Get the box corresponding to the box number
        box = self.boxes[box_number]
        
        # Get the total number of tokens in the box
        total_tokens = sum(box[0].token_counts.values()) # box[0] is the TokenBag object
        # If the TokenBag is empty, create a new TokenBag
        if total_tokens == 0:
            box[0] = TokenBag()
            board_state_dict = tuple_state_to_dict(box[1][0]) #box[1][0] is the first state in the symmetry group
            for pos in board_state_dict:
                if board_state_dict[pos] != 0:
                    box[0].token_counts[pos] = 0

            total_tokens = sum(box[0].token_counts.values())

        # Select a random number between 1 and the total number of tokens
        try:
            random_number = random.randint(1, total_tokens)
        except ValueError:
            print(f'ERROR: No tokens in box {box_number}. {box[0].token_counts}')
            return
        # Initialize a variable to keep track of the number of tokens counted
        tokens_counted = 0
        # Loop through the token counts for the box
        for position, count in box[0].token_counts.items():
            # Add the token count to the token count variable
            tokens_counted += count
            # If the random number is less than or equal to the token count,
            # return the position
            if random_number <= tokens_counted:
                chosen_token = position
                break

        # Get the (first) index of the symmetry state
        symmetry_state_index = box[1].index(board_state) # box[1] is the list of symmetry states
        # Specific board position symmetry to dict
        board_position_symmetry_dict = tuple_state_to_dict(board_position_symmetries[symmetry_state_index])
        # Get the position of the chosen token in the symmetry state
        #symmetry_position = board_position_symmetry_dict[chosen_token]
        symmetry_position = [k for k, v in board_position_symmetry_dict.items() if v == chosen_token][0]
        

        return box_number, chosen_token, symmetry_position #chosen_token for positions played book keeping and symmetry_position to update board state
    


    def change_token_count(self, box_number, position, change):
        # Change the token count for a given box and position
        self.boxes[box_number][0].token_counts[position] += change

    def show_token_bag(self, box_number):
        # Show the token bag for a given box
        print(self.boxes[box_number][0])
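
The cumulative-count loop in get_token amounts to weighted random sampling: a position is chosen with probability proportional to its token count. As a sketch of the same idea, and not the code used in this notebook, the standard library's `random.choices` can do the draw directly. The helper name `draw_token` and the `rng` parameter are assumptions made for this example.

```python
import random

def draw_token(token_counts, rng=random):
    """Pick a board position with probability proportional to its token count."""
    positions = list(token_counts)
    weights = [token_counts[p] for p in positions]
    if sum(weights) == 0:
        raise ValueError("token bag is empty")
    return rng.choices(positions, weights=weights, k=1)[0]

# Positions with zero tokens (0, 3 and 4 here) can never be drawn
bag = {0: 0, 1: 6, 2: 3, 3: 0, 4: 0, 5: 3, 6: 3, 7: 3, 8: 3}
move = draw_token(bag, random.Random(0))
print(move)
```

Passing a seeded `random.Random` instance makes a draw reproducible, which can help when debugging a training run.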



MenaceEngine is the class that represents the learning MENACE engine. It holds a MatchBoxes object from which it selects the moves to play, and it has methods for playing moves and for updating the TokenBags in the matchboxes. TokenBags are only updated while the engine is learning. Learning is enabled by default and can be switched off with the set_learning method.

In [7]:
# A class that represents the learning MENACE engine
class MenaceEngine:
    '''
    A class that represents the MENACE learning engine.
    There are 765 symmetry states represented by matchboxes, one for each possible board state.
    
    https://en.wikipedia.org/wiki/Matchbox_Educable_Noughts_and_Crosses_Engine
    Once the game was finished, if MENACE had won, it would then receive a "reward" for its victory. 
    The removed beads showed the sequence of the winning moves.[15] These were returned to their respective trays, 
    easily identifiable since they were slightly open, as well as three bonus beads of the same colour.[11] 
    In this way, in future games MENACE would become more likely to repeat those winning moves, 
    reinforcing winning strategies. If it lost, the removed beads were not returned, "punishing" MENACE, 
    and meaning that in future it would be less likely, and eventually incapable if that colour of bead became absent, 
    to repeat the moves that cause a loss.[3][8] If the game was a draw, one additional bead was added to each box.[11]
    
    # Unclear how many beads are in the boxes in the beginning
    '''
    
    def __init__(self, name):
        # MenaceEngine is initialized with one MatchBoxes object
        self.boxes = MatchBoxes()
        # Initialize a list to keep track of the played positions
        self.played_positions = []
        # Initialize a variable to keep track of the learning status of the MENACE engine
        self.is_learning = True
        self.name = name

    # Method that selects a random token from a box corresponding to a given board state, saves the played position
    # to a list and returns the position
    def play(self, board_state):
        # Call boxes.get_token(board_state) --> (box_number, chosen_token, symmetry_position)
        box_number, chosen_token, symmetry_position = self.boxes.get_token(board_state)
        # Add box_number and chosen_token to played_positions
        self.played_positions.append((box_number, chosen_token))
        # Return symmetry_position as the position to play
        return symmetry_position
        
    
    # Method that resolves the game and updates the matchboxes if the MENACE engine is learning
    def resolve_game(self, result):
        # If the MENACE engine is not learning, return
        if not self.is_learning:
            self.played_positions = []
            return
        # If the result is Win, add three additional tokens to each box in played_positions
        if result == 'W':
            for box_number, position in self.played_positions:
                self.boxes.change_token_count(box_number, position, 3)
        # If the result is Loss, remove one token from each box in played_positions
        elif result == 'L':
            for box_number, position in self.played_positions:
                self.boxes.change_token_count(box_number, position, -1)
        # If the result is Draw, add one additional token to each box in played_positions
        elif result == 'D':
            for box_number, position in self.played_positions:
                self.boxes.change_token_count(box_number, position, 1)
        # Reset played_positions
        self.played_positions = []

    # Method to set the learning status of the MENACE engine
    def set_learning(self, is_learning):
        self.is_learning = is_learning
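
The update in resolve_game can also be read as a small lookup table of token deltas: +3 for a win, +1 for a draw, -1 for a loss. The sketch below shows that reading on toy data; `REWARDS` and `apply_result` are hypothetical names, and the plain nested dict only stands in for the MatchBoxes object.

```python
REWARDS = {"W": 3, "D": 1, "L": -1}  # same deltas as resolve_game

def apply_result(boxes, played_positions, result):
    """Adjust the token count of every (box, position) pair played in a game."""
    delta = REWARDS[result]
    for box_number, position in played_positions:
        boxes[box_number][position] += delta

# Toy data: two boxes, each mapping board position -> token count
boxes = {1: {0: 3, 4: 3}, 7: {2: 3, 8: 3}}
apply_result(boxes, [(1, 4), (7, 2)], "W")
print(boxes)
# {1: {0: 3, 4: 6}, 7: {2: 6, 8: 3}}
```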


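The play_game function plays a game between two players and picks at random which of them moves first. It prints the moves if to_print is set to True, and it returns the winner's name (or 'Draw') if to_return_winner is set to True.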

In [8]:
# A function that plays a game of noughts and crosses between two players
def play_game(playerA, playerB, to_print = True, to_return_winner = False):
    
    # Select a random player to start the game
    if random.randint(0, 1) == 0:
        player1, player2 = playerA, playerB
    else:
        player1, player2 = playerB, playerA
    
    board_state = ((0, 0, 0), (0, 0, 0), (0, 0, 0))
    is_game_over = False
    winner = None
    turn = 0

    # Helper functions to work with the board state
    # Tuple is immutable, so we need to convert it to a list to update it
    # Tuples are used as keys in the all_states dictionary
    def tuple_state_to_list(st):
        l = []
        for t in st:
            l.append(list(t))
        return l
    def list_state_to_tuple(st):
        t = []
        for l in st:
            t.append(tuple(l))
        return tuple(t)
    
    if to_print:
        print(f"Player 1: {player1.name} vs. Player 2: {player2.name}")
        print(f"Turn {turn}")
        print_state(board_state)
        print('\n')
    turn += 1
    # Loop until the game is over
    while not is_game_over:
        
        # Play a turn for player 1
        position = player1.play(board_state)
        # Update the board state
        board_state = tuple_state_to_list(board_state)
        board_state[position // 3][position % 3] = 1
        board_state = list_state_to_tuple(board_state)
        if to_print:
            print(f"Turn {turn}, {player1.name}")
            print_state(board_state)
            print('\n')
        # Increment the turn counter
        turn += 1
        # Check if this is a win
        is_game_over = is_end(board_state)
        # If the game is over, break the loop
        if is_game_over:
            winner = 1
            if to_print:
                print(f"{player1.name} wins!")
            break
        
        # Play a turn for player 2
        
        # Check if the game is a draw
        # If there are no 0 values in the board state, the game is a draw
        if list(chain.from_iterable(tuple_state_to_list(board_state))).count(0) == 0:
            is_game_over = True
            winner = 0
            if to_print:
                print("The game is a draw!")
            break

        position = player2.play(board_state)
        # Update the board state
        board_state = tuple_state_to_list(board_state)
        board_state[position // 3][position % 3] = 2
        board_state = list_state_to_tuple(board_state)
        if to_print:
            print(f"Turn {turn}, {player2.name}")
            print_state(board_state)
            print('\n')
        # Increment the turn counter
        turn += 1
        # Check if this is a win
        is_game_over = is_end(board_state)
        # If the game is over, break the loop
        if is_game_over:
            winner = 2
            if to_print:
                print(f"{player2.name} wins!")                
            break

    # Resolve the game for both players
    if is_game_over:
        if winner == 1:
            player1.resolve_game('W')
            player2.resolve_game('L')
            winning_player = player1.name
        elif winner == 2:
            player1.resolve_game('L')
            player2.resolve_game('W')
            winning_player = player2.name
        else:
            player1.resolve_game('D')
            player2.resolve_game('D')
            winning_player = 'Draw'

    if to_return_winner:
        return winning_player        
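
The board update inside play_game converts a flat position (0-8) into a row and a column and rebuilds the immutable tuple. A minimal sketch of that step as a standalone helper follows. The name `place_mark` and the check for occupied squares are additions made for this example; play_game itself does not perform that check.

```python
def place_mark(board_state, position, mark):
    """Return a new board tuple with `mark` at flat index `position` (0-8)."""
    row, col = divmod(position, 3)  # e.g. 5 -> row 1, column 2
    if board_state[row][col] != 0:
        raise ValueError(f"position {position} is already taken")
    rows = [list(r) for r in board_state]
    rows[row][col] = mark
    return tuple(tuple(r) for r in rows)

empty = ((0, 0, 0), (0, 0, 0), (0, 0, 0))
print(place_mark(empty, 5, 1))
# ((0, 0, 0), (0, 0, 1), (0, 0, 0))
```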

## Testing the engine

We create two MenaceEngine objects that will play against each other.

In [9]:
testEngineA = MenaceEngine(name = 'testEngineA')
testEngineB = MenaceEngine(name = 'testEngineB')

And have them play one game.

In [10]:
play_game(testEngineA, testEngineB, to_print = True)

Player 1: testEngineA vs. Player 2: testEngineB
Turn 0
000
000
000


Turn 1, testEngineA
100
000
000


Turn 2, testEngineB
100
000
002


Turn 3, testEngineA
100
000
012


Turn 4, testEngineB
102
000
012


Turn 5, testEngineA
102
000
112


Turn 6, testEngineB
122
000
112


Turn 7, testEngineA
122
100
112


testEngineA wins!


The played moves are not very intelligent, as neither engine has been trained. Let's see if the TokenBags were updated as expected. The winner's played moves should have gained tokens and the loser's should have lost tokens. Below we see what happened to the TokenBag for the first player's first move (when the board was empty).

In [11]:
testEngineA.boxes.show_token_bag(1)

{0: 6, 1: 3, 2: 3, 3: 3, 4: 3, 5: 3, 6: 3, 7: 3, 8: 3}


We let the engines learn by playing each other for 100 games without printing the output. We then play one more game for which we print what happens.

In [12]:
for i in range(100):
    play_game(testEngineA, testEngineB, to_print = False)
play_game(testEngineA, testEngineB, to_print = True)

Player 1: testEngineB vs. Player 2: testEngineA
Turn 0
000
000
000


Turn 1, testEngineB
010
000
000


Turn 2, testEngineA
010
002
000


Turn 3, testEngineB
010
102
000


Turn 4, testEngineA
010
102
020


Turn 5, testEngineB
010
102
120


Turn 6, testEngineA
010
102
122


Turn 7, testEngineB
010
112
122


Turn 8, testEngineA
012
112
122


testEngineA wins!


The play is still not very smart. We train for 1000 more games.

In [13]:
for i in range(1000):
    play_game(testEngineA, testEngineB, to_print = False)
play_game(testEngineA, testEngineB, to_print = True)

Player 1: testEngineB vs. Player 2: testEngineA
Turn 0
000
000
000


Turn 1, testEngineB
000
000
010


Turn 2, testEngineA
000
200
010


Turn 3, testEngineB
000
200
011


Turn 4, testEngineA
000
220
011


Turn 5, testEngineB
000
221
011


Turn 6, testEngineA
002
221
011


Turn 7, testEngineB
002
221
111


testEngineB wins!


That looks like a slightly more intelligently played game. The better the engines play, the more often the game should end in a draw. Let's play 100 games and see how often they draw.

In [14]:
draws_and_games = [0,0]
for i in range(100):
    result = play_game(testEngineA, testEngineB, to_print = False, to_return_winner=True)
    draws_and_games[1] += 1
    if result == 'Draw':
        draws_and_games[0] += 1

In [15]:
print(f"{(draws_and_games[0] / draws_and_games[1] * 100):.0f}% of games end in a draw.") 

29% of games end in a draw.
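
Since the same counting loop is needed again after each round of training, it could be wrapped in a small helper. The sketch below assumes a hypothetical function `draw_rate` that takes any zero-argument callable returning a winner name or 'Draw'. A random stand-in is used in place of play_game so that the example runs on its own.

```python
import random

def draw_rate(play_one_game, n_games=100):
    """Play n_games and return the fraction that ended in 'Draw'."""
    draws = sum(1 for _ in range(n_games) if play_one_game() == "Draw")
    return draws / n_games

# Stand-in for:
# lambda: play_game(testEngineA, testEngineB, to_print=False, to_return_winner=True)
rng = random.Random(0)
fake_game = lambda: rng.choice(["testEngineA", "testEngineB", "Draw"])
print(f"{draw_rate(fake_game) * 100:.0f}% of games end in a draw.")
```

With only 100 games per measurement, the percentage can move by several points between runs, so small differences between two measurements should be read with care.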


If we train for another 1000 games the share of draws should increase.

In [16]:
for i in range(1000):
    play_game(testEngineA, testEngineB, to_print = False)

In [17]:
draws_and_games = [0,0]
for i in range(100):
    result = play_game(testEngineA, testEngineB, to_print = False, to_return_winner=True)
    draws_and_games[1] += 1
    if result == 'Draw':
        draws_and_games[0] += 1
print(f"{(draws_and_games[0] / draws_and_games[1] * 100):.0f}% of games end in a draw.") 

36% of games end in a draw.


The original MENACE was said to be unbeatable after training for 150 games. These engines learn more slowly, probably because each one plays against another engine that starts out playing randomly. In addition, they play both as first and as second player, while the matchbox MENACE only played first. If they trained against an optimally playing human, they would learn the most important positions first. Let's train for another 5000 games and calculate the share of draws again.

In [18]:
for i in range(5000):
    play_game(testEngineA, testEngineB, to_print = False)

In [19]:
draws_and_games = [0,0]
for i in range(100):
    result = play_game(testEngineA, testEngineB, to_print = False, to_return_winner=True)
    draws_and_games[1] += 1
    if result == 'Draw':
        draws_and_games[0] += 1
print(f"{(draws_and_games[0] / draws_and_games[1] * 100):.0f}% of games end in a draw.") 

78% of games end in a draw.


In [20]:
play_game(testEngineA, testEngineB, to_print = True)

Player 1: testEngineB vs. Player 2: testEngineA
Turn 0
000
000
000


Turn 1, testEngineB
001
000
000


Turn 2, testEngineA
001
020
000


Turn 3, testEngineB
001
021
000


Turn 4, testEngineA
001
021
002


Turn 5, testEngineB
101
021
002


Turn 6, testEngineA
121
021
002


Turn 7, testEngineB
121
021
012


Turn 8, testEngineA
121
021
212


Turn 9, testEngineB
121
121
212


The game is a draw!


That looks pretty good. We will now write a class that lets a human play against the engine.

In [21]:
class HumanPlayer():
    def __init__(self, name):
        self.name = name
    def play(self, board_state):
        print("Enter a position (0-8):")
        position = int(input())
        return position
    def resolve_game(self, result):
        pass
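
HumanPlayer.play trusts whatever is typed, so a non-number or an occupied square will break the game. One possible way to guard against that is sketched below. The function `ask_position` and its injectable `read` parameter are assumptions made for this example and are not part of the notebook.

```python
def ask_position(board_state, read=input):
    """Keep asking until the user names an empty square (0-8)."""
    flat = [cell for row in board_state for cell in row]
    while True:
        raw = read("Enter a position (0-8): ").strip()
        if raw.isdecimal() and int(raw) < 9 and flat[int(raw)] == 0:
            return int(raw)
        print("Invalid or occupied position, try again.")

# Scripted answers stand in for keyboard input here
answers = iter(["x", "9", "0", "4"])
board = ((1, 0, 0), (0, 0, 0), (0, 0, 0))
print(ask_position(board, read=lambda prompt: next(answers)))
# 4
```

Inside HumanPlayer.play, the body could then be reduced to `return ask_position(board_state)`.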

In [22]:
play_game(HumanPlayer('I, the player'), testEngineA, to_print = True)

Player 1: I, the player vs. Player 2: testEngineA
Turn 0
000
000
000


Enter a position (0-8):
Turn 1, I, the player
100
000
000


Turn 2, testEngineA
100
002
000


Enter a position (0-8):
Turn 3, I, the player
100
002
100


Turn 4, testEngineA
102
002
100


Enter a position (0-8):
Turn 5, I, the player
102
002
101


Turn 6, testEngineA
102
202
101


Enter a position (0-8):
Turn 7, I, the player
102
202
111


I, the player wins!


Here we create a new untrained MENACE and let it play against a trained engine.

In [23]:
untrainedMENACE = MenaceEngine(name = 'untrainedMENACE')
untrainedMENACE.set_learning(False)

In [24]:
play_game(untrainedMENACE, testEngineA, to_print = True)

Player 1: untrainedMENACE vs. Player 2: testEngineA
Turn 0
000
000
000


Turn 1, untrainedMENACE
000
001
000


Turn 2, testEngineA
000
001
200


Turn 3, untrainedMENACE
001
001
200


Turn 4, testEngineA
001
001
202


Turn 5, untrainedMENACE
001
101
202


Turn 6, testEngineA
001
121
202


Turn 7, untrainedMENACE
101
121
202


Turn 8, testEngineA
101
121
222


testEngineA wins!


We then calculate what percentage of games the untrained MENACE wins. It should be very low.

In [25]:
untrainedwins_and_games = [0,0,0]
for i in range(100):
    result = play_game(untrainedMENACE, testEngineA, to_print = False, to_return_winner=True)
    untrainedwins_and_games[2] += 1
    if result == 'untrainedMENACE':
        untrainedwins_and_games[0] += 1
    if result == 'Draw':
        untrainedwins_and_games[1] += 1
print(f"The untrained engine wins {(untrainedwins_and_games[0] / untrainedwins_and_games[2] * 100):.0f}% \
of games and {(untrainedwins_and_games[1] / untrainedwins_and_games[2] * 100):.0f}% end in a draw.") 

The untrained engine wins 11% of games and 15% end in a draw.


The trained engine wins the remaining 74% of these games, so the MENACE engine actually does work!