# Lab 3: Policy Search

Write agents able to play Nim, with an arbitrary number of rows and an upper bound on the number of objects that can be removed in a turn (a.k.a., subtraction game).

- Task3.1: An agent using fixed rules based on nim-sum (i.e., an expert system)
- Task3.2: An agent using evolved rules
- Task3.3: An agent using minmax
- Task3.4: An agent using reinforcement learning

In [13]:
import logging
from collections import namedtuple
import random
from copy import deepcopy
from itertools import accumulate
from operator import xor

In [14]:
logging.basicConfig(format="%(message)s", level=logging.INFO)

### The Nim and Nimply classes

In [15]:
Nimply = namedtuple("Nimply", "row, num_objects")

In [16]:
class Nim:
    def __init__(self, num_rows: int, k: int = None) -> None:
        self._rows = [i * 2 + 1 for i in range(num_rows)]
        self._k = k

    def __bool__(self):
        return sum(self._rows) > 0

    def __str__(self):
        return "<" + " ".join(str(_) for _ in self._rows) + ">"

    @property
    def rows(self) -> tuple:
        return tuple(self._rows)

    @property
    def k(self) -> int:
        return self._k

    def nimming(self, ply: Nimply) -> None:
        row, num_objects = ply
        assert self._rows[row] >= num_objects
        assert self._k is None or num_objects <= self._k
        self._rows[row] -= num_objects

In [17]:
def possible_moves(nim: Nim):
    return [Nimply(r, o) for r, c in enumerate(nim.rows) for o in range(1, c + 1) if nim.k is None or o <= nim.k]

### The hardcoded opponent

The hardcoded opponent picks one move uniformly at random among all legal moves, i.e., it removes a random number of sticks from a non-empty row.

In [18]:
def random_strategy(nim: Nim):
    moves_list = possible_moves(nim)

    if len(moves_list) > 0:
        move = random.choice(moves_list)
        nim.nimming(move)
        return move
    else:
        logging.info("No more moves allowed!")

### The nim sum

The following description was taken from [here](https://dmf.unicatt.it/~paolini/divulgazione/mateappl/nim/nim.html).

In a game of nim that involves nim heaps where you can take as many objects as you want from any one of the heaps during your turn, you need to be able to compute a nim sum, that characterizes the configuration of the game.

Here's how to do it:

- Express the number of objects in each nim heap as a binary number, with the only digits being 0 and 1.
- Fill out the smaller binary numbers with '0's on the left, if necessary, so that all the numbers have the same number of digits.
- Sum the binary numbers, but do not carry.
- Replace each digit in the sum with the remainder that results when the digit is divided by 2.
- This yields the nim sum.
- To win at nim, always make a move, when possible, that leaves a configuration with a nim sum of 0. If you cannot do this, your opponent has the advantage, and you have to depend on his or her committing an error in order to win.
    - Note that if the configuration you are given has a nim sum not equal to 0, there always is a move that creates a new configuration with a nim sum of 0. However, there are usually also moves that will yield configurations that give nim sums not equal to 0, and you need to avoid making these.
    - Also note that if you are given a configuration that has a nim sum of 0, there is no move that will create a configuration that also has a nim sum of 0.
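
As an illustration of the steps above, here is a minimal sketch that computes the nim sum column by column and checks it against a plain bitwise XOR. The helper name `nim_sum_by_digits` is hypothetical and not part of the lab code; the heaps are the starting rows of a 5-row board. It assumes a non-empty list of heaps.

```python
from functools import reduce
from operator import xor


def nim_sum_by_digits(heaps):
    """Nim sum computed digit by digit (hypothetical helper, for illustration)."""
    # At least one digit, even when every heap is empty.
    width = max(h.bit_length() for h in heaps) or 1
    # Pad every heap with leading zeros to the same number of binary digits.
    padded = [format(h, f"0{width}b") for h in heaps]
    # Add each column without carrying, then keep the remainder modulo 2.
    digits = [sum(int(row[i]) for row in padded) % 2 for i in range(width)]
    return int("".join(str(d) for d in digits), 2)


heaps = [1, 3, 5, 7, 9]  # starting rows of a 5-row board
assert nim_sum_by_digits(heaps) == reduce(xor, heaps)
print(nim_sum_by_digits(heaps))  # prints 9
```

The columns of `0001`, `0011`, `0101`, `0111`, `1001` sum to `1`, `2`, `2`, `5`, which reduce modulo 2 to `1001`, i.e. 9.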

In this implementation, the nim sum is calculated as the bitwise XOR of the row sizes, which gives the same result as the digit-by-digit procedure above.

In [19]:
def nim_sum(state: Nim) -> int:
    *_, result = accumulate(state.rows, xor)
    return result

#### The nim sum strategy

Calculate the nim_sum of the current board:
- if it is not zero, try the possible moves in order and play the first one that leaves a zero nim_sum
- if it is zero (or no such move is found), perform a random move

In [20]:
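def nim_sum_strategy(nim: Nim):
    if nim_sum(nim) != 0:
        # Try each legal move on a copy and play the first one that leaves a zero nim sum.
        for move in possible_moves(nim):
            temp_nim = deepcopy(nim)
            temp_nim.nimming(move)
            if nim_sum(temp_nim) == 0:
                nim.nimming(move)
                return move
    # Nim sum already zero (or no zeroing move found): fall back to a random move.
    return random_strategy(nim)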
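
The search above tries every legal move on a copy of the board. As a possible alternative, the zeroing move can be computed directly: if `s` is the nim sum, any row with `c` objects such that `c ^ s < c` can be reduced to `c ^ s` objects. The sketch below assumes there is no upper bound `k` on the number of objects removed; `winning_move` is a hypothetical name, not part of the lab code.

```python
from functools import reduce
from operator import xor


def winning_move(rows):
    """Return (row, num_objects) leaving a nim sum of 0, or None if there is none.

    Hypothetical helper; assumes no upper bound k on the objects removed.
    """
    total = reduce(xor, rows, 0)
    if total == 0:
        return None  # every move from here leaves a non-zero nim sum
    for row, count in enumerate(rows):
        target = count ^ total  # size this row must shrink to
        if target < count:
            return row, count - target
    return None


print(winning_move((1, 3, 5, 7, 9)))  # prints (4, 9)
```

On the starting 5-row board the nim sum is 9, so removing the whole last row leaves `1 ^ 3 ^ 5 ^ 7 = 0`.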

## The game

The play_nim function takes four parameters:
- `n` indicates the number of rows on the board.
- `first_strategy` indicates the method used by the first agent to make a move.
- `second_strategy` indicates the method used by the second agent to make a move.
- `who_starts` indicates who makes the first move; an even number (typically 0) indicates the first agent and an odd number (typically 1) indicates the second agent.

In [21]:
def play_nim(n, first_strategy, second_strategy, who_starts):
    nim = Nim(n)
    #logging.info(f"Initial setting: {nim._rows} - Nim sum: {nim_sum(nim)}")
    step = 1 - (who_starts % 2)
    while nim:
        step = 1 - step
        if step == 0: # first agent's turn
            first_strategy(nim)
            #logging.info(f"First agent's turn: {nim._rows} - Nim sum: {nim_sum(nim)}")
        else: # second agent's turn
            second_strategy(nim)
            #logging.info(f"Second agent's turn: {nim._rows} - Nim sum: {nim_sum(nim)}")
    if step == 0:
        logging.info("--- The first agent won! ---\n")
    else:
        logging.info("--- The second agent won! ---\n")


### First version: second agent's random strategy

In [22]:
logging.info("Game 1:\n N = 5\n Start: First Agent\n First agent: nim sum strategy\n Second agent: random strategy")
play_nim(5, nim_sum_strategy, random_strategy, 0)

logging.info("Game 2:\n N = 5\n Start: Second Agent\n First agent: nim sum strategy\n Second agent: random strategy")
play_nim(5, nim_sum_strategy, random_strategy, 1)

Game 1:
 N = 5
 Start: First Agent
 First agent: nim sum strategy
 Second agent: random strategy
--- The first agent won! ---

Game 2:
 N = 5
 Start: Second Agent
 First agent: nim sum strategy
 Second agent: random strategy
--- The first agent won! ---



### Second version: Everyone is using the nim sum strategy

In [23]:
logging.info("Game 3:\n N = 5\n Start: First Agent\n First agent: nim sum strategy \n Second agent: nim sum strategy")
play_nim(5, nim_sum_strategy, nim_sum_strategy, 0)

logging.info("Game 4:\n N = 5\n Start: Second Agent\n First agent: nim sum strategy\n Second agent: nim sum strategy")
play_nim(5, nim_sum_strategy, nim_sum_strategy, 1)

Game 3:
 N = 5
 Start: First Agent
 First agent: nim sum strategy 
 Second agent: nim sum strategy
--- The first agent won! ---

Game 4:
 N = 5
 Start: Second Agent
 First agent: nim sum strategy
 Second agent: nim sum strategy
--- The second agent won! ---



### Third version: The second agent is using the nim sum strategy with a 70% probability

This strategy emulates a human player, who can make mistakes.

In [24]:
def not_so_smart_strategy(nim: Nim):
    if random.random() < 0.7:
        nim_sum_strategy(nim)
    else:
        random_strategy(nim)

logging.info("Game 5:\n N = 5\n Start: First Agent\n First agent: nim sum strategy\n Second agent: not so smart strategy")
play_nim(5, nim_sum_strategy, not_so_smart_strategy, 0)

logging.info("Game 6:\n N = 5\n Start: Second Agent\n First agent: nim sum strategy\n Second agent: not so smart strategy")
play_nim(5, nim_sum_strategy, not_so_smart_strategy, 1)

Game 5:
 N = 5
 Start: First Agent
 First agent: nim sum strategy
 Second agent: not so smart strategy
--- The first agent won! ---

Game 6:
 N = 5
 Start: Second Agent
 First agent: nim sum strategy
 Second agent: not so smart strategy
--- The first agent won! ---
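
A single game against a randomized opponent says little about how often a strategy wins. The following self-contained sketch estimates win rates over many games. It does not reuse the `Nim` class: the board is a plain list, there is no bound `k`, and all function names (`random_move`, `optimal_move`, `noisy_move`, `play`, `win_rate`) are hypothetical stand-ins for the strategies defined above.

```python
import random
from functools import reduce
from operator import xor


def random_move(rows):
    # Uniform choice among all legal (row, num_objects) moves.
    moves = [(r, o) for r, c in enumerate(rows) for o in range(1, c + 1)]
    return random.choice(moves)


def optimal_move(rows):
    total = reduce(xor, rows, 0)
    for row, count in enumerate(rows):
        if count ^ total < count:
            return row, count - (count ^ total)
    return random_move(rows)  # nim sum is already 0: no zeroing move exists


def noisy_move(rows, p_optimal=0.7):
    # Plays the nim-sum move with probability p_optimal, otherwise a random move.
    return optimal_move(rows) if random.random() < p_optimal else random_move(rows)


def play(first, second, who_starts=0, num_rows=5):
    """Play one game and return 0 if the first agent wins, 1 otherwise."""
    rows = [i * 2 + 1 for i in range(num_rows)]
    players = (first, second)
    turn = who_starts % 2
    while True:
        row, num = players[turn](rows)
        rows[row] -= num
        if sum(rows) == 0:
            return turn  # the player who takes the last object wins
        turn = 1 - turn


def win_rate(first, second, who_starts, games=1000):
    """Fraction of games won by the first agent."""
    wins = sum(play(first, second, who_starts) == 0 for _ in range(games))
    return wins / games


print("first starts :", win_rate(optimal_move, noisy_move, 0))
print("second starts:", win_rate(optimal_move, noisy_move, 1))
```

Because the starting 5-row board has a non-zero nim sum, an agent that always plays the nim-sum move and moves first cannot lose; when it moves second, its win rate depends on how often the opponent slips.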

