# Lab 3: Policy Search

Write agents able to play Nim, with an arbitrary number of rows and an upper bound on the number of objects that can be removed in a turn (a.k.a. the subtraction game).

- Task3.1: An agent using fixed rules based on nim-sum (i.e., an expert system)
- Task3.2: An agent using evolved rules
- Task3.3: An agent using minimax
- Task3.4: An agent using reinforcement learning

```python
import logging
import random
from copy import deepcopy
```

```python
logging.basicConfig(format="%(message)s", level=logging.INFO)
```

```python
from typing import Optional


class Nim:
    def __init__(self, num_rows: int, k: Optional[int] = None) -> None:
        # Row i starts with 2*i + 1 objects: [1, 3, 5, ...]
        self._rows = [i * 2 + 1 for i in range(num_rows)]
        # Upper bound on the objects removed in a single move (None = no bound)
        self._k = k

    def nimming(self, row: int, num_objects: int) -> None:
        """Remove num_objects objects from the given row."""
        assert self._rows[row] >= num_objects
        assert self._k is None or num_objects <= self._k
        self._rows[row] -= num_objects
```

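```python
def has_lost(nim: Nim) -> bool:
    # An empty board means the player to move has lost:
    # the other player removed the last object
    return sum(nim._rows) == 0
```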

### The hardcoded opponent

The hardcoded opponent removes a random number of sticks (at least one) from a randomly chosen non-empty row.

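```python
def random_move(nim: Nim):
    """Remove a random number of objects from a random non-empty row."""
    non_empty = [i for i, r in enumerate(nim._rows) if r > 0]
    if not non_empty:
        logging.info("No more moves allowed!")
        return None
    index = random.choice(non_empty)
    # Respect the upper bound k on the objects removed in one move, if any
    upper = nim._rows[index] if nim._k is None else min(nim._k, nim._rows[index])
    num = random.randint(1, upper)
    nim.nimming(index, num)
    return (index, num)
```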

### The nim sum

The following description was taken from [here](https://dmf.unicatt.it/~paolini/divulgazione/mateappl/nim/nim.html).

In a game of nim with several heaps, where during your turn you can take as many objects as you want from any one heap, you need to be able to compute the nim sum, which characterizes the configuration of the game.

Here's how to do it:

- Express the number of objects in each nim heap as a binary number, with the only digits being 0 and 1.
- Fill out the smaller binary numbers with '0's on the left, if necessary, so that all the numbers have the same number of digits.
- Sum the binary numbers, but do not carry.
- Replace each digit in the sum with the remainder that results when the digit is divided by 2.
- This yields the nim sum.
- To win at nim, always make a move, when possible, that leaves a configuration with a nim sum of 0. If you cannot do this, your opponent has the advantage, and you have to depend on his or her committing an error in order to win.
    - Note that if the configuration you are given has a nim sum not equal to 0, there always is a move that creates a new configuration with a nim sum of 0. However, there are usually also moves that will yield configurations that give nim sums not equal to 0, and you need to avoid making these.
    - Also note that if you are given a configuration that has a nim sum of 0, there is no move that will create a configuration that also has a nim sum of 0.
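The digit-by-digit procedure above is equivalent to taking the bitwise XOR of all heap sizes: adding binary digits without carry and keeping each digit modulo 2 is exactly what XOR does. The minimal sketch below assumes the rows are a plain list of integers and that there is no bound on the objects removed; `nim_sum_xor`, `next_boards` and `check_claims` are hypothetical helpers, not part of the lab code. It also brute-forces the two notes above on every small board (by default, 3 rows of at most 4 objects each).

```python
from functools import reduce
from itertools import product
from operator import xor
from typing import Iterator, List


def nim_sum_xor(rows: List[int]) -> int:
    """Nim sum as the bitwise XOR of all heap sizes (binary addition without carry)."""
    return reduce(xor, rows, 0)


def next_boards(rows: List[int]) -> Iterator[List[int]]:
    """Every board reachable in one move, with no bound on the objects removed."""
    for i, r in enumerate(rows):
        for take in range(1, r + 1):
            yield rows[:i] + [r - take] + rows[i + 1:]


def check_claims(max_size: int = 4, num_rows: int = 3) -> bool:
    """Brute-force check of the two notes above on every small board."""
    for board in product(range(max_size + 1), repeat=num_rows):
        rows = list(board)
        s = nim_sum_xor(rows)
        reaches_zero = any(nim_sum_xor(b) == 0 for b in next_boards(rows))
        if s != 0 and not reaches_zero:
            return False  # non-zero nim sum but no move to a zero nim sum
        if s == 0 and reaches_zero:
            return False  # zero nim sum and yet a move keeps it at zero
    return True


print(nim_sum_xor([1, 3, 5, 7, 9]))  # 9
print(check_claims())                # True
```

For the initial board `[1, 3, 5, 7, 9]` used in the games below, the nim sum is 9 (binary 1001), so the player who moves first can always reach a zero nim sum and, by keeping to the strategy, always wins.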

```python
# Accessory functions
def to_binary_list(num: int, n: int) -> list:
    """Return the n least significant binary digits of num, least significant first."""
    _list = []
    new_num = num
    for _ in range(n):
        _list.append(new_num % 2)
        new_num //= 2  # integer division, no float rounding
    return _list

def sum_lists(lists):
    """Digit-wise sum (without carry) of equally long lists."""
    new_list = [0] * len(lists[0])
    for _list in lists:
        for i, digit in enumerate(_list):
            new_list[i] += digit
    return new_list

def has_an_odd_element(_list) -> bool:
    """True if at least one element is odd."""
    return any(item % 2 != 0 for item in _list)
```


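```python
def nim_sum(nim: Nim) -> int:
    """Return 1 if the nim sum of the board is non-zero, 0 otherwise."""
    # Number of binary digits needed to represent the largest row
    num_digits = max(nim._rows, default=0).bit_length()
    binary_lists = [to_binary_list(item, num_digits) for item in nim._rows]
    # A digit of the nim sum is 1 iff the corresponding column sum is odd
    if has_an_odd_element(sum_lists(binary_lists)):
        return 1
    return 0
```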

#### The nim sum strategy

Compute the nim sum of the current board:
- if it is not zero, try random moves on a copy of the board until one leaves a zero nim sum, then play that move
- if it is zero, no move can leave a zero nim sum, so perform a random move
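Trying random moves works, but it may take many attempts. A common alternative computes the winning move directly: with `s` the nim sum of the board, any row `r` with `r ^ s < r` (any row that has the highest set bit of `s`) can be reduced to `r ^ s` objects, which makes the new nim sum zero, and such a row exists whenever `s` is not 0. The sketch below assumes unbounded Nim and a plain list of row sizes; `winning_move` is a hypothetical name, not the lab's implementation.

```python
from functools import reduce
from operator import xor
from typing import List, Optional, Tuple


def winning_move(rows: List[int]) -> Optional[Tuple[int, int]]:
    """Return (row, num_objects) leaving a zero nim sum, or None if no such move exists."""
    s = reduce(xor, rows, 0)
    if s == 0:
        return None  # every move leaves a non-zero nim sum
    for i, r in enumerate(rows):
        target = r ^ s  # row size that would make the new nim sum 0
        if target < r:  # true for any row with the highest set bit of s
            return (i, r - target)
    return None  # not reached when s != 0


print(winning_move([1, 3, 5, 7, 9]))  # (4, 9)
print(winning_move([1, 3, 5, 7]))     # None
```

On the initial board of the games below this suggests emptying the last row, which leaves `[1, 3, 5, 7]` with a nim sum of 0.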

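```python
def nim_sum_move(nim: Nim):
    """Leave a zero nim sum if possible, otherwise make a random move."""
    if nim_sum(nim) != 0:
        # In unbounded Nim a move to a zero nim sum always exists,
        # so keep sampling random moves on a copy until one is found
        while True:
            temp_nim = deepcopy(nim)
            move = random_move(temp_nim)
            if nim_sum(temp_nim) == 0:
                nim.nimming(*move)
                return move
    return random_move(nim)
```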
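`nim_sum_move` relies on the plain nim sum, which identifies winning positions only when there is no upper bound `k` (`play_nim` always builds `Nim(n)` without one). With a bound, a move leaving a zero plain nim sum may be illegal: with a single row of 5 objects and `k = 3`, only removing all 5 would do, so the random search in `nim_sum_move` cannot succeed. For the bounded (subtraction-game) variant, the standard Sprague-Grundy result gives a row of `n` objects the value `n mod (k + 1)`, and a position is lost for the player to move exactly when the XOR of these values is 0. The sketch below assumes plain lists of row sizes; `grundy`, `bounded_nim_sum` and `bounded_winning_move` are hypothetical names, not part of the lab code.

```python
from functools import reduce
from operator import xor
from typing import List, Optional, Tuple


def grundy(n: int, k: Optional[int]) -> int:
    """Grundy value of one row of n objects when at most k can be removed per move."""
    # Unbounded Nim: the value is n itself; bounded: n mod (k + 1)
    return n if k is None else n % (k + 1)


def bounded_nim_sum(rows: List[int], k: Optional[int]) -> int:
    """XOR of the per-row Grundy values (the plain nim sum when k is None)."""
    return reduce(xor, (grundy(r, k) for r in rows), 0)


def bounded_winning_move(rows: List[int], k: Optional[int]) -> Optional[Tuple[int, int]]:
    """First legal move (row, num_objects) that leaves a zero Grundy value, if any."""
    for i, r in enumerate(rows):
        max_take = r if k is None else min(k, r)
        for take in range(1, max_take + 1):
            new_rows = rows[:i] + [r - take] + rows[i + 1:]
            if bounded_nim_sum(new_rows, k) == 0:
                return (i, take)
    return None  # no such move: the position is lost against perfect play


print(bounded_nim_sum([5], None), bounded_winning_move([5], None))  # 5 (0, 5)
print(bounded_nim_sum([5], 3), bounded_winning_move([5], 3))        # 1 (0, 1)
```

With `k = 3` the winning move on a single row of 5 is to take one object, leaving 4, a multiple of `k + 1`; the brute-force search over legal moves keeps the sketch simple at the cost of trying up to `k` moves per row.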

## The game

The `play_nim` function takes three parameters:
- `n` is the number of rows on the board
- `opponent_strategy` is the function the opponent uses to make a move
- `who_starts` indicates who makes the first move: an odd number (typically 1) means the user starts, an even number (typically 0) means the opponent starts

The user always plays the nim sum strategy (`nim_sum_move`); the player who removes the last object wins.

```python
def play_nim(n, opponent_strategy, who_starts):
    nim = Nim(n)
    logging.info(f"Initial setting: {nim._rows}")
    step = who_starts
    while not has_lost(nim):
        step += 1
        if step % 2 == 0:  # user's turn (always the nim sum strategy)
            nim_sum_move(nim)
            # logging.info(f"User's turn: {nim._rows} - Nim sum: {nim_sum(nim)}")
        else:  # opponent's turn
            opponent_strategy(nim)
            # logging.info(f"Opponent's turn: {nim._rows} - Nim sum: {nim_sum(nim)}")
    # `step` now identifies the last player to move, who took the last object and wins
    if step % 2 == 0:
        logging.info("The user won!")
    else:
        logging.info("The opponent won!")
```


### First version: opponent's random strategy

```python
logging.info("Game 1: N = 5")
logging.info("I start, the opponent uses the random strategy")
play_nim(5, random_move, 1)

logging.info("Game 2: N = 5")
logging.info("The opponent starts and uses the random strategy")
play_nim(5, random_move, 0)
```

```
Game 1: N = 5
I start, the opponent uses the random strategy
Initial setting: [1, 3, 5, 7, 9]
The user won!
Game 2: N = 5
The opponent starts and uses the random strategy
Initial setting: [1, 3, 5, 7, 9]
The user won!
```


### Second version: Everyone is using the nim sum strategy

```python
logging.info("Game 3: N = 5")
logging.info("I start, the opponent uses the nim sum strategy")
play_nim(5, nim_sum_move, 1)

logging.info("Game 4: N = 5")
logging.info("The opponent starts and uses the nim sum strategy")
play_nim(5, nim_sum_move, 0)
```

```
Game 3: N = 5
I start, the opponent uses the nim sum strategy
Initial setting: [1, 3, 5, 7, 9]
The user won!
Game 4: N = 5
The opponent starts and uses the nim sum strategy
Initial setting: [1, 3, 5, 7, 9]
The opponent won!
```


### Third version: The opponent is using the nim sum strategy with a 70% probability

```python
def not_so_smart_opponent(nim: Nim):
    # Play the nim sum strategy with probability 0.7, a random move otherwise
    if random.random() < 0.7:
        nim_sum_move(nim)
    else:
        random_move(nim)

logging.info("Game 5: N = 5")
logging.info("I start, the opponent uses the nim sum strategy 70% of the time")
play_nim(5, not_so_smart_opponent, 1)

logging.info("Game 6: N = 5")
logging.info("The opponent starts and uses the nim sum strategy 70% of the time")
play_nim(5, not_so_smart_opponent, 0)
```

```
Game 5: N = 5
I start, the opponent uses the nim sum strategy 70% of the time
Initial setting: [1, 3, 5, 7, 9]
The user won!
Game 6: N = 5
The opponent starts and uses the nim sum strategy 70% of the time
Initial setting: [1, 3, 5, 7, 9]
The user won!
```
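Because the random and 70% opponents are stochastic, a single game says little about how strategies compare. A minimal sketch for estimating win rates over many games follows; it assumes plain lists instead of the `Nim` class, the same starting rows `[1, 3, 5, ...]`, no bound on removals, and a `play` function that returns the winner (unlike `play_nim`, which only logs it). All names here are hypothetical, and the user's strategy uses the direct XOR computation rather than the random search of `nim_sum_move`.

```python
import random
from functools import reduce
from operator import xor
from typing import Callable, List

Strategy = Callable[[List[int]], None]


def random_strategy(rows: List[int]) -> None:
    """Remove a random number of objects from a random non-empty row."""
    i = random.choice([i for i, r in enumerate(rows) if r > 0])
    rows[i] -= random.randint(1, rows[i])


def nim_sum_strategy(rows: List[int]) -> None:
    """Leave a zero nim sum when possible, otherwise play randomly."""
    s = reduce(xor, rows, 0)
    for i, r in enumerate(rows):
        if s != 0 and r ^ s < r:
            rows[i] = r ^ s
            return
    random_strategy(rows)


def not_so_smart(rows: List[int]) -> None:
    """Nim sum strategy with probability 0.7, random move otherwise."""
    (nim_sum_strategy if random.random() < 0.7 else random_strategy)(rows)


def play(n: int, user: Strategy, opponent: Strategy, user_starts: bool) -> bool:
    """Play one game; return True if the user takes the last object."""
    rows = [2 * i + 1 for i in range(n)]
    user_to_move = user_starts
    while sum(rows) > 0:
        (user if user_to_move else opponent)(rows)
        user_to_move = not user_to_move
    return not user_to_move  # the player who just moved won


def win_rate(n: int, opponent: Strategy, user_starts: bool, games: int = 1000) -> float:
    """Fraction of games won by a user playing the nim sum strategy."""
    wins = sum(play(n, nim_sum_strategy, opponent, user_starts) for _ in range(games))
    return wins / games


random.seed(0)
for name, opp in [("random", random_strategy), ("nim sum", nim_sum_strategy),
                  ("70% nim sum", not_so_smart)]:
    for starts in (True, False):
        print(f"{name:12} user_starts={starts}: {win_rate(5, opp, starts):.2f}")
```

Whenever the user starts, the rate is 1.00 against every opponent, since the initial nim sum is not zero; against the pure nim sum opponent that starts it is 0.00. Only the remaining cases depend on luck, which is why Games 2 and 6 could have gone either way.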
