## Framing the Problem

This explores the game of Orlog from Assassin's Creed: Valhalla, in search of an optimal policy for winning games.

In the very short description, we can probably approach this as a state table with a Q-learning approach. We treat the game as an MDP since prior turns in the game do not change what information we gain about the game's progress when we know the current turn. Each game (multiple turns) is a single episode. Games are short (there are few state changes) so we can update the policy at the end of each episode - needn't look into temporal difference approaches.

To make the development on the game and algorithm fast, we can simplify the game down to the core pieces:

1. Player flips a coin to determine who goes first.
1. Player rolls a single dice up to three times and chooses which roll result they want to keep.
1. Opponent does the same.
1. Resolution occurs.
1. If there is health on both sides of the game, repeat previous three steps. Else, player with health remaining is the winner.
1. Policy is updated based on which moves resulted in a win and which resulted in a loss.
1. Repeat until a policy is developed that consistently wins games better than random strategy.

TODO:

* Create game representation.
* Create random strategy.
* Pull metrics on win/loss rate for random strategy. This serves as the baseline.
* Review algorithms available to us and pass the game representation into one.

In [1]:
import logging
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import random
import seaborn as sns
from enum import Enum, auto

import torch
import torch.nn as nn

In [2]:
logging.basicConfig()
logging.getLogger().setLevel(logging.DEBUG)
logging.debug(f'DEBUG logging enabled!')

DEBUG:root:DEBUG logging enabled!


In [3]:
def flip_coin():
    """
    1 is heads, 0 is tails.
    """
    return random.randint(0,1)

In [4]:
class DieFacing(Enum):
    SWORD = 1
    SHIELD = 2
    ARROW = 3
    HELMET = 4
    STEAL = 5
    FAVOUR = 6

In [5]:
def resolution(game_state):
    # TODO impl
    pass

In [6]:
def roll_dice_and_choose(
    strategy_callable, 
    die_per_roll: int = 1
):
    """
    Use a strategy_callable to roll die
    
    @param die_to_roll: How many die to toss each roll
    """
    turns = 3
    die_kept = []
    # TODO impl

In [21]:
def random_dice_keeping_strategy(roll_results: list):
    """
    Randomly decide which rolls to keep.
    """
    keep_rolls = []
    for result in roll_results:
        if random.randint(0,1) > 0:
            keep_rolls.append(result)
    return keep_rolls


def steadfast_dice_keeping_strategy(roll_results: list):
    """Always keep all rolls."""
    return roll_results

In [46]:
class OrlogGame(object):
    def __init__(
        self, 
        player_strategy = steadfast_dice_keeping_strategy, 
        opponent_strategy = steadfast_dice_keeping_strategy
    ):
        self.current_turn = 0
        self.die_per_turn = 1
        self.max_rolls = 3
        self.max_turns = 100  # TODO Just a failsafe. Implement.
        self.player_strategy = player_strategy
        self.opponent_strategy = opponent_strategy
    
    def describe_game_state() -> dict:
        # TODO Impl
        return
    
    def roll_dice(self, die: int):
        die_results = []
        for dice in range(die):
            result = DieFacing(random.randint(1,len(DieFacing)))
            logging.debug(f'Roll result: {result}')
            die_results.append(result)
        return die_results
    
    def roll_and_select(self, die: int, strategy):
        roll_results = self.roll_dice(die)
        kept_results = strategy(roll_results)
        return kept_results, roll_results
    
    def run_turn(self, strategy):
        kept_rolls = []
        for i in range(self.max_rolls):
            remaining_die = self.die_per_turn - len(kept_rolls)
            logging.debug(f'Dice roll {i+1}/{self.max_rolls}. Remaining die: {remaining_die}')
            current_kept_rolls, all_rolls = self.roll_and_select(die=remaining_die, strategy=self.player_strategy)
            logging.debug(f'all_rolls: {all_rolls}')
            logging.debug(f'current_kept_rolls: {current_kept_rolls}')
            # If kept is equal to die count, no more to roll. Else, add to kept and reroll with fewer.
            if self.die_per_turn == kept_rolls:
                logging.debug(f'Player has kept all their rolls. Continuing turn.')
                break
            elif i >= self.max_rolls - 1:
                logging.debug(f'We are now on the {i+1}th roll, keeping remaining die.')
                kept_rolls += all_rolls
            else:
                kept_rolls += current_kept_rolls
            logging.debug(f'Kept rolls now up to: {kept_rolls}')
    
    def run(self):
        # Step: decide first player.
        # Step: first player die roll 1
        # Step: first player stick or roll2
        # Step: first player stick or roll3
        # Step: first player sticks
        # Step: second player die roll 1
        # Step: second player stick or roll2
        # Step: second player stick or roll3
        # Step: second player sticks
        # Step: resolution
        coinflip = flip_coin()
        logging.debug(f'Coinflip: {coinflip}')
        if coinflip:
            logging.debug('Player goes first.')
            player_turn = self.run_turn(strategy=self.player_strategy)
            logging.debug(f'Now opponent takes their turn.\n')
            opponent_turn = self.run_turn(strategy=self.opponent_strategy)
            # TODO Resolution, passing game state to strategy.
        else:
            logging.debug('Opponent goes first.')
            opponent_turn = self.run_turn(strategy=self.opponent_strategy)
            logging.debug(f'Now player takes their turn.\n')
            player_turn = self.run_turn(strategy=self.player_strategy)

In [47]:
# Test the game.
game = OrlogGame()

In [48]:
game.run()

DEBUG:root:Coinflip: 1
DEBUG:root:Player goes first.
DEBUG:root:Dice roll 1/3. Remaining die: 1
DEBUG:root:Roll result: DieFacing.SWORD
DEBUG:root:all_rolls: [<DieFacing.SWORD: 1>]
DEBUG:root:current_kept_rolls: [<DieFacing.SWORD: 1>]
DEBUG:root:Kept rolls now up to: [<DieFacing.SWORD: 1>]
DEBUG:root:Dice roll 2/3. Remaining die: 0
DEBUG:root:all_rolls: []
DEBUG:root:current_kept_rolls: []
DEBUG:root:Kept rolls now up to: [<DieFacing.SWORD: 1>]
DEBUG:root:Dice roll 3/3. Remaining die: 0
DEBUG:root:all_rolls: []
DEBUG:root:current_kept_rolls: []
DEBUG:root:We are now on the 3th roll, keeping remaining die.
DEBUG:root:Kept rolls now up to: [<DieFacing.SWORD: 1>]
DEBUG:root:Now opponent takes their turn.


DEBUG:root:Dice roll 1/3. Remaining die: 1
DEBUG:root:Roll result: DieFacing.ARROW
DEBUG:root:all_rolls: [<DieFacing.ARROW: 3>]
DEBUG:root:current_kept_rolls: [<DieFacing.ARROW: 3>]
DEBUG:root:Kept rolls now up to: [<DieFacing.ARROW: 3>]
DEBUG:root:Dice roll 2/3. Remaining die: 0
DEBUG:

In [None]:
plt.clf()
plt.hist([ for x in range(1_000)], bins=100)
plt.show()