# Playing Tic-Tac-Toe with palaestrAI

Tic-Tac-Toe is a small game played on a 3x3 grid. The players take turns placing either an “X” or an “O” in one of the fields. The first player to place three symbols in a row (horizontally, vertically, or diagonally) wins. For example, in the following pictures, the “X” player wins:

     X | O | X      X |   | O      O | X | 
    ---+---+---    ---+---+---    ---+---+---
     X |   | O        | X | O      X | X | X
    ---+---+---    ---+---+---    ---+---+---
     X | O | O        |   | X      O | O |

Tic-Tac-Toe can easily be played to a draw. Or, as the supercomputer *WOPR* in *WarGames* puts it: “A strange game. The only winning move is not to play.”

Regardless of this, we will play Tic-Tac-Toe with palaestrAI and hARL: Let's see whether our agent learns something useful! In this tutorial, we will put together three packages:

1. *palaestrAI* itself, our runtime
2. *palaestrai-environments*, which provides the `TicTacToeEnvironment` for us
3. *hARL*, which contains the Deep Q-Learning agent that we will use as the player.
    
This tutorial offers a glimpse into full-stack experimentation, showing typical tasks that a researcher will perform. The only thing we leave out for now is a dedicated design of experiments with arsenAI; we will save that for another tutorial. Thus, our agenda is as follows:

1. We will formulate an *objective* for our learning agents.
2. We will then create an experiment run file that ties everything together.
3. Afterwards, we feed everything to palaestrAI and let it execute the experiment run.
4. Finally, we run some analyses on our data.

## Objective

Our first task is to provide an objective for the agent. An objective is a class that implements a method with the signature: 

    internal_reward(self, memory: palaestrai.agent.Memory) -> float
    
I.e., it gets a reference to the agent's brain's memory and returns a single, floating-point number. Of course, anything can implement this, but it is easy to simply subclass `palaestrai.agent.Objective` and follow this class's API documentation.
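
To make the "anything can implement this" remark concrete, here is a minimal sketch of structural typing in plain Python. The `SupportsInternalReward` protocol and the `ConstantObjective` class are hypothetical names invented for this example and are not part of palaestrAI; whether palaestrAI performs additional checks on objectives is not covered here, which is one more reason to prefer subclassing `palaestrai.agent.Objective` in practice.

```python
from typing import Any, Protocol, runtime_checkable


@runtime_checkable
class SupportsInternalReward(Protocol):
    """Hypothetical protocol: any object that has an internal_reward method."""

    def internal_reward(self, memory: Any) -> float: ...


class ConstantObjective:
    """Toy objective that ignores the memory and always returns 1.0."""

    def internal_reward(self, memory: Any) -> float:
        return 1.0


objective = ConstantObjective()
# The class does not inherit from the protocol, yet it satisfies it.
print(isinstance(objective, SupportsInternalReward))  # True
print(objective.internal_reward(memory=None))  # 1.0
```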

Environments deliver rewards, which are stored in the brain's memory, accessible through the `memory.rewards` property. In this case, the environment's reward is already useful enough for us. The environment defines it as follows:

  * If the player wins, the reward is +10.
  * If the opponent (a bot provided by the environment) wins, the reward is -10.
  * On a draw, the reward is 0.
  * On an invalid move, the environment emits a reward of -1000.
  * Otherwise, a reward of 1 is emitted.
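
The reward scheme listed above can be written down as a small lookup table. This is a sketch for illustration only: the `Outcome` enum, the `REWARDS` mapping, and the `reward_for` function are hypothetical names and do not reflect the environment's actual code; only the numeric values are taken from the list.

```python
from enum import Enum, auto


class Outcome(Enum):
    """Hypothetical labels for what can happen after a move."""

    PLAYER_WINS = auto()
    OPPONENT_WINS = auto()
    DRAW = auto()
    INVALID_MOVE = auto()
    GAME_CONTINUES = auto()


# Reward values as listed in the tutorial text.
REWARDS = {
    Outcome.PLAYER_WINS: 10.0,
    Outcome.OPPONENT_WINS: -10.0,
    Outcome.DRAW: 0.0,
    Outcome.INVALID_MOVE: -1000.0,
    Outcome.GAME_CONTINUES: 1.0,
}


def reward_for(outcome: Outcome) -> float:
    """Look up the reward for a single step outcome."""
    return REWARDS[outcome]


# A short, made-up episode: two regular moves, then a win.
episode = [Outcome.GAME_CONTINUES, Outcome.GAME_CONTINUES, Outcome.PLAYER_WINS]
print(sum(reward_for(o) for o in episode))  # 12.0
```

The large penalty for invalid moves dominates every other value, so a single invalid move outweighs anything an agent could gain in a regular game.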
  
This is sufficient for our purposes, so we can pass the values through as-is. The name of the reward is `Tic-Tac-Toe-Reward`. Since the brain's memory offers a simple interface based on pandas DataFrames, we can just retrieve the last row with `memory.tail()`, access the rewards, and retrieve the value for `Tic-Tac-Toe-Reward`.

In [None]:
import palaestrai.agent

In [None]:
class TicTacToeObjective(palaestrai.agent.Objective):
    def internal_reward(self, memory: palaestrai.agent.Memory) -> float:
        return float(memory.tail().rewards["Tic-Tac-Toe-Reward"].iloc[0])

## Experiment Run File

We already know that an experiment run file pieces together agent, environment, and objective, plus some hyperparameter configuration.

In [None]:
import palaestrai

If the above line works without error, at least the import succeeded. That's a good start. Now let's import the rest…

In [None]:
import os
import pprint
import tempfile
from pathlib import Path

## Setting the Runtime Configuration

Usually, this step is not necessary. But because we're running a self-contained test, we want a fresh database in a location we control, namely in a temporary directory.

In [None]:
store_dir = tempfile.TemporaryDirectory()
store_dir
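
As a side note on `tempfile.TemporaryDirectory`: the directory exists only as long as the object is alive and has not been cleaned up, which is why the notebook keeps `store_dir` around. The following standalone sketch demonstrates this with the standard library only; the names `tmp` and `example.db` are chosen for this example.

```python
import tempfile
from pathlib import Path

tmp = tempfile.TemporaryDirectory()
tmp_path = Path(tmp.name)  # .name holds the directory's path as a string
print(tmp_path.is_dir())  # True

# Files created inside are removed together with the directory.
(tmp_path / "example.db").touch()
tmp.cleanup()
print(tmp_path.exists())  # False
```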

We're now going to change palaestrAI's runtime configuration to point to the new directory.

In [None]:
from palaestrai.core import RuntimeConfig

runtime_config = RuntimeConfig()
runtime_config.reset()
runtime_config.load({"store_uri": "sqlite:///%s/palaestrai.db" % store_dir.name})
pprint.pprint(runtime_config.to_dict())
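
The store URI above is built with `%` string formatting. A sketch of the same idea using `pathlib` and an f-string is shown below; it does not touch palaestrAI at all, and the variable names are chosen for this example. On POSIX systems an absolute path starts with `/`, so the resulting URI contains four slashes, which is the form SQLAlchemy expects for an absolute SQLite path.

```python
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as directory:
    db_path = Path(directory) / "palaestrai.db"
    # "sqlite:///" followed by the path to the database file.
    store_uri = f"sqlite:///{db_path}"
    print(store_uri)
```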

## Create the Database

Next, we create the database at the given URI. It will complain that we're not using TimescaleDB, which is okay.

In [None]:
from palaestrai.store.database_util import setup_database
setup_database(runtime_config.store_uri)

In [None]:
assert Path("%s/palaestrai.db" % store_dir.name).is_file()
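
Beyond checking that the file exists, an SQLite database can be inspected with Python's built-in `sqlite3` module. The sketch below creates its own throwaway database so that it runs independently of palaestrAI; the table name `example_table` is made up and says nothing about the schema that `setup_database` creates. The same `sqlite_master` query could be pointed at the `palaestrai.db` file from above.

```python
import sqlite3
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as directory:
    db_file = Path(directory) / "example.db"
    connection = sqlite3.connect(db_file)
    connection.execute("CREATE TABLE example_table (id INTEGER PRIMARY KEY)")
    connection.commit()
    # sqlite_master lists all schema objects stored in the database file.
    query = "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
    tables = [row[0] for row in connection.execute(query)]
    connection.close()  # close before the directory is removed

print(tables)  # ['example_table']
```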

## Run Experiment

In this part, we load our dummy experiment and run it.

In [None]:
experiment_file_path = Path().absolute() / '..' / 'fixtures' / 'tictactoe_run.yml'
assert experiment_file_path.is_file()

In [None]:
rc = palaestrai.execute(str(experiment_file_path))
assert rc[1].value == 4
rc

If you see something like `('Yo-ho, a dummy experiment run for me!', <ExecutorState.EXITED: 4>)` as output of the previous line, then congratulations, everything went well! Now onwards to the final step…
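
The check `rc[1].value == 4` relies on the numeric value of an enum member. As a sketch of how such a result tuple can be unpacked and compared more readably, consider the stand-in below. `ExecutorStateStandIn` is a hypothetical class defined only for this example; the sole detail taken from the output above is that `EXITED` has the value 4.

```python
from enum import Enum


class ExecutorStateStandIn(Enum):
    """Stand-in for illustration; only EXITED = 4 comes from the output above."""

    EXITED = 4


# A result tuple shaped like the one shown in the text.
result = ("Yo-ho, a dummy experiment run for me!", ExecutorStateStandIn.EXITED)
label, state = result

assert state.value == 4  # comparison by numeric value, as in the notebook
assert state is ExecutorStateStandIn.EXITED  # comparison by member
print(label, state.name)
```

Comparing against the enum member instead of the bare number keeps the check readable and would not silently break if the numeric values were ever reordered.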

## Verify Data from the Store

Now that an experiment has been run, there should be something in the store. Note that we don't check whether something *meaningful* is in the store, only that there is *something* in the store.

In [None]:
from sqlalchemy import select

import palaestrai.store
import palaestrai.store.database_model as dbm

In [None]:
dbh = palaestrai.store.Session()

In [None]:
q = select(dbm.Experiment)
str(q)

In [None]:
experiment = dbh.execute(q).first()[dbm.Experiment]
experiment.name

In [None]:
our_experiment_run = experiment.experiment_runs[0]
str(our_experiment_run)

The `document` property of the experiment run object should contain the YAML file, and we should be able to deserialize it. Let's check that.

In [None]:
assert our_experiment_run.document

In [None]:
our_experiment_run._document_json

In [None]:
our_experiment_run.experiment_run_instances[0].uid

… and so on. For a system test, this is enough.