# Analysis Example with the palaestrAI Store

This notebook is both a tutorial and a system test. It runs an example experiment, which contains three run phases as well as two agents. After running the experiment, we will get all data out of the store and plot some nice graphs. Since the agents' brains perform only random actions, there won't be much to focus on content-wise. However, the tutorial shall serve as a pointer on how to use the palaestrAI core infrastructure (and not hARL, etc.).

## Imports

Let's start by importing necessary modules. This will be what we need for palaestrAI, namely the entrypoint, the runtime config, and the database access stuff:

In [None]:
import palaestrai  # Will provide palaestrai.execute
import palaestrai.core  # RuntimeConfig
import palaestrai.store  # store.Session for database connectivity
import palaestrai.store.database_util
import palaestrai.store.query as palq
import palaestrai.store.database_model as paldb

The typical data science analysis tool stack uses *NumPy*, *pandas*, and *matplotlib*, so let's import those, too.

In [None]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

We will need *jsonpickle* to inspect the reward information objects later on. Here, we also need to register the jsonpickle extension for NumPy:

In [None]:
import jsonpickle
import jsonpickle.ext.numpy as jsonpickle_numpy

jsonpickle_numpy.register_handlers()

There are also some of the usual suspects from Python's standard library, which we'll import here without further comment:

In [None]:
import io
import os
import pprint
import tempfile
from pathlib import Path

## Experiment Run Document

Everything palaestrAI does depends on its configuration, or rather, *experiments*. When you do a proper design of experiments (DoE), you first create an *experiment* document, in which you define strategies for sampling your factors. Each sample is an *experiment run*, which will be executed by palaestrAI. We won't do the full DoE dance here, but rather provide an experiment run document directly.
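
To make the sampling idea a bit more tangible, here is a minimal full-factorial sketch in plain Python, where every combination of factor levels becomes one run. The factor names and levels, the `full_factorial` helper, and the generated uids are hypothetical; full-factorial is only one of many sampling strategies, and this is not how palaestrAI itself generates experiment runs.

```python
import itertools

# Hypothetical factors and their levels (not taken from palaestrAI).
factors = {
    "seed": [47, 48],
    "episodes": [2, 5],
}


def full_factorial(factors):
    """Yield one dict per combination of factor levels (one run each)."""
    names = list(factors)
    for levels in itertools.product(*(factors[n] for n in names)):
        yield dict(zip(names, levels))


# Every sample gets its own unique name, just like an experiment run.
runs = [
    {"uid": f"Tutorial Experiment Run {i}", **sample}
    for i, sample in enumerate(full_factorial(factors))
]
for run in runs:
    print(run)
```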

Experiments and experiment runs have **unique names** (`uid`). If they're not given, they are auto-generated, but usually the user wants to set them in order to find them in the store later on. Choosing a good name might seem hard (it isn't, any string will do); being forced to choose a *unique* name might seem an unnecessary constraint. However, it isn't: Each experiment run must be repeatable, i.e., always have the same result, no matter how often it is run. A change in an experiment run definition can yield different results. Therefore, each experiment run is unique, and so should be its name. We will define the experiment run name as a separate variable so that we don't have to remember it later on when we query the store:

In [None]:
experiment_run_name = "Tutorial Experiment Run"

Experiment (run) documents also have a **version**. It serves as a discriminator to catch semantic changes in the document. It is an additional safeguard: a mismatch emits a log message, but does not stop execution.

For this tutorial, we set the document's version to palaestrAI's version. That is okay here since we need to keep this document up-to-date in any case. When experiment runs are archived, the version number (and its immutability!) becomes more important.

In [None]:
experiment_run_version = "3.4.1"
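
Assuming the check compares the document's version against the installed framework version, the "warn, but carry on" behavior could look like the sketch below. `check_document_version` is a hypothetical helper; palaestrAI's actual comparison and log message may differ.

```python
import logging

logging.basicConfig(level=logging.INFO)
LOG = logging.getLogger("version_check_sketch")


def check_document_version(document_version: str, framework_version: str) -> bool:
    """Hypothetical helper: log a warning on mismatch, but never abort."""
    if document_version != framework_version:
        LOG.warning(
            "Document version %s differs from framework version %s; "
            "continuing anyway.",
            document_version,
            framework_version,
        )
        return False
    return True


matches = check_document_version("3.4.1", "3.4.1")
mismatch = check_document_version("3.3.0", "3.4.1")  # logs, does not raise
```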

And now to the document itself. Apart from the `uid`, the `version`, and the random seed (`seed`), it provides the configuration of the experiment run. Experiment runs have *phases*, so the most important key here is the experiment `schedule`.

A **schedule** defines the phases of an experiment run. A phase consists of environments, agents, simulation parameters such as the termination condition, as well as general configuration flags. Schedule configurations are cascading: Values defined in a phase also apply to all following phases, unless they are explicitly overwritten.
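
The cascading rule can be pictured as a deep merge of each phase onto the fully resolved previous phase. The following sketch uses a trimmed-down, dictionary version of this tutorial's schedule; `cascade` and `deep_merge` are hypothetical helpers, and details such as replacing lists wholesale (instead of merging them) are assumptions, not palaestrAI's documented behavior.

```python
import copy


def deep_merge(base, overrides):
    """Return a new dict: nested dicts are merged, everything else replaced."""
    merged = copy.deepcopy(base)
    for key, value in overrides.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = deep_merge(merged[key], value)
        else:
            merged[key] = copy.deepcopy(value)
    return merged


def cascade(schedule):
    """Resolve each phase by merging it onto the resolved previous phase."""
    resolved, previous = [], {}
    for entry in schedule:
        name, overrides = next(iter(entry.items()))  # one phase per entry
        current = deep_merge(previous, overrides)
        resolved.append((name, current))
        previous = current
    return resolved


# Trimmed-down version of the tutorial's schedule.
schedule = [
    {"phase_0": {"agents": ["mighty_defender"],
                 "phase_config": {"mode": "train", "worker": 1, "episodes": 5}}},
    {"phase_1": {"agents": ["mighty_defender", "evil_attacker"],
                 "phase_config": {"episodes": 2}}},
    {"phase_2": {"phase_config": {"mode": "test", "episodes": 3}}},
]
for name, config in cascade(schedule):
    print(name, config)
```

Note how `phase_2` only changes `mode` and `episodes`, yet still ends up with both agents and `worker: 1`.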

In our example, we have three phases in our schedule: The first phase trains only one agent, the second trains two in the same environment, and the third serves as a testing stage.

(*Please note* that we're using an f-string here, and hence the YAML dict `{}` becomes `{{}}`.)

In [None]:
experiment_run_document = f"""
uid: "{experiment_run_name}"
seed: 47  # Not quite Star Trek, but...
version: "{experiment_run_version}"
schedule:  # The schedule for this run; it is a list
  - phase_0:
      environments:  # Definition of the environments for this phase
        - environment:
            name: palaestrai.environment.dummy_environment:DummyEnvironment
            uid: denv
            params: {{ }}
      agents:  # Definition of agents for this phase
        - name: mighty_defender
          brain:
            name: palaestrai.agent.dummy_brain:DummyBrain
            params: {{ }}
          muscle:
            name: palaestrai.agent.dummy_muscle:DummyMuscle
            params: {{ }}
          objective:
            name: palaestrai.agent.dummy_objective:DummyObjective
            params: {{"params": 1}}
          sensors: [denv.0, denv.1, denv.2, denv.3, denv.4]
          actuators: [denv.0, denv.1, denv.2, denv.3, denv.4]
      simulation:  # Definition of the simulation controller for this phase
        name: palaestrai.simulation:VanillaSimulationController
        conditions:
        - name: palaestrai.simulation:VanillaSimControllerTerminationCondition
          params: {{ }}
      phase_config:  # Additional config for this phase
        mode: train
        worker: 1
        episodes: 5 
  - phase_1:  # Name of the current phase. Can be any user-chosen name
      agents:  # Definition of agents for this phase
        - name: mighty_defender
          brain:
            name: palaestrai.agent.dummy_brain:DummyBrain
            params: {{ }}
          muscle:
            name: palaestrai.agent.dummy_muscle:DummyMuscle
            params: {{ }}
          objective:
            name: palaestrai.agent.dummy_objective:DummyObjective
            params: {{"params": 1}}
          sensors: [denv.0, denv.1, denv.2, denv.3, denv.4]
          actuators: [denv.0, denv.1, denv.2, denv.3, denv.4]
        - name: evil_attacker
          brain:
            name: palaestrai.agent.dummy_brain:DummyBrain
            params: {{ }}
          muscle:
            name: palaestrai.agent.dummy_muscle:DummyMuscle
            params: {{ }}
          objective:
            name: palaestrai.agent.dummy_objective:DummyObjective
            params: {{"params": 1}}
          sensors: [denv.5, denv.6, denv.7, denv.8, denv.9]
          actuators: [denv.5, denv.6, denv.7, denv.8, denv.9]
      simulation:  # Definition of the simulation controller for this phase
        name: palaestrai.simulation:VanillaSimulationController
        conditions:
        - name: palaestrai.simulation:VanillaSimControllerTerminationCondition
          params: {{ }}
      phase_config:  # Additional config for this phase
        mode: train
        worker: 1
        episodes: 2
  - phase_2:  # Definition of the third phase. Inherits all settings
              # from the previous phase except for those keys that are
              # redefined here.
      phase_config:  
        mode: test
        episodes: 3
run_config:  # Not a runTIME config
  condition:       
    name: palaestrai.experiment:VanillaRunGovernorTerminationCondition
    params: {{ }}
"""

## Runtime Config

With the experiment run neatly defined, there is something else that defines how palaestrAI behaves: Its runtime config. It has nothing to do with an experiment run, but defines the behavior of palaestrAI on a certain machine. This includes log levels or the URI defining how to connect to the database. Usually, one does not touch it once the framework is installed.

In this case, since we're doing a little tutorial *and* a system test case, we provide some sane defaults that are only relevant for the scope of this notebook. For example, we'll resort to using SQLite in a temporary directory instead of PostgreSQL + TimescaleDB (speed is not of importance here), and we set the log level to `DEBUG` for the store.

In [None]:
store_dir = tempfile.TemporaryDirectory()
store_dir

In [None]:
runtime_config = palaestrai.core.RuntimeConfig()
runtime_config.reset()
runtime_config.load(
    {
        "store_uri": "sqlite:///%s/palaestrai.db" % store_dir.name,
        "executor_bus_port": 4747,
        "logger_port": 4748,
    }
)
runtime_config.logging["loggers"]["palaestrai.store"]["level"] = "DEBUG"
pprint.pprint(runtime_config.to_dict())

The nice thing about the `RuntimeConfig` is that it is a singleton available everywhere in the framework. So whatever we set here applies throughout the run.
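
For readers unfamiliar with the pattern: a singleton hands out the same object no matter where it is instantiated. `ConfigSingleton` below is a hypothetical toy class that illustrates the pattern; it is not palaestrAI's `RuntimeConfig` implementation.

```python
class ConfigSingleton:
    """Hypothetical stand-in illustrating the singleton pattern."""

    _instance = None

    def __new__(cls):
        # Create the instance on first use; return the same object afterwards.
        if cls._instance is None:
            cls._instance = super().__new__(cls)
            cls._instance.values = {}
        return cls._instance

    def load(self, values):
        self.values.update(values)


a = ConfigSingleton()
a.load({"store_uri": "sqlite:///tmp/example.db"})
b = ConfigSingleton()  # e.g., instantiated somewhere else in the code base
print(b.values["store_uri"])
```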

## Database Initialization

Since we've opted to start fresh with a new SQLite database in a temporary directory, we will have to create and initialize it. Usually, one does this once (e.g., from the CLI with `palaestrai database-create`) and is then done with it, but in this case we do it every time we run the notebook—it is a one-shot tutorial, after all. :-)

Luckily, palaestrAI has just the function we need to do it for us:

In [None]:
palaestrai.store.database_util.setup_database(runtime_config.store_uri)

You will see a warning regarding the TimescaleDB extension. That is okay and just a warning. Since we're not running a big, sophisticated experiment, we can live with a bit of a performance penalty.

## Experiment Run Execution

Next up: Actually executing the experiment run! It just consists of one line: A call to `palaestrai.execute()`. This function accepts three types of arguments:

1. An `ExperimentRun` object. Useful if one has already loaded it (e.g., de-serialized it).
2. A `str`. `palaestrai.execute()` interprets this as a path to a file; this is one of the most common use cases.
3. A `TextIO` object: Any stream that delivers text. Useful when the experiment run document is not yet deserialized, and exactly what we need.
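
The three cases boil down to dispatching on the argument's type. The sketch below shows that idea with a hypothetical `load_experiment_run` function and an `ExperimentRunStub` placeholder class; it neither parses YAML nor reflects how `palaestrai.execute()` is actually implemented.

```python
import io
from pathlib import Path
from typing import TextIO, Union


class ExperimentRunStub:
    """Hypothetical placeholder for an already-loaded experiment run."""

    def __init__(self, text: str):
        self.text = text


def load_experiment_run(
    source: Union[ExperimentRunStub, str, TextIO]
) -> ExperimentRunStub:
    # 1. Already an object: use it as-is.
    if isinstance(source, ExperimentRunStub):
        return source
    # 2. A plain string is treated as a path to a file.
    if isinstance(source, str):
        return ExperimentRunStub(Path(source).read_text())
    # 3. Anything else is assumed to be a readable text stream.
    return ExperimentRunStub(source.read())


run = load_experiment_run(io.StringIO('uid: "Tutorial Experiment Run"\n'))
print(run.text)
```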

To turn a `str` into a `TextIO`, we simply wrap it into a `StringIO` object. Make it so!

In [None]:
rc = palaestrai.execute(io.StringIO(experiment_run_document))

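The execution should yield no errors (and no warnings, either). As a quick check, we assert that the second element of the value returned by `execute()` reports the `EXITED` state: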

In [None]:
assert rc[1].name == "EXITED"

## Querying the Store

Let's first open a session to the database:

In [None]:
dbh = palaestrai.store.Session()

palaestrAI does not wrap database access in a layer of its own; it provides object-relational mapper (ORM) bindings based on SQLAlchemy. This means that we can use all the nice magic SQLAlchemy gives us. So let's first import it:

In [None]:
import sqlalchemy as sa

Do you remember the name of our experiment run? We can now use it to look it up. To do so, we first create a query using `sqlalchemy.select`, which we then execute.

In [None]:
q = sa.select(paldb.ExperimentRun).where(
    paldb.ExperimentRun.uid == experiment_run_name
)
str(q)

palaestrAI ensures through the `uid` that each experiment run is stored only once in the database. `one()` not only retrieves a single row from the query result, it also raises an exception if the result set contains no rows or more than one. Thus:

In [None]:
result = dbh.execute(q).one()
experiment_run_record = result[paldb.ExperimentRun]
experiment_run_record.id, experiment_run_record.uid

…yes, that's us. 
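
`one()` simply insists on a result set of exactly one row. The stand-alone sketch below mimics that contract with Python's built-in `sqlite3` module; the helper `fetch_exactly_one` and the toy `runs` table are hypothetical, and SQLAlchemy itself raises `NoResultFound` or `MultipleResultsFound` (from `sqlalchemy.exc`) rather than the `LookupError` used here.

```python
import sqlite3


def fetch_exactly_one(cursor):
    """Mimic the 'exactly one row' contract of SQLAlchemy's one()."""
    rows = cursor.fetchmany(2)  # two rows suffice to detect "more than one"
    if not rows:
        raise LookupError("query returned no rows")
    if len(rows) > 1:
        raise LookupError("query returned more than one row")
    return rows[0]


con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE runs (id INTEGER PRIMARY KEY, uid TEXT UNIQUE)")
con.execute("INSERT INTO runs (uid) VALUES (?)", ("Tutorial Experiment Run",))
row = fetch_exactly_one(
    con.execute(
        "SELECT id, uid FROM runs WHERE uid = ?", ("Tutorial Experiment Run",)
    )
)
print(row)  # (1, 'Tutorial Experiment Run')
```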

No matter how often an experiment run is executed, there will be only one entry for the same UID in the table. However, every execution creates a new experiment run instance. Since we have executed the run only once so far, we will also see only one experiment run instance.
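
This is a classic one-to-many relationship. The toy schema below, built with Python's `sqlite3` module, illustrates it: executing the same run twice yields one run row and two instance rows. Table and column names as well as the `record_execution` helper are assumptions for illustration, not palaestrAI's actual database model.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript(
    """
    CREATE TABLE experiment_runs (
        id  INTEGER PRIMARY KEY,
        uid TEXT NOT NULL UNIQUE          -- one row per experiment run
    );
    CREATE TABLE experiment_run_instances (
        id                INTEGER PRIMARY KEY,
        experiment_run_id INTEGER NOT NULL REFERENCES experiment_runs(id)
    );
    """
)


def record_execution(con, uid):
    """Store the run once (keyed by uid) and add one instance per execution."""
    con.execute("INSERT OR IGNORE INTO experiment_runs (uid) VALUES (?)", (uid,))
    (run_id,) = con.execute(
        "SELECT id FROM experiment_runs WHERE uid = ?", (uid,)
    ).fetchone()
    con.execute(
        "INSERT INTO experiment_run_instances (experiment_run_id) VALUES (?)",
        (run_id,),
    )


for _ in range(2):  # execute the same experiment run twice
    record_execution(con, "Tutorial Experiment Run")

runs = con.execute("SELECT COUNT(*) FROM experiment_runs").fetchone()[0]
instances = con.execute(
    "SELECT COUNT(*) FROM experiment_run_instances"
).fetchone()[0]
print(runs, instances)  # 1 2
```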

Through the SQLAlchemy ORM, we can access the experiment run instances directly:

In [None]:
experiment_run_record.experiment_run_instances

If we executed the run again, we would see two entries in the list here:

In [None]:
rc = palaestrai.execute(io.StringIO(experiment_run_document))
assert rc[1].name == "EXITED"

In [None]:
dbh.refresh(experiment_run_record)
experiment_run_record.experiment_run_instances

In [None]:
assert len(experiment_run_record.experiment_run_instances) > 1

Now let's focus on the run phases. Each instance has three of them, just as defined in our experiment run document, so let's find them in the database.

Thanks to palaestrAI's query API, this is very simple:

In [None]:
palq.experiments_and_runs_configurations(dbh)

In [None]:
assert (
    len(
        experiment_run_record.experiment_run_instances[0].experiment_run_phases
    )
    == 3
)

Next up: Who participated in each run phase? We can define participants for each run phase separately. In our experiment run document, we decided that first one agent may train on its own, then two agents train together, and finally there is a test phase for both. So that is what we want to see now.

However, simply exploring the ORM does not make for a nice display in a Jupyter notebook. Thankfully, SQLAlchemy and pandas interface nicely: We can construct a query in SQLAlchemy with our ORM and then hand it over to pandas to construct a dataframe out of it:

In [None]:
pd.read_sql(
    sa.select(paldb.Agent).where(
        paldb.Agent.experiment_run_phase_id.in_(
            phase.id
            for phase in experiment_run_record.experiment_run_instances[
                0
            ].experiment_run_phases
        )
    ),
    dbh.bind,
)

Okay, now that we have explored many things, let's find out how good our agents were! Let's start by looking at how well the first agent trained when it was alone. Each agent gets a new ID when it enters a new experiment run phase, regardless of whether it's the same agent as before or a new one. (The discriminating element is the agent's name.)
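
Since IDs change from phase to phase, following one agent across the whole run means grouping its records by name. A minimal sketch on hand-made rows (made-up IDs, not data read from the store):

```python
from collections import defaultdict

# Toy rows of (agent_id, experiment_run_phase_id, name); the values are
# made up and merely mirror the three-phase schedule of this tutorial.
agent_rows = [
    (1, 1, "mighty_defender"),
    (2, 2, "mighty_defender"),
    (3, 2, "evil_attacker"),
    (4, 3, "mighty_defender"),
    (5, 3, "evil_attacker"),
]

# Collect all per-phase IDs under the agent's (stable) name.
ids_by_name = defaultdict(list)
for agent_id, phase_id, name in agent_rows:
    ids_by_name[name].append(agent_id)

print(dict(ids_by_name))
# {'mighty_defender': [1, 2, 4], 'evil_attacker': [3, 5]}
```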

We first need the ID of the first experiment run phase:

In [None]:
run_phase_id = min(
    phase.id
    for phase in experiment_run_record.experiment_run_instances[
        0
    ].experiment_run_phases
)
run_phase_id

Okay, which agent is it?

In [None]:
agent_record = dbh.execute(
    sa.select(paldb.Agent).where(
        paldb.Agent.experiment_run_phase_id == run_phase_id
    )
).one()[paldb.Agent]
assert agent_record.name == "mighty_defender"
agent_record.id, agent_record.name

In [None]:
actions = pd.read_sql(
    sa.select(paldb.MuscleAction).where(
        paldb.MuscleAction.agent_id == agent_record.id
    ),
    dbh.bind,
)
actions

Okay, but how do we get rewards out of this? The `rewards` column contains a serialized list of `RewardInformation` objects. In our case, we know that there will only ever be one of them (more than one is a special case), and that its value will always be a float. This knowledge comes from how the reward is defined, i.e., it is domain knowledge that an experimenter will have.

At this point, we need to modify the dataframe a bit. Each cell holds the jsonpickle-encoded representation of the reward objects, i.e., a list of plain dictionaries, rather than the objects themselves. The reward value sits in the `py/state` entry of the first element, so we can read it directly without calling `jsonpickle.loads()`. `Series.apply()` serves us well here. In order to make it more readable, we provide a function for this.

In [None]:
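def unpack_reward(x):
    # x is the list of jsonpickle-encoded RewardInformation objects;
    # take the value of the first one, or 0.0 if there is none.
    return float(x[0]["py/state"]["value"]) if x else 0.0


actions["rewards"] = actions["rewards"].apply(unpack_reward)
actions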
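
To see what `unpack_reward` expects without a populated store, here is a hand-made sample that contains only the keys the function reads. Real entries written by jsonpickle typically also carry a `py/object` key naming the class; the value `0.25` is made up.

```python
import json


def unpack_reward(x):
    """Return the value of the first reward entry, or 0.0 if there is none."""
    return float(x[0]["py/state"]["value"]) if x else 0.0


# Hand-made sample with only the keys unpack_reward reads (not store data).
raw = '[{"py/state": {"value": 0.25}}]'
rewards = json.loads(raw)

print(unpack_reward(rewards))  # 0.25
print(unpack_reward([]))       # 0.0
```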

Plotting is relatively easy now, as pandas already provides us with everything we need.

In [None]:
actions.plot(x="id", y="rewards", kind="scatter")

Okay, but what if we want to compare the agents' performance during the testing phase? First, we need to find out which agents participated in the last experiment run phase. So let's query the agents table again, this time sorted by experiment run phase in descending order:

In [None]:
experiment_run_phases = pd.read_sql(
    sa.select(paldb.Agent)
    .where(
        paldb.Agent.experiment_run_phase_id.in_(
            phase.id
            for phase in experiment_run_record.experiment_run_instances[
                0
            ].experiment_run_phases
        )
    )
    .order_by(paldb.Agent.experiment_run_phase_id.desc()),
    dbh.bind,
)
experiment_run_phases

Okay, the top two rows are the ones we want to look at.

In [None]:
muscle_actions = pd.read_sql(
    sa.select(paldb.Agent, paldb.MuscleAction)
    .join(paldb.Agent.muscle_actions)
    .where(
        paldb.Agent.experiment_run_phase_id.in_(
            # .tolist() hands plain Python ints (not NumPy ints) to the driver
            experiment_run_phases.experiment_run_phase_id.iloc[:2].tolist()
        )
    ),
    dbh.bind,
)
assert len(muscle_actions) > 2
muscle_actions
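
Picking "the top two rows" works because we know that exactly two agents took part in the last phase. A more robust variant lets the database select the latest phase itself via a subquery. The sketch uses `sqlite3` on a hypothetical `agents` table with made-up rows:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute(
    "CREATE TABLE agents (id INTEGER PRIMARY KEY, "
    "experiment_run_phase_id INTEGER, name TEXT)"
)
con.executemany(
    "INSERT INTO agents (experiment_run_phase_id, name) VALUES (?, ?)",
    [(1, "mighty_defender"), (2, "mighty_defender"), (2, "evil_attacker"),
     (3, "mighty_defender"), (3, "evil_attacker")],
)

# Let the database pick the most recent phase instead of slicing rows.
last_phase_agents = con.execute(
    "SELECT id, name FROM agents "
    "WHERE experiment_run_phase_id = "
    "(SELECT MAX(experiment_run_phase_id) FROM agents) "
    "ORDER BY id"
).fetchall()
print(last_phase_agents)  # [(4, 'mighty_defender'), (5, 'evil_attacker')]
```

In the actual store, the subquery would additionally have to be restricted to the phases of a single experiment run instance, since we executed the run twice.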

Let's do the reward conversion dance:

In [None]:
muscle_actions["rewards"] = muscle_actions["rewards"].apply(unpack_reward)
muscle_actions

The table contains rewards, alternating, for both agents. You can see that from the `simtime_ticks` entry as well as the `name` column. So let's plot them; it's easy now:

In [None]:
defender_actions = muscle_actions[muscle_actions.name == "mighty_defender"][
    ["rewards"]
].rename(columns={"rewards": "defender_rewards"})
attacker_actions = muscle_actions[muscle_actions.name == "evil_attacker"][
    ["rewards"]
].rename(columns={"rewards": "attacker_rewards"})
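# One column per agent, indexed by action count, so each plots as a line
pd.concat(
    [
        attacker_actions.reset_index(drop=True),
        defender_actions.reset_index(drop=True),
    ],
    axis=1,
).plot()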
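
The boolean masks above separate the interleaved rows by agent. The same idea in plain Python, on made-up rows, keeps the tick alongside each reward so that both series could be aligned on a common time axis:

```python
from collections import defaultdict

# Made-up interleaved rows of (simtime_ticks, name, reward).
rows = [
    (1, "mighty_defender", 0.5),
    (1, "evil_attacker", 0.1),
    (2, "mighty_defender", 0.7),
    (2, "evil_attacker", 0.3),
]

# Split the interleaved rows into one (tick, reward) series per agent.
rewards_by_agent = defaultdict(list)
for tick, name, reward in rows:
    rewards_by_agent[name].append((tick, reward))

print(dict(rewards_by_agent))
```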