# Generative Model Demo: Constructing a simple transition model
NOTE: THIS IS AN UNFINISHED DEMO (WIP).

This demo notebook provides a walk-through of how to build a simple B matrix (or transition likelihood mapping) that encodes an agent's beliefs about how hidden states and actions at timestep t 'cause' or probabilistically relate to hidden states at timestep t+1.

### Imports

First, import `pymdp` and the modules we'll need.

In [1]:
import os
import sys
import pathlib

import numpy as np
import itertools
import pandas as pd
from pandas import ExcelWriter


path = pathlib.Path(os.getcwd())
module_path = str(path.parent) + '/'
sys.path.append(module_path)

import pymdp.utils as utils
from pymdp.utils import create_B_matrix_stubs, get_model_dimensions_from_labels

## The world (as represented by the agent's generative model)

### Hidden states

We assume the agent "represents" (this should make you think: generative _model_, not _process_) its environment using two latent variables that are statistically independent of one another. We can thus represent them using two _hidden state factors_.

We refer to these two hidden state factors as `LOCATION` and `BANDIT_STATE`.

#### 1. `LOCATION`
The first factor is a ternary variable representing the location / playing-state of the agent: `start`, `arm1`, or `arm2` (which arm of the multi-armed bandit or 'slot machine' am I playing?).

#### 2. `BANDIT_STATE`

The second factor is a binary variable representing whether the bandit is a "high reward" or "low reward" bandit.

### Control states

We assume the agent "represents" (this should make you think: generative _model_, not _process_) its ability to intervene upon the world using two latent variables that interface with the two respective hidden state factors detailed above.

We refer to these two _control_ state factors as `ARM_CONTROL` and `BANDIT_STATE_CONTROL`.

#### 1. `ARM_CONTROL`
The first control factor is a binary variable representing the agent's action, of either playing arm 1 or arm 2 of the 2-armed bandit ( `play_arm1` vs. `play_arm2` )

#### 2. `BANDIT_STATE_CONTROL`

The second control factor is a trivially 1-dimensional variable (`null`) representing the agent's action of doing nothing to the state of the bandit - this encodes the agent's belief that it cannot control the overall reward state (high reward or low reward) of the two-armed bandit.
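
To make the factor structure concrete, here is a minimal sketch that derives the shape each transition array would have from the state and action labels described above. It uses plain Python only. The `states`, `actions`, and `b_shapes` names are illustrative rather than part of `pymdp`, and the (next state, current state, action) axis order is an assumption.

```python
# Illustrative labels for the two hidden state factors and their control factors.
states = {
    "location": ["start", "arm1", "arm2"],
    "bandit_state": ["high_rew", "low_rew"],
}
actions = {
    "arm_control": ["play_arm1", "play_arm2"],
    "bandit_state_control": ["null"],
}

# Each hidden state factor is paired, in order, with one control factor.
b_shapes = {}
for (state_name, state_levels), (_, action_levels) in zip(states.items(), actions.items()):
    n_states = len(state_levels)
    n_actions = len(action_levels)
    # Assumed axis order: (next state, current state, action).
    b_shapes[state_name] = (n_states, n_states, n_actions)

print(b_shapes)
```

Under these assumptions the `location` factor needs a 3 x 3 x 2 array and the `bandit_state` factor a 2 x 2 x 1 array, since its only action is `null`.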



In [2]:

model_labels = {
            "observations": {
                "reward outcome": [
                    "win",
                    "loss"            
                    ]
            },
            "states": {
                "location": ["start", "arm1", "arm2"],
                "bandit_state": ["high_rew", "low_rew"]
            },
            "actions": {
                "arm_control": ["play_arm1", "play_arm2"],
                "bandit_state_control": ["null"]
            }
        }

B_stubs = create_B_matrix_stubs(model_labels)

### Option 1. Fill out the B matrix stubs in Excel

Write the list of empty B matrix stubs to a multi-sheet Excel file, fill it out separately (e.g. manually in Excel), and then read it back into memory. Remember, these represent the agent's generative model, not the true probabilities that relate states and actions to next states. So you can think of these as the agent's personal/subjective 'assumptions' about how hidden states transition among one another, and how actions condition/influence those transitions.

In [3]:
xls_dir = 'tmp_dir'
if not os.path.exists(xls_dir):
    os.mkdir(xls_dir)

xls_fpath = os.path.join(xls_dir, 'my_b_matrices.xlsx')

with ExcelWriter(xls_fpath) as writer:
    for factor_name, B_stub_f in B_stubs.items():
        B_stub_f.to_excel(writer, sheet_name=factor_name)

In [4]:
B_stubs = utils.read_B_matrices(xls_fpath)

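### Option 2. Fill out the B matrix stubs directly in code

Assign the desired probabilities to the stubs. Remember, these represent the agent's generative model, not the true probabilities that relate states and actions to next states. So you can think of these as the agent's personal/subjective 'assumptions' about how the world works. Each column, indexed by a (current state, action) pair, is a distribution over next states and so must sum to 1.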

In [5]:
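# NOTE: the assignments below are commented out in this WIP demo; uncomment them to fill in the stubs.
# B_stubs['location'].loc['arm1',('start', 'play_arm1')] = 1.0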
# B_stubs['location'].loc['arm1',('arm1', 'play_arm1')] = 1.0
# B_stubs['location'].loc['arm1',('arm2', 'play_arm1')] = 1.0

# B_stubs['location'].loc['arm2',('start', 'play_arm2')] = 1.0
# B_stubs['location'].loc['arm2',('arm1', 'play_arm2')] = 1.0
# B_stubs['location'].loc['arm2',('arm2', 'play_arm2')] = 1.0

# B_stubs['bandit_state'].loc['high_rew', ('high_rew', 'null')] = 1.0
# B_stubs['bandit_state'].loc['low_rew', ('low_rew', 'null')] = 1.0
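
The fill-in pattern above can be tried without `pymdp` on a stand-in table. The sketch below assumes a stub is a pandas `DataFrame` whose rows are next states and whose columns are (current state, action) pairs, as the `.loc` indexing above suggests. The `stub` built here is hypothetical and is not the output of `create_B_matrix_stubs`.

```python
import numpy as np
import pandas as pd

states = ["start", "arm1", "arm2"]
actions = ["play_arm1", "play_arm2"]

# Hypothetical stand-in for a 'location' stub: rows are next states,
# columns are (current state, action) pairs, all entries start at zero.
columns = pd.MultiIndex.from_product([states, actions])
stub = pd.DataFrame(np.zeros((len(states), len(columns))), index=states, columns=columns)

# Playing an arm always moves the agent to that arm, whatever its current location.
for current in states:
    stub.loc["arm1", (current, "play_arm1")] = 1.0
    stub.loc["arm2", (current, "play_arm2")] = 1.0

# Each column is a distribution over next states, so it must sum to 1.
column_sums = stub.sum(axis=0)
is_normalized = bool(np.allclose(column_sums.to_numpy(), 1.0))
print(is_normalized)
```

Filling every (current state, action) column matters here: a single column left at zero is enough to make the table unnormalized.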

### Convert the stubs to arrays

Now we can use the utility function `convert_B_stubs_to_ndarray` to convert the human-readable B matrices into the multi-dimensional tensor form needed by `pymdp` for things like inference and action selection.

In [6]:
B = utils.convert_B_stubs_to_ndarray(B_stubs, model_labels)

AssertionError: B matrix not normalized! Check your initialization....
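
The assertion above suggests that at least one column of a B matrix does not sum to 1, which is what one would expect if the stubs are still unfilled. Below is a minimal NumPy sketch of that kind of check. It is not `pymdp`'s actual implementation: the helper name `columns_are_normalized` is hypothetical, and the (next state, current state, action) axis order is an assumption.

```python
import numpy as np

def columns_are_normalized(B_f):
    """Return True if B_f[:, s, a] sums to 1 for every (state, action) pair."""
    return bool(np.allclose(B_f.sum(axis=0), 1.0))

# An unfilled array, analogous to a stub that was never edited.
B_empty = np.zeros((2, 2, 1))

# The 'bandit_state' factor with identity transitions under the single 'null' action:
# a high-reward bandit stays high-reward, a low-reward bandit stays low-reward.
B_bandit = np.zeros((2, 2, 1))
B_bandit[:, :, 0] = np.eye(2)

print(columns_are_normalized(B_empty))   # False
print(columns_are_normalized(B_bandit))  # True
```

Running a check like this on each filled stub before conversion can point to the factor whose columns still need probabilities.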
