# What is SiPS?

SiPS, short for Situated Projective Simulation, is a model of decision making that integrates insights from Bayesian Filtering, Active Inference, and Projective Simulation to understand how cognitive systems learn to navigate initially unknown environments. We describe this agent as "Situated" in reference to the Ecological Psychology of James Gibson. Accordingly, our model assumes that it is fundamentally impossible for an embodied agent to possess a complete representation of the world in which it navigates. The information an embodied agent receives is a direct function of its position in the world, and is necessarily incomplete. We call SiPS a model of an "Agent" because the information it receives may influence its actions, which in turn have differential effects on the agent's position in the world. Though an embodied agent may be able to act in optimal-enough ways using only immediate sensory information and an intrinsic set of priors about the world (given, for example, by a programmer or inherited via a system of descent with modification under selective pressure), a key question for both the study of animal behavior and machine learning is how embodied agents acquire and store information such that they can discover and exploit new relations in the world, and thus become able to represent their "situation" in space and time.

Bayesian filtering is a normative method for updating one's beliefs about the state of the world (or one's situation within it) given some degree of uncertainty and new, partial evidence regarding the state in question. Active Inference is a normative method for choosing actions that minimize one's uncertainty by maximizing the expected evidence, given an agent's uncertainty about its state and the outcomes of its actions. Both of these approaches, however, depend on the agent "knowing what it doesn't know". The agent must know, for example, the probability of error in its sensory information and whether its current sensory information could be evidence of multiple different situated states. How can agents construct an error model of their world that supports Bayesian filtering and Active Inference if they do not come pre-equipped with such "inductive biases"?
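To make the "knowing what it doesn't know" requirement concrete, here is a minimal discrete Bayes filter. It is a textbook sketch, not part of the SiPS package: the function name `bayes_filter_step` and the two-state world are hypothetical. Note that the filter only works because the transition model and the sensor error model (`likelihood`) are handed to it in advance, which is exactly the kind of inductive bias a SiPS agent is not assumed to have.

```python
import numpy as np

def bayes_filter_step(belief, transition, likelihood, observation):
    """One predict-update cycle of a discrete Bayes filter (hypothetical helper).

    belief:      prior probability over states, shape (S,)
    transition:  transition[i, j] = Pr(next state j | current state i), shape (S, S)
    likelihood:  likelihood[j, o] = Pr(observation o | state j), shape (S, O)
    observation: index of the observation just received
    """
    # Predict: propagate the belief through the (assumed known) transition model.
    predicted = belief @ transition
    # Update: weight each state by how well it explains the observation, then renormalize.
    posterior = predicted * likelihood[:, observation]
    return posterior / posterior.sum()

# Hypothetical two-state world with a sensor that reports the true state 80% of the time.
transition = np.array([[0.9, 0.1],
                       [0.1, 0.9]])
likelihood = np.array([[0.8, 0.2],
                       [0.2, 0.8]])
belief = np.array([0.5, 0.5])
for obs in [0, 0, 1]:
    belief = bayes_filter_step(belief, transition, likelihood, obs)
    print(belief.round(3))
```

If `likelihood` were wrong (say, the agent believed its sensor was perfect), the same update rule would produce overconfident beliefs; the filter itself offers no way to discover that error model.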

Most approaches to date use offline learning methods, such as gradient descent, to update the agent's "world model" so that it better predicts the sequence of sensory information and action states acquired over the last "online" period. These approaches are effective, but they fail to account for the moments of "online" insight known to any reader of this document and observed in many animals.

Projective Simulation is a model of decision making in which an agent selects its action by replaying "clips" of past situational sequences. The probability that the agent repeats the action taken in a clip is a function of that clip's "emotional valence". If the agent doesn't take the action, it replays another clip. An essential feature of Projective Simulation is that the selection of replayed clips is stochastic, modeled as a random walk on a graph whose nodes represent replayable clips. This stochasticity has two important outcomes. First, it allows the agent to "assign credit" to the deliberative pathway that produced an action and to reinforce the traversed edges as a function of outcomes. Thus, Projective Simulation is a reinforcement learning model in which associative connections in a world model may be reinforced instead of, or in addition to, direct stimulus-response connections, and learning from one situation may therefore be generalized to another. The second important outcome of the stochastic deliberative process is that it may generate novel combinations of clips, which may themselves become represented as "imagined" or "hypothetical" nodes in the graph. Because of these features, this graph, which provides a Projective Simulation agent with its world model, is called the Episodic and Compositional Memory (ECM).
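The random walk at the heart of Projective Simulation can be sketched in a few lines. This is a simplified, hypothetical illustration of basic PS, not the package's implementation: hop probabilities are taken proportional to edge weights, $p(j \mid i) = h_{ij} / \sum_k h_{ik}$, the walk stops when it reaches a clip coupled to an action, and the valence-gated replay described above is not modeled. The final loop shows the idea behind credit assignment (reinforcing every edge on the traversed path) without glow or damping.

```python
import numpy as np

def ps_deliberate(h, start, action_clips, rng, max_hops=100):
    """Random walk on a clip graph in the style of basic Projective Simulation (hypothetical).

    h:            h[i, j] is the non-negative weight of the edge from clip i to clip j;
                  every non-action clip must have at least one positive outgoing weight
    start:        index of the clip excited by the current percept
    action_clips: set of clip indices that couple out to actions
    Returns the list of visited clips, ending at an action clip (or after max_hops).
    """
    path = [start]
    current = start
    for _ in range(max_hops):
        if current in action_clips:
            break
        # Hop probability proportional to edge weight: p(j | i) = h_ij / sum_k h_ik.
        probs = h[current] / h[current].sum()
        current = int(rng.choice(len(probs), p=probs))
        path.append(current)
    return path

# Hypothetical 4-clip graph: clip 0 is the percept clip, clips 2 and 3 are action clips.
h = np.array([[0.0, 1.0, 1.0, 0.0],
              [0.0, 0.0, 1.0, 3.0],
              [0.0, 0.0, 0.0, 0.0],
              [0.0, 0.0, 0.0, 0.0]])
rng = np.random.default_rng(0)
path = ps_deliberate(h, start=0, action_clips={2, 3}, rng=rng)
print(path)

# Simplified credit assignment: reinforce every traversed edge with a hypothetical reward of 1.
for i, j in zip(path[:-1], path[1:]):
    h[i, j] += 1.0
```

Because clip 1 sits between the percept clip and both actions, reinforcing a path through it strengthens an associative connection that other percepts routed through clip 1 would also benefit from, which is the generalization effect described above.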

The processes of Bayesian Filtering and Active Inference may also be represented using a graph structure. SiPS uses a single graph structure that can accommodate all three processes and has a parameter, which we call "focus", that scales processes on this graph from operating purely as a system of Active Inference with Bayesian filtering to operating as a system of Projective Simulation. In this work, we explore when, why, and how an agent might switch between these two modes, or adopt an intermediate "focus" state, to achieve insightful behavior in novel environments without relying on offline and computationally intensive learning methods.

# How does SiPS work?

## Overview

Following the principles of Active Inference, the ECM of SiPS is organized hierarchically such that information passed from higher levels to lower levels represents predictions, and information passed from lower levels to higher levels represents prediction errors. We refer to nodes in the lowest level of the ECM as "perceptual representations", which are further divided into "sensory representations" and "action representations". Higher levels of the ECM interact with the ECM's environment only through the perceptual representations, i.e. the perceptual representations form a Markov Blanket around the rest of the ECM (the ECM's environment may be equivalent to the agent's environment, or it may include other elements of the agent's 'body'). Higher-level nodes in the ECM represent "memory traces" of past situational states; at any given time, the agent's perceptual state is encoded to a single trace by establishing connections between that trace and the excited perceptual representations. Over time, the strength of these connections may change through forgetting and reinforcement, and new connections may be formed by the deliberative process. A key goal of the SiPS model is to understand how these three processes can be configured such that connections to a memory trace either decay to obsolescence or crystallize into a representation of a learned relation in the world.
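As a rough illustration of encoding and forgetting, the sketch below stores Hebbian weights in a matrix with one row per percept node and one column per memory trace (the orientation suggested by the `(2, 50)` shape of `hmatrix` shown later). The `encode` and `forget` functions, the unit encoding strength, and the geometric decay rate are all hypothetical placeholders for the package's actual encoding and forgetting rules.

```python
import numpy as np

num_percept_nodes = 4   # e.g. 2 action representations + 2 sensory representations
capacity = 5            # number of memory traces
H = np.zeros((num_percept_nodes, capacity))  # rows: percept nodes, columns: memory traces

def encode(H, percept, trace, strength=1.0):
    """Hypothetical encoding step: connect one memory trace to every excited percept node."""
    H = H.copy()
    H[:, trace] += strength * percept
    return H

def forget(H, rate=0.1):
    """Hypothetical forgetting: every Hebbian weight decays geometrically toward zero."""
    return (1.0 - rate) * H

percept = np.array([0.0, 1.0, 1.0, 1.0])  # excitation state of the percept nodes
H = encode(H, percept, trace=0)
H = forget(H)
print(H[:, 0])
```

Under pure geometric decay every connection eventually fades; whether a trace "crystallizes" instead depends on reinforcement outpacing this decay, which is the balance the paragraph above describes.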

Figure 1 provides an overview of the SiPS ECM.

<figure>
<img src="..\figs\SiPS_structure.png" style="width:100%">
<figcaption align = "center"> Figure 1: SiPS Structure Overview. SiPS combines a reflex system with an episodic memory structure. The reflex system is defined as a matrix, <b>W</b>, with edges that connect each unique observation to each action in the agent's repertoire. The episodic memory structure can be represented as a graph with three types of nodes: sensory representations, action representations, and memory traces. Sensory representations and action representations are jointly referred to as percept nodes. Percept nodes are connected to memory traces by bidirectional edges called Hebbian weights, given by a matrix <b>H</b>. Memory traces have recurrent edges with weights given by the matrix <b>M</b>. </figcaption>
</figure>

## States and Variables

Each node in a SiPS graph is associated with multiple states. To denote states associated with sensory representations we will use the letter 's'; to denote states associated with action representations we will use the letter 'a'; to denote the states of percept nodes more generally we will use the letter 'p' (observation); and to denote states associated with memory traces we will use the letter 'b' (beliefs).

There are three types of state associated with each node, which we distinguish notationally using diacritical marks. Here, we describe each type of state briefly. Subsequent sections explain and demonstrate the dynamics of these states and their interactions in the ECM in much greater detail.

The <b>excitation state</b> of a node reflects the transmission of new sensory information through the graph, and is denoted by a plain letter with no diacritical mark. For example, an excitation state of all sensory representations is denoted $\boldsymbol{s}$ and an excitation state of a single sensory representation by $s_\mathrm{j}$. $\boldsymbol{S}$ denotes the J-dimensional random variable associated with sensory representation states, where J is the number of sensory representation nodes. Subscripts of this variable, $S_\mathrm{j}$, indicate random variables associated with the excitation state of a specific sensory representation.

The <b>expectation state</b> of a node reflects the ECM's prediction regarding its next situated state and is denoted using the diacritical ^. For example, $\hat{b}_\mathrm{n}$ denotes the expectation state of memory trace n, which reflects the strength of the agent's belief that its next situated state will be well represented by the situated state encoded by memory trace n.

The <b>attention</b> state of a node reflects how strongly that node is currently being considered by the agent's deliberative process. Importantly, while all other random variables in a SiPS ECM evolve on a discrete timescale $t$, the attention state evolves on a finer timescale $\tau$. The evolution of the attention states across the ECM within a single step of $t$ defines the stochastic process by which the ECM computes the expectation states of memory traces, i.e. its predictions about its next situated state.

Memory traces have a fourth associated state: their <b>valence state</b>. The valence of a memory trace reflects how surprised the ECM was by the excitation state of sensory representations following the situated state encoded by that trace. The valence of a memory trace determines how much an action by the agent is inhibited or primed by an increase of attention on the given memory trace.

## Creating a SiPS agent

Let's begin an exploration of how SiPS agents work by using the "Projective Simulation" package to create a simple agent.

In [1]:
```python
from projective_simulation.agents import Situated_Agent
import nbdev
import numpy as np
```

In [2]:
```python
example_SiPS = Situated_Agent(num_actions = 2, memory_capacity = 100)
example_SiPS
```

```
<projective_simulation.agents.Situated_Agent at 0x1fa8f468860>
```

There are many parameters that can be used to customize a SiPS agent, but at a minimum the number of actions and memory traces available to the agent must be defined. For now, we will rely on default values for all other parameters. As with all projective simulation agents, a SiPS agent can return an action if given an observation.

In [3]:
```python
test_observations = [1,0,5,1,1]
test_actions = [None] * len(test_observations)
for t in range(len(test_observations)):
    test_actions[t] = example_SiPS.get_action(test_observations[t])
print(test_actions)
```

```
[1, 1, 0, 0, 1]
```


The get_action function of a SiPS agent has three steps: triggering a reflex, processing the percept, and setting expectations. Each of these steps is handled by a different object in the Situated_Agent class: the reflex_ECM, percept_processor, and episodic_ECM, respectively.

Let's cover the reflexes first.

## Reflexes

The first thing a SiPS agent does when it receives a new observation is select an action using its reflexes. Note that while we use an independent ECM class type (Priming_ECM) to define the SiPS reflexes, conceptually these reflexes are a *component* of a SiPS ECM. The Priming_ECM is a subclass of a Two_Layer ECM, and we can look at the documentation of these classes to see how they work. We refer the reader to (LINK TO BASIC PS TUTORIAL) for a discussion of the role of the glow and damp parameters in reinforcement learning on a two-layer PS.

In [4]:
```python
from projective_simulation.ECMs import Two_Layer, Priming_ECM
nbdev.show_doc(Two_Layer)
```


### Two_Layer

>      Two_Layer (num_actions:int, glow:float, damp:float, softmax:float)

*A minimal ECM, every agent should be Derived from this class. Primarily serves to enforce that all ECMs have the "ECM" class

Examples:
>>> pass*

|    | **Type** | **Details** |
| -- | -------- | ----------- |
| num_actions | int | The number of available actions. |
| glow | float | The glow (or eta) parameter. |
| damp | float | The damping (or gamma) parameter. |
| softmax | float | The softmax (or beta) parameter. |

In [5]:
nbdev.show_doc(Priming_ECM)


### Priming_ECM

>      Priming_ECM (num_actions:int, glow:float=0.1, damp:float=0.01,
>                   softmax:float=0.5, action_primes:list=None)

*This sub-class of the Two-Layer ECM adds a variable for action priming.
This variable should be a list of floats, each element of which corresponds to an action in the ECM.
These "priming values" are summed with h-values of any edge connected to the associated action node prior to calculating walk probabilites with the softmax function*

|    | **Type** | **Default** | **Details** |
| -- | -------- | ----------- | ----------- |
| num_actions | int |  | The number of available actions. |
| glow | float | 0.1 | The glow (or eta) parameter. |
| damp | float | 0.01 | The damping (or gamma) parameter. |
| softmax | float | 0.5 | The softmax (or beta) parameter. |
| action_primes | list | None | weights on the probability that deliberation steps into each action. Defaults to 0 for each action |

Action priming is important for SiPS: the priming values are given by the expectation states of the action representations. A positive action expectation increases the probability that the priming ECM returns the corresponding action, regardless of the input percept. Likewise, a negative value reduces the probability of returning that action. Take a look.

In [6]:
```python
reflexes = Priming_ECM(num_actions = 2, action_primes = [0,2])
example_SiPS = Situated_Agent(reflex_ECM = reflexes, memory_capacity = 100)
# Notice that we define a reflex ECM for the Situated Agent instead of num_actions.
# If a reflex ECM is predefined, the SiPS agent uses it instead of creating a reflex ECM
# from the num_actions, glow, damp, and reflex_softmax variables.

test_observations = [0] * 10 + [1] * 10
test_actions = [None] * len(test_observations)
for t in range(len(test_observations)):
    test_actions[t] = example_SiPS.get_action(test_observations[t])
print(test_actions)
```

```
[1, 1, 1, 1, 1, 1, 0, 0, 1, 1, 0, 1, 1, 1, 1, 0, 1, 1, 1, 1]
```


Specifically, the probability that the reflex ECM returns action $k$ given a particular observation $o^{(i)}$ is given by the equation

\begin{equation}
Pr(\boldsymbol{A} = \boldsymbol{a}^{(\mathrm{k})}|\boldsymbol{O} = \boldsymbol{o}^{(\mathrm{i})}) = \frac{\mathrm{e}^{\beta_\mathrm{reflex}(W_\mathrm{ik} + \hat{A}_\mathrm{k})}}{\sum_\mathrm{k'=1}^\mathrm{K}{\mathrm{e}^{\beta_\mathrm{reflex}(W_\mathrm{ik'} + \hat{A}_\mathrm{k'})}}}
\end{equation}

where the Action Space of the agent, $\mathcal{A}^\mathrm{K}$, is defined such that, for a given number of actions K, a particular action state $\boldsymbol{a}^{(k)}$ is equivalent to the one-hot excitation of action representations where $a_\mathrm{k} = 1$; $W_{ik}$ is the reflex weight from observation state $\boldsymbol{o}^{(\mathrm{i})}$ to action state $\boldsymbol{a}^{(\mathrm{k})}$; $\hat{A}_\mathrm{k}$ is the expectation state (priming value) of action representation k; and $\beta_\mathrm{reflex}$ is the reflex_softmax parameter.
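The equation can be evaluated directly with NumPy. In this sketch, `reflex_probabilities` is a hypothetical helper, not a method of Priming_ECM, and `W` is assumed to hold equal weights for every observation-action pair, as an untrained reflex system might. With the `action_primes = [0, 2]` and default softmax of 0.5 used in the cell above, action 1 comes out $e^{0.5 \cdot 2} = e \approx 2.7$ times as likely as action 0.

```python
import numpy as np

def reflex_probabilities(W, action_primes, obs_index, beta):
    """Pr(A = a^(k) | O = o^(i)) for every action k (hypothetical helper).

    W:             reflex weights, shape (num_observations, num_actions)
    action_primes: expectation states of the action representations, shape (num_actions,)
    obs_index:     row i of W for the current observation
    beta:          the reflex softmax parameter
    """
    logits = beta * (W[obs_index] + action_primes)
    logits -= logits.max()          # subtracting the max leaves the softmax unchanged but avoids overflow
    weights = np.exp(logits)
    return weights / weights.sum()

W = np.zeros((2, 2))                 # assumed untrained reflexes: all weights equal
print(reflex_probabilities(W, np.array([0.0, 2.0]), obs_index=0, beta=0.5))
```

Because the softmax depends only on differences between logits, equal reflex weights cancel and the primes alone set the action preference; once `W` is trained, primes and reflex weights compete on the same scale.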

## The Episodic Graph

The Priming_ECM does not change the priming of actions on its own; expectation values must come from another system, such as an episodic graph. We can create an episodic graph like so:



In [7]:
```python
from projective_simulation.ECMs import Episodic_Memory
em_example = Episodic_Memory(num_actions = 2, capacity = 50)
em_example
```

```
<projective_simulation.ECMs.Episodic_Memory at 0x1fa8f46a480>
```

In SiPS, the episodic graph has three node types: action representations, sensory representations, and memory traces. Sensory representations and action representations (together called percept nodes) are connected to memory traces by edge weights given by the H-matrix. Because the newly initialized episodic graph has not received any observations, it has no sensory representations. An action space must, however, be predefined for an episodic graph, and thus the H-matrix is currently $K \times N$ (num_actions x capacity).

In [8]:
```python
np.shape(em_example.hmatrix)
```

```
(2, 50)
```

### Representing the Percept

One goal of a SiPS agent is to learn about relations between individual elements, or combinations of elements, in observations. Thus, the episodic graph creates a sensory representation for each new value observed in *each dimension* of an observation vector. The SiPS agent has a "Percept Processor" that keeps track of which value and dimension of an observation each sensory representation is connected to. It can thus convert any observation to a sensory excitation state. The percept of an episodic graph is defined as the concatenation of the action representation excitation state and the sensory representation excitation state (action representations come first, as the outputs below show). Let's see what happens when we give the percept processor of a new SiPS agent a two-dimensional observation and an action triggered by that observation.

In [9]:
```python
example_SiPS = Situated_Agent(num_actions = 2, memory_capacity = 10)
example_SiPS.percept_processor.percept_dict
```

```
{'actions': {'0': 0, '1': 1}}
```

Initially, the agent's percept dictionary only maps actions '0' and '1' to action representations 0 and 1, respectively.

In [10]:
```python
example_SiPS = Situated_Agent(num_actions = 2, memory_capacity = 10)
observation = np.array([0,4])
action = 1
example_SiPS.percept_processor.get_percept(observation, action)
```

```
array([0., 1., 1., 1.])
```

But after receiving a two-dimensional observation, the percept that the percept processor returns gives the excitation states of four percept nodes. We can see that the two additional nodes correspond to two sensory representations that have been added to the percept dictionary.

In [11]:
```python
example_SiPS.percept_processor.percept_dict
```

```
{'actions': {'0': 0, '1': 1}, '0': {'0': 2}, '1': {'4': 3}}
```

Note the nested structure of this dictionary. Each value of the percept dictionary is itself a dictionary, which maps the values observed in a given dimension of the observation (or the action labels) to the indices of their percept nodes.

If the dimensionality of the agent's observations changes, the percept processor handles this flexibly.

In [12]:
```python
example_SiPS.percept_processor.get_percept(observation = np.array([1,4,0]), action = 0)
```

```
array([1., 0., 0., 1., 1., 1.])
```

In [13]:
```python
example_SiPS.percept_processor.percept_dict
```

```
{'actions': {'0': 0, '1': 1},
 '0': {'0': 2, '1': 4},
 '1': {'4': 3},
 '2': {'0': 5}}
```
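The bookkeeping above can be reproduced with a small class. `MiniPerceptProcessor` is a hypothetical illustration written for this tutorial, not the package's percept processor; it assumes observation values are keyed by their string form and that new percept nodes are numbered in order of first appearance, which is consistent with the outputs shown above.

```python
import numpy as np

class MiniPerceptProcessor:
    """Hypothetical sketch of the percept-dictionary bookkeeping.

    Each observation dimension gets its own sub-dictionary mapping observed values
    (as strings) to percept-node indices; new nodes are numbered in order of first appearance.
    """

    def __init__(self, num_actions):
        # Action representations occupy the first num_actions percept nodes.
        self.percept_dict = {'actions': {str(k): k for k in range(num_actions)}}
        self.num_nodes = num_actions

    def get_percept(self, observation, action):
        excited = [self.percept_dict['actions'][str(action)]]
        for dim, value in enumerate(observation):
            values = self.percept_dict.setdefault(str(dim), {})
            if str(value) not in values:
                # First time this value appears in this dimension: add a sensory representation.
                values[str(value)] = self.num_nodes
                self.num_nodes += 1
            excited.append(values[str(value)])
        # Excitation state of every percept node known so far.
        percept = np.zeros(self.num_nodes)
        percept[excited] = 1.0
        return percept

pp = MiniPerceptProcessor(num_actions=2)
print(pp.get_percept(np.array([0, 4]), 1))
print(pp.get_percept(np.array([1, 4, 0]), 0))
print(pp.percept_dict)
```

With the same inputs as in the cells above, this sketch returns arrays equal to `[0., 1., 1., 1.]` and `[1., 0., 0., 1., 1., 1.]`, and leaves `percept_dict` in the same nested form. Because percept vectors only grow, an older, shorter percept can always be zero-padded to the current number of nodes.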

### Deliberation

Once a SiPS agent has chosen an action and represented its percept, it begins to deliberate. This is the process by which the agent assesses its situated state and predicts what will happen next. In the process, it encodes relevant information about its current state to memory and potentially reorganizes existing connections in its episodic memory. We will go through the steps of deliberation one by one.

#### Adding new sensory representations

If the percept passed to the episodic graph by the percept processor is larger than the current number of percept nodes, new nodes are added by appending elements to all of the relevant state variables. See `help(Episodic_Memory.add_percept)` for details.
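A hypothetical sketch of what adding percept nodes might involve, shown for just two state variables: the H-matrix gains a row of zeros per new node, and the vector of percept expectation states gains one entry per new node. The function name `add_percept_nodes` and the initial expectation of 0.5 are assumptions, not the package's behavior; the real `add_percept` likely updates more state variables. An initial expectation of exactly 0 would make the surprise of a newly excited node infinite under the surprise formula given below.

```python
import numpy as np

def add_percept_nodes(hmatrix, expectations, num_new, initial_expectation=0.5):
    """Hypothetical sketch: grow per-node state variables when new percept nodes appear.

    hmatrix:             Hebbian weights, shape (num_percept_nodes, capacity)
    expectations:        expectation state of each percept node, shape (num_percept_nodes,)
    num_new:             number of percept nodes to append
    initial_expectation: assumed starting expectation for the new nodes
    """
    capacity = hmatrix.shape[1]
    # New sensory representations start with no connections to any memory trace.
    hmatrix = np.vstack([hmatrix, np.zeros((num_new, capacity))])
    # Each new node gets the (assumed) initial expectation state.
    expectations = np.concatenate([expectations, np.full(num_new, initial_expectation)])
    return hmatrix, expectations

hmatrix = np.zeros((2, 50))          # a fresh graph: 2 action representations, capacity 50
expectations = np.zeros(2)
hmatrix, expectations = add_percept_nodes(hmatrix, expectations, num_new=2)
print(hmatrix.shape)                 # (4, 50)
```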

#### Calculating Surprise

Surprise is the primary mechanism by which SiPS learns. It is used both to ascribe an emotional valence to memory traces and to reinforce or punish the weights of edges (more on this later). The surprise of each sensory representation is computed by treating its excitation as a sample from a Bernoulli (single-trial binomial) distribution with probability equal to that representation's expectation state, i.e.

$$
I(s_\mathrm{j}(t)) = 
\left\{
\begin{array}{ll}
  -\mathrm{log}(\hat{s}_\mathrm{j}(t-1)) & \text{if } s_\mathrm{j}(t) = 1  \\
  -\mathrm{log}(1-\hat{s}_\mathrm{j}(t-1)) & \text{if } s_\mathrm{j}(t) = 0,
\end{array}
\right.
$$

where $s_\mathrm{j}(t)$ is the current excitation state of sensory representation $\mathrm{j}$ and $\hat{s}_\mathrm{j}(t-1)$ is the expectation state of sensory representation $\mathrm{j}$ set in the last time step. The agent's total surprise is simply the sum of surprises across all sensory representations.
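The surprise computation vectorizes naturally over sensory representations. The sketch below is a hypothetical helper, not the package's code; it assumes the natural logarithm and clips expectations away from 0 and 1 so the logarithm stays finite, which is an implementation choice rather than part of the formula.

```python
import numpy as np

def surprise(s, s_hat_prev, eps=1e-12):
    """Per-node surprise of binary excitations under Bernoulli predictions (hypothetical helper).

    s:          current excitation states s_j(t), each 0 or 1
    s_hat_prev: expectation states s_hat_j(t-1) from the previous step, each in [0, 1]
    eps:        clipping margin that keeps log() finite; not part of the formula itself
    """
    p = np.clip(s_hat_prev, eps, 1.0 - eps)
    # -log(p) where the node fired, -log(1 - p) where it stayed silent.
    return np.where(s == 1, -np.log(p), -np.log(1.0 - p))

s = np.array([1, 0, 1])
s_hat_prev = np.array([0.9, 0.2, 0.5])
per_node = surprise(s, s_hat_prev)
total = per_node.sum()               # the agent's total surprise
print(per_node.round(3), round(total, 3))
```

For these inputs the per-node surprises are $-\log 0.9 \approx 0.105$, $-\log 0.8 \approx 0.223$, and $-\log 0.5 \approx 0.693$, for a total of about 1.022: a confident, correct prediction costs little, while a coin-flip prediction costs the most.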

#### Exciting Memory Traces

The next step of deliberation is to pass excitation information from percept nodes to memory traces.