# What is SiPS?

SiPS, short for Situated Projective Simulation, is a model of decision making that integrates insights from Bayesian Filtering, Active Inference, and Projective Simulation to understand how cognitive systems learn to navigate initially unknown environments. We use the word "Situated" to describe this agent in reference to the Ecological Psychology of James Gibson. Accordingly, an assumption of our model is that it is fundamentally impossible for an embodied agent to possess a complete representation of the world in which it navigates. The information an embodied agent receives is a direct function of its position in the world, and is necessarily incomplete. We call SiPS a model of an "Agent" because the information it receives may influence its actions, which in turn have differential effects on the agent's position in the world. Though an embodied agent may be able to act in optimal-enough ways using only immediate sensory information and an intrinsic set of priors about the world (given, for example, by a programmer or inherited via a system of descent with modification under selective pressure), a key question relevant to both the study of animal behavior and machine learning is how embodied agents acquire and store information such that they can discover and exploit new relations in the world, and thus become able to represent their "situation" in space and time.

Bayesian filtering is a normative method for updating one's beliefs about the state of the world (or one's situation within it) given some degree of uncertainty and new, partial evidence regarding the state in question. Active Inference is a normative method for choosing actions that minimize one's uncertainty by maximizing the expected evidence, given an agent's uncertainty about its state and the outcomes of its actions. Both of these approaches, however, depend on the agent "knowing what it doesn't know". The agent must know, for example, the probability of error in its sensory information and whether its current sensory information could be evidence of multiple different situated states. How can agents construct an error model of their world that supports Bayesian filtering and Active Inference if they do not come pre-equipped with such "inductive biases"?

Most approaches to date use offline learning methods such as gradient descent to update the agent's "world model" such that it becomes better able to predict the sequence of sensory information and action states acquired over the last "online" period. These approaches are effective, but they fail to account for the moments of "online" insight known to any reader of this document and observed in many animals.

Projective Simulation is a model of decision making in which an agent selects its action by replaying "clips" of past situational sequences. The probability of the agent repeating the action taken in that clip is a function of the clip's "emotional valence". If the agent doesn't take the action, it replays another clip. An essential feature of Projective Simulation is that the selection of replayed clips is stochastic, modeled as a random walk on a graph where the nodes represent replayable clips. This stochasticity has two important outcomes. First, it allows the agent to "assign credit" to the deliberative pathway that produced an action and reinforce the traversed edges as a function of outcomes. Thus, Projective Simulation is a reinforcement learning model in which associative connections in a world model may be reinforced instead of, or in addition to, direct stimulus-response connections. And, therefore, learning from one situation may be generalized to another. The second important outcome of the stochastic deliberative process is that it may generate novel combinations of clips, which may themselves become represented as "imagined" or "hypothetical" nodes in the aforementioned graph. Based on these features, this graph - which provides a Projective Simulation agent with a world model - is called the Episodic and Compositional Memory (ECM).

The processes of Bayesian Filtering and Active Inference may also be represented using a graph structure. SiPS uses a single graph structure that can accommodate all three processes and has a parameter, which we call "focus", that scales processes on this graph from operating purely as a system of Active Inference with Bayesian filtering to a system of Projective Simulation. In this work, we explore when, why, and how an agent might switch between these two modes, or adopt an intermediate "focus" state, to achieve insightful behavior in novel environments without relying on offline and computationally intensive learning methods.

# How does SiPS work?

## Overview

Like any reinforcement learning model, SiPS takes an input (often called its state, or sensory state) and returns an output (an action). SiPS, however, actually acts as two reinforcement learning agents nested inside of each other, so for clarity we need to make some adjustments to the terminology. We will refer to the state of systems that transmit information from the outside world to the internal processes of a SiPS agent as the "observation". We give this observation the random variable $\boldsymbol{O}$ and denote any discrete observation state $\boldsymbol{o^{(\mathrm{i})}}$. The observation state acts as the input for a SiPS agent's reflex system, a simple reinforcement learning system that returns an action. The weights of this system, however, may be tuned by the SiPS agent's Episodic and Compositional Memory (ECM). The ECM is *also* a reinforcement learning system. It combines both the agent's observation and its action into an input we call the "percept" and returns ephemeral adjustments to the reflex system's weights, which we call "priming". It does this through a process we call deliberation. Thus, a translation from standard RL looks as follows.

![ML_Compare](ML_to_SiPS_translation.png)
<figure>
<figcaption align = "center"> Figure 1: SiPS from a Machine Learning Perspective. </figcaption>
</figure>

The Episodic and Compositional Memory is best understood as a graph. Following from the principles of Active Inference, the ECM of SiPS is organized hierarchically such that information passed from higher levels to lower levels represents predictions, and information passed from lower levels to higher levels represents prediction errors. We refer to nodes in the lowest level of the ECM as "perceptual representations", which are further divided into "sensory representations" and "action representations". Percepts map to a set of excitation states for these representations. Higher levels of the ECM interact with the ECM's environment only through the perceptual representations. Higher-level nodes in the ECM represent "memory traces" of past situational states; at any given time, the agent's perceptual state is encoded to a single trace by establishing connections between that trace and excited percept nodes. Over time, the strength of these connections may change as functions of forgetting and reinforcement, and new connections may be formed by the deliberative process. A key goal of the SiPS model is to understand how these three processes can be configured such that connections to a memory trace either decay to obsolescence or crystallize into a representation of a learned relation in the world that can be generalized to other, new situations.

Figure 2 provides an overview of the SiPS structure, including both the reflex system and the ECM.

![Structure_Overview](SiPS_structure.png)
<figure>
<figcaption align = "center"> Figure 2: SiPS Structure Overview. SiPS combines a reflex system with an episodic memory structure. The reflex system is defined as a matrix, <b>W</b>, with edges that connect each unique observation to each action in the agent's repertoire. The episodic memory structure can be represented as a graph with three types of nodes: sensory representations, action representations, and memory traces. Sensory representations and action representations are jointly referred to as percept nodes. Percept nodes are connected to memory traces by bidirectional edges called Hebbian weights, given by a matrix <b>H</b>. Memory traces have recurrent edges with weights given by the matrix <b>M</b>. </figcaption>
</figure>

## States and Variables

Each node in a SiPS ECM is associated with multiple states. To denote states associated with sensory representations we will use the letter 's'; to denote states associated with action representations we will use the letter 'a'; to denote the states of percept nodes more generally we will use the letter 'p'; and to denote states associated with memory traces we will use the letter 'b' (beliefs).

There are three types of state associated with each node, which we distinguish notationally by using diacritical marks. Here, we describe each type of state briefly. Subsequent sections will explain and demonstrate the dynamics of these states and their interactions in the ECM in much greater detail.

The <b>excitation state</b> of a node reflects the transmission of new sensory information through the graph, and is denoted by a plain letter with no diacritical mark. For example, the vector of excitation states for all sensory representations is denoted $\boldsymbol{s}$ and the excitation state of a single sensory representation by $s_\mathrm{j}$. $\boldsymbol{S}$ denotes the J-dimensional random variable associated with sensory representation states, where J is the number of sensory representation nodes. Subscripts of this variable, $S_\mathrm{j}$, indicate random variables associated with the excitation state of a specific sensory representation. Likewise, $\boldsymbol{a}$ denotes the vector of excitation states for all action representations, $a_\mathrm{k}$ the excitation state of a single action representation, and $\boldsymbol{A}$ and $A_\mathrm{k}$ the random variables for these states. The same pattern holds for the excitation of a percept node $p_\mathrm{l}$ and a memory trace $b_\mathrm{n}$. For clarity and convenience, the indices j, k, l, and n will always denote sensory representations, action representations, percept nodes, and memory traces, respectively.

The <b>expectation state</b> of a node reflects the ECM's prediction regarding its next situated state and is denoted using the diacritical ^. For example, $\hat{b}_\mathrm{n}$ denotes the expectation state of memory trace n, which reflects the strength of the agent's belief that its next situated state will be well represented by the situated state encoded by memory trace $\mathrm{n}$. As with excitation states, $\hat{\boldsymbol{P}}$ denotes the random variable associated with the vector of percept node expectation states, and so on and so forth.

The <b>activation state</b> of a node reflects how strongly that node is currently being considered by the agent's deliberative process, and is denoted by the diacritical ~ ($\tilde{A}_\mathrm{k}$, $\tilde{\boldsymbol{s}}$, etc.). Importantly, while all other random variables in a SiPS ECM evolve on a discrete timescale $t$, the activation state evolves at a smaller timescale $\tau$. The evolution of the activation states across the ECM over an interval of $t$ defines the stochastic process by which the ECM computes the expectation states of memory traces, i.e. predictions about its next situated state. I think of this process as modeling the agent's attention, and will often describe it as such, but use the name "activation state" to avoid confusion with other well-known attention mechanisms in ML.

Memory traces have a fourth associated state: their <b>valence state</b>, given by $v_\mathrm{n}$. The valence of a memory trace reflects how surprised the ECM was by the excitation state of sensory representations following the situated state encoded by that trace. The valence of a memory trace determines how much an associated action is inhibited or primed by an increase of the given memory trace's activation.

## Creating a SiPS agent

Let's begin an exploration of how SiPS agents work by using the "Projective Simulation" package to create a simple agent.

In [None]:
from projective_simulation.agents import Situated_Agent
from projective_simulation.environments import Delayed_Response
import nbdev
import numpy as np

In [None]:
example_SiPS = Situated_Agent(num_actions = 2, memory_capacity = 100)
example_SiPS

TypeError: Situated_Agent.__init__() got an unexpected keyword argument 'memory_capacity'

There are many parameters that can be used to customize a SiPS agent, but at a minimum the number of actions and memory traces available to the agent must be defined. For now, we will rely on default values for all other parameters. As with all projective simulation agents, a SiPS agent can return an action if given an observation.

In [None]:
test_observations = [1,0,5,1,1]
test_actions = [None] * len(test_observations)
for t in range(len(test_observations)):
    test_actions[t] = example_SiPS.get_action(test_observations[t])
print(test_actions)


The get_action function of a SiPS agent has three steps: triggering a reflex, processing the percept, and setting expectations. Each of these steps is handled by a different object in the Situated_Agent class: the reflex_ECM, preprocessor, and episodic_ECM, respectively.

Let's cover the reflexes first.

## Reflexes

The first thing a SiPS agent does when it receives a new observation is select an action using its reflexes. Note that while we use an independent ECM class type (Priming_ECM) to define the SiPS reflexes, conceptually these reflexes are a *component* of a SiPS ECM. The Priming_ECM is a subclass of a Two_Layer ECM, and we can take a look at the documentation of these classes to see how they work. We refer the reader to (LINK TO BASIC PS TUTORIAL) for discussion of the role of the glow and damp parameters in reinforcement learning on a two-layer PS.

In [None]:
from projective_simulation.ECMs import Two_Layer, Priming_ECM
nbdev.show_doc(Two_Layer)

In [None]:
nbdev.show_doc(Priming_ECM)

The addition of action priming is important for SiPS, and is given by the expectation state of action representations. A positive action expectation increases the probability that the priming ECM returns the corresponding action for all input percepts. Likewise, a negative value reduces the probability of returning that action. Take a look.

In [None]:
reflexes = Priming_ECM(num_actions = 2, action_primes = [0,4])
example_SiPS = Situated_Agent(reflex_ECM = reflexes, memory_capacity = 100)
# Notice that we define a reflex ECM for the Situated Agent instead of num_actions.
# If a reflex ECM is predefined, the SiPS agent uses that instead of creating a reflex ECM using the num_actions, glow, damp, and reflex_softmax variables

test_observations = [0] * 10 + [1] * 10 #observe 0 ten times, then observe 1 ten times
test_actions = [None] * len(test_observations)
for t in range(len(test_observations)):
    test_actions[t] = example_SiPS.get_action(test_observations[t])
print(test_actions)

Specifically, the probability that the reflex ECM returns action $k$ given a particular observation $o^{(i)}$ is given by the equation

\begin{equation}
Pr(\boldsymbol{A} = \boldsymbol{a}^{(\mathrm{k})}|\boldsymbol{O} = \boldsymbol{o}^{(\mathrm{i})}) = \frac{\mathrm{e}^{\beta_\mathrm{reflex}(W_\mathrm{ik} + \hat{A}_\mathrm{k})}}{\sum_\mathrm{k'=1}^\mathrm{K}{\mathrm{e}^{\beta_\mathrm{reflex}(W_\mathrm{ik'} + \hat{A}_\mathrm{k'})}}}
\end{equation}

where the action space of the agent, $\mathcal{A}^\mathrm{K}$, is defined such that for a given number of actions K, a particular action state $\boldsymbol{a}^{(k)}$ is equivalent to the one-hot excitation of action representations where $a_\mathrm{k} = 1$; $W_{ik}$ is the reflex weight from observation state $\boldsymbol{o}^{(\mathrm{i})}$ to action state $\boldsymbol{a}^{(\mathrm{k})}$; and $\beta_\mathrm{reflex}$ is the reflex_softmax parameter.
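
To make the reflex equation concrete, here is a minimal NumPy sketch of the same softmax. It is not the Priming_ECM implementation: the function name `reflex_probabilities` and its arguments are hypothetical, and the reflex weights are made-up values chosen only for illustration.

```python
import numpy as np

def reflex_probabilities(w_row, action_primes, beta_reflex=1.0):
    """Softmax over (reflex weight + action expectation) for one observation.

    w_row: reflex weights W_ik for the current observation i (hypothetical values).
    action_primes: action expectation states, one per action.
    """
    logits = beta_reflex * (np.asarray(w_row, dtype=float)
                            + np.asarray(action_primes, dtype=float))
    logits = logits - logits.max()  # stabilizes exp(); the softmax is unchanged
    exp_logits = np.exp(logits)
    return exp_logits / exp_logits.sum()

# Two actions with equal reflex weights; only action 1 is primed.
probs = reflex_probabilities(w_row=[1.0, 1.0], action_primes=[0.0, 4.0])
print(probs)
```

With equal reflex weights, the difference in priming alone determines the action probabilities, which is why the primed action dominates in this toy case.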

## The Episodic Graph

The Priming_ECM does not change the priming of actions on its own - expectation values must come from another system such as an episodic graph. We can create an episodic graph as follows.



In [None]:
from projective_simulation.ECMs import Episodic_Memory
em_example = Episodic_Memory(num_actions = 2, capacity = 50)
em_example

In SiPS, the episodic graph has three node types: action representations, sensory representations, and memory traces. Sensory representations and action representations (together called percept nodes) are connected to memory traces by edge weights given by the H-matrix. Because the newly initialized episodic graph has not received any observations, it has no sensory representations. It is required, however, to predefine an action space for an episodic graph, and thus the H-matrix is currently KxN (num_actions by capacity).

In [None]:
np.shape(em_example.hmatrix)

### Representing the Percept

One goal of a SiPS agent is to learn about relations between individual elements or combinations of elements in observations. Thus, the episodic graph creates a sensory representation for each new value observed in *each dimension* of an observation vector. The SiPS agent has a "Percept Processor" that keeps track of which value and dimension of an observation each sensory representation is connected to. It can thus convert any observation to a sensory excitation state. The percept of the ECM is defined as the concatenation of the sensory representation excitation state and the action representation excitation state. Let's see what happens when we give the preprocessor of a new SiPS agent a two-dimensional observation, and an action triggered by that observation.

Initially, the agent's percept dictionary only maps actions '0' and '1' to action representations 0 and 1, respectively.

In [None]:
example_SiPS = Situated_Agent(num_actions = 2, memory_capacity = 10)
example_SiPS.percept_processor.percept_dict

But after receiving a two-dimensional observation, the percept processor returns the excitation states of four percept nodes. We can see that the two additional nodes correspond to two sensory representations that have been added to the percept dictionary.

In [None]:
example_SiPS = Situated_Agent(num_actions = 2, memory_capacity = 10)
observation = np.array([0,4])
action = 1
example_SiPS.percept_processor.get_percept(observation, action)

In [None]:
example_SiPS.percept_processor.percept_dict

Note the nested structure of this dictionary. Each value of the percept dictionary is also a dictionary, which keys the sensory representations for a given dimension of the observation (or the action).

If the dimensionality of the agent's observations changes, the percept processor handles this flexibly.

In [None]:
example_SiPS.percept_processor.get_percept(observation = np.array([1,4,0]), action = 0)

In [None]:
example_SiPS.percept_processor.percept_dict
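
For readers who want to see the bookkeeping without the package, the following is a toy sketch of a nested percept dictionary. The class `ToyPerceptProcessor`, the `"action"` key, and the node numbering are all hypothetical choices made for this illustration; the actual percept processor may organize its dictionary differently.

```python
import numpy as np

class ToyPerceptProcessor:
    """Toy stand-in for a percept processor (hypothetical, for illustration only)."""

    def __init__(self, num_actions):
        # Outer keys name a dimension; inner dictionaries map a value to a node index.
        self.percept_dict = {"action": {a: a for a in range(num_actions)}}
        self.num_nodes = num_actions

    def get_percept(self, observation, action):
        values = np.asarray(observation).tolist()  # plain Python values as dict keys
        # Create a sensory representation for every unseen (dimension, value) pair.
        for dim, value in enumerate(values):
            nodes = self.percept_dict.setdefault(dim, {})
            if value not in nodes:
                nodes[value] = self.num_nodes
                self.num_nodes += 1
        # Excite the node for the chosen action and for each observed value.
        percept = np.zeros(self.num_nodes)
        percept[self.percept_dict["action"][action]] = 1.0
        for dim, value in enumerate(values):
            percept[self.percept_dict[dim][value]] = 1.0
        return percept

processor = ToyPerceptProcessor(num_actions=2)
first = processor.get_percept(np.array([0, 4]), action=1)
second = processor.get_percept(np.array([1, 4, 0]), action=0)  # one extra dimension
print(first, second, processor.percept_dict)
```

Because nodes are only ever appended, earlier percepts keep their meaning when new values or new dimensions appear; older, shorter percept vectors simply lack the newest nodes.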

### Pre-Deliberation

Once a SiPS agent has chosen an action and represented its percept, it begins to deliberate. This is the process by which the agent assesses its situated state and predicts what will happen next. In the process, it encodes relevant information about its current state to memory and potentially reorganizes existing connections in its episodic memory. We will go through the steps of deliberation one by one.

#### Adding New Sensory Representations

If the percept passed to the episodic graph by the percept processor is larger than the current number of percept nodes, indicating a new sensation, new nodes are constructed by adding elements to all of the relevant state variables. See help(Episodic_Memory.add_percept) for details.



In [None]:
example_SiPS = Situated_Agent(num_actions = 2, memory_capacity = 10) # start a new example
observation = np.array([0,2,1])
action = example_SiPS.reflex_ECM.deliberate(str(observation))
percept = example_SiPS.percept_processor.get_percept(observation, action)
example_SiPS.ECM.add_percept(percept) # adds elements to the expectation, h-matrix, trace_encoder, and action_encoder variables to account for the new observation
print(np.shape(example_SiPS.ECM.hmatrix)) # the h-matrix has three new rows (in addition to the rows for actions) because all three dimensions of the observation have new values

#### Calculating Surprise

Surprise is the primary mechanism by which SiPS learns. It is used both to ascribe an emotional valence to memory traces and to reinforce or punish the weights of edges (more on this later). The surprise of each sensory representation is computed by treating its excitation as a Bernoulli trial with probability equal to that representation's expectation state, i.e.

\begin{equation}
\mathrm{I}(S_\mathrm{j}^{(t)}) = 
\left\{
\begin{array}{ll}
  -\mathrm{log}(\hat{S}_\mathrm{j}^{(t-1)}) & \text{if } S_\mathrm{j}^{(t)} = 1  \\
  -\mathrm{log}(1-\hat{S}_\mathrm{j}^{(t-1)}) & \text{if } S_\mathrm{j}^{(t)} = 0,
\end{array}
\right.
\end{equation}

where $S_\mathrm{j}^{(t)}$ is the current excitation state of sensory representation $\mathrm{j}$ and $\hat{S}_\mathrm{j}^{(t-1)}$ is the expectation state of sensory representation $\mathrm{j}$ set in the last time step. The agent's total surprise is simply the sum of surprises across all sensory representations.
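
As a quick check of the piecewise definition, here is a standalone NumPy sketch of the Bernoulli surprise. The helper `bernoulli_surprise` is hypothetical and is not the package's `get_surprise`; it takes only sensory excitations and expectations, and clips expectations away from 0 and 1 (an assumption of this sketch) so the logarithm stays finite.

```python
import numpy as np

def bernoulli_surprise(excitations, expectations, eps=1e-12):
    """Per-node surprise: -log(p) if the node is excited, -log(1 - p) otherwise."""
    s = np.asarray(excitations, dtype=float)
    p = np.clip(np.asarray(expectations, dtype=float), eps, 1.0 - eps)  # avoid log(0)
    return -(s * np.log(p) + (1.0 - s) * np.log(1.0 - p))

sensory_excitation = np.array([1.0, 0.0, 1.0])  # made-up sensory excitation state
confident_right = bernoulli_surprise(sensory_excitation, [0.99, 0.01, 0.99]).sum()
uncertain = bernoulli_surprise(sensory_excitation, [0.5, 0.5, 0.5]).sum()
confident_wrong = bernoulli_surprise(sensory_excitation, [0.01, 0.99, 0.01]).sum()
print(confident_right, uncertain, confident_wrong)
```

Total surprise is smallest for confident, correct expectations, intermediate for uncertain ones, and largest for confident, wrong ones.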

To demonstrate, let's compare the surprise if our example agent has an expectation state that strongly predicted the current percept versus the surprise when the expectation state includes weaker or wrong predictions.

In [None]:
example_SiPS.ECM.expectations = np.array([0.99 if percept[l] == 1. else 0.01 for l in range(len(percept))]) #set expectations that strongly predict percept
surprise1 = example_SiPS.ECM.get_surprise(percept)
example_SiPS.ECM.expectations = np.array([0.01 if percept[l] == 1. else 0.99 for l in range(len(percept))]) #set expectations that wrongly predict percept
surprise2 = example_SiPS.ECM.get_surprise(percept)
example_SiPS.ECM.expectations = np.array([0.5 for l in range(len(percept))]) #set uncertain expectations
surprise3 = example_SiPS.ECM.get_surprise(percept)

print(['correct = ' + str(surprise1), 'wrong = ' + str(surprise2), 'uncertain = ' + str(surprise3)])

Note that the expectation of action representations does not affect the total surprise

In [None]:
example_SiPS.ECM.expectations[0:2] = [0.99, 0.99] # Because there are two actions, the first and second elements of the expectations variable correspond to those two action representations
example_SiPS.ECM.surprise = example_SiPS.ECM.get_surprise(percept)
print(example_SiPS.ECM.surprise)

#### Exciting Memory Traces

The next step of deliberation is to pass excitation information from percept nodes to memory traces. The excitation of a memory trace is equal to the likelihood of the current percept if the excitation of each percept node to which that trace is connected is treated like a Bernoulli trial with probability equal to a logistic transformation of the Hebbian edge weight between the two nodes. Formally,

\begin{equation}
B^{(t)}_\mathrm{n} = \prod_{l \in h_\mathrm{n}^*} f_\kappa(H^{(t)}_{l\mathrm{n}})^{P^{(t)}_l}(1-f_\kappa(H^{(t)}_{l\mathrm{n}}))^{1-P^{(t)}_l},
\end{equation}

where $h_\mathrm{n}^*$ is the set of indices denoting which percept nodes are connected to memory trace $\mathrm{n}$ at time $t$ and $f_\kappa(h)$ is the standard logistic function with scalable rate parameter $\kappa$.
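
The following NumPy sketch illustrates the excitation equation above. It is not the package's `excite_traces`: the function names are hypothetical, the logistic is assumed to take the form $1/(1+\mathrm{e}^{-\kappa h})$, the connection mask is passed in explicitly, and traces without any connections are assigned an excitation of 0 (an assumption of this sketch). The kappa values are chosen only for illustration.

```python
import numpy as np

def logistic(h, kappa=1.0):
    """Logistic function with rate parameter kappa (assumed form of f_kappa)."""
    return 1.0 / (1.0 + np.exp(-kappa * np.asarray(h, dtype=float)))

def excite_traces_sketch(percept, hmatrix, connected, kappa=1.0):
    """Likelihood of the percept under each trace.

    percept: (L,) binary excitations; hmatrix: (L, N) weights;
    connected: (L, N) boolean mask of established edges.
    """
    percept = np.asarray(percept, dtype=float)[:, None]   # shape (L, 1)
    edge_prob = logistic(hmatrix, kappa)                   # shape (L, N)
    likelihood = np.where(percept == 1.0, edge_prob, 1.0 - edge_prob)
    likelihood = np.where(connected, likelihood, 1.0)      # unconnected edges drop out
    excitation = likelihood.prod(axis=0)
    return np.where(connected.any(axis=0), excitation, 0.0)

# Five percept nodes, two traces; trace 0 is connected to nodes 0, 2, 3, 4 with weight 1.
hmatrix = np.zeros((5, 2))
connected = np.zeros((5, 2), dtype=bool)
connected[[0, 2, 3, 4], 0] = True
hmatrix[[0, 2, 3, 4], 0] = 1.0

percept = np.array([1, 0, 1, 0, 0])  # two of the four connected nodes are excited
low_kappa = excite_traces_sketch(percept, hmatrix, connected, kappa=1.0)
high_kappa = excite_traces_sketch(percept, hmatrix, connected, kappa=5.0)
print(low_kappa, high_kappa)
```

Raising kappa makes each missing excitation count more heavily against the trace, so the partially matching trace is excited far less.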

Because our example agent has not yet encoded any memory traces, there are no edges connecting memory traces to percept nodes; thus memory traces will not be excited by an agent's first percept. In the next section, we introduce memory trace encoding and then return to their excitation.

#### Encoding a New Memory Trace

Once all nodes have been excited, a new trace is encoded as a function of those excitations. Three things happen during trace encoding.

First, edges are established to all excited percept nodes. I use the trace_encoder matrix to encode which connections have been established and the hmatrix to encode the strength of connections. Right now, when an edge is established, its strength is set from 0 to 1 and will revert to 0 as a function of the ECM's damping parameter. Allowing a scalable initial strength is also an important next step for implementation.

Second, a temporal edge is established between the current memory trace and the last memory trace encoded. In the current version of SiPS, an internal variable $t$ tracks temporal increments and during encoding the element of the memory matrix $M_{t-1,t}$ is set to 1. Additionally, $M_{t,t+1}$ is set to 0, which breaks a previously established connection in the event that $\mathrm{T}$ is greater than the agent's memory capacity and memory traces are re-used. An important direction for future work will be to consider other trace selection procedures (e.g. select from traces with lowest excitation) and allowing multiple temporal connections to be established from individual memory traces.

Third, the valence of the last encoded memory trace is set to the new total surprise of the ECM.
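
The three encoding steps can be sketched on plain NumPy arrays as follows. This is an illustration under stated assumptions rather than the package's `encode_trace`: the function and argument names are hypothetical, and trace indices are assumed to wrap around modulo the memory capacity when traces are re-used.

```python
import numpy as np

def encode_trace_sketch(percept, t, hmatrix, trace_encoder, mmatrix, valences, surprise):
    """Encode the current percept into trace t (arrays are modified in place)."""
    capacity = mmatrix.shape[0]
    # Assumption: indices wrap around once t exceeds the memory capacity.
    current, previous, following = t % capacity, (t - 1) % capacity, (t + 1) % capacity
    excited = np.asarray(percept) == 1
    # 1. Establish edges from all excited percept nodes with initial strength 1.
    trace_encoder[:, current] = excited
    hmatrix[:, current] = np.where(excited, 1.0, 0.0)
    # 2. Link the previously encoded trace to this one; clear this trace's stale outgoing link.
    mmatrix[previous, current] = 1.0
    mmatrix[current, following] = 0.0
    # 3. The previously encoded trace receives the current total surprise as its valence.
    valences[previous] = surprise

num_percepts, capacity = 4, 3
hmatrix = np.zeros((num_percepts, capacity))
trace_encoder = np.zeros((num_percepts, capacity), dtype=bool)
mmatrix = np.zeros((capacity, capacity))
valences = np.zeros(capacity)

encode_trace_sketch([0, 1, 1, 0], 0, hmatrix, trace_encoder, mmatrix, valences, surprise=2.5)
encode_trace_sketch([1, 0, 1, 0], 1, hmatrix, trace_encoder, mmatrix, valences, surprise=0.7)
print(hmatrix, mmatrix, valences, sep="\n")
```

Under the wrap-around assumption, the very first encoding assigns a valence and a temporal edge to the final trace slot, which has no percept connections yet.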

If we look at our example agent prior to encoding, we will see that it has no connections in the trace_encoder and all weights in the hmatrix are 0. After encoding, new connections and weights have been established.

In [None]:
print(example_SiPS.ECM.trace_encoder)
print(example_SiPS.ECM.hmatrix)
example_SiPS.ECM.encode_trace(percept)
print(example_SiPS.ECM.trace_encoder)
print(example_SiPS.ECM.hmatrix)

Likewise, the last trace has a new valence and the memory matrix has a new temporal connection that points to this trace. Because it is the first time step, however, this trace has no connected edges in the h-matrix and is of no consequence. The valence is high because the agent sets a low expectation on sensory representations by default and we did not set new expectations, either manually or by completing the process of deliberation.

In [None]:
print(example_SiPS.ECM.mmatrix)
print(example_SiPS.ECM.valences)

To understand how trace excitation works, we can set the ECM's internal increment tracker forward and encode a new observation.

In [None]:
example_SiPS.ECM.t += 1
observation = np.array([0,5,5]) #the second and third elements of this observation are new, the first element is the same as the first observation
#~repeat previously described steps
action = example_SiPS.reflex_ECM.deliberate(str(observation))
percept = example_SiPS.percept_processor.get_percept(observation, action)
example_SiPS.ECM.add_percept(percept)
example_SiPS.ECM.excite_traces(percept)
#~
#Encode new trace
example_SiPS.ECM.encode_trace(percept)

print(example_SiPS.ECM.hmatrix)
print(example_SiPS.ECM.trace_excitations)

Using the default value for $\kappa$ (1) and a newly established h-value (1), $f_\kappa(1) \approx 0.731$. It can then be seen that the excitation of memory trace 0 is $\approx 0.731^n(1 - 0.731)^{k-n}$, where $k$ is the number of connections from percept nodes to memory trace 0 and $n$ is the number of connected percept nodes that are excited (in this example, $k = 4$ and $n$ is either 1 or 2, depending on whether or not the agent took the same action in both steps). The SiPS agent's kappa parameter adjusts the scaling parameter of the standard logistic function, such that higher values of kappa mean the agent will assign a higher likelihood to the excitation of a percept node given that the connection weight remains constant. If we set kappa very high, for example, the agent will treat the failure to excite even two sensory representations connected to memory trace 0 as extremely unlikely if the current world state were represented by that trace, and thus the excitation of memory trace 0 given our new percept will be much lower.

In [None]:
example_SiPS.ECM.kappa = 5
example_SiPS.ECM.excite_traces(percept)
print(example_SiPS.ECM.trace_excitations)
example_SiPS.ECM.kappa = 1 #reset for continued use in vignette
example_SiPS.ECM.trace_excitations[1] = 0 #reset for continued use in vignette

Because we have now encoded the second trace, it also becomes excited. Naturally, the excitation of a trace that encodes the current percept will be as close to 1 as the given kappa allows - it is only connected to excited percept nodes! This is one reason why traces are encoded AFTER they are excited: so as not to bias the agent's attention toward the obvious fact that its current world state looks identical to its current world state.

### Projection

Having excited the ECM as a function of the percept, we now initialize the agent's attention, beginning the actual deliberative process. I've used "activation" to denote the agent's attention because "attention" is a different (but closely related) and well-known mechanism in ML, but conceptually it remains useful to think of the ECM's activation state as reflecting the weight of the agent's attention to the concept represented by a node, whether it is a sensation, an action, or a memory. The general idea of deliberation is that the agent begins with all of its attention on the just-encoded memory trace; then, the attention diffuses across the ECM as if it were composed of particles performing a random walk on the ECM graph, thus "projecting" its attention to other possible situated states. Currently, activation states are equivalent to the expected proportion of particles on a given node, given the current length of deliberation. This is calculated iteratively. Thus, it would be reasonably straightforward to implement the stochastic random walk of some number of discrete particles.

There is also a stochastic element even in the current implementation. This element reflects "Projective Simulation" on the graph, and can be scaled by the agent's focus parameter, $\theta$. Projective Simulation selects a random edge leading from each node of the ECM, with probability equal to that of an independent particle diffusing from the node, and increases the actual probability that particles diffuse along this edge. The probability is increased such that a proportion of particles (equal to the focus parameter) in that node that would be expected to diffuse along other edges diffuse along the selected edge instead. Thus, when focus is equal to 1, the entirety of the agent's attention mass walks along the graph as a single unit, and the process is functionally equivalent to the system of Projective Simulation laid out in Briegel and De las Cuevas 2012.

Formally, we define the probability that an edge $l^*$ selected by Projective Simulation connects memory trace $n$ to percept node $l$ as

\begin{equation}
\mathrm{Pr}_\beta(l^*=l|H^{(t)},n) = \frac{\mathrm{e}^{\beta H^{(t)}_{l\mathrm{n}}}}{\sum_{l'\in h^*_\mathrm{n}}{\mathrm{e}^{\beta H^{(t)}_{l'\mathrm{n}}}}}.
\end{equation}

where $\beta$ is the episodic graph's softmax parameter and $h^*_\mathrm{n}$ is the set of indices for percept nodes connected to memory trace $\mathrm{n}$.

Following from this definition and our assumption that $Pr_\beta(l^* = l)$ is equivalent to the probability that an unbiased particle in memory trace $\mathrm{n}$ diffuses along connection $l$, the activation state of a percept node $l$ during deliberative step $\tau > 0$ at time $t$ is thus given as

\begin{equation}
\tilde{P}^{(t,\tau)}_l = \sum_{\mathrm{n}=1}^\mathrm{N} \tilde{B}^{(t,\tau-1)}_\mathrm{n} (\mathrm{Pr}_\beta(l^* = l|H^{(t)},\mathrm{n}) + \theta \cdot
\left\{
\begin{array}{ll}
  (1-\mathrm{Pr}_\beta(l^* = l|H^{(t)},\mathrm{n})) & \text{if } l^* = l  \\
  -\mathrm{Pr}_\beta(l^* = l|H^{(t)},\mathrm{n}) & \text{otherwise }
\end{array}
\right\}),
\end{equation}

where $\tilde{B}^{(t,\tau-1)}_\mathrm{n}$ is the activation state of memory trace $\mathrm{n}$ during the previous deliberation step and $\theta$ is the agent's focus parameter.
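
To illustrate how focus reshapes the diffusion of activation, here is a minimal NumPy sketch of one step from memory traces to percept nodes. It is a sketch of the equation above, not the package's implementation: the function name, the explicit `connected` mask, and the example values are hypothetical.

```python
import numpy as np

def project_to_percepts(trace_activations, hmatrix, connected, beta=1.0, focus=0.0, rng=None):
    """One deliberation step: diffuse activation from traces (columns) to percept nodes (rows)."""
    rng = np.random.default_rng() if rng is None else rng
    num_percepts, num_traces = hmatrix.shape
    percept_activations = np.zeros(num_percepts)
    for n in range(num_traces):
        edges = np.flatnonzero(connected[:, n])
        if trace_activations[n] == 0.0 or edges.size == 0:
            continue  # nothing to diffuse from this trace
        # Softmax over the weights of the edges leaving trace n.
        logits = beta * hmatrix[edges, n]
        probs = np.exp(logits - logits.max())
        probs /= probs.sum()
        # Projective Simulation picks one edge l* with the unbiased diffusion probabilities.
        selected = rng.choice(edges.size, p=probs)
        biased = (1.0 - focus) * probs   # p - focus * p for unselected edges
        biased[selected] += focus        # p + focus * (1 - p) for the selected edge
        percept_activations[edges] += trace_activations[n] * biased
    return percept_activations

# Five percept nodes, two traces; all attention starts on trace 1,
# which is connected to percept nodes 1, 2, and 4 with equal weights.
hmatrix = np.zeros((5, 2))
connected = np.zeros((5, 2), dtype=bool)
connected[[1, 2, 4], 1] = True
hmatrix[[1, 2, 4], 1] = 1.0
trace_activations = np.array([0.0, 1.0])

rng = np.random.default_rng(0)
diffuse = project_to_percepts(trace_activations, hmatrix, connected, focus=0.0, rng=rng)
half = project_to_percepts(trace_activations, hmatrix, connected, focus=0.5, rng=rng)
focused = project_to_percepts(trace_activations, hmatrix, connected, focus=1.0, rng=rng)
print(diffuse, half, focused, sep="\n")
```

With focus at 0 the activation spreads evenly over the connected percept nodes, and with focus at 1 all of it lands on the single selected node; intermediate values interpolate between the two while conserving the total activation.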

#### Initial activation

Deliberation begins at step $\tau = 0$ by setting the activation of the just-encoded memory trace to 1 and the activation of all other nodes to 0.

Because the just-encoded memory trace is only connected to excited percept nodes and with equal weights, the SiPS agent doesn't bother to run the first step of deliberation as normal, and simply computes the expected proportion of attention that would diffuse to percept nodes from the just-encoded trace and initially activates those percept nodes accordingly. Thus, if the focus parameter is zero (the default value), the initial activation of excited percept nodes will be 1 divided by the number of excited percept nodes.

In [None]:
example_SiPS.ECM.activate()
print("excitations: " + str(percept))
print("activations: " + str(example_SiPS.ECM.percept_activations))

Adjusting the focus parameter of the agent will shift a proportional amount of this activation to a randomly selected, excited percept node:

In [None]:
example_SiPS.ECM.focus = 0.5
example_SiPS.ECM.activate()
print("excitations: " + str(percept))
print("focus = 0.5 activations1: " + str(example_SiPS.ECM.percept_activations))
example_SiPS.ECM.activate()
print("focus = 0.5 activations2: " + str(example_SiPS.ECM.percept_activations))
example_SiPS.ECM.focus = 1
example_SiPS.ECM.activate()
print("focus = 1 activations1: " + str(example_SiPS.ECM.percept_activations))
example_SiPS.ECM.activate()
print("focus = 1 activations2: " + str(example_SiPS.ECM.percept_activations))

#### Continuing Projection

Once the activation state of a SiPS agent has been thus initiated, we can simply propagate the activation state forward across increments of $\tau$ - except we have not yet defined how the activation state moves from percept nodes to memory traces! This definition is very similar to that for the diffusion of activation from memory traces to percept nodes. However, in this case the probability of diffusing along an edge is also influenced by both the expectation state and excitation state of the memory trace to which that edge is connected. Here is how to think of this: the probability that a 'particle' of attention moves from a representation of a sensation or action in the agent's repertoire to a particular memory trace increases with each of the following: the associative strength between the given perceptual representation and the memory trace, the agent's prior belief that the memory trace will effectively represent its new world state, and the strength of immediate sensory evidence that the memory trace effectively represents the agent's new world state.

We have previously defined the expectation state of a memory trace as the agent's prior belief that the given trace will effectively represent its current world state and the excitation state of a memory trace as the probability that the memory trace effectively represents the current world state given new sensory evidence. With this in mind, we define the probability that projective simulation from percept representation $l$ selects memory trace $\mathrm{n}$ as equivalent to the probability of a particle of attention diffusing from $l$ to $\mathrm{n}$, given as

\begin{equation}
\mathrm{Pr}_\beta(\mathrm{n}^*=\mathrm{n}|H^{(t)},l) = \frac{\mathrm{e}^{\beta H^{(t)}_{l\mathrm{n}}B^{(t)}_\mathrm{n}\hat{B}^{(t-1)}_\mathrm{n}}}{\sum_{\mathrm{n}'\in h^*_l}{\mathrm{e}^{\beta H^{(t)}_{l\mathrm{n}'}B^{(t)}_{\mathrm{n}'}\hat{B}^{(t-1)}_{\mathrm{n}'}}}}.
\end{equation}

where $\beta$ is the episodic graph's softmax parameter and $h^*_l$ is the set of indices for memory traces connected to percept node $l$. (NOTE: Putting the expectation and excitation states of the trace nodes in the exponent significantly dilutes their effect and divorces them from their meaning as probabilities. However, taking them out of the exponent leads to problems when either is 0. I have some thoughts about changing the overall structure such that these terms can be taken out of the exponent without causing problems, leading to better learning and more interpretability.) We can then define the activation state of a memory trace $\mathrm{n}$ during deliberative step $\tau > 0$ at time $t$ exactly as we did for percept nodes:

\begin{equation}
\tilde{B}^{(t,\tau)}_\mathrm{n} = \sum_{l=1}^{|P^{(t)}|} \tilde{P}^{(t,\tau-1)}_l (\mathrm{Pr}_\beta(\mathrm{n}^* = \mathrm{n}|H^{(t)},l) + \theta *
\left\{
\begin{array}{ll}
  (1-\mathrm{Pr}_\beta(\mathrm{n}^* = \mathrm{n}|H^{(t)},l)) & \text{if } \mathrm{n}^* = \mathrm{n}  \\
  -\mathrm{Pr}_\beta(\mathrm{n}^* = \mathrm{n}|H^{(t)},l) & \text{otherwise }
\end{array}
\right\}),
\end{equation}

where $|P^{(t)}|$ is the number of percept nodes in the episodic graph at time $t$.
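To make the effect of placing the excitation and expectation states inside the exponent concrete, here is a small sketch of the percept-to-trace softmax for a single percept node. The function name `trace_probabilities` and all numerical values are hypothetical; only the formula follows the definition above.

```python
import numpy as np

def trace_probabilities(h_values, excitations, expectations, beta):
    """Softmax over traces connected to one percept node, with h-value,
    excitation, and prior expectation multiplied together in the exponent."""
    logits = (beta
              * np.asarray(h_values, dtype=float)
              * np.asarray(excitations, dtype=float)
              * np.asarray(expectations, dtype=float))
    weights = np.exp(logits - logits.max())
    return weights / weights.sum()

h = np.array([1.0, 1.0])            # equal associative strength to two traces
excitation = np.array([0.1, 0.5])   # second trace has five times the excitation
expectation = np.array([0.1, 0.1])

for beta in (1.0, 10.0, 100.0):
    print("beta =", beta, trace_probabilities(h, excitation, expectation, beta))

# If excitation or expectation is zero for every trace, all exponents are zero
# and the activation is divided evenly.
print(trace_probabilities(h, [0.0, 0.0], [0.0, 0.0], beta=10.0))
```

With these values and $\beta = 1$, the second trace receives only about 51% of the activation despite having five times the excitation, which is the dilution described in the note above. Swapping the excitation and expectation vectors gives identical probabilities, because only their product enters the exponent.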

Thus, using the equations to update the activation states of both percept nodes and memory traces, the agent's attention can be propagated forward over intervals of $\tau$ for some number of steps $\mathrm{D}$, which we call the deliberation length. We will demonstrate shortly how the activation state of the agent when $\tau = \mathrm{D}$ is used as an updated belief state to make predictions, but first let us provide an example of deliberation. We start with a case where focus is zero, which emulates normative Bayesian filtering and removes the stochastic element of projective simulation:

In [None]:
example_SiPS.ECM.focus = 0
example_SiPS.ECM.activate()
print("percept activations = ", str(example_SiPS.ECM.percept_activations))
print("trace activations = ", str(example_SiPS.ECM.trace_activations))

In [None]:
example_SiPS.ECM.hmatrix

In [None]:
example_SiPS.ECM.diffuse_activation()
print("percept activations = ", str(example_SiPS.ECM.percept_activations))
print("trace activations = ", str(example_SiPS.ECM.trace_activations))

You might note that activation in a percept node with connections to both memory traces with an encoding becomes divided evenly between the two traces. This is because the expectation state of both traces is zero (the agent has no information in its first time step with which to make a prediction) and the excitation state of both traces is effectively 0 (the first trace has some negligible excitation because it is connected to two excited percept nodes, but the agent treats the two edges connected to unexcited percept nodes as strong evidence its situation is not well represented by this trace - recall that this is a function of the h-values connecting those nodes and the agent's $\kappa$ parameter). If we manually add excitation or expectation to a trace, we can observe a difference. First with excitation . . .

In [None]:
example_SiPS.ECM.activate()
example_SiPS.ECM.trace_excitations[0:2] =[0.1,0.5]
example_SiPS.ECM.beliefs[0:2] = [0.1,0.1]
example_SiPS.ECM.diffuse_activation()
print("trace activations = ", str(example_SiPS.ECM.trace_activations))

. . . and the same results with trace expectation/belief weights:

In [None]:
example_SiPS.ECM.activate()
example_SiPS.ECM.trace_excitations[0:2] = [0.1,0.1]
example_SiPS.ECM.beliefs[0:2] = [0.1,0.5]
example_SiPS.ECM.diffuse_activation()
print("trace activations = ", str(example_SiPS.ECM.trace_activations))

Note that in this example, excitation and expectation of encoded memory traces were set to non-zero values. Because the edge weight, trace expectation, and trace excitation are multiplied to get transition probabilities, a zero for any of these values means the other two are disregarded. The edge weight will never be zero (it is in the exponent) and the excitation of a previously encoded trace will also never be zero (it is a likelihood derived from events with probabilities given by the logistic function). As the focus parameter approaches 1, however, expectation states on some encoded traces may diminish toward zero, and *can* be zero if the focus is one or attention is treated as discrete units instead of a proportion. In other words, an agent with a high focus parameter strongly weights its expectation toward a single situated state (memory trace) and will always direct its attention to that trace when possible, regardless of the sensory evidence. We will come eventually to why this is an important feature of the SiPS agent. For now, we must first finish describing the deliberative process.

### Simulation and Prediction

Once the agent has deliberated for $\mathrm{D}$ steps, it "simulates" possible futures based on its updated belief about its situation. First, it predicts its next situation by passing the activation of its memory traces forward along the memory matrix,

\begin{equation}
\hat{\boldsymbol{B}}^{(t)} = \tilde{\boldsymbol{B}}^{(t,\mathrm{D})} \boldsymbol{M}^{(t)}
\end{equation}

With the structure of the Memory matrix in the current implementation, this means that the new expectation state of an encoded memory trace $\mathrm{n}$ is simply set to the final activation state of memory trace $\mathrm{n} - 1$.
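A minimal sketch of this step, assuming the memory matrix simply links each trace $\mathrm{n} - 1$ forward to trace $\mathrm{n}$ (ones on the superdiagonal). The activation values are hypothetical, and the real matrix in the implementation may be stored or indexed differently.

```python
import numpy as np

num_traces = 4
# Assumed structure: entry [n-1, n] is 1, so trace n-1 passes activation to trace n.
memory_matrix = np.eye(num_traces, k=1)

# Hypothetical trace activations at the end of deliberation (tau = D).
final_activations = np.array([0.1, 0.6, 0.2, 0.1])

# Row vector times matrix, as in the equation above.
new_expectations = final_activations @ memory_matrix
print(new_expectations)  # [0.  0.1 0.6 0.2]
```

Under this assumption the first trace receives no expectation, and the activation of the last trace is dropped because that trace has no forward connection.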

The new expectation states of memory traces are then passed to the expectation state of percept nodes as a function of the associative connections in the h-matrix, 

\begin{equation}
\hat{S}^{(t)}_\mathrm{j} = f_\epsilon(\alpha_\mathrm{j}, \sum_{\mathrm{n} \in \mathrm{h^*_j}} \hat{B}^{(t)}_\mathrm{n}f_\kappa(H^{(t)}_\mathrm{jn}))
\end{equation}

where $\mathrm{h^*_j}$ is the set of indices for memory traces that are connected to sensory representation $\mathrm{j}$, $\alpha_\mathrm{j}$ is the intrinsic expectation of sensory representation $\mathrm{j}$, and $f_\epsilon(\cdot)$ is an exponentiation with shift by a small constant $\epsilon$, which SiPS sets to $10^{-2}$. This function takes the form

\begin{equation}
f_\epsilon(\alpha,x) = \frac{(x + \epsilon)^{\mathrm{e}^\alpha}}{(1 + \epsilon)^{\mathrm{e}^\alpha}}.
\end{equation}

Why use this transformation? Recall that $f_\kappa(H^{(t)}_\mathrm{jn})$ can be understood as the probability with which the agent believes perceptual representation $\mathrm{j}$ will be excited, given that memory trace $\mathrm{n}$ is an effective representation of its situated state, and that $\hat{B}^{(t)}_\mathrm{n}$ can be understood as the probability with which the agent believes its next situated state will be well represented by memory trace $\mathrm{n}$. Summing the product of these terms over all memory traces gives the probability with which the agent believes sensory representation $\mathrm{j}$ will be excited in its next situated state. We want an ECM's positive *intrinsic expectation* for a particular sensation to increase the expectation state of that sensation's representation even if the agent believes there is a zero percent chance that it will be excited. Likewise, a negative intrinsic expectation should decrease the expectation state of a representation, except if it is already 0, and no intrinsic expectation should leave the expectation state unchanged. However, we don't want the agent's intrinsic expectation to cause a sensory representation's expectation state to leave the space between 0 and 1. The shifted exponential achieves this (there is technically a very small increase when $\alpha = 0$, I'm open to suggestions), while also ensuring the expectation state of a sensory representation can never be zero, which would result in infinite surprise.
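The following sketch evaluates $f_\epsilon$ exactly as written in the equation above, so its behavior can be inspected directly. The function name `f_epsilon` and the grid of test values are illustrative choices, not part of the SiPS code.

```python
import numpy as np

EPSILON = 1e-2  # the shift constant stated in the text

def f_epsilon(alpha, x, epsilon=EPSILON):
    """Shifted exponentiation: ((x + eps) / (1 + eps)) ** exp(alpha)."""
    return ((x + epsilon) / (1.0 + epsilon)) ** np.exp(alpha)

# Tabulate the function for a few summed probabilities and intrinsic expectations.
for alpha in (-2.0, 0.0, 2.0):
    row = [float(f_epsilon(alpha, x)) for x in (0.0, 0.5, 1.0)]
    print("alpha =", alpha, np.round(row, 4))
```

The output stays strictly above 0 and never exceeds 1 for inputs between 0 and 1, and with $\alpha = 0$ an input of 0.5 maps to $0.51 / 1.01$, the very small increase mentioned above. Read literally, a larger $\alpha$ gives a larger exponent and therefore a smaller output for inputs below 1, so the sign convention attached to $\alpha$ in the implementation is worth checking against the intent described in the text.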

Setting the expectation states for action representations is less straightforward. A key goal of SiPS is to model the way animals make fast-and-frugal decisions in real time to enable continuous action. As such, a SiPS agent does not stop to deliberate between receiving new sensory information and acting; it acts reflexively and uses concurrent cognitive processing (deliberation) to *prime* or *inhibit* certain actions based on its expected situation. Thus, the expectation state of an action representation should carry information about the *value* of that action given the expected situation(s). In accordance with the Free Energy Principle, value for a SiPS agent is measured by surprise, or more precisely: the *reduction of expected surprise*. The valence state of a memory trace denotes how surprised the agent was by the sensory evidence that followed the situated state that the trace represents. Here, it hopefully becomes apparent why it is important that the SiPS agent includes its action when encoding a situated state in a memory trace - if it expects to be in a situation that is well represented by that trace, it can prime or inhibit associated actions as a function of the trace's valence. The world is generally surprising, however (surprise is positive-real by definition), and the agent needs some threshold to determine when to prime and when to inhibit actions. For now, the SiPS agent simply compares the valence of a memory trace against the average valence of all memory traces. The log quotient of these two values is called the surprise advantage (NOTE: is "advantage" only the difference?); it is negative when the valence of a trace is greater than the average (events after the trace were more surprising than average), leading to inhibition, and positive when the valence is lower than average.


$$
\hat{A}^{(t)}_\mathrm{k} = \sum_{\mathrm{n} \in \mathrm{h^*_k}} \hat{B}^{(t)}_\mathrm{n}H^{(t)}_\mathrm{kn}\log\left(\frac{\bar{\boldsymbol{V}}^{(t)}}{v^{(t)}_\mathrm{n}}\right).
$$
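As a rough sketch of this priming rule, the following computes the expectation state of an action node from the traces connected to it. The function name `action_priming`, the connection masks, and all values are hypothetical; the sketch also omits the special case in which a just-encoded trace has no valence yet.

```python
import numpy as np

def action_priming(trace_expectations, h_row, connected, valences):
    """Sum of expectation * h-value * surprise advantage over connected traces."""
    trace_expectations = np.asarray(trace_expectations, dtype=float)
    h_row = np.asarray(h_row, dtype=float)
    valences = np.asarray(valences, dtype=float)  # assumed strictly positive
    connected = np.asarray(connected, dtype=bool)
    # Surprise advantage: log of the average valence over each trace's valence.
    advantage = np.log(valences.mean() / valences)
    return float(np.sum(trace_expectations[connected]
                        * h_row[connected]
                        * advantage[connected]))

expectations = [0.1, 0.7, 0.2]   # hypothetical trace expectation states
valences = [2.0, 4.0, 0.5]       # hypothetical surprise recorded after each trace
h_row = [1.0, 1.0, 1.0]

# Action 0 is linked to the first two traces, action 1 only to the third.
prime_0 = action_priming(expectations, h_row, [True, True, False], valences)
prime_1 = action_priming(expectations, h_row, [False, False, True], valences)
print("action 0:", prime_0)
print("action 1:", prime_1)
```

In this made-up example, action 0 is inhibited (negative value) because most of its expectation sits on a trace that was followed by above-average surprise, while action 1 is primed (positive value) even though its only trace carries less expectation, because that trace was followed by much less surprise than average.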

A SiPS agent updates all three of these expectation variables when it calls the predict function:

In [None]:
print("prior trace expectation (set manually): " + str(example_SiPS.ECM.beliefs))
print("activation state: "+ str(np.round(example_SiPS.ECM.trace_activations, decimals = 3)))
example_SiPS.ECM.predict()
print("new trace expectation: " + str(example_SiPS.ECM.beliefs))
print("new sensory expectation :" + str([np.round(example_SiPS.ECM.expectations[i], decimals = 6) for i in range(len(example_SiPS.ECM.expectations)) if not example_SiPS.ECM.action_encoder[i]]))
print("new action priming :" + str([np.round(example_SiPS.ECM.expectations[i], decimals = 6) for i in range(len(example_SiPS.ECM.expectations)) if example_SiPS.ECM.action_encoder[i]]))

There are a few things to note here.

First, the activation of the just-encoded memory trace goes nowhere because that trace has no forward connections in the memory matrix. One can think of this like the agent failing to make a prediction because it has no relevant previous experience. We artificially inflated the activation of this trace for the sake of our example, but because the just-encoded trace will normally have no expectation or excitation weight, it should draw very little of the agent's attention.

Second, the just-encoded memory trace does not yet have a valence and therefore will not prime actions (the surprise advantage term is set to zero).

Third, the effect is very small. This is related to my note about putting expectation and excitation states of memory traces in the exponent of the softmax function. For now, it can be addressed by giving the ECM a high softmax value, but I have some more elegant solutions in mind.



In [None]:
example_SiPS.ECM.valences

Let's create a new agent and run it for three steps to see how action priming works. We will look at the state of the trace activations after projection but before it makes predictions in the fourth step. 

In [None]:
#initialize
example_SiPS2 = Situated_Agent(num_actions = 2, memory_capacity = 10, PS_softmax = 5) # start a new example
#step 1
observation = np.array([0])
action = 0
percept = example_SiPS2.percept_processor.get_percept(observation, action)
example_SiPS2.ECM.deliberate(percept)
#step 2. Different percept from step 1
observation = np.array([1])
action = 0
percept = example_SiPS2.percept_processor.get_percept(observation, action)
example_SiPS2.ECM.deliberate(percept)
#step 3. Different percept from steps 1 and 2
observation = np.array([0])
action = 1
percept = example_SiPS2.percept_processor.get_percept(observation, action)
example_SiPS2.ECM.deliberate(percept)
#step 4 (test). Same percept as step two - we should see expectation on trace three and associated percept nodes
observation = np.array([1])
action = 0
percept = example_SiPS2.percept_processor.get_percept(observation, action)
example_SiPS2.ECM.add_percept(percept)
example_SiPS2.ECM.surprise = example_SiPS2.ECM.get_surprise(percept)
example_SiPS2.ECM.excite_traces(percept)
example_SiPS2.ECM.encode_trace(percept)
example_SiPS2.ECM.activate()
for deliberation_step in range(example_SiPS2.ECM.deliberation_length):
    example_SiPS2.ECM.diffuse_activation()
    
print(example_SiPS2.ECM.trace_activations)

Before prediction, we can see that deliberation has moved the majority of the agent's attention to the second memory trace.

In [None]:
example_SiPS2.ECM.predict()
print("trace expectations: " + str(example_SiPS2.ECM.beliefs))
print("percept expectations: " + str(example_SiPS2.ECM.expectations))

Indeed, we can see that the new expectation of the third memory trace and the sensory representation with which it is associated (the third percept node) are higher than other nodes of the same type. Based on the priming of the actions (first two percept nodes), we can see that action zero (first percept node, associated with the third memory trace) has been inhibited. From this, we can infer that the valence assigned to memory trace three is slightly larger than average. Action 1 (second percept node), on the other hand, has been primed, despite the only trace with which it is associated having lower expectation. We can infer, then, that this trace has a much lower valence than other traces. Let's double check.

In [None]:
example_SiPS2.ECM.valences

Why do the traces have these valences? Look back at the order of percepts the agent received and let's consider again how prediction and surprise work. After the agent takes its first action, it has no experience with which to predict the outcome. It thus has low expectations overall and is necessarily surprised by what it observes. After the agent takes its second action, it has only one experience to consider: in the first time step it took the same action as its current action, then observed a 1, so it adds expectation to observing a 1. But it observes a 0, so the outcome of its second action is even more surprising than its first. Finally, in its third time step, the agent takes action 1 for the first time (so it can't use this information to make predictions), but its first step has the same sensory state as its current one, and that was followed by a 1. The agent's deliberation thus adds expectation to sensory representation 1, and when the agent observes a 1 it is much less surprised than it has ever been before. Hence the low valence and the priming of the action that it took (1) when that trace gains expectation.

### How Does SiPS Learn?

Using only the properties described so far, SiPS agents *should* learn in certain environments without the need for projective simulation or reinforcement. Specifically, a SiPS agent with a focus parameter of zero and no reinforcement of either reflexes or Hebbian associations should learn to maximize the excitation of sensory representations with intrinsic expectations if the state of the agent's sensors as it interacts with the environment can be described by a Markov Decision Process. Such a SiPS agent should still learn if its interactions with the environment can be described as a Partially Observable Markov Decision Process, so long as reaching an expected sensory state does not require taking correct actions in *sequences* of aliased states.

However, the issues described in notes regarding how attention is affected by the excitation and expectation states of memory traces prove fatal to effective learning.

I have set up an environment for testing, below.

The Delayed Response environment progresses in terms of trials. At the start of each trial, the agent is presented with a stimulus, which subsequently disappears. The agent must wait some number of steps before taking a correct action associated with the stimulus that was presented at the beginning of the trial. If it takes that action, it is presented with a "reward" before the next trial begins. If it chooses an action associated with a different stimulus, the next trial begins immediately without a "reward" being presented. We put reward in quotes here because this reward is not used to directly reinforce weights internal to the agent; it is simply an environmental state that is passed to the agent as part of its observation and for which it is assumed the SiPS agent has an intrinsic expectation. We first consider the case where the agent is presented with one of two possible stimuli, and must wait 1 step (no more than 10) before choosing an action (taking an action before the delay period does not cause the trial to end).

In [None]:
#initialize agent and environment
test_env = Delayed_Response(W = 1, N = 2, max_trial_length=2)
# The delayed response environment gives a two-dimensional observation.
# The first dimension gives the current stimulus (1-N on first step of trial, 0 thereafter)
# The second dimension gives the reward availability (1 means the reward has been attained)
# Thus, we need to give the agent an intrinsic expectation for "1" at index 1 (the second dimension) of its observation, established using the nested dictionary structure of the agent's percept library
test_SiPS = Situated_Agent(episode_ECM = Episodic_Memory(num_actions = 3, capacity = 100, intrinsic_expectations = {"1": {"1":2}}, softmax = 10, kappa = 4), 
                           reflex_softmax = 10,
                           num_actions = 3) #agent will need three actions, one to wait and one for each test stimulus in the environment


#set up experiment monitoring and data collection variables
trial_num = 0
reward_attained = False
total_trials = 50
action = None
trial_data = np.full((total_trials, 4), np.nan) #each trial, will collect the presented stimulus, the selected action, the trial length, and whether the reward was attained (NaN until recorded)



In [None]:
while trial_num < total_trials:
    
    if test_env.state["trial_time"] == 0: #check if new trial
        trial_data[trial_num,0] = test_env.state["current_stimulus"] #record which stimulus the agent was presented with
        
    observation = test_env.get_observation()
    action = test_SiPS.get_action(observation)
    #check if agent acted after wait period to collect data
    if test_env.state["trial_time"] >= test_env.W and action != 0:
        trial_data[trial_num,1] = action
        trial_data[trial_num,2] = test_env.state["trial_time"]
        trial_data[trial_num,3] = test_env.state["rewarded_action"] == action

    test_env.transition(action)
    #if transition starts new trial, increase counter (ends experiment if final trial)
    if test_env.state["trial_time"] == 0: #check if new trial
        print("trial " + str(trial_num) + " results: " + str(trial_data[trial_num,0:]))
        trial_num += 1

In [None]:
# STEP BY STEP TESTING

# if test_env.state["trial_time"] == 0: #check if new trial
#     trial_data[trial_num,0] = test_env.state["current_stimulus"] #record which stimulus the agent was presented with

# print(test_env.state)
# print("beliefs")
# print(test_SiPS.ECM.beliefs[0:20])
# print("expectations")
# print(test_SiPS.ECM.expectations)

# observation = test_env.get_observation()
# action = test_SiPS.get_action(observation)
# print("Action: " + str(action))
# #check if agent acted after wait period to collect data
# if test_env.state["trial_time"] >= test_env.W and not action == 0: 
#     trial_data[trial_num,1] = action
#     trial_data[trial_num,2] = test_env.state["trial_time"]
#     trial_data[trial_num,3] = test_env.state["rewarded_action"] == action

# test_env.transition(action)
# #if transition starts new trial, increase counter (ends experiment if final trial)
# if test_env.state["trial_time"] == 0: #check if new trial
#     trial_num += 1



# print("excitations")
# print(test_SiPS.ECM.trace_excitations[0:20])
# print("activations")
# print(test_SiPS.ECM.trace_activations[0:20])


In [None]:
test_SiPS.reflex_ECM.action_primes

In [None]:
test_SiPS.ECM.valences