In [None]:
#| default_exp basic_PS

# Basic PS
> This notebook gathers the most basic implementation of projective simulation (PS).

(currently copy-pasted from [here](https://github.com/HendrikPN/rl-ion-trap-tutorial/blob/master/ps.py))



In [3]:
#| export

import numpy as np

class PSAgent(object):
    def __init__(self, 
                 num_actions: int, # The number of available actions.
                 glow: float = 0.1, # The glow (or eta) parameter. 
                 damp: float = 0., # The damping (or gamma) parameter. 
                 softmax: float = 0.1 # The softmax (or beta) parameter. 
                ):
        """
        Simple, 2-layered projective simulation (PS) agent. We initialize an h-matrix with a single row of `num_actions` 
        entries corresponding to a dummy percept clip being connected to all possible actions with h-values of all 1. We 
        initialize a g-matrix with a single row of `num_actions` entries with all 0s corresponding to the *glow* values 
        of percept-action transitions.
                      
        NOTE: This simple version misses some features such as clip deletion, emotion tags or generalization mechanisms.
        
        """
        self.num_actions = num_actions
        self.glow = glow
        self.damp = damp
        self.softmax = softmax
        #int: current number of percepts.
        self.num_percepts = 0
        #np.ndarray: h-matrix with current h-values. Defaults to all 1.
        self.hmatrix = np.ones([1,self.num_actions])
        #np.ndarray: g-matrix with current glow values. Defaults to all 0.
        self.gmatrix = np.zeros([1,self.num_actions])
        #dict: Dictionary of percepts as {"percept": index}
        self.percepts = {}
        
    def predict(self, 
                observation: object # A percept in form of an object.
               )-> int : # The action to be performed.
        """
        Given an observation, returns an action.
        (1) Create a percept from an observation.
        (2) Add percept if it has not been encountered before.
        (3) Get action from h-values.
        (4) Update g-matrix.
        """
        # (1) create percept from observation
        percept = self._get_percept(observation)
        # (2) add percept to clip network if it has not been encountered before
        if percept not in self.percepts.keys():
            # add new percept
            self.percepts[percept] = self.num_percepts
            # increment number of percepts
            self.num_percepts += 1
            # add row to h-matrix (one row per percept, axis=0)
            self.hmatrix = np.append(self.hmatrix, 
                                     np.ones([1,self.num_actions]),
                                     axis=0)
            # add row to g-matrix (one row per percept, axis=0)
            self.gmatrix = np.append(self.gmatrix, 
                                     np.zeros([1,self.num_actions]),
                                     axis=0)
        
        # (3) get action from h-value
        # get index from dictionary entry
        percept_index = self.percepts[percept]
        # get h-values
        h_values = self.hmatrix[percept_index]
        # get probabilities from h-values through a softmax function
        prob = self._softmax(h_values)
        # get action
        action = np.random.choice(range(self.num_actions), p=prob)
        
        # (4) update g-matrix
        self.gmatrix[int(percept_index),int(action)] = 1.

        return action

    def train(self, reward):
        """
        Given a reward, updates h-matrix. Updates g-matrix with glow.
        """
        # damping h-matrix
        self.hmatrix = self.hmatrix - self.damp*(self.hmatrix-1.)
        # update h-matrix
        self.hmatrix += reward*self.gmatrix
        # update g-matrix
        self.gmatrix = (1-self.glow)*self.gmatrix
    
    # ----------------- helper methods -----------------------------------------

    def _get_percept(self, observation):
        """
        Given an observation, returns a percept.
        This function is just to emphasize the difference between observations
        issued by the environment and percepts which describe the observations
        as perceived by the agent.
        """
        percept = str(observation)
        return percept
    
    def _softmax(self, x):
        """
        Given an input, calculates the normalized exponential function.
        """
        # rescale exponential to avoid large numbers
        rescale = max(x)
        exp_x = np.exp(self.softmax*(x-rescale))
        # get normalization
        norm = sum(exp_x)
        # calculate normalized exponential
        softmax_x = exp_x/norm

        return softmax_x
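
To make the update rule in `train` concrete, here is a minimal standalone sketch (plain NumPy, independent of the class) of one learning step for a single percept with three actions. The function names `ps_update` and `softmax_policy` and the numbers used are illustrative assumptions; they are not part of the exported module.

```python
import numpy as np

def ps_update(h, g, reward, glow=0.1, damp=0.0):
    """Hypothetical standalone copy of the update performed in `PSAgent.train`."""
    # damp h-values towards their initial value of 1
    h = h - damp * (h - 1.0)
    # reinforce transitions in proportion to their glow
    h = h + reward * g
    # let the glow decay
    g = (1.0 - glow) * g
    return h, g

def softmax_policy(h_row, beta=0.1):
    """Softmax over h-values, shifted by the maximum as in `PSAgent._softmax`."""
    exp_h = np.exp(beta * (h_row - np.max(h_row)))
    return exp_h / exp_h.sum()

# one percept, three actions; action 1 was just taken, so its glow is 1
h = np.ones((1, 3))
g = np.array([[0.0, 1.0, 0.0]])

h, g = ps_update(h, g, reward=1.0, glow=0.1, damp=0.0)
prob = softmax_policy(h[0], beta=0.1)

print(h)     # the h-value of the rewarded transition grows from 1 to 2
print(g)     # the glow of that transition decays from 1 to 0.9
print(prob)  # the rewarded action becomes slightly more likely
```

With the small default `softmax` value of 0.1, a single reward only shifts the action probabilities mildly; larger values make the policy greedier.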

## Properly documenting your functions

The class above shows the preferred way of documenting your code (I have only done so for the main class and the function `predict`). It is based on [`docments`](https://fastcore.fast.ai/docments.html). As you will see, this translates directly into nice webpage documentation. Nonetheless, [Sphinx](https://www.sphinx-doc.org/en/master/usage/extensions/example_numpy.html)-type (NumPy-style) documentation is also supported (and helpful for very long descriptions).
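
For comparison, here is a small sketch of the same function documented in both styles. The function `damp_h_value` is a hypothetical helper written only for this illustration (it mirrors the damping step of `train`); the first version uses `docments`-style inline comments, the second a NumPy-style docstring.

```python
def damp_h_value(h: float, # The current h-value.
                 damp: float = 0. # The damping (or gamma) parameter.
                ) -> float: # The damped h-value.
    "docments style: each parameter is documented by the comment next to it."
    return h - damp * (h - 1.)


def damp_h_value_numpy_style(h, damp=0.):
    """
    NumPy style: parameters are documented inside the docstring.

    Parameters
    ----------
    h : float
        The current h-value.
    damp : float
        The damping (or gamma) parameter.

    Returns
    -------
    float
        The damped h-value.
    """
    return h - damp * (h - 1.)


# both versions behave identically; only the documentation style differs
print(damp_h_value(3.0, damp=0.5))
print(damp_h_value_numpy_style(3.0, damp=0.5))
```

The inline style keeps each description next to the parameter it documents, while the docstring style leaves more room for long explanations.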

To see what the documentation will look like, you can use:

In [4]:
from nbdev import show_doc

In [6]:
show_doc(PSAgent)

---

[source](https://github.com/{user}/projective_simulation/blob/master/projective_simulation/basic_PS.py#L9){target="_blank" style="float:right; font-size:smaller"}

### PSAgent

>      PSAgent (num_actions:int, glow:float=0.1, damp:float=0.0,
>               softmax:float=0.1)

Simple, 2-layered projective simulation (PS) agent. We initialize an h-matrix with a single row of `num_actions` 
entries corresponding to a dummy percept clip being connected to all possible actions with h-values of all 1. We 
initialize a g-matrix with a single row of `num_actions` entries with all 0s corresponding to the *glow* values 
of percept-action transitions.

NOTE: This simple version misses some features such as clip deletion, emotion tags or generalization mechanisms.

|    | **Type** | **Default** | **Details** |
| -- | -------- | ----------- | ----------- |
| num_actions | int |  | The number of available actions. |
| glow | float | 0.1 | The glow (or eta) parameter. |
| damp | float | 0.0 | The damping (or gamma) parameter. |
| softmax | float | 0.1 | The softmax (or beta) parameter. |

When creating the webpage documentation, `nbdev` includes every cell that is not marked with `#| hide`. It also includes every `show_doc` call it finds within the notebooks. Sometimes you want to hide the documentation of a class but show the documentation of a particular function instead. To do this, put `#| hide` in the corresponding cell and then call `show_doc` on the desired function:

In [7]:
show_doc(PSAgent.predict, name = 'predict')

---

[source](https://github.com/{user}/projective_simulation/blob/master/projective_simulation/basic_PS.py#L38){target="_blank" style="float:right; font-size:smaller"}

### predict

>      predict (observation:object)

Given an observation, returns an action.
(1) Create a percept from an observation.
(2) Add percept if it has not been encountered before.
(3) Get action from h-values.
(4) Update g-matrix.

|    | **Type** | **Details** |
| -- | -------- | ----------- |
| observation | object | A percept in form of an object. |
| **Returns** | **int** | **The action to be performed.** |

## A nice example

Aside from the full explanations given in the tutorials, it is also nice to include some usage examples after defining a class or function:

In [8]:
num_actions = 5
agent = PSAgent(num_actions)

In [9]:
agent

<__main__.PSAgent at 0x7fd8c1f31810>