# Overview

<b>Context</b>: We are controlling a system ('the plant') for which we have no model. We want to learn how to make it move the way we want it to move.

<b>Idea</b>: Use a wake / sleep pattern, where the connectivity of the network changes so that during the wake cycle you learn how the plant moves, and during the sleep cycle you learn what the different actions resulted in. Iterate until you've learned how to make the plant move as desired. To change the connectivity we use a routing system, which is just a node for now. It stands in for the basal ganglia, but jesus, how much free time do you think I have?

I'll walk through the construction now by building the wake and sleep networks, and then we'll mesh them together.

## Wake cycle network

We want to learn two things when we're awake: the output of the plant, and what input caused that output. To simplify, we're not worrying about any delay from the plant or synapses.

So the wake cycle network has a desired state for the plant, called des_x, and it sends that down to the plant. The predicted_plant_output ensemble receives the same input as the plant, and on this connection we learn to predict the output of the plant given the input.
The predicted_plant_output, in turn, feeds into predicted_des_x (named predicted_u in the code below), and on this connection we learn, given a plant output, what input caused it.
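
As a rough, non-neural illustration of the two things learned while awake, the sketch below replaces each ensemble-to-ensemble connection with a single scalar weight and the PES rule with a plain delta rule. The names `wake_phase`, `w_fwd`, `w_inv`, and the learning rate are assumptions made for this example and do not appear in the notebook; only the toy plant (`-x`) and the error convention (prediction minus target) are taken from it.

```python
import math


def plant(u):
    """Toy plant with the same behaviour as the notebook's plant node."""
    return -u


def wake_phase(w_fwd=0.0, w_inv=0.0, lr=0.05, steps=2000, dt=0.01):
    """Scalar sketch of the wake cycle (hypothetical helper, not Nengo code).

    w_fwd maps plant input -> predicted plant output (role of learn_conn1).
    w_inv maps predicted plant output -> predicted input (role of learn_conn2).
    """
    for i in range(steps):
        des_x = math.sin(i * dt)        # desired state, sent straight to the plant
        plant_out = plant(des_x)        # what the plant actually did

        # Forward model: error = prediction - target
        predicted_out = w_fwd * des_x
        err_fwd = predicted_out - plant_out
        w_fwd -= lr * err_fwd * des_x

        # Inverse model: its input is the predicted output, its target is des_x
        predicted_in = w_inv * predicted_out
        err_inv = predicted_in - des_x
        w_inv -= lr * err_inv * predicted_out
    return w_fwd, w_inv


w_fwd, w_inv = wake_phase()
print("forward weight:", round(w_fwd, 3), "inverse weight:", round(w_inv, 3))
```

For this particular plant both weights should settle near -1, since the plant simply negates its input and negation is its own inverse.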

In [1]:
import nengo
%load_ext nengo.ipynb

import numpy as np

%matplotlib inline 
import pylab
    
model = nengo.Network()
with model:

    desired_output = nengo.Node(output=np.sin)
    des_x = nengo.Ensemble(n_neurons=100, dimensions=1)
    nengo.Connection(desired_output, des_x)
    
    # create our 'plant'
    def plant(t,x):
        return -x
    plant = nengo.Node(output=plant, size_in=1)
    nengo.Connection(des_x, plant)
    
    # learn to predict the plant output from the plant input
    predicted_plant_output = nengo.Ensemble(n_neurons=100, dimensions=1)
    learn_conn1 = nengo.Connection(des_x, predicted_plant_output,
                                 learning_rule_type=nengo.PES(learning_rate=1e-5))
    # PES error signal = actual - target = predicted_plant_output - plant
    nengo.Connection(plant, learn_conn1.learning_rule, transform=-1)
    nengo.Connection(predicted_plant_output, learn_conn1.learning_rule)
    
    # learn to recover the plant input (des_x) from the predicted plant output
    predicted_u = nengo.Ensemble(n_neurons=100, dimensions=1)
    learn_conn2 = nengo.Connection(predicted_plant_output, predicted_u,
                                  learning_rule_type=nengo.PES(learning_rate=1e-5))
    # PES error signal = actual - target = predicted_u - des_x
    nengo.Connection(des_x, learn_conn2.learning_rule, transform=-1)
    nengo.Connection(predicted_u, learn_conn2.learning_rule)

from nengo_gui.ipython import IPythonViz
IPythonViz(model, cfg='wake_cycle.viz.cfg')


## Sleep cycle network

Here things are a little tricky. First off, we create an intermediary population, generated_u, between des_x and the plant.
The role of this population is to transform the signal from a desired plant output into the control signal u that will generate this output.

Assuming that predicted_des_x (predicted_u in the code) and predicted_plant_output were trained up in the wake cycle, what we want to do then is drive predicted_plant_output and, as its value changes, learn what input to the plant would have generated each value.

Of course, when you run this it doesn't generate anything of interest because it hasn't been trained. But you can see the structure we want.
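
To make the sleep-cycle wiring concrete, here is a scalar sketch in which each learned connection is reduced to one weight and PES is replaced by a plain delta rule. It assumes the inverse model has already been learned (`w_inv = -1.0` is simply the correct answer for the toy plant, filled in by hand), and the names `sleep_phase` and `w_ctrl` are hypothetical, chosen only for this example.

```python
import math


def plant(u):
    """Toy plant with the same behaviour as the notebook's plant node."""
    return -u


def sleep_phase(w_inv, w_ctrl=0.0, lr=0.05, steps=2000, dt=0.01):
    """Scalar sketch of the sleep cycle (hypothetical helper, not Nengo code).

    w_inv  maps plant output -> input that causes it (assumed already learned).
    w_ctrl maps des_x -> generated_u (role of learn_conn3, learned here).
    """
    for i in range(steps):
        predicted_out = math.sin(i * dt)     # drive predicted_plant_output directly
        des_x = predicted_out                # the same value is fed to des_x
        generated_u = w_ctrl * des_x         # what the controller would send
        predicted_u = w_inv * predicted_out  # what the inverse model says was needed

        # error = actual - target, as wired into learn_conn3.learning_rule
        err = generated_u - predicted_u
        w_ctrl -= lr * err * des_x
    return w_ctrl


w_inv = -1.0  # assumption: result of a successful wake cycle for plant(u) = -u
w_ctrl = sleep_phase(w_inv)

# After sleep, sending a desired value through the controller and then the
# plant should give back (approximately) the desired value.
target = 0.7
tracking_error = abs(plant(w_ctrl * target) - target)
print("controller weight:", round(w_ctrl, 3), "tracking error:", tracking_error)
```

Note that the plant itself is never run during the loop; the controller is trained entirely against the inverse model, which is the point of the sleep cycle.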

In [2]:
model = nengo.Network()
with model:

    desired_output = nengo.Node(output=np.sin)
    des_x = nengo.Ensemble(n_neurons=100, dimensions=1)
    
    # create our 'plant'
    def plant(t,x):
        return -x
    plant = nengo.Node(output=plant, size_in=1)
    
    generated_u = nengo.Ensemble(n_neurons=100, dimensions=1)
    learn_conn3 = nengo.Connection(des_x, generated_u,
                                  learning_rule_type=nengo.PES(learning_rate=1e-5))
    nengo.Connection(generated_u, plant)
    
    predicted_plant_output = nengo.Ensemble(n_neurons=100, dimensions=1)
    nengo.Connection(desired_output, predicted_plant_output)
    nengo.Connection(predicted_plant_output, des_x)
    
    predicted_u = nengo.Ensemble(n_neurons=100, dimensions=1)
    nengo.Connection(predicted_plant_output, predicted_u)
    
    # PES error signal = actual - target = generated_u - predicted_u
    nengo.Connection(predicted_u, learn_conn3.learning_rule, transform=-1)
    nengo.Connection(generated_u, learn_conn3.learning_rule)

from nengo_gui.ipython import IPythonViz
IPythonViz(model, cfg='sleep_cycle.viz.cfg')

## Combining the networks using dynamic routing

There's a lot going on in here, but I've added a lot of comments to hopefully help you through.

I've added a BG node that does the routing here based on whether you're awake (== 0) or asleep (== 1).
When you're awake you're moving the plant and learning to predict its output and the corresponding input.
When you're asleep you're learning: if you want the plant to do 'this', then do 'that'.

The cycle switches every 50 seconds and you can see that after one iteration we've learned how to make the plant move how we want it to! By 'you can see' I mean by looking at 'des_x' and 'plant_output'. On the first round of being awake they're different, but they match on the second!

Important notes: 
<ul>
<li>In sleep mode, when driving predicted_plant_output we drive it directly, not through its learned connection.
<li>In sleep mode, when connecting predicted_plant_output to predicted_des_x, connect through its learned connection.
</ul>
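
The routing described above can be checked in isolation before wiring it into a network. The sketch below is a dependency-free restatement of the schedule and routing table, using plain lists instead of `np.hstack`; the function name `route` and the sample input values are assumptions for this example.

```python
def mode_switching(t):
    """0 (awake) for the first 50 s of every 100 s period, 1 (asleep) otherwise."""
    return 0 if (t % 100) < 50 else 1


def route(x):
    """Hypothetical stand-alone version of the router.

    x = [mode, driving_input, generated_u, plant, predicted_plant_output,
         predicted_des_x]
    returns [des_x, plant, learn_pop1, learn_conn1 error, learn_pop2,
             learn_conn2 error, predicted_plant_output, learn_conn3 error]
    """
    mode, drive, gen_u, plant_out, pred_out, pred_des = x
    if abs(mode) < 0.1:  # awake: move the plant, train both predictors
        return [drive, gen_u, gen_u, pred_out - plant_out,
                pred_out, pred_des - gen_u, 0.0, 0.0]
    # asleep: plant and predictor learning are silenced, controller learns
    return [pred_out, 0.0, 0.0, 0.0,
            pred_out, 0.0, drive, gen_u - pred_des]


awake_out = route([0, 0.5, 0.25, -0.25, -0.125, 0.5])
asleep_out = route([1, 0.5, 0.25, -0.25, -0.125, 0.5])
print("awake: ", awake_out)
print("asleep:", asleep_out)
```

In the asleep output the driving input appears only in slot 6 (the direct drive of predicted_plant_output), and the only non-zero error signal is the one for learn_conn3, which matches the two notes above.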

In [3]:
model = nengo.Network()
with model:
    
    driving_input = nengo.Node(output=np.sin)
    des_x = nengo.Ensemble(n_neurons=100, dimensions=1)
    
    # create our 'plant'
    def plant(t,x):
        return -x
    plant = nengo.Node(output=plant, size_in=1, size_out=1)
    def mode_switching(t):
        if (t % 100) < 50:
            return 0
        return 1
    BG_mode = nengo.Node(output=mode_switching)
    
    predicted_plant_output = nengo.Ensemble(n_neurons=100, dimensions=1)
    predicted_des_x = nengo.Ensemble(n_neurons=100, dimensions=1)
    
    def router_func(t,x):
        # input is 
        # 0: BG mode (0 = awake, 1 = asleep)
        # 1: driving_input 
        # 2: generated_u (des_x through learn_conn3)
        # 3: plant
        # 4: predicted_plant_output
        # 5: predicted_des_x

        # output is 
        # 0: des_x
        # 1: plant
        # 2: learn_pop1 (to predicted_plant_output)
        # 3: learn_conn1.learning_rule
        # 4: learn_pop2 (to predicted_des_x)
        # 5: learn_conn2.learning_rule
        # 6: predicted_plant_output
        # 7: learn_conn3.learning_rule
        
        # when awake
        if abs(x[0]) < .1: 
            return np.hstack([x[1], # driving_input to des_x
                              x[2], # generated_u to plant
                              x[2], # generated_u to learn_pop1 (predicted_plant_output)
                              x[4] - x[3], # predicted_plant_output - plant to learn_conn1
                              x[4], # predicted_plant_output to learn_pop2 (predicted_des_x)
                              x[5] - x[2], # predicted_des_x - generated_u to learn_conn2
                              np.zeros(2), # 0 to the rest 
                              ])
        # when asleep
        return np.hstack([x[4], # predicted_plant_output to des_x
                          0.0, # 0 to the plant
                          0.0, # 0 to predicted_plant_output through its learned connection
                          0.0, # 0 to learn_conn1 
                          x[4], # predicted_plant_output to predicted_des_x
                          0.0, # 0 to learn_conn2
                          x[1], # driving_input to predicted_plant_output
                          x[2] - x[5], # generated_u - predicted_des_x to learn_conn3
                          ])
    # only the six inputs listed above are connected, so size_in is 6
    BG = nengo.Node(output=router_func, size_in=6, size_out=8)

    learn_pop1 = nengo.Ensemble(n_neurons=100, dimensions=1)
    learn_conn1 = nengo.Connection(learn_pop1, predicted_plant_output,
                                  learning_rule_type=nengo.PES(learning_rate=1e-5))
    learn_pop2 = nengo.Ensemble(n_neurons=100, dimensions=1)
    learn_conn2 = nengo.Connection(learn_pop2, predicted_des_x,
                                   learning_rule_type=nengo.PES(learning_rate=1e-5))
    generated_u = nengo.Ensemble(n_neurons=100, dimensions=1)
    learn_conn3 = nengo.Connection(des_x, generated_u, 
                                   learning_rule_type=nengo.PES(learning_rate=1e-5))

    # BG inputs
    nengo.Connection(BG_mode, BG[0])
    nengo.Connection(driving_input, BG[1])
    nengo.Connection(generated_u, BG[2])
    nengo.Connection(plant, BG[3])
    nengo.Connection(predicted_plant_output, BG[4])
    nengo.Connection(predicted_des_x, BG[5])
    # BG outputs
    nengo.Connection(BG[0], des_x)
    nengo.Connection(BG[1], plant)
    nengo.Connection(BG[2], learn_pop1)
    nengo.Connection(BG[3], learn_conn1.learning_rule)
    nengo.Connection(BG[4], learn_pop2)
    nengo.Connection(BG[5], learn_conn2.learning_rule)
    nengo.Connection(BG[6], predicted_plant_output)
    nengo.Connection(BG[7], learn_conn3.learning_rule)
    
from nengo_gui.ipython import IPythonViz
IPythonViz(model, cfg='sleep.py.cfg')

# Extensions

Lots of places to go from here! 
<ol>
<li> Test on more complex plants (TWO dimensions???)
<li> Use a full basal ganglia model
<li> Look at more intelligently driving 'predicted_plant_output' during sleep cycle (e.g. retracing paths followed during the wake cycle, or only exploring areas of state space explored during wake cycle)
<li> Do some learning in an SP state space
<li> Build into a more complex network that cycles between learning and trying to achieve a specific goal
<li> Use better tests for switching to sleep state (e.g. you've accumulated this much error, go to sleep and learn)
<li> Explore usefulness of an exploitation only awake cycle (where you don't learn anything when awake)
</ol>
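
As one possible starting point for the error-based switching idea in the list above, here is a minimal sketch of a mode switch that stays awake until the accumulated absolute error crosses a threshold, then sleeps for a fixed number of steps. Everything in it (the class name `ErrorGatedMode`, the threshold, the fixed sleep length) is a hypothetical design for illustration and has not been tested in the network.

```python
class ErrorGatedMode:
    """Hypothetical replacement for the fixed 50 s wake / sleep schedule."""

    def __init__(self, threshold=5.0, sleep_steps=100):
        self.threshold = threshold      # accumulated |error| that triggers sleep
        self.sleep_steps = sleep_steps  # how many steps each sleep lasts
        self.accumulated = 0.0
        self.sleep_left = 0

    def step(self, error):
        """Return 0 (awake) or 1 (asleep) for this step."""
        if self.sleep_left > 0:
            self.sleep_left -= 1
            return 1
        # awake: keep a running total of how wrong we have been
        self.accumulated += abs(error)
        if self.accumulated >= self.threshold:
            # enough error collected; sleep starting on the next step
            self.accumulated = 0.0
            self.sleep_left = self.sleep_steps
        return 0


gate = ErrorGatedMode(threshold=1.0, sleep_steps=2)
modes = [gate.step(0.5) for _ in range(6)]
print(modes)
```

The returned value uses the same 0 / 1 convention as the BG mode input, so an object like this could in principle be wrapped in a node function that also receives the tracking error.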