# SenMo Net

concepts:
1. network
2. layer
3. neuron
4. connection

example: 

forward network:
layer a predicts b, c, and d, with decreasing accuracy
layer b predicts c and d
layer c predicts d
layer d predicts nothing

backwards network:
layer d predicts c, b, and a, with decreasing accuracy
layer c predicts b and a
layer b predicts a
layer a predicts nothing

The forward network competes against the backward network for the highest overall prediction score. So layer a in the forward network competes against layer d in the backward network when a predicts d and d predicts a; whichever loses gets harsher errors applied. Likewise, a competes against b when they predict each other.
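
The competition rule can be sketched as follows. This is a minimal illustration, assuming each prediction is scored by its mean absolute error and that "harsher errors" means the loser's error is multiplied by a hypothetical `penalty` factor (the notes don't say how much harsher). The function names are illustrative, not from the notes.

```python
def mean_abs_error(predicted, actual):
    """Average absolute difference between a predicted and an observed activation vector."""
    return sum(abs(p - q) for p, q in zip(predicted, actual)) / len(actual)


def compete(forward_error, backward_error, penalty=2.0):
    """Return the (forward, backward) errors to apply after one matchup.

    The side with the larger error loses and has its error multiplied by `penalty`
    (a hypothetical constant); a tie leaves both errors unchanged.
    """
    if forward_error > backward_error:
        return forward_error * penalty, backward_error
    if backward_error > forward_error:
        return forward_error, backward_error * penalty
    return forward_error, backward_error


# Forward layer a predicts layer d, while backward layer d predicts layer a.
a_predicts_d = mean_abs_error([0.75, 0.25], [1.0, 0.0])
d_predicts_a = mean_abs_error([0.5, 0.5], [1.0, 0.0])
print(compete(a_predicts_d, d_predicts_a))  # (0.25, 1.0): the backward side lost
```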

Predictions are applied in time. Layer a is activated and produces predictions for all subsequent layers at t=0, but its prediction for layer b is applied right away and checked at t=1, whereas its prediction for layer c is applied at t=1 and checked at t=2, and so on.
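
The timing rule reduces to simple arithmetic: a prediction aimed k layers ahead is applied at t = k - 1 and checked at t = k. A minimal sketch, assuming layers are given as an ordered list of names; `prediction_schedule` is a hypothetical helper:

```python
def prediction_schedule(layers, source, t0=0):
    """Map each later layer to (applied_at, checked_at) for predictions `source` makes at t0.

    A target k layers ahead is applied at t0 + k - 1 and checked at t0 + k.
    """
    start = layers.index(source)
    schedule = {}
    for k, target in enumerate(layers[start + 1:], start=1):
        schedule[target] = (t0 + k - 1, t0 + k)
    return schedule


forward = ["a", "b", "c", "d"]
for target, (applied, checked) in prediction_schedule(forward, "a").items():
    print(f"a -> {target}: applied at t={applied}, checked at t={checked}")
```

Passing `["d", "c", "b", "a"]` gives the corresponding schedule for the backward network.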

The backward network is what trains the forward network: they replace backpropagation for each other.

To be used in a sensorimotor context: an autonomous agent exploring an environment. The forward network can be thought of as the sensory network: "given what I see (layer a), what motor neurons will fire (layer d)?" The backward network can be seen as the motor network: "given what motor neurons fire (layer d), what will I see (layer a)?"

This is the simplest form of the idea. For more efficiency, shape it as an autoencoder; for further efficiency, combine those hierarchically.

https://www.reddit.com/r/learnmachinelearning/comments/muu060/how_to_instantiate_a_neural_network_design/

```python
from random import random
```

```python
class Layer:
    def __init__(self, id):
        self.id = id
        self.neurons = []

    def connect(self, neurons: list):
        self.neurons = neurons


class Neuron:
    def __init__(self, layer):
        self.layer = layer
        self.activity = 0
        self.active = False
        self.weights = []  # outgoing Connection objects, set by connect()

    def connect(self, weights: list):
        self.weights = weights

    def fire(self):
        self.active = True
        for connection in self.weights:
            connection.activate()


class Connection:
    def __init__(self, source: Neuron, destination: Neuron, distance: int):
        self.source = source
        self.destination = destination
        self.distance = distance
        self.weight = 0
        self.activations = []  # signals waiting to be delivered, oldest first
        self.used = []         # delivered signals waiting for a weight adjustment

    def activate(self):
        # queue the signal; delaying delivery according to distance is not implemented yet
        self.activations.append(self.weight)

    def deliver(self):
        if not self.activations:
            return
        weight = self.activations.pop(0)  # FIFO: deliver the oldest signal first
        self.destination.activity = self.destination.activity + weight
        # wait for it to activate and make sure we re-adjust the weight...
        self.used.append(weight)

    def adjust(self):
        if not self.used:
            return
        weight = self.used.pop(0)
        if weight:
            # move the weight a fifth of the way toward the destination's activity
            error = self.destination.activity - weight
            self.weight = weight + (error / 5)
```

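The `Connection` class above queues signals in `activate` but does not yet delay them by `distance`. One way to add that delay is to stamp each signal with the step it should arrive and deliver only the signals that are due. The sketch below is a separate, hypothetical `DelayedConnection` class with an explicit time-step argument; it assumes the delay is `abs(distance)` steps, since backward connections use negative distances, and uses a stand-in `Target` instead of the full `Neuron`.

```python
from collections import deque


class Target:
    """Stand-in for a destination neuron: it only accumulates activity."""

    def __init__(self):
        self.activity = 0.0


class DelayedConnection:
    """Holds each signal for abs(distance) time steps before delivering it."""

    def __init__(self, destination, distance, weight=0.0):
        self.destination = destination
        self.delay = abs(distance)  # assumption: backward (negative) distances delay just as long
        self.weight = weight
        self.in_flight = deque()  # (arrival_step, weight) pairs, oldest first

    def activate(self, now):
        # Snapshot the current weight so later weight updates don't alter signals in flight.
        self.in_flight.append((now + self.delay, self.weight))

    def deliver(self, now):
        """Deliver every signal whose arrival step has come; return how many were delivered."""
        delivered = 0
        # The delay is constant per connection, so arrival order matches queue order.
        while self.in_flight and self.in_flight[0][0] <= now:
            _, weight = self.in_flight.popleft()
            self.destination.activity += weight
            delivered += 1
        return delivered


demo_target = Target()
demo_link = DelayedConnection(demo_target, distance=2, weight=0.5)
demo_link.activate(now=0)
print(demo_link.deliver(now=1), demo_target.activity)  # nothing has arrived yet
print(demo_link.deliver(now=2), demo_target.activity)  # the signal arrives two steps later
```
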
We're going to make a 3-layer network, where each column is a layer:

| first | second | third |
|---|---|---|
| a | c | e |
| b | d | f |


```python
# initialize network: three layers of two neurons each
first = Layer(0)
second = Layer(1)
third = Layer(2)
a = Neuron(layer=first)
b = Neuron(layer=first)
c = Neuron(layer=second)
d = Neuron(layer=second)
e = Neuron(layer=third)
f = Neuron(layer=third)
first.connect([a, b])
second.connect([c, d])
third.connect([e, f])

# distance counts layers crossed: positive = forward (sensory), negative = backward (motor)
ac = Connection(source=a, destination=c, distance=1)
ad = Connection(source=a, destination=d, distance=1)
ae = Connection(source=a, destination=e, distance=2)
af = Connection(source=a, destination=f, distance=2)
bc = Connection(source=b, destination=c, distance=1)
bd = Connection(source=b, destination=d, distance=1)
be = Connection(source=b, destination=e, distance=2)
bf = Connection(source=b, destination=f, distance=2)
ca = Connection(source=c, destination=a, distance=-1)
cb = Connection(source=c, destination=b, distance=-1)
ce = Connection(source=c, destination=e, distance=1)
cf = Connection(source=c, destination=f, distance=1)
da = Connection(source=d, destination=a, distance=-1)
db = Connection(source=d, destination=b, distance=-1)
de = Connection(source=d, destination=e, distance=1)
df = Connection(source=d, destination=f, distance=1)
ea = Connection(source=e, destination=a, distance=-2)
eb = Connection(source=e, destination=b, distance=-2)
ec = Connection(source=e, destination=c, distance=-1)
ed = Connection(source=e, destination=d, distance=-1)
fa = Connection(source=f, destination=a, distance=-2)
fb = Connection(source=f, destination=b, distance=-2)
fc = Connection(source=f, destination=c, distance=-1)
fd = Connection(source=f, destination=d, distance=-1)
a.connect([ac, ad, ae, af])
b.connect([bc, bd, be, bf])
c.connect([ca, cb, ce, cf])
d.connect([da, db, de, df])
e.connect([ea, eb, ec, ed])
f.connect([fa, fb, fc, fd])
```

```python
# initialize weights
all_weights = [
    ac, ad, ae, af,
    bc, bd, be, bf,
    ca, cb, ce, cf,
    da, db, de, df,
    ea, eb, ec, ed,
    fa, fb, fc, fd]
for weight in all_weights:
    weight.weight = random()
```

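```python
# normalize weights so the connections between each pair of layers sum to 1
sensory_layers = [
    [ac, ad, bc, bd],  # first -> second
    [ae, af, be, bf],  # first -> third
    [ce, cf, de, df]]  # second -> third
motor_layers = [
    [ca, cb, da, db],  # second -> first
    [ea, eb, fa, fb],  # third -> first
    [ec, ed, fc, fd]]  # third -> second

# loop with `conn`, not `c`, so neuron c is not overwritten
for direction in [sensory_layers, motor_layers]:
    for group in direction:
        total = sum(conn.weight for conn in group)
        for conn in group:
            conn.weight = conn.weight / total
```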

OK, my first design, with all sensory neurons on the left and all motor neurons on the right, is correct, but only at the smallest scale. At larger scales it is effectively false, because we don't have all our motor neurons on one side of our head.

But we must compress the information flow in both directions at all times. Here are my notes:

Sensorimotor nodes on the surface of a sphere.

Map outputs back to inputs.

Move nodes on surface so they are close to each other to minimize bandwidth.

When there is surprise, make a hidden node at the best layer inside that describes the fork.
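
The idea of placing nodes on a sphere and moving them to minimize bandwidth can be sketched as a simple attraction step. This is only an illustration under stated assumptions: "bandwidth" is approximated as the straight-line wiring length between nodes that fire together, weighted by hypothetical co-firing counts, and the update (pull co-active pairs together, then project back onto the unit sphere) is a generic layout heuristic, not a method given in the notes.

```python
import math
from random import Random


def normalize(v):
    """Project a 3D vector back onto the unit sphere."""
    length = math.sqrt(sum(x * x for x in v))
    return [x / length for x in v]


def random_point(rng):
    # Normalizing a 3D Gaussian sample gives a uniformly distributed point on the sphere.
    return normalize([rng.gauss(0.0, 1.0) for _ in range(3)])


def wiring_cost(positions, coactive):
    """Straight-line distance between nodes that fire together, weighted by how often they do."""
    return sum(count * math.dist(positions[i], positions[j])
               for (i, j), count in coactive.items())


def pull_together(positions, coactive, rate=0.1):
    """Move each co-active pair toward each other, then project every node back onto the sphere."""
    moved = {node: list(p) for node, p in positions.items()}
    for (i, j), count in coactive.items():
        step = min(rate * count, 0.25)  # cap the step so a single pair never overshoots its midpoint
        for axis in range(3):
            delta = positions[j][axis] - positions[i][axis]
            moved[i][axis] += step * delta
            moved[j][axis] -= step * delta
    return {node: normalize(p) for node, p in moved.items()}


rng = Random(0)
positions = {node: random_point(rng) for node in range(4)}
coactive = {(0, 1): 3, (1, 2): 1, (2, 3): 2}  # hypothetical co-firing counts
before = wiring_cost(positions, coactive)
for _ in range(20):
    positions = pull_together(positions, coactive, rate=0.05)
print(f"wiring cost: {before:.3f} -> {wiring_cost(positions, coactive):.3f}")
```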

...

Well, intelligence is an onion. Every layer gets outside input from deeper within the onion and, ultimately, from the surface. Each layer takes that input and decides which subset of neurons fire. During that round, however, it also puts into a predictive state a new, large set of neurons that are known to fire after this subset. The outside world (the other layers) then chooses a subset of those, and of others too.

Predictive means the activation threshold is lowered...
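
A minimal sketch of "predictive means the activation threshold is lowered". The base threshold of 1.0 and the multiplicative `predictive_discount` of 0.5 are hypothetical values; subtracting a fixed amount would fit the note equally well.

```python
class ThresholdNeuron:
    """Fires when its input activity reaches the threshold; a predictive state lowers the threshold."""

    def __init__(self, base_threshold=1.0, predictive_discount=0.5):
        self.base_threshold = base_threshold
        self.predictive_discount = predictive_discount  # hypothetical: predictive threshold = base * discount
        self.predictive = False

    def threshold(self):
        if self.predictive:
            return self.base_threshold * self.predictive_discount
        return self.base_threshold

    def fires(self, activity):
        return activity >= self.threshold()


demo_neuron = ThresholdNeuron()
print(demo_neuron.fires(0.6))  # False: below the normal threshold of 1.0
demo_neuron.predictive = True
print(demo_neuron.fires(0.6))  # True: the predictive threshold is 0.5
```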

There are therefore 3 vectors of prediction. Two of them, sensory and motor, are conjoined: signals coming from the deep can basically be seen as flowing outward to terminate, ultimately, in motor neurons, while from the surface you get sensory data. The third comes from inside the layer.

They condense down the options for each other. We do prediction and compression everywhere in the brain, and we minimize bandwidth.

On the smallest scales, there is sensory on the left and motor on the right. When you scale up, you realize not all of the network needs to go through the corpus callosum... So you make miniature ones.

"I'm a sensory neuron that is most closely associated with this set of sensory neurons. So if we were arranged on the surface of a sphere, I would be in the center of this group. Now perhaps this group is really 2 groups: I fire when two disparate things happen. Then I would kind of bridge the gap between the two groups I'm a part of. Etc. Anyway, when I fire I send off a signal to hidden nodes, ones that have learned they fire after me. Many are nearby, but some are far away, many layers in... If they do fire in the next time step, those connections are strengthened; if not, they are weakened. That just means we keep track of how many times we activated them and were correct, and how many times we were wrong. It's just an average. How to make these connections highly contextual? Idk. Anyway, sometimes I will be put in a predictive state, which means the inner layers think I'll fire next, but I may not; it all depends on the environment, since I'm a sensory neuron. If the inner layers' predictions are violated, the connection weakens, but only in that context... Not sure how to manage that... Maybe it'll do it itself."
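
The "it's just an average" bookkeeping can be sketched as a per-connection tally. This assumes `observe` is called once per time step in which the source neuron fired; the class and method names are hypothetical, and the context-dependence wondered about above is not handled.

```python
class PredictionRecord:
    """Tally for one outgoing connection: each time the source fires, did the target follow?"""

    def __init__(self):
        self.correct = 0
        self.wrong = 0

    def observe(self, target_fired):
        # Call once per time step in which the source neuron fired.
        if target_fired:
            self.correct += 1
        else:
            self.wrong += 1

    @property
    def trials(self):
        return self.correct + self.wrong

    @property
    def accuracy(self):
        # The plain average from the note; 0.0 until the connection has been used.
        return self.correct / self.trials if self.trials else 0.0


record = PredictionRecord()
for target_fired in [True, True, False, True]:
    record.observe(target_fired)
print(record.correct, record.wrong, record.accuracy)  # 3 1 0.75
```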

Anyway. Yes, those particular connections are weakened, which is to say this combination of neurons is not good at predicting me... But each of those neurons might be good at predicting me in different contexts, so... It just doesn't seem fine-tuned enough, but maybe, hey, it's really not a big deal to black out a small portion... Idk. Anyway, simply because we are constantly reducing bandwidth, things that are similar get put close together. Semantics are naturally exposed. Encoders are naturally produced. Not sure how, but I can see it leads there.

So to recap, there are 2 alternating processes. The first is input from other layers. By the way, the farther away the connection, the higher the threshold for fidelity: you have to be right more than 50% of the time for next-layer connections, but for far-away connections it's more like 90%, otherwise the connection is severed. Of course, this needs to be converted to statistical significance with a bell curve. New connections are made at random, but these usually don't last long.
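
The fidelity rule can be sketched as a one-sided significance test. The 50% and 90% bars come from the notes; treating distance 1 as "next layer" and anything farther as "far", the 5% significance level (z = 1.645), and the function names are assumptions. The "bell curve" here is the normal approximation to the binomial distribution, which is rough for small trial counts or for bars close to 100%.

```python
import math

NEXT_LAYER_ACCURACY = 0.5  # from the notes: right more than 50% of the time
FAR_ACCURACY = 0.9         # from the notes: "it's like 90%" for far-away connections
Z_CRITICAL = 1.645         # assumption: one-sided test at the 5% level


def required_accuracy(distance):
    return NEXT_LAYER_ACCURACY if abs(distance) <= 1 else FAR_ACCURACY


def should_sever(correct, trials, distance):
    """Sever only when accuracy is significantly below the bar for this distance."""
    if trials == 0:
        return False
    p0 = required_accuracy(distance)
    observed = correct / trials
    std_err = math.sqrt(p0 * (1 - p0) / trials)  # binomial standard error under the bar
    z = (observed - p0) / std_err
    return z < -Z_CRITICAL


print(should_sever(4, 10, distance=1))    # False: 40% over 10 trials is not yet significant
print(should_sever(40, 100, distance=1))  # True: 40% over 100 trials is clearly below 50%
print(should_sever(80, 100, distance=3))  # True: 80% falls short of the 90% bar for far connections
```

Testing significance rather than the raw average means a new connection isn't cut after a few unlucky misses, which fits the note that random new connections get a chance before they are culled.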

The other process is quite simply internal to each layer itself, and it's slightly different on the outermost layer than on all the hidden layers. On the outermost layer, the sensory inputs are chosen by the environment, but the layer has the final say on which motor neurons fire. It then puts the next set of both into a predictive state. If it's right about the next time step's activations, those inter-layer connections get stronger; if it's wrong, they get weaker.

So it seems the layer only has the final say half the time. In this view, the external layers send signals choosing which neurons are active, as well as a list of neurons that should be predictive; then the layer chooses which of those are active, then chooses a list of predictive ones. That doesn't seem right.

Instead, the higher (deeper) layers should choose a list of predictive neurons, and the lower (outer) layers should choose which of that list are actually activated. That way, from the motor point of view the sensory side is higher, and from the sensory point of view the motor side is higher: each is the other's boss.
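
A minimal sketch of this division of labor: the deeper layer proposes a set of predicted neurons, and the outer layer decides which of them actually fire. It assumes the outer layer's say is expressed as an input drive per neuron compared against a hypothetical threshold of 0.5; neuron names are illustrative.

```python
def select_active(predicted, drive, threshold=0.5):
    """Pick which of the deep layer's predicted neurons actually fire.

    `predicted` is the candidate set proposed by the deeper layer, and `drive` maps
    neuron names to the input each receives from the outer layer. Only predicted
    neurons whose drive reaches `threshold` (a hypothetical value) become active.
    """
    return {name for name in predicted if drive.get(name, 0.0) >= threshold}


predicted_by_deep = {"n1", "n2", "n3"}
drive_from_outer = {"n1": 0.8, "n2": 0.3, "n4": 0.9}
print(sorted(select_active(predicted_by_deep, drive_from_outer)))  # ['n1']
```

Here n4 is strongly driven but was never predicted, so under this strict reading it stays silent. A softer variant would let unpredicted neurons fire at a higher threshold, which is closer to the earlier note that a predictive state only lowers the threshold.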

What role, then, is there for the layer itself? Idk, perhaps none. Another thing: there's actually no need for a time delay on predictions. I don't think that's necessary. But I do think it's necessary to have connections that span more than 1 layer: you need to be able to hear from nodes that are far away. This is looking more and more like a sensorimotor autoencoder. But frankly, perhaps this speaks more to how those autoencoders should be wired together than it does to anything else.

Had a realization this morning.
Two more things:

1. Each layer has inter-connections that go from a node to an inner node. That way, specific sets of connections fire based on the whole layer.

2. Don't forget the distance metric. If a node gets a signal from a node that shares no inputs with the rest of the nodes it gets signals from, that sending node is farther away.
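
One way to turn the distance idea into a number, assuming "shares inputs" is measured with the Jaccard distance between a source's own inputs and the pooled inputs of the receiver's other sources. The choice of Jaccard distance and the `inputs` wiring below are assumptions for illustration; a source that shares nothing gets the maximum distance of 1.0.

```python
def jaccard_distance(x, y):
    """1 - |x & y| / |x | y|; 0.0 for two empty sets."""
    if not x and not y:
        return 0.0
    return 1.0 - len(x & y) / len(x | y)


def source_distances(receiver, inputs):
    """For each node feeding `receiver`, measure how little its own inputs overlap
    with the pooled inputs of the receiver's other sources (1.0 = shares nothing)."""
    sources = inputs[receiver]
    distances = {}
    for s in sources:
        others = set()
        for o in sources:
            if o != s:
                others |= inputs.get(o, set())
        distances[s] = jaccard_distance(inputs.get(s, set()), others)
    return distances


# Hypothetical wiring: hidden node h listens to p, q and r.
inputs = {
    "h": {"p", "q", "r"},
    "p": {"s1", "s2"},
    "q": {"s2", "s3"},
    "r": {"s9"},  # r shares no inputs with p or q
}
print(sorted(source_distances("h", inputs).items()))  # [('p', 0.75), ('q', 0.75), ('r', 1.0)]
```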