
Changes to connection transform simulation #1330

Merged: 1 commit, merged into master on Jul 10, 2017

Conversation

@drasmuss (Member)

Motivation and context:
Two changes:

  1. nengo.Connection(x, ens.neurons) was applying the neural gains before the synapse (synapse(x*gains)), whereas nengo.Connection(x, ens) was applying the gains after the synapse (synapse(x)*gains). This change makes the implementation consistent in both cases (I went with the latter formulation; see the sketch after this list).

  2. All Connections get created with a default transform=1, which has no effect. For ensemble->x connections this gets rolled into the decoders, but for node->x connections we're just doing an unnecessary multiplication by 1 for all those connections. The second commit optimizes those identity multiplications out of the connection builder. (Note: we can't do a similar optimization for identity matrices, because those might be the target of learning rules).
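For concreteness, here is a minimal sketch of the two connection styles involved in change 1 (the network itself is illustrative, not taken from the PR):

```python
import numpy as np
import nengo

with nengo.Network():
    stim = nengo.Node(0.5)
    ens = nengo.Ensemble(n_neurons=50, dimensions=1)

    # Decoded connection: after this change, both styles scale the *filtered*
    # input by each neuron's gain, i.e. synapse(x) * gains.
    nengo.Connection(stim, ens, synapse=0.005)

    # Direct neuron connection: previously computed synapse(x * gains);
    # now it also computes synapse(x) * gains, consistent with the above.
    nengo.Connection(stim, ens.neurons,
                     transform=np.ones((ens.n_neurons, 1)),
                     synapse=0.005)
```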

How has this been tested?
One of our unit tests failed after the neural gain change, because it was tuned to the previous implementation. I adjusted a parameter slightly to get it to pass again.

How long should this take to review?

  • Quick (less than 40 lines changed or changes are straightforward)

Types of changes:

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)

Checklist:

  • I have read the CONTRIBUTING.rst document.
  • [n/a] I have updated the documentation accordingly.
  • [n/a] I have included a changelog entry.
  • [n/a] I have added tests to cover my changes.
  • All new and existing tests passed.

@tbekolay (Member) left a comment:

LGTM!

@drasmuss (Member, Author)

I've been thinking about the second change a bit more (omitting identity transforms). It might be a bit too magicky, in that the API suggests there is a transform op happening (since the default transform is 1), and the simulator suggests there is a transform op there (by creating a probeable weights signal), but it isn't actually being used. A regular nengo user would never notice the magic, but it might be surprising to someone digging under the hood of nengo a bit.

So, a couple ideas:

  1. Don't worry about it (it's an acceptable level of magic, only affects more advanced users)
  2. Don't create the weights signal when we optimize out identity transforms. However, this means that the weights would not be probeable, which might surprise people if they expect a transform on the Connection.
  3. Same as 2, plus change the default transform to None. Everything works the same, but now the API is telling the user that if they don't specify a transform, there won't be one there. The idea is to change the mental model from "default is 1 -> optimize that out" to "default is None -> add a transform if the user provides one".

I lean towards 3, because it is the most explicit. It is also the least transparent option, however, since it changes the Connection API. But since we're only changing the default value, and no behaviour changes, it should be functionally invisible and backwards compatible.
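A hypothetical sketch of what option 3 would look like from the user's side (the None default is the proposal, not the current API; the snippet runs against current nengo, and only the comments describe the proposed semantics):

```python
import nengo

with nengo.Network():
    a = nengo.Node(0.0)
    b = nengo.Ensemble(n_neurons=10, dimensions=1)

    # Proposed default transform=None: no transform op is built, so there
    # would be no probeable weights signal on this node->ensemble connection.
    nengo.Connection(a, b)

    # An explicit transform opts back in; weights are built and probeable.
    nengo.Connection(a, b, transform=1)
```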

@tbekolay (Member)

I like option 3, having an explicit "don't apply a transform to this", but I'm not sure exactly what we should do about the weights signal. You're always going to need a weights signal when there are decoders, so it's not sufficient to say that there are no weights when transform is None. Of course, it's only people digging into the backend that will be affected by this, so inconsistencies are fine (and can be cleaned up in the builder refactoring).

Another question is whether we will handle the transform=None case only for 1D -> 1D connections? Or for all cases in which we're going from an N-d signal to another N-d signal? What about for neuron to neuron connections in which both ensembles have the same number of neurons (though we might handle this in the Connection.transform parameter already)?

@drasmuss (Member, Author)

I like option 3, having an explicit "don't apply a transform to this", but I'm not sure exactly what we should do about the weights signal. You're always going to need a weights signal when there are decoders, so it's not sufficient to say that there are no weights when transform is None.

Yeah it's all kind of dependent on how people think about the connection weights. For decoded connections (ens->x) we still always have a weights signal (in the current implementation and the proposed transform=None implementation). The previous mental model was

weights = transform if decoders is None else dot(decoders, transform)
-> transform = 1
-> weights == decoders * 1 == decoders

the new mental model would be

weights = decoders if transform is None else dot(decoders, transform)
-> transform = None
-> weights == decoders

But in either case weights is what you're probing, so nothing really changes.

For the Node case, the previous mental model was

weights = transform if decoders is None else dot(decoders, transform)
-> decoders = None
-> weights == transform

which in the new model would be

weights = decoders if transform is None else dot(decoders, transform)
-> decoders = None
-> weights == decoders == None

So under this new theoretical mental model you wouldn't expect there to be connection weights on a non-decoded connection if transform=None.
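Putting both mental models into one runnable sketch (my paraphrase of the pseudocode above; the shapes and dot-product ordering are illustrative, not the actual builder code):

```python
import numpy as np

def old_weights(decoders, transform=1):
    # Old model: default transform=1 always gets multiplied in.
    return transform if decoders is None else np.dot(decoders, transform)

def new_weights(decoders, transform=None):
    # New model: no transform unless the user supplies one.
    if transform is None:
        return decoders  # None for a non-decoded (e.g. node->x) connection
    return transform if decoders is None else np.dot(decoders, transform)

decoders = np.ones((1, 50))  # e.g. 50 neurons decoding a 1-d value

# Decoded connections: weights == decoders either way.
assert np.allclose(old_weights(decoders), new_weights(decoders))

# Node->x connections: weights == 1 before, no weights at all after.
assert old_weights(None) == 1
assert new_weights(None) is None
```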

I think the new mental model makes slightly more sense, because it is conditional on the thing that the user sets/interacts with (transform) rather than the hidden build-stage property (decoders). But that's a pretty fuzzy/subjective/minor distinction. The counterargument is that the mental model is just "connections have connection weights", so any case in which users couldn't probe the connection weights they would be surprised. Although the counter-counterargument might be that that is a bad mental model, which we should discourage by not allowing them to probe "connection weights" that don't actually represent connections between neurons (similarly to how we don't let people set function on node->x connections).

Another question is whether we will handle the transform=None case only for 1D -> 1D connections? Or for all cases in which we're going from an N-d signal to another N-d signal?

My inclination would be to keep the current behaviour, where it works for any N-d -> N-d signal. Perhaps that's another argument for changing the mental model, as previously if you probed the weights on an N-d -> N-d connection you'd get a scalar signal, which doesn't make a lot of sense if you're thinking of that signal as "connection weights".

What about for neuron to neuron connections in which both ensembles have the same number of neurons (though we might handle this in the Connection.transform parameter already)?

Yeah, that's kind of messy. In the current implementation this is implemented as weights=1; it doesn't get expanded or anything. So it has the same issue, in that that's an odd result if your mental model is that you're probing connection weights. But in that case it also seems like it'd be weird if the weights were not probeable at all (we'd definitely expect there to be weights on a neuron-to-neuron connection). We could add a dummy probeable weights signal that is the full connection weight matrix? But we wouldn't want to do that unless it's actually being probed, since that could waste a lot of memory. So then we're adding a bunch of special-case code, which isn't great.

To provide some further perspective on this, the reason I first started reconsidering this change is because in nengo_dl, a transform=1 could become a transform=x after training. So by optimizing out the transform we're decreasing the representational power of the network -- if the user's mental model is that Connections have a trainable transform=1. So that's why I thought it might be better to have the mental model transform=None, in which case users wouldn't expect a trainable parameter there (they could add one explicitly by setting transform=1).

Sooo maybe the conclusion from all this is that we should just not do this optimization in the reference builder, and leave it up to backends (or a more downstream stage in the refactored builder)?

Also, I know I've written a ton of words on this, but that's just because I think it's a kind of interesting/subtle design decision. I'm not actually that passionate about the outcome, whatever we decide.

@jgosmann (Collaborator) left a comment:

Looks good to me. Not sure what the best thing to do with regard to the transform is. It is indeed pretty subtle; there seem to be good arguments for all the options, and I haven't thought deeply about it.

@drasmuss (Member, Author)

I implemented this in the backend and I think it makes the most sense there. You can do more detailed optimizations when you have access to knowledge about the backend anyway, so I think we want the reference builder to focus more on just building a nice, simple, reliable representation, and leave the fancy stuff up to a more downstream system.

So I reverted that change, just leaving the synapse/gain change.

@tbekolay (Member)

Sounds good, I'll merge shortly. Perhaps we should move the discussion on transforms to a new issue?

@tbekolay tbekolay merged commit e687452 into master Jul 10, 2017
@tbekolay tbekolay deleted the conn_transform branch July 10, 2017 17:36
@adityagilra

Hi,

I have some comments regarding change 1 by @drasmuss , i.e. """nengo.Connection(x, ens.neurons) was applying the neural gains before the synapse (synapse(x*gains)), whereas nengo.Connection(x, ens) was applying the gains after the synapse (synapse(x)*gains). This changes it so that the implementation is consistent in both cases (I went with the latter formulation)."""

The product and the filtering operations commute if the synapse is a linear filter, which it usually is (an exponentially decaying filter). So on the face of it, the order shouldn't matter. But learning of the weights gets screwed up, both in the test nengo/tests/test_learning_rules.py, as @drasmuss mentioned, and in my simulations!
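The commutativity for a constant gain is easy to check numerically (a quick sanity check as I would express it; the gain value and filter time constant are arbitrary):

```python
import numpy as np
import nengo

gain = 3.0
x = np.random.RandomState(0).randn(1000, 1)
synapse = nengo.Lowpass(0.005)

# For a linear filter and a constant gain, scaling before or after
# filtering gives the same result.
assert np.allclose(synapse.filt(x * gain, dt=0.001),
                   synapse.filt(x, dt=0.001) * gain)
```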

Here I try to outline my understanding of this issue. Please correct me if I am wrong.

I have an Ensemble.neurons-to-Ensemble.neurons connection conn whose weights are being learned. In builder/learning_rules.py, it is model.sig[conn]['weights'] that gets updated when a learning rule is set.

Now the key difference between the two implementations:

A) In the old Nengo v2.4.0, i.e. before change 1:
model.sig[conn]['weights'] was pre-multiplied by the gains in builder/connection.py, once at build time only.
Learning then acted on the 'weights' signal as usual in builder/learning_rules.py. After this, the gains were never applied to the weights when calculating the input to the neurons. So if I started out with zero weights, the gains had no effect; learning would train the weights.

B) In the new Nengo v2.5.0, i.e. after change 1:
model.sig[conn]['weights'] is not pre-multiplied by the gains in builder/connection.py at build time.
Learning acts on the 'weights' signal as usual in builder/learning_rules.py. But now the gains are applied to the weights at every time step, using an ElementWiseInc operation (which was not there in the previous version), to calculate the input to the neurons.
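A minimal numeric sketch of the difference for a learned connection starting from zero weights (the shapes, the stand-in learning update, and all variable names are illustrative, not the actual builder internals):

```python
import numpy as np

rng = np.random.RandomState(0)
n_pre, n_post = 3, 2
gains = rng.uniform(1.0, 2.0, size=n_post)
activity = rng.uniform(size=n_pre)                 # filtered presynaptic activity
weights = np.zeros((n_post, n_pre))                # learning starts from zero
delta = 0.1 * np.outer(np.ones(n_post), activity)  # stand-in learning update

# A) v2.4.0: gains folded into the weights once at build time.
weights_a = gains[:, None] * weights  # zero stays zero: gains have no effect
current_a = np.dot(weights_a + delta, activity)

# B) v2.5.0: gains applied at every time step, after the learned weights.
current_b = gains * np.dot(weights + delta, activity)

print(current_a)  # the learned update drives the neurons directly
print(current_b)  # the learned update is additionally scaled by the gains
```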

For your ready reference, here's the commit that introduced this:


I think the new implementation is the correct one if gains are to be properly taken into account during learning, i.e. consistent with the NEF equations as advertised. However, I had already used the old Nengo for extensive simulations for my pre-print, so I had to re-write the equations for the neural current without applying the gain \nu to the weights being learned. Everything in my pre-print is valid and consistent now; it just cost me a lot of searching to figure out why learning rates and distributions of neural activity that worked fine with v2.4.0 suddenly went for a toss with the new version 2.5.0!

I've just uploaded the code for the pre-print and README to github https://github.com/adityagilra/FOLLOW.

Cheers!

@jgosmann (Collaborator) commented Oct 3, 2017

@Seanny123 Tangentially related to some issues you had, might clear some things up? Especially this line?
