
Fix NNLS weight solvers #1027

Merged: 5 commits merged into master from fix-weight-solvers, Jun 11, 2016
Conversation

@hunse (Collaborator) commented Apr 22, 2016

Fixes #1019, as well as some other small issues.

@hunse hunse added this to the 2.1.1 release milestone Apr 25, 2016
@hunse (Collaborator, Author) commented May 24, 2016

@tcstewar, can you review, since you filed the original #1019 bug that this addresses?

@tcstewar (Contributor) commented:
This looks good to me, and fixes the original problem. I also confirmed that using Nnls with weights=True gives the same result as weights=False (when the encoders are all positive).

However, oddly, if we use normal encoders and weights=True, the results are worse than I'd expect. For example:

import numpy as np
import pylab  # matplotlib.pylab

import nengo

model = nengo.Network()
with model:
    stim = nengo.Node(lambda t: np.sin(t*np.pi*2))
    a = nengo.Ensemble(50, 1)
    b = nengo.Ensemble(50, 1)  # , encoders=nengo.dists.Choice([[1]])

    nengo.Connection(stim, a)
    nengo.Connection(a, b, solver=nengo.solvers.Nnls(weights=True))

    p = nengo.Probe(b, synapse=0.03)
    p_stim = nengo.Probe(stim, synapse=0.03)

sim = nengo.Simulator(model)
sim.run(2)

pylab.plot(sim.trange(), sim.data[p_stim])
pylab.plot(sim.trange(), sim.data[p])
pylab.show()

produces this:

[Plot: the decoded output of `b` has noticeably smaller amplitude than the input sine wave.]

It looks to me like it could have gotten a better fit by just multiplying all the weights by ~1.4. Why didn't it find that solution?

@hunse (Collaborator, Author) commented May 30, 2016

That is a good question. Here's my theory:

  • We pose our normal decoder solving problem as dot(A, X) = Y where A is our activation, X is the decoders, and Y is the targets (in the case of representation, equal to our eval points).
  • To solve for the weights, I take the targets and multiply them by the encoders E, so we get dot(A, W) = dot(Y, E), where W is the weights we want to solve for.
  • If we're doing NNLS, all the weights must be non-negative, and so are the activities, so their product is non-negative. dot(Y, E) can be quite negative, though.
  • Solving the NNLS problem minimizes the RMS amplitude of dot(A, W) - dot(Y, E). Since there are negative terms that we can never get close to, the best we can hope to do is zero for those terms, so those terms end up contributing a lot of the error.
  • All this is being done in the input space of the post-synaptic neurons, and doesn't account for the bias and nonlinearity of those neurons. So in the example above, doubling the weights helps, but only because many of the neurons are already silent and are unaffected by the weight change. If all the output neurons were linear, doubling the weights wouldn't help. So in the eyes of the solver, it has found the best solution to the linear problem it was given. (I don't fully understand this yet, though.)
  • Choosing all non-zero intercepts allows the solver to perform almost perfectly. i.e. intercepts=nengo.dists.Uniform(0, 1) in the output population. I don't fully understand this yet either, but basically my thinking is that this makes the active (i.e. firing, non-silent) range of the neuron the same as the controllable range of the neuron. If you have a neuron with a negative intercept, this means that you need a negative current to make the neuron silent, and we can't get a negative current with non-negative weights. So essentially we always have some off and some on neurons fighting against each other, making the amplitude of the result too small. Here's the result with non-negative intercepts:
    [Plot: with intercepts drawn from Uniform(0, 1), the decoded output closely tracks the input sine.]
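The formulation in the bullets above can be sketched as follows. This is a minimal illustration using scipy.optimize.nnls, not Nengo's actual solver code; the matrix sizes and random data are made up. Each column of dot(Y, E) is the target for one NNLS problem, so both the activities and the solved weights are non-negative.

```python
import numpy as np
from scipy.optimize import nnls

rng = np.random.RandomState(0)
n_eval, n_pre, n_post = 200, 20, 15

A = np.maximum(rng.randn(n_eval, n_pre), 0)      # non-negative activities
Y = np.sin(np.linspace(-1, 1, n_eval))[:, None]  # 1-D targets
E = rng.randn(1, n_post)                         # encoders of the post population

YE = Y.dot(E)  # targets projected into the post population's input space

# Solve dot(A, W) = dot(Y, E) one column (one post neuron) at a time,
# with the constraint that every weight is non-negative.
W = np.column_stack([nnls(A, YE[:, j])[0] for j in range(n_post)])

assert np.all(W >= 0)  # NNLS guarantees non-negative weights
```

Note that wherever a column of dot(Y, E) is mostly negative, the corresponding column of W is driven toward zero, which is exactly the "best we can hope to do is zero for those terms" behavior described above.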

@tcstewar (Contributor) commented:
Ah, that makes total sense. For about half the neurons, it's finding a pretty perfect set of weights. For the other half, it's supposed to find weights that give a negative input current, but it can't, so it just gets the current as small as possible. That still skews the decode, giving a smaller value than it should, since the neurons receiving too much input always push the decoded value towards zero. When I did the "just multiply by 1.4" thing, I was actually making the input current wrong for all the neurons, but in a way that roughly cancelled as far as the decode was concerned. In this scenario, though, the weights that are best at giving each neuron the right input current are not the weights that are best at getting the right decoded value out of the population.

It'd be nice to toss this observation into the NNLS documentation, along with your suggestion for the intercepts. But that's a separate PR from this one. Now that I (think I) understand what's happening here, this PR looks great to me.

@hunse (Collaborator, Author) commented May 30, 2016

There's still a few things I'm working on adding, so I can throw that bit of documentation in as well.

@hunse hunse force-pushed the fix-weight-solvers branch 2 times, most recently from fc19682 to 8373d0c Compare May 30, 2016 14:52
@hunse (Collaborator, Author) commented May 30, 2016

Ok @tcstewar, two more little commits you can look at. There was a strange bug in the NnlsL2 solver: I couldn't figure out why it wasn't working when it called out to the Nnls solver for the core of the solve, so I just copied the code over and now it works.

Also, one thing worth mentioning is that Nnls solves the least-squares problem directly, whereas NnlsL2 solves the Gram system (i.e. the normal equations). These give slightly different answers: NnlsL2(weights=True, reg=0.0) does slightly worse than Nnls(weights=True) (0.215 vs. 0.186 RMSE). However, NnlsL2 is considerably faster, at least for the system I tried it on, because the Gram system has much smaller matrices to deal with. We should probably document this somewhere, though I'm not sure whether the docstring or a notebook is the best place. I'm already not totally happy with the redundancy between the docstrings for the different Nnls solvers (i.e. I put that comment about non-negative intercepts in all of them).
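The two formulations mentioned above can be sketched like this (an illustration, not the PR's code, with made-up sizes and data). The direct problem works on the full tall matrix; the Gram (normal-equation) form works on a much smaller square system. For a reachable non-negative target the two agree; for unreachable targets they can differ slightly, which is the RMSE gap noted above.

```python
import numpy as np
from scipy.optimize import nnls

rng = np.random.RandomState(1)
A = np.maximum(rng.randn(1000, 30), 0)  # tall: 1000 eval points, 30 neurons
w_true = rng.rand(30)                   # non-negative "true" weights
b = A.dot(w_true)                       # a reachable non-negative target

# Direct formulation (Nnls-style): solve on the full 1000 x 30 system.
w_direct, _ = nnls(A, b)

# Gram / normal-equation formulation (NnlsL2-style): only a 30 x 30 system.
G = A.T.dot(A)
c = A.T.dot(b)
w_gram, _ = nnls(G, c)

assert np.allclose(w_direct, w_gram, atol=1e-5)  # same answer when the target is reachable
```

The speed difference comes from the problem size: NNLS cost grows with the number of rows, and the Gram system replaces the (eval points x neurons) matrix with a (neurons x neurons) one.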

Y, m, n, _, matrix_in = format_system(A, Y)
Y = self.mul_encoders(Y, E, copy=True)
d = Y.shape[1]
Y[Y < 0] = 0  # makes the final RMS error more reasonable
@tcstewar (Contributor) commented on this diff:
I'm not sure we should be doing this. The RMS error is what it is -- this is returning the RMS error where any negative target value is treated as if its target value was 0, so it's not really the RMS error any more. Having this happen silently seems a bit odd to me...

@tcstewar (Contributor) commented:
Err, never mind -- I misinterpreted what was happening here (I thought it was just about giving a more reasonable RMS report out of the solver, rather than something that affects the actual decoders)

@hunse (Collaborator, Author) commented:
Oh no, you're right, this is bad. I liked it because it helped to compare different solvers when I was just playing around, but you're right that it misrepresents things. It could also be problematic if we ever have neurons that can have negative activities.

@hunse (Collaborator, Author) commented:
Oh that's so strange. It does make a difference in the Gram system (if you do it before multiplying Y by the transposed activities), but not in the regular system. I have no idea why that is.

@tcstewar (Contributor) commented:
Looks good to me. I think putting the extra documentation in a notebook makes the most sense, as there are many different ways of making use of this solver, and it's not something people have done much with yet.

@hunse (Collaborator, Author) commented May 30, 2016

Ok, I redid those last two commits so they don't affect the RMSE. We just clip Y to be non-negative in the Gram system, because this helps for some reason.

@tcstewar (Contributor) commented:
Cool. That looks great to me. Thanks! (and thanks for adding the TODO about figuring out what's happening there in the future... ) :)

@hunse (Collaborator, Author) commented May 30, 2016

Yeah, I mean it kind of makes sense. Making Y non-negative doesn't change the original system, because the solution can never achieve those negative values; it just does what it can to minimize the error on the positive terms and keeps the negative ones at zero. The value that minimizes f(x) + large_number is the same one that minimizes f(x). That said, clipping things to zero in the original system might change the balance of things: for example, if the target value is -10, the squared-error difference between returning 0 and returning 1 is larger than if the target value is 0.

In the Gram system, if we don't clip Y, then when we multiply by A transpose, some of the positives and negatives might cancel out, and we end up with a different system than if we clip Y first. So it makes sense that things change, and the whole Gram system/normal equation idea is based on things being linear (i.e. linear least squares), and when we have the non-negative constraint that's no longer the case.
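The cancellation described above can be shown with a tiny made-up example (not Nengo code). In the Gram system the right-hand side is A^T Y, so if Y is not clipped first, positive and negative entries of Y mix when multiplied by A^T, producing a different system than when Y is clipped to be non-negative beforehand.

```python
import numpy as np

A = np.array([[1.0, 0.0],
              [1.0, 2.0]])  # non-negative activities
Y = np.array([2.0, -3.0])   # one unreachable (negative) target

# Clip first, then form the Gram right-hand side:
rhs_clipped = A.T.dot(np.maximum(Y, 0))  # array([2., 0.])

# Form the right-hand side without clipping: the +2 and -3 partially
# cancel in the first component.
rhs_raw = A.T.dot(Y)                     # array([-1., -6.])

# The two right-hand sides differ, so the resulting NNLS problems differ.
assert not np.allclose(rhs_clipped, rhs_raw)
```

In the direct system no such mixing happens, since each negative target stays its own (unreachable) row of the residual, which is why clipping matters only for the Gram formulation.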

@tcstewar tcstewar removed their assignment May 31, 2016
@hunse (Collaborator, Author) commented Jun 7, 2016

Added changelog entries. I think this is ready for merge.

@tbekolay (Member) commented Jun 9, 2016

I'll look at this shortly as it's the last thing for 2.1.1!

@tbekolay (Member) commented:
We found a minor issue which Eric fixed; history is clean now so I'll merge this tomorrow morning unless there are objections!

Commit messages:

- Also fix up regularization to be more accurate.
- `ravel` is more efficient as it does not make copies.
- Fixes a bug where Nnls was returning weights of the wrong dimensionality when `weights=True` (fixes #1019). Also fixes some solvers to be safer and faster in flattening the weights they return.
- This was not working properly because we were not clipping `Y` values to be non-negative before forming the Gram system. Now, `Nnls(weights=True)` and `NnlsL2(weights=True, reg=0)` give almost identical results.
@tbekolay tbekolay merged commit bdcb2c0 into master Jun 11, 2016
@tbekolay tbekolay deleted the fix-weight-solvers branch June 11, 2016 15:27
@tbekolay tbekolay removed their assignment Oct 16, 2017