Integrator accuracy #124
Conversation
Here's the test from #114 run on this branch.
(force-pushed 66bb8c1 to 6dbb4c7)
(force-pushed 4a2f137 to 18ac0c7)
The code and the logic behind it seem solid to me. The main thing I did (other than rebasing it to master) was to explore what happens with different Ensemble parameters, to make sure we're not optimizing for one special case. Overall, I would say that this PR is a definite improvement, but it does not fully fix #114.
Here are some plots of running the same test, but varying n_neurons, max_rates, and intercepts. First, here's the current performance in master (with `target="sim"`):
(The coloured lines are n_neurons=50, 100, and 300, and the error bars cover the three different input patterns tested: Cosine, Step, and Zero. intercept=X means intercepts=nengo.dists.Uniform(-X, X), and max_rate=Y means max_rates=nengo.dists.Uniform(Y/2, Y).)
Here's that same data with this PR:
So, this PR seems to improve things overall, although for max_rate==400, intercepts==1, and n_neurons==50 or 100 we actually get a bit worse. But my guess is that there are other problems in that case, so I'm happy with this PR's improvements.
However, that's all just on the simulator. Here's what that data looks like when target="chip". Note that for time reasons I've only run it for n_neurons=50 -- I'll run the other parameters asap and replace the chart:
So for this, it's less clear that it's an improvement. It seems to be a bit better, but the pattern is not what I was expecting. I'll run more examples to try to clarify things.
As a final point, it looks like the test_loihi_api.py::test_decay_magnitude test is failing:

    >       assert np.all(relative_diff < 1e-6)
    E       assert False
    E        +  where False = <function all at 0x000001D9265329D8>(array([ 9.42821154e-01, -9.79198554e-04, -5.75119556e-04, -4.15444346e-04,
                -3.31192540e-04, -2.73645403e-04, -2...7, 1.11720048e-07,
                -3.21961686e-07, -4.17173169e-07, -6.73085989e-07, -7.23017521e-08,
                -4.78374927e-07]) < 1e-06)
    E        +  where <function all at 0x000001D9265329D8> = np.all
I'm not sure whether this was happening before, or if it was caused by my rebasing (although the rebasing went pretty easily, so I don't think it was that). Was this test passing before?
Two things: first, why are you using ...? As for the failing test, I'm not seeing it, and neither is Travis CI (I ran it a number of times, too). And it's not a chip thing, because that test doesn't run on the chip (it's just a unit test of a helper function). So maybe something about your NumPy is different, or there's some other difference on your machine? Though looking at your printout, it looks WAY off, so there must be a bug somewhere. I don't get why I'm not seeing it, though.
Good point, but the test that you're running (from #114) is using ...

Interesting... I just tried it on the Intel cloud and it also passes there.
One thing I'm puzzled about is why there are such significant differences between the emulator and the chip. With ... Finally, the biggest thing that concerns me is that performance deteriorates a lot for higher numbers of neurons, both in the emulator and on the chip, before and after these changes. That's obviously something for future work, but it seems like pretty important future work to me.
One possibility is that it might be the same weirdness that's giving me the failing test. I'll re-run the simulator tests on the Intel cloud and see what that looks like.
I don't think there are interneurons in this model: the solver has ...

This is with ...
Nope, that doesn't affect things -- I re-ran it all on the cloud and got exactly the same results. (I've replaced the images above with the cloud-run data, just in case.)
That's on ...
I found the cause! Here's the line in the test:

    s = np.sum(ys, axis=0)

It needs to be changed to:

    s = np.sum(ys, axis=0, dtype=np.int64)

(The sum is silently overflowing, as on some machines np.sum seems to default to an int32 accumulator.)
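The overflow is easy to reproduce. Here's a minimal sketch (assumed array contents, not the actual test data) showing how a forced 32-bit accumulator silently wraps while an explicit int64 accumulator stays correct:

```python
import numpy as np

# 100,000 spike-count-like values of 50,000 each: the true sum is
# 5,000,000,000, which exceeds the int32 maximum (2,147,483,647).
ys = np.full(100000, 50000, dtype=np.int32)

# Forcing a 32-bit accumulator reproduces the silent wraparound seen on
# platforms where the default integer type is int32 (e.g. Windows).
s32 = np.sum(ys, dtype=np.int32)

# An explicit 64-bit accumulator is safe on every platform.
s64 = np.sum(ys, axis=0, dtype=np.int64)
```

Here the wrapped value is the true sum modulo 2**32, which is why the bug depends on the platform's default integer width rather than on the data itself.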
FYI, numpy/numpy#9464 -- Doesn't sound like there's any way to force int64 to be the default at the moment. :/ |
Test passes for me 👍
(force-pushed 2eef211 to f69d7aa)
For what it's worth, this branch seems to dramatically improve (up to 2x improvement in accuracy versus master) the Delay Network for lower recurrent taus (e.g., 10 ms). For larger recurrent taus (e.g., 100 ms), with the same readout tau as before, they both seem to perform the same (edit: not shown). In all cases I'm using the emulator (@celiasmith), with:

    nengo.Ensemble.max_rates.default = nengo.dists.Uniform(100, 120)
    nengo.Ensemble.intercepts.default = nengo.dists.Uniform(-1, 0.5)
    solver = nengo.solvers.LstsqL2(reg=0.1, weights=True)

The inset in the top-right corner is the important part (plotting the error across the window). This is just a single trial, but they are both very consistent across trials.
Looking at those little top-right corner plots (that's what we're talking about here, right?), it appears to me that it's the opposite. I assume the x-axis is the tau you're talking about here. It looks like for taus up to around 50 ms (0.05 s), the plots are roughly the same. For example, at 0.05 s, they both appear to be around 0.15 NRMSE. It's for larger taus that there's a difference; for example, at 0.1 s you see this 2x difference you were referring to (with the old one being around 0.3 NRMSE, and the new one around 0.15). Or is the tau on the x-axis of those little plots the "readout tau", which I assume is the delay that the network is trying to mimic or read out? And then you did analyses like the one above for many different recurrent taus, but you're just showing us the analysis for a shorter one (around 5 to 10 ms)?
The x-axis is the time point within the window. Both plots are generated by a single trial with a fixed tau. The state of the DN decodes the entire window, and then we're plotting the error (y-axis) across that window (x-axis). I am showing the recurrent tau=10ms trial. The tau=100ms trial is not shown, but in that case they both perform far worse and nearly identically to each other.
So the results you're talking about (that this PR improves things for shorter recurrent taus) are not demonstrated by the plots above, since those are just for a single recurrent tau. Is that correct? When you look at the error for a particular recurrent tau, is that the error integrated over the whole window you're talking about (i.e., for any possible readout delay in the range 0 to 0.1 s)? Or is it just the error at the longest readout delay (0.1 s)?
That's right. The main thing I was pointing out is that the overall error (y-axis) is better by a factor of ~2x for this branch versus master. We can talk about the longer taus, but I think something much stranger is happening there that could be somewhat orthogonal. I need to look into it. It could have to do with the spike generator on the input side, in conjunction with the input being scaled by tau.
Edit: Fixed this problem via #115 (comment).

This is what it looks like for the longer taus (on either branch), nengo_loihi (tau=100ms): [plot not shown]

And this is what it's supposed to look like (simply changing ...), nengo (tau=100ms): [plot not shown]

Again, I think it has something to do with the spike generator. Maybe #115 as well. Any hints/ideas would be helpful! In either case, I think this is mostly independent of the improvements of this branch.
And FWIW, here again is an integrator running on this branch versus master: [plots not shown]. Both are averaged across 200 trials, use the ReLU model, and employ a number of "tricks" (see #115 (comment)). It would be nice to have this in master so that my thesis doesn't need to reference a branch. 👍 (Code available upon request.) Willing to review if that helps.
(force-pushed 49801e2 to 3e54ea2)
tbekolay left a comment:
LGTM, except for a few things! I made a bunch of inline comments, most of which should be relatively trivial, but I think it would be good for @hunse to do the changes, as I'm not sure what should happen with the commented-out lines, and it'd be good to brush up on directives for future docstrings.
I was looking at the linked issue, and the explanation in this comment helped me immensely in figuring out what this PR is doing. I'm not sure of the best way to incorporate that explanation here. One thing that should definitely happen is adding a comment to u_infactor and v_infactor, as knowing what those variables are helped me when reading through the diff. The rest could maybe be simplified and also made into code comments, though I'm not 100% sold on that ... it might be enough to link to that issue in the changelog?
Speaking of which, I think that this PR likely warrants a changelog entry. We haven't been doing them up to this point, but it's in our "definition of done" so now's as good a time as any to start.
The decayed value is given by:

    sign(x) * floor(abs(x) * (2**bits - offset - decay) / 2**bits)
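As a rough executable sketch of that formula (hypothetical helper name; the real routine lives in nengo_loihi's decay handling), with bits=12 as on the chip:

```python
import numpy as np

def decay_scale(x, decay, bits=12, offset=0):
    """Sketch of the decayed value from the docstring above.

    Scales x by (2**bits - offset - decay) / 2**bits and truncates the
    magnitude toward zero, matching the sign(x) * floor(abs(x) * ...)
    form of the formula.
    """
    numer = 2**bits - offset - decay
    return np.sign(x) * np.floor(np.abs(x) * numer / 2**bits)
```

Note that the floor is applied to the absolute value, so truncation is always toward zero regardless of the sign of x.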
This will render as a blockquote, which is probably not what you want. If you want it to show up like math, use the .. math:: directive; if you want it to look like code you can either end the previous line with :: or use the .. code-block:: directive; see https://www.sphinx-doc.org/en/master/usage/restructuredtext/directives.html
Can we talk about standards for how we do equations? I find LaTeX great for stuff that will definitely be rendered, but these docstrings may either be rendered (for online) or just as strings (when reading the code, or calling help on a function).
For that reason, I think that we should stay away from the math directive, unless we have a really good reason for it in a particular situation. I'd opt for using :: and displaying stuff as code wherever possible, since this is clear in either raw or rendered format.
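For illustration, here is a sketch of what the `::` approach looks like in practice (hypothetical function, reusing this PR's decay formula); the literal block reads the same whether the docstring is viewed raw or rendered:

```python
def decay_doc_example(x, decay, bits=12, offset=0):
    """Example docstring using the ``::`` literal-block convention.

    The decayed value is given by::

        sign(x) * floor(abs(x) * (2**bits - offset - decay) / 2**bits)
    """
    # Body omitted; this sketch only illustrates the docstring style.
    raise NotImplementedError
```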
Having said that, I realize that some things are hard to do in code (or at least need a standard way of writing them), for example a subscript like x_{i-1}. Sums are another big one that I tend to write in LaTeX notation. So I'm not sure which way to go on this.
I feel like it's mostly a case-by-case thing: sometimes the equation makes sense as plain text, and other times I prefer LaTeX. LaTeX notation doesn't look too bad in plaintext (I think that was one of its design goals), so it mostly depends on variable names and complexity. If they're short one-letter names and complicated math, I go LaTeX. If they're long textual names and simpler math, I do plaintext. Definitely some tough calls along that spectrum, but we can always revisit those choices when they get rendered in documentation, where it's easier to see how they read on a page. If you want to talk more about this, we can make an issue for it.
Comments addressed.
I realized I forgot to address your original (not-inline) comment, so I added a few commits to do that too. Between the comment briefly describing ... The one thing I don't talk about is how to choose the initial value for computing the decay magnitude (...).
tbekolay left a comment:
All comments addressed! Merging once I finish running tests on hardware.
(force-pushed d980aac to bbfd500)
(force-pushed bbfd500 to d1284b4)
The biggest improvement comes from the fact that we now account for the actual U decay being one more than the provided U decay.
The chip adds one to decayU, so take this into account.
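In other words (a sketch with an assumed helper name), the effective per-step decay factor on the U state uses decayU + 1 rather than decayU:

```python
def u_decay_factor(decay_u, bits=12):
    # The chip effectively applies decay_u + 1, so the per-step scaling
    # of the U state is (2**bits - decay_u - 1) / 2**bits rather than
    # (2**bits - decay_u) / 2**bits.
    return (2**bits - decay_u - 1) / 2**bits
```

This matches the `offset` term in the decay formula discussed in the review comments, with offset = 1 for the U decay.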
When the wgtExp is small, we lose some of the least significant bits in the weights (due to how the chip implements the shift). Rather than just chopping these bits, round them before chopping so that weights are not biased to be smaller than expected.
This is especially important for long `tau_rc`, since more rounding occurs.
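The round-before-chop idea can be sketched as follows (hypothetical helper names; the real code operates on the chip's weight representation). For non-negative integer weights, adding half of the discarded range before shifting rounds to the nearest representable value instead of always truncating downward:

```python
def chop_bits(w, shift):
    # Plain truncation: discards the low bits, biasing weight
    # magnitudes toward zero.
    return w >> shift

def round_then_chop(w, shift):
    # Add half of the discarded range first, so the result is rounded
    # to the nearest representable value rather than truncated.
    return (w + (1 << (shift - 1))) >> shift
```

For example, with shift=2, a weight of 7 truncates to 1 (losing almost half its value) but rounds to 2, so on average the stored weights are no longer systematically smaller than intended.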
(force-pushed d1284b4 to 3cd5bac)
This PR improves the accuracy of an integrator network. It fixes #114, namely an integrator using a recurrent weight connection.
The integrator network still does not work as expected when using a decoded (non-weight) connection (now documented in #125).