Integrator accuracy #124

Merged
merged 4 commits into from Jan 13, 2019

@hunse (Contributor) commented Oct 15, 2018

Improving the accuracy of an integrator network. Fixes #114, namely an integrator using a recurrent weight connection.

The integrator network still does not work as expected when using a decoded (non-weight) connection (now documented in #125).

@hunse (Contributor, Author) commented Oct 15, 2018

(attachment: integration.pdf)

Here's the test from #114 run on this branch.

@hunse hunse force-pushed the integrator-accuracy branch 2 times, most recently from 66bb8c1 to 6dbb4c7 Oct 15, 2018

@hunse hunse requested a review from tcstewar Oct 22, 2018

@hunse hunse referenced this pull request Oct 25, 2018

Merged

Interneuron improvements #132

@tcstewar tcstewar force-pushed the integrator-accuracy branch from 4a2f137 to 18ac0c7 Nov 1, 2018

@tcstewar (Contributor) left a comment

The code and the logic behind it seem solid to me. The main thing I did (other than rebasing it onto master) was to explore what happens with different Ensemble parameters, to make sure we're not optimizing for one special case. Overall, I would say that this PR is definitely an improvement, but it does not entirely fix #114.

Here are some plots of running the same test while varying n_neurons, max_rates, and intercepts. First, here's the current performance on master (with `target="sim"`):

image

(The coloured lines are n_neurons = 50, 100, and 300, and the error bars cover the three different input patterns tested: Cosine, Step, and Zero. intercept=X means intercepts=nengo.dists.Uniform(-X, X), and max_rate=Y means max_rates=nengo.dists.Uniform(Y/2, Y).)

Here's that same data with this PR:

image

So, this PR seems to be improving things overall, although for max_rate=400, intercepts=1, and n_neurons=50 or 100 we actually do a bit worse. But my guess is that there are other problems in that case, so I'm happy with this PR's improvements.

However, that's all just on the simulator. Here's what that data looks like with target="chip". Note that for time reasons I've only run it for n_neurons=50; I'll run the other parameters as soon as possible and replace the chart:

image

image

So for this, it's less clear that it's an improvement. It seems to be a bit better, but the pattern is not what I was expecting. I'll run more examples to try to clarify things.

As a final point, it looks like the `test_loihi_api.py::test_decay_magnitude` test is failing:

```
>       assert np.all(relative_diff < 1e-6)
E       assert False
E        +  where False = <function all at 0x000001D9265329D8>(array([ 9.42821154e-01, -9.79198554e-04, -5.75119556e-04, -4.15444346e-04,\n       -3.31192540e-04, -2.73645403e-04, -2...7,  1.11720048e-07,\n       -3.21961686e-07, -4.17173169e-07, -6.73085989e-07, -7.23017521e-08,\n       -4.78374927e-07]) < 1e-06)
E        +    where <function all at 0x000001D9265329D8> = np.all
```

I'm not sure whether this was happening before, or if it was caused by my rebasing (although the rebasing went pretty easily, so I don't think it was that). Was this test passing before?

@tcstewar tcstewar removed their assignment Nov 1, 2018

@hunse (Contributor, Author) commented Nov 1, 2018

Two things. First, why are you using intercepts=nengo.dists.Uniform(-X, X)? Didn't we decide that intercepts=nengo.dists.Uniform(-1, X) is almost always better (i.e., it's the high intercepts that cause problems, not the low ones)? I'd be interested to see those plots redone with that change.

As for the failing test, I'm not seeing that, and neither is Travis CI (I ran it a number of times, too). And it's not a chip thing, because that test doesn't run on the chip (it's just a unit test of a helper function). So maybe something about your Numpy is different, or there's some other difference on your machine? Though looking at your printout, it looks WAY off, so there must be a bug somewhere. I don't get why I'm not seeing it, though.

@tcstewar (Contributor) commented Nov 1, 2018

> Why are you using intercepts=nengo.dists.Uniform(-X, X)? Didn't we decide that intercepts=nengo.dists.Uniform(-1, X) is almost always better (i.e. it's the high intercepts that cause problems, not the low ones)? I'd be interested to see those plots redone with that change.

Good point, but the test that you're running (from #114 ) is using intercepts=nengo.dists.Uniform(-0.5, 0.5), so I wanted to make sure I tested that same point. I'll try the (-1, X) too, just in case. :)

> As for the failing test, I'm not seeing that, and neither is Travis CI (I ran it a number of times, too). And it's not a chip thing, because that test doesn't run on the chip (it's just a unit test of a helper function). So maybe something about your Numpy is different, or there's some other difference on your machine? Though looking at your printout, it looks WAY off, so there must be a bug somewhere. I don't get why I'm not seeing it, though.

Interesting... I just tried it on the Intel cloud and it also passes there. pip freeze claims that on both the Intel cloud (where it passes) and on my laptop (where it fails), I'm on numpy==1.14.3. I am on Windows on my laptop, but that really shouldn't matter... I'm on Python 3.6.1 on the laptop, and the Intel cloud is on Python 3.5.2.

@hunse (Contributor, Author) commented Nov 1, 2018

One thing I'm puzzled about is why there are such significant differences between the emulator and the chip. With max_rates=120 they look pretty similar (though even there, you can see a bit of evidence of the chip being noisier, e.g. with intercepts=0.5 and transform=1.05). But with higher firing rates, they're pretty different: On the emulator, transform=1.0 seems pretty clearly to be the best, whereas on the chip it looks like a slightly smaller transform (0.95) would be better. It's possible this has to do with noisy interneurons; I'd be curious to try this with #132 to see if that makes any difference. I assume that if you're using precompute=False, you've set the number of snip spikes high enough that nothing is getting dropped? (EDIT: or if precompute=True, maybe this is another symptom of things falling apart for larger networks)

Finally, the biggest thing that concerns me is that performance deteriorates a lot for higher numbers of neurons, both in emulator and in chip, before and after these changes. So that's obviously something for future work, but that seems like pretty important future work to me.

@tcstewar (Contributor) commented Nov 1, 2018

> One thing I'm puzzled about is why there are such significant differences between the emulator and the chip. With max_rates=120 they look pretty similar (though even there, you can see a bit of evidence of the chip being noisier, e.g. with intercepts=0.5 and transform=1.05). But with higher firing rates, they're pretty different: On the emulator, transform=1.0 seems pretty clearly to be the best, whereas on the chip it looks like a slightly smaller transform (0.95) would be better.

One possibility is that it might be the same weirdness that's giving me the failing test. I'll re-run the simulator tests on the intel cloud and see what that looks like.

> It's possible this has to do with noisy interneurons; I'd be curious to try this with #132 to see if that makes any difference.

I don't think there are interneurons in this model: the solver has weights=True.

> I assume that if you're using precompute=False, you've set the number of snip spikes high enough that nothing is getting dropped?

This is with precompute=True, for speed reasons (I was running a lot of sims). I can try without precompute too.

@tcstewar (Contributor) commented Nov 1, 2018

> One possibility is that it might be the same weirdness that's giving me the failing test. I'll re-run the simulator tests on the intel cloud and see what that looks like.

Nope, that doesn't affect things -- I re-ran it all on the cloud and got exactly the same results. (I've replaced the images above with the cloud-run data, just in case.)

@drasmuss (Contributor) commented Nov 6, 2018

`test_decay_magnitude` also fails for me:

```
>       assert np.all(relative_diff < 1e-6)
E       assert False
E        +  where False = <function all at 0x0000014677028730>(array([ 9.71177396e-01, -1.08811599e-03, -6.13906343e-04, -4.32027241e-04,\n       -3.39373637e-04, -2.80564914e-04, -2...7, -3.82642914e-07,\n       -9.86391045e-08, -7.17645022e-07, -1.80609154e-07, -4.31690146e-07,\n       -1.07072338e-07]) < 1e-06)
E        +    where <function all at 0x0000014677028730> = np.all
```

That's on numpy==1.14.5, Python 3.5.4.

@tbekolay tbekolay force-pushed the master branch from 701f997 to 1fe237c Nov 11, 2018

@tcstewar (Contributor) commented Nov 12, 2018

> test_decay_magnitude also fails for me:

I found the cause! Here's the line in the test:

```python
s = np.sum(ys, axis=0)
```

which needs to be changed to:

```python
s = np.sum(ys, axis=0, dtype=np.int64)
```

(The sum is silently overflowing, as on some machines np.sum seems to default to an int32 accumulator.)
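A minimal sketch of the failure mode (the array values here are made up for illustration; only the dtype behaviour matters):

```python
import numpy as np

# On platforms whose default C long is 32 bits (e.g. Windows), np.sum of an
# integer array may accumulate in int32 and silently wrap around.
ys = np.full(100000, 100000, dtype=np.int32)

s_platform = np.sum(ys, axis=0)              # accumulator dtype is platform-dependent
s_safe = np.sum(ys, axis=0, dtype=np.int64)  # explicit 64-bit accumulator

# The true total (10_000_000_000) does not fit in an int32.
print(s_safe)  # 10000000000
```

Passing dtype explicitly makes the result the same on every platform, which is why it fixes the Windows-only failure.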

@arvoelke commented Nov 12, 2018

> (the sum is silently overflowing as on some machines it seems to default to using int32)

FYI, numpy/numpy#9464 -- it doesn't sound like there's any way to force int64 to be the default at the moment. :/

@hunse (Contributor, Author) commented Nov 12, 2018

Well, that's very good to know about. I had assumed that np.array([1]) would always return an array of int64; I didn't realize the default was machine-specific.

@tcstewar @drasmuss: can you check out 2eef211 (the new version of this branch) and make sure that fixes the test for you?

@drasmuss (Contributor) commented Nov 13, 2018

Test passes for me 👍

@hunse hunse force-pushed the integrator-accuracy branch from 2eef211 to f69d7aa Dec 6, 2018

@arvoelke commented Dec 6, 2018

For what it's worth, this branch seems to dramatically improve the Delay Network (up to 2x better accuracy than master) for lower recurrent taus (e.g., 10 ms). For larger recurrent taus (e.g., 100 ms), with the same readout tau as before, the two branches perform about the same (edit: not shown).

In all cases I'm using the emulator (@celiasmith), precompute=True, remove_passthrough=True, and the following settings (fewer spikes, broader intercepts, and no interneurons):

```python
nengo.Ensemble.max_rates.default = nengo.dists.Uniform(100, 120)
nengo.Ensemble.intercepts.default = nengo.dists.Uniform(-1, 0.5)
solver = nengo.solvers.LstsqL2(reg=0.1, weights=True)
```

Master (v0.4.0 release):
loihi_master_delay_network

Branch (integrator-accuracy):
loihi_branch_delay_network

The inset in the top-right corner is the important part (plotting the error across the window). This is just a single trial, but they are both very consistent across trials.

@hunse (Contributor, Author) commented Dec 6, 2018

> For what it's worth, this branch seems to dramatically improve (up to 2x improvement in accuracy versus master) the Delay Network for lower recurrent taus (e.g., 10ms). For larger recurrent taus (e.g., 100ms)---with the same readout tau as before---they both seem to perform the same.

Looking at those little top-right corner plots (that's what we're talking about here, right?), it appears to me that it's the opposite. I assume the x-axis is the tau you're talking about here. It looks like for up to taus of around 50 ms (0.05 s), the plots are roughly the same. For example, at 0.05 s, they both appear to be around 0.15 NRMSE. It's for larger taus that there's a difference, for example at 0.1 s you see this 2x difference you were referring to (with the old one being around 0.3 NRMSE, and the new one around 0.15).

Or is the tau on the x-axis of those little plots the "readout tau", which I assume is the delay that the network is trying to mimic or read out? And then you did analyses like the one above for many different recurrent taus, but you're just showing us the analysis for a shorter one (around 5 to 10 ms)?

@arvoelke commented Dec 6, 2018

The x-axis is the time point within the window. Both plots are generated by a single trial with a fixed tau. The state of the DN decodes the entire window, and then we're plotting the error (y-axis) across that window (x-axis).

I am showing the recurrent tau=10 ms trial. The tau=100 ms trial is not shown, but there both branches perform far worse, and nearly identically to each other.

@hunse (Contributor, Author) commented Dec 6, 2018

So the results that you're talking about---that this PR improves things for shorter recurrent taus---are not demonstrated by the plots above, since those are just for a single recurrent tau. Is that correct?

When you look at the error for a particular recurrent tau, is that the integrated error over that whole window you're talking about (i.e. for any possible readout delay in the range 0 to 0.1 s)? Or is it just the error at the longest readout delay (0.1 s)?

@arvoelke commented Dec 6, 2018

> So the results that you're talking about---that this PR improves things for shorter recurrent taus---is not demonstrated by the plots above, since those are just for a single recurrent tau. Is that correct?
>
> When you look at the error for a particular recurrent tau, is that the integrated error over that whole window you're talking about (i.e. for any possible readout delay in the range 0 to 0.1 s)?

That's right. The main thing I was pointing out is that the overall error (y-axis) is better by a factor of ~2x for this branch versus master.

We can talk about the longer taus but I think something much stranger is happening that could be somewhat orthogonal. I need to look into this. It could have to do with the spike generator on the input side in conjunction with the input being scaled by tau.

@arvoelke commented Dec 6, 2018

Edit: Fixed this problem via #115 (comment).

This is what it looks like for the longer taus (on either branch):

nengo_loihi (tau=100ms):

loihi_delay_network

And this is what it's supposed to look like (simply changing nengo_loihi.Simulator to nengo.Simulator):

nengo (tau=100ms):

nengo_delay_network

Again, I think it has something to do with the spike generator. Maybe #115 as well. Any hints/ideas would be helpful! In either case, I think this is mostly independent of the improvements in this branch.

@arvoelke commented Dec 18, 2018

And FWIW, here again is an integrator running on this branch:

integrator-accuracy

versus master:

integrator-master

Both are averaged across 200 trials, use the ReLU model, and employ a number of "tricks" (see #115 (comment)). It would be nice to have this in master so that my thesis doesn't need to reference a branch. 👍 (Code available upon request.) I'm willing to review if that helps.

@hunse hunse referenced this pull request Dec 21, 2018

Merged

Refactor files, emulator, api #159

7 of 7 tasks complete

@hunse hunse force-pushed the integrator-accuracy branch 2 times, most recently from 49801e2 to 3e54ea2 Jan 3, 2019

@tbekolay (Member) left a comment

LGTM, except for a few things! I made a bunch of inline comments, most of which should be relatively trivial, but I think it would be good for @hunse to make the changes, as I'm not sure what should happen with the commented-out lines, and it'd be good to brush up on directives for future docstrings.

I was looking at the linked issue, and the explanation in this comment helped me immensely in figuring out what this PR is doing. I'm not sure of the best way to incorporate that explanation here. One thing that should definitely happen is adding a comment to u_infactor and v_infactor, as knowing what those variables are helped me when reading through the diff. The rest could maybe be simplified and also made into code comments, though I'm not 100% sold on that... it might be enough to link to that issue in the changelog?

Speaking of which, I think this PR likely warrants a changelog entry. We haven't been doing them up to this point, but it's in our "definition of done", so now's as good a time as any to start.

> The decayed value is given by
> `sign(x) * floor(abs(x) * (2**bits - offset - decay) / 2**bits)`
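For concreteness, here is a NumPy sketch of that formula (the function name and the `bits=12`, `offset=0` defaults are illustrative assumptions, not taken from the code under review):

```python
import numpy as np

def decayed_value(x, decay, bits=12, offset=0):
    # sign(x) * floor(abs(x) * (2**bits - offset - decay) / 2**bits)
    factor = (2**bits - offset - decay) / 2**bits
    return np.sign(x) * np.floor(np.abs(x) * factor)

print(decayed_value(1000, 2048))   # 500.0 (half the magnitude remains)
print(decayed_value(-1000, 2048))  # -500.0 (symmetric in sign)
```

The sign/abs/floor combination makes the truncation symmetric around zero rather than always rounding toward negative infinity.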

@tbekolay (Member) commented Jan 11, 2019

This will render as a blockquote, which is probably not what you want. If you want it to show up as math, use the `.. math::` directive; if you want it to look like code, you can either end the previous line with `::` or use the `.. code-block::` directive; see https://www.sphinx-doc.org/en/master/usage/restructuredtext/directives.html

@hunse (Contributor, Author) commented Jan 11, 2019

Can we talk about standards for how we write equations? I find LaTeX great for things that will definitely be rendered, but these docstrings may either be rendered (for the online docs) or read as plain strings (when reading the code, or calling help on a function).

For that reason, I think we should stay away from the math directive unless we have a really good reason for it in a particular situation. I'd opt for using `::` and displaying things as code wherever possible, since that is clear in both raw and rendered formats.
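As a concrete illustration of the `::` convention being advocated here (a hypothetical docstring, not the actual one from this PR):

```python
def example_fn(x):
    """Return a decayed value (illustrative only).

    The decayed value is given by::

        sign(x) * floor(abs(x) * (2**bits - offset - decay) / 2**bits)

    This reads as code both in the raw source (or via ``help``) and when
    rendered by Sphinx, where ``::`` produces a literal block.
    """
    return x  # placeholder body; only the docstring matters here
```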

@hunse (Contributor, Author) commented Jan 11, 2019

Having said that, I realize that some things are hard to do in code (or at least need a standard way of writing them), for example a subscript like x_{i-1}. Sums are another big one that I tend to write in LaTeX notation. So I'm not sure which way to go on this.

@tbekolay (Member) commented Jan 13, 2019

I feel like it's mostly a case-by-case thing; sometimes the equation makes sense as plain text, and other times I prefer LaTeX. LaTeX notation doesn't look too bad in plain text (I think that was one of its design goals), so it mostly depends on variable names and complexity. If they're short one-letter names and complicated math, I go with LaTeX; if they're long textual names and simpler math, I use plain text. There are definitely some tough calls along that spectrum, but we can always revisit those choices when they get rendered in the documentation and it's easier to see how they read on a page. If you want to talk more about this, we can make an issue for it.

Inline review comments were resolved on nengo_loihi/loihi_api.py and nengo_loihi/tests/test_loihi_api.py (most of them on since-outdated code).


@hunse (Contributor, Author) commented Jan 11, 2019

Comments addressed.

@hunse (Contributor, Author) commented Jan 11, 2019

I realized I forgot to address your original (non-inline) comment, so I added a few commits to do that too. Between the comment briefly describing u_factor and v_factor where they're declared, and the addition to the decay_magnitude docstring, I think we've got all the key points of my comment at #114 (comment).

The one thing I don't talk about is how to choose the initial value (x0) for computing the decay magnitude. I didn't really have a principled way to do this, so there's not much to say. The most I can think of is a note somewhere that this is a thing we could look into more at some point. But (a) I don't think it's particularly significant, (b) there are lots of things to look into in the code, and (c) if this does need revisiting, I expect it will be because of some problem (a particular model not being as accurate as we'd like), and that will inspire us to revisit it.

@tbekolay (Member) left a comment

All comments addressed! Merging once I finish running tests on hardware.

@tbekolay tbekolay force-pushed the integrator-accuracy branch from d980aac to bbfd500 Jan 13, 2019

@tbekolay tbekolay referenced this pull request Jan 13, 2019

Open

Revisit xfailed tests #166

@tbekolay tbekolay force-pushed the integrator-accuracy branch from bbfd500 to d1284b4 Jan 13, 2019

hunse added some commits Oct 13, 2018

- **More accurate computation of decay magnitude** -- The biggest improvement comes from the fact that we now account for the actual U decay being one more than the provided U decay.
- **Match decayU to the desired decay better** -- The chip adds one to decayU, so take this into account.
- **Round weights before losing bits** -- When the wgtExp is small, we lose some of the least significant bits in the weights (due to how the chip implements the shift). Rather than just chopping these bits, round them before chopping so that weights are not biased to be smaller than expected.
- **Discretize tau_rc in loihi_lif_rates** -- This is especially important for long `tau_rc`, since more rounding occurs.
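The "round before chopping" idea from the third commit can be sketched with plain integer shifts (an illustrative model under assumed bit widths, not the chip's actual implementation; the function names are hypothetical):

```python
import numpy as np

def chop_bits(w, shift):
    # Plain right shift: rounds toward negative infinity, so positive
    # weights are biased to be smaller than intended.
    return np.asarray(w) >> shift

def round_then_chop(w, shift):
    # Add half of the discarded range first, so the shifted result is
    # rounded to the nearest representable value instead of biased low.
    return (np.asarray(w) + (1 << (shift - 1))) >> shift

w = np.array([5, 6, 7])
print(chop_bits(w, 2))        # [1 1 1]
print(round_then_chop(w, 2))  # [1 2 2]
```

Over many weights, the rounded version has roughly zero mean error, while plain chopping systematically shrinks positive weights.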

@tbekolay tbekolay force-pushed the integrator-accuracy branch from d1284b4 to 3cd5bac Jan 13, 2019

@tbekolay tbekolay merged commit 3cd5bac into master Jan 13, 2019

1 of 2 checks passed:

- codecov/patch: 98.71% of diff hit (target 100%)
- codecov/project: 79.39% (+0.2%) compared to 3f0d820

@tbekolay tbekolay deleted the integrator-accuracy branch Jan 13, 2019
