
Improve product network accuracy. #651

Merged: 5 commits merged into master on Aug 18, 2015

Conversation

@jgosmann (Collaborator) commented Feb 9, 2015

More precise product network based on the results of my technical report.

I am also working on a benchmark to include in Nengo based on PR #647.
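
For readers skimming the thread, a minimal sketch of the idea in Nengo terms (not the PR's actual code; neuron counts and radii are illustrative): the product a*b is computed as ((a+b)^2 - (a-b)^2)/4, so each 1D ensemble only ever has to decode a square.

```python
import nengo

# Sketch of the squaring-based product: a*b = ((a+b)^2 - (a-b)^2) / 4.
with nengo.Network() as model:
    a = nengo.Node(output=0.5)   # placeholder inputs
    b = nengo.Node(output=-0.3)

    # One ensemble represents a+b, the other a-b. With |a|, |b| <= 1,
    # both combinations lie in [-2, 2], hence the radius of 2.
    sum_ens = nengo.Ensemble(n_neurons=100, dimensions=1, radius=2)
    diff_ens = nengo.Ensemble(n_neurons=100, dimensions=1, radius=2)
    nengo.Connection(a, sum_ens)
    nengo.Connection(b, sum_ens)
    nengo.Connection(a, diff_ens)
    nengo.Connection(b, diff_ens, transform=-1)

    # Decode the squares and take the scaled difference.
    product = nengo.Node(size_in=1)
    nengo.Connection(sum_ens, product,
                     function=lambda x: x ** 2, transform=0.25)
    nengo.Connection(diff_ens, product,
                     function=lambda x: x ** 2, transform=-0.25)
```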

@hunse (Collaborator) commented Feb 12, 2015

So this actually provides an improvement over using diagonal encoders? I had tried this once, and found it was better than random encoders, but worse than diagonal encoders. My explanation was that diagonal encoders basically force this kind of squaring multiplication anyway, but rather than computing the square terms individually and then taking the difference, it computes the difference directly, which allows decoders to be chosen so that errors in one square term cancel errors in the other square term.
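
For context, a hedged sketch of the diagonal-encoder approach described above (encoder choices and sizes are illustrative, not hunse's actual setup): every neuron in a single 2D ensemble gets an encoder along one of the (±1, ±1) diagonals, so each neuron effectively responds to a+b or a-b, while the product decoders are optimized over the whole population at once.

```python
import numpy as np
import nengo

with nengo.Network():
    diagonals = np.array([[1, 1], [1, -1], [-1, 1], [-1, -1]]) / np.sqrt(2.0)
    ens = nengo.Ensemble(
        n_neurons=200,
        dimensions=2,
        radius=np.sqrt(2.0),  # keeps (a, b) with |a|, |b| <= 1 representable
        encoders=nengo.dists.Choice(diagonals),  # sample from the diagonals
    )
    out = nengo.Node(size_in=1)
    # The product is decoded from the whole population at once, so errors
    # in the implicit (a+b)^2 and (a-b)^2 terms can cancel each other.
    nengo.Connection(ens, out, function=lambda x: x[0] * x[1])
```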

@jgosmann (Collaborator, Author)

From my report linked above:

diagonal encoders vs. alternative network:
Improvement by 8% (p < 0.001).

RMSE values for diagonal encoders:

mean RMSE: 0.0481133982306
median RMSE: 0.0477134127208
variance of RMSE: 2.244052918e-05

RMSE values for alternative network:

mean RMSE: 0.0444385504631
median RMSE: 0.0445897354704
variance of RMSE: 1.59544830317e-05

Sample size: 50

Definitely not a huge difference, and to detect it, I believe it is necessary to ensure that the benchmark actually covers the complete input space (it is easy to miss the corners, which actually contribute the largest error).

@hunse (Collaborator) commented Feb 12, 2015

I think you're right: it matters a lot what your input is. I was probably doing something like that differently. I'm still curious why computing the squares separately is better; I also wondered if it could have to do with intercepts.

A couple minor comments on the notebook: I would report standard deviations instead of variances, since they're in the same units as means. Also, you could consider using relative RMSEs, i.e. the RMS of the error divided by the RMS of the correct output. I find it makes the RMSE a bit easier to interpret. Finally, make sure all the scales in the final spiking plots are the same. Right now, it's hard to tell if the "alternative" network actually results in less noisy outputs than the diagonal encoders, or if this is just because it has a larger scale.

@drasmuss (Member)

Also re: the notebook, in the alternative network benchmark you end up having effectively double the evaluation points, right? Did you compare the accuracy if you halve the evaluation points in each population so the total is the same as the simple network?

@hunse (Collaborator) commented Feb 12, 2015

There's an explanation in the notebook as to why @jgosmann does not halve the evaluation points. To summarize: in the simple network you have one 2D population, so the N evaluation points get projected onto each dimension, meaning you have N evaluation points per dimension. So it makes sense to keep N evaluation points in each dimension (now each its own population) in the alternative network.
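
In Nengo terms, a tiny sketch of that argument (N and the neuron counts are hypothetical):

```python
import nengo

N = 1000  # hypothetical number of evaluation points

with nengo.Network():
    # Simple network: one 2D population; its N evaluation points project
    # to N points along each of the two dimensions.
    simple = nengo.Ensemble(100, dimensions=2, n_eval_points=N)

    # Alternative network: each 1D population keeps N evaluation points,
    # matching the per-dimension density of the simple network.
    sq1 = nengo.Ensemble(100, dimensions=1, n_eval_points=N)
    sq2 = nengo.Ensemble(100, dimensions=1, n_eval_points=N)
```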

@drasmuss (Member)

Ah yes, that makes sense 🌈

@jgosmann (Collaborator, Author)

@hunse Those are good points. But I am not sure I understand how to calculate the relative RMSE (and therefore whether it makes sense): if I divide by the RMS of the correct output, wouldn't that give extremely large errors for values close to zero?

@jgosmann (Collaborator, Author)

Updated the notebook (use standard deviation and the same scale in all plots).

@hunse (Collaborator) commented Feb 12, 2015

I wasn't suggesting you compute the relative RMSE on a term-by-term basis; if you did that you would have the problem you described. Compute the RMSE exactly how you did, but then divide the final number by the RMS of the correct signal. Does that make sense?
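
In code, the suggestion amounts to something like this (a sketch; the function name and arguments are mine, not from the notebook):

```python
import numpy as np

def relative_rmse(actual, target):
    """RMSE of the whole error signal, divided by the RMS of the correct signal."""
    rmse = np.sqrt(np.mean((actual - target) ** 2))
    return rmse / np.sqrt(np.mean(target ** 2))
```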

@jgosmann (Collaborator, Author)

Version of the notebook also showing the relative RMSE. (not sure yet if I'll merge it into master)

@hunse force-pushed the better-product branch 2 times, most recently from 6c9d9cd to 0c80331 on February 18, 2015 at 22:40
@jgosmann (Collaborator, Author)

Based on PR #647 and #657 I implemented a benchmark for the product network in PR #658.

Surprisingly, it shows a much larger improvement than I expected:
Multiplication improvement by 0.310131 (84%, p < 0.001)

@hunse (Collaborator) commented Feb 23, 2015

My two concerns are 1) from a pedagogical point of view, this makes the Product code more opaque, and doesn't highlight the power of neurons to compute arbitrary nonlinear functions, and 2) it has not been well tested in all situations.

With regards to (1), I don't think this is a good reason not to have this new Product network, but maybe just take a few steps to make it clear that it's using a complex technique to get better results. We have a good notebook on basic multiplication (multiplication.ipynb), so maybe we can point to that in the docstring of Product. We could then also point to the tech report notebook in the same docstring as an in-depth look at the advanced techniques used in the new Product network.

With regards to (2), it would be good to test improvements on common networks that use Product, for example CircularConvolution. Also, if we have some larger models that make use of Product, it might be nice to test them, too, just to make sure that everything still works. The accuracy tests in the tech report notebook are great, but they do assume particular distributions of input values, and I just want to make sure that our models haven't made other assumptions that result in better performance of the old Product network.

If these are too much for this PR, we can always add the new product network beside the old one, so that everything will keep using the old one by default, but it's easy to get the new one.
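
For the record, CircularConvolution builds on Product internally, so exercising it directly is cheap. A sketch (neuron count and dimensionality are illustrative, not the benchmark's settings):

```python
import nengo

# CircularConvolution instantiates Product networks internally, so it
# makes a convenient integration test for the new implementation.
with nengo.Network():
    cconv = nengo.networks.CircularConvolution(n_neurons=200, dimensions=16)
    # The network exposes its input and output nodes as attributes
    # (A, B, and output in this era of Nengo), which can be probed.
```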

@jgosmann (Collaborator, Author)

I will soon rerun my spaopt tests, which test circular convolution and dot products. It won't be much work to make another run testing the modified product network. Of course, there are always some assumptions in writing a benchmark, but I think the tech report uses the best default assumptions in case no additional information about the distribution of the factors is given. Those are also the right assumptions for the dot product and circular convolution. But sure, there might be models based on different assumptions which are optimized for the old product network.

@tbekolay added this to the 2.1.0 release milestone on Mar 3, 2015
@tbekolay (Member)

  • modify docstring to point to multiplication.ipynb
  • run benchmark from Product benchmark #713
  • add tutorial for making subnetwork with simple product

@jgosmann (Collaborator, Author)

Circular convolution and dot product benchmarks submitted to the computing cluster.

@jgosmann (Collaborator, Author)

Benchmarks are done:

Legend

  • def = default/current implementation
  • spaopt = spaopt-v3 implementation (optimized radius), not part of this PR ⚠️
  • prod = using the improved product networks (this PR)

Thus, compare def to def + prod for the change introduced with this PR.

The plots show the distribution of errors along the y-axis; turned by 90 degrees, each plot reads like a probability density function.

Circular convolution

[plot: circular convolution error distributions (cconv)]

For the circular convolution, the improvement with this PR is even better than the spaopt optimizations! 😮 Together, they give the best result.

Dot product

[plot: dot product error distributions (dot)]

The improvement is also clearly there for the dot product, though in this case spaopt has a larger effect.

Methods

Data are based on 20 simulations with different seeds, each run for 10 seconds with the first 0.5 s discarded.

@jgosmann (Collaborator, Author)

Also ran the benchmark from #713. Here's the relevant line:

Multiplication improvement by 0.310131 (84%, p < 0.001)

@jgosmann (Collaborator, Author)

I found a problem when setting n_neurons=1, which I fixed, and I added a test for it. Is the usage of nengo.Direct() in that test safe with regard to different backends, or should this be done in some other way?
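
For reference, a sketch of what such a regression test could look like (not the PR's actual test; the setup below is illustrative):

```python
import nengo
from nengo.networks import Product

# n_neurons=1 regression check run in Direct mode, so the result does
# not depend on the (degenerate) one-neuron population.
with nengo.Network() as model:
    model.config[nengo.Ensemble].neuron_type = nengo.Direct()
    prod = Product(n_neurons=1, dimensions=1)

sim = nengo.Simulator(model)
sim.run(0.01)  # just verify that building and running does not fail
```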

@jgosmann (Collaborator, Author)

> add tutorial for making subnetwork with simple product

There is examples/basic/multiplication.ipynb. Doesn't that suffice?

@tbekolay (Member)

> Benchmarks are done

Nice, very impressive! I'm sold on using this as our Product implementation.

> There is examples/basic/multiplication.ipynb. Doesn't that suffice?

I think what would need to be added is a short section at the end showing how you can put that model creation code in a short function in order to make a multiplication subnetwork. The notebook could also point to the Product network as an optimized way to do exactly the same thing (synergy!)

Also, I think the docstring should point to examples/basic/multiplication.ipynb in addition to the tech report.
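
Something along these lines, say (a hypothetical wrapper; the name, arguments, and radius scaling are illustrative, not the notebook's final code):

```python
import numpy as np
import nengo

def Multiplication(n_neurons, radius=1.0):
    """Hypothetical reusable multiplication subnetwork, built like
    the model in examples/basic/multiplication.ipynb."""
    net = nengo.Network(label="Multiplication")
    with net:
        net.input_a = nengo.Node(size_in=1)
        net.input_b = nengo.Node(size_in=1)
        # Represent (a, b) jointly; the sqrt(2) factor keeps the corners
        # of the input square inside the ensemble's radius.
        combined = nengo.Ensemble(n_neurons, dimensions=2,
                                  radius=radius * np.sqrt(2.0))
        nengo.Connection(net.input_a, combined[0])
        nengo.Connection(net.input_b, combined[1])
        net.output = nengo.Node(size_in=1)
        nengo.Connection(combined, net.output,
                         function=lambda x: x[0] * x[1])
    return net
```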

@xchoo (Member) commented Aug 12, 2015

So.. I may be a bit math slow (and the workbook kinda took a round-about way of doing the derivation), but the function being computed by this new network is:

0.25 * (a+b)^2 - 0.25 * (a-b)^2
  = 0.25*a^2 + 0.5*a*b + 0.25*b^2 - (0.25*a^2 - 0.5*a*b + 0.25*b^2)
  = (0.25*a^2 - 0.25*a^2) + (0.25*b^2 - 0.25*b^2) + 0.5*a*b + 0.5*a*b
  = a*b!

Neato.
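
(A quick numeric check of the identity, with hypothetical inputs:)

```python
a, b = 0.7, -0.4
assert abs(0.25 * (a + b) ** 2 - 0.25 * (a - b) ** 2 - a * b) < 1e-12
```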

@tcstewar (Contributor)

> the function being computed by this new network is

Cute. And that also nicely explains why the diagonal encoders are the right choice in the normal version of the product network: you just need to be able to represent a+b and a-b.

@hunse (Collaborator) commented Aug 13, 2015

Any idea why adding in spaopt increases the maximum error? Namely, "spaopt+prod" has a higher maximum error than "def+prod", especially in the middle case. Not a big concern, but I'm curious. Other than that, the results look pretty definitive to me. Also, did you make those plots with matplotlib?

@jgosmann (Collaborator, Author)

Yes, spaopt decreases the radius so that the majority of values can be represented more accurately, but this implies that a few rare vectors fall outside of the radius and incur a larger error.

The plots were done with Seaborn.

@tbekolay (Member)

I just pushed some commits to add the subnetwork to the multiplication example, and a few style suggestions. Will merge on @jgosmann's +1!

@jgosmann (Collaborator, Author)

LGTM 🍰

@tbekolay merged commit 54ac9ab into master on Aug 18, 2015
@tbekolay deleted the better-product branch on August 18, 2015 at 19:03
@jgosmann (Collaborator, Author)

We probably should have added a changelog entry for this, shouldn't we?

@tbekolay (Member)

Yeah, I made a branch for it, changelogs. I'll make a PR.
