Non-deterministic undocumented Node behaviour #716

Closed
xchoo opened this issue Apr 29, 2015 · 17 comments · Fixed by #722

xchoo (Member) commented Apr 29, 2015

For Nodes with size_in > 0, storing a reference to the input argument of the node function can lead to unexpected behaviour.

This code illustrates the problem:

import numpy as np
import matplotlib.pyplot as plt

import nengo
from nengo.utils.functions import piecewise


class DirectBuffer(object):
    def __init__(self, dim=1):
        self.x = np.zeros(dim)   # preallocated buffer

    def set_x(self, t, x):
        self.x = x               # Rebinds self.x to the input array; causes a problem
        # self.x = np.array(x)   # Copies the data; causes no problems

    def get_x(self, t):
        return self.x


with nengo.Network() as model:
    stim = nengo.Node(piecewise({0: 1, 0.5: 0.5, 1: -1}))

    db = DirectBuffer(1)
    n1 = nengo.Node(output=db.set_x, size_in=1)
    n2 = nengo.Node(output=db.get_x)

    nengo.Connection(stim, n1, synapse=0.005)

    p1 = nengo.Probe(stim)
    p2 = nengo.Probe(n2)

for r in range(10):
    sim = nengo.Simulator(model)
    sim.run(1.5)

    plt.subplot(2, 1, 1)
    plt.plot(sim.trange(), sim.data[p1])
    plt.ylim([-1.05, 1.05])
    plt.title('stim')
    plt.subplot(2, 1, 2)
    plt.plot(sim.trange(), sim.data[p2])
    plt.ylim([-1.05, 1.05])
    plt.title('out')
    plt.show()

In the ideal case, the output of n2 matches the signal generated by stim. However, in a majority of the runs, the output of n2 is 0.

hunse (Collaborator) commented Apr 29, 2015

This seems like expected behaviour. When you do self.x = x, you're making self.x point to x, so the original buffer that you allocated is lost, and you're left with a view onto some internal simulator signal.

I think what you want is self.x[...] = x.
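The distinction hunse is drawing can be sketched in a few lines of NumPy (this example is not from the thread; `sig` stands in for the simulator's internal signal):

```python
import numpy as np

buf = np.zeros(1)      # preallocated buffer
sig = np.array([1.0])  # stands in for an internal simulator signal

# Rebinding: buf now aliases sig, and the preallocated buffer is lost
buf = sig
sig[0] = -1.0
assert buf[0] == -1.0  # buf follows every simulator write

# In-place copy (the self.x[...] = x idiom): the buffer keeps its own memory
buf = np.zeros(1)
buf[...] = sig
sig[0] = 2.0
assert buf[0] == -1.0  # buf retains the value from copy time
```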

tcstewar (Contributor) commented

> This seems like expected behaviour.

Yup. Or, at least, expected if we have giant documentation saying NEVER EVER EVER KEEP THE VARIABLE PASSED IN TO A NODE. (in other words, if you want to keep it, then make a copy).

Part of me wants to just change things so we always pass in a copy to the Node, as I keep running into this confusion. But then part of me cringes at the overhead that causes.

hunse (Collaborator) commented Apr 29, 2015

Once the .nengorc file is a little more developed, it might not be a bad idea to have the default be to pass in a copy, and advanced users can turn this off for all their models if they want. Or we could just do it with the config system.

I'm guessing that there won't be much cost associated with the copy for all but the largest models (and even then, copies are really cheap compared to any other operation). Actually, the cost is a lot greater if there are a lot of small copies, since there's some Python overhead with making a new array or view onto an existing array (we avoid this by just making the view once, during the build).

xchoo (Member, Author) commented Apr 29, 2015

Yup. It is expected behaviour, but very un-intuitive behaviour (you have to know how the internals of the simulator work to understand why this behaviour happens).

hunse (Collaborator) commented Apr 29, 2015

I don't think it's necessarily unintuitive. If I receive an object as a parameter to the function, I would assume that that object is being passed by reference, not by value, and thus could change later. In fact, you seem to have had this view in your original code, too, since you allocated memory to store the incoming value. You just forgot to copy the data into the buffer you had created.

xchoo (Member, Author) commented Apr 29, 2015

Well, the code posted was after an evening of debugging the odd behaviour in @Seanny123's code. What made the problem even more un-intuitive was that he was using 1D signals (so you are expecting a float, but you don't get a float). (Oh, and the constructor should have been self.x = 0 haha. I was testing other things as well XD)

@tbekolay tbekolay changed the title Non-deterministic undocumented nengo.Node behaviour Non-deterministic undocumented Node behaviour Apr 29, 2015
hunse (Collaborator) commented Apr 29, 2015

Yeah, the 1D case is probably the least intuitive, because then you might expect to get a float instead of a 1D array, but it also bugs me when software does stuff like that and you have to write special case code for the 1D case versus the N-D case.
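The 1D case can be illustrated with a short sketch (not from the thread): a size_in=1 node function receives a length-1 array, not a float, so the aliasing problem applies there too.

```python
import numpy as np

x = np.array([0.5])  # what a size_in=1 Node function receives
assert x.shape == (1,) and not np.isscalar(x)

# Storing it directly keeps a live reference to the simulator's memory...
stored = x
x[0] = 0.0
assert stored[0] == 0.0

# ...while x.item() (or x.copy()) snapshots the value instead
x[0] = 0.5
snapshot = x.item()
x[0] = 0.0
assert snapshot == 0.5
```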

Here's the original issue #626 and the fix #628. For some reason, despite most people leaning towards copying, I ended up doing a readonly view. @drasmuss is just so convincing! So maybe we should just go implement the copy like we talked about? My results at the time showed it didn't seem like much overhead.

jgosmann (Collaborator) commented

It would be nice to have some numbers on how much more overhead it is.

It's definitely quite hard to spot the problem. Even after realizing that it has to be either a race condition or shared memory, it didn't occur to me why the memory was shared.

xchoo (Member, Author) commented Apr 29, 2015

Oh! There's that issue. I was trying to find that. 😄
I remember reading about that issue, and that was how I figured out what the problem was. haha

hunse (Collaborator) commented Apr 29, 2015

See #626 for my timing results. But remember that in my solution, I figured out that if we're using a view, we don't have to re-make it each time step, so there's no overhead. But the results do show how much it costs to copy arrays of different sizes (and it's pretty cheap).

How much it affects an actual network will depend on the network, of course, but I'm sure that as soon as you have any neural computations in the model, the cost of copying will be greatly overshadowed by the costs of encoder and decoder dot products, and the like. Generally, copying seems to be measured on the order of microseconds, whereas dot products are on the order of milliseconds.
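hunse's microseconds-versus-milliseconds claim can be checked with a quick, machine-dependent timing sketch (not from the thread; n = 500 is an arbitrary choice, and a 500x500 matrix-vector product stands in for an encoder/decoder dot product):

```python
import timeit
import numpy as np

n = 500  # arbitrary size; real node dimensions are often much smaller
x = np.random.randn(n)
A = np.random.randn(n, n)

reps = 2000
copy_t = timeit.timeit(lambda: x.copy(), number=reps) / reps
dot_t = timeit.timeit(lambda: A.dot(x), number=reps) / reps
print("copy: %.2f us, matvec: %.2f us" % (copy_t * 1e6, dot_t * 1e6))
```

On typical hardware the copy is orders of magnitude cheaper than the matrix-vector product, which is the point being made about overall model cost.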

jgosmann (Collaborator) commented

Interestingly, a copy seems to be cheaper than a RO view for n < 1000. Does anyone use nodes with such high-dimensional input a lot? It seems the input would usually be much smaller. Also, I agree that nodes probably won't be the main bottleneck in most simulations.

As the saying goes, developer time is more expensive than computation time. So I'm in favor of copies. We can still add an rc option for the experts who need every little bit of speed.

tcstewar (Contributor) commented

> As the saying goes, developer time is more expensive than computation time. So I'm in favor of copies.

+1 from me as well, as long as that cost is as minimal as it looks.

@xchoo xchoo closed this as completed Apr 29, 2015
@xchoo xchoo reopened this Apr 29, 2015
hunse (Collaborator) commented Apr 29, 2015

> We can still add an rc option for the experts who need every little bit of speed.

Ahh, but is it worth your developer time to add that option 😄

xchoo (Member, Author) commented Apr 29, 2015

Oops! Wrong button. 😃

drasmuss (Member) commented

I don't think I had any preference for readonly vs copying, I'd just do whichever one is faster.

Seanny123 (Contributor) commented

+1 for copying (if you have to write an all-caps warning in the documentation, it's usually a sign it's a bad feature). +1 for writing down (but not implementing) the read-only option as something to add to nengorc someday, and instead spending the time on making Nengo OCL faster.

@jgosmann jgosmann self-assigned this May 11, 2015
@jgosmann jgosmann added this to the 2.1.0 release milestone May 11, 2015
jgosmann added a commit that referenced this issue May 27, 2015
@jgosmann jgosmann removed their assignment May 27, 2015
hunse (Collaborator) commented May 27, 2015

> Interestingly, a copy seems to be cheaper than a RO view for n < 1000.

In practice, though, we don't have to make the view every timestep (just at the beginning), whereas we have to make the copy every timestep. So the copy is more expensive. But still pretty cheap in the grand scheme of things.
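The build-once-view versus copy-every-step trade-off can be sketched like this (not from the thread; `signal` stands in for an internal simulator signal):

```python
import numpy as np

signal = np.zeros(3)  # stands in for an internal simulator signal

# View approach: create the (readonly) view once, as at build time,
# then reuse it every step with no per-step work
view = signal.view()
view.flags.writeable = False
for step in range(5):
    signal[:] = step        # simulator writes new values each timestep
    assert view[0] == step  # the same view tracks them

# Copy approach: a fresh allocation every step, but safe to hold on to
held = []
for step in range(5):
    signal[:] = step
    held.append(signal.copy())
signal[:] = -1.0            # later writes don't touch the stored copies
assert [h[0] for h in held] == [0.0, 1.0, 2.0, 3.0, 4.0]
```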

Another benefit to the copy is that it makes the reference implementation consistent with backends, which will typically be copying off a device to pass to a node function and thus passing a copy.

jgosmann added a commit that referenced this issue May 27, 2015
jgosmann added a commit that referenced this issue May 28, 2015