Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Custom solver without default args will break cache #1054

Closed
arvoelke opened this issue May 3, 2016 · 13 comments
Closed

Custom solver without default args will break cache #1054

arvoelke opened this issue May 3, 2016 · 13 comments

Comments

@arvoelke
Copy link
Contributor

arvoelke commented May 3, 2016

class CustomSolver(nengo.solvers.LstsqL2):
    def __call__(self, A, Y, *args, **kwargs):
        return super(CustomSolver, self).__call__(A, Y, *args, **kwargs)

with nengo.Network(seed=0) as model:
    a = nengo.Ensemble(10, 1)
    nengo.Connection(a, a, solver=CustomSolver())

nengo.Simulator(model)

This gives the following error:

    354             except TypeError:
    355                 args, _, _, defaults = inspect.getargspec(solver.__call__)
--> 356             args = args[-len(defaults):]
    357             if rng is None and 'rng' in args:
    358                 rng = defaults[args.index('rng')]

TypeError: object of type 'NoneType' has no len()

Changing the signature to (self, A, Y, rng=None, E=None) resolves this issue.

@arvoelke arvoelke added the bug label May 3, 2016
@arvoelke
Copy link
Contributor Author

arvoelke commented May 3, 2016

Also... there is a slight inherent problem with developing a custom solver when the cache is enabled: changing the method's implementation won't change the fingerprint (unless the attributes are changed as well). This means that someone trying to write their own solver might be confused when they keep getting the same results, despite changing their code.

This could also be hazardous with Nengo solvers if, say, a bugfix was made to one of the solvers. The cache would not pick up on the fix and continue using the previous incorrect results.

Perhaps cache misses should be forced when the solver is not within the Nengo directory (to alleviate the first problem), and the Nengo __version__ should be a part of the cache key (to alleviate the second problem)?

Obviously this is hard problem in general, since it's always possible to construct weird counter-examples, but I think we should aim to be slightly less aggressive with how we detect hits, since false-hits are way more "costly" in terms of user time than false-misses.

Also a way to force cache misses on specific connections (#1044) would be nice when all else fails. This might be necessary if the custom solver has attributes that are not picklable (e.g. a lambda function). Edit: Some sort of flag on the solver would also solve this (see discussion below).

@jgosmann jgosmann self-assigned this May 3, 2016
@hunse
Copy link
Collaborator

hunse commented May 3, 2016

changing the method's implementation won't change the fingerprint

This is a general problem with caches and development, and unfortunately I don't think there's much we could do about it. I guess we could have the cache disabled by default in the development version. Either way, I think it's good to make developers aware that the cache is there, and perhaps recommend developing with the cache off.

This could also be hazardous with Nengo solvers if, say, a bugfix was made to one of the solvers.

This is more worrying, but I think something like you suggested (incorporating the version in some way), should address it. Do we not do that right now? My inclination is that we should always be clearing cache files that don't match the current version, but I'm not sure how difficult that is. Actually, we should probably have the cache files organized with separate folders for each version.

@hunse
Copy link
Collaborator

hunse commented May 3, 2016

Perhaps cache misses should be forced when the solver is not within the Nengo directory

Oh, that's a really good point. Would there be any way for users to turn the cache on for their own solver, though?

EDIT: also, I wouldn't call them cache misses, I would just say don't check the cache at all.

@arvoelke
Copy link
Contributor Author

arvoelke commented May 3, 2016

Would there be any way for users to turn the cache on for their own solver, though?

There could be a flag that solvers have to explicitly set to True (or possibly their "version") in order to enable the cache on that connection. Inheriting from Solver would get False by default during development. Just a thought.

@hunse
Copy link
Collaborator

hunse commented May 3, 2016

I wonder if we could automatically make a solver "version" with inspect. Could be a bit magic and fragile, though.

@arvoelke
Copy link
Contributor Author

arvoelke commented May 3, 2016

Could be a bit magic and fragile, though.

I'm not necessarily advocating the following view, but if we don't trust our users then by default it should be disabled for their own solvers, even if their code hasn't changed at all. Because in extreme cases they may be manipulating global variables or even files outside the scope of the fingerprint. We can trust our own code enough to know this won't happen, but if the solver is outside the Nengo directory who knows...

@jgosmann
Copy link
Collaborator

jgosmann commented May 3, 2016

changing the method's implementation won't change the fingerprint

To solve this we could add the possibility for classes to implement a special method to overwrite how the fingerprint is created. This is similar to other things in Python like pickles where there is a default, but classes can implement __getstate__ to overwrite that. In fact, the fingerprint is based on a pickle, so we already have that possibility by implementing __getstate__ in the solver. The questions are:

  1. Do we want another method to be able to overwrite fingerprinting and pickling independent from each other or have a somewhat more obvious API? (Though one could argue that if it produces the same pickle it is essentially the same object and should produce the same fingerprint. But in case of a bugfix this behaviour might not be diserable.)
  2. Should we disable caching for non core Nengo solvers?
  3. Should all core solvers include some form of a version?

Regarding the original issue in the first post. I would argue that this should actually fail:

  1. The cache needs to know all the arguments. If they are not passed in it needs access to the defaults. The *args, **kwargs make it impossible to figure out what these defaults are.
  2. The cache requires a specific function signature including rng and E, so they can be put explicitely in the function signature.

The questions are:

  1. It would be possible to have further arguments set to default values and potentially these should be conisdered too for calculating the cache key?
  2. Should this fail with a better error message?
  3. Or should it fail silently and not cache the result?

@jgosmann
Copy link
Collaborator

jgosmann commented May 3, 2016

I wonder if we could automatically make a solver "version" with inspect. Could be a bit magic and fragile, though.

probably only if we access the code object which seems too fragile and not portable enough to me

@arvoelke
Copy link
Contributor Author

arvoelke commented May 3, 2016

For the second set of questions, it could skip the cache and spew a warning that links to the relevant documentation/issue?

Also it's still not immediately obvious to me why you need to know the defaults. Couldn't there be special default symbols for rng and E to distinguish this case when the defaults aren't actually known and they are not specified by the caller? This could result in redundantly storing the same result multiple times in the cache, but a few false-misses aren't that big of a deal? Or is there some technical reason this wouldn't work?

@jgosmann
Copy link
Collaborator

jgosmann commented May 3, 2016

If the defaults change, it could lead to loading wrong results. But that's sort of a problem we already have (but it might be even easier to overlook in this case).

Also, the code has to be rewritten to not pass in rng and E at all if they are not given, but I suppose that can be done.

@jgosmann
Copy link
Collaborator

The conclusions from the dev meeting seem to be:

  • Implement some sort of indicator to mark objects as cacheable or non-cacheable. The solver base class won't be cacheable, but our solvers deriving from it will be. If we ever need to make a bugfix to those we can extend them to include a version number, but for now we won't do that.
  • The solver function signature is simple enough to manually specify rng and E.

jgosmann added a commit that referenced this issue May 16, 2016
Addresses #1054.

Code can change and we can't detect it, so we allow only things that
are marked as supporting fingerprints to be cached and only mark
objects that should be stable as supporting this. Note that an object
marked as supporting fingerprints will retain this mark even if it
references objects that are not marked as supporting fingerprints
(because the fingerprint is created by pickling and we cannot hook into
that process).

This also removes the solver_fn from the fingerprint because it is not
needed in there. It should always be the same function which is part of
the builder (the solver itself is still part of the fingerprint of
course). This in turn allows to simplify the SolverMock in the test
code a little bit. (It actually is slightly misnamed as it mocks the
solver_fn and not a Solver.)
jgosmann added a commit that referenced this issue May 16, 2016
Addresses #1054.

Code can change and we can't detect it, so we allow only things that
are marked as supporting fingerprints to be cached and only mark
objects that should be stable as supporting this. Note that an object
marked as supporting fingerprints will retain this mark even if it
references objects that are not marked as supporting fingerprints
(because the fingerprint is created by pickling and we cannot hook into
that process).

This also removes the solver_fn from the fingerprint because it is not
needed in there. It should always be the same function which is part of
the builder (the solver itself is still part of the fingerprint of
course). This in turn allows to simplify the SolverMock in the test
code a little bit. (It actually is slightly misnamed as it mocks the
solver_fn and not a Solver.)
@jgosmann
Copy link
Collaborator

PR #1066 ensures that decoders for custom solvers (and neuron types!) are only cached when marked as supporting this.

This doesn't solve the original issue, but it seems the consens from the dev meeting was to require the use of (self, A, Y, rng=None, E=None) as function signature.

@jgosmann jgosmann removed their assignment May 16, 2016
tbekolay pushed a commit that referenced this issue May 17, 2016
Addresses #1054.

Code can change and we can't detect it, so we allow only things that
are marked as supporting fingerprints to be cached and only mark
objects that should be stable as supporting this. Note that an object
marked as supporting fingerprints will retain this mark even if it
references objects that are not marked as supporting fingerprints
(because the fingerprint is created by pickling and we cannot hook into
that process).

This also removes the solver_fn from the fingerprint because it is not
needed in there. It should always be the same function which is part of
the builder (the solver itself is still part of the fingerprint of
course). This in turn allows to simplify the SolverMock in the test
code a little bit. (It actually is slightly misnamed as it mocks the
solver_fn and not a Solver.)
drasmuss pushed a commit that referenced this issue Sep 2, 2016
Addresses #1054.

Code can change and we can't detect it, so we allow only things that
are marked as supporting fingerprints to be cached and only mark
objects that should be stable as supporting this.

This also removes the solver_fn from the fingerprint because it is not
needed in there. It should always be the same function which is part of
the builder (the solver itself is still part of the fingerprint of
course). This in turn allows to simplify the SolverMock in the test
code a little bit. (It actually is slightly misnamed as it mocks the
solver_fn and not a Solver.)

Co-authored by: Jan Gosmann <jgosmann@uwaterloo.ca>
Co-authored by: Daniel Rasmussen <daniel.rasmussen@appliedbrainresearch.com>
@drasmuss
Copy link
Member

drasmuss commented Sep 2, 2016

Fixed in #1068

@drasmuss drasmuss closed this as completed Sep 2, 2016
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Development

No branches or pull requests

4 participants