-
Notifications
You must be signed in to change notification settings - Fork 10
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
RFC: Caching field values #134
Comments
Thanks for the detailed RFC!
I'd say no, because this could lead to some very hard to diagnose bugs for a user who is unaware of these implementation details.
In general I'm usually in favor of having a bit of internal implementation ugliness as a way to support more intuitive/robust/harder-to-break behavior for the end user, so these options don't sound too bad to me. I'd probably go for the latter, since it sounds easiest. (Same reasoning as favoring giving up some cleanliness/minimalism in model code to enable larq/zoo#61, for example.) |
This is a very valid point. I guess then the question becomes which behaviour we actually want. We can also support both (we already have a keyword arg to |
Thanks for the detailed explanation! I didn't think of the CLI use case, but I also think it would be very confusing if we'd have different behaviours for default vs CLI values.
Can we just wrap the value into a One other approach would be to keep the current behaviour (
I thought about introducing something like a In general having @Field
def input_quantizer(self):
return lq.quantizers.SteSign(clip_value=1.25) behave like |
If we have a single mutable object then wrapping it inside a lambda won't solve the issue because each call to the lambda will return a reference to the same object. I think we would have to do the CLI string parsing inside the lambda. |
I completely agree. I think having |
Ah yeah makes sense, since we want to replicate this behaviour: class A:
@property
def foo(self):
return [1, 2, 3]
class B:
foo = [1, 2, 3]
a = A()
b = B()
assert a.foo == b.foo == [1, 2, 3]
a.foo.append(4)
b.foo.append(4)
assert a.foo == [1, 2, 3]
assert b.foo == [1, 2, 3, 4]
What about using |
Cool, I hadn't come across that before but it looks like it would work. Especially as the values that can be passed in through the CLI (i.e. parsed from a string) are a very small subset of possible Python objects, and no doubt exclude any 'difficult' ones which would be hard to clone. |
Okay, so the concrete proposal is that Are we all happy to move forward with this? |
Will the bahaviour of immutable values like numbers change depending on
👍 |
Unless we require the immutable defaults to be passed in with lambdas we can't really not cache immutable values, but I don't think that matters? |
I agree, just wanted to make sure that this applies only to values we pass via lambda functions. |
Makes sense, and yes to be clear this will apply to mutable defaults passed with lambda functions and we'll use the |
Apologies to come back to this, but I have some further questions after partially implementing this. It seems to me that behaviour could be quite confusing when you mix cached and not cached fields with parent/child components. @component
class Child:
foo: List[int] = Field(cache_value=False)
@component
class Parent:
child: Child = ComponentField(Child)
foo: List[int] = Field(lambda: [1, 2, 3], cache_value=True)
p = Parent()
configure(p, {})
assert p.foo == [1, 2, 3]
assert p.child.foo == [1, 2, 3]
p.foo.append(4)
assert p.foo == [1, 2, 3, 4]
assert p.child.foo == ??? # What should this be? It seems to me that either behaviour here would be confusing. We either don't respect value inheritence, or we don't respect the fact that the child field is The solution would probably be to disallow |
@AdamHillier What about not supporting |
I think we need to support caching values. For example, in Zoo it would be very weird (and potentially buggy) if a new optimizer was created for every access of Here is another Zoo example where having caching is important (because the default output directory includes the current time). I think in general there will be cases where you don't want to do expensive computation every time you access something. |
I agree, for the case of the optimizer it is easy to get around by assigning it to a temporary variable, but for other cases were you might want to access the same value from different methods it makes sense to still have this possibility. What do you think about to adhere to the way Python handles imutable attributes, by default? class Parent:
foo =[1, 2, 3]
class Child(Parent):
pass
c = Child()
p = Parent()
assert c.foo == [1, 2, 3]
assert p.foo == [1, 2, 3]
c.foo.append(4)
assert c.foo == [1, 2, 3, 4]
assert p.foo == [1, 2, 3, 4] |
I've gone back and forth on this several times now, API design is not easy :) We all agree that having the ability to cache values is important. The proposal is to also have the ability to have values which aren't cached. The problem then lies in how the two interact, especially when you mix in field value inheritence and CLI-overriden values. There are a lot of corner cases, some of which we can resolve by disallowing some combinations of behaviour (e.g. disallowing I now feel like the best course of action is probably to keep the existing behaviour, where we always cache values, and discourage the use of Regarding larq/zoo#148, I think we can go with @lgeiger's initial fix, in PR larq/zoo#149, and not use |
This sounds good to me. I've merged #149. I think maybe it'd be good to have some simple examples of how these Zookeeper concepts ( |
Yes, that's a good idea! There is an example experiment thing which I think is a good start, but we should make that more prominent and give more examples, you're right. |
Feature description and motivation
At the moment, field values on component instances behave much like instance attributes of generic Python class instances. One value exists per instance, and if it is mutable then an access after modification will return the same, modified value, e.g.:
We could change this behaviour so that field values instead behave much more like
@property
values, i.e. the value is not 'cached' on the instance and instead re-generated on every access. See discussion here for a motivation of this different behaviour: larq/zoo#148 (comment).Current implementation
For a full explanation of how components access field values, see the docstring of the
_wrap_getattribute
method incomponent.py
:zookeeper/zookeeper/core/component.py
Lines 116 to 144 in 2b8812a
New implementation
It would be straightforward to implement
@property
-esqe behaviour for default values which are passed into fields, as mutable default values are already generated from lambdas, and there's no issue with immutable default values being cached .However, it would be much more difficult to implement for values passed in through the CLI. Consider the configuration CLI argument
foo=[1, 2, 3]
. We receive this as a string, and parse it into a Python value (in this case a list) to be used as the value for the fieldfoo
. If we wanted to return a new instance of this list on each access offoo
, we would either need to be able to deep-clone generic mutable objects, or we would have to hold on to the configuration value as a string, and re-parse it into a Python value each time.It's an open question whether we are happy for the behaviour of default values vs cli-overriden values to be different.
The text was updated successfully, but these errors were encountered: