Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

possible to copy already-computed memoized fields in build()? #774

Open
dave-doty opened this issue Dec 28, 2019 · 9 comments
Open

possible to copy already-computed memoized fields in build()? #774

dave-doty opened this issue Dec 28, 2019 · 9 comments
Assignees

Comments

@dave-doty
Copy link

dave-doty commented Dec 28, 2019

While profiling my application, I noticed that a supposedly memoized field was being recomputed over and over, even though the object's fields didn't change. However, the object itself changed through a call to replace (with identical field values as before).

The source of it seems to be that build does not copy the memoized fields.

I understand the danger here. Suppose we have

abstract class Value implements Built<Value, ValueBuilder> {
  int get field;
  
  @memoized
  int get derived => field*2;
  
  // boilerplate
}
var value = Value(field: 3);
print(value.derived); // prints 6
var builder = value.toBuilder();
builder.field = 30;
value = builder.build();
print(value.derived); // should print 60

If build naively copied the field derived, then it would still be the value 6 at the last print statement, even though we want it to be recomputed to be 60 since the non-derived field has changed.

Nonetheless, it would be great if the ValueBuilder class could detect whether any of its fields had changed since it was created, and if not, then any derived fields that have already been computed could be copied in build to avoid needing to re-compute them.

One way to do this would be to have build copy all of the derived fields by default, but for the writable field setters, on the first write to any of them, to set all the underlying derived fields to null. (in the example above, this would be the field named __derived in the generated .g.dart file). Or to be a bit more efficient, add to each Builder class a single _dirty field that's initially false, and is set to true when any setter is called, and build() only copies memoized fields into the new Built object if _dirty == false.

@dave-doty
Copy link
Author

dave-doty commented Dec 28, 2019

Even better, one could imagine an implementation where this:

var val = Value(field: 3);
var val2 = val.rebuild((b) => b..field=3);

results in identical(val, val2) being true. More generally, the following code would also result in only a single Value object being created:

var val = Value(field: 3);
var builder = val.toBuilder();
builder.field = 3;
var val2 = builder.build();

The idea is the same: toBuilder() marks the field Builder._dirty as false, any setter on the Builder sets it to true if the new value is different, and build() returns the original underlying Built object (the one referenced by _$v in the implemented Builder class) if and only if _dirty is false.

There is one question of efficiency: in each of the Builder's setters, would we want to pay the cost of doing a full == comparison to determine if the new value is different? This could be expensive. Whether to do this full equality comparison could be an option specified in toBuilder and the auto-generated Builder constructor (stored in a Builder field bool _compareOnSet). But it would cost almost nothing to at least default to doing an identical comparison, i.e.,

set field(int value) {
  if (identical(_$this._field, value) || (_compareOnSet && _$this._field == value)) {
    return;
  }
  _dirty = true;
  _$this._field = value;
}

Even if _compareOnSet defaults to false, in many circumstances having a setter implemented as above would avoid duplicating objects that are structurally == to old objects, so identical(_$this._field, value) would end up being true , which would recursively imply that this object itself won't be duplicated either.

This would essentially get all the benefits of intern()'ing (#750) in the common special case where a duplicate object is created directly from the old one that it is structurally == to. (It would not, however, avoid duplication when going through a sequence of different values, e.g., val.rebuild((b) => b..field=3).rebuild((b) => b..field=4).rebuild((b) => b..field=3))

Looking at the generated file .g.dart file, my understanding is that currently, the only optimization is that if I create a Builder from a Built object via var builder = val.toBuilder(); and then don't access any of the fields of builder (not even to read them with a getter), then the same Built object val will be returned from builder.build(). Any access of the fields accesses the _$this getter, which in turn nulls the field _$v$, which causes a new Built object to be created when calling build().

@dave-doty dave-doty changed the title possible to copy already-computed memoized fields for replace/build? possible to copy already-computed memoized fields in build? Dec 29, 2019
@dave-doty dave-doty changed the title possible to copy already-computed memoized fields in build? possible to copy already-computed memoized fields in build()? Dec 29, 2019
@davidmorgan davidmorgan self-assigned this Dec 30, 2019
@davidmorgan
Copy link
Collaborator

Thanks.

It might make sense to have the builder keep the reference to the old value, then compare after building whether the results are the same.

But, we have to be careful. For example, it's possible to have fields that are not included in the comparison. With this change, it would no longer be possible to update these fields unless you update some other field at the same time.

I'll have a think about this next time I look at implementing a batch of features.

@dave-doty
Copy link
Author

For example, it's possible to have fields that are not included in the comparison. With this change, it would no longer be possible to update these fields unless you update some other field at the same time.

Ah, I see the problem.

Nevertheless, the identical test should be safe in this regard

set field(int value) {
  if (identical(_$this._field, value)) {
    return;
  }
  _dirty = true;
  _$this._field = value;
}

And then use the value of _dirty (rather than the null-ness of _$v) as your check in build():

  _$Value build() {
    _$Value _$result;
    try {
      _$result = _dirty
        ? new _$Value._(
              field: field,
              //...
          )
        : _$v;
    } catch (_) {
      String _$failedField;
      // ...
      } catch (e) {
        throw new BuiltValueNestedFieldError(
            'Value', _$failedField, e.toString());
      }
      rethrow;
    }
    replace(_$result);
    return _$result;
  }
}

In other words, create a new Value object if and only if some field was set to a value/object not identical to its previous value. (including even fields that are not included in an == comparison between two Value instances)

@davidmorgan
Copy link
Collaborator

Nested builders will be more complicated; those are always new, so identical won't help. I think we have to keep the old instance around and check when building, either with identical or with ==, to improve things there.

@dave-doty
Copy link
Author

@davidmorgan For what it's worth, I just encountered another reason not to use the reading of _$this as a "dirty bit". I was trying to debug some code today and had a Heisenbug where printing the values in some builders caused an error to show up. It was because the print statement accessed a Builder field, triggering the _$this getter to change the Builder's internal state. It's always surprising when simply printing a value causes the state of the program to change.

@davidmorgan
Copy link
Collaborator

Yes, nested builders are great for usability but awkward for correctness :)

@dave-doty
Copy link
Author

dave-doty commented Jan 3, 2020

Is there a reason to null out the internal _$v reference whenever a getter is accessed? It seems that one could allow _$v to reference the original Built object assigned in replace until the first setter is called, i.e., implement getters and setters like this:

int _field;
int get field =>_$v == null ? _field : _$v.field;
set field(int newField) {
  if (_$v == null) {
    _field = newField;
  } else {
    _field = new_field;
    _field2 = _$v.field2;
    _field3 = _$v.field3;
    _fieldNestedBuilder = _$v?._fieldNestedBuilder.toBuilder();
    // etc...
    _$v = null;
  }
}

Update:
Ah, I think I see the problem. If the field accessed through a getter is itself a nested Builder, then since Builder objects are mutable, it is possible to change its value through the getter (e.g., myValueBuilder.nestedBuilder.field = 7;, which as far as myValueBuilder is concerned just looks like a getter access). So that's why you declare the top-level builder "dirty" (by setting _$v to null) on any getter access.

Still, it feels tantalizing close to there being an efficient way to avoid creating new Built objects unless they represent actual new values. This would be great for things like Redux, where you have very frequent updates to a large state tree, most of which leave most of the tree alone in terms of what value it represents, but as currently implemented in built_value, causes brand-new objects to be generated on each update.

@dave-doty
Copy link
Author

I think the following logic might make sense to allow more eager caching of Built objects, but I'm sure I could be missing something.

Each Builder maintains an internal dirty bit _dirty, which depends only on the non-Builder fields in the way I suggested above (doesn't change on getter access to those fields, and changes on setter access only when the new value is not identical to the old, with a user-specifiable option to allow this to change to an == test, which is more time-expensive but more likely to successfully cache).

The Builder has a public method dirty(), which is defined as this._dirty || nestedBuilder1.dirty() || nestedBuilder2.dirty() || .... The value of this.dirty() during build() determines whether to create a new instance of _$Value or just return the original Value instance referenced by _$v.

One way this breaks is that built_value allows non-immutable objects to be contained in Built values, e.g., it is permitted to store a mutable List<int> as a field, instead of enforcing BuiltList<int> as the type. Such objects can be mutated, but there's no dirty() method to alert the parent Builder to this, so it would have to resort to a more expensive == comparison to determine the value of dirty(). I'd also think it would be great to allow a user-specifiable option to disallow this, which enforces that the only field types allowed are known immutable types such as int or String, as well as other Built instances and instances from built_collection.

I feel like I'm possibly re-inventing the wheel here; I wonder if there is a language that supports true language-level optional immutability, and if so, how it's implemented. There's functional languages like Haskell that force everything to be immutable (no optional mutability), and many languages have something like final to declare a reference cannot be changed, but do nothing to enforce immutability of the object being referenced (not true immutability). In some sense built_value is an implementation of the latter, since it allows fields to reference mutable types such as List.

@davidmorgan
Copy link
Collaborator

Allowing List is a bug; it used to be disallowed but an analyzer change broke that check and unfortunately it was a while before I noticed. I plan to re-enable the check again but I haven't had time to do it--it's hard because it's a breaking change.

Other than that though I've taken the approach that it's more useful to allow types we're not sure about, on the understanding that people might supply their own immutable types, and should be aware that things will break if they are mutable :)

Anyway, I hope we can improve things here--not sure when I'll get to it, though.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

2 participants