Table::set() should not record transaction log entries for setting to the current value #2787

tgoyne · 2017-08-08T20:22:54Z

The end-user motivation for this is realm/realm-swift#3489. The short summary is that upserting a json blob into a Realm currently triggers notifications even if none of the values actually changed. This has also emerged as a problem for sync, as the redundant sets are still recorded in the transaction log and so use bandwidth and CPU time on all clients.

This is something we should do for 3.0. It's a breaking change, so it needs to happen now or not anytime soon, and I don't think anyone particularly wants the current behavior.

Conceptually this can be as simple as making Table::set() be:

template<typename T>
void Table::set(size_t col, size_t row, T value)
{
    if (get<T>(col, row) == value)
      return;
    // do set
}

For some column types it may be worth pushing the check in deeper to avoid redundant validation/bptree traversals/etc. should benchmarking indicate that this actually has meaningful overhead.

There's a few other ways we could handle this other than making changes in core, none of which I feel are viable.

Filter out sets to the current value in the notification code.

I took a stab at implementing this and it turned out to be very complicated and/or slow due to that the code processing the transaction log doesn't have access to the Realm in the state that it would be at the point where each instruction is handled, so a lot of additional state needs to be tracked to either support lookups there or to compare the fields afterwards. Also doesn't help the sync case at all.
Special-case it in the createOrUpdate logic.

This inherently imposes inconsistency in our API, which is something we should try to avoid.
Make self-assignments no-ops at the ObjectStore layer.

Currently not a practical option due to that not all property sets actually go through the object store, and mandating that bindings never call Table::set() directly would be a large project for the 3.0 timeframe.

The text was updated successfully, but these errors were encountered:

ironage · 2017-08-09T01:57:53Z

@simonask do you foresee any problems integrating this change with sync, or can we go ahead and implement?

astigsen · 2017-08-09T02:13:08Z

This is something that could potentially be very expensive performance wise, so we should make sure that we benchmark it and implement it at the right level.

It could potentially be implemented in the setter at the leaf level so that we don't have have to traverse the b-tree twice, but it is obviously only worth doing this if profiling shows it to be an actual performance problem.

simonask · 2017-08-09T05:59:26Z

For sync, it does create a behavioral change, because of "last write wins". This means that if a different client wrote to the field between the previous set instruction and the one you are throwing away, the other client will now "win" with this change.

By the way, we are implementing log compaction in Sync to address the same situation (redundant instructions), so the motivation to do this should be purely for Core performance. :-)

By the way, it would be fine from our perspective to skip writing to the database, but still emit the instruction (potentially reducing amount of CoW-memory), but also potentially not making any difference at all, since the commit log entry must still be written). I would love to see if that has any impact on benchmarks. :-)

tgoyne · 2017-08-09T15:00:25Z

It creates a behavioral change, but it is just as easy to think of scenarios where the current behavior is undesirable (e.g. user copies object out of Realm, modifies one field, copies it back in, and then overwrites a change made on a different client that they did not mean to overwrite) as scenarios where this would be worse.

I do not see how log compaction can reduce the size of the transaction logs uploaded by a client unless it does the same logic of checking if each instruction actually changed the value on the client locally before uploading, and doing that just sounds like a slower version of just not emitting the instruction in the first place.

simonask · 2017-08-09T19:52:54Z

That is certainly a scenario that would create a similar problem, but it's a lot more esoteric than setting any property at any point in the program. It's also very easy to explain any unexpected behavior.

Log compaction does not look at the current state of the realm, but scans a sequence of changesets to detect if early instructions are made redundant by later instructions in those changesets. This can run on the client or the server, but we haven't decided yet exactly when the right time is.

I believe that not emitting the instruction would be problematic, but that it is worth investigating if there is anything to be won by eliding the actual update — especially considering it is very likely that something else caused a changelog entry to be emitted in the same transaction (requiring a write in the history regardless, meaning eliminating one instruction makes no real difference), whereas a full CoW on the column is potentially expensive. I don't know if Array already performs this check though?

ironage · 2017-08-09T20:00:49Z

If the existing value is identical, Array::set returns early before CoW.

tgoyne · 2017-08-09T20:08:35Z

This change in functionality has been widely requested for years, both in the linked issue and pretty much every time the subject has come up when talking to users, and not because of write performance. Users are consistently surprised by the idea that we consider setting a property to its existing value to be a a modification of that property.

simonask · 2017-08-10T06:25:06Z

It seems part of the surprise stems from update notifications being triggered where oldValue == newValue. That should be trivial to address in the notification code (just don't trigger the notification in that case).

For the use case of applying updates to an object from JSON, it would make a lot of sense to have a specialized API for that, like -[RLMObject updateIfDifferent:(JSON*)].

The reason I'm concerned about doing it in Table::set() for every property update is that it becomes significantly harder to predict the behavior of the merge algorithm. There would no longer be a 1-to-1 correlation between Table-level API calls and emitted instructions.

finnschiermer · 2017-08-10T14:22:22Z

@astigsen From Cores perspective, the overarching question is whether we want this change to Sync semantics - And this depends on how Sync is used. What is your opinion?

simonask · 2017-08-10T14:27:58Z

For the record, I'm fine with the change if it's the right way to solve the issues, just as long as we do it with open eyes. Also, we probably have some unit tests that would need adjustment. :-)

tgoyne · 2017-08-10T17:03:21Z

It seems part of the surprise stems from update notifications being triggered where oldValue == newValue. That should be trivial to address in the notification code (just don't trigger the notification in that case).

I took a stab at implementing this and it turned out to be very complicated and/or slow due to that the code processing the transaction log doesn't have access to the Realm in the state that it would be at the point where each instruction is handled, so a lot of additional state needs to be tracked to either support lookups there or to compare the fields afterwards.

Adding a different function which does the thing people actually want would just be API bloat, and would have the same impact on sync. Explaining merge behavior to users in terms of Table::set() calls is not very useful, as that's already several abstraction layers away from what they're doing.

simonask · 2017-08-10T17:18:21Z

It does not have the same impact if it isn't the default behavior all the time.

I'm not sure how notifications are exposed in Cocoa, but if both newValue and oldValue are available at some point in the notification code, it should also be possible to use them to prevent the notification from being passed to user code.

I'm just trying to get across that changing Last-Write-Wins to Last-Write-Sort-Of-Wins isn't a trivial no-brainer change.

tgoyne · 2017-08-10T17:28:13Z

Collection notifications do not at any point keep track of the old or new values of anything or report that information to the user. The thing tracked is just which rows in the collection were modified (and inserted/removed).

With this change, sync merging is still Last-Write-Wins; we just don't perform a write at all when you set a field to the value it already has. I think this is a much easier to explain behavior than "sometimes we do a write when you aren't actually changing the value and sometimes we don't, depending on which function you called".

ironage · 2017-08-10T17:47:15Z

The notifications are generated by inspecting the replication logs right? Core could conceivably report both the old and new value in the "set" instruction to the Replication so that the notifications could just skip the instructions that set identical values. It would be a breaking change, but should solve the notification problem without affecting sync. Additionally, as Alexander points out, this change would suffer a (possibly significant) performance hit of always "getting" the previous value.

simonask · 2017-08-11T06:59:34Z

Collection notifications do not at any point keep track of the old or new values of anything or report that information to the user. The thing tracked is just which rows in the collection were modified (and inserted/removed).

Hm, okay - collections shouldn't be touched by Table::set(), though, right? Or am I misunderstanding the situation?

With this change, sync merging is still Last-Write-Wins; we just don't perform a write at all when you set a field to the value it already has. I think this is a much easier to explain behavior than "sometimes we do a write when you aren't actually changing the value and sometimes we don't, depending on which function you called".

The problem is that it is "the value it already has" is unpredictable, because it depends on when the client last happened to communicate with the server, sacrificing transactional consistency. You could see antipatterns arise where clients clear fields before setting them "to make sure" etc.

Forgive me, but "the behavior is different depending on which function you call" seems like a fairly straight-forward idea. ;-)

simonask · 2017-08-11T07:04:11Z

Example in pseudocode:

Client A at t=0:
{
    animals[0].breed = "dog";
    animals[0].diet = "carnivorous";
    commit();
}

Client A at t=2:
{
    animals[0].breed = "cat";
    animals[0].diet = "carnivorous";
    commit();
}

Client B at t=1; // <-- NB! Timestamp is between the previous two operations.
{
    animals[0].breed = "giraffe";
    animals[0].diet = "herbivorous";
    commit();
}

With this (contrived?) sequence of operations, you could end up with a carnivorous giraffe. :)

tgoyne · 2017-08-14T22:44:58Z

Another option which would make achieving the desired notification behavior simple without changing the sync results would be to add a instr_SetCurrent instruction which is used instead of instr_Set when the old and new values are identical. Sync would treat this identically to Set, while notifications would ignore it entirely.

Ideally we would also omit the value entirely from the SetCurrent instruction to help somewhat with the secondary goal of reducing the size of the transaction logs, but I don't know if sync knows enough about the state of the clients to be able to actually merge that.

bigfish24 · 2017-08-15T04:38:17Z

Just to chime in, but there is definitely a desire from sync perspective for the behavior described above. The original code I wrote for a customer over a year ago had to manually diff for changes.

This is a common pattern for any server integration with an API that can't provide diffs or push changes, and instead needs to be polled.

From this perspective though a specific API would be appropriate since this use-case is sync or server related to not create unnecessary transactions.

bigfish24 · 2017-08-15T04:56:27Z

Also, Eric filed this from a customer: https://github.com/realm/product/issues/376

bdash · 2017-08-15T05:06:33Z

Keep in mind that the primary issue we're trying to solve here is related to unnecessary notifications being generated when properties are set to their existing values, where virtually all clients of notifications want to suppress such notifications. Requiring that a different API be used for such modifications is hard to justify.

For the secondary issue of avoiding unnecessary transaction log growth with sync, any new API would still be vulnerable to the conflict resolution consistency issue that Simon mentioned in #2787 (comment). While the user would be explicitly opting in to such behavior by using a different API, it would still be challenging to explain to them the subtleties of the behavior they're opting in to.

The proposal Thomas made in #2787 (comment) seems like it may allow for the behavior we're after in both cases while also preserving sync's ability to handle conflicts in a predictable fashion. It'd be great to hear the sync team's thoughts on it.

bigfish24 · 2017-08-15T05:09:01Z

@bdash thanks for explanation, wasn't trying to push any sort of direction but just making sure the customer voice was added to the mix.

simonask · 2017-08-15T06:52:52Z

SetCurrent would be fine for Sync (or alternatively just a flag on the Set instruction), but keep in mind that the merge algorithm may still generate no-op Set instructions in some corner cases or multiple SetCurrent in a row.

Can we reiterate why intercepting no-op sets in the notification system isn't feasible? I heard the point about collections, but those are precisely not covered by Table::set().

tgoyne · 2017-08-15T14:31:26Z

Collection notifications report when fields on the objects in the collection are modified and not just when the set of objects contained by the collection changes, so they do care about Table::set().

simonask · 2017-08-16T06:08:55Z

Ah, gotcha, that was the missing piece, thanks. 👍

simonask · 2017-08-16T06:15:18Z

Then it seems to me that the most attractive idea is the SetCurrent instruction (or SetUnchanged or whatever we call it). Today, there is a boolean field indicating if the Set instruction is setting a default value. That could be extended to an enum indicating if the instruction is regular Set, SetDefault, or SetCurnret without incurring any overhead.

simonask · 2017-08-16T06:15:45Z

A way to opt out of SetCurrent would also be very nice.

bdash · 2017-08-16T06:29:14Z

Why would it be useful to opt out of it?

simonask · 2017-08-16T08:56:39Z

Because, as I have explained, it breaks certain semantics that we rely on.

I'm also not 100% convinced that we are seeing the full picture — we are getting requests for a different behavior from people who want that, but we do not have any idea how many people are actually relying on the current behavior for conflicts to resolve correctly.

I guess I'm also just fundamentally skeptical of the idea of involving 3 different teams to solve a problem that seems isolated to notifications. New merge rules, documenting new semantics, resolving issues, measuring the performance impact, coordinating the effort — it all adds up at a time that we at least would rather spend on more high-profile features.

simonask · 2017-08-16T08:57:24Z

But of course, if that is what is needed, and there really is no other good way to address the problem, it is what we must do. I just think we need to really, really consider alternatives.

finnschiermer · 2017-08-16T13:01:26Z

@simonask Are you saying that even if we go with a new setUnchanged instruction, sync may in certain situations have to emit an ordinary set instructions, so that users will still get unwanted notifications, just not as often as now?

simonask · 2017-08-16T13:04:48Z

After thinking about it a bit more, I think the merge algorithm should be able to generate the desired instructions in all cases — the main drawback being that work is required inside the merge algorithm in the first place…

bdash · 2017-08-21T17:45:00Z

From talking with Simon, while a SetUnchanged instruction should allow the correct merge semantics, there's not sufficient time at this point in the release cycle for sync to make the necessary changes to accommodate it given the other high priority work that the sync team is occupied with.

I think the right plan of action for now is to defer this from consideration for our upcoming releases, and revisit it when we have less time pressure.

skydivedan · 2019-03-07T03:26:09Z

Forgive me if I'm misinterpreting this, but I'm curious about this, in regards to
realm/realm-swift#3489

It looks like the problem was solved over here in realm-java realm/realm-java#6224

Is there any effort -- or is it even possible -- to bring what was learned in realm-java over to realm-cocoa? Is there another ticket for that?

tgoyne · 2019-03-07T05:51:08Z

I started working on exposing that in Cocoa a while back but haven't had time to finish it off. It's still near the top of my backlog.

msbouchedid · 2020-07-06T17:39:17Z

Any update on where this landed? Running into same issue.

tgoyne · 2020-07-07T16:00:14Z

This was released in Cocoa 3.16.0.

tgoyne added Breaking T-Enhancement labels Aug 8, 2017

austinzheng mentioned this issue Aug 9, 2017

Avoid triggering change notifications for setting properties to existing values for createOrUpdate realm/realm-swift#3489

Closed

cmelchior mentioned this issue Aug 19, 2017

Changeset for Object returns all fields realm/realm-java#5123

Open

Zhuinden mentioned this issue Oct 23, 2017

Changed Notification raised even when the value is the same realm/realm-java#5451

Open

rodericj mentioned this issue Apr 11, 2019

Create update or insert method wrapper ThumbWorks/sf-ios#20

Closed

bmunkholm added the Pipeline-On-Hold label Nov 11, 2019

tgoyne closed this as completed Jul 7, 2020

github-actions bot locked as resolved and limited conversation to collaborators Mar 21, 2024

Table::set() should not record transaction log entries for setting to the current value #2787

Table::set() should not record transaction log entries for setting to the current value #2787

Comments

tgoyne commented Aug 8, 2017 • edited by RealmBot Loading

ironage commented Aug 9, 2017

astigsen commented Aug 9, 2017

simonask commented Aug 9, 2017 • edited Loading

tgoyne commented Aug 9, 2017

simonask commented Aug 9, 2017

ironage commented Aug 9, 2017

tgoyne commented Aug 9, 2017

simonask commented Aug 10, 2017

finnschiermer commented Aug 10, 2017

simonask commented Aug 10, 2017

tgoyne commented Aug 10, 2017

simonask commented Aug 10, 2017

tgoyne commented Aug 10, 2017

ironage commented Aug 10, 2017

simonask commented Aug 11, 2017

simonask commented Aug 11, 2017 • edited Loading

tgoyne commented Aug 14, 2017

bigfish24 commented Aug 15, 2017 • edited Loading

bigfish24 commented Aug 15, 2017 • edited by bmunkholm Loading

bdash commented Aug 15, 2017

bigfish24 commented Aug 15, 2017 • edited Loading

simonask commented Aug 15, 2017

tgoyne commented Aug 15, 2017

simonask commented Aug 16, 2017

simonask commented Aug 16, 2017

simonask commented Aug 16, 2017

bdash commented Aug 16, 2017

simonask commented Aug 16, 2017

simonask commented Aug 16, 2017

finnschiermer commented Aug 16, 2017

simonask commented Aug 16, 2017

bdash commented Aug 21, 2017

skydivedan commented Mar 7, 2019

tgoyne commented Mar 7, 2019

msbouchedid commented Jul 6, 2020

tgoyne commented Jul 7, 2020 • edited Loading

tgoyne commented Aug 8, 2017 •

edited by RealmBot

Loading

simonask commented Aug 9, 2017 •

edited

Loading

simonask commented Aug 11, 2017 •

edited

Loading

bigfish24 commented Aug 15, 2017 •

edited

Loading

bigfish24 commented Aug 15, 2017 •

edited by bmunkholm

Loading

bigfish24 commented Aug 15, 2017 •

edited

Loading

tgoyne commented Jul 7, 2020 •

edited

Loading