Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Table::set() should not record transaction log entries for setting to the current value #2787

Closed
tgoyne opened this issue Aug 8, 2017 · 36 comments

Comments

@tgoyne
Copy link
Member

tgoyne commented Aug 8, 2017

The end-user motivation for this is realm/realm-swift#3489. The short summary is that upserting a json blob into a Realm currently triggers notifications even if none of the values actually changed. This has also emerged as a problem for sync, as the redundant sets are still recorded in the transaction log and so use bandwidth and CPU time on all clients.

This is something we should do for 3.0. It's a breaking change, so it needs to happen now or not anytime soon, and I don't think anyone particularly wants the current behavior.

Conceptually this can be as simple as making Table::set() be:

template<typename T>
void Table::set(size_t col, size_t row, T value)
{
    if (get<T>(col, row) == value)
      return;
    // do set
}

For some column types it may be worth pushing the check in deeper to avoid redundant validation/bptree traversals/etc. should benchmarking indicate that this actually has meaningful overhead.


There's a few other ways we could handle this other than making changes in core, none of which I feel are viable.

  1. Filter out sets to the current value in the notification code.

    I took a stab at implementing this and it turned out to be very complicated and/or slow due to that the code processing the transaction log doesn't have access to the Realm in the state that it would be at the point where each instruction is handled, so a lot of additional state needs to be tracked to either support lookups there or to compare the fields afterwards. Also doesn't help the sync case at all.

  2. Special-case it in the createOrUpdate logic.

    This inherently imposes inconsistency in our API, which is something we should try to avoid.

  3. Make self-assignments no-ops at the ObjectStore layer.

    Currently not a practical option due to that not all property sets actually go through the object store, and mandating that bindings never call Table::set() directly would be a large project for the 3.0 timeframe.

@ironage
Copy link
Contributor

ironage commented Aug 9, 2017

@simonask do you foresee any problems integrating this change with sync, or can we go ahead and implement?

@astigsen
Copy link
Contributor

astigsen commented Aug 9, 2017

This is something that could potentially be very expensive performance wise, so we should make sure that we benchmark it and implement it at the right level.

It could potentially be implemented in the setter at the leaf level so that we don't have have to traverse the b-tree twice, but it is obviously only worth doing this if profiling shows it to be an actual performance problem.

@simonask
Copy link
Contributor

simonask commented Aug 9, 2017

For sync, it does create a behavioral change, because of "last write wins". This means that if a different client wrote to the field between the previous set instruction and the one you are throwing away, the other client will now "win" with this change.

By the way, we are implementing log compaction in Sync to address the same situation (redundant instructions), so the motivation to do this should be purely for Core performance. :-)

By the way, it would be fine from our perspective to skip writing to the database, but still emit the instruction (potentially reducing amount of CoW-memory), but also potentially not making any difference at all, since the commit log entry must still be written). I would love to see if that has any impact on benchmarks. :-)

@tgoyne
Copy link
Member Author

tgoyne commented Aug 9, 2017

It creates a behavioral change, but it is just as easy to think of scenarios where the current behavior is undesirable (e.g. user copies object out of Realm, modifies one field, copies it back in, and then overwrites a change made on a different client that they did not mean to overwrite) as scenarios where this would be worse.

I do not see how log compaction can reduce the size of the transaction logs uploaded by a client unless it does the same logic of checking if each instruction actually changed the value on the client locally before uploading, and doing that just sounds like a slower version of just not emitting the instruction in the first place.

@simonask
Copy link
Contributor

simonask commented Aug 9, 2017

That is certainly a scenario that would create a similar problem, but it's a lot more esoteric than setting any property at any point in the program. It's also very easy to explain any unexpected behavior.

Log compaction does not look at the current state of the realm, but scans a sequence of changesets to detect if early instructions are made redundant by later instructions in those changesets. This can run on the client or the server, but we haven't decided yet exactly when the right time is.

I believe that not emitting the instruction would be problematic, but that it is worth investigating if there is anything to be won by eliding the actual update — especially considering it is very likely that something else caused a changelog entry to be emitted in the same transaction (requiring a write in the history regardless, meaning eliminating one instruction makes no real difference), whereas a full CoW on the column is potentially expensive. I don't know if Array already performs this check though?

@ironage
Copy link
Contributor

ironage commented Aug 9, 2017

If the existing value is identical, Array::set returns early before CoW.

@tgoyne
Copy link
Member Author

tgoyne commented Aug 9, 2017

This change in functionality has been widely requested for years, both in the linked issue and pretty much every time the subject has come up when talking to users, and not because of write performance. Users are consistently surprised by the idea that we consider setting a property to its existing value to be a a modification of that property.

@simonask
Copy link
Contributor

It seems part of the surprise stems from update notifications being triggered where oldValue == newValue. That should be trivial to address in the notification code (just don't trigger the notification in that case).

For the use case of applying updates to an object from JSON, it would make a lot of sense to have a specialized API for that, like -[RLMObject updateIfDifferent:(JSON*)].

The reason I'm concerned about doing it in Table::set() for every property update is that it becomes significantly harder to predict the behavior of the merge algorithm. There would no longer be a 1-to-1 correlation between Table-level API calls and emitted instructions.

@finnschiermer
Copy link
Contributor

@astigsen From Cores perspective, the overarching question is whether we want this change to Sync semantics - And this depends on how Sync is used. What is your opinion?

@simonask
Copy link
Contributor

For the record, I'm fine with the change if it's the right way to solve the issues, just as long as we do it with open eyes. Also, we probably have some unit tests that would need adjustment. :-)

@tgoyne
Copy link
Member Author

tgoyne commented Aug 10, 2017

It seems part of the surprise stems from update notifications being triggered where oldValue == newValue. That should be trivial to address in the notification code (just don't trigger the notification in that case).

I took a stab at implementing this and it turned out to be very complicated and/or slow due to that the code processing the transaction log doesn't have access to the Realm in the state that it would be at the point where each instruction is handled, so a lot of additional state needs to be tracked to either support lookups there or to compare the fields afterwards.

Adding a different function which does the thing people actually want would just be API bloat, and would have the same impact on sync. Explaining merge behavior to users in terms of Table::set() calls is not very useful, as that's already several abstraction layers away from what they're doing.

@simonask
Copy link
Contributor

It does not have the same impact if it isn't the default behavior all the time.

I'm not sure how notifications are exposed in Cocoa, but if both newValue and oldValue are available at some point in the notification code, it should also be possible to use them to prevent the notification from being passed to user code.

I'm just trying to get across that changing Last-Write-Wins to Last-Write-Sort-Of-Wins isn't a trivial no-brainer change.

@tgoyne
Copy link
Member Author

tgoyne commented Aug 10, 2017

Collection notifications do not at any point keep track of the old or new values of anything or report that information to the user. The thing tracked is just which rows in the collection were modified (and inserted/removed).

With this change, sync merging is still Last-Write-Wins; we just don't perform a write at all when you set a field to the value it already has. I think this is a much easier to explain behavior than "sometimes we do a write when you aren't actually changing the value and sometimes we don't, depending on which function you called".

@ironage
Copy link
Contributor

ironage commented Aug 10, 2017

The notifications are generated by inspecting the replication logs right? Core could conceivably report both the old and new value in the "set" instruction to the Replication so that the notifications could just skip the instructions that set identical values. It would be a breaking change, but should solve the notification problem without affecting sync. Additionally, as Alexander points out, this change would suffer a (possibly significant) performance hit of always "getting" the previous value.

@simonask
Copy link
Contributor

Collection notifications do not at any point keep track of the old or new values of anything or report that information to the user. The thing tracked is just which rows in the collection were modified (and inserted/removed).

Hm, okay - collections shouldn't be touched by Table::set(), though, right? Or am I misunderstanding the situation?

With this change, sync merging is still Last-Write-Wins; we just don't perform a write at all when you set a field to the value it already has. I think this is a much easier to explain behavior than "sometimes we do a write when you aren't actually changing the value and sometimes we don't, depending on which function you called".

The problem is that it is "the value it already has" is unpredictable, because it depends on when the client last happened to communicate with the server, sacrificing transactional consistency. You could see antipatterns arise where clients clear fields before setting them "to make sure" etc.

Forgive me, but "the behavior is different depending on which function you call" seems like a fairly straight-forward idea. ;-)

@simonask
Copy link
Contributor

simonask commented Aug 11, 2017

Example in pseudocode:

Client A at t=0:
{
    animals[0].breed = "dog";
    animals[0].diet = "carnivorous";
    commit();
}

Client A at t=2:
{
    animals[0].breed = "cat";
    animals[0].diet = "carnivorous";
    commit();
}

Client B at t=1; // <-- NB! Timestamp is between the previous two operations.
{
    animals[0].breed = "giraffe";
    animals[0].diet = "herbivorous";
    commit();
}

With this (contrived?) sequence of operations, you could end up with a carnivorous giraffe. :)

@tgoyne
Copy link
Member Author

tgoyne commented Aug 14, 2017

Another option which would make achieving the desired notification behavior simple without changing the sync results would be to add a instr_SetCurrent instruction which is used instead of instr_Set when the old and new values are identical. Sync would treat this identically to Set, while notifications would ignore it entirely.

Ideally we would also omit the value entirely from the SetCurrent instruction to help somewhat with the secondary goal of reducing the size of the transaction logs, but I don't know if sync knows enough about the state of the clients to be able to actually merge that.

@bigfish24
Copy link
Contributor

bigfish24 commented Aug 15, 2017

Just to chime in, but there is definitely a desire from sync perspective for the behavior described above. The original code I wrote for a customer over a year ago had to manually diff for changes.

This is a common pattern for any server integration with an API that can't provide diffs or push changes, and instead needs to be polled.

From this perspective though a specific API would be appropriate since this use-case is sync or server related to not create unnecessary transactions.

@bigfish24
Copy link
Contributor

bigfish24 commented Aug 15, 2017

Also, Eric filed this from a customer: https://github.com/realm/product/issues/376

@bdash
Copy link
Contributor

bdash commented Aug 15, 2017

Keep in mind that the primary issue we're trying to solve here is related to unnecessary notifications being generated when properties are set to their existing values, where virtually all clients of notifications want to suppress such notifications. Requiring that a different API be used for such modifications is hard to justify.

For the secondary issue of avoiding unnecessary transaction log growth with sync, any new API would still be vulnerable to the conflict resolution consistency issue that Simon mentioned in #2787 (comment). While the user would be explicitly opting in to such behavior by using a different API, it would still be challenging to explain to them the subtleties of the behavior they're opting in to.

The proposal Thomas made in #2787 (comment) seems like it may allow for the behavior we're after in both cases while also preserving sync's ability to handle conflicts in a predictable fashion. It'd be great to hear the sync team's thoughts on it.

@bigfish24
Copy link
Contributor

bigfish24 commented Aug 15, 2017

@bdash thanks for explanation, wasn't trying to push any sort of direction but just making sure the customer voice was added to the mix.

@simonask
Copy link
Contributor

SetCurrent would be fine for Sync (or alternatively just a flag on the Set instruction), but keep in mind that the merge algorithm may still generate no-op Set instructions in some corner cases or multiple SetCurrent in a row.

Can we reiterate why intercepting no-op sets in the notification system isn't feasible? I heard the point about collections, but those are precisely not covered by Table::set().

@tgoyne
Copy link
Member Author

tgoyne commented Aug 15, 2017

Collection notifications report when fields on the objects in the collection are modified and not just when the set of objects contained by the collection changes, so they do care about Table::set().

@simonask
Copy link
Contributor

Ah, gotcha, that was the missing piece, thanks. 👍

@simonask
Copy link
Contributor

Then it seems to me that the most attractive idea is the SetCurrent instruction (or SetUnchanged or whatever we call it). Today, there is a boolean field indicating if the Set instruction is setting a default value. That could be extended to an enum indicating if the instruction is regular Set, SetDefault, or SetCurnret without incurring any overhead.

@simonask
Copy link
Contributor

A way to opt out of SetCurrent would also be very nice.

@bdash
Copy link
Contributor

bdash commented Aug 16, 2017

Why would it be useful to opt out of it?

@simonask
Copy link
Contributor

Because, as I have explained, it breaks certain semantics that we rely on.

I'm also not 100% convinced that we are seeing the full picture — we are getting requests for a different behavior from people who want that, but we do not have any idea how many people are actually relying on the current behavior for conflicts to resolve correctly.

I guess I'm also just fundamentally skeptical of the idea of involving 3 different teams to solve a problem that seems isolated to notifications. New merge rules, documenting new semantics, resolving issues, measuring the performance impact, coordinating the effort — it all adds up at a time that we at least would rather spend on more high-profile features.

@simonask
Copy link
Contributor

But of course, if that is what is needed, and there really is no other good way to address the problem, it is what we must do. I just think we need to really, really consider alternatives.

@finnschiermer
Copy link
Contributor

@simonask Are you saying that even if we go with a new setUnchanged instruction, sync may in certain situations have to emit an ordinary set instructions, so that users will still get unwanted notifications, just not as often as now?

@simonask
Copy link
Contributor

After thinking about it a bit more, I think the merge algorithm should be able to generate the desired instructions in all cases — the main drawback being that work is required inside the merge algorithm in the first place…

@bdash
Copy link
Contributor

bdash commented Aug 21, 2017

From talking with Simon, while a SetUnchanged instruction should allow the correct merge semantics, there's not sufficient time at this point in the release cycle for sync to make the necessary changes to accommodate it given the other high priority work that the sync team is occupied with.

I think the right plan of action for now is to defer this from consideration for our upcoming releases, and revisit it when we have less time pressure.

@skydivedan
Copy link

Forgive me if I'm misinterpreting this, but I'm curious about this, in regards to
realm/realm-swift#3489

It looks like the problem was solved over here in realm-java realm/realm-java#6224

Is there any effort -- or is it even possible -- to bring what was learned in realm-java over to realm-cocoa? Is there another ticket for that?

@tgoyne
Copy link
Member Author

tgoyne commented Mar 7, 2019

I started working on exposing that in Cocoa a while back but haven't had time to finish it off. It's still near the top of my backlog.

@msbouchedid
Copy link

Any update on where this landed? Running into same issue.

@tgoyne
Copy link
Member Author

tgoyne commented Jul 7, 2020

This was released in Cocoa 3.16.0.

@tgoyne tgoyne closed this as completed Jul 7, 2020
@github-actions github-actions bot locked as resolved and limited conversation to collaborators Mar 21, 2024
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

No branches or pull requests

10 participants