Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Primary Keys in Core #1200

Merged
merged 7 commits into from Dec 2, 2015
Merged

Primary Keys in Core #1200

merged 7 commits into from Dec 2, 2015

Conversation

simonask
Copy link
Contributor

@simonask simonask commented Oct 6, 2015

PROPOSAL -- very open for discussion.

This does two things:

  1. Removes all previous support for primary keys in Core.
  2. Introduces a new operation, "SetUnique", which gives an error if the incoming value already exists in the column.

To implement primary keys on top of this, the binding must call only the set_unique operations (and never plain set) on a column. An extra consequence of this is that primary key constraints are only checked when set_unique is called, and not, for instance, when new rows are inserted. This replicates the current behavior of the mechanism used in the Cocoa binding (see RLMSetValueUnique).

Rationale

  • We need Core-level support for primary keys in order to support them in Sync. Since updating a value in a primary key column has different side-effects than updating a value in a non-PK column, the PK nature of the update operation must be encoded in the transaction log, so we can implement different merge rules for PK updates.
  • This retains the current implementation model for primary keys. Bindings still use the pk metadata table, and Core still knows nothing about which columns are primary key columns (no attribute or similar). It is solely up to the binding / object store to ensure that primary key columns are treated as such, as it is today. This has the additional benefit of not requiring a new method of creating objects, i.e. this retains the insert-empty-row-then-set-values mode.
  • As discussed in Primary key support  #831, different semantics for primary keys than what users usually expect are required for sync, because we cannot throw a primary key constraint violation during merge in case of a conflict. The proposed semantics is to merge the objects with the conflicting primary keys into one object. However, this means primary keys would behave differently in sync and outside of sync, so there is a proposal to actually make non-sync "primary keys" behave this way as well. This has the implication that changing the PK of an object to the same value as some other object will actually cause one of the objects to be deleted and the properties of the objects to be merged, and all references to the deleted object to be changed to point to the retained object. This behavior is probably extremely confusing for users by itself if we call it "primary keys", so this could be a reason to go for a different naming scheme, in this case "unique".

I have also considered the word "identity" instead of "unique", but "identity" is already a heavily overloaded word... Still, it might be worth considering.

Fixes #831.

\cc @kspangsege

@simonask simonask self-assigned this Oct 6, 2015
@realm-ci
Copy link
Contributor

realm-ci commented Oct 6, 2015

@realm-ci
Copy link
Contributor

realm-ci commented Oct 6, 2015

@simonask simonask removed the wip label Oct 7, 2015
@kneth
Copy link
Member

kneth commented Oct 23, 2015

@simonask This is a single column uniqueness, right? In general, candidate/primary keys can span multiple columns. To my knowledge, no bindings support that scenario but we get request for it (see realm/realm-java#1129).

Uniqueness is not only useful for primary keys but also for a general case - see realm/realm-java#967.

@simonask
Copy link
Contributor Author

@kneth Indeed, this is single-column uniqueness.

Multi-column uniqueness can be implemented in a similar way, but we would need a SetMultiple instruction.

@finnschiermer
Copy link
Contributor

I think I like the concepts and the nice explanations, but I'm not going to claim I understand all the implications :-)

@simonask
Copy link
Contributor Author

@kspangsege, @danielpovlsen, @teotwaki, @rrrlasse I would love your input here as well. :)

@danielpovlsen
Copy link
Contributor

Is this still a proposal or to be considered ready for review?

@simonask
Copy link
Contributor Author

@danielpovlsen I consider a proposal inherently destined for review. ;-) Sorry if the intention was unclear. Please do review, and please do have opinions about the direction as well! :-)

@teotwaki
Copy link
Contributor

This has the implication that changing the PK of an object to the same value as some other object will actually cause one of the objects to be deleted and the properties of the objects to be merged

I think this is one of the major points, and that discussing this as "primary keys" is dangerous. As far as I can tell, this sort of behaviour is not possible in SQL or even NoSQL stores. @simonask, correct me if I'm wrong, but could this be rephrased as "unique on insert but not on update"?

Are we convinced that what we're providing will fit with user expectations? How many people will expect merges to fail, or merges to be handled in a specific way; and can we properly document that/ensure a specific behaviour? In my opinion, there is a use-case for conflicts to occur, and ask the user for manual intervention.

  • Use-case where auto-merge is acceptable: contacts/phone book.
  • Use-case where auto-merge is not acceptable: spreadsheet.

@simonask
Copy link
Contributor Author

It is indeed tricky, because SQL and SQL-like stores provide no such feature directly. Many of them are providing "UPSERT", though, precisely for distributed use cases.

I would argue that this feature is actually in the spirit of "primary keys", in the sense that a primary key is meant to identify an object - logically, if two objects then have the same primary key, they have the same identity, and can therefore be collapsed.

When would you use primary keys in a spreadsheet?

@teotwaki
Copy link
Contributor

Sorry, my brain farted and I'm not entirely sure which point I was trying to get to.

Thinking this over, merge could indeed be the correct default behaviour considering what we're trying to achieve. I guess the main road block I see comes from the TRIGGER concept in RDBMS, which we don't have in Realm, but which would run specific actions upon insertion. In those cases, a merge strategy could have very weird consequences, and is why duplicate PK insertion is blocking in those systems.

@simonask simonask mentioned this pull request Nov 25, 2015
# Conflicts:
#	src/realm/impl/transact_log.hpp
#	src/realm/replication.cpp
#	src/realm/table.cpp
#	src/realm/table.hpp
@realm-ci
Copy link
Contributor

@cmelchior
Copy link
Contributor

Talked a bit with Simon today about this. In my opinion it doesn't matter much if we solve this problem at the ObjectStore level or in Core. For the sake of keeping core as simple as possible (+1 when trying to reason about Sync), I think that solving it at the ObjectStore level would be natural. This means that this proposal would be enough to solve our problems as far as I can see.

Some extra thoughts to consider

  • Auto-incremented keys are out (I guess they where never in anyway)
  • Synthetic keys like 128 bit GUID's doesn't need to use this operation to function.
  • For semantic keys we will have to consider how to resolve conflicts key conflicts. Just merging them with the default conflict resolution could probably work (Hint: Still need support for table-level conflict resolution rules), but will require some extra thinking. E.g. what happens if:
Server: [Andy, Benny, Chris] (name being unique)
Phone A: Renames Andy to Dennis
Phone B: Renames Benny to Dennis
  • Adding a setMultipleUnique or similar for supporting composite keys.

@rrrlasse
Copy link
Contributor

SQL has a "merge" keyword, and you can skip conflicting rows by adding a "where not exists" criteria. There also exists a "when not matched then" clause that you can use if you want custom behaviour on conflicts... So SQL is a winner over Realm here, with respect to flexibility.

By the way, do we really need the user to be able to specify/set the unique values? If they were only allowed to be auto generataed, then Realm could use different number generator for each thread that would somehow mathematically be guaranteed never to collide.

@simonask
Copy link
Contributor Author

@rrrlasse There are two types of primary keys: natural and synthetic. Your suggestion is to disallow natural keys, but we don't think this is what users actually want.

I think it requires a deeper analysis to figure out whether or not we can achieve similar levels of flexibility under this scheme.

@cmelchior
Copy link
Contributor

Disallowing natural keys will not be a good idea IMO. Too many people rely on those (e.g. server generated keys).

Regarding conflict handling, then SQLite e.g. has the following modes: https://www.sqlite.org/lang_conflict.html

@simonask
Copy link
Contributor Author

@cmelchior Interesting perspective — I hadn't looked into what SQLite actually does. Here is my evaluation of the strategies as they apply to Realm Sync:

  • ROLLBACK: Not an option — changes have already been applied on other clients.
  • ABORT + FAIL: See above.
  • IGNORE: See above. :-)
  • REPLACE: This is one possible choice as an outcome. We can also choose the same but opposite solution, where the object causing the PK violation becomes the old object (SQLite would not be able to implement this, as far as I can tell).

@realm-ci
Copy link
Contributor

realm-ci commented Dec 1, 2015

@realm-ci
Copy link
Contributor

realm-ci commented Dec 1, 2015

@realm-ci
Copy link
Contributor

realm-ci commented Dec 2, 2015

simonask added a commit that referenced this pull request Dec 2, 2015
@simonask simonask merged commit 1daf458 into master Dec 2, 2015
@simonask simonask removed the Review-PR label Dec 2, 2015
@github-actions github-actions bot locked as resolved and limited conversation to collaborators Mar 24, 2024
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Primary key support
8 participants