Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Inconsistent State in MVCC #151

Closed
timbokopter opened this issue Aug 12, 2013 · 14 comments
Closed

Inconsistent State in MVCC #151

timbokopter opened this issue Aug 12, 2013 · 14 comments
Assignees
Labels

Comments

@timbokopter
Copy link
Contributor

In the current implementation, MVCC allows for inconsistent reads. For a record deleted by another transaction, it is impossible to determine whether it would have been valid or not in the context of the current transaction. Example:

  1. T1 starts with last_CID=6, does things
  2. T2 starts, inserts record A, commits with CID=7
  3. T3 starts, deletes record A, commits with CID=8
  4. T1 reads A with valid=0 and CID (8) > last_CID (6) ==> current implementation assumes that value was valid for T1's read
@ghost ghost assigned schwald Aug 12, 2013
@bastih
Copy link
Contributor

bastih commented Aug 12, 2013

Maybe we need to keep track of creation and expiration CIDs (which is for example what postgres mvcc does). (see p 58ff http://momjian.us/main/writings/pgsql/internalpics.pdf)

@grundprinzip
Copy link
Contributor

This is true. Anyone volunteers to replace the valid column with a creation vector and update the validation methods?

@bastih
Copy link
Contributor

bastih commented Aug 13, 2013

I'll take care of it.

@mrks
Copy link
Member

mrks commented Aug 13, 2013

The Postgres approach provides fewer transactional guarantees though: "The above assumes the default
read committed isolation level". Currently (ignoring the problem above), we guarantee serializable transactions. As far as I understand it, the Postgres approach would introduce lost updates, phantom reads, and non-repeatable reads.

@grundprinzip
Copy link
Contributor

Basically, we are not changing much: The CID becomes the end timestamp and we add a creation timestamp. The creation timestamp is written once together with the CID of the insertion commit. Validity rules remain as there are with the addition of the creation check.

A record with an end CID is not valid
A record with a creation CID is valid
A record with no CIDs and a TID is currently written
...

David has a state table for that I think...

@bastih
Copy link
Contributor

bastih commented Aug 13, 2013

do we actually need the valid bit then?

@grundprinzip
Copy link
Contributor

nope...

@schwald
Copy link
Member

schwald commented Aug 13, 2013

Ok, so if I get this right we would have the following for each row: TID, BeginCID, EndCID.
When writing the following states should occur (ignoring inconsistent states):

v.TID v.BeginCID v.EndCID LastCID Status
-- Inf Inf 12 Init
5 Inf Inf 12 Insert uncommitted
5 13 Inf 12 Insert commit in prog.
5 13 Inf 13 Insert committed
-- 13 Inf 13 Insert committed (cleaned up)
-- 13 14 13 delete commit in prog.
-- 13 14 14 delete committed

For the version visibility this results in:

v.TID == tx.TID tx.LastCID >= v.BeginCID tx.LastCID >= v.EndCID Keep? Comment
yes yes yes No Impossible
no yes yes No Past Delete
yes no yes No Impossible
(yes) yes no No Own Delete (delete list in TM)
no no yes No Impossible
yes no no Yes Own Insert
no yes no Yes Past Insert, Future Delete
no no no No Uncommitted Insert, Future Insert

Anything I missed?

@mrks
Copy link
Member

mrks commented Aug 15, 2013

We had a discussion about the implementation of deletes yesterday:

Having the delete list in the transaction manager might turn into a performance issue, depending on how many deletes we have. If ValidatePositions has to check for the existence of a deletion entry in the delete list, the best we can achieve O(log n).

The other idea would be to use some kind of delete lock, which could be implemented by reusing the TID. As you can see in the first table above, we do not need the TID field any more as soon as BeginCID is set. What we could do is to add a "delete uncommitted" state with the TID set to the TID of the deleting transaction and leaving the other fields untouched.

This might cause live locks when two transactions try to delete two rows in a different orders and always have to get aborted. However, this case does not violate the ACID criteria. Also, it can happen in other databases, like in MySQL.

@grundprinzip
Copy link
Contributor

Just my two cents here: validate positions will always preform a sequential scan over the deleted positions thus this is not o(LOG N). During the scan it will evaluate of it can commit.

However this does not solve the serialization of commits if they're executed in parallel as the is no possibility to mark a row as delete pending.

A way to mark rows as pending would be great and we can use the tid field for that.

If tid and begin cid is set it's a pending delete while a tid and no begin cid is a insert.

Open question : how to delete your own writes? How to handle abort and rollback of transactions correctly.

@schwald
Copy link
Member

schwald commented Aug 15, 2013

Regarding the decision if a transaction can commit or if it detects conflicts, we clearly need the list of deleted positions inside the TM.

However, I think the problem Markus refers to is a different one. If we do not write a TID in a deleted row and only remember the deleted position in a list in the TM, how does a transaction detect its own deletes. If it reads a table and has a list of positions, the ValidatePosScan has to detect all the own deletes.

If both lists were sorted, this could be done easily by iterating through the sorted lists in parallel. However, the delete list might not be in sort order but could be sorted before the ValidatePosScan. Can we assume that the list of positions that will be validated is sorted?

@schwald
Copy link
Member

schwald commented Aug 15, 2013

I think the issue of parallelizing commits should be a different discussion. See #138.

@grundprinzip
Copy link
Contributor

Regarding the sort order of the position lists, we cannot assume any ordering. So it's true we need something else and the proposed way of using the tid column is fine for me.

@schwald
Copy link
Member

schwald commented Aug 15, 2013

Ok, so let's assume we write the TID into a row when deleting the row in a transaction. So we have the following status combinations for a row:

v.TID v.BeginCID v.EndCID LastCID Status
-- Inf Inf 12 Init
5 Inf Inf 12 Insert uncommitted
5 13 Inf 12 Insert commit in prog.
5 13 Inf 13 Insert committed
-- 13 Inf 13 Insert committed (cleaned up)
6 13 Inf 13 delete uncommited
6 13 14 13 delete commit in prog.
6 13 14 14 delete committed

If we consider a set TID in a row as an exclusive lock, can the step between 'insert committed' and 'insert committed cleaned up' make any trouble? So we could have rows that are already committed but still locked. Do you see any problem here?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

5 participants