Inconsistent State in MVCC #151

timbokopter · 2013-08-12T15:31:46Z

In the current implementation, MVCC allows for inconsistent reads. For a record deleted by another transaction, it is impossible to determine whether it would have been valid or not in the context of the current transaction. Example:

T1 starts with last_CID=6, does things
T2 starts, inserts record A, commits with CID=7
T3 starts, deletes record A, commits with CID=8
T1 reads A with valid=0 and CID (8) > last_CID (6) ==> current implementation assumes that value was valid for T1's read

bastih · 2013-08-12T15:52:30Z

Maybe we need to keep track of creation and expiration CIDs (which is for example what postgres mvcc does). (see p 58ff http://momjian.us/main/writings/pgsql/internalpics.pdf)

grundprinzip · 2013-08-12T20:12:51Z

This is true. Anyone volunteers to replace the valid column with a creation vector and update the validation methods?

bastih · 2013-08-13T08:03:34Z

I'll take care of it.

mrks · 2013-08-13T08:09:30Z

The Postgres approach provides fewer transactional guarantees though: "The above assumes the default
read committed isolation level". Currently (ignoring the problem above), we guarantee serializable transactions. As far as I understand it, the Postgres approach would introduce lost updates, phantom reads, and non-repeatable reads.

grundprinzip · 2013-08-13T08:26:03Z

Basically, we are not changing much: The CID becomes the end timestamp and we add a creation timestamp. The creation timestamp is written once together with the CID of the insertion commit. Validity rules remain as there are with the addition of the creation check.

A record with an end CID is not valid
A record with a creation CID is valid
A record with no CIDs and a TID is currently written
...

David has a state table for that I think...

bastih · 2013-08-13T08:36:36Z

do we actually need the valid bit then?

grundprinzip · 2013-08-13T08:51:05Z

nope...

schwald · 2013-08-13T12:08:37Z

Ok, so if I get this right we would have the following for each row: TID, BeginCID, EndCID.
When writing the following states should occur (ignoring inconsistent states):

v.TID	v.BeginCID	v.EndCID	LastCID	Status
--	Inf	Inf	12	Init
5	Inf	Inf	12	Insert uncommitted
5	13	Inf	12	Insert commit in prog.
5	13	Inf	13	Insert committed
--	13	Inf	13	Insert committed (cleaned up)
--	13	14	13	delete commit in prog.
--	13	14	14	delete committed

For the version visibility this results in:

v.TID == tx.TID	tx.LastCID >= v.BeginCID	tx.LastCID >= v.EndCID	Keep?	Comment
yes	yes	yes	No	Impossible
no	yes	yes	No	Past Delete
yes	no	yes	No	Impossible
(yes)	yes	no	No	Own Delete (delete list in TM)
no	no	yes	No	Impossible
yes	no	no	Yes	Own Insert
no	yes	no	Yes	Past Insert, Future Delete
no	no	no	No	Uncommitted Insert, Future Insert

Anything I missed?

mrks · 2013-08-15T08:22:20Z

We had a discussion about the implementation of deletes yesterday:

Having the delete list in the transaction manager might turn into a performance issue, depending on how many deletes we have. If ValidatePositions has to check for the existence of a deletion entry in the delete list, the best we can achieve O(log n).

The other idea would be to use some kind of delete lock, which could be implemented by reusing the TID. As you can see in the first table above, we do not need the TID field any more as soon as BeginCID is set. What we could do is to add a "delete uncommitted" state with the TID set to the TID of the deleting transaction and leaving the other fields untouched.

This might cause live locks when two transactions try to delete two rows in a different orders and always have to get aborted. However, this case does not violate the ACID criteria. Also, it can happen in other databases, like in MySQL.

grundprinzip · 2013-08-15T08:41:43Z

Just my two cents here: validate positions will always preform a sequential scan over the deleted positions thus this is not o(LOG N). During the scan it will evaluate of it can commit.

However this does not solve the serialization of commits if they're executed in parallel as the is no possibility to mark a row as delete pending.

A way to mark rows as pending would be great and we can use the tid field for that.

If tid and begin cid is set it's a pending delete while a tid and no begin cid is a insert.

Open question : how to delete your own writes? How to handle abort and rollback of transactions correctly.

schwald · 2013-08-15T08:51:20Z

Regarding the decision if a transaction can commit or if it detects conflicts, we clearly need the list of deleted positions inside the TM.

However, I think the problem Markus refers to is a different one. If we do not write a TID in a deleted row and only remember the deleted position in a list in the TM, how does a transaction detect its own deletes. If it reads a table and has a list of positions, the ValidatePosScan has to detect all the own deletes.

If both lists were sorted, this could be done easily by iterating through the sorted lists in parallel. However, the delete list might not be in sort order but could be sorted before the ValidatePosScan. Can we assume that the list of positions that will be validated is sorted?

schwald · 2013-08-15T08:57:25Z

I think the issue of parallelizing commits should be a different discussion. See #138.

grundprinzip · 2013-08-15T09:29:24Z

Regarding the sort order of the position lists, we cannot assume any ordering. So it's true we need something else and the proposed way of using the tid column is fine for me.

schwald · 2013-08-15T09:45:13Z

Ok, so let's assume we write the TID into a row when deleting the row in a transaction. So we have the following status combinations for a row:

v.TID	v.BeginCID	v.EndCID	LastCID	Status
--	Inf	Inf	12	Init
5	Inf	Inf	12	Insert uncommitted
5	13	Inf	12	Insert commit in prog.
5	13	Inf	13	Insert committed
--	13	Inf	13	Insert committed (cleaned up)
6	13	Inf	13	delete uncommited
6	13	14	13	delete commit in prog.
6	13	14	14	delete committed

If we consider a set TID in a row as an exclusive lock, can the step between 'insert committed' and 'insert committed cleaned up' make any trouble? So we could have rows that are already committed but still locked. Do you see any problem here?

ghost assigned schwald Aug 12, 2013

mrks added a commit that referenced this issue Aug 15, 2013

Changed TranStore to new MVCC transaction model #151

94c54f1

mrks added a commit that referenced this issue Aug 21, 2013

Changed TranStore to new MVCC transaction model #151

514c175

schwald closed this as completed Aug 23, 2013

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Inconsistent State in MVCC #151

Inconsistent State in MVCC #151

timbokopter commented Aug 12, 2013

bastih commented Aug 12, 2013

grundprinzip commented Aug 12, 2013

bastih commented Aug 13, 2013

mrks commented Aug 13, 2013

grundprinzip commented Aug 13, 2013

bastih commented Aug 13, 2013

grundprinzip commented Aug 13, 2013

schwald commented Aug 13, 2013

mrks commented Aug 15, 2013

grundprinzip commented Aug 15, 2013

schwald commented Aug 15, 2013

schwald commented Aug 15, 2013

grundprinzip commented Aug 15, 2013

schwald commented Aug 15, 2013

Inconsistent State in MVCC #151

Inconsistent State in MVCC #151

Comments

timbokopter commented Aug 12, 2013

bastih commented Aug 12, 2013

grundprinzip commented Aug 12, 2013

bastih commented Aug 13, 2013

mrks commented Aug 13, 2013

grundprinzip commented Aug 13, 2013

bastih commented Aug 13, 2013

grundprinzip commented Aug 13, 2013

schwald commented Aug 13, 2013

mrks commented Aug 15, 2013

grundprinzip commented Aug 15, 2013

schwald commented Aug 15, 2013

schwald commented Aug 15, 2013

grundprinzip commented Aug 15, 2013

schwald commented Aug 15, 2013