[6X FIX] Fix a bug of concurrently executing UPDATE/DELETE with VACUUM on a partitioned AO table #11895
Conversation
There are no AO tables in upstream PostgreSQL, but there are partitioned tables. How does the locking work for partitioned tables in upstream? It will be a good reference point in order to evaluate this change.
Hi @asimrp For UPDATE/DELETE on the root table, upstream locks all the leaves. For INSERT, upstream's behavior is discussed in this thread: https://groups.google.com/a/greenplum.org/g/gpdb-dev/c/wAPKpJzhbpM/m/VEkOeMf_BQAJ
src/backend/parser/parse_clause.c
Outdated
pstate->p_target_relation = parserOpenTable(pstate, relation, RowExclusiveLock, NULL);
Oid relid = RangeVarGetRelid(relation, NoLock, true);
We can get the relid from pstate->p_target_relation; there is no need to get it from the RangeVar.
src/backend/utils/cache/plancache.c
Outdated
if (parsetree->commandType == CMD_INSERT)
    lockmode = RowExclusiveLock;
else
    lockmode = ExclusiveLock;
I think the lockmode should also be deduced here.
> I think the lockmode should also be deduced here.

Yeah. We can just use lockmode here. That's easier.
It seems there was consensus to adopt the locking strategy from upstream (lock all leaves): https://groups.google.com/a/greenplum.org/g/gpdb-dev/c/-FELESD8BCc/m/W-wWpb7UAgAJ Is it possible to do the same on 6X_STABLE? Or would such a change be too intrusive for a stable branch?
Hi @asimrp , 6X's behavior is incorrect, which leads to this bug. Without GDD, 6X locks all the leaves for INSERT, DELETE, and UPDATE; that is the same behavior as this commit. Also, the previous fix forgot to modify the related code for cached plans. With GDD, 6X's behavior is very strange:
This commit keeps the same behavior as upstream (also the same as 6X without GDD, but done correctly). The only thing that might need discussion is: with GDD, do we still need to lock all leaves on the QD? I tried to find a bad case (without locking), but no result yet. cc @d
src/backend/parser/parse_clause.c
Outdated
    lockmode = ExclusiveLock;
else
    lockmode = RowExclusiveLock;
if (pstate->p_is_insert && gp_enable_global_deadlock_detector) {
Better to comment more here, e.g. move the comments removed from InitPlan() by this PR to here.
 */
if (rel_is_partitioned(rte->relid) && (parsetree->commandType == CMD_INSERT
        || parsetree->commandType == CMD_UPDATE || parsetree->commandType == CMD_DELETE))
    find_all_inheritors(rte->relid, lockmode, NULL);
Here the behavior is not the same as setTargetTable; we should keep them consistent.
> Here the behavior is not the same as setTargetTable; we should keep them consistent.

You mean adding the conditional statement for GDD, or removing commandType, or both?
Please note the usage of the function rel_is_partitioned. To recognize a partitioned table, see this example:
gpdb/src/backend/commands/indexcmds.c
Lines 2400 to 2409 in 183f886
if (rel_is_partitioned(heapOid))
{
    PartitionNode *pn;
    pn = get_parts(heapOid, 0 /* level */, 0 /* parent */, false /* inctemplate */,
                   true /* includesubparts */);
    prels = all_partition_relids(pn);
}
else if (rel_is_child_partition(heapOid))
    prels = find_all_inheritors(heapOid, NoLock, NULL);
pstate->p_target_relation = parserOpenTable(pstate, relation, RowExclusiveLock, NULL);
Oid relid = pstate->p_target_relation->rd_id;
if (rel_is_partitioned(relid))
rel_is_partitioned only recognizes the top relation of a partitioned table. If the relation is a middle-layer partitioned table, it will return false and we will skip locking its children.
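To illustrate the difference in traversal semantics, here is a hypothetical Python sketch (not the actual C code; the table names and the `children` mapping are made up) of a find_all_inheritors-style walk, which picks up middle-layer partitioned tables as well as leaves:

```python
from collections import deque

def find_all_inheritors(children, root):
    """Breadth-first walk of an inheritance tree: returns the root and
    every descendant, including middle-layer partitioned tables."""
    result, queue = [root], deque([root])
    while queue:
        rel = queue.popleft()
        for child in children.get(rel, ()):
            result.append(child)
            queue.append(child)
    return result

# Made-up three-level hierarchy: root -> middle layer -> leaves.
tree = {
    "sales": ["sales_2021", "sales_2022"],
    "sales_2021": ["sales_2021_q1", "sales_2021_q2"],
}
print(find_all_inheritors(tree, "sales"))
# → ['sales', 'sales_2021', 'sales_2022', 'sales_2021_q1', 'sales_2021_q2']
```

A check that only inspects the top-level parent (as rel_is_partitioned does) would stop at `sales` and never visit `sales_2021`'s children.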
 * If the relation is partitioned, we should acquire corresponding locks
 * of each leaf partition before entering InitPlan.
 */
if (rel_is_partitioned(rte->relid) && ((parsetree->commandType == CMD_INSERT
Same rel_is_partitioned issue here.
And the indentation here looks really confusing.
@@ -0,0 +1,66 @@
create extension if not exists gp_inject_fault;

create table sales_row (id int, date date, amt decimal(10,2))
Please also add tests for the multi-level partitioned table, and operate on the middle-layer partitioned table.
For a multi-level partitioned table, we can't insert/update/delete on a middle-layer table, so we don't need to add it.
Do we know why we can't do it? As I checked, upstream allows insert/update/delete on a middle-layer partitioned table.
I don't know, so tomorrow I will check the reason.
Can you also explain why this problem does not occur with concurrent vacuum and DML operations on a heap partitioned table? Why is it specific to AO?
 * This will cause a deadlock.
 * A similar scenario applies to UPDATE as well.
 * We already get all the locks in the parse-analyze stage,
 * so we don't need to acquire any locks here.
Well, this comment is wrong. Not all locks are acquired at the parse-analyze stage, see line 1742 above. The change introduced by this patch would cause locks to be acquired in InitPlan(), which is invoked during execution. Am I missing something?
> Well, this comment is wrong. Not all locks are acquired at the parse-analyze stage, see line 1742 above. The change introduced by this patch would cause locks to be acquired in InitPlan(), which is invoked during execution. Am I missing something?

Hi @asimrp InitPlan():1742 is not the first time the relation is locked. It just opens the table to get the result relation.
}
else if (pstate->p_is_insert && gp_enable_global_deadlock_detector)
{
    lockmode = NoLock;
This looks tricky. Not acquiring locks and yet accessing the tables sounds very incorrect. Can you please answer the following questions:
- Why is it correct to insert into a child table without acquiring a lock on it?
- What do we gain by not acquiring a lock?
> This looks tricky. Not acquiring locks and yet accessing the tables sounds very incorrect. Can you please answer the following questions:
> - Why is it correct to insert into a child table without acquiring a lock on it?
> - What do we gain by not acquiring a lock?
Hi @asimrp just sharing what is in my mind:
- this is the same behavior as 6X today (without this patch)
- INSERT on the root partition will lock the needed leaves during the executor (on the QEs)

Then for your questions:

> Why is it correct to insert into a child table without acquiring a lock on it?

It will be locked on the QEs, not the QD, which is roughly the same behavior as upstream. I spent a lot of time trying to come up with a bad case, but it seems OK.

> What do we gain by not acquiring a lock?

Maybe performance? See the thread: https://groups.google.com/a/greenplum.org/g/gpdb-dev/c/wAPKpJzhbpM/m/VEkOeMf_BQAJ

For 6X, keeping the same behavior in this patch (but correct) seems the safest way?
 * following deadlock scenario may happen between INSERT and AppendOnly VACUUM
 * drop phase on the partition table:
 *
 * 1. AO VACUUM drop on QD: acquired AccessExclusiveLock
It is important to mention which command operates on a leaf partition and which one on the root in this example. My understanding is that vacuum operates on a leaf, whereas insert operates on the root.
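For illustration, the four-step scenario in the quoted comment forms a cycle in the distributed wait-for graph, which is exactly the kind of cycle a global deadlock detector looks for. A toy Python sketch (the session names and the graph shape are made up to mirror the scenario; this is not GDD code):

```python
def find_cycle(wait_for):
    """Return one cycle in a wait-for graph (dict: node -> list of nodes
    it is waiting on), or None if the graph is deadlock-free."""
    def dfs(node, path, seen):
        if node in path:               # revisited a node on the current path: cycle
            return path[path.index(node):]
        if node in seen:
            return None
        seen.add(node)
        for nxt in wait_for.get(node, ()):
            cycle = dfs(nxt, path + [node], seen)
            if cycle:
                return cycle
        return None
    for start in wait_for:
        cycle = dfs(start, [], set())
        if cycle:
            return cycle
    return None

# Mirroring steps 1-4: the local lock manager on the QD or on the QE each
# sees only half of the cycle, so neither can resolve it alone.
graph = {
    "VACUUM(QE)": ["INSERT(QE)"],  # step 3 waits on step 2's RowExclusiveLock
    "INSERT(QD)": ["VACUUM(QD)"],  # step 4 waits on step 1's AccessExclusiveLock
    "INSERT(QE)": ["INSERT(QD)"],  # same distributed transaction
    "VACUUM(QD)": ["VACUUM(QE)"],  # same distributed transaction
}
print(find_cycle(graph))
```

Locking the partition tables on the QD before step 2 (or enabling GDD) is what breaks this cycle.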
-- session 1 gets locks for sales_row_1_prt_1
-- session 2 will get a snapshot and enter InitPlan, waiting for session 1 to release the locks for sales_row_1_prt_1
-- session 1 completes
-- session 2 gets the locks, but the snapshot is invalid.
A better way to say this: "the version of tuple (in aoseg relation?) created by vacuum is invisible to session2's snapshot". The term "invalid snapshot" does not ring the right bells.
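To make the ordering point concrete, here is a toy Python model (entirely made up, not GPDB code) of why waiting for a lock after the snapshot is taken lets VACUUM commit a newer metadata version that the snapshot cannot see:

```python
class Table:
    """Toy stand-in for a leaf partition's aoseg metadata: tracks only
    the xid of the newest committed version."""
    def __init__(self):
        self.committed_xid = 0

    def vacuum(self, xid):
        self.committed_xid = xid  # VACUUM commits a new metadata version

def run_dml(table, take_snapshot_first):
    if take_snapshot_first:
        snapshot = table.committed_xid  # snapshot taken before waiting...
        table.vacuum(xid=42)            # ...VACUUM commits while we block on its lock
    else:
        table.vacuum(xid=42)            # lock acquired first: VACUUM already finished
        snapshot = table.committed_xid  # snapshot sees the latest version
    # If the snapshot lags the committed state, the DML reads a version of
    # the tuple that VACUUM has already superseded.
    return snapshot == table.committed_xid

print(run_dml(Table(), take_snapshot_first=True))   # → False (stale snapshot)
print(run_dml(Table(), take_snapshot_first=False))  # → True (consistent)
```

This matches the fix's approach: acquire all partition locks before the snapshot is dispatched, so the second ordering always holds.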
LGTM. Please resolve comments ASAP.
👍 to merge this fix as it targets a specific problem. Bigger questions with regards to the locking of partitioned tables with and without GDD enabled remain but they should be dealt with separately.
@bbbearxyz @wraymo before merging, please answer the question on multilevel partitioning raised by Junfeng and also incorporate the specific feedback already given.
The bigger problems are being tracked in #8362.
I think we have already solved the problem in this PR, because we take locks on each leaf partition before InitPlan, following the lock mode of its root partition.
I will get to reviewing this PR tomorrow. Meanwhile, let's add a README file to document the current partition locking semantics, based on the knowledge we have gained as part of this PR.
-- session 2 will get locks on sales_row, but still waits for locks on the leaf partition in the function AcquirePlannerLocks.
2&: execute test;

select pg_sleep(2);
Why do we have this sleep in the test?
We must make sure session 2 arrives at the place where it needs to take locks. If we don't sleep, this situation may happen: session 1 waits in the fault vacuum_hold_lock, session 2 is too slow and doesn't arrive at the lock-taking place before the fault is reset; session 1 completes and then session 2 completes serially, and the test is not valid.
If that's the case, there is no way to guarantee they run concurrently even with sleep, as it is completely non-deterministic and times can vary too much in CI. Can we enhance it and make it deterministic instead, by looping and checking that the required lock-waiting stage has been reached by querying pg_locks?
Suppose this situation: a parent table has two child tables, a and b. When we want to insert into the parent table, we need to acquire the parent lock and the two child locks. But in the test, the vacuum in session 1 already holds a lock on b, and we must make sure session 2 waits at the position where it needs the lock on b. We can't guarantee that: although we can check pg_locks to see that session 2 already holds the lock on a, we still can't make sure session 2 is waiting at the position of taking the lock on b. It is still non-deterministic, although the window is very small. Do you think it is better or not?
I already know how to solve it. Thank you! I will use fault injection to replace the sleep.
I inject a fault at the place where the lock is taken, but we still have the problem we just discussed: we can't make sure session 2 is waiting for the child locks exactly. So I use both fault injection and sleep to keep the non-determinism as small as possible.
The PR states it aligns locking behavior with upstream. That's stated in the context of which locks are acquired. Are we aligned with upstream also in terms of when (parse time vs InitPlan time) the locks are acquired? I think this PR also helps mitigate cases where the actual function execution (deleting the rows of a concurrently vacuumed table) is much delayed after taking the snapshot: SELECT pg_sleep(10) FROM dummy_table UNION ALL SELECT delete_rows();
Is this the case where we get the snapshot, and after ten seconds we delete rows? In that SQL, do you know when we take locks in the function delete_rows()?
@@ -0,0 +1,42 @@
src/backend/storage/lmgr/README-partition
Thank you for starting this README. Is there a way we can represent the information in a tabular format, capturing the different operations (I/U/D/DDLs), GDD enabled, GDD disabled, with ORCA, without ORCA? Clearly flag in the table any deviations from upstream as well.
@asimrp would you be interested in working with @bbbearxyz to help get it into that shape? I think you were also wondering about it with me.
ok, I will do it
Hi Ashwin, can you tell me clearly where we should add the README file?
…ition ao table

The problem is that the error "tuple concurrently updated" occurs when running a DELETE query during ETL, when we concurrently execute an UPDATE/DELETE statement and a VACUUM statement on a partitioned AO table. The UPDATE/DELETE statement is likely to get an out-of-date snapshot after the VACUUM has been processed on one of the leaf partitions. So we fix it by acquiring locks before executing; after we take the locks, we dispatch the snapshot, so this problem cannot happen.

If the relation is partitioned, we should acquire the corresponding locks on each leaf partition before entering InitPlan. The snapshot is acquired before InitPlan, so if we wait for locks inside InitPlan, the snapshot may have become invalid by the time we hold all of them. Hence we must acquire the locks before entering InitPlan to keep the snapshot valid.

In the parse-analyze stage, we should take all the locks we need, and also keep the locks of the parent table and the child tables consistent. Only one situation is special: when the command is INSERT and GDD is off, a deadlock can happen. For example, without locking the partition relations on the QD when INSERTing with Planner, the following deadlock scenario may happen between INSERT and the AppendOnly VACUUM drop phase on the partition table:

1. AO VACUUM drop on QD: acquired AccessExclusiveLock
2. INSERT on QE: acquired RowExclusiveLock
3. AO VACUUM drop on QE: waiting for AccessExclusiveLock
4. INSERT on QD: waiting for AccessShareLock at ExecutorEnd()

2 blocks 3, 1 blocks 4, and 1 and 2 will not release their locks until 3 and 4 proceed. Hence INSERT needs to lock the partition tables on the QD (before step 2) to prevent this deadlock. If GDD is enabled, it will resolve the deadlock, so it is safe not to take locks for the partitions. But for UPDATE and DELETE, we must acquire locks whether GDD is on or not.

So for normal SQL statements, we move the lock-acquiring behavior from InitPlan to setTargetTable; for cached plans, we add code to acquire locks on the child tables.

Co-authored-by: Rui Wang wangru@vmware.com