Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix for motion hazard between subplan and outer rel in a join was incomplete #8677

Closed
hlinnaka opened this issue Sep 21, 2019 · 4 comments
Closed
Assignees

Comments

@hlinnaka
Copy link
Member

@hlinnaka hlinnaka commented Sep 21, 2019

#7492 fixed a deadlock, if a join qual contains a SubPlan with a Motion. It fixed the deadlock by forcing any SubPlans in the join qual to be fully fetched, before reading the first tuple from the outer relation. To force the SubPlans to be fetched, it constructed a fake all-NULLs outer tuple, and evaluated the join qual with that. However, that method is not reliable. Consider this query:

select count(*) from t_inner right join t_outer on t_inner.c2=t_outer.c2
   and (t_inner.c1 is null or not exists (select 0 from t_subplan where t_subplan.c2=t_outer.c1));

This is the same query that the PR added to the regression suite in 'deadlock2.sql', but with the extra "t_inner.c1 is null or ..." clause. Because of that extra clause, evaluating the join qual with an all-NULLs tuple doesn't cause the subplan to be evaluated, because the "t_inner.c1 is null" condition is true and the qual evaluates to true without evaluating the subplan. So the query deadlocks.

@hlinnaka

This comment has been minimized.

Copy link
Member Author

@hlinnaka hlinnaka commented Sep 21, 2019

@pivotal-ning-yu, I was looking into this, because I noticed that the ShouldPrefetchJoinQual() is the only place where we use the estate->currentSliceIdInPlan field. I'd like to remove that field, because it seems pretty useless (prototype at https://github.com/hlinnaka/gpdb/tree/remove-unused-slice-id-tracking). I think ShouldPrefetchJoinQual() would work just as well (or poorly, given this bug :-) ) if it didn't check estate->currentSliceIdInPlan, it would just sometimes prefetch quals that it didn't necessarily need to.

@pivotal-ning-yu

This comment has been minimized.

Copy link
Member

@pivotal-ning-yu pivotal-ning-yu commented Sep 23, 2019

Nice catch, we'll work on this. @gaos1

@pivotal-ning-yu

This comment has been minimized.

Copy link
Member

@pivotal-ning-yu pivotal-ning-yu commented Sep 23, 2019

To be honest, I can't recall why ShouldPrefetchJoinQual() was implemented that way, that didn't do the real check. So feel free to remove the currentSliceIdInPlan` checking in that function, I'll check if we could add the real check later.

@gaos1 gaos1 self-assigned this Sep 30, 2019
@kainwen kainwen mentioned this issue Feb 25, 2020
0 of 5 tasks complete
kainwen added a commit that referenced this issue Feb 27, 2020
We introduces prefetch joinqual logic in commit fa762b and
later refactoed it in commit 36a93ba. But the logic before does
not handle things correct. Previously we finally check this until
ExecInitXXX and forgot to check if there is joinqual.

This commit fixes the issue for planner. It determines the bool
field `prefetch_joinqual` during planning stage.  The logic is
first set the field to show the risk in create_xxxjoin_plan, and then
in create_join_plan after we get the plan, if there is risk then we
loop each joinqual to see if there is motion in it (The only case
is the joinqual is a SubPlan).

Also this commit improves the logic of ExecPrefetchJoinQual to
solve the github issue:  #8677
It fixes this by flattening the bool expr of the joinqual and force each
of them to be Prefetched.
@kainwen

This comment has been minimized.

Copy link
Member

@kainwen kainwen commented Feb 27, 2020

Fix by 82dd416

Will consider backporting.

But for ORCA, I think we do not handle it correct.

kainwen added a commit to kainwen/gpdb that referenced this issue Mar 10, 2020
We introduces prefetch joinqual logic in commit fa762b and
later refactoed it in commit 36a93ba. But the logic before does
not handle things correct. Previously we finally check this until
ExecInitXXX and forgot to check if there is joinqual.

This commit fixes the issue for planner. It determines the bool
field `prefetch_joinqual` during planning stage.  The logic is
first set the field to show the risk in create_xxxjoin_plan, and then
in create_join_plan after we get the plan, if there is risk then we
loop each joinqual to see if there is motion in it (The only case
is the joinqual is a SubPlan). In 6X we add motions for SubPlan
after we have gotten the plan. So at the time of this function calling,
we do not know if the SubPlan contains motion. We might walk the plannedstmt
to accurately determine this, but that makes the code not that clean.
So, we adopt an over kill method here: any SubPlan contains motion. At least,
this can avoid motion deadlock.

Also this commit improves the logic of ExecPrefetchJoinQual to
solve the github issue:  greenplum-db#8677
It fixes this by flattening the bool expr of the joinqual and force each
of them to be Prefetched.
kainwen added a commit that referenced this issue Mar 10, 2020
We introduces prefetch joinqual logic in commit fa762b and
later refactoed it in commit 36a93ba. But the logic before does
not handle things correct. Previously we finally check this until
ExecInitXXX and forgot to check if there is joinqual.

This commit fixes the issue for planner. It determines the bool
field `prefetch_joinqual` during planning stage.  The logic is
first set the field to show the risk in create_xxxjoin_plan, and then
in create_join_plan after we get the plan, if there is risk then we
loop each joinqual to see if there is motion in it (The only case
is the joinqual is a SubPlan). In 6X we add motions for SubPlan
after we have gotten the plan. So at the time of this function calling,
we do not know if the SubPlan contains motion. We might walk the plannedstmt
to accurately determine this, but that makes the code not that clean.
So, we adopt an over kill method here: any SubPlan contains motion. At least,
this can avoid motion deadlock.

Also this commit improves the logic of ExecPrefetchJoinQual to
solve the github issue:  #8677
It fixes this by flattening the bool expr of the joinqual and force each
of them to be Prefetched.
kainwen added a commit to kainwen/gpdb that referenced this issue Mar 10, 2020
We introduces prefetch joinqual logic in commit fa762b and
later refactoed it in commit 36a93ba. But the logic before does
not handle things correct. Previously we finally check this until
ExecInitXXX and forgot to check if there is joinqual.

This commit fixes the issue for planner. It determines the bool
field `prefetch_joinqual` during planning stage.  The logic is
first set the field to show the risk in create_xxxjoin_plan, and then
in create_join_plan after we get the plan, if there is risk then we
loop each joinqual to see if there is motion in it (The only case
is the joinqual is a SubPlan). In 5X we add motions for SubPlan
after we have gotten the plan. So at the time of this function calling,
we do not know if the SubPlan contains motion. We might walk the plannedstmt
to accurately determine this, but that makes the code not that clean.
So, we adopt an over kill method here: any SubPlan contains motion. At least,
this can avoid motion deadlock.

Also this commit improves the logic of ExecPrefetchJoinQual to
solve the github issue:  greenplum-db#8677
It fixes this by flattening the bool expr of the joinqual and force each
of them to be Prefetched.
kainwen added a commit that referenced this issue Mar 11, 2020
We introduces prefetch joinqual logic in commit fa762b and
later refactoed it in commit 36a93ba. But the logic before does
not handle things correct. Previously we finally check this until
ExecInitXXX and forgot to check if there is joinqual.

This commit fixes the issue for planner. It determines the bool
field `prefetch_joinqual` during planning stage.  The logic is
first set the field to show the risk in create_xxxjoin_plan, and then
in create_join_plan after we get the plan, if there is risk then we
loop each joinqual to see if there is motion in it (The only case
is the joinqual is a SubPlan). In 5X we add motions for SubPlan
after we have gotten the plan. So at the time of this function calling,
we do not know if the SubPlan contains motion. We might walk the plannedstmt
to accurately determine this, but that makes the code not that clean.
So, we adopt an over kill method here: any SubPlan contains motion. At least,
this can avoid motion deadlock.

Also this commit improves the logic of ExecPrefetchJoinQual to
solve the github issue:  #8677
It fixes this by flattening the bool expr of the joinqual and force each
of them to be Prefetched.
@kainwen kainwen closed this Mar 11, 2020
weinan003 added a commit to weinan003/gpdb that referenced this issue Mar 15, 2020
We introduces prefetch joinqual logic in commit fa762b and
later refactoed it in commit 36a93ba. But the logic before does
not handle things correct. Previously we finally check this until
ExecInitXXX and forgot to check if there is joinqual.

This commit fixes the issue for planner. It determines the bool
field `prefetch_joinqual` during planning stage.  The logic is
first set the field to show the risk in create_xxxjoin_plan, and then
in create_join_plan after we get the plan, if there is risk then we
loop each joinqual to see if there is motion in it (The only case
is the joinqual is a SubPlan).

Also this commit improves the logic of ExecPrefetchJoinQual to
solve the github issue:  greenplum-db#8677
It fixes this by flattening the bool expr of the joinqual and force each
of them to be Prefetched.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Linked pull requests

Successfully merging a pull request may close this issue.

None yet
4 participants
You can’t perform that action at this time.