Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Incorrect cross join results for mpp queries occasionally #50358

Closed
yibin87 opened this issue Jan 12, 2024 · 5 comments
Closed

Incorrect cross join results for mpp queries occasionally #50358

yibin87 opened this issue Jan 12, 2024 · 5 comments

Comments

@yibin87
Copy link
Contributor

yibin87 commented Jan 12, 2024

Bug Report

Please answer these questions before submitting your issue. Thanks!

1. Minimal reproduce step (Required)

create table t0 ( a int , b int);
create table t1( c int);
insert into t0 values(10, 20);
insert into t1 values(10);
alter table t0 set tiflash replica 1;
alter table t1 set tiflash replica 1;
set session tidb_isolation_read_engines='tiflash';
select 5 from t0 join t1;

2. What did you expect to see? (Required)

+---+
| 5 |
+---+
| 5 |
+---+

3. What did you see instead (Required)

Empty result sets.
Note, it is hard to reproduce the issue manually. The reproduce possiblity is rather low.

4. What is your TiDB version? (Required)

@yibin87 yibin87 added the type/bug This issue is a bug. label Jan 12, 2024
@yibin87
Copy link
Contributor Author

yibin87 commented Jan 12, 2024

/assign yibin87

@yibin87
Copy link
Contributor Author

yibin87 commented Jan 12, 2024

The issue is introduced by several steps during optimization:

  1. Part of the initial logicalPlan looks like this: Join(t0, t1, reordered=false, cartesianJoin=true) => Projection(schema: constant_column)
  2. rule_join_reorder (Little chance)
    For cross join node, this rule reorders t0 join t1 to t1 join t0 due to its cost estimation algorithm. And it detects schema changes, and try to add a new projection above join node:
    if schemaChanged {
             if schemaChanged {
			proj := LogicalProjection{
				Exprs: expression.Column2Exprs(originalSchema.Columns),
			}.Init(p.SCtx(), p.QueryBlockOffset())
			// Clone the schema here, because the schema may be changed by column pruning rules.
			proj.SetSchema(originalSchema.Clone())
			proj.SetChildren(p)
			p = proj
		}

Now, the LogicalPlan will have two projections as parent and child: Join(t0, t1, reordered=true, cartesianJoin=false) => Projection1(schema: old_join_schema) => Projection(schema: constant_column)
3. rule_column_pruning
This rule detects that none of the Projection1's output columns are needed by Projection, thus it prunes all the output columns, and changes the LogicalPlan to:
Join(t0, t1) => Projection1(schema: empty_schema) => Projection(schema: constant_column)
4. Generate physical plan:
When init PhysicalTableReader node, its schema will be set to the Projection1's schema, and won't be updated again
https://github.com/pingcap/tidb/blob/17d65b9317e054c80f781680d2412081de6556c6/pkg/planner/core/initialize.go#L484C1-L484C33

         p.schema = p.tablePlan.Schema()
  1. Build MPPGather:
    MppGather node reuses TableReader's schema
    gather := &MPPGather{
         gather := &MPPGather{
		BaseExecutor: exec.NewBaseExecutor(b.ctx, v.Schema(), v.ID()),
		...
	}
  1. Decode mpp response:
    Decode uses MppGather's schema to decode tiflash response, and produces empty result
    https://github.com/pingcap/tidb/blob/17d65b9317e054c80f781680d2412081de6556c6/pkg/distsql/select_result.go#L477C1-L480C1
      if r.respChunkDecoder == nil {
		r.respChunkDecoder = chunk.NewDecoder(
			chunk.NewChunkWithCapacity(r.fieldTypes, 0),
			r.fieldTypes,
		)
	}

@elsa0520
Copy link
Contributor

elsa0520 commented Jan 15, 2024

I can't reproduce this problem. Is it a issue from custom?

@yibin87
Copy link
Contributor Author

yibin87 commented Jan 15, 2024

https://github.com/pingcap/tidb/blob/17d65b9317e054c80f781680d2412081de6556c6/pkg/planner/core/initialize.go#L484C1-L484C33

No, it's from one of tiflash regression tests, and low possibility to reproduce with high concurrency.

@yibin87
Copy link
Contributor Author

yibin87 commented Mar 28, 2024

Fixed in tidb

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

4 participants