Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Rename table may cause duplicate auto_increment assignment, leading to error #46904

Closed
mjonss opened this issue Sep 12, 2023 · 10 comments · Fixed by #47892
Closed

Rename table may cause duplicate auto_increment assignment, leading to error #46904

mjonss opened this issue Sep 12, 2023 · 10 comments · Fixed by #47892
Assignees
Labels
affects-5.3 This bug affects 5.3.x versions. affects-5.4 This bug affects 5.4.x versions. affects-6.1 affects-6.5 affects-7.1 affects-7.5 severity/major sig/sql-infra SIG: SQL Infra type/bug This issue is a bug.

Comments

@mjonss
Copy link
Contributor

mjonss commented Sep 12, 2023

Bug Report

Please answer these questions before submitting your issue. Thanks!

1. Minimal reproduce step (Required)

# Use first client session, tidb1
create schema if not exists test1;
create schema if not exists test2;
use test1;
drop table if exists t1, test2.t2;
CREATE TABLE t1 (a int auto_increment primary key nonclustered, b varchar(255), key (b));
begin;
insert into t1 values (null, "t1 first");
select _tidb_rowid, a, b from t1;

# Switch to a new client (tidb2)
use test1;
rename table test1.t1 to test2.t2; -- will wait for tidb1

# Switch to new client (tidb3)
use test1;
begin;
insert into test2.t2 values (3, "t2 first insert 3")

# Switch back to tidb1
insert into t1 values (null, "t1 second insert, will hang due to autoid collision with tidb3!")

2. What did you expect to see? (Required)

Generating an auto_increment id will always work and not wait on any other session.

3. What did you see instead (Required)

insert into t1 values (null, "...") hangs, due to auto_increment generation?

insert into t1 values (null, "t1 second insert, will hang due to autoid collision with tidb3!")
ERROR 1205 (HY000): Lock wait timeout exceeded; try restarting transaction

4. What is your TiDB version? (Required)

@mjonss mjonss added the type/bug This issue is a bug. label Sep 12, 2023
@mjonss
Copy link
Contributor Author

mjonss commented Sep 12, 2023

I wonder if we should ever change the AutoID DatabaseID + tableID?

By keeping them disconnected from DatabaseID and TableID (just a reference in the TableInfo to the AutoID 'id') it would solve this issue, as well as simplify ADD/REMOVE Partitioning, where the DatabaseID and/or TableID can change and opens up for similar or worse issues. EXCHANGE Partition probably needs extra handling, like double writing to two sets of AutoIDs.

@ti-chi-bot ti-chi-bot bot added may-affects-5.3 This bug maybe affects 5.3.x versions. may-affects-5.4 This bug maybe affects 5.4.x versions. may-affects-6.1 may-affects-6.5 may-affects-7.1 labels Sep 12, 2023
@tiancaiamao
Copy link
Contributor

I think all the behaviour is expected.

# Use first client session, tidb1
create schema if not exists test1;
create schema if not exists test2;
use test1;
drop table if exists t1, test2.t2;
CREATE TABLE t1 (a int auto_increment primary key nonclustered, b varchar(255), key (b));
begin;
insert into t1 values (null, "t1 first");
select _tidb_rowid, a, b from t1;
# Switch to a new client (tidb2)
use test1;
rename table test1.t1 to test2.t2; -- wait for tidb1 is expected, since the DDL blocked by the transaction, by the MDL feature.

Going on well here

# Switch to new client (tidb3)
use test1;
begin;
insert into test2.t2 values (3, "t2 first insert 3")

The last step, transaction conflict between tidb1 and tidb3, blocking behaviour is expected by the pessimistic lock.
It's not hang, just that it's waiting until pessimistic lock timeout.

# Switch back to tidb1
insert into t1 values (null, "t1 second insert, will hang due to autoid collision with tidb3!")

test2.t2 and test1.t1 are the same table, although the meta info differs.
When tidb3 insert values (3, "t2 first insert 3"), it write pessimistic locks on key 7480000000000000a95f698000000000000002038000000000000003 (primary key a)

The key-value format for this operation is
t_{table id=169}_i_{index id=2}_{indexed value=3}   ===>    {rowid=30001}  (the index record)
t_{table id=169}_r_{rowid=30001}       ===>    {a=3, b="t2 first insert 3"}     (the row record)

When tidb1 insert values (null, "t1 second insert, will hang due to autoid collision with tidb3!")
This will also write pessimistic lock on because the auto increment value for null happens to be 3, thus the txn conflict.
7480000000000000a95f698000000000000002038000000000000003

@mjonss

@tiancaiamao
Copy link
Contributor

tidb1 and tidb3 both writing the a=3 record ... and there is a unique index (primary key) for that, thus transaction conflict on it.

@mjonss
Copy link
Contributor Author

mjonss commented Sep 13, 2023

@tiancaiamao the issue here is not the conflict inserting a = 3 twice, it is that tidb3 uses a different AutoID, disconnected from tidb1. Without the rename (or only rename the table, not also changing its database/schema) tidb3 would update the AutoID and it would be seen by tidb1 (even if both are still in active, non-committed transactions), resulting in tidb1 would not generate the auto_increment id 3, but 5 instead and not resulting in a conflict.

I would expect that AutoID is handled for each statement (not spanning client transactions).

@mjonss
Copy link
Contributor Author

mjonss commented Sep 13, 2023

I believe the issue is that the databaseID (together with tableID) is used to identify the AutoID, see here.

If it was only tableID (or even better, an own ID, that would be referenced/stored in the TableInfo) this issue would be solved.

The reason for not reusing databaseID or tableID, but instead an own ID, would be that when a logical table needs to change its tableID (like when changing the primary key or moving between partitioned and non-partitioned table) it could keep the same AutoID.

@tiancaiamao
Copy link
Contributor

the issue here is not the conflict inserting a = 3 twice, it is that tidb3 uses a different AutoID, disconnected from tidb1.

I debug the code and see they both insert a=3 ... and then transaction conflict.

insert into test2.t2 values (3, "t2 first insert 3")       --- tidb3
insert into t1 values (null, "t1 second insert, will hang due to autoid collision with tidb3!")     --- tidb1

What I mean is the last step of this bug ... not the trigger of it.
You can workaround the bug if you can somehow make tidb1 not inserting (null = 3), anything with AutoID in the rename table operation.

I believe the issue is that the databaseID (together with tableID) is used to identify the AutoID, see here.

Yes, it's databaseID + tableID ==> AutoID
So when if rename involving databaseID changes, and if a new AutoID is not generated, then the AutoID of the new table is not correct. That could be a bug. @mjonss

@tiancaiamao
Copy link
Contributor

Follow the reproduce steps, here

# Switch to new client (tidb3)
use test1;
begin;
insert into test2.t2 values (3, "t2 first insert 3")

You can change 3 to 5 and see what happens, there should be no "hang". @mjonss

insert into test2.t2 values (5, "t2 first insert 5")

@mjonss mjonss changed the title Rename table may cause hang in concurrent auto_increment assignment Rename table may cause duplicate auto_increment assignment, leading to error Sep 13, 2023
@mjonss
Copy link
Contributor Author

mjonss commented Sep 13, 2023

@tiancaiamao I would expect that any insert into t values (null, ...) would generate a new non-conflicting auto_increment (or _tidb_rowid for nonclustered tables). That expectations is broken by rename table t to newdb.t as shown in the test case.

I think the issue here is that when inserting in tidb3 it will use a new AutoID, while tidb1 is still using the old AutoID, resulting in tidb3 does not update the AutoID used in tidb1, which leads to the duplicated key. Auto_increment should always generate a new non-conflicting auto_increment value (1)

Here is another test case, basically only using auto-assigned auto_increment, which will also give unexpected 'ERROR 1146 (42S02): table doesn't exist':

# Use first client session, tidb1
create schema if not exists test1;
create schema if not exists test2;
use test1;
drop table if exists t1, test2.t2;
create table t1 (a int auto_increment primary key nonclustered, b varchar(255), key (b));
begin;
insert into t1 values (null, "t1 first null");
select _tidb_rowid, a, b from t1;

# Switch to a new client (tidb2)
use test1;
rename table test1.t1 to test2.t2; -- will wait for tidb1

# Switch to new client (tidb3)
use test1;
begin;
insert into test2.t2 values (null, "t2 first insert null");
select _tidb_rowid, a, b from test2.t2;

# Switch back to tidb1
insert into t1 values (29996, "t1 explicit 29996"); -- instead of generating 30k inserts with null
insert into t1 values (null, "t1 second null");
insert into t1 values (null, "t1 third null"); -- will give: ERROR 1146 (42S02): table doesn't exist. Due to AutoID does no-longer exists?

@tiancaiamao
Copy link
Contributor

> I think the issue here is that when inserting in tidb3 it will use a new AutoID, while tidb1 is still using the old AutoID, resulting in tidb3 does not update the AutoID used in tidb1, which leads to the duplicated key.

Exactly!

@tiancaiamao
Copy link
Contributor

# Switch back to tidb1
insert into t1 values (29996, "t1 explicit 29996"); -- instead of generating 30k inserts with null
insert into t1 values (null, "t1 second null");
insert into t1 values (null, "t1 third null"); -- will give: ERROR 1146 (42S02): table doesn't exist. Due to AutoID does no-longer exists?

In the "rename table" DDL operation, the old auto id meta key had been deleted and the new auto id meta key was generated.
Hence the auto id allocation of tidb1 is from the memory cache.
When tidb1 insert into t1 values (29996, "t1 explicit 29996") use up the id in memory, it trigger a new autoid allocation, and found the meta key do not exist.
Here is the error stack:

github.com/pingcap/errors.AddStack
	/home/genius/go/pkg/mod/github.com/pingcap/errors@v0.11.5-0.20221009092201-b66cddb77c32/errors.go:174
github.com/pingcap/errors.(*Error).GenWithStack
	/home/genius/go/pkg/mod/github.com/pingcap/errors@v0.11.5-0.20221009092201-b66cddb77c32/normalize.go:155
github.com/pingcap/tidb/meta.(*Meta).checkTableExists
	/home/genius/project/src/github.com/pingcap/tidb/meta/meta.go:539
github.com/pingcap/tidb/meta.(*autoIDAccessor).Inc
	/home/genius/project/src/github.com/pingcap/tidb/meta/meta_autoid.go:63
github.com/pingcap/tidb/meta/autoid.(*allocator).alloc4Signed.func2
	/home/genius/project/src/github.com/pingcap/tidb/meta/autoid/autoid.go:934
github.com/pingcap/tidb/kv.RunInNewTxn
	/home/genius/project/src/github.com/pingcap/tidb/kv/txn.go:132
github.com/pingcap/tidb/meta/autoid.(*allocator).alloc4Signed
	/home/genius/project/src/github.com/pingcap/tidb/meta/autoid/autoid.go:911
github.com/pingcap/tidb/meta/autoid.(*allocator).Alloc
	/home/genius/project/src/github.com/pingcap/tidb/meta/autoid/autoid.go:726
github.com/pingcap/tidb/table/tables.allocHandleIDs
	/home/genius/project/src/github.com/pingcap/tidb/table/tables/tables.go:1677
github.com/pingcap/tidb/table/tables.(*TableCommon).AddRecord
	/home/genius/project/src/github.com/pingcap/tidb/table/tables/tables.go:867
github.com/pingcap/tidb/executor.(*InsertValues).addRecordWithAutoIDHint
	/home/genius/project/src/github.com/pingcap/tidb/executor/insert_common.go:1408
github.com/pingcap/tidb/executor.(*InsertExec).exec
	/home/genius/project/src/github.com/pingcap/tidb/executor/insert.go:104
github.com/pingcap/tidb/executor.insertRows
	/home/genius/project/src/github.com/pingcap/tidb/executor/insert_common.go:253
github.com/pingcap/tidb/executor.(*InsertExec).Next
	/home/genius/project/src/github.com/pingcap/tidb/executor/insert.go:307
github.com/pingcap/tidb/executor/internal/exec.Next
	/home/genius/project/src/github.com/pingcap/tidb/executor/internal/exec/executor.go:283
github.com/pingcap/tidb/executor.(*ExecStmt).next
	/home/genius/project/src/github.com/pingcap/tidb/executor/adapter.go:1223
github.com/pingcap/tidb/executor.(*ExecStmt).handleNoDelayExecutor
	/home/genius/project/src/github.com/pingcap/tidb/executor/adapter.go:968
github.com/pingcap/tidb/executor.(*ExecStmt).handlePessimisticDML
	/home/genius/project/src/github.com/pingcap/tidb/executor/adapter.go:1029
github.com/pingcap/tidb/executor.(*ExecStmt).handleNoDelay
	/home/genius/project/src/github.com/pingcap/tidb/executor/adapter.go:791
github.com/pingcap/tidb/executor.(*ExecStmt).Exec
	/home/genius/project/src/github.com/pingcap/tidb/executor/adapter.go:575
github.com/pingcap/tidb/session.runStmt
...

Maybe a possible fix for "rename t1 to t2" is to make t2 not visible until the blocking transaction finish.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
affects-5.3 This bug affects 5.3.x versions. affects-5.4 This bug affects 5.4.x versions. affects-6.1 affects-6.5 affects-7.1 affects-7.5 severity/major sig/sql-infra SIG: SQL Infra type/bug This issue is a bug.
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants