Error Thread 3 issue restoring [...]: Table 'tablename' already exists while restoring tables #469
Hi @64kramsystem, |
Hello! I reviewed the issue today. It seems to be harmless (although I can't confirm this; see the considerations below), but it's very confusing, as the table does exist after the failure, and it contains records. To get a better idea, I truncated the table without dropping it; the load then inserts all the records. I can't confirm that the problem is harmless because the table has only 8 records, and I don't know what happens with larger datasets. I've further isolated the conditions under which the problem happens; they are:
I'm going to add some further debug information shortly. |
I'm examining the myloader source ( In See the overwrite invocation here:
I don't know if it's intentional or not, but if all errors happening on (I'm not familiar with the code, so there's a chance I may be misunderstanding the logic). |
Hi @64kramsystem, I haven't checked the line that you mention yet, but I wanted to let you know that I'm already working on better handling of those errors, as it is very difficult to follow which thread is failing. You can check #393 and test that branch. Please let me know |
I may have found a pointer. If I force error printing after table dropping, by applying this diff to the current master:
diff --git i/myloader.c w/myloader.c
index 1d4d947..725e5c3 100644
--- i/myloader.c
+++ w/myloader.c
@@ -1084,8 +1084,10 @@ void process_restore_job(MYSQL *thrconn,struct restore_job *rj, int thread_id, i
g_message("Thread %d restoring table `%s`.`%s` from %s", thread_id,
dbt->real_database, dbt->real_table, rj->filename);
int truncate_or_delete_failed=0;
- if (overwrite_tables)
+ if (overwrite_tables) {
truncate_or_delete_failed=overwrite_table(thrconn,thread_id,dbt->real_database, dbt->real_table);
+ g_critical("Thread %d FORCED MESSAGE %s: %s",thread_id,dbt->real_table, mysql_error(thrconn));
+ }
if ((purge_mode == TRUNCATE || purge_mode == DELETE) && !truncate_or_delete_failed){
g_message("Skipping table creation `%s`.`%s` from %s", dbt->real_database, dbt->real_table, rj->filename);
}else{
The output shows (filtered for the problematic table):
I wonder if, for some reason, a |
Thanks. I will debug also using the branch mentioned 🙂. |
Latest finding, based on an updated diff (which actually explains the
diff --git i/myloader.c w/myloader.c
index 1d4d947..2988756 100644
--- i/myloader.c
+++ w/myloader.c
@@ -966,9 +966,9 @@ int overwrite_table(MYSQL *conn,int thread_id,gchar * database, gchar * table){
query = g_strdup_printf("DROP TABLE IF EXISTS `%s`.`%s`",
database, table);
mysql_query(conn, query);
- query = g_strdup_printf("DROP VIEW IF EXISTS `%s`.`%s`", database,
- table);
- mysql_query(conn, query);
+ // query = g_strdup_printf("DROP VIEW IF EXISTS `%s`.`%s`", database,
+ // table);
+ // mysql_query(conn, query);
} else if (purge_mode == TRUNCATE) {
g_message("Truncating table `%s`.`%s`", database, table);
query= g_strdup_printf("TRUNCATE TABLE `%s`.`%s`", database, table);
@@ -1084,8 +1084,10 @@ void process_restore_job(MYSQL *thrconn,struct restore_job *rj, int thread_id, i
g_message("Thread %d restoring table `%s`.`%s` from %s", thread_id,
dbt->real_database, dbt->real_table, rj->filename);
int truncate_or_delete_failed=0;
- if (overwrite_tables)
+ if (overwrite_tables) {
truncate_or_delete_failed=overwrite_table(thrconn,thread_id,dbt->real_database, dbt->real_table);
+ g_critical("Thread %d FORCED MESSAGE %s: %s",thread_id,dbt->real_table, mysql_error(thrconn));
+ }
if ((purge_mode == TRUNCATE || purge_mode == DELETE) && !truncate_or_delete_failed){
g_message("Skipping table creation `%s`.`%s` from %s", dbt->real_database, dbt->real_table, rj->filename);
}else{
The log is now (filtered):
My educated guess is that it's the following race condition:
This conflicts with the actual timings, but the log timing may be asynchronous, so a few ms of difference may be ignored. This is an educated guess; I could be wrong. I'll follow up later in the week. |
Can you test with just 1 thread? -t 1. Can you confirm that you are only running myloader when you have this error? |
The error doesn't happen with 1 or 2 threads. Tested around 7/8 times in total. |
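To check the thread-count dependence reported above in a repeatable way, a small sweep can run the same restore with several thread counts and report which ones print the error. This is only a sketch: `sweep_threads` is a hypothetical helper, and the callback it receives is assumed to wrap a myloader call that passes its single argument to `--threads`.

```shell
# Hypothetical helper: run a restore callback with 1..4 threads and report
# whether its output mentions "already exists".
# $1: name of a command or function taking the thread count as its argument.
sweep_threads() {
  for sweep_n in 1 2 3 4; do
    if "$1" "$sweep_n" 2>&1 | grep -q "already exists"; then
      echo "$sweep_n threads: error"
    else
      echo "$sweep_n threads: ok"
    fi
  done
}
```

Because the problem is timing-dependent, a single "ok" for a given thread count is weak evidence; repeating the sweep a few times gives a more reliable picture.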
@64kramsystem why don't you test with G_MESSAGES_DEBUG=all? It is possible that the code you are running has some DEBUG messages that might be useful. |
So, I dug a bit. I also used, as referred, #476, however, it doesn't help. Also using G_MESSAGES_DEBUG didn't show anything that I could use. Now, this is an issue inside an issue :) The error observed, As of now, I don't know what causes the deadlock. I've tried to remove almost everything from The obvious guess is that there's something sneaky that, before the table drop(s) in I've spent a considerable time, so I'm stopping here. As a conclusion, I propose to rewrite Mydumper... in Perl 😂 |
Hi @64kramsystem, can you share the table structure? Do you have foreign keys in other tables that point to this table? |
Hello! I'll prepare, over this week, an anonymized, minimal test case, and post it. |
Hi there! I've diagnosed the issue more precisely, and prepared the simplest dataset I could. The issue is caused by a combination of two factors:
In order to reproduce, unzip the dataset to a directory, and execute:
function restore {
  # add credentials and connection options
  /path/to/myloader \
    --socket /tmp/mysql.sock \
    --directory /path/to/restore_files \
    --verbose 3 \
    --innodb-optimize-keys \
    --threads 3 \
    --overwrite-tables \
    2>&1 | grep already
}
mysql -e 'drop schema if exists mydb; create schema mydb'; while true; do restore; echo -n .; done
The message:
will be printed until the script is stopped. Sometimes the error message will refer to the table
Dataset: myloader_issue.zip |
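Since the loop above runs until it is interrupted, a bounded variant may be handier when comparing builds or options. The following is a sketch under assumptions, not part of the original report: `count_failing_runs` is a hypothetical helper that runs any restore command a fixed number of times and counts the runs whose output mentions "already exists".

```shell
# Hypothetical helper: run a command N times and count the runs whose
# combined stdout/stderr contains "already exists".
# $1: number of runs; remaining arguments: the command to run.
count_failing_runs() {
  cfr_runs=$1
  shift
  cfr_failures=0
  cfr_i=0
  while [ "$cfr_i" -lt "$cfr_runs" ]; do
    if "$@" 2>&1 | grep -q "already exists"; then
      cfr_failures=$((cfr_failures + 1))
    fi
    cfr_i=$((cfr_i + 1))
  done
  echo "$cfr_failures"
}
```

With the `restore` function from the reproduction script defined, `count_failing_runs 20 restore` would print how many of 20 runs hit the error.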
Hi @64kramsystem, |
Hi @64kramsystem, in what database version are you trying to restore? |
|
Oh! good that I asked, as this issue is not happening in 5.7. |
OK, this issue can be reproduced only in MySQL/Percona Server 8; it does not happen in 5.7.
Session 2:
Statements with *1 and *2 are executed at the exact same time.
on *1 and will succeed on *2.
on all the cases. |
@64kramsystem, Ah, another interesting thing: I was not able to see the "deadlock found" error in the InnoDB monitor output, so it is a deadlock that is not reported as a deadlock... 🤔 |
Wow. I didn't think at all it was a MySQL issue. 😅 Thanks for following up 😄 |
@64kramsystem, it looks like they don't want to fix it; actually, they didn't acknowledge the problem. Now that we know what is causing the issue and have a test case, the solution is to create the tables with a single thread, but I would like to add an option to force using just one thread. |
@64kramsystem |
He! I was just now reading the manuals as well - the below were my observations.
Ok, well, the MySQL bug tracker issue is dead and buried now. Deadlocks are overrated, anyway 😂 At least they gave a hint, which allows understanding the nature of problem. I think that this is the key:
This specific paragraph, from the [8.0 MDL page](https://dev.mysql.com/doc/refman/8.0/en/metadata-locking.html), was indeed not present in the 5.7 version. |
It does make sense, ultimately. If DDLs/LOCK TABLES acquire locks over related tables, with one thread starting from the parent and another thread starting from the child, then this is a typical deadlock. It is consistent with the hypothesis that the operations putting MDLs don't honor the
one may infer that it's intentional. I agree with the single thread solution, which I think is the most solid. Using Thanks for the digging 🙂 |
I was thinking about (other) options to implement this workaround:
@64kramsystem what do you think about 3. ?? |
I'm personally not a big fan of basing workflows on error states 😁 However, I also recognize that, of all the options, this is the only one that maintains concurrency as it is now, so I'm not against it (it really depends on how valid one considers the tradeoff). Of the options, my favourite is 2 (although I suppose it is technically more complex than 1). If speed is a concern for some users (as the DROPs will execute sequentially) and there is room for a new option, an approach I find interesting is to add a database drop option, which is the "as-fast-as-it-gets" way to drop the tables; it's also not unusual, as mysqldump/mysqlpump have an option to add a statement that drops the destination database. I'd use it all the time, at least! 😆 |
@64kramsystem I just realized that you can use --purge-mode=TRUNCATE to avoid the issue and get the expected behavior. Unless the table structure is not the same... 🤔 |
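A sketch of that workaround, reusing only flags that already appear in this thread (`--directory`, `--threads`, `--overwrite-tables`, `--purge-mode=TRUNCATE`). The `MYLOADER` variable and the function name are hypothetical conveniences, and credentials and connection options are left out. It assumes the tables already exist with the same structure as in the dump, as noted above.

```shell
# Hypothetical wrapper: restore by truncating existing tables instead of
# dropping and recreating them.
# $1: directory containing the dump files.
# MYLOADER may be set to the path of the myloader binary (assumed variable).
restore_with_truncate() {
  "${MYLOADER:-myloader}" \
    --directory "$1" \
    --threads 3 \
    --overwrite-tables \
    --purge-mode=TRUNCATE
}
```

For example, `MYLOADER=/path/to/myloader restore_with_truncate /path/to/restore_files` would run the restore from the reproduction script in truncate mode.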
OK... from my analysis I can tell that the only REAL solution is 1. Let me explain why I'm not considering 2 and 3 as solutions. As this only happens if there are foreign keys, it might be good to apply it only to parent and child tables. However, in the child table structure, there is no info that it is referenced by another table. That is why the single-threaded creation mode is all or nothing. But it will be optional, so users who are not using foreign keys will still be able to create tables in parallel. |
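The idea of creating all tables sequentially and only then loading data in parallel can be sketched outside myloader as a two-phase loop. This is an illustration of the approach, not how myloader implements it: `two_phase_restore` is hypothetical, and it assumes table definitions live in files ending in `-schema.sql` and everything else ending in `.sql` is data, which may not match every dump layout.

```shell
# Hypothetical two-phase restore.
# $1: dump directory; remaining arguments: a command that loads one file
# (it receives the file path as its last argument).
two_phase_restore() {
  tp_dir=$1
  shift
  # Phase 1: table definitions, strictly one at a time, so no two DDL
  # statements on related tables can run concurrently.
  for tp_file in "$tp_dir"/*-schema.sql; do
    [ -e "$tp_file" ] || continue
    "$@" "$tp_file" || return 1
  done
  # Phase 2: all remaining .sql files, started in the background and
  # awaited together.
  for tp_file in "$tp_dir"/*.sql; do
    [ -e "$tp_file" ] || continue
    case $tp_file in
      *-schema.sql) continue ;;
    esac
    "$@" "$tp_file" &
  done
  wait
}
```

The tradeoff is the one discussed above: table creation loses its parallelism, while data loading keeps it.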
Excellent, thanks for the fix! I've also realized that its implementation is simpler than I imagined 👍 Thanks again 😄 |
Describe the bug
While restoring tables, the error "Table 'table1' already exists" is reported.
To Reproduce
This seems to happen when restoring multiple tables. In my case, I've tried with 1 table and 3 tables; got no error with 1 table, and the error each time with 3 tables. Since this is likely a non-deterministic problem (due to threading), it may be hard to reproduce or pinpoint the necessary conditions.
Command executed:
Expected behavior
No errors should be reported.
Log
The log has been anonymized; however, the table name mapping has been kept:
Backup
Should not be relevant.
Environment (please complete the following information):
regex
branch (commit 0b14ec3)
Additional context
MyDumper rocks! 😃