Sequence, ANSI, ORACLE and stability fixes #1355

midenok · 2023-11-21T11:18:35Z

See fix descriptions in commit messages. Sequence fix contans part for dir mode. Stream mode should be fixed as a separate task.

Duplicated by #1341, #1334

Note: there is some setup problem with 2 bionic buildbots.

Example of myloader trace Thread descriptor is added only for --debug option

davidducos · 2023-11-21T14:29:58Z

.circleci/config.yml

@@ -974,18 +974,18 @@ commands:
  prepare_ubuntu_percona57:
    steps:
    - run: sudo percona-release setup -y ps57
-    - run: sudo apt-get install -y libperconaserverclient20 percona-server-client-5.7 libperconaserverclient20-dev
+    - run: sudo apt-get install -y gdb libperconaserverclient20 percona-server-client-5.7 libperconaserverclient20-dev


You should not edit the .circleci/config.yml file as it is generated by .circleci/circleci_config_builder.sh

I changed circleci_config_builder.sh

davidducos · 2023-11-21T14:34:20Z

src/myloader.c

@@ -399,7 +404,7 @@ int main(int argc, char *argv[]) {

  struct thread_data t;
  t.thread_id = 0;
-  t.conf = &conf;
+  t.conf = &conf; // TODO: if conf is singleton it must be accessed as global variable


On myloader yes! I'm agree with you on this. However, on mydumper it is not.

This is myloader code. Ok for mydumper.

CREATE TABLE fails if table referencing sequence is restored before sequence. This can happen because sequence restore happens at the very late stage of view restore. To fix this we move sequences restore in between database and table schema restore. We enqueue sequences to buffer queue database::sequence_queue like it is done for tables with database::queue. And we requeue sequence_queue before tables queue in set_db_schema_state_to_created(). We trigger reque like for tables: after database schema was created. When all sequences processed (sequences_processed == sequences) we requeue tables same way in set_db_schema_state_to_created(). There are 2 stages in INTERMEDIATE_ENDED. When first stage entered we wait until all sequences processed and then initiate for all dbs set_db_schema_state_to_created() which requeues their tables into creating queue. INTERMEDIATE_ENDED does loop over all databases. Though conf is the same in all td, according to design set_db_schema_created should access correct td, so we saved it into database struct. Bug description: https://jira.mariadb.org/browse/CONTRIB-25 Fixes mydumper#1338 Sporadic db switch failed fix Race condition: INTERMEDIATE_ENDED may be reached before db schema has been created. INTERMEDIATE_ENDED did set_db_schema_state_to_created() which allowed table schema thread to proceed and it failed because db schema was not yet created. We fix this by waiting INTERMEDIATE_ENDED until each db schema is created. INTERMEDIATE_ENDED does not do requeue now like it did with set_db_schema_state_to_created(), SCHEMA_CREATE does this as it was originally meant to do and signals to INTERMEDIATE_ENDED on cond variable.

Typo fix: datalength index is 5.

There are a lot of useless g_strdup() still, presumably overwriting each others' pointers.

Proper tracing includes thread identifiers borrowed from pthread_t pointers (crafted thread id is not available in every function). Essential for race condition hunting!

INTERMEDIATE_ENDED didn't send JOB_SHUTDOWN properly. It was meant to send JOB_SHUTDOWN to all schema threads, but it turned out it sends all JOB_SHUTDOWN to himself: td was wrongly used. Now we iterate td in the loop across schema_td array.

control_job_thread() initial threads_waiting wrong 0. If INTERMEDIATE_ENDED goes really fast, we call wake_threads_waiting() with 0 which leads to myloader hang.

Related to 14510e3 and https://jira.mariadb.org/browse/CONTRIB-15 g_strsplit() by quote character did wrong if there is quote character inside identifier itself. real_table was made empty string in this case.

Removed --identifier-quote-character from mydumper, it is now auto-detectected from ANSI or ANSI_QUOTES mode. myloader auto-detects --identifier-quote-character from metadata first matching group name or it may be forced by command-line option. Bug description: https://jira.mariadb.org/browse/CONTRIB-32 Detect quote character v2 (without any metadata changes)

For DROP TABLE we have to set FOREIGN_KEY_CHECKS=0 as well

Like in mysqldump iterate through routine_type to get FUNCTION, PROCEDURE, PACKAGE and PACKAGE BODY. Bug description: https://jira.mariadb.org/browse/CONTRIB-26

NO_TABLE_OPTIONS, NO_TABLE_OPTIONS, NO_FIELD_OPTIONS are dangerous options for mydumper. With NO_TABLE_OPTIONS we miss CHARSET=utf8mb3 from the dump of table schema. Default charset for database is utf8mb4. The schema is so large, so CREATE TABLE under utf8mb4 fails with error: Row size too large. The maximum row size for the used table type, not counting BLOBs, is 65535. But just removing TABLE_OPTIONS, NO_TABLE_OPTIONS, NO_FIELD_OPTIONS from SQL_MODE is not enough. Setting ORACLE automatically adds these to SQL_MODE: PIPES_AS_CONCAT ANSI_QUOTES IGNORE_SPACE NO_KEY_OPTIONS NO_TABLE_OPTIONS NO_FIELD_OPTIONS NO_AUTO_CREATE_USER SIMULTANEOUS_ASSIGNMENT Therefore when dumping table schema we must cut out ORACLE from session variables (as well as other dangerous options). Bug description: https://jira.mariadb.org/browse/CONTRIB-32

Useful for metadata comparison of two dumps.

Fixed: 1. wrong open() status check; 2. wrong directory of metadata.partial.N files; 3. dbt access after free() in process_stream() thread (CONTRIB-10); 4. metadata.partial table could be wrongly detected as metadata file. Stream now first writes metadata.header containing [config] section for proper myloader startup. Findings In stream mode mydumper dumps sql files like usual. After table is dumped metadata_partial_writer() thread gets job to write metadata.partial.N file: a bit of metadata for the table. After metadata bit is written a new job is created for process_stream() which outputs both metadata bit and sql file into STDOUT. In the end we output all databases metadata as the part of usual metadata write (common to non-stream mode). Then we order a job for process_stream() to output that metadata into STDOUT. References: https://jira.mariadb.org/browse/CONTRIB-10

CMake ASAN build fix

In one thread we cannot lock itself on waiting for other thread finishes because there no other thread. Instead we return INTERMEDIATE_ENDED into queue and reiterate. FIXME: merge with 1917e54

give_me_next_data_job_conf() accesses dbt->database->schema_state when dbt is already freed. That happens because main() doesn't wait for control job thread to finish.

midenok · 2023-11-27T21:26:18Z

Thanks!

midenok force-pushed the fixes branch 4 times, most recently from c4636de to 8f97946 Compare November 21, 2023 13:28

midenok changed the title ~~Duplicate of #1341~~ sequence, ANSI and stability fixes Nov 21, 2023

midenok force-pushed the fixes branch from 8f97946 to 6f4a59b Compare November 21, 2023 13:31

midenok requested a review from davidducos November 21, 2023 14:01

midenok self-assigned this Nov 21, 2023

midenok added the Fix label Nov 21, 2023

davidducos reviewed Nov 21, 2023

View reviewed changes

midenok mentioned this pull request Nov 22, 2023

Fix: myloader fails to restore table that references sequence + stability fixes #1341

Closed

midenok added 7 commits November 24, 2023 13:13

Fix wrong datalength index

a03ff23

Typo fix: datalength index is 5.

Some memory leaks fixes

41a8a0a

There are a lot of useless g_strdup() still, presumably overwriting each others' pointers.

Debug trace v2 for important queues

4b14ee2

Proper tracing includes thread identifiers borrowed from pthread_t pointers (crafted thread id is not available in every function). Essential for race condition hunting!

myloader hangs after jobs done fix

e9d5ff5

INTERMEDIATE_ENDED didn't send JOB_SHUTDOWN properly. It was meant to send JOB_SHUTDOWN to all schema threads, but it turned out it sends all JOB_SHUTDOWN to himself: td was wrongly used. Now we iterate td in the loop across schema_td array.

myloader one more hang fix

aaded1e

control_job_thread() initial threads_waiting wrong 0. If INTERMEDIATE_ENDED goes really fast, we call wake_threads_waiting() with 0 which leads to myloader hang.

real_table parsing fix

68f7a44

Related to 14510e3 and https://jira.mariadb.org/browse/CONTRIB-15 g_strsplit() by quote character did wrong if there is quote character inside identifier itself. real_table was made empty string in this case.

midenok force-pushed the fixes branch 2 times, most recently from edeb4db to a5c83ca Compare November 24, 2023 19:04

midenok changed the title ~~sequence, ANSI and stability fixes~~ sequence, ANSI, ORACLE and stability fixes Nov 24, 2023

midenok changed the title ~~sequence, ANSI, ORACLE and stability fixes~~ Sequence, ANSI, ORACLE and stability fixes Nov 24, 2023

midenok force-pushed the fixes branch 5 times, most recently from 3725459 to ce1ca25 Compare November 24, 2023 21:06

midenok added 2 commits November 25, 2023 00:17

test_mydumper.sh: print core

8e5f6b2

midenok added 4 commits November 25, 2023 00:18

myloader --overwrite-tables foreign keys constraint fix

11099ac

For DROP TABLE we have to set FOREIGN_KEY_CHECKS=0 as well

mydumper doesn't dump packages in oracle mode

fd643dd

Like in mysqldump iterate through routine_type to get FUNCTION, PROCEDURE, PACKAGE and PACKAGE BODY. Bug description: https://jira.mariadb.org/browse/CONTRIB-26

Sorted metadata

5d589fc

Useful for metadata comparison of two dumps.

midenok force-pushed the fixes branch 5 times, most recently from b878275 to faba7dc Compare November 25, 2023 16:57

midenok mentioned this pull request Nov 25, 2023

MyDumper hangs with --stream option if DB is big #1334

Closed

midenok force-pushed the fixes branch from faba7dc to d117c18 Compare November 25, 2023 17:03

midenok force-pushed the fixes branch from d117c18 to 33584ae Compare November 25, 2023 17:51

midenok requested a review from davidducos November 25, 2023 18:51

midenok and others added 4 commits November 26, 2023 15:47

circleci_config_builder.sh: gdb and libasan packages

93f085e

CMake ASAN build fix

Removing buster for MariaDB

793b42c

Sequences and --max-threads-for-schema-creation=1 fix

6afab3b

In one thread we cannot lock itself on waiting for other thread finishes because there no other thread. Instead we return INTERMEDIATE_ENDED into queue and reiterate. FIXME: merge with 1917e54

Control job thread crash fix

78f6c8b

give_me_next_data_job_conf() accesses dbt->database->schema_state when dbt is already freed. That happens because main() doesn't wait for control job thread to finish.

midenok force-pushed the fixes branch from 2c81d3a to 78f6c8b Compare November 26, 2023 12:48

davidducos approved these changes Nov 27, 2023

View reviewed changes

davidducos added this to the Release 0.16.1-1 milestone Nov 27, 2023

davidducos merged commit 5fbacbd into mydumper:master Nov 27, 2023
35 checks passed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Sequence, ANSI, ORACLE and stability fixes #1355

Sequence, ANSI, ORACLE and stability fixes #1355

midenok commented Nov 21, 2023 •

edited

davidducos Nov 21, 2023

midenok Nov 25, 2023

davidducos Nov 21, 2023

midenok Nov 21, 2023

midenok commented Nov 27, 2023

Sequence, ANSI, ORACLE and stability fixes #1355

Sequence, ANSI, ORACLE and stability fixes #1355

Conversation

midenok commented Nov 21, 2023 • edited

davidducos Nov 21, 2023

Choose a reason for hiding this comment

midenok Nov 25, 2023

Choose a reason for hiding this comment

davidducos Nov 21, 2023

Choose a reason for hiding this comment

midenok Nov 21, 2023

Choose a reason for hiding this comment

midenok commented Nov 27, 2023

midenok commented Nov 21, 2023 •

edited