[Refactoring] Connection pooling implemented, multiple connections per file with INSERTs #1474
I tested this solution to show the performance improvements. The sysbench command that I used to prepare the test case is:
I then used the same myloader command in all cases, as I only changed the mydumper parameters:
Single file: with this command we are going to create just a single file with the 5M rows.
Current: 77 seconds
New: 34 seconds

Multiple files: ./mydumper -o ~/data/ -B sbtest --clear -F 200
With this command we are going to create files of 200MB, which in our case means 5 files.
Current: 39 seconds
New: 34 seconds
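The relationship between the -F value and the number of files is simple ceiling division. As a worked example (in Python, assuming a dump of roughly 1 GB for the 5M rows, which is consistent with getting 5 files at -F 200; `expected_files` is a hypothetical helper, not part of mydumper):

```python
import math

def expected_files(dump_size_mb, chunk_mb):
    """How many output files a -F <chunk_mb> dump of dump_size_mb produces."""
    return math.ceil(dump_size_mb / chunk_mb)

# ~1 GB dump split into 200MB chunks -> 5 files, as observed above.
assert expected_files(1000, 200) == 5
# Any remainder spills into one extra, smaller file.
assert expected_files(1001, 200) == 6
```

Note that the last file holds whatever remains, which is why the 5 files in this test were not all the same size.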
As you can see from the previous test, there were 5 files whose sizes were not the same. The command that I executed to test with files of the same size was:
The list of files:
Current: 33 seconds
New: 34 seconds
Conclusions
As we can see, the new version takes a consistent amount of time; it does not depend on how you take the backup. The only case where it will be less than 5% slower is when the backup has an evenly divisible number of backup files per table and all the files are the same size, which happens much less than 5% of the time.
During the development of this PR, I added several changes to improve other pieces of code:
This PR will force myloader to import one file per table at a time. However, that file will be imported with multiple threads/connections.
Before this PR, each job of a table went to one of multiple worker_loader threads, each of which had a single connection. In cases where a table had a single file, it was therefore imported over a single connection.
Now, the jobs of a table are executed sequentially (just one job per table at a time), but the INSERTs within each job are executed by multiple threads.
How does this improve performance?
In cases where we have a single file per table, it will now be imported with multiple threads anyway.
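The scheduling change described above can be sketched conceptually as follows. This is a minimal Python illustration, not myloader's actual C implementation; `load_file` is a hypothetical helper standing in for the real worker_loader logic, and list appends stand in for executing statements over MySQL connections:

```python
# Sketch of the new model: files are imported strictly one at a time,
# but each file's INSERT statements fan out over a pool of connections.
import queue
import threading

def load_file(statements, n_connections=4):
    """Replay one file's INSERTs using a pool of n_connections workers."""
    q = queue.Queue()
    executed = []              # stands in for rows applied on the server
    lock = threading.Lock()

    def worker():
        while True:
            stmt = q.get()
            if stmt is None:   # sentinel: no more work for this file
                q.task_done()
                return
            # A real connection would execute stmt against MySQL here.
            with lock:
                executed.append(stmt)
            q.task_done()

    threads = [threading.Thread(target=worker) for _ in range(n_connections)]
    for t in threads:
        t.start()
    for stmt in statements:    # feed one file's statements to the pool
        q.put(stmt)
    for _ in threads:          # one sentinel per worker
        q.put(None)
    for t in threads:
        t.join()
    return executed

# One job per table at a time: the outer loop is sequential, so even a
# table dumped as a single file still gets all the connections.
files = [[f"INSERT {name}.{i}" for i in range(100)] for name in ("f1", "f2")]
for f in files:
    done = load_file(f)
    assert sorted(done) == sorted(f)   # every statement was applied
```

The key design point is that parallelism moves from "many files at once, one connection each" to "one file at a time, many connections", which is what keeps the import time consistent regardless of how the backup was split.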
Only that?
This will also reduce fragmentation: https://www.percona.com/blog/myloader-stops-causing-data-fragmentation/