[Refactoring] Connection pooling implemented, multiple connections per file with INSERTs #1474

Merged
merged 12 commits into from
Apr 25, 2024

@davidducos davidducos commented Apr 22, 2024

This PR forces myloader to import one file per table at a time. However, that file is imported using multiple threads/connections.
Before this PR, each job of a table, which covers a single file, went to one of the multiple worker_loader threads, each of which has a single connection. So when a table had a single file, it was imported with a single connection.
Now, the jobs of a table are executed sequentially, that is, just one job per table at a time, but the INSERTs of that job are executed by multiple threads.
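
To make the model above concrete, here is a minimal sketch in Python. It is not the actual myloader code (which is written in C); `import_file`, `import_table`, and the integer connection ids are hypothetical names used only for illustration. It assumes a shared pool of connections, files processed one at a time, and the statements of each file spread across worker threads that each borrow one connection.

```python
import queue
import threading


def import_file(statements, pool, n_workers):
    """Run the statements of ONE file using several pooled connections."""
    work = queue.Queue()
    for stmt in statements:
        work.put(stmt)

    executed = []            # (connection_id, statement) pairs
    lock = threading.Lock()

    def worker():
        conn = pool.get()    # borrow a connection from the shared pool
        try:
            while True:
                try:
                    stmt = work.get_nowait()
                except queue.Empty:
                    return
                # A real loader would send stmt to the server here.
                with lock:
                    executed.append((conn, stmt))
        finally:
            pool.put(conn)   # hand the connection back for the next file

    threads = [threading.Thread(target=worker) for _ in range(n_workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return executed


def import_table(files, n_connections=4):
    """Import the files of one table sequentially, each with all connections."""
    pool = queue.Queue()
    for conn_id in range(1, n_connections + 1):
        pool.put(conn_id)    # integers stand in for open database connections
    # One file per table at a time; parallelism happens inside each file.
    return [import_file(stmts, pool, n_connections) for stmts in files]
```

In this sketch the statements of a file may complete in any order across the threads, and the same set of connections is reused for every file of the table.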

How does this improve performance?
When there is a single file per table, it is imported with multiple threads anyway.

Only that?
This will also reduce fragmentation: https://www.percona.com/blog/myloader-stops-causing-data-fragmentation/


davidducos commented Apr 25, 2024

I tested this solution to show the performance improvements.

The sysbench command that I used to prepare the test case is:

sysbench /usr/share/sysbench/oltp_write_only.lua --table-size=5000000 --tables=1 --threads=1 --mysql-user=root --mysql-db=sbtest --rate=10 --time=0 --report-interval=10 --create_secondary=off prepare

Then, I used this myloader command in all the cases, as I only changed the mydumper parameters:

./myloader -u root -v 4 -o -d ~/data ; echo $?

Single file:

With this command we create just a single file with the 5M rows:

./mydumper -o ~/data/ -B sbtest --clear

Current 77 seconds

** Message: 11:52:41.311: Thread 4: restoring sbtest.sbtest1 part 1 of 1 from sbtest.sbtest1.00000.sql. Progress 1 of 1. Tables 0 of 1 completed
** Message: 11:53:58.105: Thread 4: Data import ended

New 34 seconds

** Message: 11:56:45.587: Thread 2: restoring sbtest.sbtest1 part 1 of 1 from sbtest.sbtest1.00000.sql. Progress 1 of 1. Tables 0 of 1 completed
** Message: 11:56:45.587: Thread 2: Connection 2635 granted
** Message: 11:56:45.591: Thread 2: Connection 2636 granted
** Message: 11:56:45.595: Thread 2: Connection 2637 granted
** Message: 11:56:45.600: Thread 2: Connection 2638 granted
** Message: 11:57:19.841: Thread 2: Data import ended
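
The durations quoted for this case can be checked directly from the log timestamps. The following is a small sketch, assuming the `** Message: HH:MM:SS.mmm:` prefix shown in the logs above and a run that does not cross midnight; `elapsed_seconds` is a hypothetical helper, not part of myloader.

```python
import re
from datetime import datetime

# Matches the timestamp at the start of a myloader verbose log line.
STAMP = re.compile(r"^\*\* Message: (\d{2}:\d{2}:\d{2}\.\d{3}):")


def elapsed_seconds(log_lines):
    """Return the seconds between the earliest and latest timestamp."""
    stamps = []
    for line in log_lines:
        m = STAMP.match(line)
        if m:
            stamps.append(datetime.strptime(m.group(1), "%H:%M:%S.%f"))
    return (max(stamps) - min(stamps)).total_seconds()


current = [
    "** Message: 11:52:41.311: Thread 4: restoring sbtest.sbtest1 part 1 of 1 "
    "from sbtest.sbtest1.00000.sql. Progress 1 of 1. Tables 0 of 1 completed",
    "** Message: 11:53:58.105: Thread 4: Data import ended",
]
new = [
    "** Message: 11:56:45.587: Thread 2: restoring sbtest.sbtest1 part 1 of 1 "
    "from sbtest.sbtest1.00000.sql. Progress 1 of 1. Tables 0 of 1 completed",
    "** Message: 11:57:19.841: Thread 2: Data import ended",
]

print(round(elapsed_seconds(current)))  # 77
print(round(elapsed_seconds(new)))      # 34
```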

Multiple files:

With this command we create files of 200MB, which in our case results in 5 files:

./mydumper -o ~/data/ -B sbtest --clear -F 200

Current 39 seconds

** Message: 11:49:54.550: Thread 4: restoring sbtest.sbtest1 part 3 of 5 from sbtest.sbtest1.00000.00001.sql. Progress 1 of 5. Tables 0 of 1 completed
** Message: 11:49:54.550: Thread 1: restoring sbtest.sbtest1 part 1 of 5 from sbtest.sbtest1.00000.00002.sql. Progress 2 of 5. Tables 0 of 1 completed
** Message: 11:49:54.550: Thread 3: restoring sbtest.sbtest1 part 5 of 5 from sbtest.sbtest1.00000.sql. Progress 3 of 5. Tables 0 of 1 completed
** Message: 11:49:54.551: Thread 2: restoring sbtest.sbtest1 part 4 of 5 from sbtest.sbtest1.00000.00003.sql. Progress 4 of 5. Tables 0 of 1 completed
** Message: 11:50:20.578: Thread 2: restoring sbtest.sbtest1 part 2 of 5 from sbtest.sbtest1.00000.00004.sql. Progress 5 of 5. Tables 0 of 1 completed
** Message: 11:50:20.971: Thread 1: Data import ended
** Message: 11:50:21.837: Thread 3: Data import ended
** Message: 11:50:22.049: Thread 4: Data import ended
** Message: 11:50:33.542: Thread 2: Data import ended

New 34 seconds

** Message: 11:50:52.908: Thread 2: restoring sbtest.sbtest1 part 5 of 5 from sbtest.sbtest1.00000.sql. Progress 1 of 5. Tables 0 of 1 completed
** Message: 11:50:52.908: Thread 2: Connection 2611 granted
** Message: 11:50:52.911: Thread 2: Connection 2612 granted
** Message: 11:50:52.917: Thread 2: Connection 2613 granted
** Message: 11:50:52.921: Thread 2: Connection 2614 granted
** Message: 11:50:59.480: Thread 3: restoring sbtest.sbtest1 part 3 of 5 from sbtest.sbtest1.00000.00001.sql. Progress 2 of 5. Tables 0 of 1 completed
** Message: 11:50:59.480: Thread 3: Connection 2612 granted
** Message: 11:50:59.484: Thread 3: Connection 2614 granted
** Message: 11:50:59.487: Thread 3: Connection 2611 granted
** Message: 11:50:59.490: Thread 3: Connection 2613 granted
** Message: 11:51:06.384: Thread 1: restoring sbtest.sbtest1 part 1 of 5 from sbtest.sbtest1.00000.00002.sql. Progress 3 of 5. Tables 0 of 1 completed
** Message: 11:51:06.385: Thread 1: Connection 2613 granted
** Message: 11:51:06.388: Thread 1: Connection 2612 granted
** Message: 11:51:06.391: Thread 1: Connection 2611 granted
** Message: 11:51:06.393: Thread 1: Connection 2614 granted
** Message: 11:51:13.661: Thread 4: restoring sbtest.sbtest1 part 4 of 5 from sbtest.sbtest1.00000.00003.sql. Progress 4 of 5. Tables 0 of 1 completed
** Message: 11:51:13.661: Thread 4: Connection 2612 granted
** Message: 11:51:13.665: Thread 4: Connection 2611 granted
** Message: 11:51:13.668: Thread 4: Connection 2613 granted
** Message: 11:51:13.671: Thread 4: Connection 2614 granted
** Message: 11:51:20.698: Thread 2: restoring sbtest.sbtest1 part 2 of 5 from sbtest.sbtest1.00000.00004.sql. Progress 5 of 5. Tables 0 of 1 completed
** Message: 11:51:20.699: Thread 2: Connection 2611 granted
** Message: 11:51:20.702: Thread 2: Connection 2613 granted
** Message: 11:51:20.706: Thread 2: Connection 2614 granted
** Message: 11:51:20.709: Thread 2: Connection 2612 granted
** Message: 11:51:26.462: Thread 2: Data import ended
** Message: 11:51:26.462: Thread 4: Data import ended
** Message: 11:51:26.462: Thread 1: Data import ended
** Message: 11:51:26.462: Thread 3: Data import ended
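
The log above also shows the pooling at work: every file gets the same four connections (2611 to 2614), just granted in a different order. A rough way to confirm that from the log text, assuming the message formats shown above (`connections_per_part` is a hypothetical helper, not part of myloader):

```python
import re
from collections import defaultdict

RESTORE = re.compile(r"Thread (\d+): restoring \S+ part (\d+) of \d+")
GRANTED = re.compile(r"Thread (\d+): Connection (\d+) granted")


def connections_per_part(log_lines):
    """Map each restored part number to the set of connection ids it used."""
    current_part = {}            # thread id -> part it is restoring
    granted = defaultdict(set)   # part number -> connection ids
    for line in log_lines:
        m = RESTORE.search(line)
        if m:
            current_part[m.group(1)] = int(m.group(2))
            continue
        m = GRANTED.search(line)
        if m and m.group(1) in current_part:
            granted[current_part[m.group(1)]].add(int(m.group(2)))
    return dict(granted)


# First two jobs of the "New" run above, copied from the log.
log = """\
** Message: 11:50:52.908: Thread 2: restoring sbtest.sbtest1 part 5 of 5 from sbtest.sbtest1.00000.sql. Progress 1 of 5. Tables 0 of 1 completed
** Message: 11:50:52.908: Thread 2: Connection 2611 granted
** Message: 11:50:52.911: Thread 2: Connection 2612 granted
** Message: 11:50:52.917: Thread 2: Connection 2613 granted
** Message: 11:50:52.921: Thread 2: Connection 2614 granted
** Message: 11:50:59.480: Thread 3: restoring sbtest.sbtest1 part 3 of 5 from sbtest.sbtest1.00000.00001.sql. Progress 2 of 5. Tables 0 of 1 completed
** Message: 11:50:59.480: Thread 3: Connection 2612 granted
** Message: 11:50:59.484: Thread 3: Connection 2614 granted
** Message: 11:50:59.487: Thread 3: Connection 2611 granted
** Message: 11:50:59.490: Thread 3: Connection 2613 granted
""".splitlines()

used = connections_per_part(log)
print(used[5] == used[3] == {2611, 2612, 2613, 2614})  # True
```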

As you can see from the previous test, there were 5 files whose sizes were not the same. The command that I executed to test with files of the same size was:

./mydumper -o ~/data/ -B sbtest --clear -F 242

The list of files:

# ls -l ~/data
total 990196
-rw-r--r-- 1 root root       660 Apr 25 12:02 metadata
-rw-r----- 1 root root       404 Apr 25 12:02 sbtest-schema-create.sql
-rw-r----- 1 root root       249 Apr 25 12:02 sbtest-schema-triggers.sql
-rw-r----- 1 root root       529 Apr 25 12:02 sbtest.sbtest1-schema.sql
-rw-r----- 1 root root 253950437 Apr 25 12:02 sbtest.sbtest1.00000.00001.sql
-rw-r----- 1 root root 253950436 Apr 25 12:02 sbtest.sbtest1.00000.00002.sql
-rw-r----- 1 root root 252045063 Apr 25 12:02 sbtest.sbtest1.00000.00003.sql
-rw-r----- 1 root root 253973295 Apr 25 12:02 sbtest.sbtest1.00000.sql

Current 33 seconds

** Message: 12:02:49.680: Thread 3: restoring sbtest.sbtest1 part 2 of 4 from sbtest.sbtest1.00000.00001.sql. Progress 1 of 4. Tables 0 of 1 completed
** Message: 12:02:49.680: Thread 4: restoring sbtest.sbtest1 part 1 of 4 from sbtest.sbtest1.00000.00002.sql. Progress 2 of 4. Tables 0 of 1 completed
** Message: 12:02:49.680: Thread 1: restoring sbtest.sbtest1 part 4 of 4 from sbtest.sbtest1.00000.sql. Progress 3 of 4. Tables 0 of 1 completed
** Message: 12:02:49.684: Thread 2: restoring sbtest.sbtest1 part 3 of 4 from sbtest.sbtest1.00000.00003.sql. Progress 4 of 4. Tables 0 of 1 completed
** Message: 12:03:21.628: Thread 1: Data import ended
** Message: 12:03:21.742: Thread 4: Data import ended
** Message: 12:03:22.207: Thread 3: Data import ended
** Message: 12:03:22.531: Thread 2: Data import ended

New 34 seconds

** Message: 12:04:26.829: Thread 2: restoring sbtest.sbtest1 part 4 of 4 from sbtest.sbtest1.00000.sql. Progress 1 of 4. Tables 0 of 1 completed
** Message: 12:04:26.830: Thread 2: Connection 2699 granted
** Message: 12:04:26.833: Thread 2: Connection 2700 granted
** Message: 12:04:26.838: Thread 2: Connection 2701 granted
** Message: 12:04:26.842: Thread 2: Connection 2702 granted
** Message: 12:04:34.912: Thread 4: restoring sbtest.sbtest1 part 2 of 4 from sbtest.sbtest1.00000.00001.sql. Progress 2 of 4. Tables 0 of 1 completed
** Message: 12:04:34.913: Thread 4: Connection 2700 granted
** Message: 12:04:34.917: Thread 4: Connection 2702 granted
** Message: 12:04:34.920: Thread 4: Connection 2699 granted
** Message: 12:04:34.924: Thread 4: Connection 2701 granted
** Message: 12:04:43.372: Thread 4: restoring sbtest.sbtest1 part 1 of 4 from sbtest.sbtest1.00000.00002.sql. Progress 3 of 4. Tables 0 of 1 completed
** Message: 12:04:43.374: Thread 4: Connection 2699 granted
** Message: 12:04:43.378: Thread 4: Connection 2700 granted
** Message: 12:04:43.381: Thread 4: Connection 2702 granted
** Message: 12:04:43.390: Thread 4: Connection 2701 granted
** Message: 12:04:51.935: Thread 3: restoring sbtest.sbtest1 part 3 of 4 from sbtest.sbtest1.00000.00003.sql. Progress 4 of 4. Tables 0 of 1 completed
** Message: 12:04:51.935: Thread 3: Connection 2702 granted
** Message: 12:04:51.939: Thread 3: Connection 2699 granted
** Message: 12:04:51.942: Thread 3: Connection 2700 granted
** Message: 12:04:51.980: Thread 3: Connection 2701 granted
** Message: 12:05:00.671: Thread 4: Data import ended
** Message: 12:05:00.672: Thread 3: Data import ended
** Message: 12:05:00.672: Thread 1: Data import ended
** Message: 12:05:00.672: Thread 2: Data import ended

Conclusions

As we can see, the new version takes a consistent amount of time (34 seconds in all three tests), regardless of how the backup was taken. The only case where it is slower, by less than 5%, is when the number of backup files per table divides evenly across the threads (as in the last test, with 4 files and 4 threads) and the files are all the same size, which happens in much less than 5% of cases.
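
For reference, the relative change in each of the three tests can be worked out from the timings reported above. This is only arithmetic on the rounded numbers in this comment; `change_pct` is a helper name made up for the example.

```python
# (current seconds, new seconds) as reported in the three tests above.
timings = {
    "single file": (77, 34),
    "5 files (-F 200)": (39, 34),
    "4 files (-F 242)": (33, 34),
}


def change_pct(current, new):
    """Time saved relative to the current version, in percent.

    Positive means the new version is faster, negative means slower.
    """
    return (current - new) / current * 100


for case, (cur, new) in timings.items():
    print(f"{case}: {change_pct(cur, new):+.1f}%")
```

This prints +55.8% for the single file, +12.8% for the 5 uneven files, and -3.0% for the 4 equal files, which is consistent with the "less than 5% slower" figure.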

@davidducos

During the development of this PR, I made several changes to improve other pieces of code:

  • Control Job: removed code and made the logic less complex
  • myloader should now interact with the database through only 2 commands, with no need to know about the connection
  • the user will be aware of the connection that each thread is using

@davidducos davidducos merged commit 4341aeb into master Apr 25, 2024
35 checks passed