[Refactoring] Connection pooling implemented, multiple connections per file with INSERTs #1474
I tested this solution to show the performance improvements. The sysbench command that I used to prepare the test case is:
I then used the same myloader command in all cases, as I only changed the mydumper parameters:
Single file: with this command we are going to create just a single file with the 5M rows.
Current: 77 seconds
New: 34 seconds

Multiple files: ./mydumper -o ~/data/ -B sbtest --clear -F 200
With this command we are going to create files of 200MB, which in our case means 5 files.
Current: 39 seconds
New: 34 seconds
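The relationship between the -F value and the number of files is simple ceiling division. As a worked example (in Python, assuming a dump of roughly 1 GB for the 5M rows, which is consistent with getting 5 files at -F 200; `expected_files` is a hypothetical helper, not part of mydumper):

```python
import math

def expected_files(dump_size_mb, chunk_mb):
    """How many output files a -F <chunk_mb> dump of dump_size_mb produces."""
    return math.ceil(dump_size_mb / chunk_mb)

# ~1 GB dump split into 200MB chunks -> 5 files, as observed above.
assert expected_files(1000, 200) == 5
# Any remainder spills into one extra, smaller file.
assert expected_files(1001, 200) == 6
```

Note that the last file holds whatever remains, which is why the 5 files in this test were not all the same size.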
As you can see from the previous test, there were 5 files whose sizes were not the same. The command that I executed to test with files of the same size was:
The list of files:
Current: 33 seconds
New: 34 seconds
Conclusions
As we can see, the new version takes a consistent amount of time; it does not depend on how you take the backup. The only case where it will be less than 5% slower is when the backup has an evenly divisible number of backup files per table and all the files are the same size, which happens much less than 5% of the time.
During the development of this PR, I added several changes to improve other pieces of code:
This PR will force myloader to import one file per table at a time. However, that file will be imported with multiple threads/connections.
Before this PR, each job of a table went to one of multiple worker_loader threads, each of which had a single connection. In cases where a table had a single file, it was therefore imported over a single connection.
Now, the jobs of a table are executed sequentially (just one job per table at a time), but the INSERTs within each job are executed by multiple threads.
How does this improve performance?
In cases where we have a single file per table, it will now be imported with multiple threads anyway.
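The scheduling change described above can be sketched conceptually as follows. This is a minimal Python illustration, not myloader's actual C implementation; `load_file` is a hypothetical helper standing in for the real worker_loader logic, and list appends stand in for executing statements over MySQL connections:

```python
# Sketch of the new model: files are imported strictly one at a time,
# but each file's INSERT statements fan out over a pool of connections.
import queue
import threading

def load_file(statements, n_connections=4):
    """Replay one file's INSERTs using a pool of n_connections workers."""
    q = queue.Queue()
    executed = []              # stands in for rows applied on the server
    lock = threading.Lock()

    def worker():
        while True:
            stmt = q.get()
            if stmt is None:   # sentinel: no more work for this file
                q.task_done()
                return
            # A real connection would execute stmt against MySQL here.
            with lock:
                executed.append(stmt)
            q.task_done()

    threads = [threading.Thread(target=worker) for _ in range(n_connections)]
    for t in threads:
        t.start()
    for stmt in statements:    # feed one file's statements to the pool
        q.put(stmt)
    for _ in threads:          # one sentinel per worker
        q.put(None)
    for t in threads:
        t.join()
    return executed

# One job per table at a time: the outer loop is sequential, so even a
# table dumped as a single file still gets all the connections.
files = [[f"INSERT {name}.{i}" for i in range(100)] for name in ("f1", "f2")]
for f in files:
    done = load_file(f)
    assert sorted(done) == sorted(f)   # every statement was applied
```

The key design point is that parallelism moves from "many files at once, one connection each" to "one file at a time, many connections", which is what keeps the import time consistent regardless of how the backup was split.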
Only that?
This will also reduce fragmentation: https://www.percona.com/blog/myloader-stops-causing-data-fragmentation/