[ Enhancement ] Improve --rows implementation #529
Comments
We are seeing the above issue while dumping a table with a billion rows when we try to dump with . We don't really want to use the . Any alternative options you can suggest where we can limit the amount of read activity per operation?
Hi @balusarakesh, if you want to decrease the reads, you should decrease the amount of threads with -t
Btw, what version of mydumper are you using?
@davidducos we are using the latest version of mydumper, the number of threads is set to just 1, --rows is set to 100,000, and we still see a large spike in read IOPS
Hi @balusarakesh, mydumper doesn't perform write operations on the source server. It is myloader that executes the inserts.
I'm also running into this problem with --rows since upgrading to 0.11.5. Prior to that I was using 0.9.1-5 from the Ubuntu 18.04 repo, and the behaviour of --rows on this table was very fast. When trying to back up the following table with 0.11.5, it basically never completes. I let it run for a few hours before giving up, by which time it had generated files with number suffixes over 100000.
I've switched to
For comparison, here's performance using
I would be happy to provide timing comparisons with any proposed changed version on this table, to confirm that this use case is handled. Using
I also encountered the above problem. I hope the author can optimize this performance problem as soon as possible. Thank you
Stage I should reduce the amount of memory used. Stage II will be implemented in the next releases.
@davidducos Could we detect whether a table needs to be split by rows when detecting the engine with
The chunk builder has been added, and there is now a better understanding of the chunk that is going to be executed the next time we need to get a chunk of data from the table. So, what remains? The logic to dynamically increase or reduce the chunk size, as currently the step is static (chunk size and step are synonyms in this context). My idea is to use --rows to set the initial step and then add 2 more parameters for min and max, or use something like
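As a rough illustration of that parameter idea, here is a minimal sketch (hypothetical names, not mydumper's actual option handling) of how a single --rows value could carry a minimum, initial and maximum step, e.g. "1000:100000:1000000", while staying backwards compatible with a plain number:

```c
/* Hypothetical sketch only: one way a --rows value could carry min/initial/max. */
#include <stdio.h>

struct chunk_step_conf {
  unsigned long long min;      /* never shrink the step below this               */
  unsigned long long initial;  /* first step used for a table (today's --rows)   */
  unsigned long long max;      /* never grow the step above this                 */
};

/* Accepts either a single number (current behaviour) or "min:initial:max". */
static int parse_rows_option(const char *value, struct chunk_step_conf *conf) {
  unsigned long long a, b, c;
  if (sscanf(value, "%llu:%llu:%llu", &a, &b, &c) == 3) {
    conf->min = a; conf->initial = b; conf->max = c;
  } else if (sscanf(value, "%llu", &a) == 1) {
    conf->min = conf->initial = conf->max = a;  /* static step, as today */
  } else {
    return -1;
  }
  return (conf->min <= conf->initial && conf->initial <= conf->max) ? 0 : -1;
}

int main(void) {
  struct chunk_step_conf conf;
  if (parse_rows_option("1000:100000:1000000", &conf) == 0)
    printf("min=%llu initial=%llu max=%llu\n", conf.min, conf.initial, conf.max);
  return 0;
}
```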
Currently, when you use --rows, mydumper is going to:
This is simple and works for the cases where there are no gaps in the pk. However, there might be cases where some jobs take a few milliseconds and others take hundreds of seconds. Another thing to take into account is the number of files that are going to be created: you just don't know it in advance.
So, my idea is to increase the chunk size dynamically, taking into account the execution time of the chunk. The goal is to keep the time as close to 1 second as possible. We could start by reducing the size to half if a chunk takes longer than 2 seconds and doubling it if it takes less than 0.5 seconds.
At this moment, I'm not 100% sure how to implement it. However, other issues that will be merged will make it simpler to implement.
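For what it's worth, here is a minimal sketch of the adjustment loop described above (illustrative names only, not mydumper internals): halve the step when a chunk takes more than 2 seconds, double it when it takes less than 0.5 seconds, and clamp it to the configured min/max.

```c
/* Hypothetical sketch of the dynamic step adjustment described above. */
#include <stdio.h>

struct chunk_step_conf { unsigned long long min, initial, max; };

static unsigned long long adjust_step(unsigned long long step,
                                      double chunk_seconds,
                                      const struct chunk_step_conf *conf) {
  if (chunk_seconds > 2.0 && step / 2 >= conf->min)
    step /= 2;                    /* chunk was too slow: shrink the next one */
  else if (chunk_seconds < 0.5 && step * 2 <= conf->max)
    step *= 2;                    /* chunk was fast: grow the next one */
  return step;                    /* between 0.5s and 2s: keep the step as-is */
}

int main(void) {
  struct chunk_step_conf conf = { 1000, 100000, 1000000 };
  unsigned long long step = conf.initial;
  /* Simulated chunk durations in seconds, as if measured around each SELECT. */
  double timings[] = { 3.1, 2.4, 0.9, 0.3, 0.2, 1.1 };
  for (int i = 0; i < 6; i++) {
    step = adjust_step(step, timings[i], &conf);
    printf("after a %.1fs chunk, the next step is %llu rows\n", timings[i], step);
  }
  return 0;
}
```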