[5.2] Speed up chunk for large sets of data #12861
I've been working on a project that involves sifting through some pretty large databases, and the query builder's chunk method was not really doing it for me because it slowed down so much as it worked through the table.
As chunk gets deeper and deeper into a table, it asks MySQL to count through more and more rows. When chunk says to MySQL, say, `select * from some_table limit 100000, 1000`, MySQL has to scan past the first 100,000 rows before it can return the 1,000 you actually want.
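To make that concrete, here's a rough sketch of what offset-based chunking boils down to. This is illustrative only, not Laravel's actual internals; the table name and chunk size are made up:

```php
<?php

use Illuminate\Support\Facades\DB;

// Rough sketch of offset-based chunking (illustrative, not chunk()'s real code).
$size = 1000;
$page = 0;

do {
    $rows = DB::table('users')
        ->orderBy('id')
        ->offset($page * $size) // MySQL still scans every skipped row to get here
        ->limit($size)
        ->get();

    foreach ($rows as $row) {
        // ... process $row ...
    }

    $page++;
} while (count($rows) === $size);
```

The `offset($page * $size)` call is the problem: each successive page makes MySQL walk further into the table before it returns anything.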
I realize this may be too much of an edge case, so I didn't want to spend too much time working on the code; just looking for feedback at this point.
If you're still reading, here are some scrappy benchmarks. :)
After 4 seconds, chunk's progress estimate is very optimistic; by the end, however, the process has taken substantially longer than it predicted.
As you can imagine, the problem gets progressively worse as the database gets larger.
If we instead query by id, using a `where('id', '>', $lastId)` clause in place of the offset, MySQL can use the primary key index to jump straight to the start of the next chunk.
With the id-based query, the same set takes only 21s in the end.
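For reference, a minimal sketch of that keyset-style loop (the table name, chunk size, and `$lastId` variable are assumptions for illustration, not lifted from this PR):

```php
<?php

use Illuminate\Support\Facades\DB;

// Minimal keyset-pagination sketch: page by the last seen id instead of an offset.
$size = 1000;
$lastId = 0;

do {
    $rows = DB::table('users')
        ->where('id', '>', $lastId) // index seek straight to the next chunk
        ->orderBy('id')
        ->limit($size)
        ->get();

    foreach ($rows as $row) {
        // ... process $row ...
        $lastId = $row->id;
    }
} while (count($rows) === $size);
```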
Yeah, it's def a pretty specific use case (large set, ordered by id), but that might be common enough. I mean, for people using an auto incrementing id, using a where-on-id chunk like this could be a handy option.
Fwiw, for my needs i ended up implementing a method using a `where('id', '>', $lastId)` clause in place of the offset.
So far i've kept the changed functionality completely separated in a method of its own, so the existing chunk behaviour is untouched.
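If it helps picture the API, a call site for such a separated method might look like this (the method name `chunkById` is an illustrative guess here, not necessarily what this PR ships):

```php
<?php

use Illuminate\Support\Facades\DB;

// Hypothetical usage of a keyset-based chunk method kept separate from chunk().
DB::table('users')->chunkById(1000, function ($users) {
    foreach ($users as $user) {
        // ... process $user ...
    }
});
```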
Come to think of it, i guess chunk itself could even fall back to this approach automatically when the query is ordered by an auto incrementing id.
@kevindoole Did you perhaps also benchmark chunking with a