Prevent a cursor timeout when syncing rpm repos #1015
I couldn't actually reproduce the cursor timeout using a variety of configurations for my VM, but using a batch size here didn't seem to slow down syncs significantly.
fixes pulp#2502
https://pulp.plan.io/issues/2502
@daviddavis, thanks for your PR! By analyzing the history of the files in this pull request, we identified @seandst, @mhrivnak and @bmbouter to be potential reviewers.
ok test
-        aggregation = unit.objects.aggregate(sort, project, allowDiskUse=True)
+        # Set the batch size to 5 to prevent a cursor timeout
+        aggregation = unit.objects.aggregate(sort, project, allowDiskUse=True,
+                                             batchSize=5)
Given the max document size of 16MB and (I think) that we know the max length of the nevra fields, I wonder if we can come up with a calculated batch size that further improves the performance here.
That said, I'm missing the connection between increasing the amount of work done in mongo (and I assume the time taken to do that work), and preventing a cursor timeout. Can you elaborate?
A cursor timeout occurs when the server closes a cursor that has sat idle for too long between batch fetches. The variable here is how long the code takes between fetching batches of records. Decreasing the batch size means smaller batches and more frequent fetches from the cursor, so less time passes between consecutive fetches and the cursor is less likely to sit idle long enough to time out.
It's hard to gauge the optimal batch size because each Pulp installation might have different hardware specs, be running different processes, etc. All of these things could affect how long the code takes between each batch fetch. Another option would be to turn off the cursor timeout, but that might cause cursors to be left open indefinitely if the process dies.
Does that make sense?
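To make the reasoning concrete, here is a rough simulation of the idea rather than anything taken from Pulp or the MongoDB driver. `SimulatedCursor`, `drain`, and `seconds_per_doc` are hypothetical names, and the 600 second idle limit is an assumption that mirrors what I understand to be MongoDB's default cursor timeout of 10 minutes. The client spends a fixed amount of time on each document, and the cursor fails if too much time passes between two fetches.

```python
class CursorTimedOut(Exception):
    """Raised when the simulated cursor sat idle longer than allowed."""


class SimulatedCursor(object):
    """Hypothetical stand-in for a server-side cursor with an idle timeout."""

    def __init__(self, total_docs, batch_size, idle_timeout):
        self.remaining = total_docs
        self.batch_size = batch_size
        self.idle_timeout = idle_timeout
        self.last_fetch = 0.0

    def fetch_batch(self, now):
        # The server only notices the client when it asks for another batch.
        if now - self.last_fetch > self.idle_timeout:
            raise CursorTimedOut("idle for %.0f seconds" % (now - self.last_fetch))
        self.last_fetch = now
        count = min(self.batch_size, self.remaining)
        self.remaining -= count
        return count


def drain(total_docs, batch_size, seconds_per_doc, idle_timeout=600.0):
    """Consume every document and return the number of fetches performed.

    idle_timeout=600.0 is an assumed value, not read from any real server.
    """
    cursor = SimulatedCursor(total_docs, batch_size, idle_timeout)
    now = 0.0
    fetches = 0
    while cursor.remaining > 0:
        count = cursor.fetch_batch(now)
        fetches += 1
        # Processing the batch is the gap during which the cursor is idle.
        now += count * seconds_per_doc
    return fetches
```

With 2000 documents at one second each, `drain(2000, 5, 1.0)` finishes after 400 fetches because the cursor is never idle for more than 5 seconds, while `drain(2000, 1000, 1.0)` raises `CursorTimedOut` on the second fetch because the first batch took 1000 seconds to process. The real numbers depend on the installation, which is why picking one optimal batch size is hard.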
ok test
Thank you!
I couldn't actually reproduce the cursor timeout using a variety of
configurations for my VM but using a batch size here didn't seem to slow
down syncs significantly when I benchmarked this change.
fixes #2502
https://pulp.plan.io/issues/2502