Prevent a cursor timeout when syncing rpm repos #1015

daviddavis · 2017-01-13T20:16:39Z

I couldn't actually reproduce the cursor timeout using a variety of
configurations for my VM, but using a batch size here didn't seem to slow
down syncs significantly when I benchmarked this change.

fixes #2502
https://pulp.plan.io/issues/2502

mention-bot · 2017-01-13T20:16:40Z

@daviddavis, thanks for your PR! By analyzing the history of the files in this pull request, we identified @seandst, @mhrivnak and @bmbouter to be potential reviewers.

daviddavis · 2017-01-16T14:07:41Z

ok test

seandst · 2017-01-18T19:16:24Z

plugins/pulp_rpm/plugins/importers/yum/purge.py

-    aggregation = unit.objects.aggregate(sort, project, allowDiskUse=True)
+    # Set the batch size to 5 to prevent a cursor timeout
+    aggregation = unit.objects.aggregate(sort, project, allowDiskUse=True,
+                                         batchSize=5)
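For reference, here is a minimal sketch of the same call made directly through PyMongo instead of mongoengine. It assumes a `pymongo` `Collection` is at hand. The pipeline stages are hypothetical stand-ins because the real `sort` and `project` stages are not shown in this diff, and `aggregate_in_small_batches` is an illustrative name, not a pulp_rpm function.

```python
def aggregate_in_small_batches(collection, pipeline, batch_size=5):
    """Run an aggregation whose cursor returns `batch_size` documents per fetch.

    `collection` is assumed to be a pymongo Collection (or anything with a
    compatible aggregate() method).
    """
    # allowDiskUse lets a large $sort spill to disk on the server;
    # batchSize bounds how many documents come back per round trip.
    return collection.aggregate(pipeline, allowDiskUse=True, batchSize=batch_size)


# Hypothetical pipeline: the real `sort` and `project` stages are not in this diff.
EXAMPLE_PIPELINE = [
    {"$sort": {"name": 1, "epoch": 1, "version": 1, "release": 1, "arch": 1}},
    {"$project": {"name": 1, "epoch": 1, "version": 1, "release": 1, "arch": 1}},
]
```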


Given the max document size of 16MB and (I think) that we know the max length of the nevra fields, I wonder if we can come up with a calculated batch size that further improves the performance here.
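A rough sketch of the calculation suggested here follows. It assumes MongoDB's 16 MiB cap on a single batch and uses made-up upper bounds for each projected field, because the thread does not state the real nevra length limits. `ASSUMED_FIELD_MAX_BYTES`, `ASSUMED_PER_FIELD_OVERHEAD` and `batch_size_from_field_limits` are all hypothetical.

```python
# MongoDB caps each batch returned to the client at 16 MiB.
MAX_BATCH_BYTES = 16 * 1024 * 1024

# Hypothetical upper bounds, in bytes, for each projected field. The real
# nevra length limits would have to come from the pulp_rpm unit models.
ASSUMED_FIELD_MAX_BYTES = {
    "_id": 36,
    "name": 255,
    "epoch": 10,
    "version": 255,
    "release": 255,
    "arch": 20,
}

# Assumed per-field allowance for the BSON type byte, key name and length prefix.
ASSUMED_PER_FIELD_OVERHEAD = 16


def batch_size_from_field_limits(field_max_bytes,
                                 per_field_overhead=ASSUMED_PER_FIELD_OVERHEAD,
                                 max_batch_bytes=MAX_BATCH_BYTES):
    """Documents that fit in one batch if every document is at its maximum size."""
    per_doc = sum(size + per_field_overhead for size in field_max_bytes.values())
    return max_batch_bytes // per_doc
```

With these made-up numbers each document is at most 927 bytes, so the bound is 18,098 documents per batch. This bound maximizes documents per batch and minimizes round trips. A larger batch also means more processing time between fetches, so on its own this number does not address the cursor timeout.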

That said, I'm missing the connection between increasing the amount of work done in mongo (and I assume the time taken to do that work) and preventing a cursor timeout. Can you elaborate?

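daviddavis · 2017-01-18

A cursor timeout occurs when the server-side cursor sits idle between batch fetches for longer than the server's cursor timeout (10 minutes by default in MongoDB). The variable here is how long the code takes to process one batch before it fetches the next. A smaller batch size means each batch is processed faster and the cursor is fetched from more often, so the idle gap between fetches is shorter and less likely to exceed the timeout. The trade-off is more round trips to mongo.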
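This reasoning can be expressed as arithmetic. The idle gap between fetches is roughly batch size × processing time per document, so the largest safe batch is the timeout divided by the per-document time. The sketch below assumes MongoDB's default 10-minute cursor timeout and adds a hypothetical safety margin. `max_safe_batch_size` is an illustrative helper, and any timings passed to it are examples, not measurements from this PR.

```python
import math

# MongoDB's default cursorTimeoutMillis is 600000 ms (10 minutes).
DEFAULT_CURSOR_TIMEOUT_SECONDS = 600.0


def max_safe_batch_size(seconds_per_doc,
                        timeout_seconds=DEFAULT_CURSOR_TIMEOUT_SECONDS,
                        safety_factor=0.5):
    """Largest batch whose processing time stays within a fraction of the timeout.

    The cursor sits idle while the client works through one batch, so the
    idle gap is roughly batch_size * seconds_per_doc. safety_factor is an
    assumed margin for slow or busy hosts, not a MongoDB setting.
    """
    if seconds_per_doc <= 0:
        raise ValueError("seconds_per_doc must be positive")
    if not 0 < safety_factor <= 1:
        raise ValueError("safety_factor must be in (0, 1]")
    budget = timeout_seconds * safety_factor
    # Never go below one document per batch.
    return max(1, math.floor(budget / seconds_per_doc))
```

For example, if processing took a hypothetical 0.5 s per document, `max_safe_batch_size(0.5)` returns 600. Because per-document time varies between installations, a fixed small batch size is the conservative choice.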

It's hard to gauge the optimal batch size because each Pulp installation might have different hardware specs, be running different processes, etc. All of these things could affect how long the code is taking between each batch fetch. Another option would be to turn off the cursor timeout but that might cause cursors to be left open indefinitely if the process dies.
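Here is a minimal sketch of the alternative mentioned here: disabling the timeout and closing the cursor explicitly. PyMongo exposes this as the `no_cursor_timeout` option of `Collection.find()`. The sync code uses `aggregate()`, so this sketch only illustrates the trade-off and is not a drop-in change. `process_without_timeout` and its parameters are hypothetical.

```python
def process_without_timeout(collection, query, handle):
    """Iterate a find() cursor that the server will not time out.

    `collection` is assumed to be a pymongo Collection; `handle` is a
    hypothetical per-document callback.
    """
    cursor = collection.find(query, no_cursor_timeout=True)
    try:
        for doc in cursor:
            handle(doc)
    finally:
        # Runs on normal completion and on Python exceptions, but NOT if the
        # worker process is killed outright; then the cursor may be left open.
        cursor.close()
```

The `finally` block covers normal completion and Python exceptions. A process that is killed outright never reaches it, and that is exactly the case where the cursor can be left open on the server.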

Does that make sense?

daviddavis · 2017-01-25T20:31:09Z

ok test

dkliban · 2017-01-30

Thank you!

Prevent a cursor timeout when syncing rpm repos

b4d57ec

I couldn't actually reproduce the cursor timeout using a variety of configurations for my VM but using a batch size here didn't seem to slow down syncs significantly. fixes pulp#2502 https://pulp.plan.io/issues/2502

daviddavis changed the base branch from master to 2.11-dev January 13, 2017 20:18

seandst reviewed Jan 18, 2017

dkliban approved these changes Jan 30, 2017

daviddavis merged commit a509e22 into pulp:2.11-dev Jan 30, 2017
