Prevent a cursor timeout when syncing rpm repos #1015

Merged
daviddavis merged 1 commit into pulp:2.11-dev on Jan 30, 2017

Conversation

daviddavis
Contributor

@daviddavis daviddavis commented Jan 13, 2017

I couldn't actually reproduce the cursor timeout using a variety of
configurations for my VM, but setting a batch size here didn't seem to slow
down syncs significantly when I benchmarked this change.

fixes #2502
https://pulp.plan.io/issues/2502

@mention-bot

@daviddavis, thanks for your PR! By analyzing the history of the files in this pull request, we identified @seandst, @mhrivnak and @bmbouter to be potential reviewers.

@daviddavis daviddavis changed the base branch from master to 2.11-dev January 13, 2017 20:18
@daviddavis
Contributor Author

ok test

- aggregation = unit.objects.aggregate(sort, project, allowDiskUse=True)
+ # Set the batch size to 5 to prevent a cursor timeout
+ aggregation = unit.objects.aggregate(sort, project, allowDiskUse=True,
+                                      batchSize=5)
Contributor

Given the max document size of 16MB and (I think) that we know the max length of the nevra fields, I wonder if we can come up with a calculated batch size that further improves the performance here.

That said, I'm missing the connection between increasing the amount of work done in mongo (and I assume the time taken to do that work) and preventing a cursor timeout. Can you elaborate?
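
A rough sketch of what such a calculated batch size could look like, assuming a batch is capped at the 16MB figure mentioned above. The field length limits and the per-document overhead below are hypothetical placeholders, not values taken from Pulp or the RPM spec, so the result only illustrates the arithmetic.

```python
MAX_BATCH_BYTES = 16 * 1024 * 1024  # the 16MB limit mentioned above

# Hypothetical upper bounds on NEVRA field lengths in bytes; NOT taken from Pulp.
ASSUMED_MAX_FIELD_BYTES = {
    "name": 255,
    "epoch": 16,
    "version": 64,
    "release": 64,
    "arch": 16,
}

# Rough, assumed allowance for BSON keys, the _id field, and type tags.
ASSUMED_OVERHEAD_BYTES = 128


def estimated_doc_bytes(field_limits, overhead=ASSUMED_OVERHEAD_BYTES):
    """Worst-case size estimate for one projected document."""
    return sum(field_limits.values()) + overhead


def max_docs_per_batch(field_limits, max_batch_bytes=MAX_BATCH_BYTES):
    """Largest document count whose worst-case total still fits in one batch."""
    return max_batch_bytes // estimated_doc_bytes(field_limits)


if __name__ == "__main__":
    print(max_docs_per_batch(ASSUMED_MAX_FIELD_BYTES))
```

This only gives an upper bound from the size limit; it says nothing about how long the client takes to work through each batch.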

Contributor Author

A cursor timeout occurs when the server closes a cursor that has sat idle for too long between batch fetches. The variable here is how long the code takes between fetching batches of records. Decreasing the batch size means smaller batches and more frequent fetches from the cursor, so the cursor spends less time idle between fetches.

It's hard to gauge the optimal batch size because each Pulp installation might have different hardware specs, be running different processes, etc. All of these things could affect how long the code takes between each batch fetch. Another option would be to turn off the cursor timeout, but that might cause cursors to be left open indefinitely if the process dies.

Does that make sense?
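
To make the relationship concrete, here is a minimal model of the reasoning above. It assumes the cursor is idle for roughly as long as the client spends processing one batch, and it uses MongoDB's default idle cursor timeout of 10 minutes. The function names and per-document timings are hypothetical and are not measurements from Pulp.

```python
# MongoDB closes a cursor that has been idle for 10 minutes by default.
DEFAULT_IDLE_TIMEOUT_SECONDS = 600


def idle_gap_seconds(batch_size, seconds_per_doc):
    """Estimate how long the cursor sits idle between two batch fetches.

    Simplified model: the client processes every document in a batch
    before asking the server for the next one.
    """
    return batch_size * seconds_per_doc


def cursor_would_time_out(batch_size, seconds_per_doc,
                          timeout=DEFAULT_IDLE_TIMEOUT_SECONDS):
    """True if the estimated idle gap exceeds the server's idle timeout."""
    return idle_gap_seconds(batch_size, seconds_per_doc) > timeout


def fetch_count(total_docs, batch_size):
    """Number of fetches needed to drain the cursor (ceiling division)."""
    return -(-total_docs // batch_size)


if __name__ == "__main__":
    # Hypothetical workload: 1 second of client-side work per document.
    for size in (1000, 5):
        print(size, idle_gap_seconds(size, 1.0),
              cursor_would_time_out(size, 1.0), fetch_count(1000, size))
```

The trade-off is visible in the last two columns: a smaller batch keeps the idle gap far below the timeout at the cost of many more round trips to the server.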

@daviddavis
Contributor Author

ok test

Member

@dkliban dkliban left a comment

Thank you!

@daviddavis daviddavis merged commit a509e22 into pulp:2.11-dev Jan 30, 2017