bpo-35094: Improved algorithms for random.sample #10192
Hello, and thanks for your contribution! I'm a bot set up to make sure that the project can legally accept your contribution by verifying you have signed the PSF contributor agreement (CLA). Unfortunately our records indicate you have not signed the CLA. For legal reasons we need you to sign this before we can look at your contribution. Please follow the steps outlined in the CPython devguide to rectify this issue. You can check yourself to see if the CLA has been received. Thanks again for your contribution, we look forward to reviewing it!
Lib/random.py
Outdated
    if is_set or k*2 >= n:
        for i, item in enumerate(population):
            r = randbelow(i + 1)
            if r < k:
You can reduce one indentation level with an early continue:
    if r >= k:
        continue
    if i < k:
        result[i] = result[r]
    result[r] = item
Fixed - thanks!
BTW, CLA signed at 10:52 Pacific time this morning, just waiting for the systems to catch up!
For future reference, here are my notes from evaluating the PR:
Call count summary
Test Code
Results with the PR Applied
Results for the Baseline Version
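The test code behind the call count summary is not preserved in this thread. As a rough sketch of how such counts could be gathered (not the reviewer's actual script), one option is to subclass `random.Random` and count calls to `getrandbits()`. The names `CountingRandom` and `count_calls` are hypothetical, and the sketch assumes that CPython routes the integer draws made by `sample()` through an overridden `getrandbits()`.

```python
import random


class CountingRandom(random.Random):
    """Hypothetical helper: a Random that counts getrandbits() calls."""

    def __init__(self, seed=None):
        self.calls = 0
        super().__init__(seed)

    def getrandbits(self, k):
        # Assumption: sample() obtains its random integers through this method
        # when a subclass overrides it.
        self.calls += 1
        return super().getrandbits(k)


def count_calls(n, k, seed=12345):
    """Return how many getrandbits() calls one sample(range(n), k) makes."""
    rng = CountingRandom(seed)
    rng.sample(range(n), k)
    return rng.calls


if __name__ == "__main__":
    for n, k in [(1000, 10), (1000, 500), (1000, 900)]:
        print(f"n={n} k={k} calls={count_calls(n, k)}")
```

Running the same script against a patched and an unpatched `Lib/random.py` would give one way to compare the two versions. The numbers it prints depend on the interpreter version and the seed.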
Also, here are the comparative timings with and without the patch:
With the PR Applied
Baseline
Thanks for following up on this! Surprised by the numbers you're seeing - I'll try reproducing your tests and investigate further. Thanks again!
The current algorithms for random.sample allocate considerable auxiliary memory to track which items have been used so far; with this pull request we sample in a maximally memory-efficient way. Performance is similar or, in some cases, better. See also https://github.com/ciphergoth/sansreplace
https://bugs.python.org/issue35094
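The loop quoted in the review appears to follow the pattern commonly known as reservoir sampling: only the `k`-slot result list is kept, and no record of used indices is needed. Below is a minimal standalone sketch of that pattern, written with the early `continue` suggested in the review. It is not the code from the PR: the function name `reservoir_sample` and the `rng` parameter are hypothetical, and it uses the public `randrange()` rather than the internal `randbelow` helper.

```python
import random


def reservoir_sample(population, k, rng=random):
    """Return k items chosen without replacement from an iterable, in one pass.

    Illustrative sketch only; `rng` may be the random module or any object
    with a randrange() method, such as a random.Random instance.
    """
    if k < 0:
        raise ValueError("sample size must be non-negative")
    result = [None] * k
    seen = 0
    for i, item in enumerate(population):
        seen = i + 1
        r = rng.randrange(i + 1)  # uniform integer in [0, i]
        if r >= k:
            continue  # this item is not selected
        if i < k:
            # Still filling the result: move the displaced entry to slot i,
            # which also leaves the first k items in random order.
            result[i] = result[r]
        result[r] = item
    if seen < k:
        raise ValueError("sample larger than population")
    return result


if __name__ == "__main__":
    print(reservoir_sample(range(1000), 5))
```

Apart from the result list, the loop holds only a few integers, so the extra memory does not grow with the population size. The trade-off is that every item of the population is visited once, which is consistent with the quoted code taking this path only for sets or when `k*2 >= n`.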