Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Added test to test_random.py testing Random.shuffle #60041

Closed
eng793 mannequin opened this issue Sep 1, 2012 · 17 comments
Closed

Added test to test_random.py testing Random.shuffle #60041

eng793 mannequin opened this issue Sep 1, 2012 · 17 comments
Labels
tests Tests in the Lib/test dir type-feature A feature request or enhancement

Comments

@eng793
Copy link
Mannequin

eng793 mannequin commented Sep 1, 2012

BPO 15837
Nosy @rhettinger, @pitrou, @ezio-melotti, @bitdancer, @serhiy-storchaka
Files
  • random.patch
  • Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.

    Show more details

    GitHub fields:

    assignee = None
    closed_at = <Date 2012-11-04.01:11:50.610>
    created_at = <Date 2012-09-01.02:06:18.538>
    labels = ['type-feature', 'tests']
    title = 'Added test to test_random.py testing Random.shuffle'
    updated_at = <Date 2012-11-04.01:11:50.599>
    user = 'https://bugs.python.org/eng793'

    bugs.python.org fields:

    activity = <Date 2012-11-04.01:11:50.599>
    actor = 'pitrou'
    assignee = 'none'
    closed = True
    closed_date = <Date 2012-11-04.01:11:50.610>
    closer = 'pitrou'
    components = ['Tests']
    creation = <Date 2012-09-01.02:06:18.538>
    creator = 'eng793'
    dependencies = []
    files = ['27082']
    hgrepos = []
    issue_num = 15837
    keywords = ['patch']
    message_count = 17.0
    messages = ['169603', '169605', '169614', '169615', '169790', '169806', '169809', '169810', '169811', '169812', '169813', '169814', '169854', '172166', '172234', '174730', '174731']
    nosy_count = 7.0
    nosy_names = ['rhettinger', 'pitrou', 'ezio.melotti', 'r.david.murray', 'python-dev', 'serhiy.storchaka', 'eng793']
    pr_nums = []
    priority = 'normal'
    resolution = 'fixed'
    stage = 'resolved'
    status = 'closed'
    superseder = None
    type = 'enhancement'
    url = 'https://bugs.python.org/issue15837'
    versions = ['Python 3.4']

    @eng793
    Copy link
    Mannequin Author

    eng793 mannequin commented Sep 1, 2012

    Random.shuffle does not have a test in test_random.py; the attached patch adds this test. In addition, I rewrote the documentation string for Random.shuffle, which apparently did not reflect recent changes in the code and was inconsistent with the definition of the method. This change is also part of this patch; I was not sure if this merited a separate issue, so I just included this here.

    On a related matter: in Random.shuffle there is a third optional argument which looks very odd to me:

    def shuffle(self, x, random=None, int=int):
    ....

    Besides being confusing to a user typing help(shuffle), what the "int" argument does in shuffle is to convert a float to an integer. But one could pass any one-argument function in principle, and it would be then very hard to predict what shuffle would do... it would not "shuffle" any more in the traditional sense - not with a uniform probability distribution. In summary, I don't see how that argument could be useful, although the people who wrote the library must have had something in mind... if so it would be a good idea to document it.

    @eng793 eng793 mannequin added tests Tests in the Lib/test dir type-feature A feature request or enhancement labels Sep 1, 2012
    @bitdancer
    Copy link
    Member

    The patch seems to be missing.

    The int=int is probably some sort of micro-optimization and perhaps should be removed.

    @serhiy-storchaka
    Copy link
    Member

    The int=int is probably some sort of micro-optimization and perhaps should be removed.

    Agree, this micro-optimization has no effect here.

    @eng793
    Copy link
    Mannequin Author

    eng793 mannequin commented Sep 1, 2012

    Sorry, here it is the patch.

    @eng793
    Copy link
    Mannequin Author

    eng793 mannequin commented Sep 3, 2012

    Comparing the execution time with and without the int=int argument of this command:

    amoura@amoura-laptop:~/cpython$ time ./python -c "from random import shuffle; lst=list(range(1000000)); shuffle(lst); print (len(lst))"

    I get with int=int:

    real 0m13.755s
    user 0m13.777s
    sys 0m0.124s

    and without it:

    real 0m13.876s
    user 0m13.701s
    sys 0m0.116s

    So it makes no difference in practice. On the other hand, removing this has a chance of braking existing code, if someone somewhere actually uses the third argument for something - I can't image what, but still...

    @serhiy-storchaka
    Copy link
    Member

    Third parameter (int) plays a role only in the presence of a second one (random).

    @bitdancer
    Copy link
    Member

    No, it always has an effect. It means that the name 'int' is bound in locals instead of being looked up via globals. That is what makes it a micro-optimization (LOAD_FAST vs LOAD_GLOBAL, if you do a dis on the two variants).

    @bitdancer
    Copy link
    Member

    Oh, I see what you are saying. The lookup of int is only done if random is not None. Yes, that is true.

    @bitdancer
    Copy link
    Member

    If the optimization is actually useful, it can be preserved by just putting 'int=int' (with an 'optimization' comment :) before the loop.

    @eng793
    Copy link
    Mannequin Author

    eng793 mannequin commented Sep 3, 2012

    The int=int still makes no difference, but if the second argument is set to random.random, we get a big speedup, regardless of whether the third argument is there:

    without int=int:

    amoura@amoura-laptop:~/cpython$ time ./python -c "import random; lst=list(range(1000000)); random.shuffle(lst,random.random); print (len(lst))"
    1000000

    real 0m7.082s
    user 0m6.952s
    sys 0m0.116s

    With int=int:

    amoura@amoura-laptop:~/cpython$ time ./python -c "import random; lst=list(range(1000000)); random.shuffle(lst,random.random); print (len(lst))"
    1000000

    real 0m7.281s
    user 0m7.156s
    sys 0m0.100s

    Without second argument:

    amoura@amoura-laptop:~/cpython$ time ./python -c "import random; lst=list(range(1000000)); random.shuffle(lst); print (len(lst))"
    1000000

    real 0m13.783s
    user 0m13.609s
    sys 0m0.108s

    This could be because of the many tests of whether the 2nd argument is None in the loop.

    @serhiy-storchaka
    Copy link
    Member

    This could be because of the many tests of whether the 2nd argument is None
    in the loop.

    This is because Random._randbelow (and therefore randrange, randint) is
    relatively slow.

    @eng793
    Copy link
    Mannequin Author

    eng793 mannequin commented Sep 3, 2012

    Yup. This is the result of simply eliminating the condition in the loop and just using the second argument (for the purposes of testing this only):

    amoura@amoura-laptop:~/cpython$ time ./python -c "import random; lst=list(range(1000000)); random.shuffle(lst,random.random); print (len(lst))"
    1000000

    real 0m7.330s
    user 0m7.148s
    sys 0m0.092s

    @rhettinger rhettinger self-assigned this Sep 5, 2012
    @rhettinger
    Copy link
    Contributor

    The patch look fine as-is and it can be applied in 3.4. (BTW, I miss having a Resolution status of Accepted, meaning that the patch passed review and is ready to apply).

    FWIW, I'll remove the int=int optimization in Py3.4. It doesn't provide much benefit anymore.

    @ezio-melotti
    Copy link
    Member

    I left a review on rietveld.

    FWIW these are the results of the tests using timeit:

    # with int=int
    $ ./python -m timeit -s 'from random import random, shuffle; lst = list(range(100000))' 'shuffle(lst, random)'
    10 loops, best of 3: 507 msec per loop

    # without int=int
    $ ./python -m timeit -s 'from random import random, shuffle; lst = list(range(100000))' 'shuffle(lst, random)'
    10 loops, best of 3: 539 msec per loop

    @serhiy-storchaka
    Copy link
    Member

    I am not sure that None as default should be documented. It's implementation details (as third "int" argument) and can be silently changed in future versions.

    @python-dev
    Copy link
    Mannequin

    python-dev mannequin commented Nov 4, 2012

    New changeset 58776cc74e89 by Antoine Pitrou in branch 'default':
    Issue bpo-15837: add some tests for random.shuffle().
    http://hg.python.org/cpython/rev/58776cc74e89

    @pitrou
    Copy link
    Member

    pitrou commented Nov 4, 2012

    I've committed the patch, thank you Alessandro.

    @pitrou pitrou closed this as completed Nov 4, 2012
    @ezio-melotti ezio-melotti transferred this issue from another repository Apr 10, 2022
    Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
    Labels
    tests Tests in the Lib/test dir type-feature A feature request or enhancement
    Projects
    None yet
    Development

    No branches or pull requests

    5 participants