Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

random.choices has unexpected behavior with negative weights #81805

Closed
tewhalen mannequin opened this issue Jul 18, 2019 · 6 comments
Closed

random.choices has unexpected behavior with negative weights #81805

tewhalen mannequin opened this issue Jul 18, 2019 · 6 comments
Labels
3.7 (EOL) end of life docs Documentation in the Doc dir type-bug An unexpected behavior, bug, or error

Comments

@tewhalen
Copy link
Mannequin

tewhalen mannequin commented Jul 18, 2019

BPO 37624
Nosy @rhettinger, @mdickinson, @tewhalen, @aldwinaldwin
PRs
  • bpo-37624: random.choices has unexpected behavior with negative weights #14854
  • bpo-37624: Document weight assumptions for random.choices() #14855
  • [3.8] bpo-37624: Document weight assumptions for random.choices() (GH-14855) #14858
  • Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.

    Show more details

    GitHub fields:

    assignee = None
    closed_at = <Date 2019-07-19.09:27:50.105>
    created_at = <Date 2019-07-18.20:51:27.633>
    labels = ['type-bug', '3.7', 'docs']
    title = 'random.choices has unexpected behavior with negative weights'
    updated_at = <Date 2019-07-19.09:27:50.092>
    user = 'https://github.com/tewhalen'

    bugs.python.org fields:

    activity = <Date 2019-07-19.09:27:50.092>
    actor = 'rhettinger'
    assignee = 'docs@python'
    closed = True
    closed_date = <Date 2019-07-19.09:27:50.105>
    closer = 'rhettinger'
    components = ['Documentation']
    creation = <Date 2019-07-18.20:51:27.633>
    creator = 'Ted Whalen'
    dependencies = []
    files = []
    hgrepos = []
    issue_num = 37624
    keywords = ['patch']
    message_count = 6.0
    messages = ['348128', '348140', '348144', '348154', '348156', '348158']
    nosy_count = 5.0
    nosy_names = ['rhettinger', 'mark.dickinson', 'docs@python', 'Ted Whalen', 'aldwinaldwin']
    pr_nums = ['14854', '14855', '14858']
    priority = 'low'
    resolution = 'fixed'
    stage = 'resolved'
    status = 'closed'
    superseder = None
    type = 'behavior'
    url = 'https://bugs.python.org/issue37624'
    versions = ['Python 3.7']

    @tewhalen
    Copy link
    Mannequin Author

    tewhalen mannequin commented Jul 18, 2019

    The behavior of random.choices when negative weights are provided is unexpected. While giving a negative weight for one value is probably bad, it's really unfortunate that providing a negative weight for one value affects the probability of selecting an adjacent value.

    Throwing a ValueError exception when there was a use of negative weights was considered in bpo-31689, but at that time, there wasn't an example provided that failed silently.

    Note below that providing a weight of -1 for 'c' causes both 'c' and 'd' to drop out of the results.

    Python 3.7.2 (default, Jan 13 2019, 12:50:01)
    [Clang 10.0.0 (clang-1000.11.45.5)] on darwin
    Type "help", "copyright", "credits" or "license" for more information.
    >>> from collections import Counter
    >>> from random import choices
    >>> Counter(choices("abcdefg", weights=(1,1,-1,1,1,1,1), k=10000))
    Counter({'f': 2040, 'a': 2019, 'e': 2017, 'g': 2009, 'b': 1915})

    @tewhalen tewhalen mannequin added stdlib Python modules in the Lib dir 3.7 (EOL) end of life type-bug An unexpected behavior, bug, or error labels Jul 18, 2019
    @aldwinaldwin
    Copy link
    Mannequin

    aldwinaldwin mannequin commented Jul 19, 2019

    This is what happens with your weights:

    >>> list(itertools.accumulate(weights))
    [1, 2, 1, 2, 3, 3, 4]

    using bisect.bisect certain amount of times, will distribute on this:
    a<1<b<2<c<1<d<2<e<3<f<4<g

    random numbers between 1 and 2 will go to 1<'b'<2 and not 'd' that is also 1<'d'<2

    As discussed in bpo-31689, if no negative weights should be used, then a ValueError exception should be re-concidered.

    @rhettinger
    Copy link
    Contributor

    We can add a note the the docs that weights are assumed to be non-negative, but I don't want to add an extra O(n) step to check for unusual inputs with undefined meaning -- that would just impair the normal use cases for near zero benefit.

    @rhettinger rhettinger added docs Documentation in the Doc dir and removed stdlib Python modules in the Lib dir labels Jul 19, 2019
    @rhettinger
    Copy link
    Contributor

    New changeset 8dbe563 by Raymond Hettinger in branch 'master':
    bpo-37624: Document weight assumptions for random.choices() (GH-14855)
    8dbe563

    @rhettinger
    Copy link
    Contributor

    New changeset a50a622 by Raymond Hettinger (Miss Islington (bot)) in branch '3.8':
    bpo-37624: Document weight assumptions for random.choices() (GH-14855) (GH-14858)
    a50a622

    @rhettinger
    Copy link
    Contributor

    Misc side notes:

    • There is no expected behavior for negative a negative weight. Arguably, the only reasonable interpretation is what it already does (reduce the cumulative total).

    • Running simulations is a primary use case for choices(). Generally, running time there is important.

    • During the design phase, none of the other implementations studied had incorporated a scan of the inputs for negative weights.

    • bisect() does not check to make sure its inputs are sorted. The results for unsorted data are undefined. It is a documented precondition.

    Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
    Labels
    3.7 (EOL) end of life docs Documentation in the Doc dir type-bug An unexpected behavior, bug, or error
    Projects
    None yet
    Development

    No branches or pull requests

    1 participant