statistics.median does not work with ordinal scale, add doc #77754

Closed
WdeW mannequin opened this issue May 18, 2018 · 10 comments
Labels
3.7 (EOL): end of life · 3.8: only security fixes · docs: Documentation in the Doc dir · stdlib: Python modules in the Lib dir · type-feature: A feature request or enhancement

Comments

@WdeW
Mannequin

WdeW mannequin commented May 18, 2018

BPO 33573
Nosy @terryjreedy, @taleinat, @stevendaprano
PRs
  • bpo-33573: improve docs to suggest statistics.median() alternatives for non-numeric data #7587
  • [3.7] bpo-33573: docs to suggest median() alternatives for non-numeric data (GH-7587) #7906
  • [3.6] bpo-33573: docs to suggest median() alternatives for non-numeric data (GH-7587) #7907
Files
  • testMedian.py: simple demonstration of failure
Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.


    GitHub fields:

    assignee = None
    closed_at = <Date 2018-06-25.11:27:42.421>
    created_at = <Date 2018-05-18.19:29:45.729>
    labels = ['3.7', '3.8', 'type-feature', 'library', 'docs']
    title = 'statistics.median does not work with ordinal scale, add doc'
    updated_at = <Date 2018-06-25.11:27:42.421>
    user = 'https://bugs.python.org/WdeW'

    bugs.python.org fields:

    activity = <Date 2018-06-25.11:27:42.421>
    actor = 'taleinat'
    assignee = 'docs@python'
    closed = True
    closed_date = <Date 2018-06-25.11:27:42.421>
    closer = 'taleinat'
    components = ['Documentation', 'Library (Lib)']
    creation = <Date 2018-05-18.19:29:45.729>
    creator = 'W deW'
    dependencies = []
    files = ['47601']
    hgrepos = []
    issue_num = 33573
    keywords = ['patch']
    message_count = 10.0
    messages = ['317048', '317120', '317122', '317125', '317248', '317694', '319219', '320414', '320416', '320417']
    nosy_count = 5.0
    nosy_names = ['terry.reedy', 'taleinat', 'steven.daprano', 'docs@python', 'W deW']
    pr_nums = ['7587', '7906', '7907']
    priority = 'normal'
    resolution = 'fixed'
    stage = 'resolved'
    status = 'closed'
    superseder = None
    type = 'enhancement'
    url = 'https://bugs.python.org/issue33573'
    versions = ['Python 3.6', 'Python 3.7', 'Python 3.8']

    @WdeW
    Mannequin Author

    WdeW mannequin commented May 18, 2018

    The 0.5-quantile, or median, is defined for ordinal, interval, and ratio scales. An enumeration derived from Enum and extended with rich comparison methods implements an ordinal scale. Therefore, calculating the median over a list of such enum members ought to be possible.

    The current implementation tries to interpolate the median value by averaging the two middle observations. This is allowed for interval and ratio scales, but since the interpolation involves an addition, it is not allowed for ordinal scales. Although it is computationally possible to do this for numeric ordinal variables, logically it is nonsense, because the distance between ordinal values is, by definition, unknown. For non-numeric ordinal values it is not even computationally possible.
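A minimal sketch of the two cases described above. The `Grade` enum and the sample data are hypothetical, made up for illustration (this is not the attached testMedian.py); only `statistics.median`, `enum.Enum`, and `functools.total_ordering` are real standard-library names.

```python
import statistics
from enum import Enum
from functools import total_ordering


@total_ordering
class Grade(Enum):
    # Hypothetical ordinal scale: members can be ordered, but no
    # arithmetic ("+", "/") is defined on them.
    POOR = 1
    FAIR = 2
    GOOD = 3
    EXCELLENT = 4

    def __lt__(self, other):
        if self.__class__ is other.__class__:
            return self.value < other.value
        return NotImplemented


# Numeric ordinal codes (for example survey ratings): median() averages
# the two middle values and returns a number that was never observed.
ratings = [1, 2, 4, 5]
interpolated = statistics.median(ratings)  # 3.0, not present in ratings

# Non-numeric ordinal values: the averaging step needs "+", which Grade
# does not define, so an even-length sample raises TypeError.
grades = [Grade.POOR, Grade.FAIR, Grade.GOOD, Grade.EXCELLENT]
try:
    statistics.median(grades)
    failed = False
except TypeError:
    failed = True

# An odd-length sample needs no averaging, so median() works here.
odd_median = statistics.median(grades[:3])
```

With this setup, only even-length samples of non-numeric values hit the error; odd-length samples return the middle member unchanged.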

    The correct return value would be: the first value in an ordered set where at least half the number of observations is less than or equal to it. This is observation[len(observation)//2] for odd and even length ordered lists of values.
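A small sketch for checking the index arithmetic in the paragraph above. Read literally, "the first value with at least half of the observations less than or equal to it" sits at index `(n - 1) // 2` of the sorted data, whereas `len(observation)//2` selects the upper of the two middle values when the length is even; the two agree for odd lengths. The helper names `lower_median` and `upper_median` are hypothetical and are compared against the real `statistics.median_low` and `statistics.median_high`.

```python
import statistics


def lower_median(data):
    """First sorted value with at least half the observations <= it."""
    ordered = sorted(data)
    if not ordered:
        raise ValueError("no median for empty data")
    return ordered[(len(ordered) - 1) // 2]


def upper_median(data):
    """Sorted value at index len // 2 (upper middle for even lengths)."""
    ordered = sorted(data)
    if not ordered:
        raise ValueError("no median for empty data")
    return ordered[len(ordered) // 2]


even = ["a", "b", "c", "d"]
odd = ["a", "b", "c"]

even_low, even_high = lower_median(even), upper_median(even)  # "b", "c"
odd_low, odd_high = lower_median(odd), upper_median(odd)      # "b", "b"
```

Only comparisons are used here, so the helpers work for any data that `sorted()` accepts.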

    Whether the same applies to interval and ratio scales is a matter of opinion. The currently implemented algorithm is definitely more popular these days.

    @WdeW WdeW mannequin added type-crash (A hard crash of the interpreter, possibly with a core dump) and stdlib (Python modules in the Lib dir) labels May 18, 2018
    @stevendaprano
    Member

    For ordinal scales, you should use either median_low or median_high.

    I don't think the standard median function ought to choose for you whether to take the low or high median. It is better to be explicit about which you want, by calling the relevant function, than for median to guess which one you need.
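As an illustration of that suggestion, here is a short sketch using a hypothetical `Severity` enum that defines only `<` (which is all `sorted()` needs). `statistics.median_low` and `statistics.median_high` are the real library functions; everything else is made up for the example.

```python
import statistics
from enum import Enum


class Severity(Enum):
    # Hypothetical ordinal scale with ordering but no arithmetic.
    LOW = 1
    MEDIUM = 2
    HIGH = 3
    CRITICAL = 4

    def __lt__(self, other):
        if self.__class__ is other.__class__:
            return self.value < other.value
        return NotImplemented


reports = [Severity.HIGH, Severity.LOW, Severity.CRITICAL, Severity.MEDIUM]

# Both functions return an actual member of the data, never an average.
low = statistics.median_low(reports)    # lower of the two middle values
high = statistics.median_high(reports)  # upper of the two middle values
```

The caller states explicitly which middle value is wanted, so nothing has to be guessed for even-length data.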

    @stevendaprano
    Member

    By the way, this isn't a crash (that's for things which cause the interpreter to segfault). I'm marking this as Not a bug, but I'm open to suggestions to improve either the documentation or the median functions.

    @stevendaprano stevendaprano added 3.7 (EOL) (end of life), type-bug (An unexpected behavior, bug, or error), and invalid labels and removed the type-crash (A hard crash of the interpreter, possibly with a core dump) label May 19, 2018
    @stevendaprano
    Member

    What do you think of adding a note in the documentation for median?

    "If your data is ordinal (supports order operations) but not numeric (doesn't support addition), you should use median_low or median_high instead."

    @WdeW
    Mannequin Author

    WdeW mannequin commented May 21, 2018

    Changing the documentation in this way seems to me an excellent and easy way to resolve the issue.

    @terryjreedy
    Member

    I agree.

    @terryjreedy terryjreedy added 3.8 (only security fixes) and docs (Documentation in the Doc dir) labels and removed the invalid label May 25, 2018
    @terryjreedy terryjreedy changed the title from "statistics.median does not work with ordinal scale" to "statistics.median does not work with ordinal scale, add doc" May 25, 2018
    @terryjreedy terryjreedy added the type-feature (A feature request or enhancement) label and removed the type-bug (An unexpected behavior, bug, or error) label May 25, 2018
    @taleinat
    Contributor

    PR ready for review.

    @taleinat
    Contributor

    New changeset fdd6e0b by Tal Einat in branch 'master':
    bpo-33573: docs to suggest median() alternatives for non-numeric data (GH-7587)
    fdd6e0b

    @taleinat
    Contributor

    New changeset 150cd3c by Tal Einat (Miss Islington (bot)) in branch '3.7':
    [3.7] bpo-33573: docs to suggest median() alternatives for non-numeric data (GH-7587) (GH-7906)
    150cd3c

    @taleinat
    Contributor

    New changeset 8fd8cfa by Tal Einat (Miss Islington (bot)) in branch '3.6':
    [3.6] bpo-33573: docs to suggest median() alternatives for non-numeric data (GH-7587) (GH-7907)
    8fd8cfa

    @ezio-melotti ezio-melotti transferred this issue from another repository Apr 10, 2022