error in unicodedata.numeric(u"\u2187") and 2188 #50632

Closed
vernondcole mannequin opened this issue Jun 30, 2009 · 8 comments
Labels
topic-unicode type-bug An unexpected behavior, bug, or error

Comments

@vernondcole
Mannequin

vernondcole mannequin commented Jun 30, 2009

BPO 6383
Nosy @loewis, @amauryfa, @benjaminp, @ezio-melotti
Superseder
  • bpo-1571184: Generate numeric/space/linebreak from Unicode database.
  • Files
  • unicode_tonumeric.patch
  • unicode-tonumeric-2.patch
  • unnamed
  • Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.

    GitHub fields:

    assignee = None
    closed_at = <Date 2009-06-30.21:29:30.614>
    created_at = <Date 2009-06-30.02:43:02.705>
    labels = ['type-bug', 'expert-unicode']
    title = 'error in unicodedata.numeric(u"\\u2187") and 2188'
    updated_at = <Date 2009-06-30.21:29:30.612>
    user = 'https://bugs.python.org/vernondcole'

    bugs.python.org fields:

    activity = <Date 2009-06-30.21:29:30.612>
    actor = 'amaury.forgeotdarc'
    assignee = 'none'
    closed = True
    closed_date = <Date 2009-06-30.21:29:30.614>
    closer = 'amaury.forgeotdarc'
    components = ['Unicode']
    creation = <Date 2009-06-30.02:43:02.705>
    creator = 'vernondcole'
    dependencies = []
    files = ['14404', '14405', '14409']
    hgrepos = []
    issue_num = 6383
    keywords = ['patch', 'needs review']
    message_count = 8.0
    messages = ['89899', '89911', '89913', '89919', '89926', '89946', '89947', '89949']
    nosy_count = 5.0
    nosy_names = ['loewis', 'amaury.forgeotdarc', 'benjamin.peterson', 'ezio.melotti', 'vernondcole']
    pr_nums = []
    priority = 'normal'
    resolution = 'duplicate'
    stage = None
    status = 'closed'
    superseder = '1571184'
    type = 'behavior'
    url = 'https://bugs.python.org/issue6383'
    versions = ['Python 2.6', 'Python 3.0', 'Python 3.1', 'Python 2.7']

    @vernondcole
    Mannequin Author

    vernondcole mannequin commented Jun 30, 2009

    I am writing a demo program: a subclass of int that partially
    implements PEP 313 (Roman numeral literals).

    I discovered that my conversion routines fail for values > 50000 due to
    an error in unicodedata for the two code points U+2187 and U+2188. The
    return value of unicodedata.numeric() for those two code points should be
    50,000.0 and 100,000.0 respectively. See the console dump below, which
    also includes code point U+2181, which works correctly.

    ----- console dump follows -----

    c:\BZR\roman>c:\python26\python.exe
    Python 2.6.2 (r262:71605, Apr 14 2009, 22:40:02) [MSC v.1500 32 bit (Intel)] on win32
    Type "help", "copyright", "credits" or "license" for more information.
    >>> import unicodedata
    >>> unicodedata.name(u"\u2187")
    'ROMAN NUMERAL FIFTY THOUSAND'
    >>> unicodedata.numeric(u"\u2187")
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
    ValueError: not a numeric character
    >>> unicodedata.name(u"\u2188")
    'ROMAN NUMERAL ONE HUNDRED THOUSAND'
    >>> unicodedata.numeric(u"\u2188")
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
    ValueError: not a numeric character
    >>> unicodedata.name(u"\u2181")
    'ROMAN NUMERAL FIVE THOUSAND'
    >>> unicodedata.numeric(u"\u2181")
    5000.0
    >>>
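
For anyone who hits the same ValueError on an affected interpreter, here is a minimal workaround sketch. It is written in Python 3 syntax and is not part of any patch on this issue. The helper name `roman_numeric` and the table `_FALLBACK_NUMERIC` are hypothetical; the two fallback values are the ones this report says the code points should have.

```python
import unicodedata

# Hypothetical fallback table: the values are the ones this report says
# U+2187 and U+2188 should have.
_FALLBACK_NUMERIC = {
    "\u2187": 50000.0,   # ROMAN NUMERAL FIFTY THOUSAND
    "\u2188": 100000.0,  # ROMAN NUMERAL ONE HUNDRED THOUSAND
}


def roman_numeric(ch):
    """Return the numeric value of ch, consulting the fallback table if needed."""
    # Passing a default makes numeric() return it instead of raising ValueError.
    value = unicodedata.numeric(ch, None)
    if value is None:
        value = _FALLBACK_NUMERIC.get(ch)
    if value is None:
        raise ValueError("not a numeric character: %r" % ch)
    return value
```

On an interpreter whose tables already contain the values, the fallback table is simply never consulted, so the wrapper should be safe to leave in place.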

    @vernondcole vernondcole mannequin added the topic-unicode and type-bug labels Jun 30, 2009

    @ezio-melotti
    Member

    Python 2.6 and all later versions use version 5.1.0 of the Unicode
    database (see unicodedata.unidata_version).

    The database contains a numeric value for every code point from U+2185
    to U+2188 (inclusive), so the problem is not in the database itself.
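
A quick way to check this on a given interpreter is sketched below, in Python 3 syntax. The helper name `report_numeric` is made up for illustration. It maps each code point in the range to whatever unicodedata.numeric() reports, using None where the interpreter has no value.

```python
import unicodedata


def report_numeric(first=0x2185, last=0x2188):
    """Map each code point in [first, last] to its numeric value, or None."""
    results = {}
    for cp in range(first, last + 1):
        # The second argument is returned instead of raising ValueError.
        results[cp] = unicodedata.numeric(chr(cp), None)
    return results


if __name__ == "__main__":
    print("Unicode database version:", unicodedata.unidata_version)
    for cp, value in report_numeric().items():
        name = unicodedata.name(chr(cp), "<unnamed>")
        print("U+%04X %s -> %r" % (cp, name, value))
```

A None in the output for a code point that the database does assign a numeric value to would point at the lookup code rather than the data.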

    @amauryfa
    Member

    The _PyUnicode_ToNumeric() function was out of sync with the Unicode
    database.
    Here is a new version of this function, together with the script that
    generates its code.
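
To illustrate the general idea of generating such a function from the database, here is a rough sketch. It is not the attached patch or its script. It assumes the semicolon-separated UnicodeData.txt layout, where field 0 is the code point in hex and field 8 is the numeric value (possibly a fraction such as 1/2). The function names `numeric_table` and `emit_switch`, the sample lines, and the shape of the emitted C are all illustrative assumptions.

```python
from collections import defaultdict
from fractions import Fraction

# A few sample lines in UnicodeData.txt format (illustrative subset only).
SAMPLE = """\
0041;LATIN CAPITAL LETTER A;Lu;0;L;;;;;N;;;;0061;
00BD;VULGAR FRACTION ONE HALF;No;0;ON;<fraction> 0031 2044 0032;;;1/2;N;FRACTION ONE HALF;;;;
2181;ROMAN NUMERAL FIVE THOUSAND;Nl;0;L;;;;5000;N;;;;;
2187;ROMAN NUMERAL FIFTY THOUSAND;Nl;0;L;;;;50000;N;;;;;
2188;ROMAN NUMERAL ONE HUNDRED THOUSAND;Nl;0;L;;;;100000;N;;;;;
"""


def numeric_table(lines):
    """Group code points by the numeric value found in field 8."""
    table = defaultdict(list)
    for line in lines:
        fields = line.strip().split(";")
        if len(fields) < 9 or not fields[8]:
            continue  # no numeric value for this code point
        table[Fraction(fields[8])].append(int(fields[0], 16))
    return dict(table)


def emit_switch(table):
    """Render the table as the body of a C switch statement (hypothetical shape)."""
    out = ["switch (ch) {"]
    for value in sorted(table):
        for cp in sorted(table[value]):
            out.append("case 0x%04X:" % cp)
        out.append("    return (double) %d / %d;"
                   % (value.numerator, value.denominator))
    out.append("}")
    out.append("return -1.0;")
    return "\n".join(out)


if __name__ == "__main__":
    print(emit_switch(numeric_table(SAMPLE.splitlines())))
```

Generating the cases from the data file, rather than maintaining them by hand, is what keeps the function from drifting out of sync when the database version changes.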

    @benjaminp
    Contributor

    Wouldn't it make more sense to move this into unicode_db.h?

    @amauryfa
    Member

    Right. Actually, unicodetype_db.h is the one included in unicodectype.c,
    so I moved my script into makeunicodedata.py.

    Here is a new patch. The code generated for _PyUnicode_ToNumeric is the
    same as before (except for some tabs), see the old patch if you want to
    check the actual changes in the function.

    @vernondcole
    Mannequin Author

    vernondcole mannequin commented Jun 30, 2009

    Wow! Quick response! My outstanding bug on IronPython has been hanging out
    there since August of last year.
    I don't really want to try compiling the standard library on my laptop,
    but I do want to fully test my code soon. What is the first place I can
    expect to see this in binary form? 3.2 alpha?

    Vernon


    @loewis
    Mannequin

    loewis mannequin commented Jun 30, 2009

    Notice that this is a duplicate of the longstanding bpo-1571184, which
    has a patch that is more comprehensive than the one proposed here. So
    rather than accepting Amaury's patch, I'd prefer to see Anders' patch
    reviewed, and revised as necessary.

    @amauryfa
    Member

    Yes, my patch is entirely contained in the one from bpo-1571184.
    I am marking this one as a duplicate, and will review and update the other.

    @ezio-melotti ezio-melotti transferred this issue from another repository Apr 10, 2022