Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

struct.pack fails first time with unicode fmt #63298

Closed
miwa mannequin opened this issue Sep 27, 2013 · 15 comments
Closed

struct.pack fails first time with unicode fmt #63298

miwa mannequin opened this issue Sep 27, 2013 · 15 comments
Assignees
Labels
topic-unicode type-feature A feature request or enhancement

Comments

@miwa
Copy link
Mannequin

miwa mannequin commented Sep 27, 2013

BPO 19099
Nosy @mdickinson, @pitrou, @ezio-melotti, @meadori, @serhiy-storchaka, @vajrasky
Files
  • handle_ascii_range_unicode_in_struct_pack.patch
  • handle_ascii_range_unicode_in_struct_pack_v2.patch
  • struct_unicode_fmt.patch
  • fix_error_message_struct_Struct.patch
  • Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.

    Show more details

    GitHub fields:

    assignee = 'https://github.com/serhiy-storchaka'
    closed_at = <Date 2013-12-08.15:48:19.060>
    created_at = <Date 2013-09-27.08:28:07.051>
    labels = ['type-feature', 'expert-unicode']
    title = 'struct.pack fails first time with unicode fmt'
    updated_at = <Date 2013-12-15.09:31:39.993>
    user = 'https://bugs.python.org/miwa'

    bugs.python.org fields:

    activity = <Date 2013-12-15.09:31:39.993>
    actor = 'vajrasky'
    assignee = 'serhiy.storchaka'
    closed = True
    closed_date = <Date 2013-12-08.15:48:19.060>
    closer = 'serhiy.storchaka'
    components = ['Unicode']
    creation = <Date 2013-09-27.08:28:07.051>
    creator = 'miwa'
    dependencies = []
    files = ['31922', '31930', '31931', '33126']
    hgrepos = []
    issue_num = 19099
    keywords = ['patch']
    message_count = 15.0
    messages = ['198464', '198708', '198714', '198716', '198747', '198749', '198754', '198758', '198759', '198769', '203063', '205571', '205572', '206165', '206220']
    nosy_count = 8.0
    nosy_names = ['mark.dickinson', 'pitrou', 'ezio.melotti', 'miwa', 'meador.inge', 'python-dev', 'serhiy.storchaka', 'vajrasky']
    pr_nums = []
    priority = 'normal'
    resolution = 'fixed'
    stage = 'resolved'
    status = 'closed'
    superseder = None
    type = 'enhancement'
    url = 'https://bugs.python.org/issue19099'
    versions = ['Python 2.7']

    @miwa
    Copy link
    Mannequin Author

    miwa mannequin commented Sep 27, 2013

    C:\>python
    Python 2.7.5 (default, May 15 2013, 22:44:16) [MSC v.1500 64 bit AMD64)] on win32
    Type "help", "copyright", "credits" or "license" for more information.
    >>> import struct
    >>> struct.pack(u'B',1)
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
    TypeError: Struct() argument 1 must be string, not unicode
    >>> struct.pack(u'B',1)
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
    TypeError: Struct() argument 1 must be string, not unicode
    >>> struct.pack('B',1) # this is ok
    '\x01'
    >>> struct.pack(u'B',1)
    '\x01'

    @miwa miwa mannequin added type-bug An unexpected behavior, bug, or error topic-unicode labels Sep 27, 2013
    @vajrasky
    Copy link
    Mannequin

    vajrasky mannequin commented Sep 30, 2013

    Here's the preliminary patch. I am assuming that we should accept unicode argument not reject it straight away.

    Python3 does that.
    >>> import struct
    >>> struct.pack('b', 3)
    b'\x03'
    >>> struct.pack(b'b', 3)
    b'\x03'
    >>> struct.pack(b'\xff', 3)
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
    struct.error: bad char in struct format

    @serhiy-storchaka
    Copy link
    Member

    Struct constructor accepts only str and not unicode. But struct.pack() uses caching and it found Struct('B') in the cache (because u'B' and 'B' are equal and have same hash).

    I doubt we should fix this. Adding support of Unicode argument is new feature.

    @pitrou
    Copy link
    Member

    pitrou commented Sep 30, 2013

    Either way, the runtime inconsistency is a bug. Since we shouldn't break existing code, I would vote for always allowing unicode format strings, rather than always disallowing them.

    Another argument is that str and unicode are generally substituible in 2.x when they are pure ASCII (which they are here).

    @miwa
    Copy link
    Mannequin Author

    miwa mannequin commented Sep 30, 2013

    Thanks for feedback.

    I think it should be fixed with allowing unicode.
    "from __future__ import unicode_literals" may break existing code.

    @vajrasky
    Copy link
    Mannequin

    vajrasky mannequin commented Oct 1, 2013

    Refactor test to clear the cache before using unicode format.

    @serhiy-storchaka
    Copy link
    Member

    struct.Struct() should be changed instead of struct.pack(). Here is a patch.

    @serhiy-storchaka serhiy-storchaka added type-feature A feature request or enhancement and removed type-bug An unexpected behavior, bug, or error labels Oct 1, 2013
    @vajrasky
    Copy link
    Mannequin

    vajrasky mannequin commented Oct 1, 2013

    Serhiy, you don't want to clear the cache in the test to simulate the bug?

    struct._clearcache()

    Other than that, should we raise struct.error instead of ValueError? Right now, the current behaviour in python 2.7 is:

    >>> struct.pack('\x80', 3)
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
    struct.error: bad char in struct format

    But you are right. Struct() should have been changed instead of struct.pack.

    @vajrasky
    Copy link
    Mannequin

    vajrasky mannequin commented Oct 1, 2013

    Nevermind about my comment about clearing cache. It only happens if we use struct.pack not struct.Struct.

    @serhiy-storchaka
    Copy link
    Member

    Other than that, should we raise struct.error instead of ValueError?

    Python 3 raises UnicodeEncodeError. And Python 2 raises UnicodeEncodeError when coerce non-ASCII unicode to str.

    @serhiy-storchaka
    Copy link
    Member

    Any comments?

    @serhiy-storchaka serhiy-storchaka self-assigned this Nov 16, 2013
    @python-dev
    Copy link
    Mannequin

    python-dev mannequin commented Dec 8, 2013

    New changeset 42d3afd29460 by Serhiy Storchaka in branch '2.7':
    Issue bpo-19099: The struct module now supports Unicode format strings.
    http://hg.python.org/cpython/rev/42d3afd29460

    @serhiy-storchaka
    Copy link
    Member

    Fixed. Thank you Musashi for your report.

    @vajrasky
    Copy link
    Mannequin

    vajrasky mannequin commented Dec 14, 2013

    Okay, I think the error message can be improved because in Python 2.7 we differentiate very clearly the string from the unicode.

    >>> import struct
    >>> struct.Struct(1)
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
    TypeError: Struct() argument 1 must be string, not int

    But you can give unicode, right?

    >>> struct.Struct(u'b')
    <Struct object at 0x1f484b8>

    This is consistent with other example:

    >>> " cutecat ".strip(1)
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
    TypeError: strip arg must be None, str or unicode

    What do you say, Serhiy?

    Here is the patch.

    @vajrasky
    Copy link
    Mannequin

    vajrasky mannequin commented Dec 15, 2013

    Nevermind, I already created this issue. http://bugs.python.org/issue19985

    @ezio-melotti ezio-melotti transferred this issue from another repository Apr 10, 2022
    Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
    Labels
    topic-unicode type-feature A feature request or enhancement
    Projects
    None yet
    Development

    No branches or pull requests

    2 participants