create_unicode_buffer() fails on non-BMP strings on Windows #64064

Closed
gergelyerdelyi mannequin opened this issue Dec 2, 2013 · 9 comments
Labels: 3.7 (EOL), 3.8 (EOL), 3.9, topic-ctypes, type-bug

Comments

@gergelyerdelyi
Mannequin

gergelyerdelyi mannequin commented Dec 2, 2013

BPO 19865
Nosy @amauryfa, @abalkin, @vstinner, @meadori, @eryksun, @ZackerySpytz, @miss-islington
PRs
  • bpo-19865: ctypes.create_unicode_buffer() fails on non-BMP strings on Windows #14081
  • [3.7] bpo-19865: ctypes.create_unicode_buffer() supports non-BMP strings on Windows (GH-14081) #14087
  • [3.8] bpo-19865: ctypes.create_unicode_buffer() supports non-BMP strings on Windows (GH-14081) #14088
  • Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.


    GitHub fields:

    assignee = None
    closed_at = <Date 2019-06-14.16:54:21.044>
    created_at = <Date 2013-12-02.19:36:45.979>
    labels = ['3.8', 'ctypes', 'type-bug', '3.7', '3.9']
    title = 'create_unicode_buffer() fails on non-BMP strings on Windows'
    updated_at = <Date 2019-06-14.16:54:21.043>
    user = 'https://bugs.python.org/gergelyerdelyi'

    bugs.python.org fields:

    activity = <Date 2019-06-14.16:54:21.043>
    actor = 'vstinner'
    assignee = 'none'
    closed = True
    closed_date = <Date 2019-06-14.16:54:21.044>
    closer = 'vstinner'
    components = ['ctypes']
    creation = <Date 2013-12-02.19:36:45.979>
    creator = 'gergely.erdelyi'
    dependencies = []
    files = []
    hgrepos = []
    issue_num = 19865
    keywords = ['patch']
    message_count = 9.0
    messages = ['205045', '228405', '228424', '330583', '345596', '345601', '345609', '345610', '345611']
    nosy_count = 9.0
    nosy_names = ['amaury.forgeotdarc', 'belopolsky', 'vstinner', 'meador.inge', 'eryksun', 'gergely.erdelyi', 'ZackerySpytz', 'miss-islington', 'Leonard de Ruijter']
    pr_nums = ['14081', '14087', '14088']
    priority = 'normal'
    resolution = 'fixed'
    stage = 'resolved'
    status = 'closed'
    superseder = None
    type = 'behavior'
    url = 'https://bugs.python.org/issue19865'
    versions = ['Python 3.7', 'Python 3.8', 'Python 3.9']

    @gergelyerdelyi
    Mannequin Author

    gergelyerdelyi mannequin commented Dec 2, 2013

    create_unicode_buffer() fails on Windows if the initializer string contains Unicode code points outside of the Basic Multilingual Plane and an explicit length is not specified.

    The problem appears to be rooted in the fact that, since PEP 393, len() returns the number of code points, which does not always correspond to the number of 16-bit wchar_t code units needed to encode the string as UTF-16 on Windows. Because of that, the preallocated c_wchar buffer will be too short for the UTF-16 string.

    The following small snippet demonstrates the problem:

    from ctypes import create_unicode_buffer
    b = create_unicode_buffer("\U00028318\U00028319")
    print(b)

    File "c:\Python33\lib\ctypes\__init__.py", line 294, in create_unicode_buffer
    buf.value = init
    ValueError: string too long
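
The mismatch described above can be checked without ctypes. The following is a minimal sketch, not part of the original report; the helper name `utf16_units` is hypothetical and exists only for illustration. It compares the code point count returned by len() with the number of 16-bit code units the same string occupies in UTF-16.

```python
def utf16_units(s: str) -> int:
    """Return the number of 16-bit code units needed to encode s as UTF-16.

    Hypothetical helper for illustration. "surrogatepass" lets lone
    surrogates through so the count never raises on unusual input.
    """
    return len(s.encode("utf-16-le", "surrogatepass")) // 2


s = "\U00028318\U00028319"   # the two non-BMP characters from the report
code_points = len(s)         # what len() reports since PEP 393
units = utf16_units(s)       # what a 16-bit wchar_t buffer must hold

print(code_points, units)
```

Each non-BMP character needs a surrogate pair, so a buffer sized from len() plus one terminator is two elements short for this string when c_wchar is 16 bits wide.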

    @gergelyerdelyi gergelyerdelyi mannequin added the type-crash and topic-ctypes labels Dec 2, 2013
    @serhiy-storchaka serhiy-storchaka added the type-bug label and removed the type-crash label Dec 2, 2013
    @BreamoreBoy
    Mannequin

    BreamoreBoy mannequin commented Oct 3, 2014

    I can confirm that this problem still exists. Can someone take a look, please? Thanks.

    @eryksun
    Contributor

    eryksun commented Oct 4, 2014

    When sizeof(c_wchar) == 2, create_unicode_buffer() can just count the number of non-BMP ordinals in the string and add that to the buffer size. Another approach would be to use size = pythonapi.PyUnicode_AsWideChar(init, None, 0), but then the whole function may as well be implemented in the _ctypes extension module.
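
The first suggestion can be sketched roughly as follows. This is an illustration under stated assumptions, not necessarily the code that was eventually merged: the names `wchar_buffer_len` and `create_unicode_buffer_compat` are hypothetical, and only the str initializer case is handled. The PyUnicode_AsWideChar alternative is not shown.

```python
from ctypes import c_wchar, sizeof


def wchar_buffer_len(init: str) -> int:
    """Number of c_wchar elements needed for init plus a NUL terminator.

    Hypothetical helper. With a 16-bit c_wchar (as on Windows), every
    non-BMP code point takes a surrogate pair, i.e. one extra element.
    """
    size = len(init) + 1
    if sizeof(c_wchar) == 2:
        size += sum(ord(ch) > 0xFFFF for ch in init)
    return size


def create_unicode_buffer_compat(init: str):
    """Hypothetical stand-in for create_unicode_buffer(init) with a str."""
    buf = (c_wchar * wchar_buffer_len(init))()
    buf.value = init
    return buf


s = "\U00028318\U00028319"
buf = create_unicode_buffer_compat(s)
print(len(buf), buf.value == s)
```

Where c_wchar is 32 bits wide, the adjustment is skipped and the size is simply len(init) + 1, so the sketch behaves the same as the unpatched function on those platforms.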

    @LeonarddeRuijter
    Mannequin

    LeonarddeRuijter mannequin commented Nov 28, 2018

    I'm still able to reproduce this issue with ctypes under Python 3.7.0.

    @MojoVampire MojoVampire mannequin added the 3.7 (EOL) and 3.8 (EOL) labels Nov 28, 2018
    @ZackerySpytz
    Mannequin

    ZackerySpytz mannequin commented Jun 14, 2019

    I have created a pull request for this issue. Please take a look.

    @ZackerySpytz ZackerySpytz mannequin added the 3.9 label Jun 14, 2019
    @vstinner
    Member

    New changeset 9765efc by Victor Stinner (Zackery Spytz) in branch 'master':
    bpo-19865: ctypes.create_unicode_buffer() supports non-BMP strings on Windows (GH-14081)
    9765efc

    @miss-islington
    Contributor

    New changeset 0b592d5 by Miss Islington (bot) in branch '3.7':
    bpo-19865: ctypes.create_unicode_buffer() supports non-BMP strings on Windows (GH-14081)
    0b592d5

    @miss-islington
    Contributor

    New changeset b0f6fa8 by Miss Islington (bot) in branch '3.8':
    bpo-19865: ctypes.create_unicode_buffer() supports non-BMP strings on Windows (GH-14081)
    b0f6fa8

    @vstinner
    Member

    Thanks Zackery Spytz for the fix. Thanks Gergely Erdélyi for the bug report! Sorry for the long delay.

    @ezio-melotti ezio-melotti transferred this issue from another repository Apr 10, 2022