array: Add 'w' type and deprecate 'u' type. #80480

Closed
methane opened this issue Mar 15, 2019 · 22 comments
Labels
3.13 bugs and security fixes extension-modules C modules in the Modules dir type-feature A feature request or enhancement

Comments

@methane
Member

methane commented Mar 15, 2019

BPO 36299
Nosy @terryjreedy, @ncoghlan, @methane, @skrah, @serhiy-storchaka
PRs
  • bpo-36299: array('u') uses Py_UCS4 instead of Py_UNICODE  #12497
  Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.


    GitHub fields:

    assignee = None
    closed_at = None
    created_at = <Date 2019-03-15.05:50:02.657>
    labels = ['3.8', 'library']
    title = "array: Deprecate 'u' type in array module"
    updated_at = <Date 2020-04-23.00:47:43.731>
    user = 'https://github.com/methane'

    bugs.python.org fields:

    activity = <Date 2020-04-23.00:47:43.731>
    actor = 'methane'
    assignee = 'none'
    closed = False
    closed_date = None
    closer = None
    components = ['Library (Lib)']
    creation = <Date 2019-03-15.05:50:02.657>
    creator = 'methane'
    dependencies = []
    files = []
    hgrepos = []
    issue_num = 36299
    keywords = ['patch']
    message_count = 12.0
    messages = ['337967', '338031', '338595', '338598', '338607', '338608', '338609', '338610', '338611', '367000', '367044', '367065']
    nosy_count = 5.0
    nosy_names = ['terry.reedy', 'ncoghlan', 'methane', 'skrah', 'serhiy.storchaka']
    pr_nums = ['12497']
    priority = 'normal'
    resolution = None
    stage = None
    status = 'open'
    superseder = None
    type = None
    url = 'https://bugs.python.org/issue36299'
    versions = ['Python 3.8']


    @methane
    Member Author

    methane commented Mar 15, 2019

    The doc says:

    'u' will be removed together with the rest of the Py_UNICODE API.
    Deprecated since version 3.3, will be removed in version 4.0.
    https://docs.python.org/3/library/array.html

    But no DeprecationWarning is raised yet. Let's raise one.

    • 3.8 -- PendingDeprecationWarning
    • 3.9 -- DeprecationWarning
    • 4.0 or 3.10 -- Remove it.
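The staged plan above can be sketched as a version check that picks the warning category. This is a hypothetical helper (`check_u_typecode` is not a CPython function), and the version thresholds are the ones proposed in this comment; the deprecation that eventually shipped followed a different timeline.

```python
import warnings

# Hypothetical thresholds taken from the proposal above, not from CPython.
PENDING_FROM = (3, 8)
DEPRECATED_FROM = (3, 9)
REMOVED_IN = (3, 10)


def check_u_typecode(version):
    """Warn (or fail) for the 'u' type code according to the staged plan."""
    if version >= REMOVED_IN:
        raise ValueError("the 'u' type code has been removed")
    if version >= DEPRECATED_FROM:
        category = DeprecationWarning
    elif version >= PENDING_FROM:
        category = PendingDeprecationWarning  # hidden by default filters
    else:
        return  # before 3.8: no warning at all
    warnings.warn("array type code 'u' is deprecated", category, stacklevel=2)
```

PendingDeprecationWarning is ignored by the default warning filters, which is why it suits a first, quiet stage before the visible DeprecationWarning.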

    @methane methane added 3.8 only security fixes stdlib Python modules in the Lib dir labels Mar 15, 2019
    @terryjreedy
    Member

    '4.0' is a stand-in for 'sometime after 2.7.final', scheduled for Jan 2020. A PendingDeprecationWarning in 3.8.0, scheduled for Oct 2019, seems reasonable to me. Perhaps we should have a python-dev discussion about the general issue of post-2.7 removals of already deprecated items.

    @methane
    Copy link
    Member Author

    methane commented Mar 22, 2019

    https://mail.python.org/pipermail/python-dev/2019-March/156807.html

    We may be able to change 'u' from wchar_t to int32_t and un-deprecate it.

    @methane
    Copy link
    Member Author

    methane commented Mar 22, 2019

    I found that converting Py_UNICODE to Py_UCS4 had already been done once and then reverted.
    ref: https://bugs.python.org/issue13072

    @methane methane changed the title Deprecate 'u' type in array module array: Deprecate 'u' type in array module Mar 22, 2019
    @skrah
    Mannequin

    skrah mannequin commented Mar 22, 2019

    I think the problem is still whether to use 'u' == UCS2 and 'w' == UCS4 like in PEP-3118.

    For the project I'm currently working on I'd need these for buffer exports:

    >>> from xnd import *
    >>> x = xnd(["abc", "xyz"], dtype="fixed_string(10, 'utf16')")
    >>> y = xnd(["abc", "xyz"], dtype="fixed_string(10, 'utf32')")
    >>> 
    >>> memoryview(x)
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
    ValueError: type is not supported by the buffer protocol

    The use case is not an array that represents a single utf16 string, but
    an array *of* fixed strings with different encodings.

    So x would be exported with format 'u' and y with format 'w'.

    @skrah
    Mannequin

    skrah mannequin commented Mar 22, 2019

    Just to demonstrate what the format would look like, this is working
    for an array of fixed bytes:

    >>> x = xnd([b"123", b"23456"], dtype="fixed_bytes(size=10)")
    >>> memoryview(x).format
    '10s'

    So the formats in the previous message would be '10u' and '10w'.
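To make the '10u' / '10w' idea concrete, here is a minimal sketch of how one such fixed-string item could be laid out, next to the real struct '10s' format. `pack_fixed_string` is hypothetical. It assumes PEP 3118's 2-byte 'u' and 4-byte 'w' code units and little-endian storage, and it encodes 'u' as UTF-16, which matches UCS-2 only for BMP text. The struct module itself accepts neither 'u' nor 'w'.

```python
import struct

# Bytes per code unit for the string codes proposed in PEP 3118
# ('u' = UCS-2, 'w' = UCS-4). struct does not implement these codes.
CODE_UNIT_SIZE = {"u": 2, "w": 4}
# Assumption: little-endian storage; a real exporter would follow the
# byte order declared in the buffer's format string.
ENCODING = {"u": "utf-16-le", "w": "utf-32-le"}


def pack_fixed_string(s, count, code):
    """Pack s as one '<count><code>' item, zero padded like struct's 's'."""
    data = s.encode(ENCODING[code])
    width = count * CODE_UNIT_SIZE[code]
    if len(data) > width:
        raise ValueError(f"{s!r} does not fit in {count}{code}")
    return data.ljust(width, b"\0")


# For comparison, the bytes format that already works: one 10-byte item.
item = struct.pack("10s", b"123")
```

Under these assumptions a '10u' item occupies 20 bytes and a '10w' item 40 bytes, just as a '10s' item occupies 10.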

    @serhiy-storchaka
    Member

    array('u') is not tied to the legacy Unicode C API. It can use the modern wchar_t-based Unicode C API instead. See bpo-36346.

    There are benefits to getting rid of the legacy Unicode C API, but not to getting rid of array('u').
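A quick way to see the wchar_t dependence: array('u').itemsize tracks sizeof(wchar_t), which ctypes exposes as ctypes.sizeof(ctypes.c_wchar). This is a sketch that assumes ctypes is available. The helper name `u_itemsize` is illustrative, and the helper silences the DeprecationWarning that newer Pythons emit for 'u'.

```python
import array
import ctypes
import warnings


def u_itemsize():
    """Return array('u').itemsize, or None if this Python has no 'u' code."""
    if "u" not in array.typecodes:
        return None
    with warnings.catch_warnings():
        # Python 3.13+ emits DeprecationWarning for 'u'; silence it here.
        warnings.simplefilter("ignore", DeprecationWarning)
        return array.array("u").itemsize


# sizeof(wchar_t): 2 on Windows, 4 on most Unix-like systems.
WCHAR_SIZE = ctypes.sizeof(ctypes.c_wchar)
print("array('u') itemsize:", u_itemsize(), "sizeof(wchar_t):", WCHAR_SIZE)
```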

    @skrah
    Mannequin

    skrah mannequin commented Mar 22, 2019

    array() uses struct module characters except for 'u'. PEP-3118 was
    supposed to be implemented in the struct module.

    If array() continues to use 'u', the only sensible thing would be
    to remove (or rename) 'a', 'u' and 'w' from PEP-3118.
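This can be checked directly. Every array type code except the string codes is also a struct format character with the same native item size, while struct rejects 'u' (and 'w', on versions that have it). A minimal sketch; `split_by_struct_support` is a hypothetical helper.

```python
import array
import struct


def split_by_struct_support():
    """Partition array.typecodes into codes struct accepts and codes it rejects."""
    accepted, rejected = [], []
    for code in array.typecodes:
        try:
            struct.calcsize(code)  # native size and alignment ('@' mode)
        except struct.error:
            rejected.append(code)
        else:
            accepted.append(code)
    return accepted, rejected


accepted, rejected = split_by_struct_support()
print("struct rejects:", rejected)
```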

    @skrah
    Mannequin

    skrah mannequin commented Mar 22, 2019

    The funny thing is that array() already knows this:

    >>> import array
    >>> a = array.array("u", "123")
    >>> memoryview(a).format
    'w'

    @methane
    Member Author

    methane commented Apr 22, 2020

    I closed #56706 (Py_UNICODE -> Py_UCS4).
    I created #63852 (Py_UNICODE -> wchar_t) instead.

    @terryjreedy
    Member

    Should this issue be closed, possibly as superseded by bpo-36346, the issue for the new PR-19653?

    @methane
    Member Author

    methane commented Apr 23, 2020

    While array('u') no longer uses the deprecated API after #63852, I still don't like 'u' because:

    • I don't see any reason to use the platform-dependent wchar_t.
    • It is not consistent with PEP-3118.

    How about this plan?

    • Add 'w' for Py_UCS4.
    • Deprecate 'u', and remove it in the future.
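One concrete reason the platform-dependent width matters: where wchar_t is 16 bits (Windows), array('u') stores a character outside the BMP as a surrogate pair, so the item count differs from len() of the str, whereas a 4-byte 'w' item always holds one code point. A sketch assuming Python 3.13+ for 'w' (it skips codes the running interpreter lacks); `stored_items` and `EMOJI` are illustrative names.

```python
import array
import ctypes
import warnings

EMOJI = "\U0001F600"  # U+1F600 lies outside the BMP

# sizeof(wchar_t): 2 on Windows, 4 on most Unix-like systems.
WCHAR_BYTES = ctypes.sizeof(ctypes.c_wchar)


def stored_items(s, typecode):
    """Return how many array items are needed to store s with typecode."""
    with warnings.catch_warnings():
        # 'u' emits DeprecationWarning on Python 3.13+; ignore it here.
        warnings.simplefilter("ignore", DeprecationWarning)
        return len(array.array(typecode, s))


for code in ("u", "w"):
    if code in array.typecodes:  # 'w' exists only on 3.13+
        print(code, stored_items(EMOJI, code), "item(s) for", len(EMOJI), "character")
```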

    @ezio-melotti ezio-melotti transferred this issue from another repository Apr 10, 2022
    @methane methane removed the 3.8 only security fixes label May 20, 2022
    @ezio-melotti
    Member

    Even though this has been deprecated in the docs since 3.3, it doesn't seem to raise a DeprecationWarning. If we want to remove it we should add the warning, and then remove it in a later version. The documentation should also be updated according to the plan (see #93986).

    hugovk added a commit to hugovk/cpython that referenced this issue Aug 6, 2022
    hugovk added a commit to hugovk/cpython that referenced this issue Aug 7, 2022
    @hugovk
    Member

    hugovk commented Aug 7, 2022

    Please see PR #95760 to emit a DeprecationWarning.
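To check whether the running interpreter actually emits the warning, something like the following works. The helper name is hypothetical. According to the end of this thread, the runtime warning landed in 3.13 (#95760), so earlier versions should return False.

```python
import array
import warnings


def u_emits_deprecation_warning():
    """Return True if constructing array('u') warns on this interpreter."""
    if "u" not in array.typecodes:
        return False  # 'u' already removed: nothing left to warn about
    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter("always")  # record every warning, even repeats
        array.array("u")
    return any(issubclass(w.category, DeprecationWarning) for w in caught)
```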

    @hugovk
    Member

    hugovk commented Nov 27, 2022

    > • Add 'w' for Py_UCS4.

    @methane Would you be able to look into this? It would be good to have the replacement ready when we start emitting deprecation warnings.

    @hugovk
    Member

    hugovk commented May 2, 2023

    The 3.12 beta feature freeze is in a week, so we will likely need to target changing this in 3.13 with removal in 3.15.

    @arhadthedev arhadthedev added 3.13 bugs and security fixes extension-modules C modules in the Modules dir type-feature A feature request or enhancement and removed stdlib Python modules in the Lib dir labels May 5, 2023
    @encukou
    Member

    encukou commented May 9, 2023

    How about:

    • Add 'w' for Py_UCS4.
    • Wait until all Python versions that don't have 'w' reach end of life
    • Deprecate 'u', and remove it in the future.

    Is there a reason to rush this?

    @methane
    Member Author

    methane commented May 10, 2023

    array('u') causes bugs and has almost no valid use case,
    so waiting 5+ years to deprecate it seems too slow.

    On the other hand, I'm in no hurry to remove it.
    I can wait 3+ releases instead of the minimum 2 releases.

    My plan is:

    • Add 'w' and deprecate 'u' in Python 3.13
    • Remove 'u' in Python 3.16+ (Postpone removal if some users still use 'u').
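For code that has to run both before and after this change, one option is to prefer 'w' when the interpreter has it and fall back to 'u' otherwise. This is a sketch: `UNICODE_TYPECODE`, `unicode_array`, and `to_str` are illustrative names, and it assumes tounicode() accepts 'w' arrays, as it does in 3.13. On the 'u' fallback the item width still follows wchar_t.

```python
import array

# Prefer the fixed-width 'w' (Py_UCS4) code when available (Python 3.13+);
# older interpreters only have 'u', which does not warn there.
UNICODE_TYPECODE = "w" if "w" in array.typecodes else "u"


def unicode_array(s=""):
    """Build a character array that works before and after the 'u' deprecation."""
    return array.array(UNICODE_TYPECODE, s)


def to_str(arr):
    """Convert a character array back to str."""
    return arr.tounicode()


a = unicode_array("héllo")
a.append("!")
print(UNICODE_TYPECODE, to_str(a))
```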

    @methane methane changed the title array: Deprecate 'u' type in array module array: Add 'w' type and deprecate 'u' type. May 10, 2023
    @vstinner
    Member

    vstinner commented Jun 6, 2023

    @methane added array.array('w') format to Python 3.13.

    @methane: Would you mind also documenting the addition with a versionchanged note in https://docs.python.org/dev/library/array.html ? Currently, it's only documented at https://docs.python.org/dev/whatsnew/3.13.html#array

    @vstinner
    Member

    vstinner commented Jun 6, 2023

    I dislike the fact that array.array('u') is only deprecated in the doc. I would prefer to either emit a DeprecationWarning at runtime, or remove the deprecation. See also issue #105373. I'm fine with really deprecating it if someone wants to submit a PR (@methane ?).

    @hugovk
    Member

    hugovk commented Jun 6, 2023

    @vstinner PR already open at #95760 :)

    @hugovk
    Member

    hugovk commented Jun 11, 2023

    > array('u') causes bugs and has almost no valid use case, so waiting 5+ years to deprecate it seems too slow.
    >
    > On the other hand, I'm in no hurry to remove it. I can wait 3+ releases instead of the minimum 2 releases.
    >
    > My plan is:
    >
    > • Add 'w' and deprecate 'u' in Python 3.13

    Done in #105242 and #95760.

    > • Remove 'u' in Python 3.16+ (Postpone removal if some users still use 'u').

    Let's close this issue and come back to this in four years.


    8 participants