Deprecate immortal interned strings: PyUnicode_InternImmortal() #85858

Closed
vstinner opened this issue Sep 2, 2020 · 9 comments
Labels
3.10 (only security fixes), interpreter-core (Objects, Python, Grammar, and Parser dirs)

Comments

@vstinner
Member

vstinner commented Sep 2, 2020

BPO 41692
Nosy @vstinner, @methane, @serhiy-storchaka, @corona10, @shihai1991
PRs
  • bpo-41692: Deprecate PyUnicode_InternImmortal() #22486
Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.


    GitHub fields:

    assignee = None
    closed_at = <Date 2020-10-02.12:49:46.896>
    created_at = <Date 2020-09-02.13:59:25.913>
    labels = ['interpreter-core', '3.10']
    title = 'Deprecate immortal interned strings: PyUnicode_InternImmortal()'
    updated_at = <Date 2021-12-08.11:28:35.305>
    user = 'https://github.com/vstinner'

    bugs.python.org fields:

    activity = <Date 2021-12-08.11:28:35.305>
    actor = 'vstinner'
    assignee = 'none'
    closed = True
    closed_date = <Date 2020-10-02.12:49:46.896>
    closer = 'vstinner'
    components = ['Interpreter Core']
    creation = <Date 2020-09-02.13:59:25.913>
    creator = 'vstinner'
    dependencies = []
    files = []
    hgrepos = []
    issue_num = 41692
    keywords = ['patch']
    message_count = 9.0
    messages = ['376237', '376271', '376302', '376428', '377784', '377808', '377809', '393350', '408011']
    nosy_count = 5.0
    nosy_names = ['vstinner', 'methane', 'serhiy.storchaka', 'corona10', 'shihai1991']
    pr_nums = ['22486']
    priority = 'normal'
    resolution = 'fixed'
    stage = 'resolved'
    status = 'closed'
    superseder = None
    type = None
    url = 'https://bugs.python.org/issue41692'
    versions = ['Python 3.10']

    @vstinner
    Member Author

    vstinner commented Sep 2, 2020

    Python has the concept of "immortal" interned strings: PyUnicode_InternImmortal().

    The feature was first introduced for the Python 2 "str" (bytes) type in bpo-576101 (commit 45ec02a), which added the PyString_InternImmortal() function.

    commit 45ec02a
    Author: Guido van Rossum <guido@python.org>
    Date: Mon Aug 19 21:43:18 2002 +0000

    SF patch 576101, by Oren Tirosh: alternative implementation of
    interning.  I modified Oren's patch significantly, but the basic idea
    and most of the implementation is unchanged.  Interned strings created
    with PyString_InternInPlace() are now mortal, and you must keep a
    reference to the resulting string around; use the new function
    PyString_InternImmortal() to create immortal interned strings.
    

    Later, the feature was added to the PyUnicodeObject type as the new PyUnicode_InternImmortal() function:

    commit 1680713
    Author: Walter Dörwald <walter@livinglogic.de>
    Date: Fri May 25 13:52:07 2007 +0000

    Add interning of unicode strings by copying the functionality from
    stringobject.c.
    
    Intern "True" and "False" in bool_repr() again as it was in the
    8bit string era.
    

    Since Python 3.10, (mortal) interned strings are cleared at Python exit in Py_Finalize(). This avoids leaking memory when Python is embedded in an application (bpo-1635741).

    commit 666ecfb
    Author: Victor Stinner <vstinner@python.org>
    Date: Thu Jul 2 01:19:57 2020 +0200

    bpo-1635741: Release Unicode interned strings at exit (GH-21269)
    
    * PyUnicode_InternInPlace() now ensures that interned strings are
      ready.
    * Add _PyUnicode_ClearInterned().
    * Py_Finalize() now releases Unicode interned strings:
      call _PyUnicode_ClearInterned().
    

    --

    PyUnicode_InternImmortal() is not used in the Python standard library. I propose deprecating the function in Python 3.10 and removing it in Python 3.12 (PEP 387 requires a deprecation period of two releases). In Python 3.10, calling the function will emit a DeprecationWarning at runtime.

    Note: PyString_InternImmortal() (for bytes strings) was removed in Python 3.0.
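For readers less familiar with interning, here is a minimal Python-level sketch of what it does. It uses sys.intern(), the Python-level entry point to the interning table. As far as I know, PyUnicode_InternImmortal() has no Python-level counterpart, so this shows only ordinary interning. The helper name build_name is hypothetical and exists only to create equal strings at runtime.

```python
import sys


def build_name(parts):
    # Hypothetical helper: joining at runtime produces the string
    # dynamically instead of relying on a compile-time constant.
    return "".join(parts)


first = build_name(["immortal", "_", "demo"])
second = build_name(["immortal", "_", "demo"])

# sys.intern() returns the single canonical object for a given value,
# so interning two equal strings yields the same object.
interned_first = sys.intern(first)
interned_second = sys.intern(second)

print(first == second)
print(interned_first is interned_second)
```

The point of the sketch is only the identity guarantee after interning. How long the interned object lives is exactly the mortal versus immortal distinction discussed above, and it is not observable from this snippet.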

    vstinner added the 3.10 (only security fixes) and interpreter-core (Objects, Python, Grammar, and Parser dirs) labels on Sep 2, 2020
    @methane
    Member

    methane commented Sep 3, 2020

    +1

    @corona10
    Member

    corona10 commented Sep 3, 2020

    +1

    @shihai1991
    Member

    +1

    @vstinner
    Member Author

    vstinner commented Oct 1, 2020

    I proposed PR 22486 to deprecate the function.

    @vstinner
    Member Author

    vstinner commented Oct 2, 2020

    New changeset 583ee5a by Victor Stinner in branch 'master':
    bpo-41692: Deprecate PyUnicode_InternImmortal() (GH-22486)
    583ee5a

    @vstinner
    Member Author

    vstinner commented Oct 2, 2020

    The function is now deprecated. Thanks for the review, INADA-san. I am closing the issue. Let's meet in Python 3.12 to remove it ;-)
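The deprecation pattern described in this thread (keep the function working, but warn on every call) can be sketched at the Python level. This is only an analogy: the real change is in C, and the names intern_immortal and call_and_capture below are hypothetical, not part of any CPython API.

```python
import sys
import warnings


def intern_immortal(text):
    # Hypothetical stand-in for a deprecated entry point: it still does
    # its job, but warns the caller that it is going away.
    warnings.warn(
        "intern_immortal() is deprecated; use sys.intern() instead",
        DeprecationWarning,
        stacklevel=2,
    )
    return sys.intern(text)


def call_and_capture(text):
    # Record warnings instead of printing them, so that the caller can
    # inspect which categories were emitted.
    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter("always")
        result = intern_immortal(text)
    return result, [w.category for w in caught]


result, categories = call_and_capture("".join(["de", "mo"]))
print(result, [c.__name__ for c in categories])
```

Running a test suite with warnings turned into errors (for example `python -W error::DeprecationWarning`) is one way to find remaining callers before a removal.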

    @methane
    Member

    methane commented May 10, 2021

    For the record, I noticed that PyUnicode_InternImmortal() is part of the stable ABI.

    We may need to keep the function to avoid dynamic link errors.
    But we can still change its implementation to just raise an exception.

    @vstinner
    Member Author

    vstinner commented Dec 8, 2021

    I cannot find the "PyUnicode_InternImmortal" pattern in the source code of the PyPI top 5000 projects (December 1, 2021).

    I only found a false positive in frozendict-2.1.1:

    frozendict/src/3_10/cpython_src/Include/unicodeobject.h: // PyUnicode_InternImmortal() is deprecated since Python 3.10
    frozendict/src/3_10/cpython_src/Include/unicodeobject.h: Py_DEPRECATED(3.10) PyAPI_FUNC(void) PyUnicode_InternImmortal(PyObject **);
    frozendict/src/3_6/cpython_src/Include/unicodeobject.h: PyAPI_FUNC(void) PyUnicode_InternImmortal(PyObject **);
    frozendict/src/3_7/cpython_src/Include/unicodeobject.h: PyAPI_FUNC(void) PyUnicode_InternImmortal(PyObject **);
    frozendict/src/3_8/cpython_src/Include/unicodeobject.h: PyAPI_FUNC(void) PyUnicode_InternImmortal(PyObject **);
    frozendict/src/3_9/cpython_src/Include/unicodeobject.h: PyAPI_FUNC(void) PyUnicode_InternImmortal(PyObject **);

    These are copies of the Python unicodeobject.h header files, but the PyUnicode_InternImmortal() function is not called by frozendict.

    I used my download_pypi_top.py and search_pypi_top.py tools which can be found at:
    https://github.com/vstinner/misc/tree/main/cpython
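The kind of search described above can be approximated with a short script. This is a simplified sketch, not the download_pypi_top.py or search_pypi_top.py tools themselves: it assumes the projects are already unpacked into a directory tree, and the names search_tree and demo are hypothetical.

```python
import os
import re
import tempfile


def search_tree(root, pattern):
    """Return sorted (relative_path, line_number, line) tuples for matches."""
    regex = re.compile(pattern)
    hits = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                # errors="replace" keeps the scan going on non-UTF-8 files.
                with open(path, encoding="utf-8", errors="replace") as f:
                    for lineno, line in enumerate(f, 1):
                        if regex.search(line):
                            rel = os.path.relpath(path, root)
                            hits.append((rel, lineno, line.rstrip("\n")))
            except OSError:
                continue  # unreadable file: skip it
    return sorted(hits)


def demo():
    # Build a tiny tree with one header declaration and one unrelated call.
    with tempfile.TemporaryDirectory() as root:
        os.makedirs(os.path.join(root, "pkg"))
        with open(os.path.join(root, "pkg", "unicodeobject.h"), "w") as f:
            f.write("int x;\n")
            f.write("PyAPI_FUNC(void) PyUnicode_InternImmortal(PyObject **);\n")
        with open(os.path.join(root, "pkg", "mod.c"), "w") as f:
            f.write("PyUnicode_InternInPlace(&s);\n")
        return search_tree(root, r"\bPyUnicode_InternImmortal\b")


for hit in demo():
    print(hit)
```

A plain text match like this cannot tell a declaration from a call, which is why a hit in a vendored header (as with frozendict above) still has to be checked by hand.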

    ezio-melotti transferred this issue from another repository on Apr 10, 2022
    vstinner added a commit that referenced this issue May 13, 2022
    Remove the PyUnicode_InternImmortal() function and the
    SSTATE_INTERNED_IMMORTAL macro.
    
    The PyUnicode_InternImmortal() function is still exported in the
    stable ABI. The function is removed from the API.
    
    PyASCIIObject.state.interned size is now a single bit, rather than 2
    bits.
    
    Keep SSTATE_NOT_INTERNED and SSTATE_INTERNED_MORTAL macros for
    backward compatibility, but no longer use them internally since the
    interned member is now a single bit and so can only have two values
    (interned or not interned).
    
    Update stats of _PyUnicode_ClearInterned().
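To illustrate why dropping the immortal state lets the field shrink from 2 bits to 1, here is a toy model using ctypes bit fields. It is not the real PyASCIIObject layout: the OldState and NewState classes are hypothetical, and the numeric values of the SSTATE_* constants are what I believe the C headers define, so treat them as assumptions.

```python
import ctypes

# Assumed values of the C macros named in the commit message.
SSTATE_NOT_INTERNED = 0
SSTATE_INTERNED_MORTAL = 1
SSTATE_INTERNED_IMMORTAL = 2  # the macro removed by the commit


class OldState(ctypes.Structure):
    # Toy model: a 2-bit field holds 0..3, enough for three states.
    _fields_ = [("interned", ctypes.c_uint, 2)]


class NewState(ctypes.Structure):
    # Toy model: a 1-bit field holds only 0 or 1 (interned or not).
    _fields_ = [("interned", ctypes.c_uint, 1)]


old = OldState()
old.interned = SSTATE_INTERNED_IMMORTAL  # needs the second bit

new = NewState()
new.interned = SSTATE_INTERNED_MORTAL  # fits in a single bit

print(old.interned, new.interned)
```

With only "not interned" and "interned" left, the two remaining macro values both fit in one bit, which is consistent with the commit keeping them for backward compatibility.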