Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Promote PyUnicode_AsUTF8AndSize to be available with the limited API (PEP 384) #85950

Closed
alex opened this issue Sep 14, 2020 · 13 comments
Closed

Promote PyUnicode_AsUTF8AndSize to be available with the limited API (PEP 384) #85950

alex opened this issue Sep 14, 2020 · 13 comments
Assignees
Labels
3.10 easy expert-C-API type-feature

Comments

@alex
Copy link
Member

@alex alex commented Sep 14, 2020

BPO 41784
Nosy @benjaminp, @alex, @methane, @skrah, @serhiy-storchaka, @zooba
PRs
  • #22252
  • Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.

    Show more details

    GitHub fields:

    assignee = 'https://github.com/alex'
    closed_at = <Date 2020-10-19.22:18:07.424>
    created_at = <Date 2020-09-14.19:37:03.513>
    labels = ['easy', 'expert-C-API', 'type-feature', '3.10']
    title = 'Promote PyUnicode_AsUTF8AndSize to be available with the limited API (PEP 384)'
    updated_at = <Date 2020-10-19.22:18:07.423>
    user = 'https://github.com/alex'

    bugs.python.org fields:

    activity = <Date 2020-10-19.22:18:07.423>
    actor = 'steve.dower'
    assignee = 'alex'
    closed = True
    closed_date = <Date 2020-10-19.22:18:07.424>
    closer = 'steve.dower'
    components = ['C API']
    creation = <Date 2020-09-14.19:37:03.513>
    creator = 'alex'
    dependencies = []
    files = []
    hgrepos = []
    issue_num = 41784
    keywords = ['patch', 'easy (C)']
    message_count = 13.0
    messages = ['376896', '376922', '376941', '376947', '376952', '376973', '376978', '376987', '377094', '377106', '379026', '379036', '379037']
    nosy_count = 6.0
    nosy_names = ['benjamin.peterson', 'alex', 'methane', 'skrah', 'serhiy.storchaka', 'steve.dower']
    pr_nums = ['22252']
    priority = 'normal'
    resolution = 'fixed'
    stage = 'resolved'
    status = 'closed'
    superseder = None
    type = 'enhancement'
    url = 'https://bugs.python.org/issue41784'
    versions = ['Python 3.10']

    @alex
    Copy link
    Member Author

    @alex alex commented Sep 14, 2020

    This function is incredibly useful for efficient interoperability between Python and other languages with UTF-8 based strings (e.g. Rust). Right now it's not possible to do interop without several copies/allocations if you're trying to build an abi3 wheel. This is tactically relevant to me here: PyO3/pyo3#1125

    This API has been stable since it was introduced in Python 3.1, so I think making it stable would be appropriate.

    Inada, Benjamin suggested I should ask you for your feedback on this. I'm happy to send a pull request.

    @alex alex added expert-C-API easy type-feature labels Sep 14, 2020
    @methane
    Copy link
    Member

    @methane methane commented Sep 15, 2020

    +1. It is a very important API.

    @serhiy-storchaka
    Copy link
    Member

    @serhiy-storchaka serhiy-storchaka commented Sep 15, 2020

    What about PyUnicode_AsUTF8? Should it be made public too or left for internal use only?

    What about third-party implementations of Python? How hard to implement this API on an implementation without reference counts? It is interesting to hear the expert opinion of the core developers of PyPy.

    @alex
    Copy link
    Member Author

    @alex alex commented Sep 15, 2020

    I think less is more, one API is plenty :-)

    It looks to me like the API is already supported on PyPy, so I think it's fine from that perspective: https://foss.heptapod.net/pypy/pypy/-/blob/branch/py3.7/pypy/module/cpyext/unicodeobject.py#L493

    @serhiy-storchaka
    Copy link
    Member

    @serhiy-storchaka serhiy-storchaka commented Sep 15, 2020

    PyUnicode_AsUTF8() is used 3 times more than PyUnicode_AsUTF8AndSize().

    $ find -type f -name '*.c' -exec egrep 'PyUnicode_AsUTF8AndSize\(' '{}' + | wc -l
    35
    $ find -type f -name '*.c' -exec egrep 'PyUnicode_AsUTF8\(' '{}' + | wc -l
    101

    @methane
    Copy link
    Member

    @methane methane commented Sep 16, 2020

    PyUnicode_AsUTF8 is useful "API". But it can be implemented as C macro, C inline function, or functions/macros in any other languages using PyUnicode_AsUTF8AndSize.

    PyUnicode_AsUTF8AndSize is more important for "ABI".

    @serhiy-storchaka
    Copy link
    Member

    @serhiy-storchaka serhiy-storchaka commented Sep 16, 2020

    I agree about PyUnicode_AsUTF8.

    But I think it would be worth to ask PyPy team about PyUnicode_AsUTF8AndSize.

    An alternate C API is PyUnicode_GetUTF8Buffer (bpo-39087). It requires explicit releasing the buffer after use, so it can be used even on implementations with moving garbage collector.

    @alex
    Copy link
    Member Author

    @alex alex commented Sep 16, 2020

    Py_buffer is not part of the limited API at all, so I don't think it's usable for this.

    @serhiy-storchaka
    Copy link
    Member

    @serhiy-storchaka serhiy-storchaka commented Sep 18, 2020

    Oh, would not be worth to add Py_buffer to the limited API?

    @alex
    Copy link
    Member Author

    @alex alex commented Sep 18, 2020

    It's a big project I think :-) Py_Buffer is allocated on the stack, so either we'd have to agree to never change it's ABI (size, alignment, etc.) or we'd need to completely change the interface.

    @zooba
    Copy link
    Member

    @zooba zooba commented Oct 19, 2020

    Agreed that there's no way we can make Py_buffer part of the limited ABI.

    I just looked over the PR and it's missing a What's New entry (e.g. https://docs.python.org/3/whatsnew/3.9.html#c-api-changes). Other than that, looks fine to me.

    @zooba
    Copy link
    Member

    @zooba zooba commented Oct 19, 2020

    New changeset 3a8fdb2 by Alex Gaynor in branch 'master':
    bpo-41784: make PyUnicode_AsUTF8AndSize part of the limited API (GH-22252)
    3a8fdb2

    @zooba
    Copy link
    Member

    @zooba zooba commented Oct 19, 2020

    Thanks, Alex!

    @zooba zooba closed this as completed Oct 19, 2020
    @zooba zooba closed this as completed Oct 19, 2020
    @ezio-melotti ezio-melotti transferred this issue from another repository Apr 10, 2022
    Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
    Labels
    3.10 easy expert-C-API type-feature
    Projects
    None yet
    Development

    No branches or pull requests

    4 participants