Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Make PyUnicode_AsUTF8 returning "const char *" rather of "char *" #72955

Closed
serhiy-storchaka opened this issue Nov 22, 2016 · 10 comments
Closed

Make PyUnicode_AsUTF8 returning "const char *" rather of "char *" #72955

serhiy-storchaka opened this issue Nov 22, 2016 · 10 comments
Assignees
Labels
3.7 interpreter-core type-feature

Comments

@serhiy-storchaka
Copy link
Member

@serhiy-storchaka serhiy-storchaka commented Nov 22, 2016

BPO 28769
Nosy @ncoghlan, @vstinner, @serhiy-storchaka
PRs
  • #1294
  • Files
  • PyUnicode_AsUTF8-const.patch
  • PyUnicode_AsUTF8-const-2.patch
  • Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.

    Show more details

    GitHub fields:

    assignee = 'https://github.com/serhiy-storchaka'
    closed_at = <Date 2017-01-22.21:08:02.993>
    created_at = <Date 2016-11-22.07:55:24.600>
    labels = ['interpreter-core', 'type-feature', '3.7']
    title = 'Make PyUnicode_AsUTF8 returning "const char *" rather of "char *"'
    updated_at = <Date 2017-04-26.11:51:50.342>
    user = 'https://github.com/serhiy-storchaka'

    bugs.python.org fields:

    activity = <Date 2017-04-26.11:51:50.342>
    actor = 'vstinner'
    assignee = 'serhiy.storchaka'
    closed = True
    closed_date = <Date 2017-01-22.21:08:02.993>
    closer = 'serhiy.storchaka'
    components = ['Interpreter Core']
    creation = <Date 2016-11-22.07:55:24.600>
    creator = 'serhiy.storchaka'
    dependencies = []
    files = ['45596', '45981']
    hgrepos = []
    issue_num = 28769
    keywords = ['patch']
    message_count = 10.0
    messages = ['281439', '281445', '281447', '281448', '283547', '283730', '286026', '286029', '286030', '292335']
    nosy_count = 4.0
    nosy_names = ['ncoghlan', 'vstinner', 'python-dev', 'serhiy.storchaka']
    pr_nums = ['1294']
    priority = 'normal'
    resolution = 'fixed'
    stage = 'resolved'
    status = 'closed'
    superseder = None
    type = 'enhancement'
    url = 'https://bugs.python.org/issue28769'
    versions = ['Python 3.7']

    @serhiy-storchaka
    Copy link
    Member Author

    @serhiy-storchaka serhiy-storchaka commented Nov 22, 2016

    PyUnicode_AsUTF8AndSize() and PyUnicode_AsUTF8() return a reference to cached readonly UTF-8 representation of a string. Changing the content of the UTF-8 representation is an error. Proposed patch makes these functions returning "const char *" rather of "char *" to force this restriction.

    This is backward-incompatible change. Since PyUnicode_AsUTF8AndSize() and PyUnicode_AsUTF8() can return an error, it is more likely that the result is saved in a local variable rather than passing to other function. If the type of this variable is "char *" rather than "const char *", this would cause a compiler error. The fix is simple -- just add the const qualifier to the local variable declaration (more preferable) or cast the result of PyUnicode_AsUTF8AndSize() or PyUnicode_AsUTF8() to "char *".

    Both functions are not in stable API.

    @serhiy-storchaka serhiy-storchaka added 3.7 interpreter-core type-feature labels Nov 22, 2016
    @vadmium
    Copy link
    Member

    @vadmium vadmium commented Nov 22, 2016

    No opinion if this is a good change to make, but I left some review suggestions

    @vstinner
    Copy link
    Member

    @vstinner vstinner commented Nov 22, 2016

    Hum, I would like to discuss this topic on python-dev.

    Changing PyUnicode_AsUTF8() alone is fine, but the issue with changing return type is that the const has to be propagated to callers, and then to callers of callers, etc. For example, if your patch, you cast (const char*) to (char*) to call tp_getattr.

    The question is why tp_getattr doesn't use (const char*)?

    I would prefer to take an overall decision for the C API, to decide if it's ok to "propagate" const changes in various places of the C API.

    About the stable API: in fact, it's more a stable *ABI*: PEP-384, "Defining a Stable ABI". At the ABI level, there is no more "const". So it's perfectly fine to add or remove const, we already did that in the past.

    Obviously, such change should only be done in Python 3.7.

    For me, the main issue is for Python modules compiled with -Werror: if they upgrade to Python 3.7, the compilation will fail because they cast (const char*) to (char*) implicitly, which is a warning when using -Wall -Wextra, warning converted to a compilation error.

    That's why I suggest to have an overall discussion on const on python-dev ;-)

    @vstinner
    Copy link
    Member

    @vstinner vstinner commented Nov 22, 2016

    Hum, sorry, my opinion on const is not obvious in my previous message: I like const :-) I want to use const everywhere! I still "believe" (I don't know if it's true or not) that const helps a lot compilers to optimize the code.

    I don't know if it helps for a single variable. Maybe it's more helpful on a whole structure and/or pointers to avoid complex heuristics on aliasing.

    My first attempt to design the _PyBytesWriter API was a big mistake: it was much slower: issue bpo-17742. I understood that using a structure instead of multiple variables does stress the compiler who doesn't know if some optimizations are still save. In case of doubt, the compiler doesn't optimize to avoid generating invalid code.

    @serhiy-storchaka
    Copy link
    Member Author

    @serhiy-storchaka serhiy-storchaka commented Dec 18, 2016

    @serhiy-storchaka
    Copy link
    Member Author

    @serhiy-storchaka serhiy-storchaka commented Dec 21, 2016

    Addressed comments, added the versionchanged directives, the code in _decimal.c is now more obvious.

    @serhiy-storchaka
    Copy link
    Member Author

    @serhiy-storchaka serhiy-storchaka commented Jan 22, 2017

    Stefan, what are your thoughts about this? The patch touches _decimal.c.

    @skrah
    Copy link
    Mannequin

    @skrah skrah mannequin commented Jan 22, 2017

    For _decimal I'm happy with just the cast from the first patch -- you have a one line diff and it's easy to see the focus of the issue.

    @serhiy-storchaka serhiy-storchaka self-assigned this Jan 22, 2017
    @python-dev
    Copy link
    Mannequin

    @python-dev python-dev mannequin commented Jan 22, 2017

    New changeset 0d89212941f4 by Serhiy Storchaka in branch 'default':
    Issue bpo-28769: The result of PyUnicode_AsUTF8AndSize() and PyUnicode_AsUTF8()
    https://hg.python.org/cpython/rev/0d89212941f4

    @vstinner
    Copy link
    Member

    @vstinner vstinner commented Apr 26, 2017

    New changeset 6e67695 by Victor Stinner in branch 'master':
    timemodule.c: Cast PyUnicode_AsUTF8() to char* (bpo-1294)
    6e67695

    @ezio-melotti ezio-melotti transferred this issue from another repository Apr 10, 2022
    Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
    Labels
    3.7 interpreter-core type-feature
    Projects
    None yet
    Development

    No branches or pull requests

    3 participants