Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

PyUnicode_FromFormat(): add %T format for an object type name #78776

Closed
vstinner opened this issue Sep 6, 2018 · 17 comments
Closed

PyUnicode_FromFormat(): add %T format for an object type name #78776

vstinner opened this issue Sep 6, 2018 · 17 comments
Labels
3.8 only security fixes interpreter-core (Objects, Python, Grammar, and Parser dirs)

Comments

@vstinner
Copy link
Member

vstinner commented Sep 6, 2018

BPO 34595
Nosy @vstinner, @ericvsmith, @encukou, @serhiy-storchaka
PRs
  • bpo-34595: Add %T format to PyUnicode_FromFormatV() #9080
  • bpo-34595: Format string using %T in Python/ #9103
  • DO-NOT-MERGE: bpo-34595: Add %t format to PyUnicode_FromFormatV() #9122
  • Revert "bpo-34595: Add %T format to PyUnicode_FromFormatV() (GH-9080)" #9187
  • bpo-34595: WIP: Type fully qualified name #9251
  • Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.

    Show more details

    GitHub fields:

    assignee = None
    closed_at = <Date 2018-09-19.22:48:11.516>
    created_at = <Date 2018-09-06.09:13:30.086>
    labels = ['interpreter-core', '3.8']
    title = 'PyUnicode_FromFormat(): add %T format for an object type name'
    updated_at = <Date 2018-09-19.22:48:11.515>
    user = 'https://github.com/vstinner'

    bugs.python.org fields:

    activity = <Date 2018-09-19.22:48:11.515>
    actor = 'vstinner'
    assignee = 'none'
    closed = True
    closed_date = <Date 2018-09-19.22:48:11.516>
    closer = 'vstinner'
    components = ['Interpreter Core']
    creation = <Date 2018-09-06.09:13:30.086>
    creator = 'vstinner'
    dependencies = []
    files = []
    hgrepos = []
    issue_num = 34595
    keywords = ['patch']
    message_count = 17.0
    messages = ['324675', '324676', '324677', '324678', '324679', '324767', '324792', '324801', '324821', '324828', '324875', '324876', '324877', '325045', '325078', '325079', '325814']
    nosy_count = 4.0
    nosy_names = ['vstinner', 'eric.smith', 'petr.viktorin', 'serhiy.storchaka']
    pr_nums = ['9080', '9103', '9122', '9187', '9251']
    priority = 'normal'
    resolution = 'rejected'
    stage = 'resolved'
    status = 'closed'
    superseder = None
    type = None
    url = 'https://bugs.python.org/issue34595'
    versions = ['Python 3.8']

    @vstinner
    Copy link
    Member Author

    vstinner commented Sep 6, 2018

    The PEP-399 requires that C accelerator behaves exactly as Python, but a lot of C code truncates type name to an arbitrary length: 80, 100, 200, up to 500 (not sure if it's a number of bytes or characters).

    Py_TYPE(obj)->tp_name is a common pattern: it would be nice to have a new "%T" format in PyUnicode_FromFormat(), so it can be used directly in PyErr_Format(), to format an object type name.

    Attached PR implements the proposed %T format and modify unicodeobject.c to use it.

    I propose to then write a second PR to modify all C code of CPython using Py_TYPE(obj)->tp_name to use the new %T type.

    @vstinner vstinner added 3.8 only security fixes interpreter-core (Objects, Python, Grammar, and Parser dirs) labels Sep 6, 2018
    @vstinner
    Copy link
    Member Author

    vstinner commented Sep 6, 2018

    My previous attempt to fix that issue, 7 years ago: bpo-10833 :-)

    See also bpo-7330 where I implemented width and precision (ex: "%5.3s") in PyUnicode_FromFormat().

    @vstinner
    Copy link
    Member Author

    vstinner commented Sep 6, 2018

    I cannot find %T in printf() manual pages on Fedora 28 (Linux).

    I can find it in the strftime() documentation:

    %T The time in 24-hour notation (%H:%M:%S). (SU)

    But I don't think that it's an issue since printf() and strftime() formatters are exclusive, no? For example, strftime() %s means "number of seconds since the Epoch" whereas printf() %s means a "const char*" byte string.

    @vstinner
    Copy link
    Member Author

    vstinner commented Sep 6, 2018

    Oh, PyUnicode_FromFormat() has %A to format as ASCII, whereas printf() already has %A but for a different meaning:

    a, A For a conversion, the double argument is converted to hexadecimal notation (using the letters abcdef) in the style [-]0xh.hhhhp+-; for A conversion the prefix 0X, the letters ABCDEF, and the exponent separator P is used. (...)

    @ericvsmith
    Copy link
    Member

    I don't think you have to worry about %T being used by other formatting functions. If (heaven forbid) dates were ever supported by PyUnicode_FromFormat(), there would have to be a way to switch from "normal" argument processing to argument-specific formatting specifiers, anyway.

    @vstinner
    Copy link
    Member Author

    vstinner commented Sep 7, 2018

    New changeset 886483e by Victor Stinner in branch 'master':
    bpo-34595: Add %T format to PyUnicode_FromFormatV() (GH-9080)
    886483e

    @serhiy-storchaka
    Copy link
    Member

    IIRC there is similar issue or a discussion on one of mailing lists. But the idea about adding this feature on Python side too was considered. It would be better to find this discussion before pushing this change.

    In some cases we have the type itself, not an instance. So it makes sense to make %T an equivalent of arg->tp_name instead of Py_TYPE(arg)->tp_name.

    On Python side, you need to output either short name obj.__class__.__name__ or full qualified name obj.__class__.__module__ + '.' + obj.__class__.__qualname__ with exception that the module name should be omitted if it is 'builtins'.

    obj.__class__.__qualname__ if obj.__class__.__module__ == 'builtins' else f'{obj.__class__.__module__}.{obj.__class__.__qualname__}'
    

    The case of the module name '__main__' can be handled specially too.
    Obviously it is desirable to have a more handy way of writing such expression.

    On C side, the problem is that tp_name means different, depending of the kind of the type. In some cases it is a short name, in other cases it is a full qualified name. It is not easy to write a code that produces the same output in Python and C. I have added a helper _PyType_Name() that helps to solve a part of these issues. If you want to output a short name (just cls.__name__ in Python), use _PyType_Name(cls) instead of cls->tp_name. But this doesn't help for the case when you need to output a full qualified name.

    @vstinner
    Copy link
    Member Author

    vstinner commented Sep 7, 2018

    In some cases we have the type itself, not an instance. So it makes sense to make %T an equivalent of arg->tp_name instead of Py_TYPE(arg)->tp_name.

    "arg->tp_name" is rare in the code base, whereas "Py_TYPE(arg)->tp_name" is a very common pattern.

    I'm not sure that a formatter is needed for "arg->tp_name", you can already use "%s" with "arg->tp_name", no?

    My intent was to make the C code less verbose, respect the PEP-399, but also indirectly avoid the implicit borrowed reference of Py_TYPE() :-)
    https://pythoncapi.readthedocs.io/bad_api.html#borrowed-references

    On Python side, you need to output either short name obj.__class__.__name__ or full qualified name obj.__class__.__module__ + '.' + obj.__class__.__qualname__ with exception that the module name should be omitted if it is 'builtins'.

    Sometimes, the qualified name would be more appropriate, but it's sometimes tricky to decide if the short name, qualified name or fully qualified name is the "right" name... So I chose to restrict this issue to the most common case, Py_TYPE(arg)->tp_name :-)

    Ah, and changing strings is a risk of breaking the backward compatibility. For example, cause issue with pickle. So it should be discussed on a case by case basis.

    Moreover, if you want to change a string, the Python code should be updated as well. I suggest to open a new issue to discuss that.

    Don't get me wrong, I'm interested to do these changes, but it's a wider project :-)

    obj.__class__.__qualname__ if obj.__class__.__module__ == 'builtins' else f'{obj.__class__.__module__}.{obj.__class__.__qualname__}'

    The case of the module name '__main__' can be handled specially too.
    Obviously it is desirable to have a more handy way of writing such expression.

    To be honest, I also considered to proposed a second formatter to do something like that :-) But as you explained, I'm not sure which name is the good name: qualified or fully qualified (with module name)?

    First of all, would it help to have a *function* to get these names? Maybe we could first use such functions before discussing adding a new formatter in PyUnicode_FromFormat()?

    @vstinner
    Copy link
    Member Author

    vstinner commented Sep 8, 2018

    An alternative would be to add multiple formatters. Example:

    • %Tn: type name, type.__name__, Py_TYPE(obj)->tp_name
    • %Tq: qualified name, type.__qualname__
    • %Ts: short name, never contains "."
    • %Tf: fully qualified name, "module.qualified.name"

    What do you think Serhiy?

    @serhiy-storchaka
    Copy link
    Member

    I think we need to handle only two cases: short and fully qualified names. __qualname__ without __module__ doesn't make sense, and the value of tp_name depends on implementation details (is it Python class or builtin class, heap class or dynamic class?). Maybe use %t and %T?

    But we may want to support formatting the name of the type itself and the name of the object's type. This give us 4 variants.

    For old string formatting we can introduce new % codes (with possible modifiers). But in modern string formatting "T" can have meaning for some types (e.g. for datetime). We can implement __format__ for the type type itself (though it can cause confusion if cls.__format__() is different from cls.__format__(instance)), but for formatting the name of the object's type (as in your original proposition) we need to add a new conversion flag like "!r".

    @vstinner
    Copy link
    Member Author

    vstinner commented Sep 9, 2018

    I think we need to handle only two cases: short and fully qualified names. __qualname__ without __module__ doesn't make sense, and the value of tp_name depends on implementation details (is it Python class or builtin class, heap class or dynamic class?). Maybe use %t and %T?

    Ok, I wrote PR 9122 to add %t format and modify %T format:

    • %t <=> type(obj).__name__
    • %T <=> f"{type(obj).__module__}.{type(obj).__qualname__}"

    But we may want to support formatting the name of the type itself and the name of the object's type. This give us 4 variants.

    Again, I'm not sure about these ones. _PyType_Name() can be used for %t-like directly on a type. Later we can expose _PyType_FullName() (function that I added in my latest PR) as a private function for %T-like directly on a type.

    For old string formatting we can introduce new % codes (with possible modifiers). But in modern string formatting "T" can have meaning for some types (e.g. for datetime). We can implement __format__ for the type type itself (though it can cause confusion if cls.__format__() is different from cls.__format__(instance)), but for formatting the name of the object's type (as in your original proposition) we need to add a new conversion flag like "!r".

    I'm not sure that I understood directly.

    Do you want to add a third formatter in PyUnicode_FromFormat() which would use Py_TYPE(obj)->tp_name? I dislike Py_TYPE(obj)->tp_name, since my intent is to conform to the PEP-399: tp_name is not accessible at the Python level, only type(obj).__name__ and type(obj).__qualname__.

    Or do you want to add a new formatter to type.__format__() to expose %T at the Python level, f"{type(obj).__module__}.{type(obj).__qualname__}"?

    Currently, type(obj).__name__ is the most popular way to format a string. Would it break the backward compatibility to modify *existing* error messages?

    @serhiy-storchaka
    Copy link
    Member

    Ok, I wrote PR 9122 to add %t format and modify %T format:

    Nice!

    I agree that it is easy to use _PyType_Name() directly. But using _PyType_FullName() instead of tp_name can be cumbersome because it returns a new object and needs error handling.

    Or do you want to add a new formatter to type.__format__() to expose %T at the Python level, f"{type(obj).__module__}.{type(obj).__qualname__}"?

    Yes, I think we need a convenient way of formatting fully qualified name that omits the module name for types in the builtins module. It is equivalent to Py_TYPE(obj)->tp_name for extension types which is the most popular way to format a type name in error messages for now.

    There are several open issues for inconsistency in error messages for Python and C implementations, because the former use type(obj).__name__ or obj.__class__.__name__, and the latter use Py_TYPE(obj)->tp_name. I hope finally we will fix this.

    @serhiy-storchaka
    Copy link
    Member

    in error messages

    And in reprs. It is common to format a repr as "{typename}(...)" or "<{typename}(...)>". The difference is whether the typename is a short or fully qualified name.

    @encukou
    Copy link
    Member

    encukou commented Sep 11, 2018

    The PEP-399 requires that C accelerator behaves exactly as Python, [...]

    It does not. PEP-399 requires that that the C code must pass the same *test suite*. And error messages in particular tend to not be checked in tests.

    Anyway, I don't see how that applies to replacing Py_TYPE(obj)->tp_name by %T. The real reason for that change is removing borrowed references, right?
    I have not yet seen a good reason why Py_TYPE(obj) is bad. The reasons you give in https://pythoncapi.readthedocs.io/bad_api.html#borrowed-references are about tagged pointers and PyList_GetItem(), but Py_TYPE() is very different.

    I don't think the reasons are strong enough to add new API to PyUnicode_FromFormat().

    @vstinner
    Copy link
    Member Author

    New changeset 998b806 by Victor Stinner in branch 'master':
    Revert "bpo-34595: Add %T format to PyUnicode_FromFormatV() (GH-9080)" (GH-9187)
    998b806

    @vstinner
    Copy link
    Member Author

    Petr Viktorin asked me to open a wider discussion about this issue on python-dev. I just reverted my first change and posted:
    https://mail.python.org/pipermail/python-dev/2018-September/155150.html

    @vstinner
    Copy link
    Member Author

    I close this issue until we can agree on an API.

    Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
    Labels
    3.8 only security fixes interpreter-core (Objects, Python, Grammar, and Parser dirs)
    Projects
    None yet
    Development

    No branches or pull requests

    4 participants