PEP 737: PyUnicode_FromFormat(): Add %T format to format the type name of an object #111696

Closed
vstinner opened this issue Nov 3, 2023 · 10 comments

Comments

@vstinner
Member

vstinner commented Nov 3, 2023

It's common to format a type name by accessing the PyTypeObject.tp_name member. Example:

PyErr_Format(PyExc_TypeError,
             "__format__ must return a str, not %.200s",
             Py_TYPE(result)->tp_name);

Problems:

  • PyTypeObject.tp_name (type.__name__) is less helpful than PyHeapTypeObject.ht_qualname (type.__qualname__). I would prefer to display the qualified type name.
  • PyTypeObject.tp_name is a UTF-8 encoded string, so it has to be decoded from UTF-8 at each call.
  • I would like to remove PyTypeObject members (tp_name) from the public C API: see issue #105970 (C API: Investigate how the PyTypeObject members can be removed from the public C API).
  • By the way, in the early days of Python, PyString_FromFormat() used a fixed-size buffer, so the output string had to be truncated to avoid overflow. Nowadays, PyUnicode_FromFormat() allocates its buffer on the heap and is no longer limited to 200 characters. Truncating a type name can drop important information from the error message. I would prefer not to truncate the type name.
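
To make the first and last points concrete, here is a minimal Python sketch. The class names Outer and Inner are made up for illustration, and Python's % operator is only a stand-in for the C format string. It contrasts __name__ with __qualname__ for a nested class and shows how a precision such as %.5s silently cuts a name:

```python
class Outer:
    class Inner:
        pass


obj = Outer.Inner()
tp = type(obj)

short_name = tp.__name__          # 'Inner'
qualified_name = tp.__qualname__  # 'Outer.Inner'

# A precision in %-formatting truncates the string, like "%.200s" does in C.
truncated = "not %.5s" % qualified_name  # 'not Outer'
untruncated = "not %s" % qualified_name  # 'not Outer.Inner'
```

With the short name alone, two unrelated classes that are both called Inner would produce the same error message.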

I propose adding a %T format to PyUnicode_FromFormat() to format the qualified name of an object's type. For example, the code above would become:

PyErr_Format(PyExc_TypeError,
             "__format__ must return a str, not %T", result);

I already added a %T format to PyUnicode_FromFormat() in 2018 (issue GH-78776; see the related python-dev discussion), but the change was reverted.

@vstinner
Member Author

vstinner commented Nov 3, 2023

I propose adding a %T format to PyUnicode_FromFormat() to format the qualified name of an object type.

It was proposed to add %T for type(obj).__name__ and %#T for type(obj).__qualname__. I would prefer to use the qualified name by default, since it provides more useful information. Maybe %#T can be added later for the short type name.

In 2018, I already added %T format to PyUnicode_FromFormat(): issue #78776. See related python-dev discussion. The change was reverted.

Since that revert, the PyType_GetQualName() function was added in Python 3.11:
https://docs.python.org/dev/c-api/type.html#c.PyType_GetQualName

vstinner added a commit to vstinner/cpython that referenced this issue Nov 3, 2023
* Add "%T" and "%#T" formats to PyUnicode_FromFormat()
* Add "T" and "#T" formats to type.__format__()
* Add type.__fullyqualname__ attribute
vstinner added a commit to vstinner/cpython that referenced this issue Nov 3, 2023
* Add "%T" and "%#T" formats to PyUnicode_FromFormat().
* Add "T" and "#T" formats to type.__format__().
* Add type.__fullyqualname__ read-only attribute.
@serhiy-storchaka
Member

Is there already an open issue with a similar proposal? This idea was discussed years ago.

If we add a specialized format, we should provide options for all reasonable variants:

  • t.__name__
  • t.__qualname__
  • f'{t.__module__}.{t.__qualname__}'
  • type(x).__name__
  • type(x).__qualname__
  • f'{type(x).__module__}.{type(x).__qualname__}'

It would also be nice to have an option that includes __module__ only if it is not builtins or __main__ (maybe include types?). It is a common format in many standard messages.
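
A minimal sketch of that last variant, assuming a hypothetical helper named display_name (it is not part of any PR in this thread). It prefixes the qualified name with the module unless the module is builtins or __main__:

```python
import fractions


def display_name(tp: type) -> str:
    """Hypothetical helper: qualified name, prefixed with the module
    unless the module is 'builtins' or '__main__'."""
    module = tp.__module__
    qualname = tp.__qualname__
    # __module__ can be set to a non-string; fall back to the bare name then.
    if not isinstance(module, str) or module in ("builtins", "__main__"):
        return qualname
    return f"{module}.{qualname}"


builtin_name = display_name(int)                 # 'int'
stdlib_name = display_name(fractions.Fraction)   # 'fractions.Fraction'
```

Whether other modules such as types should be skipped as well is left open here, as in the comment above.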

Note that we need both options for the type itself and for the type of the argument, because we want to avoid calling Py_TYPE() as well as accessing tp_name.

%t and %T were proposed earlier for two variants, but %t is now used for ptrdiff_t, so only %T is left.

We can use the # modifier to choose between the type itself and the type of the argument.

We can use the "size" modifiers l and ll for "longer" and "longest" representations. We can also add "h" and "hh" for "shorter" and "shortest".

This is one half of the issue. The other half, which stopped the previous attempt, is designing similar features for printf-like formatting in Python (this is easy, we can just copy this format) and for new-style formatting and f-strings. Several changes can help with the latter (they are optional and can be combined):

  1. Add type.__format__.
  2. Add type attributes like __fullname__ and __shortfullname__ (?).
  3. Add the !t converter.
  4. Make type.__repr__() return a fully qualified name.

@vstinner
Member Author

vstinner commented Nov 3, 2023

Add type.__format__.

Done by my PR.

Add type attributes like __fullname__ and __shortfullname__ (?).

My PR adds __fullyqualname__.

What is __shortfullname__?

Add the !t converter.

Eric Smith was against it: https://mail.python.org/archives/list/python-dev@python.org/message/BMIW3FEB77OS7OB3YYUUDUBITPWLRG3U/

Make type.__repr__() return a fully qualified name.

Isn't it already the case, module + qualname?

If we add a specialized format, we should provide options for all reasonable variants: (...)

I disagree. Having too many options would lead to too many differences between functions. I would prefer to have limited choices. For more specialized cases, it's always possible to get other formats by calling functions or reading type attributes.

It would also be nice to have an option that includes __module__ only if it is not builtins or __main__ (maybe include types?). It is a common format in many standard messages.

I would prefer to have two choices:

  • short name: type.__name__
  • fully qualified name: type.__fullyqualname__ (new attribute)

Currently, Py_TYPE(obj)->tp_name is used and tp_name is different depending on how the type is created:

  • Heap type (implemented in C or in Python): tp_name is type.__name__ (short name)
  • Static type: tp_name is type.__fullyqualname__ (fully qualified name, usually module.Type); sometimes it is type.__name__ (short name), when the C extension doesn't include the module name.

In Python, the short type name is usually used (type.__name__):

raise TypeError('expected AST, got %r' % node.__class__.__name__)
raise TypeError("key: expected bytes or bytearray, but got %r" % type(key).__name__)

We can use the # modifier to choose between the type itself and the type of the argument.

My PR uses the # modifier to select between the short type name and the fully qualified type name. I don't think that a modifier is the right API to choose between type(arg) and arg; IMO they are too different. Usually, the # modifier selects between two ways to format the same value.

Note that we need both options for the type itself and for the type of the argument

My PR proposes a different API in Python and in C:

  • Python: raise TypeError(f"expect str, got {type(arg):T}"), the type() function must be called explicitly.
  • C: PyErr_Format(PyExc_TypeError, "expect str, got %T", arg), pass an object, Py_TYPE() is called implicitly.
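
The Python side can be emulated today with a metaclass, which is enough to try out the call pattern. This is only a sketch: NameFormatMeta, Token and check_str are hypothetical names, and which of "T" and "#T" maps to the short or the full name is an assumption made here, not something this thread pins down.

```python
class NameFormatMeta(type):
    """Hypothetical metaclass emulating a type.__format__ that
    understands the 'T' and '#T' format specs."""

    def __format__(cls, spec: str) -> str:
        if spec == "T":
            # Assumption: "T" selects the short name.
            return cls.__name__
        if spec == "#T":
            # Assumption: "#T" selects module + qualified name.
            return f"{cls.__module__}.{cls.__qualname__}"
        # Any other spec falls back to the default behaviour.
        return super().__format__(spec)


class Token(metaclass=NameFormatMeta):
    pass


def check_str(arg):
    if not isinstance(arg, str):
        # type() has to be called explicitly on the Python side.
        raise TypeError(f"expected str, got {type(arg):T}")
    return arg
```

The emulation only covers classes that use the metaclass; formatting a builtin type such as int with "T" still fails, because the builtin type.__format__ rejects non-empty format specs.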

Is it common to handle a type, instead of an instance?

vstinner added a commit to vstinner/cpython that referenced this issue Nov 4, 2023
* Add "%T" and "%#T" formats to PyUnicode_FromFormat().
* Add type.__fullyqualname__ read-only attribute.
@vstinner
Member Author

vstinner commented Nov 4, 2023

Coarse statistics:

vstinner@mona$ grep 'raise.*\.__name__' $(find Lib/ -name "*.py")|wc -l
44
vstinner@mona$ grep 'raise.*\.__qualname__' $(find Lib/ -name "*.py")|wc -l
4

It seems that most Python functions that raise an error message including a type name prefer the short name over the qualified name.
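
For anyone who wants to repeat the count without grep, a rough Python equivalent might look like this. The function name count_matching_lines is made up for this sketch, and the numbers it gives can differ slightly from grep on files with unusual encodings:

```python
import re
from pathlib import Path


def count_matching_lines(root: str, pattern: str) -> int:
    """Count the lines of *.py files under root that match pattern."""
    regex = re.compile(pattern)
    total = 0
    for path in Path(root).rglob("*.py"):
        if not path.is_file():
            continue
        # errors="replace" keeps the scan going on badly encoded files.
        with path.open(encoding="utf-8", errors="replace") as stream:
            total += sum(1 for line in stream if regex.search(line))
    return total
```

Calling count_matching_lines("Lib", r"raise.*\.__name__") from a CPython checkout would correspond to the first command above.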

@vstinner
Member Author

vstinner commented Nov 7, 2023

@vstinner
Member Author

vstinner commented Nov 8, 2023

Such an API will ease the conversion of C extensions using Argument Clinic to the limited C API (such as the grp module). Currently, Argument Clinic generates code that reads the PyTypeObject.tp_name member, which is not compatible with the limited C API:

    def bad_argument(self, displayname: str, expected: str, *, limited_capi: bool, expected_literal: bool = True) -> str:
        assert '"' not in expected
        if limited_capi:
            if expected_literal:
                return (f'PyErr_Format(PyExc_TypeError, '
                        f'"{{{{name}}}}() {displayname} must be {expected}, not %.50s", '
                        f'{{argname}} == Py_None ? "None" : Py_TYPE({{argname}})->tp_name);')
            else:
                return (f'PyErr_Format(PyExc_TypeError, '
                        f'"{{{{name}}}}() {displayname} must be %.50s, not %.50s", '
                        f'"{expected}", '
                        f'{{argname}} == Py_None ? "None" : Py_TYPE({{argname}})->tp_name);')
        else:
            if expected_literal:
                expected = f'"{expected}"'
            if clinic is not None:
                clinic.add_include('pycore_modsupport.h', '_PyArg_BadArgument()')
            return f'_PyArg_BadArgument("{{{{name}}}}", "{displayname}", {expected}, {{argname}});'
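
As a rough illustration of what the limited C API branch could generate with %T, here is a hypothetical variant (bad_argument_with_T is not an Argument Clinic function). It assumes, as the doubled braces in the code above suggest, that a later str.format() pass fills in the remaining placeholders:

```python
def bad_argument_with_T(displayname: str, expected: str) -> str:
    """Hypothetical variant of the template above that uses %T.

    Like the original, it returns a template: "{{name}}" and "{argname}"
    are left in place for a later str.format() pass.
    """
    assert '"' not in expected
    return (f'PyErr_Format(PyExc_TypeError, '
            f'"{{{{name}}}}() {displayname} must be {expected}, not %T", '
            f'{{argname}});')


template = bad_argument_with_T("argument 1", "str")
# One str.format() pass substitutes argname and unescapes {{name}}.
c_code = template.format(argname="arg")
```

The generated C no longer touches tp_name. One difference from the original: the Py_None special case is dropped, so a None argument would be reported with its type name, NoneType, rather than as "None".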

vstinner added a commit to vstinner/cpython that referenced this issue Nov 9, 2023
vstinner added a commit to vstinner/cpython that referenced this issue Nov 15, 2023
* Update modules:

  * enum
  * functools
  * optparse
  * pdb
  * xmlrpc.server

* Update tests:

  * test_dataclasses
  * test_descrtut
  * test_cmd_line_script
@vstinner
Member Author

As discussed previously, one option is to change str(type) to format the fully qualified name of the type, instead of falling back on repr(type), which formats the type as <class ...>. I wrote PR gh-112129 to see how it goes. Multiple modules and tests are impacted by such a change. I'm not sure that it's a good idea.
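
For reference, a short sketch of the current behaviour next to the name such a change would produce (fractions.Fraction is just an arbitrary example type):

```python
import fractions

tp = fractions.Fraction

as_str = str(tp)    # "<class 'fractions.Fraction'>"
as_repr = repr(tp)  # same text: type does not define its own __str__

# The fully qualified name that str(type) would return under the proposal.
full_name = f"{tp.__module__}.{tp.__qualname__}"  # 'fractions.Fraction'
```

Any code or test that embeds str(cls) in its output, for example through "%s" % cls, would see different text, which is presumably why so many modules and tests are affected.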

vstinner added a commit to vstinner/cpython that referenced this issue Nov 15, 2023
Add PyType_GetFullyQualName() function.
vstinner added a commit to vstinner/cpython that referenced this issue Nov 15, 2023
Add PyType_GetFullyQualName() function with documentation and tests.
vstinner added a commit to vstinner/cpython that referenced this issue Mar 14, 2024
Rewrite tests on type names in Python, they were written in C.
vstinner added a commit that referenced this issue Mar 14, 2024
)

Rewrite tests on type names in Python, they were written in C.
vstinner added a commit to vstinner/cpython that referenced this issue Mar 14, 2024
vstinner added a commit to vstinner/cpython that referenced this issue Mar 14, 2024
Author: Eric Snow <ericsnowcurrently@gmail.com>
vstinner added a commit that referenced this issue Mar 14, 2024
Co-authored-by: Eric Snow <ericsnowcurrently@gmail.com>
vstinner added a commit to vstinner/cpython that referenced this issue Mar 14, 2024
vstinner added a commit to vstinner/cpython that referenced this issue Mar 20, 2024
…python#116815)

Rewrite tests on type names in Python, they were written in C.
vstinner added a commit to vstinner/cpython that referenced this issue Mar 20, 2024
…#116824)

Co-authored-by: Eric Snow <ericsnowcurrently@gmail.com>
adorilson pushed a commit to adorilson/cpython that referenced this issue Mar 25, 2024
…python#116815)

Rewrite tests on type names in Python, they were written in C.
adorilson pushed a commit to adorilson/cpython that referenced this issue Mar 25, 2024
…#116824)

Co-authored-by: Eric Snow <ericsnowcurrently@gmail.com>
serhiy-storchaka added a commit to serhiy-storchaka/cpython that referenced this issue Apr 8, 2024
* Fix implementation of %#T and %#N (they were implemented as %T# and
  %N#).
* Restore tests removed in pythongh-111696.
diegorusso pushed a commit to diegorusso/cpython that referenced this issue Apr 17, 2024
…python#116815)

Rewrite tests on type names in Python, they were written in C.
diegorusso pushed a commit to diegorusso/cpython that referenced this issue Apr 17, 2024
…#116824)

Co-authored-by: Eric Snow <ericsnowcurrently@gmail.com>