PyUnicode_FromFormat(): add %T format for an object type name #78776

vstinner · 2018-09-06T09:13:30Z

BPO	34595
Nosy	@vstinner, @ericvsmith, @encukou, @serhiy-storchaka
PRs	bpo-34595: Add %T format to PyUnicode_FromFormatV() #9080 bpo-34595: Format string using %T in Python/ #9103 DO-NOT-MERGE: bpo-34595: Add %t format to PyUnicode_FromFormatV() #9122 Revert "bpo-34595: Add %T format to PyUnicode_FromFormatV() (GH-9080)" #9187 bpo-34595: WIP: Type fully qualified name #9251

^{Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.}

Show more details

GitHub fields:

assignee = None
closed_at = <Date 2018-09-19.22:48:11.516>
created_at = <Date 2018-09-06.09:13:30.086>
labels = ['interpreter-core', '3.8']
title = 'PyUnicode_FromFormat(): add %T format for an object type name'
updated_at = <Date 2018-09-19.22:48:11.515>
user = 'https://github.com/vstinner'

bugs.python.org fields:

activity = <Date 2018-09-19.22:48:11.515>
actor = 'vstinner'
assignee = 'none'
closed = True
closed_date = <Date 2018-09-19.22:48:11.516>
closer = 'vstinner'
components = ['Interpreter Core']
creation = <Date 2018-09-06.09:13:30.086>
creator = 'vstinner'
dependencies = []
files = []
hgrepos = []
issue_num = 34595
keywords = ['patch']
message_count = 17.0
messages = ['324675', '324676', '324677', '324678', '324679', '324767', '324792', '324801', '324821', '324828', '324875', '324876', '324877', '325045', '325078', '325079', '325814']
nosy_count = 4.0
nosy_names = ['vstinner', 'eric.smith', 'petr.viktorin', 'serhiy.storchaka']
pr_nums = ['9080', '9103', '9122', '9187', '9251']
priority = 'normal'
resolution = 'rejected'
stage = 'resolved'
status = 'closed'
superseder = None
type = None
url = 'https://bugs.python.org/issue34595'
versions = ['Python 3.8']

vstinner · 2018-09-06T09:13:30Z

The PEP-399 requires that C accelerator behaves exactly as Python, but a lot of C code truncates type name to an arbitrary length: 80, 100, 200, up to 500 (not sure if it's a number of bytes or characters).

Py_TYPE(obj)->tp_name is a common pattern: it would be nice to have a new "%T" format in PyUnicode_FromFormat(), so it can be used directly in PyErr_Format(), to format an object type name.

Attached PR implements the proposed %T format and modify unicodeobject.c to use it.

I propose to then write a second PR to modify all C code of CPython using Py_TYPE(obj)->tp_name to use the new %T type.

vstinner · 2018-09-06T09:23:40Z

My previous attempt to fix that issue, 7 years ago: bpo-10833 :-)

See also bpo-7330 where I implemented width and precision (ex: "%5.3s") in PyUnicode_FromFormat().

vstinner · 2018-09-06T09:26:57Z

I cannot find %T in printf() manual pages on Fedora 28 (Linux).

I can find it in the strftime() documentation:

%T The time in 24-hour notation (%H:%M:%S). (SU)

But I don't think that it's an issue since printf() and strftime() formatters are exclusive, no? For example, strftime() %s means "number of seconds since the Epoch" whereas printf() %s means a "const char*" byte string.

vstinner · 2018-09-06T09:29:05Z

Oh, PyUnicode_FromFormat() has %A to format as ASCII, whereas printf() already has %A but for a different meaning:

a, A For a conversion, the double argument is converted to hexadecimal notation (using the letters abcdef) in the style [-]0xh.hhhhp+-; for A conversion the prefix 0X, the letters ABCDEF, and the exponent separator P is used. (...)

ericvsmith · 2018-09-06T09:33:51Z

I don't think you have to worry about %T being used by other formatting functions. If (heaven forbid) dates were ever supported by PyUnicode_FromFormat(), there would have to be a way to switch from "normal" argument processing to argument-specific formatting specifiers, anyway.

vstinner · 2018-09-07T16:01:04Z

New changeset 886483e by Victor Stinner in branch 'master':
bpo-34595: Add %T format to PyUnicode_FromFormatV() (GH-9080)
886483e

serhiy-storchaka · 2018-09-07T21:03:51Z

IIRC there is similar issue or a discussion on one of mailing lists. But the idea about adding this feature on Python side too was considered. It would be better to find this discussion before pushing this change.

In some cases we have the type itself, not an instance. So it makes sense to make %T an equivalent of arg->tp_name instead of Py_TYPE(arg)->tp_name.

On Python side, you need to output either short name obj.__class__.__name__ or full qualified name obj.__class__.__module__ + '.' + obj.__class__.__qualname__ with exception that the module name should be omitted if it is 'builtins'.

obj.__class__.__qualname__ if obj.__class__.__module__ == 'builtins' else f'{obj.__class__.__module__}.{obj.__class__.__qualname__}'

The case of the module name '__main__' can be handled specially too.
Obviously it is desirable to have a more handy way of writing such expression.

On C side, the problem is that tp_name means different, depending of the kind of the type. In some cases it is a short name, in other cases it is a full qualified name. It is not easy to write a code that produces the same output in Python and C. I have added a helper _PyType_Name() that helps to solve a part of these issues. If you want to output a short name (just cls.__name__ in Python), use _PyType_Name(cls) instead of cls->tp_name. But this doesn't help for the case when you need to output a full qualified name.

vstinner · 2018-09-07T21:45:00Z

In some cases we have the type itself, not an instance. So it makes sense to make %T an equivalent of arg->tp_name instead of Py_TYPE(arg)->tp_name.

"arg->tp_name" is rare in the code base, whereas "Py_TYPE(arg)->tp_name" is a very common pattern.

I'm not sure that a formatter is needed for "arg->tp_name", you can already use "%s" with "arg->tp_name", no?

My intent was to make the C code less verbose, respect the PEP-399, but also indirectly avoid the implicit borrowed reference of Py_TYPE() :-)
https://pythoncapi.readthedocs.io/bad_api.html#borrowed-references

On Python side, you need to output either short name obj.__class__.__name__ or full qualified name obj.__class__.__module__ + '.' + obj.__class__.__qualname__ with exception that the module name should be omitted if it is 'builtins'.

Sometimes, the qualified name would be more appropriate, but it's sometimes tricky to decide if the short name, qualified name or fully qualified name is the "right" name... So I chose to restrict this issue to the most common case, Py_TYPE(arg)->tp_name :-)

Ah, and changing strings is a risk of breaking the backward compatibility. For example, cause issue with pickle. So it should be discussed on a case by case basis.

Moreover, if you want to change a string, the Python code should be updated as well. I suggest to open a new issue to discuss that.

Don't get me wrong, I'm interested to do these changes, but it's a wider project :-)

obj.__class__.__qualname__ if obj.__class__.__module__ == 'builtins' else f'{obj.__class__.__module__}.{obj.__class__.__qualname__}'

The case of the module name '__main__' can be handled specially too.
Obviously it is desirable to have a more handy way of writing such expression.

To be honest, I also considered to proposed a second formatter to do something like that :-) But as you explained, I'm not sure which name is the good name: qualified or fully qualified (with module name)?

First of all, would it help to have a *function* to get these names? Maybe we could first use such functions before discussing adding a new formatter in PyUnicode_FromFormat()?

vstinner · 2018-09-08T08:16:20Z

An alternative would be to add multiple formatters. Example:

%Tn: type name, type.__name__, Py_TYPE(obj)->tp_name
%Tq: qualified name, type.__qualname__
%Ts: short name, never contains "."
%Tf: fully qualified name, "module.qualified.name"

What do you think Serhiy?

serhiy-storchaka · 2018-09-08T10:49:29Z

I think we need to handle only two cases: short and fully qualified names. __qualname__ without __module__ doesn't make sense, and the value of tp_name depends on implementation details (is it Python class or builtin class, heap class or dynamic class?). Maybe use %t and %T?

But we may want to support formatting the name of the type itself and the name of the object's type. This give us 4 variants.

For old string formatting we can introduce new % codes (with possible modifiers). But in modern string formatting "T" can have meaning for some types (e.g. for datetime). We can implement __format__ for the type type itself (though it can cause confusion if cls.__format__() is different from cls.__format__(instance)), but for formatting the name of the object's type (as in your original proposition) we need to add a new conversion flag like "!r".

vstinner · 2018-09-09T07:53:53Z

I think we need to handle only two cases: short and fully qualified names. __qualname__ without __module__ doesn't make sense, and the value of tp_name depends on implementation details (is it Python class or builtin class, heap class or dynamic class?). Maybe use %t and %T?

Ok, I wrote PR 9122 to add %t format and modify %T format:

%t <=> type(obj).__name__
%T <=> f"{type(obj).__module__}.{type(obj).__qualname__}"

But we may want to support formatting the name of the type itself and the name of the object's type. This give us 4 variants.

Again, I'm not sure about these ones. _PyType_Name() can be used for %t-like directly on a type. Later we can expose _PyType_FullName() (function that I added in my latest PR) as a private function for %T-like directly on a type.

For old string formatting we can introduce new % codes (with possible modifiers). But in modern string formatting "T" can have meaning for some types (e.g. for datetime). We can implement __format__ for the type type itself (though it can cause confusion if cls.__format__() is different from cls.__format__(instance)), but for formatting the name of the object's type (as in your original proposition) we need to add a new conversion flag like "!r".

I'm not sure that I understood directly.

Do you want to add a third formatter in PyUnicode_FromFormat() which would use Py_TYPE(obj)->tp_name? I dislike Py_TYPE(obj)->tp_name, since my intent is to conform to the PEP-399: tp_name is not accessible at the Python level, only type(obj).__name__ and type(obj).__qualname__.

Or do you want to add a new formatter to type.__format__() to expose %T at the Python level, f"{type(obj).__module__}.{type(obj).__qualname__}"?

Currently, type(obj).__name__ is the most popular way to format a string. Would it break the backward compatibility to modify *existing* error messages?

serhiy-storchaka · 2018-09-09T08:33:15Z

Ok, I wrote PR 9122 to add %t format and modify %T format:

Nice!

I agree that it is easy to use _PyType_Name() directly. But using _PyType_FullName() instead of tp_name can be cumbersome because it returns a new object and needs error handling.

Or do you want to add a new formatter to type.__format__() to expose %T at the Python level, f"{type(obj).__module__}.{type(obj).__qualname__}"?

Yes, I think we need a convenient way of formatting fully qualified name that omits the module name for types in the builtins module. It is equivalent to Py_TYPE(obj)->tp_name for extension types which is the most popular way to format a type name in error messages for now.

There are several open issues for inconsistency in error messages for Python and C implementations, because the former use type(obj).__name__ or obj.__class__.__name__, and the latter use Py_TYPE(obj)->tp_name. I hope finally we will fix this.

serhiy-storchaka · 2018-09-09T08:37:07Z

in error messages

And in reprs. It is common to format a repr as "{typename}(...)" or "<{typename}(...)>". The difference is whether the typename is a short or fully qualified name.

encukou · 2018-09-11T18:34:52Z

The PEP-399 requires that C accelerator behaves exactly as Python, [...]

It does not. PEP-399 requires that that the C code must pass the same *test suite*. And error messages in particular tend to not be checked in tests.

Anyway, I don't see how that applies to replacing Py_TYPE(obj)->tp_name by %T. The real reason for that change is removing borrowed references, right?
I have not yet seen a good reason why Py_TYPE(obj) is bad. The reasons you give in https://pythoncapi.readthedocs.io/bad_api.html#borrowed-references are about tagged pointers and PyList_GetItem(), but Py_TYPE() is very different.

I don't think the reasons are strong enough to add new API to PyUnicode_FromFormat().

vstinner · 2018-09-11T22:23:28Z

New changeset 998b806 by Victor Stinner in branch 'master':
Revert "bpo-34595: Add %T format to PyUnicode_FromFormatV() (GH-9080)" (GH-9187)
998b806

vstinner · 2018-09-11T22:24:40Z

Petr Viktorin asked me to open a wider discussion about this issue on python-dev. I just reverted my first change and posted:
https://mail.python.org/pipermail/python-dev/2018-September/155150.html

vstinner · 2018-09-19T22:48:11Z

I close this issue until we can agree on an API.

vstinner added 3.8 only security fixes interpreter-core (Objects, Python, Grammar, and Parser dirs) labels Sep 6, 2018

vstinner closed this as completed Sep 19, 2018

ezio-melotti transferred this issue from another repository Apr 10, 2022

vstinner mentioned this issue Jun 21, 2023

C API: Investigate how the PyTypeObject members can be removed from the public C API #105970

Closed

vstinner mentioned this issue Nov 3, 2023

PEP 737: PyUnicode_FromFormat(): Add %T format to format the type name of an object #111696

Closed

vstinner mentioned this issue Nov 16, 2023

PEP 737: gh-111696: Add type.__fully_qualified_name__ attribute #112133

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

PyUnicode_FromFormat(): add %T format for an object type name #78776

PyUnicode_FromFormat(): add %T format for an object type name #78776

vstinner commented Sep 6, 2018

vstinner commented Sep 6, 2018

vstinner commented Sep 6, 2018

vstinner commented Sep 6, 2018

vstinner commented Sep 6, 2018

ericvsmith commented Sep 6, 2018

vstinner commented Sep 7, 2018

serhiy-storchaka commented Sep 7, 2018

vstinner commented Sep 7, 2018

vstinner commented Sep 8, 2018

serhiy-storchaka commented Sep 8, 2018

vstinner commented Sep 9, 2018

serhiy-storchaka commented Sep 9, 2018

serhiy-storchaka commented Sep 9, 2018

encukou commented Sep 11, 2018

vstinner commented Sep 11, 2018

vstinner commented Sep 11, 2018

vstinner commented Sep 19, 2018

PyUnicode_FromFormat(): add %T format for an object type name #78776

PyUnicode_FromFormat(): add %T format for an object type name #78776

Comments

vstinner commented Sep 6, 2018

vstinner commented Sep 6, 2018

vstinner commented Sep 6, 2018

vstinner commented Sep 6, 2018

vstinner commented Sep 6, 2018

ericvsmith commented Sep 6, 2018

vstinner commented Sep 7, 2018

serhiy-storchaka commented Sep 7, 2018

vstinner commented Sep 7, 2018

vstinner commented Sep 8, 2018

serhiy-storchaka commented Sep 8, 2018

vstinner commented Sep 9, 2018

serhiy-storchaka commented Sep 9, 2018

serhiy-storchaka commented Sep 9, 2018

encukou commented Sep 11, 2018

vstinner commented Sep 11, 2018

vstinner commented Sep 11, 2018

vstinner commented Sep 19, 2018