vstinner
Member

@vstinner vstinner commented Oct 2, 2025

  • Move PyUnicode_Format() implementation from unicodeobject.c to unicode_format.c.
  • Replace unicode_modifiable() with _PyUnicode_IsModifiable()
  • Add empty lines to have two empty lines between functions.

@picnixz picnixz self-requested a review October 2, 2025 11:09
Comment on lines +11 to +52
#define MAX_UNICODE _Py_MAX_UNICODE
#define ensure_unicode _PyUnicode_EnsureUnicode
Member

Should I understand that this is to avoid a refactoring in this file? Do you plan to remove those defines later? (I think we could do it, but in another PR.)

Member Author

I don't plan to remove those defines. I don't think that it's worth it.
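
For readers unfamiliar with the pattern, here is a minimal standalone sketch of how alias defines let moved code keep its original short names. Every identifier in it (`DEMO_INTERNAL_MAX_CODE_POINT`, `demo_internal_ensure_text`, `check_code_point`) is a hypothetical stand-in, not a CPython name, and it only assumes that the real defines serve the same purpose.

```c
#include <stddef.h>

/* Hypothetical stand-ins for prefixed internal names (not CPython's). */
#define DEMO_INTERNAL_MAX_CODE_POINT 0x10ffffUL

static int
demo_internal_ensure_text(const char *obj)
{
    /* Return 0 when the argument is usable, -1 otherwise. */
    return (obj != NULL) ? 0 : -1;
}

/* Local aliases: the moved code below keeps its original short names. */
#define MAX_CODE_POINT DEMO_INTERNAL_MAX_CODE_POINT
#define ensure_text demo_internal_ensure_text

/* "Moved" code, written exactly as it was before the move. */
static int
check_code_point(const char *obj, unsigned long ch)
{
    if (ensure_text(obj) < 0) {
        return -1;
    }
    return (ch <= MAX_CODE_POINT) ? 1 : 0;
}
```

With the aliases in place, the body of `check_code_point()` needs no edits when the underlying names gain a prefix, which keeps a pure code move easy to diff.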

#define MAX_UNICODE _Py_MAX_UNICODE
#define ensure_unicode _PyUnicode_EnsureUnicode

struct unicode_formatter_t {
Member

@picnixz picnixz Oct 2, 2025

Can I assume that, starting from here, it's just a huge copy-paste, possibly with some names changed because some private functions are now _Py* functions? I can run a diff myself otherwise, but I wanted to ask first (if you say "no", I'll just run the diff locally).

Member Author

I made a few changes:

  • Replace unicode_modifiable() with _PyUnicode_IsModifiable()
  • Replace unicode_fill() with _PyUnicode_Fill()
  • Add empty lines to have two empty lines between functions.

Member

@StanFromIreland StanFromIreland Oct 2, 2025

This is a good time to consider PEP 7'ing the code; it would be nice (I would volunteer to do it :-). IIRC the main argument against it was that it would ruin the blame, but that's not an issue any more.

Member

The split into a new file should have as few changes as possible so that Git can detect cross-file moves; formatting changes can be left to their own PR.

Member Author

I would prefer to reduce changes to the bare minimum. If someone wants to apply coding style changes, I would suggest making a separate PR. I prefer to stick to just "move code around".

@serhiy-storchaka
Member

Please take your time. Some people will only be able to participate in the discussion on weekends.

There are three formatters: str.__mod__() (PyUnicode_Format()), str.format() and PyUnicode_FromFormat(). The code for str.format() is scattered between Objects/stringlib/unicode_format.h and Python/formatter_unicode.c (3000 lines total). The code that you moved here is for str.__mod__() (1000 lines). How many lines are for PyUnicode_FromFormat()?

Do we want all three formatters in one file or in three separate files? In any case I would start with merging Objects/stringlib/unicode_format.h and Python/formatter_unicode.c into a single file.

@vstinner
Member Author

vstinner commented Oct 2, 2025

> How many lines are for PyUnicode_FromFormat()?

In PR gh-139354, PyUnicode_FromFormat() takes 657 lines.

> Do we want all three formatters in one file or in three separate files?

I propose to have one file per formatter. IMO it's already tedious to navigate a 1000-line file (unicode_format.c).

> In any case I would start with merging Objects/stringlib/unicode_format.h and Python/formatter_unicode.c into a single file.

I started with PyUnicode_Format() since it's easy to split out. It has few "dependencies" on other unicodeobject.c functions.

I can create another PR for Python/formatter_unicode.c.

@serhiy-storchaka
Member

Then what will be the names of the three files?

Some code can be shared between formatters, so we may add a fourth file for shared code and a fifth file for the header.

@vstinner
Member Author

vstinner commented Oct 2, 2025

> Then what will be the names of the three files?

In PR #139354, there are:

  • Objects/unicode_fromformat.c: PyUnicode_FromFormat()
  • Objects/unicode_format.c: PyUnicode_Format()

For the 3rd one (Python/formatter_unicode.c), it can be Objects/unicode_formatter.c or another name. Or it can stay in Python/formatter_unicode.c.

> Some code can be shared between formatters, so we may add a fourth file for shared code and a fifth file for the header.

So far, I haven't found any code that is shared between these 3 functions. If we later discover shared code, it can stay where it is and be exposed as _PyUnicode functions declared in pycore_unicodeobject.h (no need to add a helper C file).
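
A minimal sketch of that layout, squeezed into one translation unit for illustration: a helper stays defined where it already lives, is declared once (in practice, in an internal header), and is called from code that would sit in separate formatter files. All names here (`demo_count_specs`, `demo_formatter_a_needs_args`, `demo_formatter_b_arg_count`) are hypothetical, and the helper's logic is only an assumption about what shared code might look like.

```c
#include <stddef.h>

/* Would be declared once in an internal header (hypothetical name). */
size_t demo_count_specs(const char *format);

/* Would stay defined in the file that already contains it. */
size_t
demo_count_specs(const char *format)
{
    size_t count = 0;
    for (const char *p = format; *p != '\0'; p++) {
        if (*p != '%') {
            continue;
        }
        if (p[1] == '%') {
            p++;  /* "%%" is a literal percent sign, not a specifier */
            continue;
        }
        count++;
    }
    return count;
}

/* Two callers that would live in separate formatter files. */
int
demo_formatter_a_needs_args(const char *format)
{
    return demo_count_specs(format) > 0;
}

size_t
demo_formatter_b_arg_count(const char *format)
{
    return demo_count_specs(format);
}
```

The trade-off is that the helper gains external linkage and a prefixed name, in exchange for not creating an extra source file that holds only shared code.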

@vstinner vstinner force-pushed the split_unicode_format branch from f099ed2 to e7fab18 Compare October 8, 2025 14:03
@vstinner
Member Author

vstinner commented Oct 8, 2025

I rebased the PR on the main branch: on top of the "gh-139353: Rename formatter_unicode.c to unicode_formatter.c (#139723)" change.

@vstinner
Member Author

vstinner commented Oct 10, 2025

@serhiy-storchaka: So what do you think about adding Objects/unicode_format.c?

Member

@serhiy-storchaka serhiy-storchaka left a comment

LGTM. 👍

@vstinner vstinner merged commit 4c11971 into python:main Oct 10, 2025
47 checks passed
@vstinner vstinner deleted the split_unicode_format branch October 10, 2025 10:53
@vstinner
Member Author

Merged, thanks for the review.
