Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

gh-113190: Reenable non-debug interned string cleanup #113601

Merged
merged 19 commits into from
Aug 15, 2024

Conversation

eduardo-elizondo
Copy link
Contributor

@eduardo-elizondo eduardo-elizondo commented Dec 31, 2023

In GH-19474 we introduced immortalized interned strings. However, at the time, we didn't have a strict guarantee that a runtime shutdown (i.e Py_Finalize()) would completely cleanup all the allocated PyObjects (i.e a simple example of ./python -X showrefcount -c 'import itertools' would show leaks).

Today, these leaks have been largely fixed and there have been stricter guarantees into how obmalloc and the interpreter state works after a runtime shutdown. Given this, it should be safe to re-enable the interned string cleanup and revert the structseq test that had issues with this leaks before. This test should show the correctness of this code now.

@Eclips4 Eclips4 changed the title [DONOTMETGE] gh-113190: Reenable non-debug interned string cleanup [DO NOT MERGE] gh-113190: Reenable non-debug interned string cleanup Dec 31, 2023
@Eclips4
Copy link
Member

Eclips4 commented Dec 31, 2023

I added DO-NOT-MERGE label for you :)

@Eclips4 Eclips4 changed the title [DO NOT MERGE] gh-113190: Reenable non-debug interned string cleanup gh-113190: Reenable non-debug interned string cleanup Dec 31, 2023
@eduardo-elizondo
Copy link
Contributor Author

eduardo-elizondo commented Dec 31, 2023

I added DO-NOT-MERGE label for you :)

Thanks!! Will ping you once all the tests are passing to move this into safe-to-merge mode 🙂

@eduardo-elizondo
Copy link
Contributor Author

eduardo-elizondo commented Dec 31, 2023

@Eclips4 tests are passing! Could you help me by adding the "skip news" label (since it's probably not required) and removing the "DO-NOT-MERGE" label?

Thanks!!

@Eclips4 Eclips4 added skip news interpreter-core (Objects, Python, Grammar, and Parser dirs) needs backport to 3.12 bug and security fixes and removed DO-NOT-MERGE labels Dec 31, 2023
@eduardo-elizondo
Copy link
Contributor Author

cc @nascheme to take a look when you have a chance!

@nascheme
Copy link
Member

nascheme commented Jan 4, 2024

It would be nice to do a better cleanup but I'm concerned about how safe this actually is. Since we don't have an accurate refcnt for these strings, we are assuming that no one is actually still using them. It seems possible that some extensions or programs embedding Python could hang on to interned strings after Py_Finalize(). That would risk introducing "use-after-free" bugs. In the worst case, that could turn into an arbitrary code execution security bug.

Maybe I'm being too paranoid? I don't know how we could actually fix this though. Keeping hold of and using memory allocated by the Python runtime after Py_Finalize() surely puts you an very shaky ground. Previously, the obmalloc state was global to the process and you could keep using memory allocated from there after finalizing. With sub-interpeters potentially having their own obmalloc state, that's likely not okay.

My gut feeling is we should proceed cautiously, even though we would like to fix the leak. Maybe we could enable this for alphas and then turn it off again for release? Maybe we need to wait until we get python -X showrefcount showing zero for all or nearly all built-in and well known extensions? Seems like a tall order though.

@eduardo-elizondo
Copy link
Contributor Author

eduardo-elizondo commented Jan 5, 2024

@nascheme I agree with what you mentioned above and I was paranoid of that scenario too, hence the disabling of the code.

Purely going off from the docs, I don't think there's anywhere where Python provides an explicit expectation of what the behavior is of memory that survives a call to Py_Finalize. Furthermore, in the C-API it is specified that Py_Finalize will: "Undo all initializations made by Py_Initialize() and subsequent use of Python/C API functions". This gives an an implicit assumption that holding onto these interned strings after Py_Finalize is an undefined behavior.

Perhaps we should be more explicit about this in the docs (maybe in the "Bugs and Caveats" section of Py_Finalize)? i.e python can't guarantee correctness for any memory allocated during a python initialization and used in any subsequent initializations. Thoughts?

Regardless, I completely agree that we should still be careful about putting this out there. Stating with alphas sounds like a good idea to get some early signal. Let me know what's the best way to only enable this only for alphas.

@eduardo-elizondo
Copy link
Contributor Author

eduardo-elizondo commented Jan 5, 2024

Maybe we need to wait until we get python -X showrefcount showing zero for all or nearly all built-in and well known extensions?

I think we are already there? Here's a quick test program that I ran:

./python -X showrefcount -c "
import sys
import importlib
for mod in sys.modules.keys():
  importlib.import_module(mod)
"
[0 refs, 0 blocks]

@vstinner
Copy link
Member

vstinner commented Jan 8, 2024

Maybe we need to wait until we get python -X showrefcount showing zero for all or nearly all built-in and well known extensions?

It's mostly done for one year: https://mail.python.org/archives/list/python-dev@python.org/thread/E4C6TDNVDPDNNP73HTGHN5W42LGAE22F/ "Mostly" means: some remaining stdlib C extensions still implement their own static types which are not cleared by _PyTypes_FiniTypes(), and/or don't use the multi-phase initialization API yet.

test_embed checks that python -c pass does not leak memory at Python exit: see MiscTests.test_no_memleak().

Purely going off from the docs, I don't think there's anywhere where Python provides an explicit expectation of what the behavior is of memory that survives a call to Py_Finalize.

This change is a backward incompatible. I suggest to just document well the change in What's New in Python 3.13:

  • Explain how to qualify if a crash is related to this issue,
  • Document how to update affected code.

In short, a program must not keep references to any Python object between Py_Finalize() and Py_Initialize(). IMO the general rule is calling Py_Finalize() converts all Python object pointers to dangling pointers. There are some very specific exceptions which should be listed with conditions, such as "only on CPython and it's an implementation detail which can change between two Python versions".

In general, I suggest to reset all pointers to Python objects at each Py_Initialize() call. Or said differently: don't store any pointer to Python objects after Py_Finalize() :-)

Examples of exceptions:

  • Immortal objects
  • Static types (built-in immortal objects are marked as immortal)
  • Singletons (built-in singleton objects are marked as immortal) such as: empty string, empty tuple, small integers (-5..256).

@eduardo-elizondo
Copy link
Contributor Author

eduardo-elizondo commented Jan 8, 2024

This change is a backward incompatible. I suggest to just document well the change in What's New in Python 3.13

For sure, I can update the PR with the details with what you mentioned here!

It's mostly done for one year

Given this guarantee, I think it's safe to assume that if we have any Python object leak it's a user induced leak, right? i.e. the runtime itself will not cause incorrect behavior. I will include some wording around this so that users can isolate the surface are when looking for issues (i.e look at the third-party extensions not at the runtime).

In short, a program must not keep references to any Python object between Py_Finalize() and Py_Initialize()

That's my take too - though it's not specified in the docs! Any thoughts of also updating the Py_Finalize/Py_Initialize documentation as part of this PR to include stricter language like this?

@vstinner
Copy link
Member

vstinner commented Jan 8, 2024

This change is a backward incompatible. I suggest to just document well the change in What's New in Python 3.13

I looked again at the code. It has a complicated history:

  • In Python 3.9 and older, interned strings were never deleted. Only special build for Valgrind and Purity cleared them at exit.
  • In Python 3.10 and 3.11, interned strings are always deleted at exit.
  • In Python 3.12, interned strings are deleted only if Python is built in debug mode: they are not deleted in release mode.

If there are applications relying on interned strings to survive between multiple Py_Initialize()/Py_Finalize() cycles, they should already be affected by Python 3.10 and 3.11.

By the way, Python 3.11 added _PyTypes_FiniTypes() to "clear" static types in Py_Finalize().

@bedevere-app
Copy link

bedevere-app bot commented Jun 19, 2024

A Python core developer has requested some changes be made to your pull request before we can consider merging it. If you could please address their requests along with any other requests in other reviews from core developers that would be appreciated.

Once you have made the requested changes, please leave a comment on this pull request containing the phrase I have made the requested changes; please review again. I will then notify any core developers who have left a review that you're ready for them to take another look at this pull request.

@encukou encukou dismissed their stale review July 19, 2024 11:47

The conflicting PR is merged.

@encukou
Copy link
Member

encukou commented Aug 2, 2024

@eduardo-elizondo Do you want to finish the docs here, or should I take over?

Copy link

cpython-cla-bot bot commented Aug 11, 2024

All commit authors signed the Contributor License Agreement.
CLA signed

@eduardo-elizondo
Copy link
Contributor Author

@encukou updated and addressed the comments, this should be ready to go!

Copy link
Member

@encukou encukou left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks great, thank you! I'll run the refleak buildbots as there might have been regressions.

A style nit:

Doc/whatsnew/3.14.rst Outdated Show resolved Hide resolved
@encukou encukou added the 🔨 test-with-refleak-buildbots Test PR w/ refleak buildbots; report in status section label Aug 12, 2024
@bedevere-bot
Copy link

🤖 New build scheduled with the buildbot fleet by @encukou for commit 6aee7f5 🤖

If you want to schedule another build, you need to add the 🔨 test-with-refleak-buildbots label again.

@bedevere-bot bedevere-bot removed the 🔨 test-with-refleak-buildbots Test PR w/ refleak buildbots; report in status section label Aug 12, 2024
@encukou encukou enabled auto-merge (squash) August 15, 2024 11:28
@encukou encukou merged commit 3203a74 into python:main Aug 15, 2024
37 checks passed
@miss-islington-app
Copy link

Thanks @eduardo-elizondo for the PR, and @encukou for merging it 🌮🎉.. I'm working now to backport this PR to: 3.12, 3.13.
🐍🍒⛏🤖

@miss-islington-app
Copy link

Sorry, @eduardo-elizondo and @encukou, I could not cleanly backport this to 3.13 due to a conflict.
Please backport using cherry_picker on command line.

cherry_picker 3203a7412977b8da3aba2770308136a37f48c927 3.13

@miss-islington-app
Copy link

Sorry, @eduardo-elizondo and @encukou, I could not cleanly backport this to 3.12 due to a conflict.
Please backport using cherry_picker on command line.

cherry_picker 3203a7412977b8da3aba2770308136a37f48c927 3.12

@eduardo-elizondo
Copy link
Contributor Author

Thanks @encukou !!

@neonene
Copy link
Contributor

neonene commented Oct 14, 2024

Is there any chance this will be backported to 3.13.1?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
interpreter-core (Objects, Python, Grammar, and Parser dirs) needs backport to 3.12 bug and security fixes needs backport to 3.13 bugs and security fixes skip news
Projects
None yet
Development

Successfully merging this pull request may close these issues.

9 participants