
gh-101859: Add caching of types.GenericAlias objects #103541

Draft
wants to merge 13 commits into base: main

Conversation

@sobolevn (Member) commented Apr 14, 2023

Goals of this PR:

  1. Make code that uses the Py_GenericAlias C function cacheable. This is similar to how typing.List[int] is typing.List[int] already holds.
  2. Support caching of unpacked generic aliases, so that *tuple[int] is *tuple[int], but it is clearly not tuple[int].
  3. Do not cache direct types.GenericAlias creation. For example, typing._GenericAlias currently has this behavior: typing._GenericAlias(typing.List, (int,)) is not typing._GenericAlias(typing.List, (int,)). So I kept the same for types.GenericAlias: types.GenericAlias(list, (int,)) is not types.GenericAlias(list, (int,)). Note: this basically means that classes using __class_getitem__ = classmethod(GenericAlias) are not cached.
  4. Keep the cache size under control: see 3.
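The intended semantics can be sketched in a few lines. The first two behaviors below already hold on current CPython; the last line is the one this PR changes (on current releases it prints False, and the goal of the cache is to make it True):

```python
import types
from typing import List

# typing already caches its subscriptions, so identity holds today:
assert List[int] is List[int]

# Direct construction of types.GenericAlias is intentionally NOT cached,
# matching the existing typing._GenericAlias behavior:
assert types.GenericAlias(list, (int,)) is not types.GenericAlias(list, (int,))

# Builtin subscription is not cached on current releases (prints False);
# with this PR's cache it would become True.
print(list[int] is list[int])
```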

Open problems

References are leaked 🙂

For example, run: ./python.exe -m test -R 3:3 -v test_genericalias -m test_no_chaining

test_genericalias leaked [2, 2, 2] references, sum=6
test_genericalias leaked [1, 1, 1] memory blocks, sum=3
test_genericalias failed (reference leak)

It will detect a reference leak. As far as I understand, the only thing we leak is the genericalias_cache state object; it is only cleared after the interpreter shuts down.

I see several options here:

  1. Add a protected classmethod on types.GenericAlias that clears the cache, and call it in the global test teardown together with the other cleanup
  2. Add regular module state, for example in a _types module
  3. Find another way to cache it ???
  4. Give up on this idea

Performance and memory testing

I haven't done any yet, but I think I slowed things down pretty badly :(
As far as I understand, though, this is a price we are willing to pay for lower memory usage.

Final thoughts

I am a bit rusty (🦀, pun intended) with C code, so any feedback is welcome.

@sobolevn (Member Author)

Yes, as you can see, the memory leak tests are failing:


======================================================================
FAIL: test_no_memleak (test.test_embed.MiscTests.test_no_memleak) (frozen_modules='off', stmt='pass')
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/home/runner/work/cpython/cpython-ro-srcdir/Lib/test/test_embed.py", line 1801, in test_no_memleak
    self.assertEqual(refs, 0, out)
AssertionError: 12 != 0 : [12 refs, 5 blocks]

======================================================================
FAIL: test_no_memleak (test.test_embed.MiscTests.test_no_memleak) (frozen_modules='on', stmt='pass')
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/home/runner/work/cpython/cpython-ro-srcdir/Lib/test/test_embed.py", line 1801, in test_no_memleak
    self.assertEqual(refs, 0, out)
AssertionError: 12 != 0 : [12 refs, 5 blocks]

======================================================================
FAIL: test_no_memleak (test.test_embed.MiscTests.test_no_memleak) (frozen_modules='off', stmt='import __hello__')
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/home/runner/work/cpython/cpython-ro-srcdir/Lib/test/test_embed.py", line 1801, in test_no_memleak
    self.assertEqual(refs, 0, out)
AssertionError: 12 != 0 : [12 refs, 5 blocks]

======================================================================
FAIL: test_no_memleak (test.test_embed.MiscTests.test_no_memleak) (frozen_modules='on', stmt='import __hello__')
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/home/runner/work/cpython/cpython-ro-srcdir/Lib/test/test_embed.py", line 1801, in test_no_memleak
    self.assertEqual(refs, 0, out)
AssertionError: 12 != 0 : [12 refs, 5 blocks]

----------------------------------------------------------------------
Ran 1 test in 0.137s

@sunmy2019 (Member)

I think it’s due to how we detect ref leaks in our tests.

I propose introducing a new API to clear this GenericAlias cache.

We would only guarantee that it is the same object between two consecutive calls to the API.

Normally we expect that users will never call this API; it is exposed mainly for testing.

@sunmy2019 (Member)

Since no existing code relies on this cache, it won't break anything.

@sobolevn (Member Author)

We only guarantee it’s the same object between 2 consecutive API calls.

Sorry, I don't understand this idea.

Normally we expect users will never call this API, and it’s exposed mainly for testing.

Yes, undocumented protected methods are a good indication of this :)

@sunmy2019 (Member)

We only guarantee it’s the same object between 2 consecutive API calls.

Sorry, I don't understand this idea.

That means:

a = list[int]

clear_cache()

b = list[int]

a is b # False (no more guarantee)

@sobolevn (Member Author) commented Apr 15, 2023

Yes, agreed! 👍

I am willing to go further and never give this guarantee at all.
This is just an optimization, not an API. People should not rely on it.
We are not going to document this behaviour.

@Fidget-Spinner (Member)

I propose that a new API can be introduced to clear this GenericAlias cache.

Yeah, we can expose a private function in _testcapi or sys (preferably _testcapi) that clears the generic alias cache. We do the same for the type cache on every test-runner startup.
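The type-cache precedent mentioned here is a real, public-but-underscored hook: regrtest calls sys._clear_type_cache() on startup. A generic-alias equivalent would presumably follow the same shape; the `_clear_genericalias_cache` name below is hypothetical and does not exist yet:

```python
import sys

# Real API: regrtest clears the method/type cache with this on startup.
sys._clear_type_cache()

# A hypothetical analogue for this PR could be called the same way from
# test teardown; probed defensively here because it does not exist yet:
clear = getattr(sys, "_clear_genericalias_cache", None)
if clear is not None:
    clear()
```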

@sobolevn (Member Author)

OK, the global cache is now cleared, but there's still a leak somewhere.

Without the cache cleared:

Ran 36 tests in 0.094s

OK
.
test_genericalias leaked [1139, 1137, 1139] references, sum=3415
test_genericalias leaked [404, 403, 404] memory blocks, sum=1211
test_genericalias failed (reference leak)

With the cache cleared:

Ran 36 tests in 0.102s

OK
.
test_genericalias leaked [875, 875, 875] references, sum=2625
test_genericalias leaked [305, 305, 305] memory blocks, sum=915
test_genericalias failed (reference leak)

Now I have no idea what is going on; I think I need advice here, @Fidget-Spinner.

@Fidget-Spinner (Member)

Sorry, I won't be able to take a close look at this until my exams are over. Could you ping me in exactly 2 weeks please?

    if (PyErr_Occurred()) {
        goto error;
    }
}
Member:

args is not released in these paths (L1022, L1025).

Member Author:

Awesome catch! I missed this completely. Manual memory management is driving me crazy :)

Member:

I wrote an extension to track down all ref-count changes. That is how I got here.

I am considering opening an issue to add this utility to main.

@sunmy2019 (Member) commented Apr 19, 2023

If anyone is interested, my debugger demo (for x64 Linux) is available at https://github.com/sunmy2019/cpython/tree/memory-tracking-demo

Try compiling cb.cpp with C++20 as a shared library, then link it (along with libstdc++) into python:
g++ -g -fPIC -c -std=c++20 cb.cpp -o libcb.so -shared -O3

It prints something like:

unexpected ref count change of 0x7f4e3bae2db0: 1, 2 -> 1
unexpected ref count change of 0x7f4e3c0fc170: 1, 2 -> 1
unexpected ref count change of 0x7f4e3bae11f0: 1, 2 -> 1
ref change of living 0x560ed45c13a0 (bool): 3,  808 -> 811
ref change of living 0x560ed45d26c0 (type): 6,  70 -> 76
ref change of living 0x560ed45d2e00 (type): 3,  109 -> 112
ref change of living 0x560ed4700960 (int): -2,  1000001675 -> 1000001673
ref change of living 0x560ed47009a0 (int): -1,  1000000372 -> 1000000371
ref change of living 0x560ed47009c0 (int): 1,  1000000162 -> 1000000163
ref change of living 0x560ed4700ae0 (int): 1,  1000000036 -> 1000000037
ref change of living 0x560ed4700de0 (int): 1,  1000000012 -> 1000000013
ref change of living 0x560ed67095c0 (_ctypes.PyCPointerType): 4,  0 -> 4
ref change of living 0x560ed670d7d0 (_ctypes.PyCPointerType): 4,  0 -> 4
ref change of living 0x7f4e3b93d4a0 (tuple): 2,  0 -> 2
ref change of living 0x7f4e3b96bf50 (tuple): 1,  0 -> 1
ref change of living 0x7f4e3b96eb30 (types.GenericAlias): 2,  0 -> 2
ref change of living 0x7f4e3b98d3b0 (tuple): 2,  0 -> 2
......

ref change of died   0x7f4e3baff640 (int): -1
ref change of died   0x7f4e3bafde80 (int): -2
ref change of died   0x7f4e3bae2db0 (dict_itemiterator): -2
ref change of died   0x7f4e3c0eebc0 (list): -1
ref change of died   0x7f4e3bada530 (list): -1
ref change of died   0x7f4e3bad9f40 (list): -1
ref change of died   0x7f4e3c2ab650 (str): -1
ref change of died   0x7f4e3ba75730 (getset_descriptor): -1
ref change of died   0x7f4e3ba77770 (getset_descriptor): -1
......

ref change of type bool: 3
ref change of type dict: 2
ref change of type dict_itemiterator: -2
ref change of type tuple: 9
ref change of type type: 9
ref change of type types.GenericAlias: 6
total ref change 27
ref total 237877

@sunmy2019 (Member)

I haven't looked closely into this, but support.run_in_subinterp is leaking something.

Several tests that use it failed. run_in_subinterp does call _PyGenericAlias_Fini, which has a known problem.

_PyGenericAlias_Fini(PyInterpreterState *interp) {
    if (interp->genericalias_cache != NULL) {
        PyObject *cache = interp->genericalias_cache;
        PyObject_GC_Del(cache);
Member:

My debugger tells me that PyObject_GC_Del leaves sys.gettotalrefcount() unchanged. It is responsible for some of the leaks.

Member:

Maybe we can clear the dict in the early stages of finalization?

Member:

I'm confused. Shouldn't the dict's refcount be 1 at this point? So a possible fix would be to call Py_DECREF on the dict and let it handle its own deallocation?

@sunmy2019 (Member) Apr 20, 2023:

I'm confused. Shouldn't the dict's refcount be 1 at this point? So a possible fix would be to call Py_DECREF on the dict and let it handle its own deallocation?

It needs a fully functional interpreter state to do GC, but we are shutting the interpreter down at this moment; the interpreter state may be broken.

Member:

It needs a fully functional interpreter state to do GC. But we are shutting the interpreter down at this moment, the interpreter's state may be broken.

That is my guess at what @sobolevn was thinking. It seems OK to use Py_DECREF(cache) here, since it is called before _PyDict_Fini.

@sunmy2019 (Member)

When running in a sub-interpreter, the call chain is:

run_in_subinterp
--> Py_EndInterpreter
    --> finalize_interp_clear
        --> _PyInterpreterState_Clear
            --> interpreter_clear
                --> _PyGC_Fini
                    --> gc_fini_untrack

cpython/Modules/gcmodule.c

Lines 2154 to 2168 in 6be7aee

static void
gc_fini_untrack(PyGC_Head *list)
{
    PyGC_Head *gc;
    for (gc = GC_NEXT(list); gc != list; gc = GC_NEXT(list)) {
        PyObject *op = FROM_GC(gc);
        _PyObject_GC_UNTRACK(op);
        // gh-92036: If a deallocator function expect the object to be tracked
        // by the GC (ex: func_dealloc()), it can crash if called on an object
        // which is no longer tracked by the GC. Leak one strong reference on
        // purpose so the object is never deleted and its deallocator is not
        // called.
        Py_INCREF(op);
    }
}

The ref count is leaked here!

@Fidget-Spinner (Member) commented Apr 20, 2023

The main thing we want to verify:

  • Slowdown (if any, and how much?)

We already know it should save memory (if you want to test how much it saves, that's also welcome!).

For a workload, you can run test_typing or test_genericalias under the pyperf runner. I know this is a really tall order to ask of both of you, so if no one has the time to do it, I can do it in 2 weeks.

@sobolevn (Member Author)

I am already working on perf stats 😉

@AlexWaygood (Member)

@samuelcolvin raised concerns about how this would interact with pydantic today at the typing summit, so marking as "do not merge" until the pydantic folks have had a chance to give their thoughts on the issue thread :)

@sunmy2019 (Member)

@samuelcolvin raised concerns about how this would interact with pydantic today at the typing summit, so marking as "do not merge" until the pydantic folks have had a chance to give their thoughts on the issue thread :)

Do they have a timeline?

@samuelcolvin (Sponsor Contributor)

As explained at the typing summit yesterday - it's really important to us that the order of unions is maintained.

While union order does not matter in a static typing context, in other contexts it really does:

  • If you're using unions to inform data coercion (as pydantic and other libraries do), the order of union members should logically inform the order in which you try to coerce data types.
  • Similarly, if you're using unions for documentation (one of type hints' other prime uses, and one that PEP 649 has bent over backwards to support), being able to repr the type hint as closely as possible to the code is important.

In both contexts X | Y != Y | X.

I think we're absolutely fine with caching, but it would be extremely problematic if the current broken behaviour of typing.List et al. were replicated on list etc.

Namely:

from typing import List, Union

print(list[Union[int, float]])
#> list[typing.Union[int, float]]
print(list[Union[float, int]])
#> list[typing.Union[float, int]]  correct!

print(List[Union[int, float]])
#> typing.List[typing.Union[int, float]]
print(List[Union[float, int]])
#> typing.List[typing.Union[int, float]]  broken!

If you need further explanation of why union order is important for the runtime use of type hints in particular, I can give it, but I hope this explanation is sufficient.

I'm very happy with caching provided it doesn't mistakenly assume X | Y == Y | X.

@sobolevn (Member Author)

@samuelcolvin since I am a grateful user of pydantic myself, I surely understand the problem.

Here's the minimal demo:

>>> import pydantic
>>> class My(pydantic.BaseModel):
...    x: int | str
...    y: str | int
... 
>>> My(x='1', y='1')
My(x=1, y='1')

(notice that x was coerced to int, because int was the first union item, while y was left as str, because str was the first union item)

So others can see why that is important to runtime type-checkers.
But since UnionType does not inherit from GenericAlias, it is not affected:

>>> type(int | str).__mro__
(<class 'types.UnionType'>, <class 'object'>)

The C implementation of UnionType also does not use any of the Py_GenericAlias APIs that we cache here. We can add UnionType caching later if required.

I've added several test cases to be sure that X | Y is not Y | X 👍
Do you have any other feedback on this? :)

@Fidget-Spinner (Member)

Isn't it still possible that we could end up with broken behavior? Note that list[int | str] and list[str | int] hash and compare equal, so in the dictionary cache they would be treated as the same key.

For example, on Python 3.11

>>> d = {}
>>> d[list[int | str]] = 1
>>> d
{list[int | str]: 1}
>>> d[list[str | int]] = 2
>>> d
{list[int | str]: 2}

Does the cache cause the same problems? I'm not sure but I'd imagine it would.

@Fidget-Spinner (Member)

The main problem is that unions with different order do not hash to different things. This is, however, in line with the spec, as unions are supposed to be unordered, much like frozenset.

@samuelcolvin (Sponsor Contributor)

The main problem is that unions with different order do not hash to different things. This is however in-line with the spec, as unions are supposed to be unordered, much like frozenset.

Yup - that's the kernel of this problem, and something I think we need to get changed. Given the comments of the SC yesterday about legitimate uses of type hints, I think they would be amenable to a change. The question is: what is required to get this changed? Does it need a PEP?

@samuelcolvin (Sponsor Contributor)

cc @dmontagu @adriangb in case they're interested.

@Fidget-Spinner (Member) commented Apr 21, 2023

Question is, what is required to get this changed? Does it need a PEP?

My view as a (somewhat outdated) maintainer of typing is that this would need a PEP. The main reason is that it modifies something that PEP 604 spells out very clearly. Note that PEP 604 says:

The order of the items in the Union should not matter for equality.

And it gives some examples. Note that the __hash__ docs say:

The only required property is that objects which compare equal have the same hash value

https://docs.python.org/3/reference/datamodel.html#object.__hash__

This means that PEP 604, by specifying equality, also indirectly requires that union order must not affect the hash.

Modifying portions of an accepted PEP usually requires another PEP, along with a deprecation period following Python's deprecation policy (currently a minimum of 2 cycles).

I don't think this PR is the right place to discuss this. Could you open a thread on Discourse please? Thanks.

@AlexWaygood (Member)

I agree with @Fidget-Spinner that it probably requires a PEP to reverse behaviour clearly articulated in an accepted PEP :/

A https://discuss.python.org/c/ideas/6 thread or an issue at https://github.com/python/typing would probably be the best next step, to float the idea and try to build consensus. You'll probably reach more of a general audience in the former venue, but more folks from the typing community in the latter (I found that this worked pretty well for getting to a consensus in python/typing#1363).

@adriangb (Contributor)

Note that PEP 604 says

The order of the items in the Union should not matter for equality.

I do wonder how much of that was targeted at runtime behavior vs. what static type checkers should do.

@Fidget-Spinner (Member) commented Apr 21, 2023

I do wonder how much of that was targeted at runtime behavior vs. what static type checkers should do.

Ah yes, I thought so too, but the example code from the PEP makes it extremely clear that it also targets runtime.

(lifted from PEP 604)

The order of the items in the Union should not matter for equality.

(int | str) == (str | int)
(int | str | float) == typing.Union[str, float, int]

See https://peps.python.org/pep-0604/#simplified-syntax

@adriangb (Contributor)

I did see the example code, but I read it more as pseudocode, or "communicating a complex idea via a simple code example", than as "this should literally execute and give this result".

@AlexWaygood (Member) commented Apr 21, 2023

Whether or not this partially reverses a PEP, I still think it would be best to open a thread at discuss.python.org or an issue at python/typing, so that this potential change to runtime behavior can be discussed in a more public forum.

@Fidget-Spinner (Member)

Hmm, I see your point @adriangb. I'll go ahead and open the Discourse post so we can continue discussions there.

@sobolevn (Member Author) commented Apr 21, 2023

Yes, indeed, list[int | str] is affected:

>>> list[int | str] is list[str | int]
True
>>> list[str | int].__args__[0].__args__
(<class 'int'>, <class 'str'>)

@hmc-cs-mdrissi (Contributor) commented Apr 24, 2023

I have a different Union runtime use case, where equality is currently True but I would like to be able to distinguish the two and not have them cached as the same object.

>>> from typing import Union
>>> A = Union[int, list[int]]
>>> Union[A, str] == Union[int, list[int], str] # True

I have dictionaries that map from a type to a serializer/deserializer. For certain types I write parsers for a specific Union, and I want to be able to recognize that Union when it is present in another Union. I'd guess the documentation use case may similarly want to keep track of a union alias and be able to output A back (especially if it is a lengthy union); if it gets collapsed automatically, that is lost.

Would this type caching affect that?

edit: I just noticed that Union[A, str].__args__[0] != A even before this PR, so I'm unsure how my current code works. Maybe I have just never hit the case where I insert both Union[A, str] and the collapsed version into my dictionaries, and I have a bug I've missed involving this case. Based on that, I suspect that type caching making `is` True for nesting won't really affect me. It would be nice to be able to distinguish nested cases, though, but that borders on a separate issue/question.
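The collapsing described above is existing typing behaviour, independent of this PR: nested unions are flattened at construction time, so the alias A is not recoverable from the merged union:

```python
from typing import Union

A = Union[int, list[int]]

# Nesting flattens immediately, so the merged union equals the spelled-out one:
assert Union[A, str] == Union[int, list[int], str]
# ...and the nested alias is gone: __args__ holds the flattened members.
assert Union[A, str].__args__ == (int, list[int], str)
# In particular, __args__[0] is plain int, not A:
assert Union[A, str].__args__[0] is int
```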

@adriangb (Contributor)

@sobolevn, sorry if this is in the PR and I just can't find it. Are there tests for the interaction between the cache and subclasses of GenericAlias?

8 participants