Skip to content

Conversation

dmontagu
Copy link
Contributor

Closes #6672

@cloudflare-workers-and-pages
Copy link

cloudflare-workers-and-pages bot commented Jul 14, 2023

Deploying with  Cloudflare Pages  Cloudflare Pages

Latest commit: 896ebcb
Status: ✅  Deploy successful!
Preview URL: https://b896c362.pydantic-docs2.pages.dev
Branch Preview URL: https://fix-generic-memory-leak.pydantic-docs2.pages.dev

View logs

Copy link
Member

@adriangb adriangb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seems like a reasonable trade off to me to xfail the pypy tests if we can fix memory leaks on all other Python versions

result = {}
for k, v in d.items():
try:
proxy = _PydanticWeakValueDictionary(ref=v)
Copy link
Contributor

@MarkusSintonen MarkusSintonen Jul 17, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If there is an issue using WeakValueDictionary could this instead use weakref.ref for proxy instead of wrapping it to a whole _PydanticWeakValueDictionary with a single key?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If that works, I think that's better. I tried using a weakref.proxy (which is where that variable name comes from), but I couldn't find a way to unwrap it. I'll fiddle with it.

Copy link
Contributor

@MarkusSintonen MarkusSintonen Jul 17, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Something like this would also work by implementing MutableMapping (without overriding WeakValueDictionary and keeping __getitem__ to return the actual value instead of weakrefs from __pydantic_parent_namespace__):

class ParentNamespaceDictionary(MutableMapping[str, Any]):
    def __init__(self) -> None:
        self._data: dict[str, Union[weakref.ReferenceType[Any], Any]] = {}
        self._needs_remove_dead = False

    def __setitem__(self, key: str, value: Any) -> None:
        self._remove_dead()
        try:
            self._data[key] = weakref.ReferenceType(value, self._on_item_finalized)
        except TypeError:
            self._data[key] = value  # Eg str instances

    def __getitem__(self, key: str) -> Any:
        self._remove_dead()
        val = self._data[key]
        if not isinstance(val, weakref.ReferenceType):
            return val
        if (unwrapped := val()) is not None:
            return unwrapped
        raise KeyError(key)

    def __delitem__(self, key: str) -> None:
        self._remove_dead()
        del self._data[key]

    def __len__(self) -> int:
        self._remove_dead()
        return len(self._data)

    def __iter__(self) -> Iterator[str]:
        self._remove_dead()
        return iter([*self._data])

    def _remove_dead(self) -> None:
        if self._needs_remove_dead:
            self._needs_remove_dead = False
            dead_keys = [k for k, v in self._data.items() if isinstance(v, weakref.ReferenceType) and v() is None]
            for k in dead_keys:
                self._data.pop(k, None)

    def _on_item_finalized(self, _ref: weakref.ReferenceType) -> None:
        self._needs_remove_dead = True

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see how the ParentNamespaceDict would work, but ultimately I think it's a bit more complexity than just replacing the WeakValuesDict with a custom weakref.ReferenceType subclass.

I'd be open to a PR making this change. And would be open to doing one myself if there was more of a clear benefit for it, probably in a future where we do more with the parent namespace dict. I'd rather not worry about getting this exactly right though unless that dict gets used more broadly than it is now.

If you disagree I could put in the effort to implement it this way.

@MarkusSintonen
Copy link
Contributor

MarkusSintonen commented Jul 18, 2023

@dmontagu it seems the leak is still present when model also defines a field_validator. So try to add the same test but this time change the Outer to include a field_validator:

        class Outer(PydanticModel, Generic[C]):
            c: Inner[int, C]

            @field_validator("c", mode="before")
            @classmethod
            def _validator(cls, val: Any, info: FieldValidationInfo) -> Any:
                return val

Test also fails when using model_validator.
Weird...

This seems to be actually a separate issue so Ill open it as a separate issue. Here: #6727

@dmontagu
Copy link
Contributor Author

dmontagu commented Jul 20, 2023

@MarkusSintonen it looks like the latest version of pydantic-core should have addressed the leak described in your previous comment — any chance you could confirm? (If that's annoying to do don't worry about it, I can make sure.)

I'm wondering now if maybe that PR will render this "fix" unnecessary.. @davidhewitt is there any reason I shouldn't be able to use the main branch of pydantic-core with this branch and see if this memory leak has gone away?

@MarkusSintonen
Copy link
Contributor

MarkusSintonen commented Jul 21, 2023

it looks like the latest version of pydantic-core should have addressed the leak described in your previous comment — any chance you could confirm? (If that's annoying to do don't worry about it, I can make sure.)
I'm wondering now if maybe that PR will render this "fix" unnecessary..

@dmontagu I used commit that bumps pydantic-core to run some of our tests. For me it partially fixes the issue but not fully. So it seems both are needed, the fix now in master having latest pydantic-core and this one fixing issues with __pydantic_parent_namespace__. The issues to me seem two different sources of possible leaks.

I used both this branch and above master commit in a fork. There I verified that also the __pydantic_parent_namespace__ fix is required.

Would recommend you on verifying with the unit tests that issue gets fixed. The tests in this PR should not pass in the linked master commit.

@davidhewitt
Copy link
Contributor

davidhewitt commented Jul 21, 2023

In principle you can use pydantic-core main by updating the pdm dependency here and running in CI, but I think in practice we need to tweak CI a bit to make that work effortlessly.

However in this case, no need, as the fix is already landed in main courtesy of pydantic-core==2.3.1. So you can rebase here to test.

@dmontagu dmontagu added this to the 2.0.X bugfixes milestone Jul 25, 2023
@dmontagu
Copy link
Contributor Author

This is approved and I think ready to merge. @MarkusSintonen is there any reason not to merge this now? If not, I think it can get into today's release.

@MarkusSintonen
Copy link
Contributor

This is approved and I think ready to merge. MarkusSintonen is there any reason not to merge this now? If not, I think it can get into today's release.

@dmontagu good to go by me, thanks!

parent_namespace = getattr(cls, '__pydantic_parent_namespace__', None)
if isinstance(parent_namespace, dict):
parent_namespace = unpack_lenient_weakvaluedict(parent_namespace)
Copy link
Contributor

@MarkusSintonen MarkusSintonen Jul 25, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

isinstance check maybe redundant as there is already the None check inside unpack_lenient_weakvaluedict

Copy link
Member

@samuelcolvin samuelcolvin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@samuelcolvin samuelcolvin merged commit f5e3aa9 into main Jul 25, 2023
@samuelcolvin samuelcolvin deleted the fix-generic-memory-leak branch July 25, 2023 11:36
@samuelcolvin
Copy link
Member

thank you all for this.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Possible memory leak (again) in generic models after trying V2
5 participants