Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Plone/ZCA memory leak? #3829

Closed
mauritsvanrees opened this issue Jul 25, 2023 · 24 comments
Closed

Plone/ZCA memory leak? #3829

mauritsvanrees opened this issue Jul 25, 2023 · 24 comments

Comments

@mauritsvanrees
Copy link
Sponsor Member

TLDR: There seems to be a memory problem in the Zope Component Architecture.

Background:

  • A customer that we moved from Plone 5.1 to Plone 5.2 on Python 3.8 in January, experienced that the memory usage of Plone kept growing and they needed to restart. Restart was handled by systemd after a memory limit was hit. Usually a few times per week.
  • The sysadmin changed the setup to start Plone with jemalloc: http://jemalloc.net/. This helped, and the memory usage stabilized. jemalloc should help when memory fragmentation was the problem.
  • We switched to Plone 6.0.6 on Python 3.10 in June, and the problems are back. This is a multilingual site, with 13 languages.
  • My personal site is experiencing memory growth as well. Plone 6.0, Python 3.11. My brother had to restart our shared server, or the Plone process, during my vacation because my Plone site used too much memory. Now after 17 days the memory has grown to 1.1 GB. This seems strange: the Data.fs is 30MB, and the blobstorage 59 MB. This is Plone 6.0 Classic UI, without any add-ons, and even without any "core add-ons" activated. It does support two languages, English and Dutch, but no plone.app.multilingual setup.

Reproduce:

This weekend we noticed that a lot of traffic at the customer site was for one bot, and most of this went to the search pages. So I started hammering my local setup in a similar way. For every language, get the search page, and get the next 50 batched pages. Do this 100 times:

for j in $(seq 100); do
    for start in $(seq 50); do
        echo $j - $start
        for lang in en de es fr it pl fr-ca ar zh ja ru; do
            curl -s -o search.html "http://localhost:8080/Plone/${lang}/@@search?b_start%3Aint=${start}0&SearchableText="
        done
    done
done

The search url above is shortened for clarity. In reality I pasted the really long line that you get when you go to the second result page. All these languages contain at least 500 search result pages.

With this, I did notice that memory kept creeping up.
I radically simplified the search page to display almost nothing and basically only do the correct catalog query, and even then memory increased.

Investigation in Plone:

I went to the Zope debug control panel on the local site that I had hammered: http://localhost:8080/Control_Panel/DebugInfo/manage_main
It showed this:
Screenshot 2023-07-25 at 12 35 53

(When I look at the live site, the top two are 400,000 DateTime objects, and then 84,000 ZCatalog Benchmarks.)

Why are the counts for NegotiateLanguage and DexterityPublishTraverse so high? I focused on the first one.
It is this class. Not an instance of the class, but the actual class. This is registered as adapter. It is looked up as multi adapter in setLanguageBinding.

Investigation in ZCA:

Let's get some numbers. Edit zope.interface/adapter.py and add import sys at the top. Go down to the queryMultiAdapter method and add a few lines before returning the result:

if (
    factory.__name__ in ("NegotiateLanguage", "DexterityPublishTraverse")
    or sys.getrefcount(factory) > 100
):
    print(f"Factory '{factory.__name__}' refcount={sys.getrefcount(factory)}")

Careful: typos may be silently ignored, depending on who calls this code.

This shows the reference count to the factory that was looked up. See https://stackify.com/python-garbage-collection/ for an article on Python garbage collection that explains sys.getrefcount.

Restart Plone, and browse the site or use the curl command, and sure, increasing reference counts are printed. With my big curl command and my hacked up search page, I only see the two that I was looking at:

Factory 'NegotiateLanguage' refcount=13
Factory 'NegotiateLanguage' refcount=14
Factory 'NegotiateLanguage' refcount=14
Factory 'NegotiateLanguage' refcount=15
Factory 'DexterityPublishTraverse' refcount=13
Factory 'NegotiateLanguage' refcount=16
Factory 'DexterityPublishTraverse' refcount=14
Factory 'NegotiateLanguage' refcount=16
Factory 'DexterityPublishTraverse' refcount=14
Factory 'NegotiateLanguage' refcount=17
Factory 'DexterityPublishTraverse' refcount=15
Factory 'NegotiateLanguage' refcount=18
Factory 'DexterityPublishTraverse' refcount=16
Factory 'NegotiateLanguage' refcount=18
Factory 'DexterityPublishTraverse' refcount=16
Factory 'NegotiateLanguage' refcount=19
Factory 'DexterityPublishTraverse' refcount=17
Factory 'NegotiateLanguage' refcount=20
Factory 'DexterityPublishTraverse' refcount=18
Factory 'NegotiateLanguage' refcount=20
Factory 'DexterityPublishTraverse' refcount=18

I still had a copy of this customer site on Plone 5.2 as well, and tried the same with curl or just browsing and quickly reloading. NegotiateLanguage got a refcount of 28 and stayed there. DexterityPublishTraverse may not have existed then. But others showed up, for example two very often used ones:

Factory 'PortalState' refcount=1445
Factory 'ContextState' refcount=2080

Sometimes counts stayed the same, or even decreased a bit. Maybe if you make the get/query multi adapter call several times in the same request, it stays the same.

Now what?

So now the question is: is this a memory leak deep down in the Zope Component Architecture?

And how do we solve this?

Working theory:

  • queryMultiAdapter gives you a new factory for each request.
  • The old factory is never cleaned up from some component cache.
@davisagli
Copy link
Sponsor Member

@mauritsvanrees Maybe https://docs.python.org/3/library/gc.html#gc.get_referrers can be helpful to check what objects hold the references?

https://github.com/bloomberg/memray could also be helpful.

@davisagli
Copy link
Sponsor Member

It looks like instances of NegotiateLanguage store a reference to the request, and also get stored in an attribute of the request. That sounds suspicious for creating a cycle that can't be garbage collected, but it should be cleaned up when the request's clear method is called at the end of processing the request, so this might not be relevant.

@mauritsvanrees
Copy link
Sponsor Member Author

@davisagli wrote:

Maybe https://docs.python.org/3/library/gc.html#gc.get_referrers can be helpful to check what objects hold the references?

In Plone 6.0 coredev, Python 3.11, I created a Classic UI Plone site. Stopped the site, added a breakpoint in the adapter.py mentioned above. I get this once the reference count gets over 40, edited a bit for more clarity. There are twelve referrers. Most seem fine and normal: of course there will be references to this class.
It is the last referrer that is abnormal and keeps growing.

(Pdb++) factory
<class 'plone.i18n.negotiate.negotiate.NegotiateLanguage'>
(Pdb++) referrers = gc.get_referrers(factory)
(Pdb++) len(referrers)
12
(Pdb++) pp referrers
[<plone.i18n.negotiate.negotiate.NegotiateLanguage object at 0x11317b510>,

####################

 {'__class__': <class 'zope.interface.adapter.AdapterLookupBase'>,
  'default': None,
  'factory': <class 'plone.i18n.negotiate.negotiate.NegotiateLanguage'>,
  'name': '',
  'objects': (<PloneSite at /Plone>, <WSGIRequest, URL=http://localhost:8080/Plone>),
  'provided': <InterfaceClass plone.i18n.interfaces.INegotiateLanguage>,
  'referrers': <Recursion on list with id=4636053568>,
  'result': <plone.i18n.negotiate.negotiate.NegotiateLanguage object at 0x11317b510>,
  'self': <zope.interface.adapter.VerifyingAdapterLookup object at 0x114556cf0>},

####################

 {'ILanguageUtility': <InterfaceClass plone.i18n.interfaces.ILanguageUtility>,
  'INegotiateLanguage': <InterfaceClass plone.i18n.interfaces.INegotiateLanguage>,
  'NegotiateLanguage': <class 'plone.i18n.negotiate.negotiate.NegotiateLanguage'>,
  '__builtins__': {'ArithmeticError': <class 'ArithmeticError'>,
...
                   'zip': <class 'zip'>},
  '__cached__': '/Users/maurits/shared-eggs/cp311/plone.i18n-5.0.1-py3.11.egg/plone/i18n/negotiate/__pycache__/negotiate.cpython-311.pyc',
  '__doc__': None,
  '__file__': '/Users/maurits/shared-eggs/cp311/plone.i18n-5.0.1-py3.11.egg/plone/i18n/negotiate/negotiate.py',
  '__loader__': <_frozen_importlib_external.SourceFileLoader object at 0x113d128d0>,
  '__name__': 'plone.i18n.negotiate.negotiate',
  '__package__': 'plone.i18n.negotiate',
  '__spec__': ModuleSpec(name='plone.i18n.negotiate.negotiate', loader=<_frozen_importlib_external.SourceFileLoader object at 0x113d128d0>, origin='/Users/maurits/shared-eggs/cp311/plone.i18n-5.0.1-py3.11.egg/plone/i18n/negotiate/negotiate.py'),
  'getUtility': <function getUtility at 0x10ff88900>,
  'implementer': <class 'zope.interface.declarations.implementer'>},

####################

 (<class 'plone.i18n.negotiate.negotiate.NegotiateLanguage'>, <class 'object'>),

####################

 <attribute '__dict__' of 'NegotiateLanguage' objects>,

####################

 <attribute '__weakref__' of 'NegotiateLanguage' objects>,

####################

 directlyProvides(NegotiateLanguage),

####################

 (<class 'plone.i18n.negotiate.negotiate.NegotiateLanguage'>, <class 'type'>),
 classImplements(NegotiateLanguage, INegotiateLanguage),

####################

 (<class 'plone.i18n.negotiate.negotiate.NegotiateLanguage'>, File "/Users/maurits/shared-eggs/cp311/plone.i18n-5.0.1-py3.11.egg/plone/i18n/configure.zcml", line 9.2-13.8),

####################

 {'': <class 'plone.i18n.negotiate.negotiate.NegotiateLanguage'>},

####################

 {(classImplements(?), directlyProvides(WSGIRequest, IPloneFormLayer, IDiscussionLayer, IThemingLayer, IPloneAppContenttypesLayer, IBrowserLayer, IDefaultBrowserLayer)): <class 'plone.i18n.negotiate.negotiate.NegotiateLanguage'>,
  (classImplements(?), directlyProvides(WSGIRequest, IPloneFormLayer, IDiscussionLayer, IThemingLayer, IPloneAppContenttypesLayer, IBrowserLayer, plone.app.z3cform.interfaces.IPloneFormLayer, plone.app.discussion.interfaces.IDiscussionLayer, plone.app.theming.interfaces.IThemingLayer, plone.app.contenttypes.interfaces.IPloneAppContenttypesLayer, plone.app.event.interfaces.IBrowserLayer, IDefaultBrowserLayer, ISubRequest)): <class 'plone.i18n.negotiate.negotiate.NegotiateLanguage'>,
  (classImplements(?), directlyProvides(WSGIRequest, IPloneFormLayer, IDiscussionLayer, IThemingLayer, IPloneAppContenttypesLayer, IBrowserLayer, plone.app.z3cform.interfaces.IPloneFormLayer, plone.app.discussion.interfaces.IDiscussionLayer, plone.app.theming.interfaces.IThemingLayer, plone.app.contenttypes.interfaces.IPloneAppContenttypesLayer, plone.app.event.interfaces.IBrowserLayer, IDefaultBrowserLayer, ISubRequest)): <class 'plone.i18n.negotiate.negotiate.NegotiateLanguage'>,
  (classImplements(?), directlyProvides(WSGIRequest, IPloneFormLayer, IDiscussionLayer, IThemingLayer, IPloneAppContenttypesLayer, IBrowserLayer, IDefaultBrowserLayer)): <class 'plone.i18n.negotiate.negotiate.NegotiateLanguage'>,
  (classImplements(?), directlyProvides(WSGIRequest, IPloneFormLayer, IDiscussionLayer, IThemingLayer, IPloneAppContenttypesLayer, IBrowserLayer, plone.app.z3cform.interfaces.IPloneFormLayer, plone.app.discussion.interfaces.IDiscussionLayer, plone.app.theming.interfaces.IThemingLayer, plone.app.contenttypes.interfaces.IPloneAppContenttypesLayer, plone.app.event.interfaces.IBrowserLayer, IDefaultBrowserLayer, ISubRequest)): <class 'plone.i18n.negotiate.negotiate.NegotiateLanguage'>,
  (classImplements(?), directlyProvides(WSGIRequest, IPloneFormLayer, IDiscussionLayer, IThemingLayer, IPloneAppContenttypesLayer, IBrowserLayer, IDefaultBrowserLayer)): <class 'plone.i18n.negotiate.negotiate.NegotiateLanguage'>,
  (classImplements(?), directlyProvides(WSGIRequest, IPloneFormLayer, IDiscussionLayer, IThemingLayer, IPloneAppContenttypesLayer, IBrowserLayer, IDefaultBrowserLayer)): <class 'plone.i18n.negotiate.negotiate.NegotiateLanguage'>,
  (classImplements(?), directlyProvides(WSGIRequest, IPloneFormLayer, IDiscussionLayer, IThemingLayer, IPloneAppContenttypesLayer, IBrowserLayer, plone.app.z3cform.interfaces.IPloneFormLayer, plone.app.discussion.interfaces.IDiscussionLayer, plone.app.theming.interfaces.IThemingLayer, plone.app.contenttypes.interfaces.IPloneAppContenttypesLayer, plone.app.event.interfaces.IBrowserLayer, IDefaultBrowserLayer, ISubRequest)): <class 'plone.i18n.negotiate.negotiate.NegotiateLanguage'>,
  (classImplements(?), directlyProvides(WSGIRequest, IPloneFormLayer, IDiscussionLayer, IThemingLayer, IPloneAppContenttypesLayer, IBrowserLayer, IDefaultBrowserLayer)): <class 'plone.i18n.negotiate.negotiate.NegotiateLanguage'>,
  (classImplements(?), directlyProvides(WSGIRequest, IPloneFormLayer, IDiscussionLayer, IThemingLayer, IPloneAppContenttypesLayer, IBrowserLayer, plone.app.z3cform.interfaces.IPloneFormLayer, plone.app.discussion.interfaces.IDiscussionLayer, plone.app.theming.interfaces.IThemingLayer, plone.app.contenttypes.interfaces.IPloneAppContenttypesLayer, plone.app.event.interfaces.IBrowserLayer, IDefaultBrowserLayer, ISubRequest)): <class 'plone.i18n.negotiate.negotiate.NegotiateLanguage'>,
  (classImplements(?), directlyProvides(WSGIRequest, IPloneFormLayer, IDiscussionLayer, IThemingLayer, IPloneAppContenttypesLayer, IBrowserLayer, IDefaultBrowserLayer)): <class 'plone.i18n.negotiate.negotiate.NegotiateLanguage'>,
  (classImplements(?), directlyProvides(WSGIRequest, IPloneFormLayer, IDiscussionLayer, IThemingLayer, IPloneAppContenttypesLayer, IBrowserLayer, plone.app.z3cform.interfaces.IPloneFormLayer, plone.app.discussion.interfaces.IDiscussionLayer, plone.app.theming.interfaces.IThemingLayer, plone.app.contenttypes.interfaces.IPloneAppContenttypesLayer, plone.app.event.interfaces.IBrowserLayer, IDefaultBrowserLayer, ISubRequest)): <class 'plone.i18n.negotiate.negotiate.NegotiateLanguage'>,
  (classImplements(?), directlyProvides(WSGIRequest, IPloneFormLayer, IDiscussionLayer, IThemingLayer, IPloneAppContenttypesLayer, IBrowserLayer, IDefaultBrowserLayer)): <class 'plone.i18n.negotiate.negotiate.NegotiateLanguage'>,
  (classImplements(?), directlyProvides(WSGIRequest, IPloneFormLayer, IDiscussionLayer, IThemingLayer, IPloneAppContenttypesLayer, IBrowserLayer, plone.app.z3cform.interfaces.IPloneFormLayer, plone.app.discussion.interfaces.IDiscussionLayer, plone.app.theming.interfaces.IThemingLayer, plone.app.contenttypes.interfaces.IPloneAppContenttypesLayer, plone.app.event.interfaces.IBrowserLayer, IDefaultBrowserLayer, ISubRequest)): <class 'plone.i18n.negotiate.negotiate.NegotiateLanguage'>,
  (classImplements(?), directlyProvides(WSGIRequest, IPloneFormLayer, IDiscussionLayer, IThemingLayer, IPloneAppContenttypesLayer, IBrowserLayer, IDefaultBrowserLayer)): <class 'plone.i18n.negotiate.negotiate.NegotiateLanguage'>,
  (classImplements(?), directlyProvides(WSGIRequest, IPloneFormLayer, IDiscussionLayer, IThemingLayer, IPloneAppContenttypesLayer, IBrowserLayer, plone.app.z3cform.interfaces.IPloneFormLayer, plone.app.discussion.interfaces.IDiscussionLayer, plone.app.theming.interfaces.IThemingLayer, plone.app.contenttypes.interfaces.IPloneAppContenttypesLayer, plone.app.event.interfaces.IBrowserLayer, IDefaultBrowserLayer, ISubRequest)): <class 'plone.i18n.negotiate.negotiate.NegotiateLanguage'>,
  (classImplements(?), directlyProvides(WSGIRequest, IPloneFormLayer, IDiscussionLayer, IThemingLayer, IPloneAppContenttypesLayer, IBrowserLayer, IDefaultBrowserLayer)): <class 'plone.i18n.negotiate.negotiate.NegotiateLanguage'>,
  (classImplements(?), directlyProvides(WSGIRequest, IPloneFormLayer, IDiscussionLayer, IThemingLayer, IPloneAppContenttypesLayer, IBrowserLayer, plone.app.z3cform.interfaces.IPloneFormLayer, plone.app.discussion.interfaces.IDiscussionLayer, plone.app.theming.interfaces.IThemingLayer, plone.app.contenttypes.interfaces.IPloneAppContenttypesLayer, plone.app.event.interfaces.IBrowserLayer, IDefaultBrowserLayer, ISubRequest)): <class 'plone.i18n.negotiate.negotiate.NegotiateLanguage'>,
  (classImplements(?), directlyProvides(WSGIRequest, IPloneFormLayer, IDiscussionLayer, IThemingLayer, IPloneAppContenttypesLayer, IBrowserLayer, plone.app.z3cform.interfaces.IPloneFormLayer, plone.app.discussion.interfaces.IDiscussionLayer, plone.app.theming.interfaces.IThemingLayer, plone.app.contenttypes.interfaces.IPloneAppContenttypesLayer, plone.app.event.interfaces.IBrowserLayer, IDefaultBrowserLayer, ISubRequest)): <class 'plone.i18n.negotiate.negotiate.NegotiateLanguage'>,
  (classImplements(?), directlyProvides(WSGIRequest, IPloneFormLayer, IDiscussionLayer, IThemingLayer, IPloneAppContenttypesLayer, IBrowserLayer, IDefaultBrowserLayer)): <class 'plone.i18n.negotiate.negotiate.NegotiateLanguage'>,
  (classImplements(?), directlyProvides(WSGIRequest, IPloneFormLayer, IDiscussionLayer, IThemingLayer, IPloneAppContenttypesLayer, IBrowserLayer, plone.app.z3cform.interfaces.IPloneFormLayer, plone.app.discussion.interfaces.IDiscussionLayer, plone.app.theming.interfaces.IThemingLayer, plone.app.contenttypes.interfaces.IPloneAppContenttypesLayer, plone.app.event.interfaces.IBrowserLayer, IDefaultBrowserLayer, ISubRequest)): <class 'plone.i18n.negotiate.negotiate.NegotiateLanguage'>,
  (classImplements(?), directlyProvides(WSGIRequest, IPloneFormLayer, IDiscussionLayer, IThemingLayer, IPloneAppContenttypesLayer, IBrowserLayer, IDefaultBrowserLayer)): <class 'plone.i18n.negotiate.negotiate.NegotiateLanguage'>,
  (classImplements(?), directlyProvides(WSGIRequest, IPloneFormLayer, IDiscussionLayer, IThemingLayer, IPloneAppContenttypesLayer, IBrowserLayer, plone.app.z3cform.interfaces.IPloneFormLayer, plone.app.discussion.interfaces.IDiscussionLayer, plone.app.theming.interfaces.IThemingLayer, plone.app.contenttypes.interfaces.IPloneAppContenttypesLayer, plone.app.event.interfaces.IBrowserLayer, IDefaultBrowserLayer, ISubRequest)): <class 'plone.i18n.negotiate.negotiate.NegotiateLanguage'>,
  (classImplements(?), directlyProvides(WSGIRequest, IPloneFormLayer, IDiscussionLayer, IThemingLayer, IPloneAppContenttypesLayer, IBrowserLayer, IDefaultBrowserLayer)): <class 'plone.i18n.negotiate.negotiate.NegotiateLanguage'>,
  (classImplements(?), directlyProvides(WSGIRequest, IPloneFormLayer, IDiscussionLayer, IThemingLayer, IPloneAppContenttypesLayer, IBrowserLayer, plone.app.z3cform.interfaces.IPloneFormLayer, plone.app.discussion.interfaces.IDiscussionLayer, plone.app.theming.interfaces.IThemingLayer, plone.app.contenttypes.interfaces.IPloneAppContenttypesLayer, plone.app.event.interfaces.IBrowserLayer, IDefaultBrowserLayer, ISubRequest)): <class 'plone.i18n.negotiate.negotiate.NegotiateLanguage'>,
  (classImplements(?), directlyProvides(WSGIRequest, IPloneFormLayer, IDiscussionLayer, IThemingLayer, IPloneAppContenttypesLayer, IBrowserLayer, IDefaultBrowserLayer)): <class 'plone.i18n.negotiate.negotiate.NegotiateLanguage'>,
  (classImplements(?), directlyProvides(WSGIRequest, IPloneFormLayer, IDiscussionLayer, IThemingLayer, IPloneAppContenttypesLayer, IBrowserLayer, IDefaultBrowserLayer)): <class 'plone.i18n.negotiate.negotiate.NegotiateLanguage'>,
  (classImplements(?), directlyProvides(WSGIRequest, IPloneFormLayer, IDiscussionLayer, IThemingLayer, IPloneAppContenttypesLayer, IBrowserLayer, plone.app.z3cform.interfaces.IPloneFormLayer, plone.app.discussion.interfaces.IDiscussionLayer, plone.app.theming.interfaces.IThemingLayer, plone.app.contenttypes.interfaces.IPloneAppContenttypesLayer, plone.app.event.interfaces.IBrowserLayer, IDefaultBrowserLayer, ISubRequest)): <class 'plone.i18n.negotiate.negotiate.NegotiateLanguage'>,
  (classImplements(?), directlyProvides(WSGIRequest, IPloneFormLayer, IDiscussionLayer, IThemingLayer, IPloneAppContenttypesLayer, IBrowserLayer, IDefaultBrowserLayer)): <class 'plone.i18n.negotiate.negotiate.NegotiateLanguage'>}]

@mauritsvanrees
Copy link
Sponsor Member Author

It looks like instances of NegotiateLanguage store a reference to the request, and also get stored in an attribute of the request. That sounds suspicious for creating a cycle that can't be garbage collected, but it should be cleaned up when the request's clear method is called at the end of processing the request, so this might not be relevant.

I was about to say it is not instances, it is the class itself that has this many references. But then it dawned on me that of course an instance will have a reference to the class. So with 100 instances, each instance will have just a few references, but the class will have at least 100. So this could mean that the requests somehow linger in memory, indeed because of a circular reference. Let's investigate.

The NegotiateLanguage class is a multi adapter that is only called on this line.

If I comment out this line and let the function immediately return None, then the reference count of the factory stays at 14 all the time. With this change, a non-multilingual Plone site still seems to work fine, but a multilingual site gives various problems, naturally.

If I change the line to just getMultiAdapter((getSite(), request), INegotiateLanguage), so without assigning this to settings, and then immediately return None, then the reference count does increase.

So simply getting this multi adapter, without even using it, is enough. That seems suspicious.

Ah, but maybe we do something bad in the __init__ method of this adapter? Well, no, if I comment out everything and replace it with simply pass, then the reference count still goes up. When I put a breakpoint in the init method, it gets picked up, so I do have the right class.

So this looks to me more and more like something that may be wrong in the core of ZCA.

@d-maurer Is this something you understand?

Summary: the memory usage of Plone keeps increasing. The debug information page in the ZMI (http://localhost:8080/Control_Panel/DebugInfo/manage_main) shows the top reference counts to be for some adapters. Maybe only multi adapters. They keep increasing on every request. And even just doing the getMultiAdapter lookup without using the result already is enough to increase the reference count.

@d-maurer
Copy link

zope.interface does important caching -- but as far as I know not for adapters.

I would continue to analyze as follows:
For zope.testrunner I have implemented a cycle detection (--> gc-after-test with verbosity level 4 or higher).
You could write a test reproducing the problem and then use this feature to determine the cycle.
Alternatively, you could take over the relevant code from zope.testrunner and use it in your setup. You find it in zope.testrunner.runner starting with

        if (uses_refcounts and self.options.gc_after_test and
                self.options.verbose >= 4):

It essentially uses the DiGraph class from zope.testrunner.digraph (which contains an implementation for Tarjan's algorithm to detect the strongly connected graph components (the non trivial ones are the cycles)).

@d-maurer
Copy link

Maybe only multi adapters. They keep increasing on every request.

I do not think the ZCA (more precisly zope.interface) responsible: multi adapters are extremely frequent (any view is a multi adapter); memory leaks related to multi adapters in general should be fatal within very short timeframes.

Potentially, the problem is limited to adapters registered in a (local) persistent adapter registry (i.e. site specific adapters). I still do not believe zope.interface responsible because it manages only the registrations with the adapter factories; there is no reason whatsoever to cache the adapters themselves.

@mauritsvanrees
Copy link
Sponsor Member Author

I don't see anything bad in the related Plone code yet, so pointing to the ZCA may be logical. But the fact that this is not a wide spread problem for all multi adapters, indeed shows that the problem may be elsewhere.

Thanks for the hints. So that is the zope.testrunner code surrounding these lines. Looks interesting. I hope to try it out next week.

@d-maurer
Copy link

d-maurer commented Jul 28, 2023 via email

@d-maurer
Copy link

I agree with you that the referrer at the end of "#3829 (comment)" is very suspicious and points towards the ZCA.It seems to be a cache to speed up the adapter factory lookup.

I suggest that you compare 2 to the keys in the dict. Obviously, they are different. If the corresponding tuple components are different, expand them into their list of interfaces and compare those lists. They, too, should be different; identify the components that make the lists different.

Dynamic interface definitions could lead to unlimited growth of the cache above. When I remember right, z3c.form might define interfaces dynamically (for instance specific customizations).

Another potential cause could be inconsistent "iro"s (= "Interface Resolution Order"). The "iro" is for interface specifications the analog to the "mro" for class definitions. It determines the order of component interfaces of an interface specification. Of course, each component interface should appear exactly once and if component interface ID is derived from interface IB, then ID should appear before IB and if component interface ID is derived from interfaces IB1 and IB2 (and potentially further interfaces) and IB1 precedes IB2 in the derivation order, then IB1 must precede IB2 in the "iro". Of course, those requirements can not always be met. zope.interface has an option STRICT_IRO which causes an exception in that case, but the "iro"s of some Zope objects violate the condition and Zope cannot use this option. As a consequence, we might observe inconsistent "iro"s.

I hope that your key analysis of the cache reveals the cause for the cache growth. more precisely why requests which should lead to equal interface specifications, apparently use different interface specifications.

@d-maurer
Copy link

I suggest that you compare 2 to the keys in the dict.

I might be necessary to compare more than 2 keys.

I used the following script to try to reproduce and analyze the problem in my local Plone 5.2.10 instance:

from sys import path; path.append(".")
from wrap import wrap
from pprint import pprint as pp
from plone.i18n.interfaces import INegotiateLanguage
from plone.i18n.negotiate.negotiate import NegotiateLanguage
from zope.component import getMultiAdapter
from gc import get_referrers as gr

def filter(l):
  """filter list *l* removing uninteresting things."""
  def cond(x):
    # remove module dicts
    if isinstance(x, dict) and "__loader__" in x: return True
    # remove non dicts
    if not isinstance(x, dict): return True
  return [x for x in l if not cond(x)]

def chk():
  global app
  app = wrap(app); R = app.REQUEST
  v = R.traverse("partner/bp3")
  os = app.partner.bp3, R
  getMultiAdapter(os, INegotiateLanguage)
  return filter(gr(NegotiateLanguage))

pp(chk())
pp(chk())

The used wrap is similar to Testing.make_request.make_request but ensures that a Manager authorization happens. This is necessary as my portal is "members only".
My portal is at partner/bp3.

The script was not able to reproduce the problem. But it showed that the INegotiateLanguage adapter lookup was used for 2 different request interface signatures:

[<InterfaceClass plone.app.z3cform.interfaces.IPloneFormLayer>, <InterfaceClass plone.app.discussion.interfaces.IDiscussionLayer>, <InterfaceClass bfd.bp_backend.customercontent.pdfpreview.interfaces.IPreviewLayer>, <InterfaceClass plone.app.event.interfaces.IBrowserLayer>, <InterfaceClass plone.app.theming.interfaces.IThemingLayer>, <InterfaceClass plone.app.contenttypes.interfaces.IPloneAppContenttypesLayer>, <InterfaceClass plonetheme.barceloneta.interfaces.IBarcelonetaLayer>, <InterfaceClass zope.publisher.interfaces.browser.IDefaultBrowserLayer>, <InterfaceClass zope.publisher.interfaces.browser.IBrowserRequest>, <InterfaceClass zope.annotation.interfaces.IAttributeAnnotatable>]
>>> l1
[<InterfaceClass bfd.bfd_content.browser.interfaces.IThemeSpecific>, <InterfaceClass plone.app.z3cform.interfaces.IPloneFormLayer>, <InterfaceClass plone.app.discussion.interfaces.IDiscussionLayer>, <InterfaceClass bfd.bp_backend.customercontent.pdfpreview.interfaces.IPreviewLayer>, <InterfaceClass plone.app.event.interfaces.IBrowserLayer>, <InterfaceClass plone.app.theming.interfaces.IThemingLayer>, <InterfaceClass plone.app.contenttypes.interfaces.IPloneAppContenttypesLayer>, <InterfaceClass plonetheme.barceloneta.interfaces.IBarcelonetaLayer>, <InterfaceClass plone.theme.interfaces.IDefaultPloneLayer>, <InterfaceClass zope.publisher.interfaces.browser.IDefaultBrowserLayer>, <InterfaceClass zope.publisher.interfaces.browser.IBrowserRequest>, <InterfaceClass zope.annotation.interfaces.IAttributeAnnotatable>]
>>> set(l1) - set(l0)
{<InterfaceClass bfd.bfd_content.browser.interfaces.IThemeSpecific>, <InterfaceClass plone.theme.interfaces.IDefaultPloneLayer>}

Something adds additional interfaces to the request after the first adapter lookup. The second lookup then creates a new cache entry for the extended request interface set.

Should something like this also happen for you, you will need to compare additional keys.

@mamico
Copy link
Member

mamico commented Aug 13, 2023

My 5 cents.

I checked two different key, and I saw differences only on the first tuple element: classImplements(?). Loking into zope.interface code, I guess that this is the providedBy of the Plone Site Root.

So in a test view I memoized the providedBy of the portal and another content like this:

_p = []
_n = []
class A(BrowserView):
    def __call__(self):
        _p.append(providedBy(api.portal.get()))
        _n.append(providedBy(api.portal.get().news))  # tested also with other kind of content

        if len(_p) == 2:
            print(f"== p {_p[0] == _p[1]}")  # == p False
            print(f"== n {_n[0] == _n[1]}")  # == n True

            print(f"__eq__ p {_p[0].__eq__(_p[1])}") # __eq__ p NotImplemented
            print(f"__eq__ n {_n[0].__eq__(_n[1])}") # __eq__ n True

            print(f"hash p {hash(_p[0])} {hash(_p[1])} {hash(_p[0])==hash(_p[1])}")  # hash p 8742003969974 8742004544951 False
            print(f"hash n {hash(_n[0])} {hash(_n[1])} {hash(_n[0])==hash(_n[1])}")  # hash n 8742001634995 8742001634995 True

        return "A"

The second time I call the view, I see that the providedBy of the portal has "changed," otherwise the providedBy of the other content has not.

I also saw a NotImplemented on the __eq__ method in the first case.

I hope it helps.

@d-maurer
Copy link

d-maurer commented Aug 13, 2023 via email

@mamico
Copy link
Member

mamico commented Aug 14, 2023

You could expand the classImplements into the list of interfaces (applying list) and analyze the differences.

This is what I see:

(Pdb) _p[0] == _p[1]
False

(Pdb) len([i for i in _p[0].flattened()])
75
(Pdb) [i for i in _p[0].flattened()]
[<InterfaceClass plone.dexterity.schema.generated.Plone_5_1691832663_2_5337133_0_Plone_1_Site>, <InterfaceClass plone.app.contenttypes.behaviors.richtext.IRichText>, ...]
(Pdb) l0 = [i for i in _p[0].flattened()]
(Pdb) l1 = [i for i in _p[1].flattened()]
(Pdb) l0 == l1
True

(Pdb) len([i for i in _p[0].dependents.items()])
1
(Pdb) [i for i in _p[0].dependents.items()] == [i for i in _p[1].dependents.items()]
True

Pdb) len([i for i in _p[0].interfaces()])
48
(Pdb) [i for i in _p[0].interfaces()]
[<InterfaceClass plone.dexterity.schema.generated.Plone_5_1691832663_2_5337133_0_Plone_1_Site>, <InterfaceClass plone.app.contenttypes.behaviors.richtext.IRichText>, ...]
(Pdb) l0 = [i for i in _p[0].interfaces()]
(Pdb) l1 = [i for i in _p[1].interfaces()]
(Pdb) l0 == l1
True

I can't figure out how the equality operator is working. My suspicions are about the order in IRO or something that is missing (as I mentioned before, the __eq__ method returns NotImplemented here).

(Pdb) _p[0].__eq__(_p[1])
NotImplemented

(Pdb) _n[0].__eq__(_n[1])
True

@d-maurer
Copy link

d-maurer commented Aug 14, 2023 via email

@d-maurer
Copy link

I will try to find out why Implements behaves this way.

A comment in zope.interface.declarations.Implements indicates that identity based hashing and equality is wanted. Thus, the behavior is correct and plone.dexterity uses Implements wrongly: Implements is used for classes, i.e. for mostly static objects; there is a different set of declarations which target instances (--> Provides) and which cache created declarations (in a WeakValueDict). Almost surely, plone.dexterity should use Provides (instead of Implements) to avoid the cache growth.

@mamico
Copy link
Member

mamico commented Aug 20, 2023

I implemented @d-maurer's suggestion (my kudos) at plone/plone.dexterity#187.

Using a benchmark test similar to what @mauritsvanrees said earlier (6 languages, 10,000 requests per language, one thread), measuring RSS with collective.stats, the difference is quite impressive

image

@mauritsvanrees
Copy link
Sponsor Member Author

I cannot test it myself this week, but sounds great! Thank you both.

@mauritsvanrees
Copy link
Sponsor Member Author

Finally managed to test this locally. I hammered the search page with the command under the "Reproduce" section in the main issue comment, stopping after over 100,000 requests. The memory usage stays around 220 MB.

And in the debug control panel under the top 100 reference counts, the plone.i18n and plone.dexterity classes do not even show up.

This is awesome!
Thanks a lot @d-maurer and @mamico!

I have released plone.dexterity 3.0.3 with this fix. This will be released in Plone 6.0.7, expected within 2-3 weeks.

@mauritsvanrees
Copy link
Sponsor Member Author

The fix is live on a website since a few days. Results are looking good:

plone-memory-dexterity

Memory for both zeoclients used to go steadily up till 2 GB, then get automatically restarted. After the update it stays well below 1 GB.

Thanks again!

@gforcada
Copy link
Sponsor Contributor

Since 2012 when we did a rewrite of our website we have been seeing this memory pattern, but never had the time to dive into it.

I can say that indeed, this works like magic ✨ 💯

THANKS! to all involved in getting this reported/investigated/fixed! 🙇🏾

@mauritsvanrees
Copy link
Sponsor Member Author

We may want to back port this to Plone 5.2, which is scheduled to get its last maintenance release in October. But I do want to wait a bit and give people the chance to detect any regressions in Plone 6.

@mauritsvanrees
Copy link
Sponsor Member Author

We may want to back port this to Plone 5.2, which is scheduled to get its last maintenance release in October. But I do want to wait a bit and give people the chance to detect any regressions in Plone 6.

I checked the Debug Information on a multilingual 5.2 site, but the plone.i18n utility did not show up in the first 100 items. The dexterity one does have a couple thousand items, but it does not seem to go up often, and others classes have much higher numbers. So 5.2 seems not affected. The dexterity site root on 6.0 may have been a trigger.

Anyway, here is another screen shot of memory usage for the Plone 6 site:

sfia-memory-2

On the left of the graph is the old situation. Then the new code is used. Then in the middle two manual restarts that were needed for unrelated code. And in all it looks much healthier, at half the memory usage. As the admin says: "Thank you Plone team!"

@gforcada
Copy link
Sponsor Contributor

I have some different experience, we are still on 5.2 and 12 years ago we started with 4.3(?) I guess, but we always have been using dexterity, and not a multilingual site, but nonetheless our memory consumption was spiky until I backported this fix.

So 5.2 is indeed affected, maybe only if you are heavily using Dexterity though? 🤔

@d-maurer
Copy link

d-maurer commented Sep 25, 2023 via email

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

5 participants