
Memory leak with PubNub and gevent>=1.3.0 #1371

Closed
dmrlawson opened this issue Mar 25, 2019 · 6 comments

@dmrlawson

commented Mar 25, 2019

  • gevent version: >=1.3.0
  • Python version: CPython 2.7+
  • Operating System: Tested on Ubuntu 16.04 and 18.10

Description:

When streaming data using PubNub (https://github.com/pubnub/python), memory usage increases and performance slows over time.

What I've run:

Small example:

import gevent.monkey
gevent.monkey.patch_all()

import logging
import os
import psutil
import time

from pubnub.callbacks import SubscribeCallback
from pubnub.enums import PNStatusCategory
from pubnub.pnconfiguration import PNConfiguration
from pubnub.pubnub import PubNub

process = psutil.Process(os.getpid())


class MySubscribeCallback(SubscribeCallback):
    def __init__(self):
        super().__init__()
        self.done_one = False

    def presence(self, pubnub, presence):
        pass  # handle incoming presence data

    def status(self, pubnub, status):
        print("Status category", status.category)

    def message(self, pubnub, message):
        # Print memory usage at most once per ten-second window.
        if int(time.time()) % 10 == 0:
            if not self.done_one:
                self.done_one = True
                print(f"{int(time.time())} mem={process.memory_info()[0]}")
        else:
            self.done_one = False

if __name__ == "__main__":
    pnconfig = PNConfiguration()
    # Public example feed (twitter messages)
    pnconfig.subscribe_key = "sub-c-78806dd4-42a6-11e4-aed8-02ee2ddab7fe"
    pubnub = PubNub(pnconfig)
    my_listener = MySubscribeCallback()
    pubnub.add_listener(my_listener)
    pubnub.subscribe().channels('pubnub-twitter').execute()

Running the above without gevent (commenting out the first two lines) shows no significant change in memory usage:

1553508170 mem=24907776
...
1553510460 mem=27566080

With gevent 1.2.2, similar results:

1553509080 mem=28160000
...
1553510530 mem=29835264

With gevent 1.3.0 memory usage increases steadily:

1553509240 mem=33239040
...
1553510580 mem=59940864

The same happens with all versions of gevent up to and including the latest (1.4.0):

1553508170 mem=33284096
...
1553510640 mem=81190912

I was going to start by making an issue in the PubNub repo but they don't seem to accept issues.

Any ideas of other things I could try?

@jamadden

Member

commented Mar 25, 2019

It would be helpful to get an idea of what's leaking. You can do this using objgraph; there's an example in #1318 (comment)

@dmrlawson

Author

commented Mar 25, 2019

Sure, slightly modified the script:

import gevent.monkey
gevent.monkey.patch_all()

import logging
import os
import psutil
import time

from pubnub.callbacks import SubscribeCallback
from pubnub.enums import PNStatusCategory
from pubnub.pnconfiguration import PNConfiguration
from pubnub.pubnub import PubNub
import objgraph

process = psutil.Process(os.getpid())


class MySubscribeCallback(SubscribeCallback):
    count = 0

    def __init__(self):
        super(MySubscribeCallback, self).__init__()
        self.done_one = False

    def presence(self, pubnub, presence):
        pass  # handle incoming presence data

    def status(self, pubnub, status):
        print("Status category", status.category)

    def message(self, pubnub, message):
        if self.count % 100 == 0:
            print("{} mem={}".format(int(time.time()), process.memory_info()[0]))

        self.count += 1


if __name__ == "__main__":
    pnconfig = PNConfiguration()
    # Public example feed (twitter messages)
    pnconfig.subscribe_key = "sub-c-78806dd4-42a6-11e4-aed8-02ee2ddab7fe"
    pubnub = PubNub(pnconfig)
    my_listener = MySubscribeCallback()
    pubnub.add_listener(my_listener)
    pubnub.subscribe().channels('pubnub-twitter').execute()
    objgraph.show_growth(limit=3)

    while True:
        objgraph.show_growth()
        time.sleep(10)

With gevent 1.2.2:

function     5021     +5021
dict         1748     +1748
tuple        1412     +1412
frame                            97        +5
list                            850        +2
builtin_function_or_method      914        +2
tuple                          1414        +2
('Status category', 6)
1553524799 mem=24367104
1553524803 mem=24698880
1553524807 mem=25083904
list              865       +15
io                  6        +6
dict             1753        +5
classobj          120        +4
weakref          1106        +4
socket              3        +3
Waiter              5        +2
timer               5        +2
HTTPResponse        2        +2
type              522        +2
1553524811 mem=25350144
1553524814 mem=25350144
1553524819 mem=25337856
SplitResult       20       +18
1553524823 mem=25337856
...
1553524854 mem=25341952
traceback               2        +2
PublishMetadata         2        +1
Empty                   1        +1
SubscribeMessage        2        +1
1553524859 mem=25341952
...
1553524912 mem=25608192
list      881       +16
dict     1768       +15
1553524919 mem=25608192
...
1553524967 mem=25608192
list                  885        +4
SubscribeMessage        3        +1
PublishMetadata         3        +1
PollResult              1        +1
HTTPMessage             2        +1
poll                    1        +1
Event                   1        +1
1553524972 mem=25608192
...
1553525530 mem=25546752

With gevent 1.4:

function     5997     +5997
dict         2586     +2586
list         2122     +2122
list     2123        +1
('Status category', 6)
1553524801 mem=30294016
1553524805 mem=30748672
1553524809 mem=31019008
list                 2144       +21
SplitResult            12       +10
dict                 2594        +8
io                      6        +6
weakref              1344        +6
classobj              149        +4
SubscribeMessage        3        +3
socket                  3        +3
PublishMetadata         3        +3
Waiter                  3        +2
1553524813 mem=31019008
1553524816 mem=31281152
1553524821 mem=31281152
1553524826 mem=31281152
1553524829 mem=31551488
SplitResult       16        +4
traceback          2        +2
Empty              6        +1
1553524834 mem=31551488
...
1553524950 mem=33484800
builtin_function_or_method      947        +2
poll                              1        +1
PollResult                        1        +1
HTTPMessage                       2        +1
1553524956 mem=33861632
1553524960 mem=33652736
SplitResult       18        +2
1553524965 mem=33968128
...
1553525079 mem=35848192
weakref     1345        +1
1553525082 mem=36089856
...
1553525111 mem=37298176
SplitResult       19        +1
1553525115 mem=37076992
1553525121 mem=37720064
1553525125 mem=36970496
1553525131 mem=37474304
dict     2598        +4
list     2147        +3
1553525135 mem=37482496
...
1553525260 mem=39542784
dict     2599        +1
1553525265 mem=39612416
1553525269 mem=40202240
1553525273 mem=39202816
1553525278 mem=40361984
list     2151        +4
1553525283 mem=40419328
...
1553525351 mem=40521728
dict     2608        +9
list     2158        +7
1553525355 mem=41914368
1553525359 mem=40611840
SplitResult       20        +1
1553525364 mem=41218048
...
1553525533 mem=45264896

(... just indicates that I've deleted some unnecessary lines; I've kept all of the objgraph lines.)

So it looks like the number of dictionaries is increasing over time?

@dmrlawson

Author

commented Mar 26, 2019

I've used git bisect to find out when the problem started, and I believe it was introduced in commit 15fc1ec.

I'm not sure where to go from here.

@jamadden

Member

commented Mar 26, 2019

I have also tracked it to that bit of code using a different method. It appears to be a C-level reference cycle among the Frame objects that isn't being broken by the garbage collector. I think it hasn't shown up in our leak tests because this particular spawning path makes heavy use of closures in a way that's probably not specifically tested. (pubnub.request_handlers.requests_handler.RequestRequestHandler.async_request defines a closure, passes that closure to an object constructor, spawns a thread using a method of that object, and stores that thread on the return value, and that return value is part of the closure.)
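
As a rough illustration of that pattern (the names below are hypothetical, not PubNub's actual code): a closure captures a result object, the closure is handed to another object, a thread is spawned from a method of that object, and the thread is stored back on the result, so the result, the closure, the worker object, and the thread all end up referencing one another:

import threading


class Worker(object):
    # Hypothetical stand-in for the object the closure is handed to.
    def __init__(self, on_done):
        self._on_done = on_done

    def run(self):
        self._on_done("response data")


def async_request(callback):
    envelope = {}  # hypothetical "return value"

    def done(response):  # closure capturing `envelope`
        envelope["response"] = response
        callback(envelope)

    worker = Worker(done)  # the closure is passed to an object constructor
    thread = threading.Thread(target=worker.run)  # a thread is spawned via a method of that object
    thread.start()
    envelope["thread"] = thread  # the thread is stored on the return value,
    return envelope              # which is itself part of the closure


async_request(lambda env: None)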

As a mitigation, you can configure gevent not to track the spawn tree, which takes this code out of the equation.
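
A minimal sketch of what that mitigation could look like, assuming gevent >= 1.3 (where the gevent.config.track_greenlet_tree setting exists); it needs to take effect before any greenlets are spawned, and the GEVENT_TRACK_GREENLET_TREE=false environment variable is an alternative way to set it:

import gevent.monkey
gevent.monkey.patch_all()

import gevent
# Assumption: gevent >= 1.3, which exposes gevent.config.
# Stop tracking the greenlet spawn tree before anything is spawned.
gevent.config.track_greenlet_tree = False

# ... the PubNub setup from the scripts above goes here ...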

@jamadden

Member

commented Mar 26, 2019

It's simpler than that. It turns out that the callbacks spawn new threads, which spawn new threads, and so on, to a very deep level (I don't think there's a bottom to it, actually). Each level of nesting down the tree keeps some of the data related to its parent in the tree (when it copies spawner._spawning_stack_frames into self._spawning_stack_frames), which winds up growing roughly in proportion to the depth of the spawn tree. It was hard to see this in Python memory profilers or objgraph because the same list and (to a lesser extent) tuple objects keep getting re-used from the freelists, and the growth was very slow (just references to existing objects).
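
A toy model of that growth (this only mimics the behavior; it is not gevent's real code): each level copies its spawner's _spawning_stack_frames and adds its own entry, so a chain of greenlets that keep spawning successors retains data roughly in proportion to its depth:

class FakeGreenlet(object):
    # Toy stand-in for a greenlet that records where it was spawned from.
    def __init__(self, spawner=None):
        inherited = list(spawner._spawning_stack_frames) if spawner is not None else []
        # Keep the parent's frames plus a stand-in "frame" for this level.
        self._spawning_stack_frames = inherited + ["frame@depth=%d" % len(inherited)]


g = None
for _ in range(1000):  # each "callback" spawns the next level down the tree
    g = FakeGreenlet(g)

print(len(g._spawning_stack_frames))  # 1000: retained data grows with nesting depth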

@dmrlawson

Author

commented Mar 27, 2019

Thank you! I don't think we would have figured that out alone.

The suggested mitigation works for now and we'll update when there's a new release.

@jamadden closed this in e64647f Mar 27, 2019

jamadden added a commit that referenced this issue Mar 27, 2019

Merge pull request #1374 from gevent/issue1371
Avoid unbounded memory usage in deep spawn trees. Fixes #1371.