Requests memory leak #4601

Munroc · 2018-04-19T23:33:06Z

Summary.

Expected Result

Program running normally

Actual Result

Program consuming all ram till stops working

Reproduction Steps

Pseudocode:

import requests

def function(url, proxy):
    proxies = {
        'https': proxy
    }
    session = requests.Session()
    session.headers.update({'User-Agent': 'user - agent'})
    try:                                           #
        login = session.get(url, proxies=proxies)  # HERE IS WHERE MEMORY LEAKS
    except requests.RequestException:              #
        return -1                                  #
    return 0

System Information

$ python -m requests.help

{
  "chardet": {
    "version": "3.0.4"
  },
  "cryptography": {
    "version": ""
  },
  "idna": {
    "version": "2.6"
  },
  "implementation": {
    "name": "CPython",
    "version": "3.6.3"
  },
  "platform": {
    "release": "10",
    "system": "Windows"
  },
  "pyOpenSSL": {
    "openssl_version": "",
    "version": null
  },
  "requests": {
    "version": "2.18.4"
  },
  "system_ssl": {
    "version": "100020bf"
  },
  "urllib3": {
    "version": "1.22"
  },
  "using_pyopenssl": false
}

sigmavirus24 · 2018-04-20T02:34:03Z

Please provide us with the output of

python -m requests.help

If that is unavailable on your version of Requests please provide some basic information about your system (Python version, operating system, etc).

Munroc · 2018-04-20T18:10:44Z

@sigmavirus24 Done

nateprewitt · 2018-04-20T18:20:20Z

Hey @Munroc, a couple quick questions about your threading implementation since it’s not included in the pseudo code.

Are you creating a new session for every thread and what size is the threadpool you're using?
What tool are you using to determine where the leak is coming from? Would you mind sharing the results?

We’ve had hints of memory leaks around sessions for a while now, but I’m not sure we’ve found a smoking gun or truly confirmed impact.

Munroc · 2018-04-20T19:30:29Z

@nateprewitt Hello, yes, I'm creating a new session for every thread. The thread pool size is 30. I have tried with 2 to 200 threads and memory leaks anyway. I'm not using a tool; I just made these changes to the function:
If I put return 0 before login = session.get, there is no memory leak. If I put return 0 after login = session.get, memory starts leaking. If you want, I can send you my source code; it's not too large.

initbar · 2018-05-21T01:31:32Z

@Munroc if we have the full code, then I think it would be easier to isolate the actual cause. But based on the code gist that was provided, I think it is very hard to conclude that there is a memory leak.

As you have mentioned, if you return immediately before calling session.get, then only the proxies and session objects will exist in memory (oversimplified, but I hope you get the idea 😄). However, once you call session.get(url, proxies=proxies), the HTML of the url is retrieved and stored in the login variable. This means each session.get call will "look like" it is leaking memory, when it is actually behaving normally: memory grows linearly with the size of the url result.

However, let's say that you were using threads and .join() them immediately afterwards. In that case, I think we need to look at how your threads were managed - and whether they were closed/cleaned properly.
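The distinction between retained results and a real leak can be sketched without any network traffic. This is a minimal illustration, not the reporter's code: fake_fetch is a hypothetical stand-in for session.get(url).content, and the sizes are arbitrary.

```python
import tracemalloc


def fake_fetch(size=100_000):
    """Hypothetical stand-in for session.get(url).content; returns `size` bytes."""
    return bytes(size)


def traced_growth(keep_results, calls=20):
    """Return the growth in traced bytes after `calls` simulated fetches."""
    results = []
    tracemalloc.start()
    before, _ = tracemalloc.get_traced_memory()
    for _ in range(calls):
        body = fake_fetch()
        if keep_results:
            results.append(body)  # still referenced, so memory grows per call
    body = None  # drop the last local reference
    after, _ = tracemalloc.get_traced_memory()
    tracemalloc.stop()
    return after - before


retained = traced_growth(keep_results=True)
released = traced_growth(keep_results=False)
print(f"kept: {retained} bytes, dropped: {released} bytes")
```

If growth disappears once the results are no longer referenced, the program is holding data rather than leaking it.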

initbar · 2018-10-14T21:56:51Z

@leoszn I think in your specific example, you're closing only the last Process object after creating one Process per element of urls.

Could you try daemonizing them using p.daemon = True and running them (so that once the main thread terminates, all the spawned child processes die too)? Otherwise, store the spawned processes in a separate array and make sure to close all of them using a loop.

leoszn · 2018-10-15T02:55:14Z

@initbar

Do I need to run p.daemon = True in the loop or outside the loop before p.join() ? By the way do I still need p.join() after applying p.daemon = True ?

Badiboy · 2018-10-16T16:19:48Z

OK, I was redirected from my new topic to this one, so let me join yours.
Maybe this provides more information and helps move the issue toward a solution...

I'm running a Telegram bot and noticed free memory degrading when the bot runs for a long time. First I suspected my code, then the bot, and finally I came to requests. :)
I used len(gc.get_objects()) to confirm that the problem exists. I located the communication routines, then stripped out all the bot code and arrived at an example that raises the count of gc objects on every iteration.

Expected Result

len(gc.get_objects()) should give the same result on every loop iteration

Actual Result

The value of len(gc.get_objects()) increases on every loop iteration.

Test N2
GetObjects len: 27959
Test N3
GetObjects len: 27960
Test N4
GetObjects len: 27961
Test N5
GetObjects len: 27962
Test N6
GetObjects len: 27963
Test N7
GetObjects len: 27964

Reproduction Steps

token = "XXX:XXX"
chat_id = '111'
proxy = {'https':'socks5h://ZZZ'} #You may need proxy to run this in Russia

from time import sleep
import gc, requests

def garbage_info():
    res = ""
    res += "\nGetObjects len: " + str(len(gc.get_objects()))
    return res

def tester():
    count = 0
    while(True):
        sleep(1)
        count += 1
        msg = "\nTest N{0}".format(count) + garbage_info()
        print(msg)

        method_url = r'sendMessage'
        payload = {'chat_id': str(chat_id), 'text': msg}

        request_url = "https://api.telegram.org/bot{0}/{1}".format(token, method_url)
        method_name = 'get'

        session = requests.session()
        req = requests.Request(
            method=method_name.upper(),
            url=request_url,
            params=payload
        )
        prep = session.prepare_request(req)

        settings = session.merge_environment_settings(
            prep.url, None, None, None, None)
#            prep.url, proxy, None, None, None)  #Change the line to enable proxy
        send_kwargs = {
            'timeout': None,
            'allow_redirects': None,
        }
        send_kwargs.update(settings)
        resp = session.send(prep, **send_kwargs)

        # For more clean output
        gc.collect()

tester()

System Information

{
  "chardet": {
    "version": "3.0.4"
  },
  "cryptography": {
    "version": "2.3.1"
  },
  "idna": {
    "version": "2.7"
  },
  "implementation": {
    "name": "CPython",
    "version": "3.6.6"
  },
  "platform": {
    "release": "4.15.0-36-generic",
    "system": "Linux"
  },
  "pyOpenSSL": {
    "openssl_version": "1010009f",
    "version": "17.5.0"
  },
  "requests": {
    "version": "2.19.1"
  },
  "system_ssl": {
    "version": "1010007f"
  },
  "urllib3": {
    "version": "1.23"
  },
  "using_pyopenssl": true
}

I saw the same behaviour on Python 3.5.3 on Windows 10.

initbar · 2018-10-17T14:42:42Z

@leoszn

@initbar

Do I need to run p.daemon = True in the loop or outside the loop before p.join() ? By the way do I still need p.join() after applying p.daemon = True ?

# ..
     for i in urls:
        p = Process(target=main, args=(i,))
        p.daemon = True  # before `.start`
        p.start()
# ..

As a minor note, you can still .join daemon processes -- but they are near-guaranteed to be killed when their parent process terminates (unless they somehow become unintentionally orphaned; in which case, please let me know! I'd love to learn more about it).

Otherwise, you can store the Process objects separately as an array and join in the end:

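# ..
processes = [
    Process(target=main, args=(i,))
    for i in urls
]
# start the process activity.
for p in processes:
    p.start()
# join every process at the end, not only the last one.
for p in processes:
    p.join()
# ..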

Badiboy · 2018-10-26T06:31:15Z

Expected Result

len(gc.get_objects()) should give the same result on every loop iteration

The cause of this behaviour was found in the "requests" cache mechanism.

It appears to work incorrectly: it adds a cache record on every call to the Telegram API URL (instead of caching it once). But this does not lead to a memory leak, because the cache size is limited to 20 and the cache is reset after reaching this limit, so the growing number of objects drops back to its initial value.
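A toy model shows why such a cache looks like a leak over a short observation window. ResettingCache below is hypothetical and is not the cache used by requests; it only mimics the described behaviour (one new record per call, reset at 20 entries).

```python
class ResettingCache:
    """Hypothetical cache that is emptied once it reaches `limit` entries."""

    def __init__(self, limit=20):
        self.limit = limit
        self.entries = {}

    def add(self, key, value):
        if len(self.entries) >= self.limit:
            self.entries.clear()  # reset instead of growing without bound
        self.entries[key] = value


def cache_sizes(iterations, limit=20):
    """Record the cache size after each call that adds a fresh entry."""
    cache = ResettingCache(limit)
    sizes = []
    for i in range(iterations):
        cache.add(f"call-{i}", object())  # one new record per call
        sizes.append(len(cache.entries))
    return sizes


sizes = cache_sizes(50)
print(max(sizes), sizes[:3], sizes[19:22])
```

The size climbs by one per call, as in the Test N2..N7 output above, and only drops after the limit is hit, so a run shorter than the limit cannot tell bounded growth from a leak.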

jotunskij · 2018-12-17T11:33:41Z

Similar issue. Requests eats memory when running in thread. Code to reproduce here:

import gc
from concurrent.futures import ThreadPoolExecutor, as_completed
import requests
from memory_profiler import profile

def run_thread_request(sess, run):
    response = sess.get('https://www.google.com')
    return

@profile
def main():
    sess = requests.session()
    with ThreadPoolExecutor(max_workers=1) as executor:
        print('Starting!')
        tasks = {executor.submit(run_thread_request, sess, run):
                    run for run in range(50)}
        for _ in as_completed(tasks):
            pass
    print('Done!')
    return

@profile
def calling():
    main()
    gc.collect()
    return

if __name__ == '__main__':
    calling()

In the code given above I pass a session object around, but if I replace it with just running requests.get nothing changes.

Output is:

➜  thread-test pipenv run python run.py
Starting!
Done!
Filename: run.py

Line #    Mem usage    Increment   Line Contents
================================================
    10     23.2 MiB     23.2 MiB   @profile
    11                             def main():
    12     23.2 MiB      0.0 MiB       sess = requests.session()
    13     23.2 MiB      0.0 MiB       with ThreadPoolExecutor(max_workers=1) as executor:
    14     23.2 MiB      0.0 MiB           print('Starting!')
    15     23.4 MiB      0.0 MiB           tasks = {executor.submit(run_thread_request, sess, run):
    16     23.4 MiB      0.0 MiB                       run for run in range(50)}
    17     25.8 MiB      2.4 MiB           for _ in as_completed(tasks):
    18     25.8 MiB      0.0 MiB               pass
    19     25.8 MiB      0.0 MiB       print('Done!')
    20     25.8 MiB      0.0 MiB       return


Filename: run.py

Line #    Mem usage    Increment   Line Contents
================================================
    22     23.2 MiB     23.2 MiB   @profile
    23                             def calling():
    24     25.8 MiB      2.6 MiB       main()
    25     25.8 MiB      0.0 MiB       gc.collect()
    26     25.8 MiB      0.0 MiB       return

And Pipfile looks like this:

[[source]]
url = "https://pypi.python.org/simple"
verify_ssl = true

[requires]
python_version = "3.6"

[packages]
requests = "==2.21.0"
memory-profiler = "==0.55.0"

pawel-lmcb · 2019-03-26T21:03:51Z

FWIW, I am also experiencing a memory leak similar to @jotunskij's. Here is more info:

nicolargo/glances#1447

BarryThrill · 2019-05-05T09:15:02Z

I have the same issue: using requests.get with threading eats up memory by around 0.1 - 0.9 per request, and the memory is not released after the request completes.

popjxc · 2019-06-13T06:16:44Z

Same here, any workaround?

tallona · 2019-09-27T21:02:37Z

Edit
My issue looks to be due to using verify=False in requests, I've raised a bug under #5215

Having the same issue. I have a simple script that spawns a thread. The thread calls a function that runs a while loop; each pass queries an API to check a status value, sleeps for 10 seconds, and then repeats until the script is stopped.

When using the requests.get function I can see the memory usage slowly creeping up via task manager by watching the spawned process.

But if I remove the requests.get call from the loop or use urllib3 directly to make the get request, there is very little if any creep of the memory usage.

I've watched this over a two-hour period in both cases: with requests.get the memory usage is at 1 GB+ after two hours, whereas with urllib3 it is at approx. 20 MB.

Python 3.7.4 and requests 2.22.0

PedanticHacker · 2019-10-01T18:09:28Z

It seems Requests is still in beta stage having memory leaks like that. Come on, guys, patch this up! 😉👍

MuhammadAliShahzad · 2019-10-02T10:14:02Z

Any update on this? A simple POST request with a file upload also causes a similar memory leak.

far-rainbow · 2019-12-09T09:35:10Z

Same for me... the leak during thread pool execution also happens on Windows with Python 3.8.
requests 2.22.0

ghost · 2020-01-13T08:33:10Z

Same for me

ghost · 2020-01-16T07:37:14Z

Here is my memory leak issue; can anyone help? https://stackoverflow.com/questions/59746125/memory-keep-growing-when-using-mutil-thread-download-file

guyskk · 2020-03-24T15:57:27Z

Calling Session.close() and Response.close() avoids the memory leak.
SSL consumes more memory, so the leak is more noticeable when requesting https URLs.

First I made 4 test cases:

requests + ssl (https://)
requests + non-ssl (http://)
aiohttp + ssl (https://)
aiohttp + non-ssl (http://)

Pseudo code:

def run(url):
    session = requests.session()
    response = session.get(url)

while True:
    for url in urls:  # about 5k urls of public websites
        # execute in thread pool, size=10
        thread_pool.submit(run, url)

# in another thread, record memory usage every second

Memory usage graph (y-axis: MB, x-axis: time): requests uses a lot of memory and its usage grows very fast, while aiohttp memory usage is stable:

Then I add Session.close() and test again:

def run(url):
    session = requests.session()
    response = session.get(url)
    session.close()  # close session !!

Memory usage decreased significantly, but it still increases over time:

Finally I add Response.close() and test again:

def run(url):
    session = requests.session()
    response = session.get(url)
    session.close()  # close session !!
    response.close()  # close response !!

Memory usage decreased again, and it no longer increases over time:

Comparing aiohttp and requests shows the memory leak is not caused by SSL; it's caused by connection resources not being closed.
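The same cleanup can be written with context managers, since both requests.Session and requests.Response support the with statement. This is a minimal sketch rather than the code used for the measurements above; fetch_status and its timeout default are illustrative names and values.

```python
import requests


def fetch_status(url, timeout=10):
    """Fetch `url` and return its status code, closing both objects on exit."""
    # Leaving each `with` block calls Session.close() / Response.close(),
    # even if an exception is raised while the request is in flight.
    with requests.Session() as session:
        with session.get(url, timeout=timeout) as response:
            return response.status_code
```

In a thread pool, run(url) could call fetch_status(url) so that no session or response outlives its task.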

Useful scripts:

import time
from threading import Thread

import pandas as pd
import psutil


class MemoryReporter:
    def __init__(self, name):
        self.name = name
        self.file = open(f'memoryleak/memory_{name}.txt', 'w')
        self.thread = None

    def _get_memory(self):
        return psutil.Process().memory_info().rss

    def main(self):
        while True:
            t = time.time()
            v = self._get_memory()
            self.file.write(f'{t},{v}\n')
            self.file.flush()
            time.sleep(1)

    def start(self):
        self.thread = Thread(target=self.main, name=self.name, daemon=True)
        self.thread.start()


def plot_memory(name):
    filepath = 'memoryleak/memory_{}.txt'.format(name)
    df_mem = pd.read_csv(filepath, index_col=0, names=['t', 'v'])
    df_mem.index = pd.to_datetime(df_mem.index, unit='s')
    df_mem.v = df_mem.v / 1024 / 1024
    df_mem.plot(figsize=(16, 8))

System Information:

$ python -m requests.help
{
  "chardet": {
    "version": "3.0.4"
  },
  "cryptography": {
    "version": ""
  },
  "idna": {
    "version": "2.8"
  },
  "implementation": {
    "name": "CPython",
    "version": "3.7.4"
  },
  "platform": {
    "release": "18.0.0",
    "system": "Darwin"
  },
  "pyOpenSSL": {
    "openssl_version": "",
    "version": null
  },
  "requests": {
    "version": "2.22.0"
  },
  "system_ssl": {
    "version": "1010104f"
  },
  "urllib3": {
    "version": "1.25.6"
  },
  "using_pyopenssl": false
}

VeNoMouS · 2020-04-13T21:06:00Z

The SSL leak problem is in the OpenSSL packaged with Python <= 3.7.4 on Windows and OSX; it's not releasing the memory from the context properly.

https://github.com/VeNoMouS/cloudscraper/issues/143#issuecomment-613092377

andre487 · 2023-01-09T09:57:30Z

I have the same problem. It appears only when I use the proxies argument.

{'chardet': {'version': None},
 'charset_normalizer': {'version': '3.0.1'},
 'cryptography': {'version': ''},
 'idna': {'version': '3.4'},
 'implementation': {'name': 'CPython', 'version': '3.10.9'},
 'platform': {'release': '5.4.161-26.3', 'system': 'Linux'},
 'pyOpenSSL': {'openssl_version': '', 'version': None},
 'requests': {'version': '2.28.1'},
 'system_ssl': {'version': '1010113f'},
 'urllib3': {'version': '1.26.13'},
 'using_charset_normalizer': True,
 'using_pyopenssl': False}

constantind · 2023-01-11T15:31:04Z

The same happens with requests 2.27.1 and urllib3 1.26.13.
If it helps, tracemalloc shows these increments:
stats top 10 every 500:
requests/utils.py:353: size=4600 B, count=60, average=77 B
diffs top 10 every 500:
urllib3/_collections.py:153: size=1344 B (+168 B), count=6 (+1), average=224 B
requests/utils.py:822: size=840 B (+168 B), count=5 (+1), average=168 B
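For anyone who wants comparable numbers, output like the above can be produced by comparing two tracemalloc snapshots. The sketch below is an assumption about the approach, not the script that produced the figures; top_diffs and leaky_work are hypothetical names, and leaky_work simulates a leak with a list that is never cleared.

```python
import tracemalloc


def top_diffs(work, iterations=500, limit=10):
    """Run `work()` `iterations` times and return the largest allocation diffs."""
    tracemalloc.start()
    before = tracemalloc.take_snapshot()
    for _ in range(iterations):
        work()
    after = tracemalloc.take_snapshot()
    tracemalloc.stop()
    # Each StatisticDiff carries size, size_diff, count and count_diff per source line.
    return after.compare_to(before, "lineno")[:limit]


leaked = []  # hypothetical leak: module-level list that is never cleared


def leaky_work():
    leaked.append(bytearray(1024))


for stat in top_diffs(leaky_work):
    print(stat)
```

Replacing leaky_work with a function that performs one request would show which source lines keep accumulating allocations between snapshots.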

sigmavirus24 added Needs Info Propose Close labels Apr 20, 2018

nateprewitt mentioned this issue May 12, 2018

Requests memory leak: Caused computer to crash #4644

Closed

sigmavirus24 mentioned this issue Oct 16, 2018

Memory leaks suspected (gc objects increases) #4826

Closed

hemberger mentioned this issue Nov 28, 2018

Memory usage steadily increasing when running for a longer time MechanicalSoup/MechanicalSoup#253

Closed

dsanghan mentioned this issue Nov 15, 2019

Memory leak in fetch ecederstrand/exchangelib#675

Closed

Tom-Willemsen mentioned this issue Mar 2, 2020

json_bourne: resource leak and other issues ISISComputingGroup/IBEX#5280

Closed

imWildCat mentioned this issue Mar 28, 2020

内存泄漏? imWildCat/scylla#113

Open

coreGreenberet added a commit to coreGreenberet/dashboard-api-python that referenced this issue Apr 12, 2020

added calls to response.close() to fix a memory leak

bc2898a

see psf/requests#4601 (comment) for details

coreGreenberet mentioned this issue Apr 12, 2020

added calls to response.close() to fix a memory leak meraki/dashboard-api-python#79

Merged

ecederstrand added a commit to ecederstrand/exchangelib that referenced this issue May 11, 2020

Close response objects in an attempt to fix memory leakage. See psf/r…

a51552e

…equests#4601

samschott mentioned this issue Mar 8, 2021

Improve resource management for network sessions samschott/maestral#340

Merged

amotl mentioned this issue Jun 3, 2021

mqttwarn is eating up all memory mqtt-tools/mqttwarn#478

Closed

pmara-r7 mentioned this issue Sep 21, 2021

SOAR-6425 Trendmicro Deepsecurity Fix Memory Leak- Request Session rapid7/insightconnect-plugins#1012

Merged

17 tasks

wiktorn mentioned this issue Mar 20, 2022

Memory leak because session is not closed osmcode/pyosmium#195

Closed

coletdjnz mentioned this issue Jan 3, 2024

Memory leaks introduced with requests handler yt-dlp/yt-dlp#8922

Closed

10 tasks

c4lm mentioned this issue Jun 14, 2024

Memory Continually Increasing Intermittently conductor-sdk/conductor-python#267

Open
