fix leak of memory in cache - add settings.CACHE_SIZE_LIMIT #1140

chebotarevmichael · 2023-02-08T17:26:19Z

PROBLEM: memory leak.

import dateparser
from datetime import datetime

# ~1.5GB of memory leaked by the time the function finishes
def hard_leak():
    for i in range(3000):
        # every call leaks ~0.55MB of memory
        dateparser.parse('dasdasd', settings={'RELATIVE_BASE': datetime.utcnow()})


# ~27MB of memory leaked by the time the function finishes
def light_leak():
    for i in range(3000):
        # every call leaks ~0.01MB of memory
        dateparser.parse('12.01.2021', settings={'RELATIVE_BASE': datetime.utcnow()})

After each call to dateparser.parse, a new item is added to each of the cache dictionaries:

    _split_regex_cache = {}
    _sorted_words_cache = {}
    _split_relative_regex_cache = {}
    _sorted_relative_strings_cache = {}
    _match_relative_regex_cache = {}

After 3000 calls there are 3000 items in each of these dictionaries, and the memory they hold is never released. Because of this we were forced to stop using the module.

SOLUTION: add a limit (CACHE_SIZE_LIMIT) on the maximum number of items kept in the caches.

Gallaecio

I wonder if something like cachetools.LRUCache would be a better choice here, but this seems like an improvement nonetheless.
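For comparison, the least-recently-used policy that an LRU cache applies can be sketched with the standard library alone. `TinyLRU` is a hypothetical class written for this illustration; it is not the cachetools implementation and not part of this PR.

```python
from collections import OrderedDict


class TinyLRU:
    """Minimal LRU cache: evicts the entry that was used least recently."""

    def __init__(self, maxsize):
        self.maxsize = maxsize
        self._data = OrderedDict()

    def get(self, key, default=None):
        if key in self._data:
            # A hit marks the entry as most recently used.
            self._data.move_to_end(key)
            return self._data[key]
        return default

    def put(self, key, value):
        self._data[key] = value
        self._data.move_to_end(key)
        if len(self._data) > self.maxsize:
            # Drop the least recently used entry (the front of the ordering).
            self._data.popitem(last=False)

    def __contains__(self, key):
        return key in self._data

    def __len__(self):
        return len(self._data)


lru = TinyLRU(maxsize=2)
lru.put('a', 1)
lru.put('b', 2)
lru.get('a')      # 'a' becomes the most recently used entry
lru.put('c', 3)   # the cache is full, so 'b' is evicted
```

Unlike a plain size limit, this keeps entries that are still being hit, which matters when a few settings are reused often among many one-off ones.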

add settings.CACHE_SIZE_LIMIT

28142e6

Gallaecio approved these changes Mar 15, 2023

serhii73 approved these changes Mar 15, 2023

serhii73 merged commit a11d128 into scrapinghub:master Mar 15, 2023
