
In-Memory cache fills up indefinitely by default #40

Closed

halcy opened this issue Dec 18, 2015 · 4 comments

Comments

halcy commented Dec 18, 2015

Right now, the in-memory cache only ever grows. When pulling a lot of data from the API, this can become a problem. I'd propose either explicitly documenting this behavior and how to reset the cache, or (ideally) adding a maximum cache size / maximum number of stored objects with some eviction policy (LRU?) and a sane default value to start off with (though that might be hard to get right).
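
As a rough illustration of that proposal (this is not Cassiopeia code; the class name and default size below are just a sketch), a size-bounded LRU cache in plain Python could look like:

from collections import OrderedDict

class LRUCache:
    """Size-bounded cache that evicts the least recently used entry."""

    def __init__(self, max_entries=10000):
        self._max_entries = max_entries
        self._data = OrderedDict()

    def get(self, key):
        value = self._data.pop(key)   # raises KeyError if the key is missing
        self._data[key] = value       # re-insert to mark as most recently used
        return value

    def put(self, key, value):
        self._data.pop(key, None)
        self._data[key] = value
        if len(self._data) > self._max_entries:
            self._data.popitem(last=False)   # evict the least recently used entry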

robrua (Member) commented Dec 19, 2015

Hey - been planning on updating the caching stuff eventually to have configurable expiration periods for each type + some eviction policy. Haven't gotten around to it yet. For now you can just set the datastore to a new instance of Cache to clear it.
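
For reference, that workaround looks roughly like the sketch below; the import path and setter name are assumptions about the 2.x API rather than confirmed names, so adjust them to whatever your installed version exposes:

# Sketch of the workaround described above. The module path and the
# set_data_store call are assumptions about the Cassiopeia 2.x API.
from cassiopeia import riotapi
from cassiopeia.type.api.store import Cache

riotapi.set_data_store(Cache())  # swap in a fresh cache, dropping everything cached so far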

jjmaldonis (Member) commented

After almost two years we finally get to close this issue. This was fixed / added over a series of recent commits.

Some details:

The data sinks now support expiration timeouts! That is, you can specify in your settings how long you want data to "live" in the cache or diskstore. If expired data is accessed, it is removed from the data sink and refreshed. Note that data is not removed from a sink until it is accessed, so it is up to you as the user to remove "old" data if it is taking up too much space or memory. The settings object (available via cass.configuration.settings) provides two methods for this: .clear_sinks, which removes everything from all data sinks, and .expire_sinks, which removes only expired data from all data sinks.
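
For example, a long-running script could periodically flush the sinks between pieces of work; the loop below is only a sketch (the Summoner constructor matches the reproduction later in this thread, and the body of the loop is a placeholder), but the two settings calls are the ones described above:

import cassiopeia as cass
from cassiopeia import Summoner

def crawl(summoner_ids, region):
    for summoner_id in summoner_ids:
        summoner = Summoner(id=summoner_id, region=region)
        # ... pull whatever data you need from `summoner` here ...
        cass.configuration.settings.expire_sinks()   # drop entries whose timeout has passed
        # cass.configuration.settings.clear_sinks()  # or wipe the sinks entirely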

martinzlocha commented Dec 5, 2017

Hi @jjmaldonis, I have encountered the same issue, but even after calling .clear_sinks and .expire_sinks it persists. Within about 5 minutes the memory usage grows from roughly 70 MB to 120 MB.

Here is some code I am using to reproduce the issue; I have removed all other parts that could cause memory leaks:

# Imports implied by the snippet (not part of the original report); adjust to your setup.
from datetime import datetime, timedelta

import cassiopeia as cass
from cassiopeia import Summoner
from cassiopeia.data import Queue, Side

def crawl_region(region):
    print('Starting crawling {}'.format(region))

    last_index = 0
    summoners = []
    challenger_league = cass.get_challenger_league(queue=Queue.ranked_solo_fives, region=region)
    master_league = cass.get_master_league(queue=Queue.ranked_solo_fives, region=region)

    for s in challenger_league:
        summoners.append(int(s.summoner.id))

    for s in master_league:
        summoners.append(int(s.summoner.id))

    while last_index < len(summoners):
        try:
            summoner = Summoner(id=summoners[last_index], region=region)
            last_index += 1

            end = datetime.now()
            start = end - timedelta(days=7)

            match_history = cass.get_match_history(summoner, queues={Queue.ranked_solo_fives, Queue.blind_fives,
                                                                     Queue.ranked_flex_fives, Queue.normal_draft_fives},
                                                   begin_time=start, end_time=end, region=region)

            for match in match_history:
                participants = [participant for team in match.teams for participant in team.participants]
                for p in participants:
                    side = 0 if p.team.side == Side.red else 1
                    role = derive_position(p.timeline.lane, p.timeline.role)  # user-defined helper (not shown)

                cass.configuration.settings.clear_sinks()
                cass.configuration.settings.expire_sinks()
        except:  # intentionally swallows all errors while reproducing the leak
            cass.configuration.settings.clear_sinks()
            cass.configuration.settings.expire_sinks()

To get this code to work I had to wrap part of the expire_sinks method in a try/except block, namely:

        for sink in self.pipeline._sinks:
            for type in types:
                try:
                    sink.expire(type)
                except:
                    pass

This was done because I got the following error:

  File "C:\ProgramData\Anaconda3\lib\site-packages\cassiopeia\datastores\cache.py", line 137, in expire
    self._cache.expire(type)
AttributeError: 'Cache' object has no attribute 'expire'

Even after digging around in the library's code and setting the default expiration in cache.py to 0, the issue persists.

Versions used:
cassiopeia: 3.0.15
merakicommons: 1.0.2
python: 3.6 (Anaconda)

jjmaldonis (Member) commented

Thanks for the detailed bug report. We'll get a new release out asap to fix this.

jjmaldonis reopened this Dec 5, 2017