LRU eviction after item is stale for fixed time period (N days later, etc.) #59

Closed
elistevens opened this issue Dec 9, 2017 · 9 comments

@elistevens
Contributor

elistevens commented Dec 9, 2017

Edit: After further consideration, I realized that what I'm really looking for is more akin to an LRU eviction policy that only evicts an item if its last access_time is more than N days old (in our case, 90 days).

How difficult is it to add custom expiration policies?

@grantjenks
Owner

A little context: here's the cache table:

        sql('CREATE TABLE IF NOT EXISTS Cache ('
            ' rowid INTEGER PRIMARY KEY,'
            ' key BLOB,'
            ' raw INTEGER,'
            ' version INTEGER DEFAULT 0,'
            ' store_time REAL,'
            ' expire_time REAL,'
            ' access_time REAL,'
            ' access_count INTEGER DEFAULT 0,'
            ' tag BLOB,'
            ' size INTEGER DEFAULT 0,'
            ' mode INTEGER DEFAULT 0,'
            ' filename TEXT,'
            ' value BLOB)'
        )

Nothing uses the "version" field now. That should be ignored.

The Disk uses "key", "raw", "size", "mode", "filename", and "value" fields.

That leaves the metadata fields: "store_time", "expire_time", "access_time" (used only by the LRU eviction policy), and "access_count" (used only by the LFU eviction policy).

I think you only want to change the "expire_time". I wonder if there's value in allowing users to change the "store_time", "tag", and other metadata-ish fields.

When you have time, write a snippet illustrating how you would want it to work and we can iterate from there.

@grantjenks
Owner

Related: #56

@elistevens changed the title from "Implement function to update expiration time for a single key without having to re-save value" to "LRU eviction after item is stale for fixed time period (N days later, etc.)" on Dec 11, 2017

@elistevens
Contributor Author

For our use case, I'd like to have all eviction happen in a separate, nightly cron process.

Can we have one cache handle with all expiration turned off, while only the culling cron job opens a handle with the specific limits we want enabled and then calls .expire()?

@grantjenks
Owner

Yes. You would set the cull_limit to 0 (https://github.com/grantjenks/python-diskcache/blob/v2.9.0/diskcache/core.py#L66) and then add a new eviction policy (https://github.com/grantjenks/python-diskcache/blob/v2.9.0/diskcache/core.py#L82).

I'm about to add ".cull()" in addition to ".expire()". See #52 for background. Currently, "expire" only removes items that have expired. It does not meet size constraints. The idea of "cull()" will be to remove expired items and then apply the eviction policy to meet size constraints.
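To make the distinction concrete, here is a minimal sketch of the two behaviors described above, written against a throwaway in-memory SQLite table rather than diskcache itself. The functions `expire` and `cull`, the `size_limit` argument, and the cut-down table are all assumptions for illustration; the real implementation may differ.

```python
import sqlite3


def expire(con, now):
    """Remove only rows whose expire_time has passed; return how many were removed."""
    cur = con.execute(
        "DELETE FROM Cache WHERE expire_time IS NOT NULL AND expire_time < ?",
        (now,),
    )
    return cur.rowcount


def cull(con, now, size_limit):
    """Expire first, then evict least recently used rows until size_limit is met."""
    removed = expire(con, now)
    while True:
        (total,) = con.execute("SELECT COALESCE(SUM(size), 0) FROM Cache").fetchone()
        if total <= size_limit:
            return removed
        row = con.execute(
            "SELECT rowid FROM Cache ORDER BY access_time LIMIT 1"
        ).fetchone()
        if row is None:  # nothing left to evict
            return removed
        con.execute("DELETE FROM Cache WHERE rowid = ?", row)
        removed += 1


# Hypothetical, simplified table: only the columns this sketch needs.
con = sqlite3.connect(":memory:")
con.execute(
    "CREATE TABLE Cache (rowid INTEGER PRIMARY KEY, key BLOB,"
    " size INTEGER, expire_time REAL, access_time REAL)"
)
con.executemany(
    "INSERT INTO Cache (key, size, expire_time, access_time) VALUES (?, ?, ?, ?)",
    [("a", 40, 50.0, 10.0), ("b", 40, None, 20.0),
     ("c", 40, None, 30.0), ("d", 40, None, 40.0)],
)

removed = cull(con, now=100.0, size_limit=100)
remaining = [r[0] for r in con.execute("SELECT key FROM Cache ORDER BY access_time")]
print(removed, remaining)  # 2 ['c', 'd']
```

Calling `expire` alone on the same data would remove only "a" and leave 120 units stored, which is over the limit; `cull` additionally evicts "b", the least recently used of the remaining rows.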

@elistevens
Contributor Author

elistevens commented Dec 11, 2017

So the new eviction policy would be something like...

    'least-recently-used-older-than-90': {
        'init': (
            'CREATE INDEX IF NOT EXISTS Cache_access_time ON'
            ' Cache (access_time)'
        ),
        'get': 'access_time = ((julianday("now") - 2440587.5) * 86400.0)',
        # Adjacent string literals: a single-quoted string cannot span lines.
        'cull': (
            'SELECT %s FROM Cache'
            ' WHERE access_time < ((julianday("now") - 2440587.5 - 90) * 86400.0)'
            ' ORDER BY access_time'
            ' LIMIT ?'
        ),
    },

then? (Edit: fixed clause ordering)
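A quick way to sanity-check a 'cull' query like this one is to run it against a throwaway in-memory SQLite table. The sketch below assumes a cut-down Cache table (only key and access_time) and a hypothetical helper `stale_keys`; it is not diskcache code. It also writes 'now' in single quotes, the standard SQL string-literal form, since some SQLite builds reject double-quoted string literals.

```python
import sqlite3
import time

DAY = 86400.0

# Same shape as the proposed 'cull' entry; %s is filled with the column list.
CULL = (
    "SELECT %s FROM Cache"
    " WHERE access_time < ((julianday('now') - 2440587.5 - 90) * 86400.0)"
    " ORDER BY access_time"
    " LIMIT ?"
)


def stale_keys(con, limit):
    """Return up to `limit` keys not accessed in 90 days, oldest access first."""
    return [row[0] for row in con.execute(CULL % "key", (limit,))]


con = sqlite3.connect(":memory:")
con.execute(
    "CREATE TABLE Cache (rowid INTEGER PRIMARY KEY, key BLOB, access_time REAL)"
)
now = time.time()
con.executemany(
    "INSERT INTO Cache (key, access_time) VALUES (?, ?)",
    [("fresh", now), ("month-old", now - 30 * DAY),
     ("stale", now - 100 * DAY), ("ancient", now - 200 * DAY)],
)

print(stale_keys(con, 10))  # ['ancient', 'stale']
```

Items accessed within the last 90 days are never returned, no matter how large the LIMIT is, which is the "only evict if stale" behavior asked for at the top of the thread.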

@grantjenks
Owner

That's about right. I think you have to put the "WHERE ..." clause before the "ORDER BY ..." clause though.

The 'cull' key is used in only one place at https://github.com/grantjenks/python-diskcache/blob/master/diskcache/core.py#L705. My only concern, looking at that code, is that two queries are made and I don't think we could guarantee that they return the same rows because your concept of "now" changes slightly between the two queries. Maybe it should be changed to use format strings with the same "now=now" passed into each of these queries.
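One way to picture the "same now for both queries" idea: compute the cutoff once in Python and bind it as a parameter to the SELECT and to the DELETE. This is only a sketch under assumptions; `cull_stale`, the two SQL constants, and the cut-down table are hypothetical and do not mirror the actual code in core.py.

```python
import sqlite3
import time

NINETY_DAYS = 90 * 86400.0

SELECT_STALE = (
    "SELECT key FROM Cache WHERE access_time < ?"
    " ORDER BY access_time, rowid LIMIT ?"
)
DELETE_STALE = (
    "DELETE FROM Cache WHERE rowid IN ("
    "SELECT rowid FROM Cache WHERE access_time < ?"
    " ORDER BY access_time, rowid LIMIT ?)"
)


def cull_stale(con, now, limit):
    """Run the SELECT and the DELETE with one shared cutoff so they match the same rows."""
    cutoff = now - NINETY_DAYS  # computed once, bound into both queries
    keys = [row[0] for row in con.execute(SELECT_STALE, (cutoff, limit))]
    con.execute(DELETE_STALE, (cutoff, limit))
    return keys


con = sqlite3.connect(":memory:")
con.execute(
    "CREATE TABLE Cache (rowid INTEGER PRIMARY KEY, key BLOB, access_time REAL)"
)
now = time.time()
con.executemany(
    "INSERT INTO Cache (key, access_time) VALUES (?, ?)",
    [("fresh", now), ("stale", now - 100 * 86400.0),
     ("ancient", now - 200 * 86400.0)],
)

culled = cull_stale(con, now, 10)
left = [row[0] for row in con.execute("SELECT key FROM Cache")]
print(culled, left)  # ['ancient', 'stale'] ['fresh']
```

Because the cutoff is a bound value rather than a fresh julianday('now') call in each statement, no row can cross the threshold between the two queries. The rowid tie-breaker in ORDER BY keeps the LIMIT deterministic when access times are equal.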

@elistevens
Contributor Author

elistevens commented Dec 11, 2017

Yeah, I see the timing window: an item can be missing from rows because that query ran first, yet still be matched by the DELETE query, which runs a short time later.

Baking now=time.time() or similar (edit: right, now is a param to _cull, gotcha) into the cull SQL could work, but it seems like that might break existing custom eviction policies. Is that actually an issue?

@grantjenks grantjenks mentioned this issue Dec 11, 2017
6 tasks
@grantjenks
Owner

I don't think it's a big issue but I'm willing to bump to v3 to get new-style string format parameters. I can't remember why I chose the old-style.

v3 issues/features: #60

@grantjenks
Owner

Committed at 24fadab. To be deployed in v3.


2 participants