
idea for time issues/features #62

Open

sqlalchemy-bot opened this issue May 20, 2014 · 5 comments


Migrated issue, originally created by jvanasco (jvanasco)

I came up with a use-case and idea that might tie together a few existing tickets.

It also might be a terrible idea.

Existing Tickets -

https://bitbucket.org/zzzeek/dogpile.cache/issue/37/expose-cache-age
https://bitbucket.org/zzzeek/dogpile.cache/issue/45/update-expiration-time-on-get

Use Case -

Given:

A) We have some "write" operations that are more frequent on some objects than others.
B) Our objects can be expensive to generate.
C) We want to balance performance and clarity of code.

Two options come to mind:

  1. use multiple cache keys: one for the high-write data, one for the low-write data
  2. use a single object and update it in place

The first option can be less readable.

The second option has the caveat that writing will extend the cache expiry.

The general idea I have is this:

get_raw returns a CacheHit object that has attributes for payload and timestamp_expiry, and possibly an attribute for timestamp_last_update.

CacheHit (or dogpile) has a method for soft_update, which sets a modified payload without updating the expiry. Alternatively, a new expiry time could be given as well.

This would allow people to keep the original expiry time (let's say 10 minutes) but have the ability to "update" the value of the payload within that window. The payload would still expire in 10 minutes (unless explicitly extended).

In the use cases of comments and surveys, this might allow a developer to increment the 'count' of respondents many times over the span of a minute, yet still require a sync to the backend datastore every 10 minutes.
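
A minimal sketch of what the proposed interface might look like (the CacheHit class and soft_update method here are hypothetical, not existing dogpile.cache API):

#!python

import time


class CacheHit(object):
    """Hypothetical return value for a get_raw() call."""

    def __init__(self, payload, timestamp_expiry, timestamp_last_update=None):
        self.payload = payload
        self.timestamp_expiry = timestamp_expiry
        self.timestamp_last_update = timestamp_last_update

    def soft_update(self, payload, new_expiry=None):
        # replace the payload; keep the original expiry unless a new
        # one is explicitly supplied
        self.payload = payload
        self.timestamp_last_update = time.time()
        if new_expiry is not None:
            self.timestamp_expiry = new_expiry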


Michael Bayer (zzzeek) wrote:

do you mean "multiple cache regions" for #1?


Michael Bayer (zzzeek) wrote:

The region.backend.get and region.backend.set methods give you the CachedValue object. You can write a new one that keeps the original creation time.
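
Something like this, as a minimal userland sketch -- it assumes dogpile.cache's CachedValue tuple, whose metadata dictionary carries the original creation time; the soft_update helper name is made up:

#!python

from dogpile.cache.api import CachedValue, NO_VALUE


def soft_update(region, key, new_payload):
    # hypothetical helper: swap in a new payload while reusing the
    # existing metadata, which holds the original creation time, so
    # the expiry window is not extended
    if region.key_mangler:
        key = region.key_mangler(key)
    existing = region.backend.get(key)
    if existing is NO_VALUE:
        return False
    region.backend.set(key, CachedValue(new_payload, existing.metadata))
    return True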


jvanasco (jvanasco) wrote:

This is weird, I see 3 responses in my email, but your detailed example isn't on bitbucket...

Using the previous example of a Survey, my idea is that the conceptual object would be split into 2 K/V payloads:

• read-only (the core survey data)
• high-write (the number of respondents)

They could be in a single region or multiple regions. Either way, 2 keys would be needed for splitting up the data.

If the backend's get/set methods allow for direct CachedValue access, writing this functionality could be entirely in 'userland' without library modification.
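
The split might look like this -- a sketch only; load_survey and count_responses are made-up creator functions:

#!python

# one stable key for the core data, one high-write key for the counter
survey = region.get_or_create("survey:1", lambda: load_survey(1))
count = region.get_or_create("count_responses:1", lambda: count_responses(1))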

A detailed use case would be something like this.

I'll use the term "UPDATE" to describe the functionality of preserving the original creation time:

# cache region default expiry is 5:00

2014-05-20 12:00:00 - GET "count_responses:1"           # fails (miss)
2014-05-20 12:00:01 - SET "count_responses:1" = 100     # set to 100; the 5:00 expiry window starts

2014-05-20 12:00:30 - GET "count_responses:1"           # returns 100
2014-05-20 12:00:30 - UPDATE "count_responses:1" = 101  # increment by 1; expiry unchanged

2014-05-20 12:00:31 - GET "count_responses:1"           # returns 101
2014-05-20 12:00:31 - UPDATE "count_responses:1" = 102  # increment by 1; expiry unchanged

2014-05-20 12:04:32 - GET "count_responses:1"           # returns 106
2014-05-20 12:04:32 - UPDATE "count_responses:1" = 107  # increment by 1; expiry unchanged

2014-05-20 12:05:30 - GET "count_responses:1"           # fails (expired)
2014-05-20 12:05:31 - SET "count_responses:1" = 110     # set to 110 after syncing with the datastore

Because of race conditions in a clustered environment, we only incremented the cached value to 107, even though we probably incremented 10 times. Many writes may have operated on stale data (i.e., 3 clients each try to increment 105->106, instead of 105->106, 106->107, 107->108).
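
A minimal sketch of that lost-update race, simulated in plain Python (the key name follows the example above):

#!python

# three clients each do a non-atomic read-modify-write on the same key
store = {"count_responses:1": 105}

stale_reads = [store["count_responses:1"] for _ in range(3)]  # all three read 105
for value in stale_reads:
    store["count_responses:1"] = value + 1                    # each one writes back 106

print(store["count_responses:1"])  # 106, not 108 -- two increments were lost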

I think this approach could handle the issues that @sontek brought up in #37 and #45.

I got pinged with some bitbucket notices last week from those threads, and I foresee similar needs, so I've been thinking of ways to tackle the problem.

[2015-11] Clarified the above example


Michael Bayer (zzzeek) wrote:

The detailed example is not here because I deleted it; it doesn't work :) You still need to be able to game the "created" timestamp in the cache.


Michael Bayer (zzzeek) wrote:

OK der, I think what I had does work; it is very hard to get my head around these. If it is 12:30 and you updated the cache at 12:28, the "modulus" approach will have it such that the value will be invalidated. If, OTOH, the cache was updated at 12:31 and it is now 12:32, then the value will not be invalidated until 12:40 -- but that is fine, right?

Here's what it was:

#!python

import time


def every_ten_minutes():
    # seconds elapsed since the most recent ten-minute boundary; used
    # as the expiration_time, this invalidates any value created
    # before that boundary
    ten_minutes = 60 * 10
    return time.time() % ten_minutes


@region.cache_on_arguments(expiration_time=every_ten_minutes)
def my_expensive_thing(x, y):
    return expensive_lookup(x, y)

the test I ran to show the motion was:

#!python

>>> while True:
...     print datetime.datetime.today(), time.time() % 600
...     time.sleep(30)
... 
2014-05-20 18:07:39.896828 459.896888018
2014-05-20 18:08:09.897290 489.897324085
2014-05-20 18:08:39.896959 519.896992922
2014-05-20 18:09:09.896386 549.896426916
2014-05-20 18:09:39.896064 579.896094084
2014-05-20 18:10:09.896433 9.89649200439
2014-05-20 18:10:39.896937 39.8969950676
2014-05-20 18:11:09.897409 69.8974680901
2014-05-20 18:11:39.897835 99.8978750706

that is, at 18:10, nothing that is older than 18:10 can survive, even if the cache was just updated the previous minute.
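
Spelling out the arithmetic -- a sketch assuming dogpile's usual test of "expired when now - created > expiration_time"; the is_expired helper is purely illustrative:

#!python

def is_expired(created, now, window=600):
    # what every_ten_minutes() returns at check time
    expiration_time = now % window
    # "now - created > now % window" rearranges to
    # "created < now - (now % window)", i.e. the value was created
    # before the most recent window boundary
    return (now - created) > expiration_time

# created 12:28:00, checked 12:30:30 -> the 12:30 boundary has passed -> expired
# created 12:31:00, checked 12:32:00 -> the boundary is still 12:30   -> fresh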

When I first read this issue, this is what came to me in an insight, and then as I was typing it out I lost it :). But I think this works?
