Skip to content
forked from dask/cachey

Caching based on computation time and storage space

License

Notifications You must be signed in to change notification settings

hunterjackson/cachey

 
 

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

38 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

Caching for Analytic Computations

Humans repeat stuff. Caching helps.

Normal caching policies like LRU aren't well suited for analytic computations where both the cost of recomputation and the cost of storge routinely vary by one milllion or more. Consider the following computations

# Want this
np.std(x)        # tiny result, costly to recompute

# Don't want this
np.transpose(x)  # huge result, cheap to recompute

Cachey tries to hold on to values that have the following characteristics

  1. Expensive to recompute (in seconds)
  2. Cheap to store (in bytes)
  3. Frequently used
  4. Recenty used

It accomplishes this by adding the following to each items score on each access

score += compute_time / num_bytes * (1 + eps) ** tick_time

For some small value of epsilon (which determines the memory halflife.) This has units of inverse bandwidth, has exponential decay of old results and roughly linear amplification of repeated results.

Example

>>> from cachey import Cache
>>> c = Cache(1e9, 1)  # 1 GB, cut off anything with cost 1 or less

>>> c.put('x', 'some value', cost=3)
>>> c.put('y', 'other value', cost=2)

>>> c.get('x')
'some value'

This also has a memoize method

>>> memo_f = c.memoize(f)

Status

Cachey is new and not robust.

About

Caching based on computation time and storage space

Resources

License

Stars

Watchers

Forks

Packages

No packages published

Languages

  • Python 100.0%