
How to reuse a cache #269

Open
basnijholt opened this issue Sep 3, 2020 · 3 comments

Comments

@basnijholt
Contributor

basnijholt commented Sep 3, 2020

When using memoization (not with functools.lru_cache, because of #268) I am unable to get loky to use the cache.

I guess this is because ex.submit(f, ...) repickles f each time. Is it possible to tell loky to not do that?

In the example below, I show that a concurrent.futures.ProcessPoolExecutor reuses the cache, while loky doesn't.

from concurrent.futures import ProcessPoolExecutor
import time
import loky


def memoize(f):
    memo = {}

    def helper(x):
        if x not in memo:
            memo[x] = f(x)
        return memo[x]

    return helper


@memoize
def g(x):
    time.sleep(5)


def f(x):
    g(1)
    return x


with loky.reusable_executor.get_reusable_executor(max_workers=1) as ex:
    t = time.time()
    ex.submit(f, 10).result()
    print(time.time() - t)
    t = time.time()
    ex.submit(f, 10).result()
    print(time.time() - t)

# prints
# 5.490137338638306
# 5.018247604370117 <---- cache isn't reused



with ProcessPoolExecutor(max_workers=1) as ex:
    t = time.time()
    ex.submit(f, 10).result()
    print(time.time() - t)
    t = time.time()
    ex.submit(f, 10).result()
    print(time.time() - t)

# prints
# 5.012995958328247
# 0.002056598663330078 <---- used the cache (because it forked the process and doesn't need to repickle)
@ogrisel
Collaborator

ogrisel commented Sep 24, 2020

Instead of using a local dict to store the cache entries, you should use a module attribute. Module attributes (apart from those defined in the __main__ module) are pickled by reference instead of by value, so that should work. Each worker process would have its own cache.

@ogrisel
Collaborator

ogrisel commented Sep 24, 2020

This issue made me think about improving the cloudpickle pull request: cloudpipe/cloudpickle#309 (comment). It might be possible to implement a reusable lru_cache for interactively defined functions, but this is not trivial work.

@basnijholt
Contributor Author

basnijholt commented Sep 25, 2020

It would be great to make lru_cache work.

For now, I have fixed it by making a cache that is shared in memory: docs, source.
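One way to share a cache across worker processes along those lines is a multiprocessing.Manager dict, which lives in a server process and is visible to every worker (an illustration, not the implementation linked above; `shared_memoize` is a hypothetical helper):

```python
import time
from multiprocessing import Manager


def shared_memoize(f, shared_dict):
    # Wrap f with a cache backed by a Manager dict; lookups and stores
    # go through the manager's server process, so all workers see the
    # same entries (at the cost of some IPC overhead per access).
    def helper(x):
        if x not in shared_dict:
            shared_dict[x] = f(x)
        return shared_dict[x]
    return helper


manager = Manager()
cache = manager.dict()


def slow_double(x):
    time.sleep(0.1)  # stand-in for an expensive computation
    return x * 2


fast_double = shared_memoize(slow_double, cache)
```

The Manager-backed dict trades speed for visibility: unlike the module-attribute approach, a value computed in one worker is immediately available to all others.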
