Make Caching and Memoization easy and powerful #1179

Closed
lrq3000 opened this issue Mar 22, 2020 · 8 comments
Labels: type: discussion (requiring community discussion), type: enhancement (minor feature or improvement to an existing feature)

Comments

@lrq3000

lrq3000 commented Mar 22, 2020

Panel is a wonderful piece of software, I love it, thanks a lot for making it!

However, there is one thing that bugs me, and it's the lack of caching and memoization. A simple, but I think very common, case is caching the fetching of an external csv file. So far it seems that every time a user reloads the page, the whole script is re-executed and the csv file must be downloaded again, which makes loading quite slow (the user gets a blank page for about a minute, because I have multiple such csv files).

I know Streamlit has support for this case, and so does hvPlot through Datashader, but that is quite a complicated case and not as streamlined. Also, as far as I understand, it doesn't allow serving multiple different users from the same cache built by the first user, which would be ideal in terms of loading time.

Is there a caching feature, and if possible a memoization feature, maybe undocumented, in Panel? If not, is such a feature planned, or is there a known easy workaround (e.g. using another module)?

Thanks a lot in advance,
Best regards,
Stephen

@philippjfr added the type: discussion and type: enhancement labels on Mar 22, 2020
@philippjfr
Member

The initial plan for the param.depends and pn.depends based APIs was always that we would eventually add a memoization decorator, so this is most certainly in scope. That being said, you make a good point that in many cases it would be desirable to memoize across user sessions. At the moment this can be achieved by manually reading from and writing to pn.state.cache, but if we want to provide some inbuilt form of memoization we will have to tinker a bit: at least with panel serve, the script (or notebook) you are serving is re-executed for each user, so simple memoization will not work. I definitely think this is a very important feature though, and I'd also like to write a whole user guide about performance tips once this is in place.

@lrq3000
Author

lrq3000 commented Mar 23, 2020

@philippjfr Thank you very much for your fast reply! Those plans sound great, I would love to see Panel support caching and memoization!

Meanwhile, I tried to work around this by using cachier or joblib's Memory. Both correctly reuse the cache between Jupyter Notebook runs (after restarting the kernel), but they don't when Panel is launched with a Bokeh server, because each cache gets a unique handle instead of reusing the same one (e.g. bk_script_1489.my_function(), where bk_script_1489. is the part added by Bokeh).

Do you by any chance have an idea how I might work around this (forcing all caching requests to use the same cache, without Bokeh's unique id prepended)?
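One way to picture the problem described above is a cache key that includes the module name, which then differs per session. The following is only a sketch of the idea of a session-independent key; `stable_key` is a hypothetical helper, not part of cachier, joblib, Bokeh or Panel, and the module name is the one quoted in the comment.

```python
def stable_key(func, *args):
    """Build a cache key that ignores the module the function was defined in."""
    # func.__module__ is left out on purpose: under a Bokeh server it may carry
    # a per-session prefix such as "bk_script_1489" (the id quoted above).
    return (func.__qualname__, args)


def my_function(x):
    return x + 1


key_before = stable_key(my_function, 3)
my_function.__module__ = "bk_script_1489"  # simulate a session-specific module name
key_after = stable_key(my_function, 3)     # same key as before the rename
```

The trade-off is that two functions with the same name in different modules would share keys, so this only suits an app where function names are unique.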

@lrq3000
Author

lrq3000 commented Mar 23, 2020

Update: if anyone needs a workaround in the meantime, I have found that simple_cache and cache (though the latter does not have a licence) work well with pandas dataframes and a standalone Panel/Bokeh server. I implemented my solution with simple_cache, and it successfully reuses the same cache across Bokeh server sessions/users (so different users will reuse the same cache, that's nice!).

@Nithanaroy

Yes! A pattern which automatically caches the source data frame would really help. For now I was able to achieve this manually using pn.state.cache.

import holoviews as hv
import panel as pn
import param

class MyExplorer(param.Parameterized):

    def __init__(self, **kwargs):
        super().__init__(**kwargs)
        # Load the data only if no earlier session has cached it yet
        if "data" not in pn.state.cache:
            pn.state.cache["data"] = load_input()
        self.df = pn.state.cache["data"]

    @param.depends("...")
    def make_view(self):
        plot_df = transform(self.df)
        return hv.Curve...

explorer = MyExplorer(name="")
dashboard = pn.Column(explorer.param, explorer.make_view)

@lrq3000
Author

lrq3000 commented Mar 28, 2020

Oh thank you for the example, very helpful! I think your code snippet can be easily converted to a simple caching function decorator, which would be better than my current solution because then no other dependency would be needed :-)
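A minimal sketch of what such a decorator might look like, assuming hashable arguments. The names `cached` and `_SHARED_CACHE` are hypothetical; the plain dict stands in for pn.state.cache so the snippet runs without Panel, and in an app one could pass `store=pn.state.cache` instead.

```python
import functools

# Stand-in for a store shared across sessions; a Panel app could pass
# pn.state.cache as `store` instead (assumption, not tested here).
_SHARED_CACHE = {}


def cached(key, store=None):
    """Cache the decorated function's results under `key` in `store`."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            target = _SHARED_CACHE if store is None else store
            # Include the arguments so different inputs are cached separately.
            full_key = (key, args, tuple(sorted(kwargs.items())))
            if full_key not in target:
                target[full_key] = func(*args, **kwargs)
            return target[full_key]
        return wrapper
    return decorator


calls = []


@cached("data")
def load_input(path):
    calls.append(path)  # records each real load, to show when the cache is hit
    return {"path": path, "rows": 3}


first = load_input("example.csv")
second = load_input("example.csv")  # served from the store, no second load
```

Because the store is passed in rather than created inside the decorator, the same results can be shared by every caller that uses the same store object.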

@MarcSkovMadsen changed the title from "Memoization and caching?" to "Make Memoization and Caching easy and powerful" on May 28, 2021
@MarcSkovMadsen
Collaborator

MarcSkovMadsen commented May 28, 2021

Background

I have been using DiskCache for memoization and caching for some time now. It is easy and powerful to use. It persists your data to disk, which speeds up your development process because your app/server reloads so much faster. And the experience for users is great.

Requirements

My requirements for easier-to-use, more powerful caching would be

  1. It provides an easy way to memoize.
  2. The arguments to memoize are like those for functools.lru_cache and diskcache.Cache.set.
  3. It provides a way to set expiration.
  4. It provides a way to cache globally across sessions.
  5. It provides a way to cache per session.
  6. It provides a way, during development/debugging, to cache on a hash of the function code (i.e. the cache is cleared when the function's code changes). This truly speeds up your development process.
  7. It provides a way to persist the cache easily and without configuring external services.
  8. It's highly performant.
  9. The caching is pluggable and can be extended with, or integrated with, caching packages/functions for Redis etc., i.e. an initial configuration step extends the caching functionality and no other changes are required.
  10. It can cache the most-used data app objects, like pandas DataFrames, machine learning models, deep learning models and HoloViews objects. (Streamlit has been struggling with this and has had a lot of bug reports.)
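Requirements 3 and 6 can be sketched together in plain Python. This is an illustration under assumptions, not a proposed Panel API: `memoize`, `code_fingerprint` and the injectable `clock` are hypothetical names, the store is any dict-like object, and arguments must be hashable.

```python
import functools
import hashlib
import time


def code_fingerprint(func):
    """Hash the function's bytecode and constants; it changes when the body changes."""
    code = func.__code__
    payload = code.co_code + repr(code.co_consts).encode()
    return hashlib.sha256(payload).hexdigest()


def memoize(store, expire=None, clock=time.monotonic):
    """Memoize into `store`, optionally expiring entries after `expire` seconds."""
    def decorator(func):
        fingerprint = code_fingerprint(func)

        @functools.wraps(func)
        def wrapper(*args):
            # The fingerprint is part of the key, so edited code misses old entries.
            key = (func.__qualname__, fingerprint, args)
            now = clock()
            entry = store.get(key)
            if entry is not None:
                stored_at, value = entry
                if expire is None or now - stored_at < expire:
                    return value
            value = func(*args)
            store[key] = (now, value)
            return value
        return wrapper
    return decorator


# Usage with a fake clock so expiry can be shown without sleeping.
fake_now = [0.0]
shared = {}
calls = []


@memoize(shared, expire=60, clock=lambda: fake_now[0])
def double(x):
    calls.append(x)
    return x * 2


double(4)           # computed
double(4)           # served from the store
fake_now[0] = 61.0
double(4)           # the entry is older than 60 s, so it is recomputed
```

Keying on the fingerprint rather than clearing the store means stale entries linger until evicted, which is simpler but uses more memory.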

Solution

I would suggest building it into pn.bind, pn.depends and param.depends.

API

pn.bind(my_func, input_value=input_widget, cache=True)
pn.bind(my_func, input_value=input_widget, cache=True, cache_options={"expire": 60}) # expires every 60s
pn.bind(my_func, input_value=input_widget, cache=True, cache_options={"caches": ["panel", "diskcache"]})

@pn.depends(input_value=input_widget, cache=True)
def my_func(input_value):
    ...

@MarcSkovMadsen changed the title from "Make Memoization and Caching easy and powerful" to "Make Caching and Memoization easy and powerful" on May 28, 2021
@philippjfr
Member

Thanks for that proposal @MarcSkovMadsen. I strongly agree with it, and in fact when we first designed param.depends the plan was always to eventually build support for memoization into it.

@philippjfr added this to the v0.12.0 milestone on May 28, 2021
@philippjfr changed the milestone from v0.12.0 to next on Jun 29, 2021
@philippjfr changed the milestone from next to 0.13.0 on Aug 12, 2021
@philippjfr changed the milestone from v0.13.0 to next on Apr 4, 2022
@philippjfr changed the milestone from next to Version 0.14.0 on Aug 13, 2022
@philippjfr
Member

As of #2411 we now have a panel.cache function, which allows caching the return values of functions, with options for different eviction policies (least-recently-used, least-frequently-used, first-in-first-out), time-to-live and disk caching.
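For readers unfamiliar with eviction policies, here is a small standalone sketch of the least-recently-used one. It is not Panel's implementation; `LRUCache` and `max_items` are illustrative names chosen for this example.

```python
from collections import OrderedDict


class LRUCache:
    """Keep at most `max_items` results, evicting the least recently used one."""

    def __init__(self, max_items):
        self.max_items = max_items
        self._items = OrderedDict()

    def get(self, key, compute):
        if key in self._items:
            self._items.move_to_end(key)  # mark as most recently used
            return self._items[key]
        value = compute(key)
        self._items[key] = value
        if len(self._items) > self.max_items:
            self._items.popitem(last=False)  # drop the least recently used entry
        return value

    def keys(self):
        return list(self._items)


cache = LRUCache(max_items=2)
cache.get("a", str.upper)
cache.get("b", str.upper)
cache.get("a", str.upper)  # "a" becomes the most recently used entry
cache.get("c", str.upper)  # the cache is full, so "b" is evicted
```

A least-frequently-used policy would track a hit count per key instead of recency, and first-in-first-out would skip the `move_to_end` call.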
