:::{note} Tutorial 8. **Scaling**
:icon: false

#### Async, threading, profiling and caching


:::

In [None]:
import asyncio
import aiohttp
import time

import pandas as pd
import panel as pn

pn.extension('tabulator')

In [None]:
pn.config.authorize_callback = authorized

## Async

Let us say we have some action that is triggered by a widget, such as a button, and while we are computing the results we want to provide feedback to the user. Using imperative programming this involves writing callbacks that update the current state of our components. This is complex and really we prefer to write reactive components. This is where *generator functions* come in.

:::{important}
A *generator* function is a function that uses `yield` to *return* results as they are produced during execution. It is not allowed to `return` anything, but it can use a bare `return` to *break* the execution. For an introduction to *generator functions* check out [Real Python | Introduction to generator functions](https://realpython.com/introduction-to-python-generators/).
:::
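
As a minimal sketch of this behaviour outside of Panel, the plain-Python generator below yields intermediate values and uses a bare `return` to stop early. The function name `progress` and its messages are made up for illustration:

```python
def progress(run):
    """Yield status messages; a bare `return` ends the generator early."""
    if not run:
        yield "not run yet"
        return  # stops the generator without producing a value
    for step in range(3):
        yield f"step {step}"  # intermediate update
    yield "done"  # final result


# Consuming the generator collects every yielded value in order.
idle = list(progress(False))
active = list(progress(True))
print(idle)
print(active)
```

Each `yield` hands one value to the consumer and pauses the function until the next value is requested, which is what lets a reactive component be updated step by step.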

In the example below we add a `Button` to trigger some calculation. Initially the calculation hasn't yet run, so we check the value provided by the `Button` indicating whether a calculation has been triggered and while it is `False` we `yield` some text and `return`. However, when the `Button` is clicked the function is called again with `run=True` and we kick off some calculation. As this calculation progresses we can `yield` updates and then once the calculation is successful we `yield` again with the final result:


In [None]:
run = pn.widgets.Button(name="Press to run calculation", align='center')

def runner(run):
    if not run:
        yield "Calculation did not run yet"
        return
    for i in range(101):
        time.sleep(0.01) # Some calculation
        yield pn.Column(
            f'Running ({i}/100%)', pn.indicators.Progress(value=i)
        )
    yield "Success ✅︎"

pn.Row(run, pn.bind(runner, run))

This provides a powerful mechanism for providing incremental updates as we load some data, perform some data processing, etc.

This can also be combined with asynchronous processing, e.g. to dynamically stream in new data as it arrives:

In [None]:
async def slideshow():
    index = 0
    while True:
        url = f"https://picsum.photos/800/300?image={index}"
        
        async with aiohttp.ClientSession() as session:
            async with session.get(url) as resp:
                img, _ = await asyncio.gather(resp.read(), asyncio.sleep(1))
                yield pn.pane.JPG(img)
        index = (index + 1) % 10

pn.Row(slideshow)
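
The async generator pattern used by `slideshow` can be tried without network access. The sketch below is a stdlib-only approximation: `fake_fetch` is a hypothetical stand-in for the HTTP request, and `asyncio.gather` is paired with `asyncio.sleep` in the same way to enforce a minimum delay between frames:

```python
import asyncio


async def fake_fetch(index):
    """Hypothetical stand-in for downloading an image."""
    await asyncio.sleep(0.001)
    return f"frame-{index}"


async def frames(count):
    """Async generator: yield one frame at a time as it becomes available."""
    for index in range(count):
        # Wait for both the fetch and the minimum display delay.
        frame, _ = await asyncio.gather(fake_fetch(index), asyncio.sleep(0.01))
        yield frame


async def collect():
    # `async for` consumes the generator, much like a reactive pane would.
    return [frame async for frame in frames(3)]


frames_seen = asyncio.run(collect())
print(frames_seen)
```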

## Enable Automatic Threading

Threading in Panel can either be enabled manually, e.g. by managing your own thread pool and dispatching concurrent tasks to it, or it can be managed by Panel itself by setting the `config.nthreads` parameter (or equivalently by setting it with `pn.extension(nthreads=...)`). This will start a `ThreadPoolExecutor` with the specified number of threads (or if set to `0` it will set the number of threads based on your system, i.e. `min(32, os.cpu_count() + 4)`).

Whenever an event is generated or a periodic callback fires, Panel will then automatically dispatch the event to the executor. An event in this case refers to any action generated on the frontend, such as the manipulation of a widget by a user or the interaction with a plot. If you are launching an application with `panel serve` you should configure this option on the CLI by setting `--num-threads`.

To demonstrate the effect of enabling threading take this example below:

```python
import time

import panel as pn

pn.extension(nthreads=2)

def button_click(event):
    print(f'Button clicked for the {event.new}th time.')
    time.sleep(2) # Simulate long running operation
    print(f'Finished processing {event.new}th click.')

button = pn.widgets.Button(name='Click me!')

button.on_click(button_click)
```

When we click the button twice successively in a single-threaded context we will see the following output:

```
> Button clicked for the 1th time.
... 2 second wait
> Finished processing 1th click.
> Button clicked for the 2th time.
... 2 second wait
> Finished processing 2th click.
```

In a threaded context on the other hand the two clicks will be processed concurrently:

```
> Button clicked for the 1th time.
> Button clicked for the 2th time.
... 2 second wait
> Finished processing 1th click.
> Finished processing 2th click.
```
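
The concurrent ordering can be reproduced with the standard library alone. The sketch below is not how Panel dispatches events; it only assumes a `ThreadPoolExecutor` with two workers, and uses a `threading.Barrier` (instead of `time.sleep`) as a stand-in for the long-running work so that the outcome is deterministic:

```python
import threading
from concurrent.futures import ThreadPoolExecutor

log = []
log_lock = threading.Lock()
# Both handlers must be running at the same time to get past the barrier.
both_started = threading.Barrier(2)


def handle_click(n):
    with log_lock:
        log.append(f"clicked {n}")
    both_started.wait(timeout=5)  # stand-in for the long-running operation
    with log_lock:
        log.append(f"finished {n}")
    return n


with ThreadPoolExecutor(max_workers=2) as executor:
    results = list(executor.map(handle_click, [1, 2]))

print(log)
```

With a single worker the second handler could not start until the first had finished, so the barrier would time out; with two workers both "clicked" entries are logged before either "finished" entry.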


## Launch profiling

The launch profiler profiles the execution time of the initialization of a particular application. It can be enabled by passing the `--profiler` command-line option to `panel serve`. Available profilers include:

- [`pyinstrument`](https://pyinstrument.readthedocs.io): A statistical profiler with nice visual output
- [`snakeviz`](https://jiffyclub.github.io/snakeviz/): SnakeViz is a browser based graphical viewer for the output of Python’s cProfile module and an alternative to using the standard library pstats module.
- [`memray`](https://bloomberg.github.io/memray/): memray is a memory profiler for Python. It can track memory allocations in Python code, in native extension modules, and in the Python interpreter itself.

Once enabled the launch profiler will profile each application separately and provide the profiler output generated by the selected profiling engine.

:::{image} ./assets/launch_profiler.png
:width: 80%
:::

## User profiling

In addition to profiling the launch step of an application it is often also important to get insight into the interactive performance of an application. For that reason Panel also provides the `pn.io.profile` decorator that can be added to any callback and will report the profiling results in the `/admin` panel. The `profile` helper takes two arguments: the name to record the profiling results under and the profiling `engine` to use.

```python
@pn.io.profile('clustering', engine='snakeviz')
def get_clustering(event):
    # some expensive calculation
    ...

widget.param.watch(get_clustering, 'value')
```

:::{image} ./assets/user_profiling.png
:width: 80%
:::

The user profiling may also be used in an interactive session, e.g. we might decorate a simple callback with the `profile` decorator:

```python
import time

slider = pn.widgets.FloatSlider(name='Test')

@pn.io.profile('formatting')
def format_value(value):
    time.sleep(1)
    return f'Value: {value+1}'

pn.Row(slider, pn.bind(format_value, slider))
```

Then we can request the named profile 'formatting' using the `pn.state.get_profile` function:

```python
pn.state.get_profile('formatting')
```
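
To make the idea of a named profile more concrete, here is a rough stdlib sketch of a decorator that records a `cProfile` run under a name and renders it on request. It is not Panel's implementation: `profile_sketch`, `report` and the `PROFILES` registry are hypothetical names used only for this illustration.

```python
import cProfile
import functools
import io
import pstats

PROFILES = {}  # hypothetical registry: name -> list of cProfile.Profile objects


def profile_sketch(name):
    """Record one cProfile run per call under the given name."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            profiler = cProfile.Profile()
            try:
                return profiler.runcall(func, *args, **kwargs)
            finally:
                PROFILES.setdefault(name, []).append(profiler)
        return wrapper
    return decorator


def report(name):
    """Render the recorded runs for `name` as pstats text."""
    buffer = io.StringIO()
    for profiler in PROFILES.get(name, []):
        stats = pstats.Stats(profiler, stream=buffer)
        stats.sort_stats("cumulative").print_stats()
    return buffer.getvalue()


@profile_sketch("formatting")
def format_value(value):
    return f"Value: {value + 1}"


result = format_value(1)
summary = report("formatting")
print(summary)
```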

## Caching

The `panel.state.cache` object is a simple dictionary that is shared between all sessions on a particular Panel server process. This makes it possible to load large datasets (or other objects you want to share) once and subsequently access the cached object.

To assign to the cache manually, simply put the data load or expensive calculation in an `if`/`else` block which checks whether the custom key is already present:

In [None]:
if 'data' in pn.state.cache:
    data = pn.state.cache['data']
else:
    pn.state.cache['data'] = data = pd.read_parquet('./windturbines.parq')

Alternatively, the `as_cached` helper function provides a slightly cleaner way to write the caching logic. Instead of writing a conditional statement you write a function that is executed only when the inputs to the function change. If provided, the `args` and `kwargs` will also be hashed making it easy to cache (or memoize) on the arguments to the function:


In [None]:
data = pn.state.as_cached('data', pd.read_parquet, path='./windturbines.parq', ttl=1200)

Now, the first time the app is loaded the data will be cached and subsequent sessions will simply look up the data in the cache, speeding up the process of rendering. If you want to warm up the cache before the first user visits the application you can also provide the `--warm` argument to the `panel serve` command, which will ensure the application is initialized as soon as it is launched. If you want to populate the cache in a separate script from your main application you may also provide the path to a setup script using the `--setup` argument to `panel serve`.
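
The caching pattern described in this section can be sketched with a plain dictionary. The helper below is a simplified, hypothetical analogue of `as_cached` (no `ttl` handling, and not Panel's actual code); `load` is a made-up loader that records how often it really runs:

```python
cache = {}  # stands in for a dictionary shared by all sessions in one process


def as_cached_sketch(key, fn, **kwargs):
    """Run `fn(**kwargs)` once per (key, kwargs) combination and reuse the result."""
    full_key = (key, tuple(sorted(kwargs.items())))
    if full_key not in cache:
        cache[full_key] = fn(**kwargs)
    return cache[full_key]


calls = []


def load(path):
    calls.append(path)  # track how often the expensive load actually runs
    return {"path": path, "rows": 3}


first = as_cached_sketch("data", load, path="./a.parq")
second = as_cached_sketch("data", load, path="./a.parq")  # served from the cache
third = as_cached_sketch("data", load, path="./b.parq")  # new arguments, new load
print(calls)
```

Including the keyword arguments in the cache key is what makes the lookup behave like memoization: the loader only runs again when its inputs change.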

## Memoization

The `pn.cache` decorator provides an easy way to cache the outputs of a function depending on its inputs (i.e. memoization). If you've ever used the Python `@lru_cache` decorator you will be familiar with this concept. However, `pn.cache` supports additional cache policies apart from `LRU` (least-recently-used), including `LFU` (least-frequently-used) and `FIFO` (first-in-first-out). This means that if the specified number of `max_items` is reached Panel will automatically evict items from the cache based on this `policy`. Additionally, items can be deleted from the cache based on a `ttl` (time-to-live) value given in seconds.

### Caching in memory

The `pn.cache` decorator can easily be combined with the different Panel APIs including `pn.bind` and `pn.depends` providing a powerful way to speed up your applications.


In [None]:
@pn.cache(max_items=10, policy='LFU')
def load_data(path):
    return ... # Load some data

Once you have decorated your function with `pn.cache` any call to `load_data` will be cached in memory until the `max_items` value is reached (i.e. you have loaded 10 different `path` values). At that point the `policy` will determine which item is evicted.
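
To see how the three policies can pick different items to evict, consider the toy cache below. It is a hypothetical illustration (`TinyCache` is not part of Panel, and Panel's own bookkeeping may differ) that replays the same access sequence under each policy:

```python
from collections import OrderedDict


class TinyCache:
    """Hypothetical bounded cache illustrating LRU, LFU and FIFO eviction."""

    def __init__(self, max_items, policy="LRU"):
        if policy not in ("LRU", "LFU", "FIFO"):
            raise ValueError(f"unknown policy: {policy}")
        self.max_items = max_items
        self.policy = policy
        self._data = OrderedDict()  # insertion order (recency order for LRU)
        self._hits = {}  # access counts, used by LFU

    def get(self, key, compute):
        if key in self._data:
            self._hits[key] += 1
            if self.policy == "LRU":
                self._data.move_to_end(key)  # mark as most recently used
            return self._data[key]
        if len(self._data) >= self.max_items:
            self._evict()
        self._data[key] = compute(key)
        self._hits[key] = 1
        return self._data[key]

    def _evict(self):
        if self.policy == "LFU":
            victim = min(self._data, key=self._hits.__getitem__)
        else:  # LRU and FIFO both evict the first key in the ordering
            victim = next(iter(self._data))
        del self._data[victim]
        del self._hits[victim]

    def keys(self):
        return sorted(self._data)


def remaining_keys(policy):
    cache = TinyCache(max_items=3, policy=policy)
    # "d" arrives when the cache is full and forces one eviction.
    for key in ["a", "b", "b", "c", "a", "d"]:
        cache.get(key, str.upper)
    return cache.keys()


survivors = {policy: remaining_keys(policy) for policy in ("LRU", "LFU", "FIFO")}
print(survivors)
```

In this sequence `a` was inserted first, `b` was used least recently and `c` was used least often, so FIFO evicts `a`, LRU evicts `b` and LFU evicts `c`.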

The `pn.cache` decorator can easily be combined with `pn.bind` to speed up rendering of your reactive components:

In [None]:
import pandas as pd
import panel as pn

select = pn.widgets.Select(options={
    'Penguins': 'https://raw.githubusercontent.com/mwaskom/seaborn-data/master/penguins.csv',
    'Diamonds': 'https://raw.githubusercontent.com/mwaskom/seaborn-data/master/diamonds.csv',
    'Titanic': 'https://raw.githubusercontent.com/mwaskom/seaborn-data/master/titanic.csv',
    'MPG': 'https://raw.githubusercontent.com/mwaskom/seaborn-data/master/mpg.csv'
})

@pn.cache
def fetch_data(url):
    return pd.read_csv(url)

pn.Column(select, pn.widgets.Tabulator(pn.bind(fetch_data, select), page_size=10))


### Disk caching

If you have `diskcache` installed you can also cache the results to disk by setting `to_disk=True`. The `diskcache` library will then cache the value to the supplied `cache_path` (defaulting to `./cache`). Making use of disk caching allows you to cache items even if the server is restarted.

### Clearing the cache

Once a function has been decorated with `pn.cache` you can easily clear the cache by calling `.clear()` on that function, e.g. in the example above you could call `load_data.clear()`. If you want to clear all caches you may also call `pn.state.clear_caches()`.

### Per-session caching

By default any functions decorated or wrapped with `pn.cache` will use a global cache that will be reused across multiple sessions, i.e. multiple users visiting your app will all share the same cache. If instead you want a session-local cache, that only reuses cached outputs for the duration of each visit to your application, you can set `pn.cache(..., per_session=True)`.
