## Advanced Callbacks

For the latest code revision, fork the code repo: https://github.com/mannyjrod/Dash_Apps_Sandlot

Author: Emmanuel Rodriguez

[emmanueljrodriguez.com/](https://emmanueljrodriguez.com/)

9JAN2024, Renton, Seattle, WA

To do:

1. Set up RedisCache to complete example code. Example 3 - Caching & Signaling

### Sharing Data Between Callbacks

Ref: https://dash.plotly.com/sharing-data-between-callbacks

#### CAUTION: Global Variables Will Break Your App, Use *Local* Variables Instead

Dash apps are meant to be accessed by multiple users from different locations.

As such, Dash is designed to work in multi-user environments where multiple people view the application at the same time and have **independent sessions**.

Therefore, if a callback modifies a global variable, one user's session could set that variable to a value that then affects every other user's session.

##### Example: Global Variable Modified

In [1]:
# An example where a global variable is modified via the callback function:
import pandas as pd
from dash import Dash, dcc, html, Input, Output, callback

# Data
df = pd.DataFrame({
    'student_id': range(1, 11),
    'score': [1, 5, 2, 5, 2, 3, 1, 5, 1, 5]
})

# Initialize the app
app = Dash(__name__)

# App layout
app.layout = html.Div([
    dcc.Dropdown(list(range(1, 6)), 1, id='score'),
    'was scored by this many students:',
    html.Div(id='output'),
])

# Callback Controls to build the interaction
# Recall: A Dash callback has two parts: 

# 1. Callback decorator - identifies the relevant components, defined in the layout section.
@callback(Output('output', 'children'), Input('score', 'value'))

# 2. Callback function - defines how those Dash components interact
def update_output(value):
    global df
    df = df[df['score'] == value] # Keep only the rows whose 'score' equals the selected value, and overwrite the *global* df with that filtered result.
    return len(df) # The number of remaining rows is the number of students who received this score.

# Run the app
if __name__ == '__main__':
    app.run_server(debug=True, use_reloader=False)

Dash is running on http://127.0.0.1:8050/

 * Serving Flask app "__main__" (lazy loading)
 * Environment: production
   Use a production WSGI server instead.
 * Debug mode: on


The callback returns the correct output the very first time it is called. However, that first call overwrites the global `df` with the filtered rows, so every subsequent callback that reads `df` works from the filtered data rather than the original data.
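
The failure can be reproduced without running a server. The following is a minimal sketch, not part of the app: the dict `shared` is a hypothetical stand-in for the module-level `df`, and `count_students_shared` is a hypothetical name that mirrors what the callback does when it overwrites shared state.

```python
import pandas as pd

# Hypothetical stand-in for module-level (global) state that every user shares.
shared = {
    'df': pd.DataFrame({
        'student_id': range(1, 11),
        'score': [1, 5, 2, 5, 2, 3, 1, 5, 1, 5],
    })
}

def count_students_shared(value):
    # Overwrite the shared DataFrame with the filtered rows,
    # which is what `global df` does in the callback above.
    shared['df'] = shared['df'][shared['df']['score'] == value]
    return len(shared['df'])

first_call = count_students_shared(5)   # first user selects score 5
second_call = count_students_shared(1)  # next user selects score 1

print(first_call)   # 4: correct
print(second_call)  # 0: wrong, three students actually scored 1
```

The second call returns 0 because the rows with a score of 1 were already discarded by the first call.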

##### Example: Local Variable Modified

In [2]:
# An example where a local variable is modified, by reassigning the filtered dataframe to a new variable inside the callback

# Data
df = pd.DataFrame({
    'student_id': range(1,11),
    'score': [1, 5, 2, 5, 2, 3, 1, 5, 1, 5]
})

# Initialize the app
app = Dash(__name__)

# App layout
app.layout = html.Div([
    dcc.Dropdown(list(range(1, 6)), 1, id='score'),
    'was scored by this many students:',
    html.Div(id='output'),
])

# Callback Controls

# 1. Callback decorator
@callback(Output('output', 'children'), Input('score', 'value'))

# 2. Callback function
def update_output(value):
    filtered_df = df[df['score'] == value] # This creates a new *local* variable on every call, always filtered from the original, unmodified df.
    return len(filtered_df) # The length of the local dataframe is the number of students who received this score.

# Run the app
if __name__ == '__main__':
    app.run_server(debug=True, use_reloader=False)

Dash is running on http://127.0.0.1:8050/

 * Serving Flask app "__main__" (lazy loading)
 * Environment: production
   Use a production WSGI server instead.
 * Debug mode: on


The app now uses a local variable, by re-assigning the filtered dataframe to a new variable inside the callback.
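
As a quick check outside of Dash, the sketch below calls the same filtering logic several times in a row. The function name `count_students` is hypothetical; the point is only that boolean indexing returns a new DataFrame, so the original `df` is left untouched and repeated calls stay consistent.

```python
import pandas as pd

df = pd.DataFrame({
    'student_id': range(1, 11),
    'score': [1, 5, 2, 5, 2, 3, 1, 5, 1, 5],
})

def count_students(value):
    # The filtered result lives only in this local variable.
    filtered_df = df[df['score'] == value]
    return len(filtered_df)

counts = [count_students(v) for v in (5, 1, 5)]

print(counts)   # [4, 3, 4]
print(len(df))  # 10: the original data is unchanged
```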

### Storing Shared Data

To share data across multiple processes or servers, data should be stored somewhere that is accessible to each of the processes.

Three main places to store data:
* In the user's browser session, using `dcc.Store`
* On the disk, such as a file or database
* In **server-side** memory (RAM) shared across processes and servers such as a Redis database.
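
To make the second option (disk) more concrete, here is a rough sketch under stated assumptions: the helpers `save_frame` and `load_frame`, the temporary directory, and the use of a random key are all hypothetical choices for illustration, not something the Dash documentation prescribes. The idea is that a callback would write the processed data to a file and pass only the small key along to other callbacks.

```python
import os
import tempfile
import uuid

import pandas as pd

# Hypothetical storage location; a real app would choose a persistent directory.
STORE_DIR = tempfile.mkdtemp(prefix='dash-shared-')

def save_frame(frame):
    """Write the DataFrame to disk and return a key that identifies it."""
    key = uuid.uuid4().hex
    frame.to_json(os.path.join(STORE_DIR, f'{key}.json'), orient='split')
    return key

def load_frame(key):
    """Read back the DataFrame that was saved under the given key."""
    return pd.read_json(os.path.join(STORE_DIR, f'{key}.json'), orient='split')

processed = pd.DataFrame({'student_id': [1, 2, 3], 'score': [5, 2, 5]})
key = save_frame(processed)   # any process with access to STORE_DIR can now load it
restored = load_frame(key)
```

Because the file lives outside any single Python process, every worker process that can reach `STORE_DIR` sees the same data.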

#### Example 1: Storing Data in the Browser with `dcc.Store`

If processing a dataset takes a long time and different outputs use this dataset, `dcc.Store` can be used to store the processed data as an *intermediate value* that can then be used as an input in multiple callbacks to generate different outputs.

* The processed data must be converted to a string, such as JSON or base64-encoded binary data, for storage. The data is held in the user's browser, so it is only available to that user; with `storage_type='memory'` (used below) it is cleared when the page is refreshed.
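
The round trip that the data goes through can be sketched with pandas and the standard library alone. This is an illustration of the idea, not Dash's internal code: `to_json` and `json.loads` are used here as stand-ins for the serialization that happens between the server and the browser.

```python
import json

import pandas as pd

df = pd.DataFrame({'student_id': [1, 2, 3], 'score': [5, 2, 5]})

# Server side: the DataFrame becomes a JSON string (a list of row records).
payload = df.to_json(orient='records')

# What a later callback receives: plain Python lists and dicts, not a DataFrame.
records = json.loads(payload)

# The callback rebuilds a DataFrame before working with the data.
restored = pd.DataFrame(records)

print(type(records))  # <class 'list'>
print(records[0])     # {'student_id': 1, 'score': 5}
```

This is why the callbacks in the example below start by wrapping their input in `pd.DataFrame(...)`.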

In [6]:
# Import packages
from dash import Dash, html, dcc, Output, Input, callback, dash_table
import pandas as pd
import plotly.express as px

# Define stylesheets
external_stylesheets = ['https://codepen.io/chriddyp/pen/bWLwgP.css']

# Initialize app
app = Dash(__name__, external_stylesheets = external_stylesheets)

# Create layout
app.layout = html.Div([
    html.H1('Sharing Data Between Callbacks', style={'textAlign':'center'}),
    html.Div([
        dcc.Dropdown(id='data-set-chosen', multi=False, value='gapminder',
                    options=[{'label':'Country Data', 'value':'gapminder'},
                            {'label':'Restaurant Tips', 'value':'tips'},
                            {'label':'Flowers', 'value':'iris'}])
    ], className='row', style={'width':'50%'}),
    
    html.Div([
        html.Div(id='graph-placeholder', children=[], className='six columns'),
    ], className='row'),
    
    html.Div([
        html.Div(id='table-placeholder', children=[])
    ], className='row'),
    
    # Store the data in the user's browser
    dcc.Store(id='store-data', data=[], storage_type='memory') # 'memory': kept only until the page is refreshed
])

# Callbacks and controls

# Callback decorator #1 - Input: Choose dataset, Output: Stored dataset
@callback(
    Output('store-data', 'data'),
    Input('data-set-chosen','value')
)
# Callback function #1 - Define how the inputs affect the outputs
# Store dataset based on the user-selected dataset.
def store_data(value):
    # 'Dummy' large dataset with a multitude of rows
    if value == 'gapminder':
        dataset = px.data.gapminder()
    elif value == 'tips':
        dataset = px.data.tips()
    elif value == 'iris':
        dataset = px.data.iris()
    else:
        # No dataset selected (e.g., the dropdown was cleared): store an empty list
        return []
    return dataset.to_dict('records')

# Callback decorator #2 - Create graph based on the user-selected dataset.
@callback(
    Output('graph-placeholder', 'children'),
    Input('store-data', 'data')
)
# Callback function #2
def create_graph1(data):
    print(type(data))
    dff = pd.DataFrame(data)
    print(dff.head()) # Print the first 5 rows of the selected dataset; this is updated at every function call 
    # (when the dataset selected is changed)
    print(type(dff))
    
    # If 'country' is one of the dataframe's columns, then the 'Country Data' dataset has been chosen
    # Create line graph
    if 'country' in dff.columns:
        fig1 = px.line(dff, x='year', y='lifeExp', color='continent')
        return dcc.Graph(figure=fig1)
    
    # If 'total_bill' is one of the dataframe's columns, then the 'Restaurant Tips' dataset has been chosen
    # Create bar graph
    elif 'total_bill' in dff.columns:
        fig2 = px.bar(dff, x='day', y='tip', color='sex')
        return dcc.Graph(figure=fig2)
    
    # If 'sepal_length' is one of the dataframe's columns, then the 'Flowers' dataset has been chosen
    # Create a scatter graph
    elif 'sepal_length' in dff.columns:
        fig3 = px.scatter(dff, x='sepal_width', y='petal_width', color='species')
        return dcc.Graph(figure=fig3)
    
# Callback #3 - Create a table for the associated dataset
@callback(
    Output('table-placeholder', 'children'),
    Input('store-data', 'data')
)
# Callback function
def create_table1(data):
    dff = pd.DataFrame(data)
    
    my_table = dash_table.DataTable(
        columns=[{"name":i, "id":i} for i in dff.columns],
        data=dff.to_dict('records')
    )
    return my_table

# Run the app
if __name__ == '__main__':
    app.run_server(debug=True, use_reloader=False)

Dash is running on http://127.0.0.1:8050/

 * Serving Flask app "__main__" (lazy loading)
 * Environment: production
   Use a production WSGI server instead.
 * Debug mode: on
<class 'list'>
       country continent  year  lifeExp       pop   gdpPercap iso_alpha  \
0  Afghanistan      Asia  1952   28.801   8425333  779.445314       AFG   
1  Afghanistan      Asia  1957   30.332   9240934  820.853030       AFG   
2  Afghanistan      Asia  1962   31.997  10267083  853.100710       AFG   
3  Afghanistan      Asia  1967   34.020  11537966  836.197138       AFG   
4  Afghanistan      Asia  1972   36.088  13079460  739.981106       AFG   

   iso_num  
0        4  
1        4  
2        4  
3        4  
4        4  
<class 'pandas.core.frame.DataFrame'>
<class 'list'>
   total_bil

Note: The data is serialized into a JSON string before being placed into storage. Also note how the processed data is stored in `dcc.Store` by making it a callback's output, and how multiple callbacks then reuse that data by taking the same `dcc.Store` as an input.

#### Example 3: Caching and Signaling

This example uses Redis via Flask-Caching to store "global variables" on the server side in a database. This data is accessed through a function (`global_store()`), the output of which is cached and keyed by its input arguments.
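
The "cached and keyed by its input arguments" behavior, together with the signaling step, can be sketched without Redis. The sketch below swaps in `functools.lru_cache` from the standard library in place of `cache.memoize()`, so it is an analogy rather than the same mechanism: `lru_cache` lives inside a single Python process, whereas a Redis cache is shared across processes. The names `expensive_lookup` and `call_log` are hypothetical.

```python
import time
from functools import lru_cache

call_log = []  # records every time the expensive body actually runs

@lru_cache(maxsize=None)
def expensive_lookup(value):
    # Stand-in for an expensive query; runs once per distinct argument.
    call_log.append(value)
    time.sleep(0.1)
    return f'result for {value}'

def compute_value(value):
    # Warm the cache, then return the value as the "signal".
    expensive_lookup(value)
    return value

signal = compute_value('apples')

# Four consumers (the four graphs) read the same cached result.
graphs = [expensive_lookup(signal) for _ in range(4)]

compute_value('figs')    # new argument: the body runs again
compute_value('apples')  # seen before: served from the cache

print(call_log)  # ['apples', 'figs']
```

Only the first call for each argument pays the cost; every consumer triggered by the signal afterwards gets the stored result.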

In [14]:
# Caching and Signaling
# Cache (computing) - a cache is a software component that stores data so that future requests for that data can be 
# served faster; the data stored in a cache might be the result of an earlier computation or a copy of data stored elsewhere.

# Import libraries
import os, copy, time

from dash import Dash, dcc, html, Input, Output, callback

import numpy as np
import pandas as pd
from flask_caching import Cache # Flask-Caching is an extension to Flask that adds caching support for various backends to 
# any Flask application.

# Set style sheets
external_stylesheets = [
    # Dash CSS
    'https://codepen.io/chriddyp/pen/bWLwgP.css',
    # Loading screen CSS
    'https://codepen.io/chriddyp/pen/brPBPO.css']

# Initialize the app
app = Dash(__name__, external_stylesheets = external_stylesheets)
server = app.server

# Configure Flask-Caching
# Notes on the configuration objects used:
# 'CACHE_TYPE' - specifies which type of caching object to use. This string will be imported and instantiated.
# 'redis' or 'RedisCache' is a built-in cache backend.
# Note that "backend" refers to the parts of the code that allow the app to operate but that cannot be accessed by a user.

CACHE_CONFIG = {
    # Try 'FileSystemCache' if you don't want to set up Redis:
    # 'CACHE_TYPE': 'FileSystemCache',
    # 'CACHE_DIR': r"<my cache directory>",
    'CACHE_TYPE': 'RedisCache',
    'CACHE_REDIS_URL': os.environ.get('REDIS_URL','redis://localhost:6379')
}
cache = Cache() # Instantiate Cache using the 'Cache()' class
cache.init_app(app.server, config=CACHE_CONFIG) # Setup Cache instance using the 'init_app' method 

# Data
N = 100 # Variable

df = pd.DataFrame({
    'category': (
        (['apples']*5*N) +
        (['oranges']*10*N) +
        (['figs']*20*N) +
        (['pineapples']*15*N)
    )
})

df['x'] = np.random.randn(len(df['category']))
df['y'] = np.random.randn(len(df['category']))

# App layout

app.layout = html.Div([
    dcc.Dropdown(df['category'].unique(), 'apples', id='dropdown'),
    html.Div([
        html.Div(dcc.Graph(id='graph-1'), className="six columns"),
        html.Div(dcc.Graph(id='graph-2'), className="six columns"),
    ], className="row"),
    html.Div([
        html.Div(dcc.Graph(id='graph-3'), className="six columns"),
        html.Div(dcc.Graph(id='graph-4'), className="six columns"),
    ], className="row"),
    
    # signal value to trigger callbacks
    dcc.Store(id='signal')
])

# Perform expensive computations in this "global store," these computations are cached
# in a *globally* available redis memory store which is available across processes and 
# for all time.

@cache.memoize() # Apply the 'memoize()' decorator
# The idea behind memoization is that if a function is called several times with the same arguments, the result is
# only computed the *first time*; later calls return the cached result, which avoids hitting
# the database every time this information is needed / the function is called.

def global_store(value):
    # Simulate expensive query
    print(f'Computing value with {value}')
    time.sleep(3) # A 3 second delay simulates an expensive process.
    return df[df['category'] == value]

def generate_figure(value, figure):
    fig = copy.deepcopy(figure)
    filtered_dataframe = global_store(value)
    fig['data'][0]['x'] = filtered_dataframe['x']
    fig['data'][0]['y'] = filtered_dataframe['y']
    fig['layout'] = {'margin': {'l':20, 'r':10, 'b':20, 't':10}}
    return fig

# Callbacks - controls to control the interactions
@callback(Output('signal', 'data'), Input('dropdown', 'value'))
def compute_value(value):
    # Compute value and send a signal when done.
    global_store(value)
    return value

# When the computation is complete, the signal is sent and the following four callbacks are executed in parallel
# to render the graphs.
# Each of these callbacks retrieves the data from the "global server-side store": the Redis cache.

@callback(Output('graph-1', 'figure'), Input('signal', 'data'))
def update_graph_1(value):
    # generate_figure gets data from `global_store`.
    # The data in `global_store` has already been computed by the 
    # `compute_value` callback and the result is stored in the
    # *global* redis cache.
    return generate_figure(value, {
        'data': [{
            'type': 'scatter',
            'mode': 'markers',
            'marker': {
                'opacity': 0.5,
                'size': 14,
                'line': {'border': 'thin darkgrey solid'}
            }
        }]
    })

@callback(Output('graph-2', 'figure'), Input('signal', 'data'))
def update_graph_2(value):
    return generate_figure(value, {
        'data': [{
            'type': 'scatter',
            'mode': 'lines',
            'line': {'shape': 'spline', 'width': 0.5},
        }]
    })

@callback(Output('graph-3', 'figure'), Input('signal', 'data'))
def update_graph_3(value):
    return generate_figure(value, {
        'data': [{
            'type': 'histogram2d',
        }]
    })

@callback(Output('graph-4', 'figure'), Input('signal', 'data'))
def update_graph_4(value):
    return generate_figure(value, {
        'data': [{
            'type': 'histogram2dcontour',
        }]
    })

# Run the app
if __name__ == '__main__':
    app.run_server(debug=True, use_reloader=False, processes=6, threaded=False)

    # Set `processes=6` so that multiple callbacks can be executed in parallel. 
    # Because we are running the server with multiple processes, set `threaded=False`; a Flask
    # server can't be both multi-process and multi-threaded.

RuntimeError: no redis module found
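
The error above most likely means that the `redis` Python package could not be imported in this environment. Until Redis is set up, one possible workaround is the `FileSystemCache` backend mentioned in the configuration comments. The sketch below only builds the configuration dict; the directory name `dash-cache-example` is a hypothetical choice, and the timeout shown is an assumption that can be adjusted.

```python
import os
import tempfile

# Hypothetical cache directory inside the system's temporary folder.
cache_dir = os.path.join(tempfile.gettempdir(), 'dash-cache-example')
os.makedirs(cache_dir, exist_ok=True)

CACHE_CONFIG = {
    'CACHE_TYPE': 'FileSystemCache',   # store cached values as files on disk
    'CACHE_DIR': cache_dir,            # where those files are written
    'CACHE_DEFAULT_TIMEOUT': 300,      # seconds before a cached value expires
}

# In the cell above, this dict would be passed unchanged to:
# cache.init_app(app.server, config=CACHE_CONFIG)
```

Files on disk are visible to all worker processes on the same machine, so the memoized results can still be shared when the server runs with `processes=6`.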

In [12]:
help(Cache())

NameError: name 'Cache' is not defined