
Add reactive expressions / blocks #49

Closed · kmader opened this issue Jun 27, 2017 · 8 comments

@kmader commented Jun 27, 2017

One of the best features in Shiny is the ability to make reactive expressions that tie together a number of different tasks, so that multiple outputs can depend on the same intermediate step. This is particularly useful, as in the example below, when a table and a plot both depend on the same input that is being filtered/transformed by reactive components. It also allows many of the longer expressions to be rewritten more modularly, with fewer inputs.

clean_data <- reactive({
  raw_df %>% subset(age > input$min_age) %>% subset(count < input$max_count)
})
output$result_table <- renderDataTable({ clean_data() })
output$result_plot <- renderPlot({
  ggplot(clean_data(), aes(x = age, y = count)) + geom_jitter()
})

Clearly in Dash this is a bit trickier, but presumably it could be handled with something like a new type of Dependency:

@app.callback(
    dash.dependencies.ReactiveExpression('clean_data'),
    [dash.dependencies.Input('crossfilter-xaxis-column', 'value')])
def clean_data(xaxis_val):
    return raw_df.query('xaxis=={}'.format(xaxis_val))

@app.callback(
    dash.dependencies.Output('crossfilter-indicator-scatter', 'figure'),
    [dash.dependencies.ReactiveExpression('clean_data')])
def show_plot(in_data):
    return dict(data=[], layout=go.Layout())
@chriddyp (Member) commented Jun 27, 2017

October 12th, 2017 Edit: This topic is now its own part of the user guide. Please see http://plot.ly/dash/sharing-data-between-callbacks


Ah, that's interesting. Thanks for sharing! I'm not sure if these arguments apply to Dash, but I'm curious to learn more. Some notes:

  • Note that chained dependencies / intermediate inputs / intermediate steps are already supported (see the "Multiple Outputs" section in the user guide here: https://plot.ly/dash/getting-started-part-2)
  • For the sake of code modularity, you can just use regular functions like this:
import dash
import dash_core_components as dcc
import dash_html_components as html
import pandas as pd
from dash.dependencies import Input, Output

global_df = pd.read_csv('...')

app = dash.Dash()
app.layout = html.Div([
    dcc.Graph(id='graph'),
    html.Table(id='table'),
    dcc.Dropdown(id='dropdown')
])

def clean_data(df, value):
    # some expensive clean data step
    [...]
    return cleaned_df

@app.callback(Output('graph', 'figure'), [Input('dropdown', 'value')])
def update_graph(value):
    dff = clean_data(global_df, value)
    figure = create_figure(dff)
    return figure

@app.callback(Output('table', 'children'), [Input('dropdown', 'value')])
def update_table(value):
    dff = clean_data(global_df, value)
    table = create_table(dff)
    return table
  • In this case, we're performing the clean_data step twice when the dropdown changes. With something like a shared reactive expression, this could potentially be done only once. However, in Dash all of these callbacks are executed in parallel on the server, so a shared expression wouldn't end up being any faster (as long as you aren't request bound).
  • If performance were really an issue, you could add caching around clean_data so that long, expensive computations are only performed once (see https://plot.ly/dash/performance for more details; a sketch follows this list).
  • If we did something like intermediate expressions, we'd have to send the intermediate data back to the client (the browser), which would incur a network delay cost.
  • You can sort of already do this by serializing your data as a string and displaying it in a hidden div. Again, this will incur a network delay cost, so this solution might not be any faster than just performing the calculation (albeit twice) in two different callbacks (which will be executed in parallel) and/or caching the intermediate values.
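For example, here's a minimal sketch of that caching approach using the Flask-Caching package (the filesystem backend, the cache directory, and the your_expensive_clean_or_compute_step placeholder are just illustrative):

from flask_caching import Cache

# assumption: `app` is the dash.Dash() instance defined above
cache = Cache(app.server, config={
    'CACHE_TYPE': 'filesystem',    # any Flask-Caching backend works (redis, memcached, ...)
    'CACHE_DIR': 'cache-directory'
})

@cache.memoize()
def clean_data(value):
    # some expensive clean data step; it runs once per distinct `value`,
    # and later calls (from any callback) return the cached result
    cleaned_df = your_expensive_clean_or_compute_step(value)
    return cleaned_df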

Here's how you would do this in a hidden div:

global_df = pd.read_csv('...')
app.layout = html.Div([
    dcc.Graph(id='graph'), 
    html.Table(id='table'),
    dcc.Dropdown(id='dropdown'),
    html.Div(id='intermediate-value', style={'display': 'none'})
])

@app.callback(Output('intermediate-value', 'children'), [Input('dropdown', 'value')])
def clean_data(value):
    # some expensive clean data step
    cleaned_df = your_expensive_clean_or_compute_step(value)
    return cleaned_df.to_json()  # or, more generally, json.dumps(cleaned_df)

@app.callback(Output('graph', 'figure'), [Input('intermediate-value', 'children')])
def update_graph(jsonified_cleaned_data):
    dff = pd.read_json(jsonified_cleaned_data) # or, more generally json.loads(jsonified_cleaned_data)
    figure = create_figure(dff) 
    return figure

@app.callback(Output('table', 'children'), [Input('intermediate-value', 'children')])
def update_table(jsonified_cleaned_data):
    dff = pd.read_json(jsonified_cleaned_data) # or, more generally json.loads(jsonified_cleaned_data)
    table = create_table(dff) 
    return table
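
Both snippets use create_figure and create_table without defining them; a minimal sketch of what they might look like (the age and count columns are just borrowed from the example at the top of this issue) is:

import dash_html_components as html
import plotly.graph_objs as go

def create_figure(dff):
    # scatter of the cleaned frame; the column names are only an example
    return {
        'data': [go.Scatter(x=dff['age'], y=dff['count'], mode='markers')],
        'layout': go.Layout(xaxis={'title': 'age'}, yaxis={'title': 'count'})
    }

def create_table(dff, max_rows=10):
    # plain html.Table contents, suitable for Output('table', 'children')
    return [html.Tr([html.Th(col) for col in dff.columns])] + [
        html.Tr([html.Td(dff.iloc[i][col]) for col in dff.columns])
        for i in range(min(len(dff), max_rows))
    ]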

Finally, note that when you run just app.run_server(), only a single process is running, which means that only one request can be handled at a time. If you run the app with gunicorn or, for development purposes, just add app.run_server(processes=4), then multiple requests can happen at the same time. This means that callbacks will be executed in parallel (reducing the time cost of shared values).
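
For reference, a minimal sketch of wiring an app up for multiple workers (the app.py filename and the gunicorn worker count here are just examples):

import dash

app = dash.Dash(__name__)
server = app.server  # expose the underlying Flask instance so gunicorn can find it

# ... layout and callbacks as above ...

if __name__ == '__main__':
    # development: several processes so callbacks can run in parallel
    app.run_server(processes=4)

# in production, something like:
#     gunicorn app:server --workers 4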

@kmader (Author) commented Jun 28, 2017

Thanks, the Output('intermediate-value', 'children') approach seems to be the closest match. The additional step of de/serialization is a bit clumsy, but it is maybe a good start. The primary issue I was having in a current use case is that I had quite repetitive code, with the same input arguments being copied and pasted across multiple callbacks.

On the performance side, the additional benefit of having ReactiveExpressions is that the caching could be handled invisibly and globally by Dash, rather than on a function-by-function basis (which is what Shiny does). I'll need to study the code a bit more to see if there is anything else that might work.

@kmader changed the title from "Add reactive blocks" to "Add reactive expressions / blocks" on Jun 28, 2017
@chriddyp (Member)

For future readers, I have written up some solutions to this problem in a new section of the user guide: http://plot.ly/dash/sharing-data-between-callbacks

@gaw89 commented Nov 14, 2017

@chriddyp and @kmader, the hidden Div appears to be a bad idea in practice. I just spent the better (or worse) part of 2 days trying to figure out why my app was working flawlessly when run on my desktop but stumbled when pushed to the server (RHEL 7.1). As it turns out, the issue had to do with my use of a hidden Div to pass data between callbacks. When the DataFrame reached a certain number of rows (861), it would fail to execute the callbacks that depended on those Divs. I am guessing this has something to do with a size limit on Divs in HTML, but I am not certain.

I will try to post a reproducible example here in the next couple of days.

So far, Dash has been fantastic! But this has been a massive frustration. Live and learn...

@chriddyp (Member)

> As it turns out, the issue had to do with my use of a hidden Div to pass data between callbacks. When the DataFrame reached a certain number of rows (861), it would fail to execute the callbacks that depended on those Divs. I am guessing this has something to do with a size limit on Divs in HTML, but I am not certain.
>
> I will try to post a reproducible example here in the next couple of days.

Please do try to put together a reproducible example. I have used this method with 5 MB of data successfully before, and I'm not aware of any inherent limitations. Another possibility is a request or response size limitation on the server that you are deploying to (frequently the default is around 1 MB).

@gaw89 commented Nov 14, 2017

I'll try to create an example. It is probably the request/response size limitation, as you indicate. I'm a web-dev newbie, which is why Dash has been so fantastic for me!

I'm also checking with my server admin to see if they're imposing some kind of size limit.

Thanks for the speedy response.

@chriddyp (Member) commented Jun 8, 2018

Closing this. We have to do things differently from other frameworks because we support multiple processes. We have documented several ways to pass intermediate data around in https://dash.plot.ly/sharing-state-between-callbacks, and this will only get better with declarative client-side transformations (#266) and support for multiple outputs (#80).
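
As a rough illustration of what the multiple-outputs support would allow (syntax from later Dash releases; component ids and helpers are reused from the snippets earlier in this thread), one callback could feed both components from a single shared computation:

@app.callback(
    [Output('graph', 'figure'), Output('table', 'children')],
    [Input('dropdown', 'value')])
def update_everything(value):
    dff = clean_data(global_df, value)  # the expensive step runs only once
    return create_figure(dff), create_table(dff)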

@berndtlindner commented Aug 28, 2019

As someone who works a lot in Shiny and is now trying out Dash, I really expected an equivalent in Dash in some form or another; its absence is kind of a game changer for me.
I disagree that Dash overcomes this with multi-processing, specifically this statement:

> In this case, we're performing the clean_data step twice when the dropdown changes. With something like a shared reactive expression, this could potentially be done only once. However, in Dash all of these callbacks are executed in parallel on the server, so a shared expression wouldn't end up being any faster (as long as you aren't request bound).

What if the function (e.g. clean_data) is 1) called more times than the number of cores/processors available (e.g. more than 4), and 2) a long-running and/or memory- or compute-intensive algorithm?

ORN-git pushed a commit to WOIDMO/Disease-models-V1 that referenced this issue Aug 13, 2020
…r the figures.

The tactic to transfer the long Monte Carlo run between callbacks follows
plotly/dash#49 (comment),
which is to use a hidden div inside the app that stores the intermediate value (as JSON).