Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improve Callback Graph #1179

Merged
merged 62 commits into from
Sep 3, 2020
Merged

Improve Callback Graph #1179

merged 62 commits into from
Sep 3, 2020

Conversation

jjaraalm
Copy link
Contributor

@jjaraalm jjaraalm commented Apr 4, 2020

Attempt to improve and significantly expand the features available in the callback graph to improve its usefulness in debugging and testing. Will try to cover the ideas given in #1176. Comments/suggestions would be welcome, especially for styling.

Current Status

captured (1)

captured (2)

Changes

  • Replaced viz.js with cytoscape.js via react-cytoscape. This results in some minor visual and layout changes.
  • CallbackGraphContainer now subscribes to layout and paths so it can introspect the current state.
  • Added new reducers to compute profiling information and report all changed props (not just those that are inputs).

Issues

  • State layout is finicky. dagre was giving horrible results even when cycles were pruned away. Using breadthfirst for now, but layout properties may need to be tailored to viewport size instead of being fixed.
  • Outputs that return no_update are still being flashed during execution highlighting. No way to detect them right now.

Contributor Checklist

  • I have broken down my PR scope into the following TODO tasks
    • Zoom in/out functionality for large/complex callback chains
    • Callback graph accessible from a dedicated url.
    • Distinguish between clientside and serverside callbacks using different node colors.
    • Add state information to the callback graph
    • Report profiling information
      • Number of times a callback was invoked per node
      • Time spent (total or average) in the callback. Maybe link to the node color?
      • Label each edge with the number of times that input triggered a callback
    • Live introspection
      • Components, properties, and callback inputs/outputs are inspect-able in real time.
      • Component properties are modifiable.
    • Live debugging
      • Highlight currently executing callbacks and modified outputs
      • Probe the last input/output by hovering/focusing an edge
      • Manually emit a property changed event to trigger callback paths
  • I have run the tests locally and they passed. (refer to testing section in contributing)
  • I have added tests, or extended existing tests, to cover any new features or bugs fixed in this PR

optionals

  • I have added entry in the CHANGELOG.md
  • If this PR needs a follow-up in dash docs, community thread, I have mentioned the relevant URLS as follow
    • this github #PR number updates the dash docs
    • here is the show and tell thread in plotly dash community

@chriddyp
Copy link
Member

chriddyp commented Apr 4, 2020

Love where this is going @jjaraalm !

@nicolaskruchten
Copy link
Member

Amazing work! I’m happy to see GraphVis go in favour of a more modern alternative that enables rich interactions :) thanks for doing this!

@jjaraalm
Copy link
Contributor Author

jjaraalm commented Apr 5, 2020

@chriddyp @nicolaskruchten Thanks! It's a fun project, and I think it could be useful too!

@chriddyp
Copy link
Member

chriddyp commented Apr 6, 2020

That profiling screenshot is so good!

It would be cool if we could time the callback execution on the backend and pass that time delta up with the _update-component API call. Then, we could distinguish between "total time", "time processing backend", "time over network", "time to render".Then, folks could see if simple callbacks on the backend are causing a delay because they're sending a tremendous amount of data over the wire (which would be a great candidate for a clientside callback!) vs a complex backend task that sends a simple payload. I think we'd have to expand the _update-component API response to include extra meta information beyond just the component props to do this.

@jjaraalm
Copy link
Contributor Author

jjaraalm commented Apr 9, 2020

Agreed, a better breakdown of the various timings would be helpful. Especially since right now, I'm only looking at the delay between request and response. Even simple callbacks (repeatedly clicking a button) can look expensive if they happen often and the queue gets backed up. This shouldn't be too hard to do and I'll look into it. Glad you all are interested!

@chriddyp chriddyp self-requested a review April 9, 2020 23:47
@chriddyp
Copy link
Member

chriddyp commented Apr 9, 2020

Awesome stuff @jjaraalm ! Feel free to ping me if you have any questions 👍

@chriddyp chriddyp self-assigned this Apr 9, 2020
@chriddyp
Copy link
Member

chriddyp commented Apr 9, 2020

(FYI I am assigning myself just to denote to the rest of our team that I'll be your ambassador for this pull request)

@jjaraalm
Copy link
Contributor Author

@chriddyp sounds good.

I've added support for reporting server-side timings using Server-Timing headers. This way (in modern browsers) we get timing reported both in dev-tools on a per-request basis and through introspection as an aggregate total.

Users can tag and report timings for resources using a new API like:

@app.callback(Output('slow-store', 'data'), [Input('style-output', 'style'), Input('out', 'children')])
def combine_data(style, value):

    time.sleep(0.1)
    dash.callback_context.record_timing('task_1', 0.1, 'The first task.')

    time.sleep(0.7)
    dash.callback_context.record_timing('task_2', 0.7)

    time.sleep(0.2)
    dash.callback_context.record_timing('task_3', 0.2, 'Cleanup task.')
    
    return dict(style=style, value=value)

which translates into the HTTP headers:

Server-Timing: dash_total;dur=1002
Server-Timing: task_1;desc="The first task.";dur=100
Server-Timing: task_2;dur=700
Server-Timing: task_3;desc="Cleanup task.";dur=200

The dash_total resource is automatically computed on all requests.

Distinguishing between "queue time" and "network time" seems difficult and I'm not quite sure how to do it. The difference between totalTime - totalComputeTime should be the combination of both though. I'll take a look into rendering time next.

devtools-timing
resource_introspection

@chriddyp
Copy link
Member

I've added support for reporting server-side timings using Server-Timing headers.

Brilliant idea! I was not aware of these headers before now, this is very nice.

Re dash_total: this is very nice! Let's make this configurable via an argument like dev_tools_clientside_profiling so that it is only included if debug=True and then turned off when deployed with e.g. gunicorn. This is for security reasons as it's unadvised to share any information about the backend to the front-end in production.


Distinguishing between "queue time" and "network time" seems difficult and I'm not quite sure how to do it. The difference between totalTime - totalComputeTime should be the combination of both though.

Yeah good point, this is a tricky one to record but also to communicate to our users.

I suppose that there are two queues:

  1. Browser queue: The browser will only send around 6 (https://stackoverflow.com/questions/561046/) requests to the server at a time, so if a user has a Dash app with 60 charts, 10 batches of requests will be made.
  2. The server itself will queue things up, whether that's being done with flask or gunicorn or a load balancer.

So, when we display totalTime - totalComputeTime to our users, I suppose we label this like "Network & Request Queue Time".

In some cases, the queuing time will dominate the time. In most cases, I think the payload size itself will dominate. So, another metric we could display could be the size of the request. We can get this from the content-length header. It would be nice to display this in shorthand like kb or mb
image


I'll take a look into rendering time next.

Thinking a bit about this now, this one actually might be pretty tricky. I think the main components that a user might care about is the graph component as will handle the most amount of data on the page. We do asynchronous rendering here, so even if we knew when React "finished" rendering the component, we wouldn't be able to capture the actual rendering time.

This might be a rabbit hole, so feel free to stay out of it for now, I think the other performance breakdowns will be the most valuable for our users!


Playing around with this locally, another thing that comes to mind is conditionally re-rendering the callback tree based off of which components are visible on the page. So, if you have a large multi-page application, you would only see the callbacks that are associated with outputs or inputs that are currently rendered on the page. This data is available in the paths object in the store.


We merged & released some deeper changes to the dash-renderer code in #1103. It looks like this introduced some conflicts in this PR, sorry about that!


This is looking really great otherwise and I know the community will love it. Are you interested in continuing to add new features to this or would you be looking to try to merge this in soon and follow up with more features? Either way is fine!

@jjaraalm
Copy link
Contributor Author

@chriddyp I'm in no rush to get this merged in. I'd love to add more features, but just need to find time to work on it at some point. I didn't realize that Graphs were async! Good point, I was just going to try and monitor to see when react reported everything complete. It would be nice to know, but sounds like we should postpone it for now.

Re: #1103, I've been waiting on that so I'm more than happy to accommodate. Interestingly, I noticed that #1103 (or some other change in dev) messes with the viz.js callback graph and makes it very unstable for me. Looks like it's being continually rebuilt and there's some non-deterministic code (probably in viz.js) that jiggles the layout. Not sure if anyone's opened a bug report on it.

@alexcjohnson
Copy link
Collaborator

Yep, that was me in #1103 mucking with the existing callback graph implementation - had to tweak it a bit in order to not totally choke on the dict IDs. Sorry about the merge conflicts! Feel free to file a bug report about the instability you're seeing, but my personal feeling is what you have here is already such an improvement that we should work to get it merged without adding any more features than it already has; then we can add all these other ideas in future PRs.

@chriddyp
Copy link
Member

but my personal feeling is what you have here is already such an improvement that we should work to get it merged without adding any more features than it already has; then we can add all these other ideas in future PRs.

I agree! The work done in this PR already improves things 10x, I would love to see the community start to use this new version 👍

@nicolaskruchten
Copy link
Member

I'd love to see a screenshot of what this looks like when multiple props are involved in a single component, similar to the cities-dropdown here: https://github.com/nicolaskruchten/dash_callback_chain/blob/master/example.png

@rpkyle rpkyle self-requested a review August 15, 2020 16:27
@rpkyle
Copy link
Contributor

rpkyle commented Aug 15, 2020

@chriddyp sounds good.

I've added support for reporting server-side timings using Server-Timing headers. This way (in modern browsers) we get timing reported both in dev-tools on a per-request basis and through introspection as an aggregate total.

Server-Timing: dash_total;dur=1002
Server-Timing: task_1;desc="The first task.";dur=100
Server-Timing: task_2;dur=700
Server-Timing: task_3;desc="Cleanup task.";dur=200

The dash_total resource is automatically computed on all requests.

@jjaraalm Thanks for all your efforts to improve the callback graph implementation -- the enhanced context provided by this PR will be very helpful for debugging purposes.

One brief question about dash_total; I see it in Server-Timing above, but I'm unsure where it's added to the headers. Will this happen elsewhere, or might we want to include something like the following just before for name, info ...?

response.headers.add("Server-Timing", 'dash_total;dur={}'.format(dash_total["dur"]))

@alexcjohnson
Copy link
Collaborator

I see it in Server-Timing above, but I'm unsure where it's added to the headers.

@rpkyle the current implementation actually has this called __dash_server - it's created here, finalized here, and processed along with the custom times here.

Copy link
Contributor

@Marc-Andre-Rivet Marc-Andre-Rivet left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seems good except for the linting failure 💃

c.parentElement.removeChild(c);
});
"""
)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

New feature for dash.testing - percy_snapshot adds a kwarg convert_canvases that turns every <canvas> into an <img> with matching contents, then puts the canvases back afterward. This lets us capture Cytoscape, among other things (cc @xhlulu)
Screen Shot 2020-08-29 at 11 39 51 PM
Sizing isn't quite as expected, but I assume that's just Percy rendering as a different screen size from what's used in circleci.

name: 'cose',
padding: 10,
animate: false
}
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@nicolaskruchten I couldn't come up with a single layout that always seemed better than the others, so made a dropdown to choose among three. Another side benefit of this is if you drag nodes around to tweak the layout, and you want to get back to the original, you can change to a different algo and then back to the original.

from dash.dependencies import Output, Input


def test_dvct001_callback_timing(dash_thread_server):
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@rpkyle this test should cover parity between Python and R for the Server-Timing header, including record_timing.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

7 participants