Provide an update_display message #209

rgbkrk · 2016-10-03T18:20:43Z

The purpose of this message would be to provide an update to a display_data field, without need of full duplex communications.

@ellisonbg suggested that this would make it so that progress bars can be implemented without needing full widgets/comms support. That's exactly the kind of feedback loop I want to be able to provide to users as well.

For me, I want this for informing about any background threads or processes including spark jobs, database queries, and external processes (see ipython/ipython#9969).

Potential approach

On display_data, provide transient field to identify a display_id:

{
    "content": {
      "data": {
        "text/plain": "[==                ]"
      },
      "transient": {
        "display_id": "0000"
      }
    }
}

Followed by an update_display_data:

{
    "content": {
      "data": {
        "text/plain": "[==============    ]"
      },
      "transient": {
        "display_id": "0000"
      }
    }
}

In the notebook, this should also result in storing the last result of the update.

The text was updated successfully, but these errors were encountered:

rgbkrk · 2016-10-03T18:37:28Z

Alternative - top level key:

{
    "content": {
      "data": {
        "text/plain": "[==                ]"
      },
      "display_id": "0000-1111-2222-3333"
    }
}

This makes it so the display_id is not persisted at runtime.

rgbkrk · 2016-10-03T18:53:05Z

Another suggestion from @minrk: we could have another entry in content for all transient/live data:

{
    "content": {
      "data": {
        "text/plain": "[==                ]"
      },
      "transient": {
        "display_id": "0000-1111-2222-3333"
      }
    }
}

rgbkrk · 2016-10-03T19:18:39Z

The jupyter js widgets currently use transient data to get bootstrapped on the page by setting the model id in a mimebundle

{
  "application/vnd.jupyter.widget": model_id
}

The transient section seems like a good use case for those as well.

rgbkrk · 2016-10-03T19:19:04Z

/cc @jasongrout @SylvainCorlay

minrk · 2016-10-03T19:19:21Z

I like not persisting the display-id, so the top-level key or transient area makes sense to me. These don't really make sense as persisted info (much like it doesn't make sense to persist widget model IDs in the document).

Related details from side conversation:

update_display would modify the display_data item not just on the page, but in the notebook state

Further questions:

should this be a new message, or should this be info added to display_data itself?
What should happen to an update message if the target can't be found (e.g. target output cleared)? Start a new output, or discard as having no destination? This may point to the answer for whether it should be a new message or not.

rgbkrk · 2016-10-03T19:25:26Z

Now that I'm thinking about model IDs for jupyter widgets, it would be handy to have the transient/live dict/map available in general on all the messages that have mime bundles.

rgbkrk · 2016-10-03T19:26:11Z

What should happen to an update message if the target can't be found (e.g. target output cleared)? Start a new output, or discard as having no destination? This may point to the answer for whether it should be a new message or not.

Well that one leaves me a bit stumped. I'm not sure what the desired behavior is.

minrk · 2016-10-04T09:12:17Z

I think it depends a bit on how we define the semantics of update. Do we allow publishing deltas, or is it always replacing the whole thing? If it's always replacing the whole thing, then it seems like the simplest modification would be a regular display_data message with the following modifications to handling:

if display_id is defined:
    if display_id found on the page:
        update match in-place
    else:
        regular display, record display_id for future lookup
else:
    regular display

rgbkrk · 2016-10-04T09:17:03Z

Ah, I like that. I'm definitely thinking of replacing the whole thing. Doing it on the same display_data by id makes sense to me.

I feel like deltas belong in comms or another message type.

minrk · 2016-10-04T10:55:06Z

Yup, there's always comms for the complex case.

At the protocol level, we could already have this, because each display_data already has an id: the msg_id. But in practice this is difficult to locate, as it's not typically available in the kernel's APIs.

minrk · 2016-10-04T10:56:44Z

While this repo only defines protocols, it might be worth considering what IPython APIs could look like for updating an output, since you would have to create and track an ID somehow. Sending raw messages is always an option, but presumably we want to cover at least some use cases without that.

rgbkrk · 2016-10-05T20:33:20Z

The way I've cheated in the past week for a single thread where I want to update a prior cell:

Which only helps me if I'm ok with only appending.

rgbkrk · 2016-10-05T20:35:13Z

The reason of course that I can't necessarily get and set header dynamically is that parent_header is an attribute on the ZMQInteractiveShell:

def set_parent(self, parent):
        """Set the parent header for associating output with its triggering input"""
        self.parent_header = parent
        self.displayhook.set_parent(parent)
        self.display_pub.set_parent(parent)
        if hasattr(self, '_data_pub'):
            self.data_pub.set_parent(parent)
        try:
            sys.stdout.set_parent(parent)
        except AttributeError:
            pass
        try:
            sys.stderr.set_parent(parent)
        except AttributeError:
            pass

rgbkrk · 2016-10-05T20:37:44Z

Thinking on the IPython API for this:

bundle = {
    'text/html': '<b>whoa</b>',
}
IPython.display.display(bundle, raw=True, transient={'id': '1234'})

takluyver · 2016-10-05T21:05:46Z

I would imagine we would also want a higher level API, which I envisage something like this:

from IPython.display import display, HTML, Updatable
d = Updatable(HTML('<b>whoa</b>'))
display(d)
# ...
d.replace(HTML('<i>awesome</i>'))

rgbkrk · 2016-10-05T21:09:28Z

I would imagine we would also want a higher level API, which I envisage something like this

❤️ Ah, I'm liking that.

Could have a traitlets based version too:

updatable.data = HTML('<i>awesome</i>')

takluyver · 2016-10-05T21:54:07Z

I thought about that interface too, but if I'm doing something primarily for the side effect I prefer it to be a function/method call rather than attribute/item assignment.

rgbkrk · 2016-10-05T22:05:11Z

I prefer the function/method call as well.

minrk · 2016-10-21T18:23:12Z

I have a prototype implementation working:

and came up with a few questions in the meantime.

The main one is:

Should display(obj) ever be able to update instead of displaying anew?

That determines where the display_id information might be stored/looked up during the display-formatter process. I'm currently inclined to say no, that objects can't define and store their display_id such that a naked display will result in an update instead of a new display.

Another question is to decide between two semantic behaviors:

display with id, where a display is updated if it's already on the page, or handled normally if it's the first time the display_id is seen. (what I sketched above and have implemented).
separate explicitly display, which always puts something new on the page, and update, which never puts something on the page.

The key difference for me is what should happen if someone thinks they are updating something, but the original display has been cleared. If the answer is that nothing should be displayed, then explicit update seems like the way to go. If it should get displayed, then it seems like display-with-id is the right choice.

For the first case, I would think the implementation would be:

display(obj, display_id='xxx')
display(obj_again, display_id='xxx')

where the protocol change is adding the display_id to display_data message.

which for me is "Put me on the page, but don't make duplicates".

The second might look like:

display(obj, display_id='xxx')
update_display(obj_again, display_id='xxx')

and the protocol change is the same as above, also adding an update_display_data message for the update calls.

which is more "1. put me on the page, no matter what, 2. update me if I'm on the page, otherwise do nothing"

I can honestly imagine both being surprising, depending on user expectations, but I have a slight leaning toward the first one.

jasongrout · 2016-10-24T17:42:09Z

What usecases are we envisioning for updating output? So far, I see in the discussion above:

Progress bars
Updating status of computations (for example, distributed computations)

I could also imagine simple live charting for something like matplotlib.

Also, it seems that the discussion is about updating any display area on a page. Another thought that satisfies the progress bar case is to scope updates to messages with the same parent header, so that the display id namespace is scoped to a single execution request. We do run into issues with how parent headers are treated in threaded computations, but if we solve that, it feels a bit cleaner. You could use an output widget for the more advanced usecase of updating outputs on other areas of the page.

rgbkrk · 2016-10-24T17:48:02Z

Beyond live charting, there's also the simpler precursor: updating a single plot.

jasongrout · 2016-10-24T18:12:55Z

For both the progress bars and updating a single plot, I lean towards the display/update-only interaction. Especially if the output has been cleared - I don't want to keep seeing something pop back up. And especially not popping up in my current cell output when the original one was a few cells away.

In what situations does your situation one, the 'display without duplicates' make sense?

theengineear · 2016-10-24T19:07:36Z

Should display(obj) ever be able to update instead of displaying anew?

I don't think so. If you need to do this, you just (1) keep the reference to the displayed object (2) update that reference and (3) re-display, right?

I lean towards the display/update-only interaction.

👍 from me.

To be clear, what happens if you make multiple display(obj, display_id='xxx') calls and then a single
update_display(obj_again, display_id='xxx') call?

Is it OK to reuse the display_id?
If it is OK to reuse the display_id, are both updated?

parente · 2016-11-01T02:13:05Z

In jupyter/jupyter#212 (comment), I mentioned @mariusvniekerk and I would comment on options for a better Spark progress bar in the spylon library. We discussed the message and APIs proposed here today, and both agree they look like a nice way to render richer progress information, regardless of the final implementation chosen. Great stuff. 👍

Another question is to decide between two semantic behaviors:
...
The key difference for me is what should happen if someone thinks they are updating something, but the original display has been cleared.

From a user's standpoint, the answer might depend on the context. For example, say I kick off a long running Spark job using a library capable of showing progress during execution. Assume my kernel remains busy throughout. If I clear the notebook output during execution, I'd like the progress bar to reappear on the next update. My Spark job is still running and my kernel is still busy, and so I'd like my notebook display to reflect that fact.

Now imagine I'm using a library that updates plots of streaming data using a background thread in the kernel. If I clear my cell outputs in this scenario, I do not want my plots to reappear again. I'm using clear as a convenient way of stopping all plot updates instead of having to both clear and run an additional cell to tell the library to stop making updates.

I think combining @minrk's two API options above can satisfy the two use cases. In the progress bar scenario, a library would use:

# in a thread loop
display(progress_obj, display_id='some_spark_job_id')

repeatedly to create, update, or recreate the progress bar view whether the user clears the notebook or not. In the plotting scenario, a library would use:

# once 
display(plot_obj, display_id='some_plot_id')
# in a thread loop
update_display(plot_object_again, display_id='

to create the initial display area for the plot, and thereafter only update that display for as long as it exists.

parente · 2016-11-01T02:31:44Z

On a related note, having update_display support in any form will play nicely with the dashboard extensions.

display initial output in various cells (HTML tables, PNG plots, JS widgets, ...)
Drag-drop those cells into a layout
Run a loop or thread that directs display_updates to appropriate cells / display IDs

The third step requires use of comm messages and reasonably complex libraries wrapping them today (e.g., ipywidgets, declarativewidgets). I imagine use of update_display will be much easier within notebooks themselves, especially if kernels provide API wrappers like the IPython display addition spec'ed above.

rgbkrk · 2016-11-01T03:34:34Z

Excellent user/developer story!

It looks like the third step doesn't require comm messages - only bookkeeping around display IDs.

minrk · 2016-11-01T10:05:42Z

Another thought that satisfies the progress bar case is to scope updates to messages with the same parent header, so that the display id namespace is scoped to a single execution request

I don't think it will work to scope updates to a single parent even for progress bars, because we are already seeing progress bars that are updated in the background in places like dask. If we do scope it to parents in the protocol, that really means that we need to track parents in the kernel so they can be swapped out to update, which seems more complicated than tracking display-id updates in the frontend.

Is it OK to reuse the display_id? If it is OK to reuse the display_id, are both updated?

The display-only-once idea is that a second display reusing an id is equivalent to an update. If we are discarding the 'display-only-once' idea, I'd say it's not okay, and would leave the behavior undefined at the protocol/spec level. At the implementation level, I'd probably make the last display to claim an ID win, possibly with a js-console warning about display-id collisions. I would specifically want to avoid implementing an approximation of the widgets' multi-view-one-model concept. If you want that, you should be using widgets.

So the current proposal seems to be:

display(obj, display_id='...') with a new id puts a new output on the page with display_id
display(obj, display_id='...') re-using a display_id is either:
- equivalent to an update
- exceptional / undefined
update_display(obj, display_id='...') always updates, never creates new elements

A progress bar API would require using methods in order to track display_id:

class ProgressBar(object):
    def display(self):
        self._display_id = new_id()
        display(self, display_id=self._display_id)

    def update(self):
        update_display(self, display_id=self._display_id)

Another question:

can a plain display(obj) set the display_id? I ask this mainly because it affects the IPython display machinery due to some not-very-extensible choices in the APIs, making it hard to propagate new information that's not in (data, metadata), which display-id would not be.

parente · 2016-11-02T14:28:39Z

So the current proposal seems to be:

display(obj, display_id='...') with a new id puts a new output on the page with display_id

display(obj, display_id='...') re-using a display_id is either:

equivalent to an update

exceptional / undefined
update_display(obj, display_id='...') always updates, never creates new elements

I'd like #2 to be "equivalent to an update" to handle the progress bar use case I mentioned above. Without this behavior, the library would need to detect that the display was cleared, generate a new display ID, and then go back to updating it. Libraries that don't want the automatic recreation behavior can use display once to create, and thereafter only use update (the plot use case I mentioned.)

minrk · 2016-11-02T14:30:24Z

@parente sounds good to me.

minrk · 2016-11-08T01:11:37Z

Example implementation in Notebook Classic

rgbkrk · 2016-11-08T01:19:16Z

How are you sending display data with a transient dict from the IPython side?

minrk · 2016-11-08T18:43:52Z

PRs for IPython implementation: ipython/ipykernel#204 ipython/ipython#10048

minrk · 2016-11-12T21:56:52Z

Detail that hasn't been brought up yet: execute_result is described as ~ display_data plus execution_count. Can execute_results use display_id as well, or is this confined to display_data?

rgbkrk · 2016-11-12T22:42:43Z

With the implementation in the PR for nteract, it would be super easy to do this for execute_result and I'd love to see it if it's not too hard on the ipykernel, ipython, jupyter notebook side.

minrk · 2016-11-12T23:34:09Z

I think it's the same for nbclassic. It's probably on the IPython side where it would be hardest to assign a display_id to a result.

SylvainCorlay · 2016-11-13T00:27:01Z

@RgBrk do you think that this protocol could be implemented on the top of comms?

rgbkrk · 2016-11-13T04:13:19Z

Anything can be implemented on top of the comms with the right receiver on the frontend side.

This provides a path for one way updates that are persisted to the notebook that any kernel author can implement in a fairly simple fashion.

rgbkrk · 2016-11-13T12:11:13Z

Mentioned in chat with Min:

Kyle: How would I need to setup my object for providing metadata or transient when found via repr for execute_result?

Min: That's part of why I was asking about "can objects specify their display_id" Because this would inform it. E.g. a new Obj._ipython_display_id_(). Without that, you can't really do it. I realized, since display_id isn't tied to mimetypes, we don't have a general issue like I was talking about earlier.

Kyle: Ok yeah

Min: That's still true a bit for transient in general

Kyle: I like the new tab completion helper

rgbkrk · 2016-11-14T19:11:09Z

#225 is ready for review

DTAIEB · 2016-11-15T22:23:20Z

I recently implemented a Progress Monitor for Spark Jobs but had to work around 2 issues:

The entities I'm monitoring are complex. Each Spark job has multiple stages and each stage has its own progress bar denoting the inner tasks. The UI calls for stages in a particular job to be grouped together. Having each stage in its own display output area makes it very difficult to update each stage individually because the current implementation is a "replace all" instead of a "delta". Work around is to have a second display output that run a js snippet to update individual dom elements on the page.
Spark Job can be initiated from anywhere in the cell code, which means that the progress monitor may come in the middle of other print statements, which makes it very confusing. It would be good to have a way to specify the position of the display in the cell output e.g. first. Work around was to attach a pre_run_cell event handler and create the display placeholder at the very beginning.

Here's a screenshot of my POC:

rgbkrk · 2016-11-17T03:03:45Z

Yeah, the Spark status is definitely a complex nested type. Would this be a decent manifest of what the layout/mapping is?

// array of jobs
[
  // job
  {
    id: jobID,
    stages: [
      { id: stageID, progress: { capacity: 5, value: 5 } } } 
    ]
  }
]

As for why it's a replacement, all the current mimetypes in the notebook are pure strings when sent across and there's not a current diffing strategy. For custom mimetypes I could imagine us handling deltas.

My personal opinion for this one is that we can have either a mimetype for progress or spark info that would get rendered appropriately as the JSON data changes (allowing something like React or Phosphor handle progressive renders that keep state for users) or have a mimetype that is more amenable to diffing and rendering on a virtual DOM. I'll post some more thoughts on this in a little bit.

alex-the-man · 2016-11-17T03:18:20Z

Borrowing Databrick's screenshot. They have a really nice way to report progress of jobs and stages.
Stages are folded into jobs. User can click on the arrow to unfold jobs and look at progress broken down at stage level.

About the renderer approach, is it possible for a renderer maintain to maintain the folding states on update_display? Can renderers persist their folding states into the notebook?

rgbkrk · 2016-11-17T04:25:36Z

is it possible for a renderer maintain to maintain the folding states on update_display

Yeah. I can make a prototype right now.

Can renderers persist their folding states into the notebook?

No.

DTAIEB · 2016-11-17T19:25:14Z

@rgbkrk Yes, the layout you describe is correct, although there are more info that goes in each buckets. Also there can be a lot of jobs, stages, tasks so I've been thinking about an minified overall progress bar that aggregate all the steps with an option for the user to expand individual jobs. Probably fancier than what was originally thought, I know :-) Along the same lines, execution progress bar is a generic problem. I think it would be beneficial to have a generic execution progress bar that is more natively integrated to the cell (e.g. small), with a simple update API and perhaps a cancel button.

rgbkrk · 2016-11-29T19:00:20Z

Let's open up a new issue for exploring the progress bars since this issue (updating displays) is done. I made a bit of progress on a virtual DOM renderer that could maintain state, need to keep iterating.

rgbkrk · 2016-11-29T19:00:26Z

Closing this issue since we've checked all the boxes across ipykernel, ipython, jupyter/notebook, jupyter_client, and nteract.

rgbkrk mentioned this issue Oct 11, 2016

Routing ThreadPoolExecutor results to originating parent_header/cell ipython/ipython#9969

Open

rgbkrk mentioned this issue Oct 19, 2016

Improving Jupyter for Spark jupyter/jupyter#212

Closed

jasongrout mentioned this issue Nov 8, 2016

Updatable outputs jupyterlab/jupyterlab#1227

Closed

This was referenced Nov 8, 2016

support transient field, update_display_data ipython/ipykernel#204

Merged

Implement update_display ipython/ipython#10048

Merged

rgbkrk mentioned this issue Nov 8, 2016

refactor(outputs): moved output appending to redux nteract/nteract#1158

Merged

minrk mentioned this issue Nov 8, 2016

support routing updating outputs with display_id jupyter/notebook#1879

Merged

rgbkrk mentioned this issue Nov 9, 2016

feat(display): allow updating displays nteract/nteract#1162

Merged

18 tasks

rgbkrk mentioned this issue Nov 14, 2016

Updatable Displays #225

Merged

3 tasks

rgbkrk closed this as completed Nov 29, 2016

jdetle mentioned this issue Dec 8, 2016

Encore uses different progressbar MDAnalysis/mdanalysis#1114

Closed

parente mentioned this issue Jan 20, 2017

Button which rerunns all cells in jupyter notebook does not work on dashboard server jupyter/dashboards_server#300

Closed

minrk added this to the 5.0 milestone Feb 7, 2017

Provide an update_display message #209

Provide an update_display message #209

Comments

rgbkrk commented Oct 3, 2016 • edited Loading

Potential approach

rgbkrk commented Oct 3, 2016 • edited Loading

rgbkrk commented Oct 3, 2016

rgbkrk commented Oct 3, 2016 • edited Loading

rgbkrk commented Oct 3, 2016

minrk commented Oct 3, 2016 • edited Loading

rgbkrk commented Oct 3, 2016

rgbkrk commented Oct 3, 2016

minrk commented Oct 4, 2016

rgbkrk commented Oct 4, 2016

minrk commented Oct 4, 2016 • edited Loading

minrk commented Oct 4, 2016 • edited Loading

rgbkrk commented Oct 5, 2016

rgbkrk commented Oct 5, 2016 • edited Loading

rgbkrk commented Oct 5, 2016 • edited Loading

takluyver commented Oct 5, 2016

rgbkrk commented Oct 5, 2016 • edited Loading

takluyver commented Oct 5, 2016

rgbkrk commented Oct 5, 2016

minrk commented Oct 21, 2016

jasongrout commented Oct 24, 2016

rgbkrk commented Oct 24, 2016

jasongrout commented Oct 24, 2016

theengineear commented Oct 24, 2016

parente commented Nov 1, 2016

parente commented Nov 1, 2016

rgbkrk commented Nov 1, 2016

minrk commented Nov 1, 2016

parente commented Nov 2, 2016

minrk commented Nov 2, 2016

minrk commented Nov 8, 2016

rgbkrk commented Nov 8, 2016 • edited Loading

minrk commented Nov 8, 2016

minrk commented Nov 12, 2016

rgbkrk commented Nov 12, 2016

minrk commented Nov 12, 2016

SylvainCorlay commented Nov 13, 2016

rgbkrk commented Nov 13, 2016 • edited Loading

rgbkrk commented Nov 13, 2016

rgbkrk commented Nov 14, 2016

DTAIEB commented Nov 15, 2016

rgbkrk commented Nov 17, 2016 • edited Loading

alex-the-man commented Nov 17, 2016

rgbkrk commented Nov 17, 2016

DTAIEB commented Nov 17, 2016

rgbkrk commented Nov 29, 2016

rgbkrk commented Nov 29, 2016

rgbkrk commented Oct 3, 2016 •

edited

Loading

rgbkrk commented Oct 3, 2016 •

edited

Loading

rgbkrk commented Oct 3, 2016 •

edited

Loading

minrk commented Oct 3, 2016 •

edited

Loading

minrk commented Oct 4, 2016 •

edited

Loading

minrk commented Oct 4, 2016 •

edited

Loading

rgbkrk commented Oct 5, 2016 •

edited

Loading

rgbkrk commented Oct 5, 2016 •

edited

Loading

rgbkrk commented Oct 5, 2016 •

edited

Loading

rgbkrk commented Nov 8, 2016 •

edited

Loading

rgbkrk commented Nov 13, 2016 •

edited

Loading

rgbkrk commented Nov 17, 2016 •

edited

Loading