Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Provide an update_display message #209

Closed
rgbkrk opened this issue Oct 3, 2016 · 46 comments
Closed

Provide an update_display message #209

rgbkrk opened this issue Oct 3, 2016 · 46 comments
Milestone

Comments

@rgbkrk
Copy link
Member

rgbkrk commented Oct 3, 2016

The purpose of this message would be to provide an update to a display_data field, without need of full duplex communications.

@ellisonbg suggested that this would make it so that progress bars can be implemented without needing full widgets/comms support. That's exactly the kind of feedback loop I want to be able to provide to users as well.

For me, I want this for informing about any background threads or processes including spark jobs, database queries, and external processes (see ipython/ipython#9969).

Potential approach

On display_data, provide transient field to identify a display_id:

{
    "content": {
      "data": {
        "text/plain": "[==                ]"
      },
      "transient": {
        "display_id": "0000"
      }
    }
}

Followed by an update_display_data:

{
    "content": {
      "data": {
        "text/plain": "[==============    ]"
      },
      "transient": {
        "display_id": "0000"
      }
    }
}

In the notebook, this should also result in storing the last result of the update.

@rgbkrk
Copy link
Member Author

rgbkrk commented Oct 3, 2016

Alternative - top level key:

{
    "content": {
      "data": {
        "text/plain": "[==                ]"
      },
      "display_id": "0000-1111-2222-3333"
    }
}

This makes it so the display_id is not persisted at runtime.

@rgbkrk
Copy link
Member Author

rgbkrk commented Oct 3, 2016

Another suggestion from @minrk: we could have another entry in content for all transient/live data:

{
    "content": {
      "data": {
        "text/plain": "[==                ]"
      },
      "transient": {
        "display_id": "0000-1111-2222-3333"
      }
    }
}

@rgbkrk
Copy link
Member Author

rgbkrk commented Oct 3, 2016

The jupyter js widgets currently use transient data to get bootstrapped on the page by setting the model id in a mimebundle

{
  "application/vnd.jupyter.widget": model_id
}

The transient section seems like a good use case for those as well.

@rgbkrk
Copy link
Member Author

rgbkrk commented Oct 3, 2016

/cc @jasongrout @SylvainCorlay

@minrk
Copy link
Member

minrk commented Oct 3, 2016

I like not persisting the display-id, so the top-level key or transient area makes sense to me. These don't really make sense as persisted info (much like it doesn't make sense to persist widget model IDs in the document).

Related details from side conversation:

  • update_display would modify the display_data item not just on the page, but in the notebook state

Further questions:

  • should this be a new message, or should this be info added to display_data itself?
    What should happen to an update message if the target can't be found (e.g. target output cleared)? Start a new output, or discard as having no destination? This may point to the answer for whether it should be a new message or not.

@rgbkrk
Copy link
Member Author

rgbkrk commented Oct 3, 2016

Now that I'm thinking about model IDs for jupyter widgets, it would be handy to have the transient/live dict/map available in general on all the messages that have mime bundles.

@rgbkrk
Copy link
Member Author

rgbkrk commented Oct 3, 2016

What should happen to an update message if the target can't be found (e.g. target output cleared)? Start a new output, or discard as having no destination? This may point to the answer for whether it should be a new message or not.

Well that one leaves me a bit stumped. I'm not sure what the desired behavior is.

@minrk
Copy link
Member

minrk commented Oct 4, 2016

I think it depends a bit on how we define the semantics of update. Do we allow publishing deltas, or is it always replacing the whole thing? If it's always replacing the whole thing, then it seems like the simplest modification would be a regular display_data message with the following modifications to handling:

if display_id is defined:
    if display_id found on the page:
        update match in-place
    else:
        regular display, record display_id for future lookup
else:
    regular display

@rgbkrk
Copy link
Member Author

rgbkrk commented Oct 4, 2016

Ah, I like that. I'm definitely thinking of replacing the whole thing. Doing it on the same display_data by id makes sense to me.

I feel like deltas belong in comms or another message type.

@minrk
Copy link
Member

minrk commented Oct 4, 2016

Yup, there's always comms for the complex case.

At the protocol level, we could already have this, because each display_data already has an id: the msg_id. But in practice this is difficult to locate, as it's not typically available in the kernel's APIs.

@minrk
Copy link
Member

minrk commented Oct 4, 2016

While this repo only defines protocols, it might be worth considering what IPython APIs could look like for updating an output, since you would have to create and track an ID somehow. Sending raw messages is always an option, but presumably we want to cover at least some use cases without that.

@rgbkrk
Copy link
Member Author

rgbkrk commented Oct 5, 2016

The way I've cheated in the past week for a single thread where I want to update a prior cell:

screen shot 2016-10-05 at 1 32 28 pm

Which only helps me if I'm ok with only appending.

@rgbkrk
Copy link
Member Author

rgbkrk commented Oct 5, 2016

The reason of course that I can't necessarily get and set header dynamically is that parent_header is an attribute on the ZMQInteractiveShell:

def set_parent(self, parent):
        """Set the parent header for associating output with its triggering input"""
        self.parent_header = parent
        self.displayhook.set_parent(parent)
        self.display_pub.set_parent(parent)
        if hasattr(self, '_data_pub'):
            self.data_pub.set_parent(parent)
        try:
            sys.stdout.set_parent(parent)
        except AttributeError:
            pass
        try:
            sys.stderr.set_parent(parent)
        except AttributeError:
            pass

@rgbkrk
Copy link
Member Author

rgbkrk commented Oct 5, 2016

Thinking on the IPython API for this:

bundle = {
    'text/html': '<b>whoa</b>',
}
IPython.display.display(bundle, raw=True, transient={'id': '1234'})

@takluyver
Copy link
Member

I would imagine we would also want a higher level API, which I envisage something like this:

from IPython.display import display, HTML, Updatable
d = Updatable(HTML('<b>whoa</b>'))
display(d)
# ...
d.replace(HTML('<i>awesome</i>'))

@rgbkrk
Copy link
Member Author

rgbkrk commented Oct 5, 2016

I would imagine we would also want a higher level API, which I envisage something like this

❤️ Ah, I'm liking that.

Could have a traitlets based version too:

updatable.data = HTML('<i>awesome</i>')

@takluyver
Copy link
Member

I thought about that interface too, but if I'm doing something primarily for the side effect I prefer it to be a function/method call rather than attribute/item assignment.

@rgbkrk
Copy link
Member Author

rgbkrk commented Oct 5, 2016

I prefer the function/method call as well.

@minrk
Copy link
Member

minrk commented Oct 21, 2016

I have a prototype implementation working:

forkyle

and came up with a few questions in the meantime.

The main one is:

Should display(obj) ever be able to update instead of displaying anew?

That determines where the display_id information might be stored/looked up during the display-formatter process. I'm currently inclined to say no, that objects can't define and store their display_id such that a naked display will result in an update instead of a new display.

Another question is to decide between two semantic behaviors:

  1. display with id, where a display is updated if it's already on the page, or handled normally if it's the first time the display_id is seen. (what I sketched above and have implemented).
  2. separate explicitly display, which always puts something new on the page, and update, which never puts something on the page.

The key difference for me is what should happen if someone thinks they are updating something, but the original display has been cleared. If the answer is that nothing should be displayed, then explicit update seems like the way to go. If it should get displayed, then it seems like display-with-id is the right choice.

For the first case, I would think the implementation would be:

display(obj, display_id='xxx')
display(obj_again, display_id='xxx')

where the protocol change is adding the display_id to display_data message.

which for me is "Put me on the page, but don't make duplicates".

The second might look like:

display(obj, display_id='xxx')
update_display(obj_again, display_id='xxx')

and the protocol change is the same as above, also adding an update_display_data message for the update calls.

which is more "1. put me on the page, no matter what, 2. update me if I'm on the page, otherwise do nothing"

I can honestly imagine both being surprising, depending on user expectations, but I have a slight leaning toward the first one.

@jasongrout
Copy link
Member

What usecases are we envisioning for updating output? So far, I see in the discussion above:

  • Progress bars
  • Updating status of computations (for example, distributed computations)

I could also imagine simple live charting for something like matplotlib.

Also, it seems that the discussion is about updating any display area on a page. Another thought that satisfies the progress bar case is to scope updates to messages with the same parent header, so that the display id namespace is scoped to a single execution request. We do run into issues with how parent headers are treated in threaded computations, but if we solve that, it feels a bit cleaner. You could use an output widget for the more advanced usecase of updating outputs on other areas of the page.

@rgbkrk
Copy link
Member Author

rgbkrk commented Oct 24, 2016

Beyond live charting, there's also the simpler precursor: updating a single plot.

@jasongrout
Copy link
Member

For both the progress bars and updating a single plot, I lean towards the display/update-only interaction. Especially if the output has been cleared - I don't want to keep seeing something pop back up. And especially not popping up in my current cell output when the original one was a few cells away.

In what situations does your situation one, the 'display without duplicates' make sense?

@theengineear
Copy link

Should display(obj) ever be able to update instead of displaying anew?

I don't think so. If you need to do this, you just (1) keep the reference to the displayed object (2) update that reference and (3) re-display, right?

I lean towards the display/update-only interaction.

👍 from me.

To be clear, what happens if you make multiple display(obj, display_id='xxx') calls and then a single
update_display(obj_again, display_id='xxx') call?

Is it OK to reuse the display_id?
If it is OK to reuse the display_id, are both updated?

@parente
Copy link
Member

parente commented Nov 1, 2016

In jupyter/jupyter#212 (comment), I mentioned @mariusvniekerk and I would comment on options for a better Spark progress bar in the spylon library. We discussed the message and APIs proposed here today, and both agree they look like a nice way to render richer progress information, regardless of the final implementation chosen. Great stuff. 👍

Another question is to decide between two semantic behaviors:
...
The key difference for me is what should happen if someone thinks they are updating something, but the original display has been cleared.

From a user's standpoint, the answer might depend on the context. For example, say I kick off a long running Spark job using a library capable of showing progress during execution. Assume my kernel remains busy throughout. If I clear the notebook output during execution, I'd like the progress bar to reappear on the next update. My Spark job is still running and my kernel is still busy, and so I'd like my notebook display to reflect that fact.

Now imagine I'm using a library that updates plots of streaming data using a background thread in the kernel. If I clear my cell outputs in this scenario, I do not want my plots to reappear again. I'm using clear as a convenient way of stopping all plot updates instead of having to both clear and run an additional cell to tell the library to stop making updates.

I think combining @minrk's two API options above can satisfy the two use cases. In the progress bar scenario, a library would use:

# in a thread loop
display(progress_obj, display_id='some_spark_job_id')

repeatedly to create, update, or recreate the progress bar view whether the user clears the notebook or not. In the plotting scenario, a library would use:

# once 
display(plot_obj, display_id='some_plot_id')
# in a thread loop
update_display(plot_object_again, display_id='

to create the initial display area for the plot, and thereafter only update that display for as long as it exists.

@parente
Copy link
Member

parente commented Nov 1, 2016

On a related note, having update_display support in any form will play nicely with the dashboard extensions.

  1. display initial output in various cells (HTML tables, PNG plots, JS widgets, ...)
  2. Drag-drop those cells into a layout
  3. Run a loop or thread that directs display_updates to appropriate cells / display IDs

The third step requires use of comm messages and reasonably complex libraries wrapping them today (e.g., ipywidgets, declarativewidgets). I imagine use of update_display will be much easier within notebooks themselves, especially if kernels provide API wrappers like the IPython display addition spec'ed above.

@rgbkrk
Copy link
Member Author

rgbkrk commented Nov 1, 2016

Excellent user/developer story!

It looks like the third step doesn't require comm messages - only bookkeeping around display IDs.

@minrk
Copy link
Member

minrk commented Nov 1, 2016

Another thought that satisfies the progress bar case is to scope updates to messages with the same parent header, so that the display id namespace is scoped to a single execution request

I don't think it will work to scope updates to a single parent even for progress bars, because we are already seeing progress bars that are updated in the background in places like dask. If we do scope it to parents in the protocol, that really means that we need to track parents in the kernel so they can be swapped out to update, which seems more complicated than tracking display-id updates in the frontend.

Is it OK to reuse the display_id? If it is OK to reuse the display_id, are both updated?

The display-only-once idea is that a second display reusing an id is equivalent to an update. If we are discarding the 'display-only-once' idea, I'd say it's not okay, and would leave the behavior undefined at the protocol/spec level. At the implementation level, I'd probably make the last display to claim an ID win, possibly with a js-console warning about display-id collisions. I would specifically want to avoid implementing an approximation of the widgets' multi-view-one-model concept. If you want that, you should be using widgets.

So the current proposal seems to be:

  1. display(obj, display_id='...') with a new id puts a new output on the page with display_id
  2. display(obj, display_id='...') re-using a display_id is either:
    • equivalent to an update
    • exceptional / undefined
  3. update_display(obj, display_id='...') always updates, never creates new elements

A progress bar API would require using methods in order to track display_id:

class ProgressBar(object):
    def display(self):
        self._display_id = new_id()
        display(self, display_id=self._display_id)

    def update(self):
        update_display(self, display_id=self._display_id)

Another question:

  • can a plain display(obj) set the display_id? I ask this mainly because it affects the IPython display machinery due to some not-very-extensible choices in the APIs, making it hard to propagate new information that's not in (data, metadata), which display-id would not be.

@parente
Copy link
Member

parente commented Nov 2, 2016

So the current proposal seems to be:

  1. display(obj, display_id='...') with a new id puts a new output on the page with display_id
  2. display(obj, display_id='...') re-using a display_id is either:
    • equivalent to an update
    • exceptional / undefined
      update_display(obj, display_id='...') always updates, never creates new elements

I'd like #2 to be "equivalent to an update" to handle the progress bar use case I mentioned above. Without this behavior, the library would need to detect that the display was cleared, generate a new display ID, and then go back to updating it. Libraries that don't want the automatic recreation behavior can use display once to create, and thereafter only use update (the plot use case I mentioned.)

@minrk
Copy link
Member

minrk commented Nov 2, 2016

@parente sounds good to me.

@minrk
Copy link
Member

minrk commented Nov 8, 2016

@rgbkrk
Copy link
Member Author

rgbkrk commented Nov 8, 2016

How are you sending display data with a transient dict from the IPython side?

@minrk
Copy link
Member

minrk commented Nov 8, 2016

PRs for IPython implementation: ipython/ipykernel#204 ipython/ipython#10048

@minrk
Copy link
Member

minrk commented Nov 12, 2016

Detail that hasn't been brought up yet: execute_result is described as ~ display_data plus execution_count. Can execute_results use display_id as well, or is this confined to display_data?

@rgbkrk
Copy link
Member Author

rgbkrk commented Nov 12, 2016

With the implementation in the PR for nteract, it would be super easy to do this for execute_result and I'd love to see it if it's not too hard on the ipykernel, ipython, jupyter notebook side.

@minrk
Copy link
Member

minrk commented Nov 12, 2016

I think it's the same for nbclassic. It's probably on the IPython side where it would be hardest to assign a display_id to a result.

@SylvainCorlay
Copy link
Member

@RgBrk do you think that this protocol could be implemented on the top of comms?

@rgbkrk
Copy link
Member Author

rgbkrk commented Nov 13, 2016

Anything can be implemented on top of the comms with the right receiver on the frontend side.

This provides a path for one way updates that are persisted to the notebook that any kernel author can implement in a fairly simple fashion.

@rgbkrk
Copy link
Member Author

rgbkrk commented Nov 13, 2016

Mentioned in chat with Min:

Kyle: How would I need to setup my object for providing metadata or transient when found via repr for execute_result?

Min: That's part of why I was asking about "can objects specify their display_id" Because this would inform it. E.g. a new Obj._ipython_display_id_(). Without that, you can't really do it. I realized, since display_id isn't tied to mimetypes, we don't have a general issue like I was talking about earlier.

Kyle: Ok yeah

Min: That's still true a bit for transient in general

Kyle: I like the new tab completion helper

@rgbkrk rgbkrk mentioned this issue Nov 14, 2016
3 tasks
@rgbkrk
Copy link
Member Author

rgbkrk commented Nov 14, 2016

#225 is ready for review

@DTAIEB
Copy link

DTAIEB commented Nov 15, 2016

I recently implemented a Progress Monitor for Spark Jobs but had to work around 2 issues:

  1. The entities I'm monitoring are complex. Each Spark job has multiple stages and each stage has its own progress bar denoting the inner tasks. The UI calls for stages in a particular job to be grouped together. Having each stage in its own display output area makes it very difficult to update each stage individually because the current implementation is a "replace all" instead of a "delta". Work around is to have a second display output that run a js snippet to update individual dom elements on the page.
  2. Spark Job can be initiated from anywhere in the cell code, which means that the progress monitor may come in the middle of other print statements, which makes it very confusing. It would be good to have a way to specify the position of the display in the cell output e.g. first. Work around was to attach a pre_run_cell event handler and create the display placeholder at the very beginning.

Here's a screenshot of my POC:
screen shot 2016-11-15 at 5 20 23 pm

@rgbkrk
Copy link
Member Author

rgbkrk commented Nov 17, 2016

Yeah, the Spark status is definitely a complex nested type. Would this be a decent manifest of what the layout/mapping is?

// array of jobs
[
  // job
  {
    id: jobID,
    stages: [
      { id: stageID, progress: { capacity: 5, value: 5 } } } 
    ]
  }
]

As for why it's a replacement, all the current mimetypes in the notebook are pure strings when sent across and there's not a current diffing strategy. For custom mimetypes I could imagine us handling deltas.

My personal opinion for this one is that we can have either a mimetype for progress or spark info that would get rendered appropriately as the JSON data changes (allowing something like React or Phosphor handle progressive renders that keep state for users) or have a mimetype that is more amenable to diffing and rendering on a virtual DOM. I'll post some more thoughts on this in a little bit.

@alex-the-man
Copy link

Borrowing Databrick's screenshot. They have a really nice way to report progress of jobs and stages.
Stages are folded into jobs. User can click on the arrow to unfold jobs and look at progress broken down at stage level.

About the renderer approach, is it possible for a renderer maintain to maintain the folding states on update_display? Can renderers persist their folding states into the notebook?

image

@rgbkrk
Copy link
Member Author

rgbkrk commented Nov 17, 2016

is it possible for a renderer maintain to maintain the folding states on update_display

Yeah. I can make a prototype right now.

Can renderers persist their folding states into the notebook?

No.

@DTAIEB
Copy link

DTAIEB commented Nov 17, 2016

@rgbkrk Yes, the layout you describe is correct, although there are more info that goes in each buckets. Also there can be a lot of jobs, stages, tasks so I've been thinking about an minified overall progress bar that aggregate all the steps with an option for the user to expand individual jobs. Probably fancier than what was originally thought, I know :-) Along the same lines, execution progress bar is a generic problem. I think it would be beneficial to have a generic execution progress bar that is more natively integrated to the cell (e.g. small), with a simple update API and perhaps a cancel button.

@rgbkrk
Copy link
Member Author

rgbkrk commented Nov 29, 2016

Let's open up a new issue for exploring the progress bars since this issue (updating displays) is done. I made a bit of progress on a virtual DOM renderer that could maintain state, need to keep iterating.

@rgbkrk
Copy link
Member Author

rgbkrk commented Nov 29, 2016

Closing this issue since we've checked all the boxes across ipykernel, ipython, jupyter/notebook, jupyter_client, and nteract.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

9 participants