add sugar methods/properties to AsyncResult #1548

Merged
merged 8 commits into from Apr 15, 2012

Conversation

Projects
None yet
2 participants
@minrk
Member

minrk commented Apr 4, 2012

Adds the following attributes / methods:

  • ar.wall_time = received - submitted
  • ar.serial_time = sum of serial computation time
  • ar.elapsed = time since submission (wall_time if done)
  • ar.progress = (int) number of sub-tasks that have completed
  • len(ar) = # of tasks
  • ar.wait_interactive(): prints progress

These are simple methods derived from the metadata/timestamps already created. But I've been persuaded by @wesm's practice of including simple methods that do useful (and/or cool) things.

This also required/revealed some minor fixes/cleanup to clear_output in some cases:

  • dedent base core.displaypub.clear_output, so it's actually defined in the class
  • clear_output publishes '\r\b', so it will clear terminal-like frontends that don't understand full clear_output behavior.
  • core.display.clear_output() still works, even outside an IPython session.
@minrk

This comment has been minimized.

Show comment
Hide comment
@minrk

minrk Apr 4, 2012

Member

At @fperez request, I had a go at addressing the flicker when using clear_output in the notebook. I used a simple timeout, which gets cleared/flushed immediately on the next output-related action.

I won't object to moving that to a separate PR, but I did it here as it is related, and this new code makes significant use of the clear;print;repeat pattern for which the flicker is most problematic.

Member

minrk commented Apr 4, 2012

At @fperez request, I had a go at addressing the flicker when using clear_output in the notebook. I used a simple timeout, which gets cleared/flushed immediately on the next output-related action.

I won't object to moving that to a separate PR, but I did it here as it is related, and this new code makes significant use of the clear;print;repeat pattern for which the flicker is most problematic.

@minrk

This comment has been minimized.

Show comment
Hide comment
@minrk

minrk Apr 4, 2012

Member

this notebook shows a few examples of print, the new AR.wait_interactive, and plotting with clear_output, all of which seem better behaved than previously.

Member

minrk commented Apr 4, 2012

this notebook shows a few examples of print, the new AR.wait_interactive, and plotting with clear_output, all of which seem better behaved than previously.

@minrk minrk referenced this pull request Apr 9, 2012

Merged

clear_output improvements #1563

@minrk

This comment has been minimized.

Show comment
Hide comment
@minrk

minrk Apr 9, 2012

Member

This PR now depends on PR #1563

Member

minrk commented Apr 9, 2012

This PR now depends on PR #1563

@fperez

This comment has been minimized.

Show comment
Hide comment
@fperez

fperez Apr 14, 2012

Member

@minrk, I merged #1563 but now it's producing some conflicts. Could you rebase this one? I'll be happy to review it quickly...

Member

fperez commented Apr 14, 2012

@minrk, I merged #1563 but now it's producing some conflicts. Could you rebase this one? I'll be happy to review it quickly...

@fperez

This comment has been minimized.

Show comment
Hide comment
@fperez

fperez Apr 14, 2012

Member

And as I commented in that other one, might as well make the notebook with examples part of the PR, so we beef up the notebooks we provide with the system for users. That will become even more useful once we start including their html for in the docs, along with a link to the original .ipynb file, as we'll be able to point users to those pages in the docs.

Member

fperez commented Apr 14, 2012

And as I commented in that other one, might as well make the notebook with examples part of the PR, so we beef up the notebooks we provide with the system for users. That will become even more useful once we start including their html for in the docs, along with a link to the original .ipynb file, as we'll be able to point users to those pages in the docs.

add sugar methods/properties to AsyncResult
* ar.wall_time = received - submitted
* ar.serial_time = sum of serial computation time
* ar.elapsed = time since submission (wall_time if done)
* ar.progress = (int) number of sub-tasks that have completed
* ar.wait_interactive(): prints progress
* len(ar) = # of tasks
@minrk

This comment has been minimized.

Show comment
Hide comment
@minrk

minrk Apr 14, 2012

Member

rebased - I'll add the notebook in the morning, probably.

Member

minrk commented Apr 14, 2012

rebased - I'll add the notebook in the morning, probably.

@minrk

This comment has been minimized.

Show comment
Hide comment
@minrk

minrk Apr 14, 2012

Member

Progress notebook added, demos:

  • clear_output/print/flush
  • AsyncResult.wait_interactive
  • clear_output/display(Figure)
  • HTML/JS progress bar
  • PyMC ProgressBar class
Member

minrk commented Apr 14, 2012

Progress notebook added, demos:

  • clear_output/print/flush
  • AsyncResult.wait_interactive
  • clear_output/display(Figure)
  • HTML/JS progress bar
  • PyMC ProgressBar class
@fperez

View changes

docs/examples/notebooks/Progress.ipynb
+ "collapsed": false,
+ "input": [
+ "from IPython.core.display import clear_output",
+ "for i in range(10):",

This comment has been minimized.

@fperez

fperez Apr 14, 2012

Member

It's missing an import time here.

@fperez

fperez Apr 14, 2012

Member

It's missing an import time here.

This comment has been minimized.

@minrk

minrk Apr 14, 2012

Member

ah, yes. I have import os,sys,time in my startup files, so I always forget to import them.

@minrk

minrk Apr 14, 2012

Member

ah, yes. I have import os,sys,time in my startup files, so I always forget to import them.

@fperez

View changes

docs/examples/notebooks/Progress.ipynb
+ "cell_type": "code",
+ "collapsed": false,
+ "input": [
+ "from scipy.special import jn",

This comment has been minimized.

@fperez

fperez Apr 14, 2012

Member

I'd rewrite this cell as:

from scipy.special import jn
x = linspace(0,5)
f, ax = plt.subplots()
ax.set_title("Bessel functions")

for n in range(1,10):
    time.sleep(1)
    ax.plot(x, jn(x,n))
    clear_output()
    display(f)

# close the figure at the end, so we don't get a duplicate
# of the last plot
plt.close()

It's a bit cleaner api-wise regarding matplotlib, and it completely eliminates the flicker. The trick is to do all plotting with the axis object, which doesn't force a draw call. Then, we can do clear_output right before the display call, and most likely (unless it's a super-complex plot), it will finish before any flickering happens.

You can mention this and update the comment below accordingly, indicating how this will eliminate completely all flickering in most common cases (though it can still appear for sufficiently complex plots).

@fperez

fperez Apr 14, 2012

Member

I'd rewrite this cell as:

from scipy.special import jn
x = linspace(0,5)
f, ax = plt.subplots()
ax.set_title("Bessel functions")

for n in range(1,10):
    time.sleep(1)
    ax.plot(x, jn(x,n))
    clear_output()
    display(f)

# close the figure at the end, so we don't get a duplicate
# of the last plot
plt.close()

It's a bit cleaner api-wise regarding matplotlib, and it completely eliminates the flicker. The trick is to do all plotting with the axis object, which doesn't force a draw call. Then, we can do clear_output right before the display call, and most likely (unless it's a super-complex plot), it will finish before any flickering happens.

You can mention this and update the comment below accordingly, indicating how this will eliminate completely all flickering in most common cases (though it can still appear for sufficiently complex plots).

This comment has been minimized.

@minrk

minrk Apr 14, 2012

Member

Thanks, I made this change, though it isn't true that it eliminates the flicker. Actually eliminating flicker will require changing how clear_output works in a more complicated way (don't resize height until after new content is drawn), because as it is now, it's still resizing to zero immediately before drawing the image, which takes more than zero time (though it is very quick).

@minrk

minrk Apr 14, 2012

Member

Thanks, I made this change, though it isn't true that it eliminates the flicker. Actually eliminating flicker will require changing how clear_output works in a more complicated way (don't resize height until after new content is drawn), because as it is now, it's still resizing to zero immediately before drawing the image, which takes more than zero time (though it is very quick).

This comment has been minimized.

@minrk

minrk Apr 14, 2012

Member

Hm, nevermind - the flicker does seem to be gone. That means that I misunderstood what was causing the flicker, and also suggests that calls to plot() results in some IOPub message, possibly an empty sys.stdout.flush()?

@minrk

minrk Apr 14, 2012

Member

Hm, nevermind - the flicker does seem to be gone. That means that I misunderstood what was causing the flicker, and also suggests that calls to plot() results in some IOPub message, possibly an empty sys.stdout.flush()?

@fperez

View changes

docs/examples/notebooks/Progress.ipynb
+ {
+ "cell_type": "markdown",
+ "source": [
+ "# Progress bars and clear_output",

This comment has been minimized.

@fperez

fperez Apr 14, 2012

Member

Perhaps change the title to 'Simple animations, progress bars and output area cleaning'? I'd also suggest renaming the file to something slightly more descriptive, say animations_progress_bars or similar (casing up to you). 'Progress' by itself as filename is a bit cryptic.

@fperez

fperez Apr 14, 2012

Member

Perhaps change the title to 'Simple animations, progress bars and output area cleaning'? I'd also suggest renaming the file to something slightly more descriptive, say animations_progress_bars or similar (casing up to you). 'Progress' by itself as filename is a bit cryptic.

+ return len(self.msg_ids)
+
+ #-------------------------------------
+ # Sugar methods and attributes

This comment has been minimized.

@fperez

fperez Apr 14, 2012

Member

This is fantastic, and the implementation looks great. I just think it needs mentioning somewhere in the docs, though... Otherwise users are never going to find out about these things. It could be just a paragraph pointing out their existence, but I think we do need something in the docs about these guys.

@fperez

fperez Apr 14, 2012

Member

This is fantastic, and the implementation looks great. I just think it needs mentioning somewhere in the docs, though... Otherwise users are never going to find out about these things. It could be just a paragraph pointing out their existence, but I think we do need something in the docs about these guys.

This comment has been minimized.

@minrk

minrk Apr 14, 2012

Member

I've been a bit reluctant to add things to the current docs, because I really want to trash them entirely and start from scratch, but obviously I should document new features as they are added.

@minrk

minrk Apr 14, 2012

Member

I've been a bit reluctant to add things to the current docs, because I really want to trash them entirely and start from scratch, but obviously I should document new features as they are added.

@fperez

This comment has been minimized.

Show comment
Hide comment
@fperez

fperez Apr 14, 2012

Member

Minor comments inline, easy to fix up and we'll be good to go. This is awesome.

Member

fperez commented Apr 14, 2012

Minor comments inline, easy to fix up and we'll be good to go. This is awesome.

add Animations and Progress example notebook
demos intermediate progress with:

* clear_output/print/flush
* AsyncResult.wait_interactive
* clear_output/display(Figure)
* HTML/JS progress bar
* PyMC ProgressBar class
@minrk

This comment has been minimized.

Show comment
Hide comment
@minrk

minrk Apr 14, 2012

Member

I made your edits to the notebook, and started a doc for the AsyncResult object. I really want to make it clear that the AsyncResult object itself is where most of our API lives. At least the nifty parts.

I have one question we should resolve before merging. In wall_time, I use received - submitted, which is the actual roundtrip time for the Client. The only problem with this one is that the received timestamp is not especially reliable, particularly in interactive use cases. The reason being that this timestamp is made when the result is pulled off of the queue in the Client, as a result of Client.spin(). So if the result of the computation arrives either while the user is sitting idle at an interactive prompt or just performing some client-side computation, that time is added to the wall_time measure.

This is a larger issue for AsyncHubResults (those fetched from the Hub), where the received stamp could technically be days after the computation finished.

Possible partial solutions to this problem:

  • HubResults should use completed-submitted, ignoring result-reply overhead
  • all AsyncResults should use completed-submitted (most consistent, but probably not most useful/interesting)
  • HubResults should just not have a wall_time property
  • Add an analogue to received to the Hub's DB, and use that in HubResults
  • Leave it as-is, and let users deal with wall_time rarely being useful in HubResults (it would still be accurate if the HubResult is requested and waited upon while the computation is pending)

I'm personally inclined towards either one of the last two, but I want there to be a convenient

There are a couple of other timings that might be useful to add:

  • last completed - first started (actual time spent working, excluding overhead - I don't know what it's name should be)
  • last completed - first submitted (wall_time excluding reply overhead, which may not be interesting or meaningful)
  • last received - first started (excludes start overhead, which could have just been waiting for other jobs in the queue)
  • idle time (total time spent on each engine not during a computation) - this might be computed per-engine to draw one of those pipeline concurrency figures.

I don't know how many of these I should actually include. Perhaps I should just make a method that makes it easy to do any of the last/first X - first/last Y deltas.

Member

minrk commented Apr 14, 2012

I made your edits to the notebook, and started a doc for the AsyncResult object. I really want to make it clear that the AsyncResult object itself is where most of our API lives. At least the nifty parts.

I have one question we should resolve before merging. In wall_time, I use received - submitted, which is the actual roundtrip time for the Client. The only problem with this one is that the received timestamp is not especially reliable, particularly in interactive use cases. The reason being that this timestamp is made when the result is pulled off of the queue in the Client, as a result of Client.spin(). So if the result of the computation arrives either while the user is sitting idle at an interactive prompt or just performing some client-side computation, that time is added to the wall_time measure.

This is a larger issue for AsyncHubResults (those fetched from the Hub), where the received stamp could technically be days after the computation finished.

Possible partial solutions to this problem:

  • HubResults should use completed-submitted, ignoring result-reply overhead
  • all AsyncResults should use completed-submitted (most consistent, but probably not most useful/interesting)
  • HubResults should just not have a wall_time property
  • Add an analogue to received to the Hub's DB, and use that in HubResults
  • Leave it as-is, and let users deal with wall_time rarely being useful in HubResults (it would still be accurate if the HubResult is requested and waited upon while the computation is pending)

I'm personally inclined towards either one of the last two, but I want there to be a convenient

There are a couple of other timings that might be useful to add:

  • last completed - first started (actual time spent working, excluding overhead - I don't know what it's name should be)
  • last completed - first submitted (wall_time excluding reply overhead, which may not be interesting or meaningful)
  • last received - first started (excludes start overhead, which could have just been waiting for other jobs in the queue)
  • idle time (total time spent on each engine not during a computation) - this might be computed per-engine to draw one of those pipeline concurrency figures.

I don't know how many of these I should actually include. Perhaps I should just make a method that makes it easy to do any of the last/first X - first/last Y deltas.

@fperez

This comment has been minimized.

Show comment
Hide comment
@fperez

fperez Apr 14, 2012

Ouch, no... Let's avoid filenames with spaces in them in the repo, they always end up causing trouble for automated handling. I know the underscores look ugly in files that are much more 'gui' oriented, but spaces in filenames really are a timebomb waiting to happen for scripted work.

Ouch, no... Let's avoid filenames with spaces in them in the repo, they always end up causing trouble for automated handling. I know the underscores look ugly in files that are much more 'gui' oriented, but spaces in filenames really are a timebomb waiting to happen for scripted work.

This comment has been minimized.

Show comment
Hide comment
@minrk

minrk Apr 14, 2012

Owner

Seriously? The notebook is a GUI tool, and GUI files should have reasonable names. Underscores are a gross hack to appease pathetic tools.

Owner

minrk replied Apr 14, 2012

Seriously? The notebook is a GUI tool, and GUI files should have reasonable names. Underscores are a gross hack to appease pathetic tools.

This comment has been minimized.

Show comment
Hide comment
@fperez

fperez Apr 14, 2012

But managing directories with filenames that have spaces in them with automated tools is extraordinarily error prone. I don't want to end up having to debug one day a script for doc generation (for example) because something downstream choked on a space. I've been bitten by that particular problem enough times in my life that I simply don't want to waste my time with that kind of problem. Practicality beats purity, in this case...

But managing directories with filenames that have spaces in them with automated tools is extraordinarily error prone. I don't want to end up having to debug one day a script for doc generation (for example) because something downstream choked on a space. I've been bitten by that particular problem enough times in my life that I simply don't want to waste my time with that kind of problem. Practicality beats purity, in this case...

This comment has been minimized.

Show comment
Hide comment
@minrk

minrk Apr 14, 2012

Owner

I don't agree at all, and have much less patience or sympathy for shitty tools, but I will make the change.

I would think that handling simple sensible filenames is a reasonable expectation in this century.

Owner

minrk replied Apr 14, 2012

I don't agree at all, and have much less patience or sympathy for shitty tools, but I will make the change.

I would think that handling simple sensible filenames is a reasonable expectation in this century.

This comment has been minimized.

Show comment
Hide comment
@fperez

fperez Apr 14, 2012

If you really feel strongly about it, leave it as-is. If we ever run into problems I'll just rename things outright; I'm not trying to upset you over a minor change, just trying to avoid possible annoyances down the road.

If you really feel strongly about it, leave it as-is. If we ever run into problems I'll just rename things outright; I'm not trying to upset you over a minor change, just trying to avoid possible annoyances down the road.

This comment has been minimized.

Show comment
Hide comment
@minrk

minrk Apr 14, 2012

Owner

It's not a big deal - I just get annoyed at how bad command-line tools are sometimes, and how *ix standard practices like this one tend to be based on how these tools suck at basic functionality.

I do think that if this is to be standard practice in the notebook, then we should proabably unescape underscores, and draw regular spaces in the notebook list. A listing full of underscores that are really spaces is pretty ugly, and simply doesn't fit in with the web today, where spaces are handled frequently and without issue.

Owner

minrk replied Apr 14, 2012

It's not a big deal - I just get annoyed at how bad command-line tools are sometimes, and how *ix standard practices like this one tend to be based on how these tools suck at basic functionality.

I do think that if this is to be standard practice in the notebook, then we should proabably unescape underscores, and draw regular spaces in the notebook list. A listing full of underscores that are really spaces is pretty ugly, and simply doesn't fit in with the web today, where spaces are handled frequently and without issue.

minrk added some commits Apr 14, 2012

add 'received' timestamp to DB
allows 'wall_time' to make sense in cases other than simple waiting AsyncResult.
add AsyncResult.timedelta
for computing various comparisons of timestamps in AsyncResults
remove spaces in progress notebook filename
appease crappy tools that can't deal with reasonable filenames.
@minrk

This comment has been minimized.

Show comment
Hide comment
@minrk

minrk Apr 14, 2012

Member

Okay, review addressed I think:

  • spaces removed from filename
  • new properties documented
  • some basic tests added
  • I went with adding received to the Hub's DB for resolving the wall_time issue for HubResults
  • I added AsyncResult.timedelta() for comparing a pair of timestamp sets, which is used in AR.wall_time.

Did I miss anything else?

Member

minrk commented Apr 14, 2012

Okay, review addressed I think:

  • spaces removed from filename
  • new properties documented
  • some basic tests added
  • I went with adding received to the Hub's DB for resolving the wall_time issue for HubResults
  • I added AsyncResult.timedelta() for comparing a pair of timestamp sets, which is used in AR.wall_time.

Did I miss anything else?

@fperez

This comment has been minimized.

Show comment
Hide comment
@fperez

fperez Apr 14, 2012

awesome, thanks for all these tests!

awesome, thanks for all these tests!

@fperez

This comment has been minimized.

Show comment
Hide comment
@fperez

fperez Apr 14, 2012

And this is also great.

And this is also great.

add Client.spin_thread()
runs Client.spin() in a background thread at a set interval
@minrk

This comment has been minimized.

Show comment
Hide comment
@minrk

minrk Apr 15, 2012

Member

Added Client.spin_thread(interval) / stop_spin_thread() for running spin in a background thread, to keep zmq queue clear.

Member

minrk commented Apr 15, 2012

Added Client.spin_thread(interval) / stop_spin_thread() for running spin in a background thread, to keep zmq queue clear.

@fperez

This comment has been minimized.

Show comment
Hide comment
@fperez

fperez Apr 15, 2012

Member

Great, merging now. Awesome job, thanks!

Member

fperez commented Apr 15, 2012

Great, merging now. Awesome job, thanks!

fperez added a commit that referenced this pull request Apr 15, 2012

Merge pull request #1548 from minrk/ar_sugar
Add sugar methods/properties to AsyncResult that are generically useful:

* `ar.wall_time` = received - submitted
* `ar.serial_time` = sum of serial computation time
* `ar.elapsed` = time since submission (wall_time if done)
* `ar.progress` = (int) number of sub-tasks that have completed
* `len(ar)` = # of tasks
* `ar.wait_interactive()`: prints progress

These are simple methods derived from the metadata/timestamps already created.  But I've been persuaded by @wesm's practice of including simple methods that do useful (and/or cool) things.

This also required/revealed some minor fixes/cleanup to clear_output in some cases:

* dedent base `core.displaypub.clear_output`, so it's actually defined in the class
* clear_output publishes `'\r\b'`, so it will clear terminal-like frontends that don't understand full clear_output behavior.
* `core.display.clear_output()` still works, even outside an IPython session.

Added a new notebook that shows how to use these new methods and how to do simple animations/progress bars using `clear_output()`.

Added `Client.spin_thread(interval)` / `stop_spin_thread()` for running spin in a background thread, to keep zmq queue clear.  This can be used to ensure that timing information is as accurate as possible (at the cost of having a background thread active).

@fperez fperez merged commit e0b4311 into ipython:master Apr 15, 2012

mattvonrocketstein pushed a commit to mattvonrocketstein/ipython that referenced this pull request Nov 3, 2014

Merge pull request #1548 from minrk/ar_sugar
Add sugar methods/properties to AsyncResult that are generically useful:

* `ar.wall_time` = received - submitted
* `ar.serial_time` = sum of serial computation time
* `ar.elapsed` = time since submission (wall_time if done)
* `ar.progress` = (int) number of sub-tasks that have completed
* `len(ar)` = # of tasks
* `ar.wait_interactive()`: prints progress

These are simple methods derived from the metadata/timestamps already created.  But I've been persuaded by @wesm's practice of including simple methods that do useful (and/or cool) things.

This also required/revealed some minor fixes/cleanup to clear_output in some cases:

* dedent base `core.displaypub.clear_output`, so it's actually defined in the class
* clear_output publishes `'\r\b'`, so it will clear terminal-like frontends that don't understand full clear_output behavior.
* `core.display.clear_output()` still works, even outside an IPython session.

Added a new notebook that shows how to use these new methods and how to do simple animations/progress bars using `clear_output()`.

Added `Client.spin_thread(interval)` / `stop_spin_thread()` for running spin in a background thread, to keep zmq queue clear.  This can be used to ensure that timing information is as accurate as possible (at the cost of having a background thread active).
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment