Skip to content

add sugar methods/properties to AsyncResult #1548

Merged
merged 8 commits into from Apr 15, 2012

2 participants

@minrk
IPython member
minrk commented Apr 4, 2012

Adds the following attributes / methods:

  • ar.wall_time = received - submitted
  • ar.serial_time = sum of serial computation time
  • ar.elapsed = time since submission (wall_time if done)
  • ar.progress = (int) number of sub-tasks that have completed
  • len(ar) = # of tasks
  • ar.wait_interactive(): prints progress

These are simple methods derived from the metadata/timestamps already created. But I've been persuaded by @wesm's practice of including simple methods that do useful (and/or cool) things.

This also required/revealed some minor fixes/cleanup to clear_output in some cases:

  • dedent base core.displaypub.clear_output, so it's actually defined in the class
  • clear_output publishes '\r\b', so it will clear terminal-like frontends that don't understand full clear_output behavior.
  • core.display.clear_output() still works, even outside an IPython session.
@minrk
IPython member
minrk commented Apr 4, 2012

At @fperez request, I had a go at addressing the flicker when using clear_output in the notebook. I used a simple timeout, which gets cleared/flushed immediately on the next output-related action.

I won't object to moving that to a separate PR, but I did it here as it is related, and this new code makes significant use of the clear;print;repeat pattern for which the flicker is most problematic.

@minrk
IPython member
minrk commented Apr 4, 2012

this notebook shows a few examples of print, the new AR.wait_interactive, and plotting with clear_output, all of which seem better behaved than previously.

@minrk minrk referenced this pull request Apr 9, 2012
Merged

clear_output improvements #1563

@minrk
IPython member
minrk commented Apr 9, 2012

This PR now depends on PR #1563

@fperez
IPython member
fperez commented Apr 14, 2012

@minrk, I merged #1563 but now it's producing some conflicts. Could you rebase this one? I'll be happy to review it quickly...

@fperez
IPython member
fperez commented Apr 14, 2012

And as I commented in that other one, might as well make the notebook with examples part of the PR, so we beef up the notebooks we provide with the system for users. That will become even more useful once we start including their html for in the docs, along with a link to the original .ipynb file, as we'll be able to point users to those pages in the docs.

@minrk minrk add sugar methods/properties to AsyncResult
* ar.wall_time = received - submitted
* ar.serial_time = sum of serial computation time
* ar.elapsed = time since submission (wall_time if done)
* ar.progress = (int) number of sub-tasks that have completed
* ar.wait_interactive(): prints progress
* len(ar) = # of tasks
148c775
@minrk
IPython member
minrk commented Apr 14, 2012

rebased - I'll add the notebook in the morning, probably.

@minrk
IPython member
minrk commented Apr 14, 2012

Progress notebook added, demos:

  • clear_output/print/flush
  • AsyncResult.wait_interactive
  • clear_output/display(Figure)
  • HTML/JS progress bar
  • PyMC ProgressBar class
@fperez fperez and 1 other commented on an outdated diff Apr 14, 2012
docs/examples/notebooks/Progress.ipynb
+ "What the notebook *does* support is explicit `clear_output`, and you can use this to replace previous",
+ "output (specifying stdout/stderr or the special IPython display outputs)."
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "source": [
+ "A simple example printing our progress iterating through a list:"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "collapsed": false,
+ "input": [
+ "from IPython.core.display import clear_output",
+ "for i in range(10):",
@fperez
IPython member
fperez added a note Apr 14, 2012

It's missing an import time here.

@minrk
IPython member
minrk added a note Apr 14, 2012

ah, yes. I have import os,sys,time in my startup files, so I always forget to import them.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
@fperez fperez and 1 other commented on an outdated diff Apr 14, 2012
docs/examples/notebooks/Progress.ipynb
+ },
+ {
+ "cell_type": "code",
+ "collapsed": false,
+ "input": [
+ "%pylab inline"
+ ],
+ "language": "python",
+ "outputs": [],
+ "prompt_number": 6
+ },
+ {
+ "cell_type": "code",
+ "collapsed": false,
+ "input": [
+ "from scipy.special import jn",
@fperez
IPython member
fperez added a note Apr 14, 2012

I'd rewrite this cell as:

from scipy.special import jn
x = linspace(0,5)
f, ax = plt.subplots()
ax.set_title("Bessel functions")

for n in range(1,10):
    time.sleep(1)
    ax.plot(x, jn(x,n))
    clear_output()
    display(f)

# close the figure at the end, so we don't get a duplicate
# of the last plot
plt.close()

It's a bit cleaner api-wise regarding matplotlib, and it completely eliminates the flicker. The trick is to do all plotting with the axis object, which doesn't force a draw call. Then, we can do clear_output right before the display call, and most likely (unless it's a super-complex plot), it will finish before any flickering happens.

You can mention this and update the comment below accordingly, indicating how this will eliminate completely all flickering in most common cases (though it can still appear for sufficiently complex plots).

@minrk
IPython member
minrk added a note Apr 14, 2012

Thanks, I made this change, though it isn't true that it eliminates the flicker. Actually eliminating flicker will require changing how clear_output works in a more complicated way (don't resize height until after new content is drawn), because as it is now, it's still resizing to zero immediately before drawing the image, which takes more than zero time (though it is very quick).

@minrk
IPython member
minrk added a note Apr 14, 2012

Hm, nevermind - the flicker does seem to be gone. That means that I misunderstood what was causing the flicker, and also suggests that calls to plot() results in some IOPub message, possibly an empty sys.stdout.flush()?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
@fperez fperez commented on an outdated diff Apr 14, 2012
docs/examples/notebooks/Progress.ipynb
@@ -0,0 +1,226 @@
+{
+ "metadata": {
+ "name": "Progress"
+ },
+ "nbformat": 3,
+ "worksheets": [
+ {
+ "cells": [
+ {
+ "cell_type": "markdown",
+ "source": [
+ "# Progress bars and clear_output",
@fperez
IPython member
fperez added a note Apr 14, 2012

Perhaps change the title to 'Simple animations, progress bars and output area cleaning'? I'd also suggest renaming the file to something slightly more descriptive, say animations_progress_bars or similar (casing up to you). 'Progress' by itself as filename is a bit cryptic.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
@fperez fperez commented on the diff Apr 14, 2012
IPython/parallel/client/asyncresult.py
@@ -255,7 +259,76 @@ def __iter__(self):
# already done
for r in rlist:
yield r
-
+
+ def __len__(self):
+ return len(self.msg_ids)
+
+ #-------------------------------------
+ # Sugar methods and attributes
@fperez
IPython member
fperez added a note Apr 14, 2012

This is fantastic, and the implementation looks great. I just think it needs mentioning somewhere in the docs, though... Otherwise users are never going to find out about these things. It could be just a paragraph pointing out their existence, but I think we do need something in the docs about these guys.

@minrk
IPython member
minrk added a note Apr 14, 2012

I've been a bit reluctant to add things to the current docs, because I really want to trash them entirely and start from scratch, but obviously I should document new features as they are added.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
@fperez
IPython member
fperez commented Apr 14, 2012

Minor comments inline, easy to fix up and we'll be good to go. This is awesome.

@minrk minrk add Animations and Progress example notebook
demos intermediate progress with:

* clear_output/print/flush
* AsyncResult.wait_interactive
* clear_output/display(Figure)
* HTML/JS progress bar
* PyMC ProgressBar class
f5e3cc0
@minrk
IPython member
minrk commented Apr 14, 2012

I made your edits to the notebook, and started a doc for the AsyncResult object. I really want to make it clear that the AsyncResult object itself is where most of our API lives. At least the nifty parts.

I have one question we should resolve before merging. In wall_time, I use received - submitted, which is the actual roundtrip time for the Client. The only problem with this one is that the received timestamp is not especially reliable, particularly in interactive use cases. The reason being that this timestamp is made when the result is pulled off of the queue in the Client, as a result of Client.spin(). So if the result of the computation arrives either while the user is sitting idle at an interactive prompt or just performing some client-side computation, that time is added to the wall_time measure.

This is a larger issue for AsyncHubResults (those fetched from the Hub), where the received stamp could technically be days after the computation finished.

Possible partial solutions to this problem:

  • HubResults should use completed-submitted, ignoring result-reply overhead
  • all AsyncResults should use completed-submitted (most consistent, but probably not most useful/interesting)
  • HubResults should just not have a wall_time property
  • Add an analogue to received to the Hub's DB, and use that in HubResults
  • Leave it as-is, and let users deal with wall_time rarely being useful in HubResults (it would still be accurate if the HubResult is requested and waited upon while the computation is pending)

I'm personally inclined towards either one of the last two, but I want there to be a convenient

There are a couple of other timings that might be useful to add:

  • last completed - first started (actual time spent working, excluding overhead - I don't know what it's name should be)
  • last completed - first submitted (wall_time excluding reply overhead, which may not be interesting or meaningful)
  • last received - first started (excludes start overhead, which could have just been waiting for other jobs in the queue)
  • idle time (total time spent on each engine not during a computation) - this might be computed per-engine to draw one of those pipeline concurrency figures.

I don't know how many of these I should actually include. Perhaps I should just make a method that makes it easy to do any of the last/first X - first/last Y deltas.

@fperez

Ouch, no... Let's avoid filenames with spaces in them in the repo, they always end up causing trouble for automated handling. I know the underscores look ugly in files that are much more 'gui' oriented, but spaces in filenames really are a timebomb waiting to happen for scripted work.

IPython member

Seriously? The notebook is a GUI tool, and GUI files should have reasonable names. Underscores are a gross hack to appease pathetic tools.

IPython member

But managing directories with filenames that have spaces in them with automated tools is extraordinarily error prone. I don't want to end up having to debug one day a script for doc generation (for example) because something downstream choked on a space. I've been bitten by that particular problem enough times in my life that I simply don't want to waste my time with that kind of problem. Practicality beats purity, in this case...

IPython member

I don't agree at all, and have much less patience or sympathy for shitty tools, but I will make the change.

I would think that handling simple sensible filenames is a reasonable expectation in this century.

IPython member

If you really feel strongly about it, leave it as-is. If we ever run into problems I'll just rename things outright; I'm not trying to upset you over a minor change, just trying to avoid possible annoyances down the road.

IPython member

It's not a big deal - I just get annoyed at how bad command-line tools are sometimes, and how *ix standard practices like this one tend to be based on how these tools suck at basic functionality.

I do think that if this is to be standard practice in the notebook, then we should proabably unescape underscores, and draw regular spaces in the notebook list. A listing full of underscores that are really spaces is pretty ugly, and simply doesn't fit in with the web today, where spaces are handled frequently and without issue.

minrk added some commits Apr 14, 2012
@minrk minrk add 'received' timestamp to DB
allows 'wall_time' to make sense in cases other than simple waiting AsyncResult.
bd8a8ec
@minrk minrk add AsyncResult.timedelta
for computing various comparisons of timestamps in AsyncResults
d004201
@minrk minrk remove spaces in progress notebook filename
appease crappy tools that can't deal with reasonable filenames.
978ac5b
@minrk minrk test new AsyncResult properties 66cc8f8
@minrk minrk document new AsyncResult properties 89a00db
@minrk
IPython member
minrk commented Apr 14, 2012

Okay, review addressed I think:

  • spaces removed from filename
  • new properties documented
  • some basic tests added
  • I went with adding received to the Hub's DB for resolving the wall_time issue for HubResults
  • I added AsyncResult.timedelta() for comparing a pair of timestamp sets, which is used in AR.wall_time.

Did I miss anything else?

@fperez

awesome, thanks for all these tests!

@fperez

And this is also great.

@minrk minrk add Client.spin_thread()
runs Client.spin() in a background thread at a set interval
472cd24
@minrk
IPython member
minrk commented Apr 15, 2012

Added Client.spin_thread(interval) / stop_spin_thread() for running spin in a background thread, to keep zmq queue clear.

@fperez
IPython member
fperez commented Apr 15, 2012

Great, merging now. Awesome job, thanks!

@fperez fperez merged commit e0b4311 into ipython:master Apr 15, 2012
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Something went wrong with that request. Please try again.