add sugar methods/properties to AsyncResult #1548

Merged
merged 8 commits into from Apr 15, 2012

Member

@minrk minrk commented Apr 4, 2012

Adds the following attributes / methods:

  • ar.wall_time = received - submitted
  • ar.serial_time = sum of serial computation time
  • ar.elapsed = time since submission (wall_time if done)
  • ar.progress = (int) number of sub-tasks that have completed
  • len(ar) = # of tasks
  • ar.wait_interactive(): prints progress

These are simple methods derived from the metadata/timestamps already created. But I've been persuaded by @wesm's practice of including simple methods that do useful (and/or cool) things.
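
As a rough illustration of how such values fall out of the timestamps, here is a minimal sketch that does not use IPython at all. The `tasks` list and the helper functions are hypothetical stand-ins for the per-task metadata (`submitted`, `started`, `completed`, `received`); the real properties live on AsyncResult and may handle unfinished tasks differently.

```python
from datetime import datetime, timedelta

# Hypothetical per-task timestamps, standing in for the metadata that is
# already recorded for each sub-task.
t0 = datetime(2012, 4, 4, 12, 0, 0)
tasks = [
    {"submitted": t0, "started": t0 + timedelta(seconds=1),
     "completed": t0 + timedelta(seconds=4), "received": t0 + timedelta(seconds=5)},
    {"submitted": t0, "started": t0 + timedelta(seconds=1),
     "completed": t0 + timedelta(seconds=3), "received": t0 + timedelta(seconds=4)},
    {"submitted": t0, "started": t0 + timedelta(seconds=4),
     "completed": None, "received": None},  # still running
]


def progress(tasks):
    """Number of sub-tasks that have completed."""
    return sum(1 for t in tasks if t["completed"] is not None)


def serial_time(tasks):
    """Sum of per-task computation time, over finished tasks only."""
    return sum((t["completed"] - t["started"]).total_seconds()
               for t in tasks if t["completed"] is not None)


def wall_time(tasks):
    """Last received minus first submitted; assumes every task is finished."""
    first_submitted = min(t["submitted"] for t in tasks)
    last_received = max(t["received"] for t in tasks)
    return (last_received - first_submitted).total_seconds()


finished = tasks[:2]
print(len(tasks), progress(tasks), serial_time(finished), wall_time(finished))
```

In this toy data the two finished tasks overlap, so their summed computation time and the roundtrip time happen to coincide; with more engines working in parallel the serial time would exceed the wall time.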

This also required/revealed some minor fixes/cleanup to clear_output in some cases:

  • dedent base core.displaypub.clear_output, so it's actually defined in the class
  • clear_output publishes '\r\b', so it will clear terminal-like frontends that don't understand full clear_output behavior.
  • core.display.clear_output() still works, even outside an IPython session.
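
The print-progress loop behind wait_interactive, combined with the carriage-return fallback for terminal-like frontends, can be sketched without IPython. This is a stand-in rather than the code from this PR: it uses `concurrent.futures` futures in place of sub-tasks and a bare `'\r'` in place of clear_output, and the function signature is an assumption chosen to mirror the description above.

```python
import sys
import time
from concurrent.futures import ThreadPoolExecutor, wait


def wait_interactive(futures, interval=0.1, timeout=-1, stream=sys.stdout):
    """Print progress until every future is done or the timeout expires.

    A negative timeout means "wait forever". Returns the number of
    finished futures.
    """
    n = len(futures)
    tic = time.time()
    while True:
        done = sum(f.done() for f in futures)
        elapsed = time.time() - tic
        # '\r' returns to the start of the line, so each report
        # overwrites the previous one in a terminal.
        stream.write("\r%4i/%i tasks finished after %4i s" % (done, n, elapsed))
        stream.flush()
        if done == n or (timeout >= 0 and elapsed > timeout):
            break
        wait(futures, timeout=interval)  # block for at most one interval
    stream.write("\n")
    return done


with ThreadPoolExecutor(max_workers=2) as pool:
    demo_futures = [pool.submit(time.sleep, 0.05) for _ in range(4)]
    wait_interactive(demo_futures, interval=0.02)
```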

Member Author

minrk commented Apr 4, 2012

At @fperez's request, I had a go at addressing the flicker when using clear_output in the notebook. I used a simple timeout, which gets cleared/flushed immediately on the next output-related action.
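
The idea of a deferred clear can be modelled roughly as follows. The notebook change itself is frontend code, so this Python class is only an illustration of the technique; `OutputArea`, its methods, and the default delay are all hypothetical names and values.

```python
import threading


class OutputArea:
    """Toy output area: clearing is delayed so old content stays visible
    until either new output arrives or the timeout expires."""

    def __init__(self, delay=0.2):
        self.lines = []
        self.delay = delay
        self._timer = None
        self._lock = threading.Lock()

    def clear_output(self):
        with self._lock:
            if self._timer is not None:
                self._timer.cancel()
            timer = threading.Timer(self.delay, self._on_timeout)
            timer.daemon = True
            self._timer = timer
            timer.start()

    def _on_timeout(self):
        with self._lock:
            # Skip if this timer was superseded or flushed while it was
            # waiting for the lock.
            if self._timer is threading.current_thread():
                self._timer = None
                self.lines = []

    def append_output(self, text):
        with self._lock:
            if self._timer is not None:
                # Flush the pending clear immediately, then show the new
                # output, so the area is never seen empty in between.
                self._timer.cancel()
                self._timer = None
                self.lines = []
            self.lines.append(text)


area = OutputArea(delay=5.0)
area.append_output("frame 1")
area.clear_output()            # "frame 1" is still displayed here
area.append_output("frame 2")  # pending clear is applied, then new output
print(area.lines)
```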

I won't object to moving that to a separate PR, but I did it here as it is related, and this new code makes significant use of the clear;print;repeat pattern for which the flicker is most problematic.

Member Author

minrk commented Apr 4, 2012

This notebook shows a few examples of print, the new AR.wait_interactive, and plotting with clear_output, all of which seem better behaved than before.

@minrk minrk mentioned this pull request Apr 9, 2012
Member Author

minrk commented Apr 9, 2012

This PR now depends on PR #1563

Member

fperez commented Apr 14, 2012

@minrk, I merged #1563 but now it's producing some conflicts. Could you rebase this one? I'll be happy to review it quickly...

Member

fperez commented Apr 14, 2012

And as I commented in that other one, we might as well make the notebook with examples part of the PR, so we beef up the notebooks we provide with the system for users. That will become even more useful once we start including their HTML in the docs, along with a link to the original .ipynb file, as we'll be able to point users to those pages in the docs.

Member Author

minrk commented Apr 14, 2012

rebased - I'll add the notebook in the morning, probably.

Member Author

minrk commented Apr 14, 2012

Progress notebook added, demos:

  • clear_output/print/flush
  • AsyncResult.wait_interactive
  • clear_output/display(Figure)
  • HTML/JS progress bar
  • PyMC ProgressBar class

"collapsed": false,
"input": [
"from IPython.core.display import clear_output",
"for i in range(10):",
Member

It's missing an import time here.

Member Author

ah, yes. I have import os,sys,time in my startup files, so I always forget to import them.

Member

fperez commented Apr 14, 2012

Minor comments inline, easy to fix up and we'll be good to go. This is awesome.

Member Author

minrk commented Apr 14, 2012

I made your edits to the notebook, and started a doc for the AsyncResult object. I really want to make it clear that the AsyncResult object itself is where most of our API lives. At least the nifty parts.

I have one question we should resolve before merging. In wall_time, I use received - submitted, which is the actual roundtrip time for the Client. The only problem with this one is that the received timestamp is not especially reliable, particularly in interactive use cases. The reason being that this timestamp is made when the result is pulled off of the queue in the Client, as a result of Client.spin(). So if the result of the computation arrives either while the user is sitting idle at an interactive prompt or just performing some client-side computation, that time is added to the wall_time measure.

This is a larger issue for AsyncHubResults (those fetched from the Hub), where the received stamp could technically be days after the computation finished.
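
The inflation described above can be reproduced with a toy model. `ToyClient` below is a hypothetical stand-in, not IPython's Client: the only behaviour it copies is that the received stamp is taken when spin() pulls a reply off the queue, not when the reply arrives. Times are plain floats in seconds to keep the example deterministic.

```python
from collections import deque


class ToyClient:
    """Hypothetical client that stamps 'received' inside spin()."""

    def __init__(self):
        self._incoming = deque()
        self.results = {}

    def deliver(self, msg_id, submitted, completed):
        """A reply arrives on the wire; it is queued but not yet stamped."""
        self._incoming.append((msg_id, submitted, completed))

    def spin(self, now):
        """Drain the queue, stamping each reply with the current time."""
        while self._incoming:
            msg_id, submitted, completed = self._incoming.popleft()
            self.results[msg_id] = {
                "submitted": submitted,
                "completed": completed,
                "received": now,
            }


client = ToyClient()
# Task submitted at t=0 s, finished on the engine at t=2 s.
client.deliver("task-0", submitted=0.0, completed=2.0)
# The user sits idle at the prompt; spin() only runs at t=60 s.
client.spin(now=60.0)

r = client.results["task-0"]
roundtrip = r["received"] - r["submitted"]    # includes the idle minute
compute_only = r["completed"] - r["submitted"]
print(roundtrip, compute_only)
```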

Possible partial solutions to this problem:

  • HubResults should use completed-submitted, ignoring result-reply overhead
  • all AsyncResults should use completed-submitted (most consistent, but probably not most useful/interesting)
  • HubResults should just not have a wall_time property
  • Add an analogue to received to the Hub's DB, and use that in HubResults
  • Leave it as-is, and let users deal with wall_time rarely being useful in HubResults (it would still be accurate if the HubResult is requested and waited upon while the computation is pending)

I'm personally inclined towards either one of the last two, but I want there to be a convenient

There are a couple of other timings that might be useful to add:

  • last completed - first started (actual time spent working, excluding overhead - I don't know what its name should be)
  • last completed - first submitted (wall_time excluding reply overhead, which may not be interesting or meaningful)
  • last received - first started (excludes start overhead, which could have just been waiting for other jobs in the queue)
  • idle time (total time spent on each engine not during a computation) - this might be computed per-engine to draw one of those pipeline concurrency figures.

I don't know how many of these I should actually include. Perhaps I should just make a method that makes it easy to do any of the last/first X - first/last Y deltas.
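
One possible shape for such a helper is sketched below under the hypothetical name `stamp_delta`. It takes two sets of timestamps plus a reducer for each (first or last), so every delta listed above becomes a one-liner. The sample timestamps are invented for illustration.

```python
from datetime import datetime, timedelta


def stamp_delta(start, end, start_key=min, end_key=max):
    """Seconds between two sets of timestamps.

    Each argument may be a single datetime or a collection of them;
    collections are reduced with start_key / end_key (first or last).
    """
    if not isinstance(start, datetime):
        start = start_key(start)
    if not isinstance(end, datetime):
        end = end_key(end)
    return (end - start).total_seconds()


t0 = datetime(2012, 4, 14, 9, 0, 0)


def at(seconds):
    return t0 + timedelta(seconds=seconds)


# Invented stamps for three sub-tasks.
submitted = [at(0), at(0), at(0)]
started = [at(1), at(2), at(5)]
completed = [at(4), at(6), at(9)]
received = [at(5), at(7), at(10)]

working = stamp_delta(started, completed)        # last completed - first started
no_reply = stamp_delta(submitted, completed)     # last completed - first submitted
no_queueing = stamp_delta(started, received)     # last received - first started
roundtrip_all = stamp_delta(submitted, received) # last received - first submitted
print(working, no_reply, no_queueing, roundtrip_all)
```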

appease crappy tools that can't deal with reasonable filenames.
allows 'wall_time' to make sense in cases other than simple waiting AsyncResult.
for computing various comparisons of timestamps in AsyncResults
Member Author

minrk commented Apr 14, 2012

Okay, review addressed I think:

  • spaces removed from filename
  • new properties documented
  • some basic tests added
  • I went with adding received to the Hub's DB for resolving the wall_time issue for HubResults
  • I added AsyncResult.timedelta() for comparing a pair of timestamp sets, which is used in AR.wall_time.

Did I miss anything else?

runs Client.spin() in a background thread at a set interval
Member Author

minrk commented Apr 15, 2012

Added Client.spin_thread(interval) / stop_spin_thread() for running spin in a background thread, to keep zmq queue clear.
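
The general pattern of a periodic background spin looks roughly like this. `SpinningClient` is a hypothetical toy whose spin() only counts calls; the method names follow the ones mentioned above, but the bodies are an assumption about one reasonable implementation, not the merged code.

```python
import threading
import time


class SpinningClient:
    """Toy client whose spin() just counts how often it was called."""

    def __init__(self):
        self.spin_count = 0
        self._stop_spinning = threading.Event()
        self._spin_thread = None

    def spin(self):
        self.spin_count += 1

    def _spin_every(self, interval):
        # Event.wait returns False on timeout, True once stop is requested.
        while not self._stop_spinning.wait(interval):
            self.spin()

    def spin_thread(self, interval=1.0):
        """Call spin() every `interval` seconds in a background thread."""
        if self._spin_thread is not None:
            self.stop_spin_thread()
        self._stop_spinning.clear()
        self._spin_thread = threading.Thread(
            target=self._spin_every, args=(interval,), daemon=True)
        self._spin_thread.start()

    def stop_spin_thread(self):
        """Stop the background thread, if one is running."""
        if self._spin_thread is not None:
            self._stop_spinning.set()
            self._spin_thread.join()
            self._spin_thread = None


spinner = SpinningClient()
spinner.spin_thread(interval=0.01)
time.sleep(0.2)
spinner.stop_spin_thread()
print(spinner.spin_count > 0)
```

Waiting on an Event instead of calling time.sleep lets stop_spin_thread() interrupt the wait at once, so shutdown does not have to sit out a full interval.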

Member

fperez commented Apr 15, 2012

Great, merging now. Awesome job, thanks!

fperez added a commit that referenced this pull request Apr 15, 2012
Add sugar methods/properties to AsyncResult that are generically useful:

* `ar.wall_time` = received - submitted
* `ar.serial_time` = sum of serial computation time
* `ar.elapsed` = time since submission (wall_time if done)
* `ar.progress` = (int) number of sub-tasks that have completed
* `len(ar)` = # of tasks
* `ar.wait_interactive()`: prints progress

These are simple methods derived from the metadata/timestamps already created.  But I've been persuaded by @wesm's practice of including simple methods that do useful (and/or cool) things.

This also required/revealed some minor fixes/cleanup to clear_output in some cases:

* dedent base `core.displaypub.clear_output`, so it's actually defined in the class
* clear_output publishes `'\r\b'`, so it will clear terminal-like frontends that don't understand full clear_output behavior.
* `core.display.clear_output()` still works, even outside an IPython session.

Added a new notebook that shows how to use these new methods and how to do simple animations/progress bars using `clear_output()`.

Added `Client.spin_thread(interval)` / `stop_spin_thread()` for running spin in a background thread, to keep zmq queue clear.  This can be used to ensure that timing information is as accurate as possible (at the cost of having a background thread active).
@fperez fperez merged commit e0b4311 into ipython:master Apr 15, 2012
mattvonrocketstein pushed a commit to mattvonrocketstein/ipython that referenced this pull request Nov 3, 2014
2 participants