# console capture and replay for notebooks

This notebook shows an example of a small utility that can help capture the output sent to stdout by a long-running process in a notebook.

In [70]:
from __future__ import print_function

from dl_helper import console_capture as cc

### preliminary check
The following cell should print

  `789...`

since each carriage return should make the text that follows overwrite all previous output. If it does not, this is possibly a [buggy notebook version](https://github.com/jupyter/notebook/issues/1970); update notebook to at least 4.3.1.


In [None]:
a = '123...\r456...\r789...'
print(a)

Backspaces are not correctly processed by notebooks, though (they are ignored). So this one should print

   `abc...`

but it will not:

In [71]:
a = '123...\r456...\r789\b\b\babc...'
print(a)

123...456...789abc...


# High-level API

## Process example

To use the high-level API we need to express our processing task in the form of a [callable](https://docs.python.org/3/library/functions.html#callable) (a function, method, or in general any object with a `__call__` [attribute](https://docs.python.org/3/reference/datamodel.html#object.__call__)). The callable can take arbitrary arguments, which lets us parametrize the processing or pass options to it.
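
As a quick illustration of what counts as a callable, the sketch below builds three of them: a plain function, an instance of a class that defines `__call__`, and a `functools.partial` with one argument pre-bound. The names `greet`, `Greeter` and `greet_calmly` are made up for this example and are not part of `dl_helper`.

```python
import functools

def greet(name, punctuation='!'):
    """A plain function is the simplest callable."""
    return 'Hello, ' + name + punctuation

class Greeter(object):
    """Instances are callable because the class defines __call__."""
    def __init__(self, punctuation):
        self.punctuation = punctuation

    def __call__(self, name):
        return greet(name, self.punctuation)

# partial() pre-binds arguments and returns another callable
greet_calmly = functools.partial(greet, punctuation='.')

for task in (greet, Greeter('?'), greet_calmly):
    assert callable(task)   # all three pass the built-in check
    print(task('notebook'))
```

Any of these three objects could in principle be handed to a wrapper that expects "a callable plus its arguments".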

In [72]:
import time

def processing_call( num, msg, **kwargs ):
    '''
    A function that basically does nothing, but takes its time to do 
    that nothing and prints out progress reports while it is doing it
      :param num: how many empty iterations to do
      :param msg: Message to be printed at processing end
      :param kwargs: Additional unused arguments, also to be printed at processing end
    '''
    start = time.time()
    # Outer loop: 'num' times
    for j in range(num):
        # Inner loop: 10 times
        for i in range(10):
            # Print progress reports. Use control characters to overwrite the line
            print( "\b\b\b\b\b\b\b\b\b\b\b\r", '{:2} {:2} {:5.2f}'.format( j+1, i+1, time.time()-start ), end=' ' )

            time.sleep(0.10)
        print('<iteration done>') # next line
        time.sleep(1)
    print( "Done! Msg={} Param={}".format(msg,kwargs) )

## Basic style
The standard procedure is creating a wrap object by passing it the callable with its required arguments ...

In [73]:
p = cc.ProcessWrap( processing_call, 4, "Hi", param="I'm a process" )

... and then calling the `start()` method on that object. This looks the same as executing the callable directly: the process is carried out, the cell shows **`[*]`** (i.e. working), the notebook appears as _busy_, and the progress reports printed by the callable are written back to the notebook.

In [74]:
p.start()

Launching process ... 
  1 10  0.91 <iteration done>
  2 10  2.93 <iteration done>
  3 10  4.95 <iteration done>
  4 10  6.97 <iteration done>
Done! Msg=Hi Param={'param': "I'm a process"}


But behind the scenes, the printed data has been written to a temporary file. Upon termination its contents are automatically loaded into an internal data object (and the file is automatically deleted), so they can be reprinted at will:

In [75]:
p.show()

----- STATUS: DONE -----
  1  1  0.00   1  2  0.10   1  3  0.20   1  4  0.30   1  5  0.40   1  6  0.50   1  7  0.60   1  8  0.71   1  9  0.81   1 10  0.91 <iteration done>
  2  1  2.01   2  2  2.11   2  3  2.22   2  4  2.32   2  5  2.42   2  6  2.52   2  7  2.63   2  8  2.73   2  9  2.83   2 10  2.93 <iteration done>
  3  1  4.04   3  2  4.14   3  3  4.24   3  4  4.34   3  5  4.44   3  6  4.54   3  7  4.64   3  8  4.74   3  9  4.85   3 10  4.95 <iteration done>
  4  1  6.05   4  2  6.15   4  3  6.26   4  4  6.36   4  5  6.46   4  6  6.56   4  7  6.6

### accessing raw data

The same output is also available in raw form through the `output` attribute of the object:

In [76]:
p.output

'\x08\x08\x08\x08\x08\x08\x08\x08\x08\x08\x08\r  1  1  0.00 \x08\x08\x08\x08\x08\x08\x08\x08\x08\x08\x08\r  1  2  0.10 \x08\x08\x08\x08\x08\x08\x08\x08\x08\x08\x08\r  1  3  0.20 \x08\x08\x08\x08\x08\x08\x08\x08\x08\x08\x08\r  1  4  0.30 \x08\x08\x08\x08\x08\x08\x08\x08\x08\x08\x08\r  1  5  0.40 \x08\x08\x08\x08\x08\x08\x08\x08\x08\x08\x08\r  1  6  0.50 \x08\x08\x08\x08\x08\x08\x08\x08\x08\x08\x08\r  1  7  0.60 \x08\x08\x08\x08\x08\x08\x08\x08\x08\x08\x08\r  1  8  0.71 \x08\x08\x08\x08\x08\x08\x08\x08\x08\x08\x08\r  1  9  0.81 \x08\x08\x08\x08\x08\x08\x08\x08\x08\x08\x08\r  1 10  0.91 <iteration done>\n\x08\x08\x08\x08\x08\x08\x08\x08\x08\x08\x08\r  2  1  2.01 \x08\x08\x08\x08\x08\x08\x08\x08\x08\x08\x08\r  2  2  2.11 \x08\x08\x08\x08\x08\x08\x08\x08\x08\x08\x08\r  2  3  2.22 \x08\x08\x08\x08\x08\x08\x08\x08\x08\x08\x08\r  2  4  2.32 \x08\x08\x08\x08\x08\x08\x08\x08\x08\x08\x08\r  2  5  2.42 \x08\x08\x08\x08\x08\x08\x08\x08\x08\x08\x08\r  2  6  2.52 \x08\x08\x08\x08\x08\x08\x08\x08\x08\

Obviously all those control characters make the data a bit difficult to read. We can use the provided `clean_string()` function, which applies backspaces (`\b`) and carriage returns (`\r`) as they would be processed were the data sent to standard output:

In [77]:
cc.clean_string( p.output )

'  1 10  0.91 <iteration done>\n  2 10  2.93 <iteration done>\n  3 10  4.95 <iteration done>\n  4 10  6.97 <iteration done>\nDone! Msg=Hi Param={\'param\': "I\'m a process"}\n'
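
To make the effect of those control characters concrete, here is a minimal sketch of how a terminal-like renderer could apply them. It is an assumption about the general technique, not the code of `clean_string()`: the function name `apply_control_chars` is hypothetical, and the sketch only handles `\b`, `\r` and `\n`.

```python
def apply_control_chars(text):
    """Render text the way a simple terminal would treat \\b, \\r and \\n."""
    lines = []   # finished lines
    line = []    # characters of the line being written
    pos = 0      # cursor column within the current line
    for ch in text:
        if ch == '\n':
            # newline: close the current line and start a fresh one
            lines.append(''.join(line))
            line, pos = [], 0
        elif ch == '\r':
            # carriage return: cursor to column 0, nothing is erased
            pos = 0
        elif ch == '\b':
            # backspace: cursor one column left, never past the line start
            pos = max(pos - 1, 0)
        else:
            # printable character: overwrite at the cursor or append
            if pos < len(line):
                line[pos] = ch
            else:
                line.append(ch)
            pos += 1
    lines.append(''.join(line))
    return '\n'.join(lines)

print(apply_control_chars('123...\r456...\r789\b\b\babc...'))
```

Applied to the two strings from the preliminary check, this sketch yields `789...` and `abc...` respectively, which is what a terminal would display.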

### closing and reopening the notebook window

More importantly, although the `start()` call seems to be blocking, it is actually just looping over a check-sleep cycle. The real process is carried out in a **background thread**, which will continue saving its results to the temporary file even if 
* the browser window is closed 
* or the kernel is interrupted (since that interruption will stop only the main thread, not the background thread)

Opening the notebook later on and executing the `show()` method will print out the results stored in the file. Moreover, it will reconnect the standard output of the process with the notebook cell, so that further output gets back to the notebook. We can test it with a longer-running process:

In [78]:
p2 = cc.ProcessWrap( processing_call, 50, "Hi" )

In [None]:
p2.start()  
# after launching execution of this cell, save the notebook & close the browser window 
# (confirm the warning about leaving the page)
# note: close the window, NOT the notebook

Launching process ... 
  1 10  0.91 <iteration done>
  2 10  2.93 <iteration done>


In [80]:
# Then come back again here, INTERRUPT the kernel and execute this cell. 
# It should print all output received so far, and continue from there
p2.show() 

----- STATUS: RUNNING -----
  1 10  0.91 <iteration done>
  2 10  2.93 <iteration done>
  3 10  4.95 <iteration done>
  4 10  6.97 <iteration done>
  5 10  8.98 <iteration done>
  6 10 10.99 <iteration done>
  7 10 13.00 <iteration done>
  8 10 15.02 <iteration done>
  9 10 17.04 <iteration done>
 10 10 19.06 <iteration done>
 11 10 21.09 <iteration done>
 12 10 23.11 <iteration done>
 13 10 25.14 <iteration done>
 14 10 27.16 <iteration done>
 15 10 29.17 <iteration done>
 16 10 31.20 6  3 30<iteration done>
 17 10 33.22 <iteration done>
 18 10 35.24 <iteration done>
 19 10 37.25 <iteration done>
 20 10 39.27 <iteration done>
 21 10 41.30 <iteration done>
 22 10 43.31 <iteration done>
 23 10 45.34 <iteration done>
 24 10 47.36 <iteration done>
 25 10 49.38 <iteration done>
 26 10 51.41 <iteration done>
 27 10 53.43 <iteration done>
 28 10 55.45 <iteration done>
 29 10 57.47 <iteration done>
 30 10 59.49 <iteration done>
 31 10 61.51 <iteration done>
 32 10 63.53 <iteration done>
 33 1

There is one caveat: as seen in the previous cell, the `start()` call was blocking the notebook (by doing the mentioned check-sleep cycle), so coming back to the notebook (opening its window again) will make it appear unresponsive. 

You will need to **interrupt** the kernel to stop that `start()` cell so that a `show()` cell can be executed. This is not as disruptive as it seems, since what is being interrupted is *the foreground process*, not the background thread that is doing the real processing. Nevertheless, if it is known for sure that the window will be closed and reopened, it is better to use the non-blocking call below.
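
The general pattern behind a blocking call of this kind (a worker thread that writes to a file, plus a foreground check-sleep loop) can be sketched with the standard library alone. This is an illustrative sketch for Python 3, not the `ProcessWrap` implementation; `run_in_background`, `block_until_done` and `slow_task` are hypothetical names.

```python
import contextlib
import os
import tempfile
import threading
import time

def run_in_background(func, *args, **kwargs):
    """Start func in a thread whose printed output goes to a temporary file."""
    fd, logname = tempfile.mkstemp(suffix='.log')
    os.close(fd)

    def worker():
        # Caveat: redirect_stdout swaps sys.stdout for the whole process,
        # so this sketch is only safe while no other thread is printing.
        with open(logname, 'w', newline='') as log:
            with contextlib.redirect_stdout(log):
                func(*args, **kwargs)

    thread = threading.Thread(target=worker)
    thread.start()
    return thread, logname

def block_until_done(thread, poll=0.05):
    """Check-sleep cycle: interrupting it stops the wait, not the worker."""
    try:
        while thread.is_alive():
            time.sleep(poll)
    except KeyboardInterrupt:
        print('Stopped waiting; the worker thread keeps running')

def slow_task(steps):
    for i in range(steps):
        print('step', i + 1)
        time.sleep(0.01)

thread, logname = run_in_background(slow_task, 3)
block_until_done(thread)

# Once the worker is done, load the file contents and remove the file
with open(logname, newline='') as log:
    captured = log.read()
os.unlink(logname)
print(captured, end='')
```

Because a `KeyboardInterrupt` is delivered to the main thread, only the waiting loop is affected; the worker keeps writing to its file until the task finishes.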

## Non-blocking usage

This mode works the same way, except that the `block=False` parameter is added to the `start()` method: 

In [84]:
pn = cc.ProcessWrap( processing_call, 50, "Look! NonBlocking!" )

In [85]:
pn.start( block=False )
# Now the notebook appears free to use (not busy), though the background thread 
# is still pushing text to the cell output
# We can save & close the notebook here. There will still be a "confirm" warning, probably.

Launching process ... 
  1 10  0.91 <iteration done>
  2 10  2.93 <iteration done>
  3 10  4.95 <iteration done>


In [90]:
# After coming back to the notebook, in this modality it is no longer necessary to interrupt the kernel. 
# Just execute this cell to reconnect with the processing cell output
pn.show()

----- STATUS: DONE -----
  1  1  0.00   1  2  0.10   1  3  0.20   1  4  0.30   1  5  0.40   1  6  0.50   1  7  0.61   1  8  0.71   1  9  0.81   1 10  0.91 <iteration done>
  2  1  2.02   2  2  2.12   2  3  2.22   2  4  2.32   2  5  2.42   2  6  2.52   2  7  2.63   2  8  2.73   2  9  2.83   2 10  2.93 <iteration done>
  3  1  4.03   3  2  4.14   3  3  4.24   3  4  4.34   3  5  4.44   3  6  4.54   3  7  4.64   3  8  4.75   3  9  4.85   3 10  4.95 <iteration done>
  4  1  6.06   4  2  6.17   4  3  6.27   4  4  6.37   4  5  6.47   4  6  6.57   4  7  6.6

In [91]:
# We can still print the collected output, even after it has finished
pn.show()

----- STATUS: DONE -----
  1  1  0.00   1  2  0.10   1  3  0.20   1  4  0.30   1  5  0.40   1  6  0.50   1  7  0.61   1  8  0.71   1  9  0.81   1 10  0.91 <iteration done>
  2  1  2.02   2  2  2.12   2  3  2.22   2  4  2.32   2  5  2.42   2  6  2.52   2  7  2.63   2  8  2.73   2  9  2.83   2 10  2.93 <iteration done>
  3  1  4.03   3  2  4.14   3  3  4.24   3  4  4.34   3  5  4.44   3  6  4.54   3  7  4.64   3  8  4.75   3  9  4.85   3 10  4.95 <iteration done>
  4  1  6.06   4  2  6.17   4  3  6.27   4  4  6.37   4  5  6.47   4  6  6.57   4  7  6.6

## Additional options

### verbose

Adding `verbose=False` to the `start()` method will suppress output to the notebook (the output is still saved to the temporary file).

In [92]:
pq = cc.ProcessWrap( processing_call, 3, "Now Quiet!" )
pq.start( verbose=False )
# This will take a few seconds to run, with no output produced while it is running

In [93]:
# After finishing, we can now print the captured data
pq.show()

----- STATUS: DONE -----
  1  1  0.00   1  2  0.10   1  3  0.20   1  4  0.30   1  5  0.40   1  6  0.50   1  7  0.60   1  8  0.70   1  9  0.80   1 10  0.90 <iteration done>
  2  1  2.00   2  2  2.10   2  3  2.20   2  4  2.30   2  5  2.40   2  6  2.51   2  7  2.61   2  8  2.71   2  9  2.81   2 10  2.91 <iteration done>
  3  1  4.01   3  2  4.11   3  3  4.21   3  4  4.31   3  5  4.41   3  6  4.51   3  7  4.61   3  8  4.71   3  9  4.81   3 10  4.91 <iteration done>
Done! Msg=Now Quiet! Param={}


### delete

The temporary file is automatically deleted upon task termination. This is usually not a problem because the data in the file is loaded into the `ProcessWrap` object, and it can be printed via `show()` (or retrieved through the `output` attribute). But of course it will be lost if the notebook kernel is terminated (_close and halt_). If the processing is very long or important, as a safety measure the `delete=False` parameter to `start()` will cancel file removal.

In [94]:
pk = cc.ProcessWrap( processing_call, 3, "Not deleted!" )
pk.start( delete=False )

Launching process ... 
  1 10  0.92 <iteration done>
  2 10  2.93 <iteration done>
  3 10  4.95 <iteration done>
Done! Msg=Not deleted! Param={}


In [95]:
# See the filename that was used
pk.logname

'/vm/Vagrant/machine-learning-vm/notebook/vmfiles/Soft/docker-dl-gpu/dl-helper/notebook/notebook-v82kimho.log'

In [96]:
# The data is available for posterity in the logfile
# We use the io.open 'newline' argument so that (in Python 3) we avoid newline translation
import io
with io.open(pk.logname, "r", newline='') as f:
    print( f.read() )

  1  1  0.00   1  2  0.10   1  3  0.20   1  4  0.31   1  5  0.41   1  6  0.51   1  7  0.61   1  8  0.71   1  9  0.81   1 10  0.92 <iteration done>
  2  1  2.02   2  2  2.12   2  3  2.22   2  4  2.32   2  5  2.42   2  6  2.53   2  7  2.63   2  8  2.73   2  9  2.83   2 10  2.93 <iteration done>
  3  1  4.04   3  2  4.14   3  3  4.24   3  4  4.35   3  5  4.45   3  6  4.55   3  7  4.65   3  8  4.75   3  9  4.85   3 10  4.95 <iteration done>
Done! Msg=Not deleted! Param={}



In [97]:
# We can still delete it manually, of course
import os
os.unlink( pk.logname )

### transform
The `transform=` argument to the `start()` call allows applying a processing function to all output that is to be sent to console, implicitly or explicitly via the `show()` method (but **not** to the content saved to file: the logfile will still contain the exact output that was generated).

This might be useful to yield output better suited for display in the notebook (note, however, that if the output makes use of control characters, as progress bars do, it is difficult to add text formatting).
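
The split between "raw text to the log, transformed text to the console" can be pictured with a small tee-like object. The sketch below is an assumption about the general idea, not the library's code; `TransformTee` is a hypothetical class, and in-memory `io.StringIO` buffers stand in for the logfile and the notebook output.

```python
import io

class TransformTee(object):
    """File-like object: raw text goes to `log`, transformed text to `console`."""
    def __init__(self, log, console, transform=None):
        self.log = log
        self.console = console
        self.transform = transform

    def write(self, text):
        # The log always receives the text exactly as it was produced
        self.log.write(text)
        # The console receives the text after the optional transform
        shown = self.transform(text) if self.transform else text
        self.console.write(shown)
        return len(text)

    def flush(self):
        self.log.flush()
        self.console.flush()

log, console = io.StringIO(), io.StringIO()
tee = TransformTee(log, console, transform=lambda s: s.upper())
print('iteration done', file=tee)

print(repr(log.getvalue()))      # raw text, as stored
print(repr(console.getvalue()))  # transformed text, as displayed
```

Note that the transform is applied to each chunk passed to `write()`, so a transform that needs to see whole lines would have to buffer its input first.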

In [98]:
pk = cc.ProcessWrap( processing_call, 3, "transform!" )

# Make all text uppercase
pk.start( transform=lambda x: x.upper() )

Launching process ... 
  1 10  0.92 <ITERATION DONE>
  2 10  2.94 <ITERATION DONE>
  3 10  4.95 <ITERATION DONE>
DONE! MSG=TRANSFORM! PARAM={}


# Low-level API
The `ProcessWrap` object functionality is implemented via lower level objects:

* `OutputDest` is a file-like object that substitutes _stdout_ & _stderr_ to perform the data capture
* `ConsoleCapture` provides an object that uses `OutputDest`, and `ConsoleCaptureCtx` a context manager for it
* `ProcessingThread` provides the processing thread that encapsulates its output via `ConsoleCaptureCtx`

These objects can be used directly if more specialised treatment is needed.
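
To give an idea of how a file-like substitute for _stdout_ and _stderr_ can be combined with a context manager, here is a minimal standard-library sketch. It is not the code of `OutputDest` or `ConsoleCaptureCtx`: `CaptureSketch` is a hypothetical class, it keeps the captured text in memory rather than in a logfile, and its `verbose` flag only mimics the behaviour described earlier.

```python
import io
import sys

class CaptureSketch(object):
    """Context manager that records what is printed while it is active."""
    def __init__(self, verbose=True):
        self.verbose = verbose
        self.data = ''

    def __enter__(self):
        # Remember the real streams and put this object in their place
        self._saved = sys.stdout, sys.stderr
        self._buffer = io.StringIO()
        sys.stdout = sys.stderr = self
        return self

    def write(self, text):
        self._buffer.write(text)
        if self.verbose:
            # Echo to the original stdout as well
            self._saved[0].write(text)
        return len(text)

    def flush(self):
        self._saved[0].flush()

    def __exit__(self, exc_type, exc_value, traceback):
        # Always restore the real streams, even if the body raised
        sys.stdout, sys.stderr = self._saved
        self.data = self._buffer.getvalue()
        return False

with CaptureSketch(verbose=False) as ctx:
    print('captured line')
    print('captured error', file=sys.stderr)

print(repr(ctx.data))
```

Restoring the streams in `__exit__` is what makes the context-manager form convenient: the capture ends cleanly even when the wrapped code fails.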

### ConsoleCaptureCtx


In [99]:
import datetime

with cc.ConsoleCaptureCtx(verbose=False) as ctx1:
    for i in range(15):
        print( '{:3}  {}'.format(i+1, datetime.datetime.now()) )

In [100]:
ctx1.reprint();

  1  2019-05-09 15:06:11.663762
  2  2019-05-09 15:06:11.663816
  3  2019-05-09 15:06:11.663835
  4  2019-05-09 15:06:11.663851
  5  2019-05-09 15:06:11.663865
  6  2019-05-09 15:06:11.663881
  7  2019-05-09 15:06:11.663896
  8  2019-05-09 15:06:11.663986
  9  2019-05-09 15:06:11.664227
 10  2019-05-09 15:06:11.664242
 11  2019-05-09 15:06:11.664251
 12  2019-05-09 15:06:11.664260
 13  2019-05-09 15:06:11.664269
 14  2019-05-09 15:06:11.664278
 15  2019-05-09 15:06:11.664287


In [101]:
ctx1.data

'  1  2019-05-09 15:06:11.663762\n  2  2019-05-09 15:06:11.663816\n  3  2019-05-09 15:06:11.663835\n  4  2019-05-09 15:06:11.663851\n  5  2019-05-09 15:06:11.663865\n  6  2019-05-09 15:06:11.663881\n  7  2019-05-09 15:06:11.663896\n  8  2019-05-09 15:06:11.663986\n  9  2019-05-09 15:06:11.664227\n 10  2019-05-09 15:06:11.664242\n 11  2019-05-09 15:06:11.664251\n 12  2019-05-09 15:06:11.664260\n 13  2019-05-09 15:06:11.664269\n 14  2019-05-09 15:06:11.664278\n 15  2019-05-09 15:06:11.664287\n'

### ConsoleCapture

In [102]:
cc2 = cc.ConsoleCapture()

In [103]:
# Print something (that will not be captured)
print( "A")
print( "B" )

A
B


In [104]:
# Start capturing, write something and stop capturing
cc2.start()

print( "CDE...\r", end='')
print( "...FGH")
print( "IJK")

cc2.stop();

CDE......FGH
IJK


In [105]:
# Print what was captured, and remove the logfile
cc2.reprint().remove();

CDE......FGH
IJK
