# bokeh in the notebook: managing the resource lifecycle


## About this notebook

This notebook belongs to a series of small projects whose aim is to evaluate the [Jupyter](http://jupyter.org/) ecosystem for science experiments control. The main idea is to use the _Jupyter notebook_ as a convergence platform in order to offer a fully featured environment to scientists.

## About bokeh

Experiments control requires both static and dynamic (i.e. live) data visualization. Since Jupyter doesn't provide any 'official' data visualization solution, we need to select one. Among the available solutions, [bokeh](http://bokeh.pydata.org/en/latest) presents the highest potential for our application.

Bokeh has been selected for its:
1. [built-in notebook integration](http://bokeh.pydata.org/en/latest/docs/user_guide/notebook.html)
2. built-in [data streaming](http://bokeh.pydata.org/en/latest/docs/reference/models/sources.html#bokeh.models.sources.ColumnDataSource.patch) [features](http://bokeh.pydata.org/en/latest/docs/reference/models/sources.html#bokeh.models.sources.ColumnDataSource.stream) for live plot updates
3. ability to add [custom or specialized behaviors](http://bokeh.pydata.org/en/latest/docs/user_guide/interaction/callbacks.html) in response to property changes and other events
4. [graphics quality](http://bokeh.pydata.org/en/latest/docs/gallery.html#gallery)

Have a look at this [quickstart](http://bokeh.pydata.org/en/latest/docs/user_guide/quickstart.html) for a bokeh overview.

## Topic of the day

The following content points out a problem we faced while evaluating bokeh in the jupyter notebook. To summarize, let's say that bokeh works really well and fulfills our requirements but it currently (*) has some side-effects that make things diverge in terms of performance and memory consumption.

(*) true for bokeh version <= 0.12.6

So let's see what we are talking about...  

### Route bokeh outputs to notebook cells
This will also load BokehJS - the JavaScript part of bokeh.

In [None]:
from bokeh.io import output_notebook
output_notebook()

### BokehSessionHandler class
See 'Lifecycle' in [bokeh server architecture](http://bokeh.pydata.org/en/latest/docs/dev_guide/server.html#devguide-server).
We simply use it to trace the bokeh server events but it might be a bit more useful in the future.

In [None]:
from __future__ import print_function

from bokeh.application.handlers import Handler

class BokehSessionHandler(Handler):

    def on_server_loaded(self, server_context):
        print("SessionHandler: on_server_loaded <<")
        print("SessionHandler: on_server_loaded >>")

    def on_server_unloaded(self, server_context):
        print("SessionHandler: on_server_unloaded <<")
        print("SessionHandler: on_server_unloaded >>")

    def on_session_created(self, session_context):
        print("SessionHandler: on_session_created <<")
        BokehServer.print_info(True)
        print("SessionHandler: on_session_created >>")

    def on_session_destroyed(self, session_context):
        print("SessionHandler: on_session_destroyed <<")
        print("SessionHandler: on_session_destroyed >>")


### BokehSession class

This is the super class of any _session_ we open on the `BokehServer` singleton. 

Our model is based on a _'one session per notebook cell'_ approach. It means that each _session_ is tightly linked to a particular cell. This is a good thing because we obviously want the bokeh plots to appear as _outputs_ of the cell from which they were created. More generally, we'll certainly want every _output_ related to a session to be routed to its associated cell. That's the next 'topic of the day' we'll treat.

See also `BokehServer.open_session`.

In [None]:
class BokehSession(object):
    
    def __init__(self):
        """the associated bokeh document (set by 'friend' class BokehServer - for experts only)"""
        self._doc = None
        """periodic callback period in seconds - defaults to None (i.e. periodic callback disabled)"""
        self._callback_period = None
        
    def open(self):
        """open the session"""
        BokehServer.open_session(self)
        
    def close(self):
        """close the session"""
        BokehServer.close_session(self)
        
    def setup_model(self):
        """return the bokeh model - i.e. plot(s) or layout - to be attached to the session or None if no model"""
        return None

    def periodic_callback(self):
        """return the periodic callback or None if the session has no periodic activity"""
        return None
    
    @property 
    def callback_period(self):
        """return the (periodic) callback period in seconds or None (i.e. periodic callback disabled)"""
        return self._callback_period

    @callback_period.setter 
    def callback_period(self, p):
        """set the (periodic) callback period in seconds or None to disable the callback"""
        self._callback_period = p
        if self._doc is not None:
            BokehServer.update_callback_period(self)

### BokehServer class

Embedded bokeh server. Private singleton.  

In [None]:
import socket
from collections import deque

from IPython.display import HTML, clear_output, display

from tornado.ioloop import IOLoop
from bokeh.server.server import Server
from bokeh.application import Application
from bokeh.application.handlers import FunctionHandler
from bokeh.embed import autoload_server
from bokeh.io import reset_output

class BokehServer(object):

    __bkh_app__ = None
    __bkh_srv__ = None
    __srv_url__ = None
    __sessions__ = deque()
        
    @staticmethod
    def __start_server():
        app = Application(FunctionHandler(BokehServer.__entry_point))
        app.add(BokehSessionHandler())
        srv = Server(
            {'/': app},
            io_loop=IOLoop.instance(),
            port=0,
            host='*',
            allow_websocket_origin=['*']
        )
        srv.start()
        srv_addr = srv.address if srv.address else socket.gethostbyname(socket.gethostname())
        BokehServer.__bkh_srv__ = srv
        BokehServer.__bkh_app__ = app
        BokehServer.__srv_url__ = 'http://{}:{}'.format(srv_addr, srv.port)
        
    @staticmethod
    def __entry_point(doc):
        try:
            session = BokehServer.__sessions__.pop() #TODO: should we lock BokehServer.__sessions__? 
            session._doc = doc
            model = session.setup_model()
            if model:
                doc.add_root(model)
            BokehServer.__add_periodic_callback(session)
        except Exception as e:
            print(e)
        
    @staticmethod
    def __add_periodic_callback(session):
        assert(isinstance(session, BokehSession))
        pcb = session.periodic_callback
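        try:
            # remove the previously installed callback (if any) before re-adding it
            session._doc.remove_periodic_callback(pcb)
        except Exception:
            # no callback installed yet - nothing to remove
            pass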
        prd = session.callback_period
        if prd is not None:
            session._doc.add_periodic_callback(pcb, max(250, 1000. * prd))
        
    @staticmethod
    def open_session(new_session):
        assert(isinstance(new_session, BokehSession))
        if not BokehServer.__bkh_srv__:
            BokehServer.__start_server()
        BokehServer.__sessions__.appendleft(new_session) #TODO: should we lock BokehServer.__sessions__? 
        script = autoload_server(model=None, url=BokehServer.__srv_url__)
        html_display = HTML(script)
        display(html_display)
        
    @staticmethod
    def close_session(session):
        """totally experimental attempt to destroy a session from python!"""
        assert(isinstance(session, BokehSession))
        session_id = session._doc.session_context.id
        print("trying to destroy session '{}'".format(session_id))
        session = BokehServer.__bkh_srv__.get_session('/', session_id)
        session.destroy()
        print("session '{}' successfully destroyed".format(session_id))
        
    @staticmethod
    def update_callback_period(session):
        assert(isinstance(session, BokehSession))
        BokehServer.__add_periodic_callback(session)
        
    @staticmethod
    def print_info(called_from_session_handler=False):
        if not BokehServer.__bkh_srv__:
            print("no Bokeh server running") 
            return
        try:
            print("Bokeh server URL: {}".format(BokehServer.__srv_url__))
            sessions = BokehServer.__bkh_srv__.get_sessions()
            num_sessions = len(sessions)
            if called_from_session_handler:
                num_sessions += 1
            print("Number of opened sessions: {}".format(num_sessions))
        except Exception as e:
            print(e)
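
The two `#TODO` comments above ask whether `BokehServer.__sessions__` should be locked. As far as I know, CPython documents `deque` appends and pops as thread-safe, so the lock is probably not required. Still, here is a minimal sketch of the same FIFO hand-off with an explicit lock and a clear error when no session is pending. `PendingSessions` is a hypothetical helper, not part of bokeh nor of this notebook.

```python
from collections import deque
from threading import Lock


class PendingSessions(object):
    """FIFO hand-off of sessions waiting for their bokeh document (hypothetical helper)."""

    def __init__(self):
        self._queue = deque()
        self._lock = Lock()

    def push(self, session):
        # called when a cell opens a session (see BokehServer.open_session)
        with self._lock:
            self._queue.appendleft(session)

    def pop(self):
        # called from the server entry point: the oldest pending session is served first
        with self._lock:
            if not self._queue:
                raise LookupError("no pending session for this document")
            return self._queue.pop()

    def __len__(self):
        with self._lock:
            return len(self._queue)


pending = PendingSessions()
pending.push("s1")
pending.push("s2")
first = pending.pop()   # the first session opened
second = pending.pop()  # the second session opened
```

The `appendleft`/`pop` pair is what makes the hand-off first-in, first-out: the document created for the first opened cell gets the first opened session.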
            

### MySession class
A user specialization of the `BokehSession`.

In [None]:
import numpy as np

from bokeh.plotting import figure
from bokeh.plotting.figure import Figure
from bokeh.models.glyphs import Rect
from bokeh.models import ColumnDataSource
from bokeh.models.widgets import Slider
from bokeh.layouts import layout, widgetbox

class MySession(BokehSession):
    
    def __init__(self):
        BokehSession.__init__(self)
        self.callback_period = 1.
        self._np = 100
        self._widgets_layout = None
        columns = dict()
        columns['x'] = self._gen_x_scale()
        columns['y'] = self._gen_random_data()
        self._cds = ColumnDataSource(data=columns)

    def _gen_x_scale(self):
        """x data"""
        return np.linspace(1, self._np, num=self._np, endpoint=True)
    
    def _gen_random_data(self):
        """y data"""
        return np.random.rand(self._np)
    
    def __on_refresh_period_change(self, attr, old, new):
        """called when the user changes the refresh period using the dedicated slider"""
        self.callback_period = new
        
    def __on_num_points_change(self, attr, old, new):
        """called when the user changes the number of points using the dedicated slider"""
        self._np = int(new)

    def setup_model(self):
        """setup the session model then return it"""
        rrs = Slider(start=0.25, end=2, value=self.callback_period, step=0.25, title="Refresh period [s]")
        rrs.on_change("value", self.__on_refresh_period_change)
        nps = Slider(start=0, end=1000, value=self._np, step=10, title="Num. points")
        nps.on_change("value", self.__on_num_points_change)
        p = figure(plot_width=650, plot_height=200)
        p.toolbar_location = 'above'
        p.line(x='x', y='y', source=self._cds, color="navy", alpha=0.5)
        self._widgets_layout = widgetbox(nps, rrs)
        return layout([[self._widgets_layout, p]])
    
    def periodic_callback(self):
        """periodic callback"""
        self._cds.data.update(x=self._gen_x_scale(), y=self._gen_random_data())
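
Note that the sliders and `callback_period` work in seconds, whereas `BokehServer.__add_periodic_callback` hands milliseconds to bokeh with a 250 ms floor. A small sketch of that conversion, with a hypothetical helper name (`period_to_ms`), makes the rule explicit:

```python
def period_to_ms(period_s, floor_ms=250):
    """Convert a period in seconds to milliseconds, never going below floor_ms.

    None means 'periodic callback disabled' and is passed through unchanged.
    """
    if period_s is None:
        return None
    return max(floor_ms, 1000. * period_s)


one_second = period_to_ms(1.)   # above the floor: converted as is
too_fast = period_to_ms(0.1)    # below the floor: clamped to 250 ms
disabled = period_to_ms(None)   # callback disabled
```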

### SC1: let's open a first session...

In [None]:
s1 = MySession()
s1.open()

### SC2: open a second session...

In [None]:
s2 = MySession()
s2.open()

### What if we re-execute SC1 and/or SC2?
We now have two running sessions and everything works properly. However, what if we re-execute SC1 or SC2?

Please, do so and see how many running sessions we have...

We now have 3 sessions running! It means that re-executing the cell doesn't magically clean up the previous session. The same applies if we `clear` the cell output (see `Cell` menu > `Current Outputs` > `Clear`).

The bad news is that things will clearly diverge after a few dozen re-executions of SC1 (and/or SC2) because the zombie sessions continue to run in the background - generating some CPU load and memory leaks.

So, the big question is: is there a way to deal with this? Is there a mechanism to properly clean up a session when the cell to which it's attached is re-executed or cleared?

### Discussion

#### The Jupyter Notebook part of the problem
So far, the jupyter notebook doesn't provide any notification mechanism that could help in our case. There's no way to attach an action callback and some user data to a cell. We could imagine something like:
1. attach `my_cleanup_callback` to the _current cell_ for the `execute` and the `clear` actions,
2. pass `my_internal_scheming_data` as an argument when `my_cleanup_callback` is triggered.  

With such a mechanism we could easily retrieve the cell content and properly release the associated resources.
There's certainly a smarter/more adapted/... solution but that's the idea. 
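
In the absence of such a notification, the idea can be partially emulated on the Python side if we accept giving each cell an explicit key. The sketch below is only an assumption of how that could look: `CellSessions` and `FakeSession` are hypothetical names, and `FakeSession` stands in for a `BokehSession` so the example runs without a bokeh server.

```python
class CellSessions(object):
    """Keeps at most one live session per user-chosen cell key (hypothetical helper)."""

    def __init__(self):
        self._by_key = {}

    def open(self, key, session):
        """Close the session previously opened under `key`, if any, then open the new one."""
        previous = self._by_key.pop(key, None)
        if previous is not None:
            try:
                previous.close()
            except Exception as e:
                # a failed cleanup must not prevent the new session from opening
                print(e)
        self._by_key[key] = session
        session.open()
        return session

    def __len__(self):
        return len(self._by_key)


class FakeSession(object):
    """Stand-in for a BokehSession: only records whether it is opened or closed."""

    def __init__(self):
        self.state = 'new'

    def open(self):
        self.state = 'opened'

    def close(self):
        self.state = 'closed'


cells = CellSessions()
a = cells.open('SC1', FakeSession())
b = cells.open('SC1', FakeSession())  # simulates re-executing SC1
```

In the notebook, SC1 would then read `s1 = cells.open('SC1', MySession())`. This only covers re-execution (not `clear`), and it is only as good as what `close` actually releases.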

In our case, one could argue that adding a `close` button to the layout returned by `MySession.setup_model` could partially solve the problem. That's true as long as the user uses this button to close the session before re-executing the cell. IMHO, that's ugly and error prone. However, in the next section, the `MyExtendedSession` class adds such a close button in order to be able to work on the bokeh part of the problem.

#### The bokeh part of the problem
Let's now pretend we have a way to be notified when it's time to clean up our session(s). Ok, but does bokeh offer a way to properly clean up a server session? So far, no. There's currently no solution to that problem. It seems that the next bokeh release - i.e. 0.12.7 - will notably focus on performance and memory management but for now, we can't provide the scientists with something usable in production.

Releasing the resources on the python side is not the biggest part of the problem. The main problem is the memory leaks generated in the browser on the JavaScript side (a.k.a. BokehJS).

We tried two different designs:
- the present one, based on the `BokehServer` class
- and [this one](https://github.com/nleclercq/jupyter-for-controls/tree/master/bokeh-data-streaming-for-notebook) in which we were using a _"one server per session per model"_ approach.

In both cases, we could find a solution to properly release the session resources on the python side but not on the JS side.

### MyExtendedSession class: trying to do something to properly destroy a session from python 

We now reach the heart of the problem we are trying to address in this study. 

The idea here is to add a `close` button whose `on_click` callback triggers a cleanup process. The latter contains everything we found useful - in the bokeh API - to release the session resources (i.e. that's just an attempt):

- [Document.clear](http://bokeh.pydata.org/en/latest/docs/reference/document.html#bokeh.document.Document.clear)
- [reset_output](http://bokeh.pydata.org/en/latest/docs/reference/io.html#bokeh.io.reset_output)
- [not documented Session.destroy](https://github.com/bokeh/bokeh/blob/master/bokeh/server/session.py). See `BokehServer.close_session` above.
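
Each of these calls can fail on its own, so the cleanup should be written in such a way that one failure doesn't prevent the following steps from running. A minimal sketch of that pattern, independent of bokeh, could look like this. `run_cleanup_steps` and `failing_step` are hypothetical names, and the steps are dummies standing in for the three calls listed above.

```python
def run_cleanup_steps(steps):
    """Run every (name, callable) step in order; return the list of (name, exception) failures."""
    failures = []
    for name, step in steps:
        try:
            step()
        except Exception as e:
            # remember the failure but keep going with the remaining steps
            failures.append((name, e))
    return failures


def failing_step():
    """Dummy step simulating a cleanup call that raises."""
    raise RuntimeError("nothing to clear")


done = []
failures = run_cleanup_steps([
    ('clear document', failing_step),
    ('reset output', lambda: done.append('reset output')),
    ('destroy session', lambda: done.append('destroy session')),
])
```

Here the first step fails but the two others still run, and `failures` tells us afterwards what went wrong.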

In [None]:
from bokeh.models.widgets import Button

class MyExtendedSession(MySession):
    
    def __init__(self):
        MySession.__init__(self)

    def setup_model(self):
        model = super(MyExtendedSession, self).setup_model()
        b = Button(label='close')
        b.on_click(self.close)
        self._widgets_layout.children.append(b)
        return model
    
    def periodic_callback(self):
        print('MyExtendedSession.periodic_callback <<')
        super(MyExtendedSession, self).periodic_callback()
        print('MyExtendedSession.periodic_callback >>')
   
    def close(self):
        """overrides super.close - tries to cleanup everything properly - at least on python side"""
        try:
            # clear document content (i.e. remove roots)
            self._doc.clear()
        except Exception as e:
            # report the error and carry on with the next cleanup step
            print(e)
        try:
            # reset_output 
            reset_output()
        except Exception as e:
            print(e)
        try:  
            # close the session (will remove all callbacks)
            BokehServer.close_session(self)
        except Exception as e:
            print(e)
        # finally clear cell outputs - e.g. logging (this is an ipython call - not a bokeh one)
        clear_output()

In [None]:
s3 = MyExtendedSession()
s3.open()

In [None]:
BokehServer.print_info()

### About ServerSession.destroy
As far as I understand, `ServerSession.destroy` is normally called - through a `tornado connection` - as a consequence of a `close` request from the client (e.g. when the browser tab is closed or something similar). It means that `ServerSession.destroy` only releases the python resources. Clearly, this can't be the solution we are looking for.

## Conclusion

The conclusion is quite clear: we need something to properly clean up every single resource attached to a server session on both the python and JS sides from python itself. My Bokeh/JS knowledge doesn't allow me to propose something but I'm sure that the bokeh gurus see what I'm talking about. I hope to be able to update this notebook with some good news once 0.12.7 is released.


BTW, the content of this notebook is certainly obvious for bokeh gurus but for the rest of us it could be useful. I personally learned A LOT about bokeh doing this work. Hope this helps you too guys.