# Interactive geographical visualisation of MONROE data

## Features

This notebook enables on-demand visualisation of data collected with the MONROE platform on a geographical map. It provides the following features:
* Fast visualisation of multiple parameters of data collected on MONROE nodes along the geographical dimensions.
* Adaptive granularity, where the data resolution is adjusted by the user.
* Selection and on-disk storage of visualised data for further analysis in the Orange data mining toolbox.

## Prerequisites

### Database access
The Cassandra DB used for the central MONROE data repository is a NoSQL database that is poorly suited to time-series data mining. Instead, this notebook requires the data to be stored in an Influx DB accessible from the machine where the notebook is run. Influx DB __[performs up to 168 times faster than Cassandra DB for certain queries](https://www.influxdata.com/blog/influxdb-vs-cassandra-time-series/)__. To create a replica of MONROE data in your local Influx DB:
> 1) Create recipes for loading and naming the MONROE data tables and attributes that you plan to have in your local database. See the __[example recipes](https://github.com/ivek1312/ricercando/tree/master/scripts/recipes)__.

> 2) Download the __[MONROE daily dump CSV files](https://www.monroe-system.eu/user/dailyDumps/)__ for the tables covered by your recipes and for the dates you want in your local database.

> 3) Run ```cassandra_dump_to_line_protocol.sh``` as __[per the instructions](https://github.com/ivek1312/ricercando/tree/master/scripts)__.

### Python packages
The notebook requires the following Python packages:
* **ricercando** - this package is bundled in the __[RICERCANDO repository](https://github.com/ivek1312/ricercando)__ and can be installed by running ```pip install -e .``` in the repository's root directory.

## Known issues
* Plotting measurements sampled at one-second intervals can be very slow or can crash your browser.
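
The cost of each sampling setting can be estimated before plotting. The sketch below is a rough illustration only: it assumes that the widget's ```Sampling``` labels (```'30m'```, ```'1m'```, ```'1s'```) denote fixed durations and that every interface yields one point per interval for a full day. The names ```SAMPLING``` and ```points_per_day``` are hypothetical and not part of the notebook.

```python
from datetime import timedelta

# Assumed meaning of the Sampling labels offered by the widget.
SAMPLING = {
    '30m': timedelta(minutes=30),
    '1m': timedelta(minutes=1),
    '1s': timedelta(seconds=1),
}

def points_per_day(label, interfaces=1):
    """Upper bound on the number of plotted points for one day of data."""
    intervals = int(timedelta(days=1) / SAMPLING[label])
    return intervals * interfaces

for label in SAMPLING:
    print(label, points_per_day(label, interfaces=3))
```

Under these assumptions a node with three interfaces produces up to 259200 points per day at ```'1s'```, compared with 4320 at ```'1m'```, which is why the coarser settings are the safer starting point.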


## Analysis flow
Please run the following cells one after another, starting with the pre-initialisation cells.

### Pre-initialisation

In [None]:
# Set to database IP. This must be reachable from the machine where this script is run.
DB_IP='192.168.27.75'
# DB_IP='localhost'

In [None]:
#load with these parameters
#jupyter notebook --NotebookApp.iopub_data_rate_limit=10000000000

#problem with zoom https://github.com/ioam/geoviews/issues/111
#pip  install --user --pre -i https://pypi.anaconda.org/bokeh/channel/dev/simple bokeh==0.12.14dev6 --extra-index-url https://pypi.python.org/simple/

import holoviews as hv
import param, paramnb
import pandas as pd
from colorcet import cm #pip install colorcet
import numpy as np
from bokeh.models import HoverTool

from bokeh.models import WMTSTileSource
from operator import itemgetter

from cartopy import crs as ccrs


#https://stackoverflow.com/questions/41675041/bokeh-time-series-plot-annotation-is-off-by-1-hour/41698735
#https://github.com/bokeh/bokeh/issues/5499
#https://github.com/bokeh/bokeh/issues/1135
#https://github.com/bokeh/bokeh/issues/729
#https://github.com/bokeh/bokeh/issues/1103


hv.notebook_extension('bokeh', width=100)

from ricercando import set_connection_params, all_tables, all_nodes, getdf, tables_for_node, nodes_for_table
from ricercando.db import _CATEGORICAL_COLUMNS
set_connection_params(host=DB_IP)
#set_connection_params(host='localhost')


rtt_opts= {'Points':{'style':dict(cmap='Set3', size=2), 'plot':dict( color_index='Message', width=400, height=400, colorbar=True, tools=['hover', 'lasso_select', 'box_select']) }}

#data types: if df doesn't contain these values, fill them with appropriate NA values => 'None' if categorical, zero if continuous
values = {}
categorical = list(_CATEGORICAL_COLUMNS)

continous = ['Altitude', 'SatelliteCount', 'Speed', 'RTT', 'BootCounter', 'CPU_Apps', 'CPU_User', 'CumUptime', 'Swap', 
             'Uptime', 'RSCP','RSRP','RSRQ','RSSI', 'Temperature', 'IOWait', 
             'TCPCbytesAll','TCPSbytesAll','TCPDuration','TCPCRTTAVG','TCPCRTTSTD','TCPCPktsRetx','TCPCPktsOOO','TCPSPktsRetx','TCPSPktsOOO',
            'Download','Upload','RTTClient','RTTServer','Status',  # trailing comma needed so 'Status' is not concatenated with the next string
            'UDPCbytesAll','UDPSbytesAll','UDPCDurat','UDPSDurat', 'TCPGoodPutUpload', 'TCPGoodPutDownload', 'UDPGoodPutUpload', 'UDPGoodPutDownload']

for val in categorical:
    values[val]='None'
    
for val in continous:
    values[val]=0

values['RTT']=-50 #NaNs are not plotted, so missing RTT is shown as -50
kdims=['Longitude','Latitude']

#all tables from dataframe
tables = 'ping gps modem event sensor nettest tcpcomplete udpcomplete'

import geoviews as gv

from bokeh.tile_providers import STAMEN_TONER
tiles = {'OpenMap': WMTSTileSource(url='http://c.tile.openstreetmap.org/{Z}/{X}/{Y}.png'),
         'ESRI': WMTSTileSource(url='https://server.arcgisonline.com/ArcGIS/rest/services/World_Imagery/MapServer/tile/{Z}/{Y}/{X}.jpg'),
         'Wikipedia': WMTSTileSource(url='https://maps.wikimedia.org/osm-intl/{Z}/{X}/{Y}@2x.png'),
         'StamenToner': STAMEN_TONER}

tile_options = dict(width=600,height=600, xaxis=None,yaxis=None,bgcolor='white',show_grid=True)
  
#changes the html object when another column is selected; it is not possible to draw to the same graph with different axes
def render(obj):
    renderer = hv.renderer('bokeh')
    plot = renderer.get_plot(obj)
    size = renderer.get_size(plot)
    #return renderer.figure_data(plot), size #bokeh older than 0.12.10 and holoview older than 1.9.0
    return renderer._figure_data(plot), size

def fixVals(df):
    tmp = df.copy()
    for val in categorical: #categorical data contain NaN and can't be shown unless NaN is mapped to a new categorical value => 'None'
        if val in tmp.columns:
            tmp[val] = tmp[val].cat.add_categories("None")
        else:
            tmp[val] = np.nan
                    
    for val in continous:
        if val not in tmp.columns:
            tmp[val] = np.nan
    
    return tmp.fillna(value=values)


In [None]:
# Classes for data exploration

class DateExplorer(hv.streams.Stream):
    
    output = paramnb.view.HTML(renderer=render)
    
    Node = param.ObjectSelector(default='582', objects=nodes_for_table()['gps'], precedence=5)
    
    Day = param.ObjectSelector(default='01', objects=["%.2d" % i for i in range(1,32)], precedence=1)
    Month = param.ObjectSelector(default='01', objects=["%.2d" % i for i in range(1,13)],precedence=2)

    
    Year = param.Integer(default=2018, bounds=(2016, 2018),precedence=3)
    Coloring = param.ObjectSelector(default='RTT', objects=continous+categorical, precedence=4)
    Colormap = param.ObjectSelector(default=cm['linear_bmy_10_95_c71'], objects=cm.values())
    Sampling = param.ObjectSelector(default='1m', objects=['30m','1m','1s'])
    
    MapTile = param.ObjectSelector(default="OpenMap", objects=['OpenMap','ESRI','Wikipedia','StamenToner'])

    data = []
    callbacks = []
    
    def retData(self):
        return pd.concat([d[1].iloc[c.index] for d,c in zip(self.data,self.callbacks)])
    
    class CallbackClass(object):
        def __init__(self): 
            self.index=None
        def callback(self, index):
            self.index = index
            return hv.Overlay([])
    
    def event(self, **kwargs):
        if self.output is None or 'Day' in kwargs or 'Month' in kwargs or 'Year' in kwargs or 'Node' in kwargs  or 'Coloring' in kwargs or 'Colormap' in kwargs or 'MapTile' in kwargs or 'Sampling' in kwargs:
            
            df = getdf(tables, nodeid=self.Node, start_time='{0}-{1}-{2} 00:00:00'.format(self.Year, self.Month, self.Day), 
                        end_time='{0}-{1}-{2} 23:59:59'.format(self.Year, self.Month, self.Day), freq=self.Sampling)
            
            if df.empty or 'Iccid' not in df.columns:
                self.output  = hv.Points(pd.DataFrame([[0,0]], columns=kdims), kdims=kdims,vdims=[], label='Empty dataframe').opts(rtt_opts); return
            
            iccidGroups = [(iccid,group.reset_index())  for iccid,group in df.groupby('Iccid')
                           if all(x in group.columns for x in ['Latitude', 'Longitude', self.Coloring ]) and
                           group.Latitude.notnull().any() and group.Longitude.notnull().any() and  group[self.Coloring].notnull().any()
                          ]
            if not iccidGroups:
                self.output  = hv.Points(pd.DataFrame([[0,0]], columns=kdims), kdims=kdims,vdims=[],label='GPS or '+self.Coloring+' missing.').opts(rtt_opts); return
                        
            iccidGroups = sorted(iccidGroups, key=itemgetter(0))
            iccidGroups4plot = [ (iccid,fixVals(group))  for iccid,group in iccidGroups]
            self.data = iccidGroups

            

            rtt_opts['Points']['plot']['color_index']=self.Coloring

            rtt_opts['Points']['style']['cmap']=self.Colormap
        
            HVpoints = [ gv.Points(group, kdims=kdims, vdims=categorical+continous, label=iccid).opts(rtt_opts ) for iccid,group in iccidGroups4plot]

            streams4points = [hv.streams.Selection1D(source=points) for points in HVpoints]
            
            self.callbacks = [self.CallbackClass() for point in HVpoints]
            dmaps = [hv.DynamicMap(callback.callback ,kdims=[], streams=[selection])  for callback,selection in zip(self.callbacks,streams4points)]
            self.output = hv.Layout([point*dmap*gv.WMTS(tiles[self.MapTile]) for point,dmap in zip(HVpoints,dmaps)]).cols(2)

        else:            
            super(DateExplorer, self).event( **kwargs)
            
class GPSPlot(object):
    def __init__(self): 
        self.explorer = DateExplorer()
        paramnb.Widgets(self.explorer, continuous_update=True, callback=self.explorer.event, on_init=True)
    def retData(self):
        return self.explorer.retData()

## Visualisation initialisation and interaction
Running the cell below should produce a node/date/parameter selector and geographical plots of these -- one plot for each of the node's interfaces.
If you want to visualise data from multiple nodes simultaneously, simply copy the cell, rename the variable (say to ```plot2```) and run it, as shown in the example below.

### Interactive visualisation
The visualisation widget allows the user to:
* Select the date (day, month, year) for which the data will be shown. 
* Select the node whose data will be visualised.
* Select the parameter that will correspond to the coloring of the plotted points. 
* Select the colormap.

The data is initially always shown on a 24-hour plot and for all interfaces on the selected node (plot title corresponds to the interface ICCID). However, the user can zoom in to a particular region on the plot, in which case all plots are zoomed in simultaneously.
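
Because day, month and year are chosen independently, the widget can produce dates that do not exist, such as 31 February. The sketch below shows one way to build the 24-hour query window and reject such dates. It uses the same time-string format as the notebook's ```getdf``` call, but ```day_window``` is a hypothetical helper that the notebook does not define.

```python
from datetime import datetime

def day_window(year, month, day):
    """Return (start_time, end_time) strings covering one calendar day,
    or None if the selected day/month/year is not a real date."""
    try:
        datetime(int(year), int(month), int(day))
    except ValueError:
        return None
    date = '{0}-{1}-{2}'.format(year, month, day)
    return date + ' 00:00:00', date + ' 23:59:59'

print(day_window(2018, '01', '01'))
print(day_window(2018, '02', '31'))
```

A caller could check for ```None``` and show an "invalid date" message instead of querying the database.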

In [None]:
plot1 = GPSPlot()

In [None]:
# plot2 = GPSPlot()

## Data selection and storage
Data can be selected with Lasso select or Box select on a plot. 
Calling the ```retData()``` method of the ```GPSPlot``` object returns a data frame containing the selected data, as in the example below.

In [None]:
df_selected = plot1.retData()[['Iccid', 'Latitude','Longitude','RSSI']]
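
The way per-plot selections are turned into a single data frame can be illustrated without the widget. The following is a minimal sketch with made-up data: each interface has its own data frame and its own list of selected row positions, and the selected rows are concatenated. ```collect_selection``` and the ICCID values are hypothetical, and skipping interfaces with no selection is an assumption rather than the notebook's documented behaviour.

```python
import pandas as pd

def collect_selection(groups, selections):
    """groups: list of (iccid, DataFrame) pairs, one per interface.
    selections: list of selected row positions per interface (or None)."""
    parts = [df.iloc[idx] for (_, df), idx in zip(groups, selections) if idx]
    if not parts:
        return pd.DataFrame()  # nothing selected on any plot
    return pd.concat(parts)

# Made-up data for two interfaces.
groups = [
    ('iccid-A', pd.DataFrame({'Iccid': ['iccid-A'] * 3, 'RSSI': [-70, -75, -80]})),
    ('iccid-B', pd.DataFrame({'Iccid': ['iccid-B'] * 2, 'RSSI': [-60, -65]})),
]
selections = [[0, 2], None]  # two points selected on the first plot only

selected = collect_selection(groups, selections)
print(selected)
```

Checking whether the result is empty before subsetting columns avoids errors when nothing has been selected yet.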

The selected data can now be stored on the local disk and loaded in Orange using the IPython connector widget from the MONROE toolbox.

In [None]:
# Stores the df_selected dataframe to a local disk.
%store df_selected