# Annotation tool for time series data

By: Stefania Russo, Kris Villez
Copyright: 2018, distributed under the BSD 3-Clause license

## The challenge

In the context of the ADASen project, we want to address research questions regarding the utility of supervised and unsupervised machine learning models in anomaly detection for environmental systems. We have therefore selected a range of anomaly detection methods for benchmarking on data sets produced by six infrastructures at Eawag.

Critical to the benchmarking is the availability of fully labelled training and test data sets of normal and abnormal behavior in environmental data. 
An annotation tool has therefore been developed to perform the labelling procedure.

This notebook shows an application of the labelling procedure to time series data. Here, each time series is a univariate 24h signal from a spatially-distributed, low-power sensor network.

Each series is visualised as a 3am-3am time series.
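
One way to cut a continuous record into 3am-3am windows is to shift every timestamp back by three hours and group by the resulting calendar date. The sketch below assumes a pandas Series with a `DatetimeIndex`; the helper name `assign_window_day` and the toy data are hypothetical and not part of this notebook.

```python
import pandas as pd

def assign_window_day(index, start_hour=3):
    """Return, for each timestamp, the date of the 24h window it falls in.

    A window runs from `start_hour` on one day to `start_hour` on the next,
    so a reading at 02:59 belongs to the window of the previous day.
    """
    shifted = index - pd.Timedelta(hours=start_hour)
    return shifted.normalize()

# Toy record: one reading per hour over two calendar days (made-up data).
idx = pd.date_range("2018-08-01 00:00", periods=48, freq=pd.Timedelta(hours=1))
series = pd.Series(range(48), index=idx)

window_day = assign_window_day(series.index)
counts = series.groupby(window_day).size()
print(counts)
```

Only the middle window is complete here; the first and last windows are partial and would be flagged by a completeness check.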

## Current method

The steps for data access, data preparation, visualization and labelling are described below.

- The data is in the form of .csv data files. Each data file consists of many 24h sets across 3 sensors.
- Corruption checks are performed and dates containing corrupted time series are removed.
- The labelling procedure starts and the first plots are displayed. The 3 plots at the top are univariate sensor signals, while the bottom plot shows these signals together.

- The annotation tool allows the labelling expert to interactively select multiple portions of the time series by dragging the mouse cursor over the data.

- Each time the button 'Next' is clicked, all the selected areas (time index and sensor value) are saved together with the corresponding date stamp. At the end of the procedure, the user can easily access the anomaly labels.

- When the process is over, the plots need to be closed and then the cell 'Save labelled data' has to be run.

- Note: to change a selection, the user needs to move forward to the next plot by clicking 'Next', perform a selection of the anomalous data, and then go back and restart.


# Usage

 - Install python and open this Jupyter notebook 
 - Paste your working directory into path_all
 
# NOTE: 
## This is a beta version of the labelling tool! Please provide any feedback




# Initialization

In [1]:
# Import Statements

import os
import numpy as np
import pandas as pd
import csv
import datetime as dt
import matplotlib.pyplot as plt
import matplotlib.pylab as pl
import matplotlib.gridspec as gridspec
from matplotlib.widgets import Button
from matplotlib.widgets import SpanSelector
import itertools
from sklearn import preprocessing
import seaborn as sns
from datetime import timedelta



### Options

In [2]:
# Windows server: 
# path_all = ('//eaw-depts/eng$/IngData/StefaniaRusso/NEST/')

# macOS server: 
#path_all = ("/Volumes/NEST/")

# On personal laptop
path_all = ('/Users/russoste/Desktop/NEST/Data/months/')


save_path = path_all     # Destination folder for labelled data

In [48]:
# Select Case

Case = 1     # case 1 : GAK, case 2: Pressure T1 T2

if Case == 1:
    folder = 'data_pressure_sensor_GAK/'
    name_of_file1 = '08_August 2018.csv'
    
if Case == 2:
    folder = 'data_pressure_sensors_T1_T2/' 
    name_of_file1 = '08_August 2018.csv'


# Load data and basic sanity checks

In [49]:
# Load data

completePath = os.path.join(path_all, folder, name_of_file1) 
df = pd.read_csv(completePath)

In [50]:
df

Unnamed: 0,day,hour,p3,p4
0,01.08.2018,00:00:00,91.6,12.4
1,01.08.2018,00:00:10,91.6,12.4
2,01.08.2018,00:00:20,91.6,12.4
3,01.08.2018,00:00:30,91.6,12.4
4,01.08.2018,00:00:40,91.6,12.4
5,01.08.2018,00:00:50,91.6,12.4
6,01.08.2018,00:01:00,91.6,12.4
7,01.08.2018,00:01:10,91.6,12.4
8,01.08.2018,00:01:20,91.6,12.5
9,01.08.2018,00:01:30,91.6,12.4


In [51]:
df2 = df.copy(deep=True)

sr0 = df2.keys()[2]
sr1 = df2.keys()[3]
print('Sensor names:',sr0,',', sr1)


Sensor names: p3 , p4


In [52]:
np.unique(df2['day'])

array(['01.08.2018', '02.08.2018', '03.08.2018', '04.08.2018',
       '05.08.2018', '06.08.2018', '07.08.2018', '08.08.2018',
       '09.08.2018', '10.08.2018', '11.08.2018', '12.08.2018',
       '13.08.2018', '14.08.2018', '15.08.2018', '16.08.2018',
       '17.08.2018', '18.08.2018', '19.08.2018', '20.08.2018',
       '21.08.2018', '22.08.2018', '23.08.2018', '24.08.2018',
       '25.08.2018', '26.08.2018', '27.08.2018', '28.08.2018',
       '29.08.2018', '30.08.2018', '31.08.2018'], dtype=object)

In [53]:
# Create datetime 

df2['day'] = [x.date() for x in (pd.to_datetime([i for i in df2['day']], format='%d.%m.%Y'))] 
df2['time'] = [x.time() for x in (pd.to_datetime([i for i in df2['hour']], format='%H:%M:%S'))]   # parse the 'hour' strings into time objects
df1 = df2.copy(deep=True)
df2.set_index(['day','hour'], inplace=True)

df_bf_00 = df2[sr0]
df_bf_01 = df2[sr1]

In [54]:
df2.drop(columns ='time', inplace=True)
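
The list comprehensions above can also be written as vectorised pandas calls. The sketch below parses the `day` and `hour` strings into a single timestamp column and derives the date and time of day from it. The toy frame and the column name `timestamp` are assumptions made for illustration only.

```python
import pandas as pd

# Toy frame mimicking the raw CSV layout (values are made up).
raw = pd.DataFrame({
    "day": ["01.08.2018", "01.08.2018", "02.08.2018"],
    "hour": ["00:00:00", "00:00:10", "23:59:50"],
    "p3": [91.6, 91.6, 90.2],
})

# Parse both string columns in one vectorised call.
raw["timestamp"] = pd.to_datetime(
    raw["day"] + " " + raw["hour"], format="%d.%m.%Y %H:%M:%S"
)

# Date and time of day as plain Python objects, as in the cell above.
raw["date"] = raw["timestamp"].dt.date
raw["time"] = raw["timestamp"].dt.time

print(raw[["timestamp", "date", "time"]])
```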

In [55]:
df2

day,hour,p3,p4
2018-08-01,00:00:00,91.6,12.4
2018-08-01,00:00:10,91.6,12.4
2018-08-01,00:00:20,91.6,12.4
2018-08-01,00:00:30,91.6,12.4
2018-08-01,00:00:40,91.6,12.4
2018-08-01,00:00:50,91.6,12.4
2018-08-01,00:01:00,91.6,12.4
2018-08-01,00:01:10,91.6,12.4
2018-08-01,00:01:20,91.6,12.5
2018-08-01,00:01:30,91.6,12.4


## Basic sanity checks

In [56]:
# Accessing dates
i_date = df2.index.get_level_values(0)                                      # get all dates
idx_date = np.unique(df2.index.get_level_values(0), return_index=True)[1]      # get index of unique dates
date_list = i_date[idx_date]   # get list of all dates
print('Unique dates:',date_list)

for pl_i in range(len(date_list)):
    if len(df_bf_00[date_list[pl_i].date()].values) != 8640:
        print('Corrupted date:', date_list[pl_i].date())
        print('Corrupted date index:',pl_i)
        print('Corrupted date shape:', df2.loc[date_list[pl_i].date()].shape)  


Unique dates: DatetimeIndex(['2018-08-01', '2018-08-02', '2018-08-03', '2018-08-04',
               '2018-08-05', '2018-08-06', '2018-08-07', '2018-08-08',
               '2018-08-09', '2018-08-10', '2018-08-11', '2018-08-12',
               '2018-08-13', '2018-08-14', '2018-08-15', '2018-08-16',
               '2018-08-17', '2018-08-18', '2018-08-19', '2018-08-20',
               '2018-08-21', '2018-08-22', '2018-08-23', '2018-08-24',
               '2018-08-25', '2018-08-26', '2018-08-27', '2018-08-28',
               '2018-08-29', '2018-08-30', '2018-08-31'],
              dtype='datetime64[ns]', name='day', freq=None)
Corrupted date: 2018-08-14
Corrupted date index: 13
Corrupted date shape: (8638, 2)


In [57]:
# Removing dates where data is missing
data_df2 = df2.copy()
for pl_i in range(len(date_list)):
    if len(df_bf_00[date_list[pl_i].date()].values) != 8640:
        data_df2.drop(date_list[pl_i].date(),level='day',inplace=True)
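
The same check and removal can be expressed without an explicit loop by counting the samples per day and filtering on that count. This is a sketch on a toy frame, assuming a `day` column and a fixed expected number of samples per day (8640 for the 10 s data above, 4 in the toy example). `drop_incomplete_days` is a hypothetical helper name.

```python
import pandas as pd

def drop_incomplete_days(frame, day_col, expected):
    """Return (clean_frame, bad_days).

    clean_frame keeps only rows of days with exactly `expected` samples;
    bad_days lists the days that were dropped.
    """
    # Number of rows recorded on the same day, aligned with each row.
    counts = frame[day_col].map(frame[day_col].value_counts())
    bad_days = sorted(frame.loc[counts != expected, day_col].unique())
    return frame[counts == expected].copy(), bad_days

# Toy data: day d2 is missing one sample.
toy = pd.DataFrame({
    "day": ["d1"] * 4 + ["d2"] * 3 + ["d3"] * 4,
    "p3": range(11),
})
clean, bad = drop_incomplete_days(toy, "day", expected=4)
print(bad)
print(len(clean))
```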


In [58]:
# Check that the removal worked and recompute the date index

# Accessing dates
i_date = data_df2.index.get_level_values(0)                                      # get all dates
idx_date = np.unique(data_df2.index.get_level_values(0), return_index=True)[1]      # get index of unique dates
date_list = i_date[idx_date]   # get list of all dates
#print('Unique dates:',date_list)

df_bf_00 = data_df2[sr0]
df_bf_01 = data_df2[sr1]


for pl_i in range(len(date_list)):
    if len(df_bf_00[date_list[pl_i].date()].values) != 8640:   # same criterion as in the check above
        print('Corrupted date:', date_list[pl_i].date())
        print('Corrupted date index:',pl_i)
        print('Corrupted date shape:', data_df2.loc[date_list[pl_i].date()].shape)  

In [59]:
data_df2.shape[0]/8640


30.0

In [60]:
# Dates and times
data_time = []
for pl_i in idx_date:                             # create data_time indices to have access later
    time = data_df2.loc[i_date[pl_i]].index
    data_time.append(time)                        # associated to every date segment

In [61]:
time_int = [np.linspace(1, 8640, num = 8640, dtype=int) for _ in range(len(date_list))]

# Plotting

In [63]:
%matplotlib tk

data1 = []
data2 = []
data123 = []

itera = date_list

import numpy as np
import matplotlib.pylab as pl
import matplotlib.gridspec as gridspec

gs = gridspec.GridSpec(2, 2)

fig = plt.figure()
#plt.axis([0, 24, -3, 100])

ax1 = fig.add_subplot(gs[0, 0]) # row 0, col 0
ax2 = fig.add_subplot(gs[0, 1]) # row 0, col 1
ax4 = fig.add_subplot(gs[1, :]) # row 1, span all columns

ax1.set_title(sr0, fontdict=None, pad=None)
ax2.set_title(sr1, fontdict=None, pad=None)
full = sr0 + ' '+ sr1
ax4.set_title(full, fontdict=None, pad=None)

fig.suptitle(str(date_list[0].date()), fontsize=12)



if Case == 1:
    ax1.set_ylim([-3,100])
    ax2.set_ylim([-3,100])
    ax4.set_ylim([-3,100])
    
if Case == 2:
    ax1.set_ylim([40,150])
    ax2.set_ylim([40,150])
    ax4.set_ylim([40,150])

for pl_i in range(len(date_list)): 
    ax1.plot(time_int[pl_i], df_bf_00[date_list[pl_i].date()].values, '#C0C0C0', lw=2)
    ax2.plot(time_int[pl_i], df_bf_01[date_list[pl_i].date()].values, '#C0C0C0', lw=2)
    
l, = ax1.plot(time_int[0], df_bf_00[date_list[0].date()].values, '#1E90FF', lw=2)     #the first one is the one in blue
l2, = ax2.plot(time_int[0], df_bf_01[date_list[0].date()].values, '#8B008B')


ll1, = ax4.plot(time_int[0], df_bf_00[date_list[0].date()].values, '#1E90FF')
ll2, = ax4.plot(time_int[0], df_bf_01[date_list[0].date()].values, '#8B008B')




############### Buttons widget  ####################

class Index(object):
    ind = 0

    def next(self, event):
        self.ind += 1
        i = self.ind % len(itera)

        #ydata0 will be the plot alone
        ydata1 = df_bf_00[date_list[i].date()].values   
        ydata2 = df_bf_01[date_list[i].date()].values 
        xdata = time_int[i]          
        
        l.set_ydata(ydata1)
        l.set_xdata(xdata)
        l2.set_ydata(ydata2)
        l2.set_xdata(xdata)
        
        ll1.set_ydata(ydata1)
        ll2.set_ydata(ydata2)
        
        ll1.set_xdata(xdata) 
        ll2.set_xdata(xdata)
        
        if (i == (0)):
            fig.suptitle('End of data files - restarting with data file ' + str(date_list[i].date()), fontsize=12)
        else: 
            fig.suptitle(str(date_list[i].date()), fontsize=12)
            
        plt.draw()

    def prev(self, event):
        self.ind -= 1
        i = self.ind % len(itera)
        
        #ydata0 will be the plot alone
        ydata1 = df_bf_00[date_list[i].date()].values 
        ydata2 = df_bf_01[date_list[i].date()].values 
        xdata = time_int[i]  
        
        l.set_ydata(ydata1)
        l.set_xdata(xdata)
        
        l2.set_ydata(ydata2)
        l2.set_xdata(xdata)
        

        ll1.set_ydata(ydata1)
        ll2.set_ydata(ydata2)


        ll1.set_xdata(xdata) 
        ll2.set_xdata(xdata)

        

        if (i == (0)):
            fig.suptitle('End of data files - restarting with data file ' + str(date_list[i].date()), fontsize=12)
        else: 
            fig.suptitle(str(date_list[i].date()), fontsize=12)
            
        plt.draw()

callback = Index()

axprev = plt.axes([0.7, 0.05, 0.1, 0.075])
axnext = plt.axes([0.81, 0.05, 0.1, 0.075])
bnext = Button(axnext, 'Next')
bnext.on_clicked(callback.next)

bprev = Button(axprev, 'Previous')
bprev.on_clicked(callback.prev)

"""
valore = '11'
def presskey(event):
    print('Pressed key = ', event.key)
    #sys.stdout.flush()    
    global valore 
    valore = event.key       
    return valore
"""

def onselect1(xmin, xmax):
    x = time_int[callback.ind % len(itera)]
    y = df_bf_00[date_list[callback.ind % len(itera)].date()].values 
    today = date_list[callback.ind % len(itera)]
   
    indmin1, indmax1 = np.searchsorted(x, (xmin, xmax))
    indmax1 = min(len(x) - 1, indmax1)
    thisx = x[indmin1:indmax1]
    thisy = y[indmin1:indmax1]    
    nplist = np.array([today.date() for i in range(len(thisx))])
        
    a1 = np.c_[nplist, thisx, thisy]
    global data1
    data1.extend(a1)
    #np.savetxt(completeName_label_1, data1)

        

def onselect2(xmin, xmax):
    x = time_int[callback.ind % len(itera)]
    y = df_bf_01[date_list[callback.ind % len(itera)].date()].values 
    today = date_list[callback.ind % len(itera)]
    
    indmin, indmax = np.searchsorted(x, (xmin, xmax))
    indmax = min(len(x) - 1, indmax)
    thisx = x[indmin:indmax]
    thisy = y[indmin:indmax]
    nplist = np.array([today.date() for i in range(len(thisx))])
    
    a2 = np.c_[nplist, thisx, thisy]
    global data2
    data2.extend(a2)
    

def onselect4(xmin, xmax):
    x = time_int[callback.ind % len(itera)]
    y1 = df_bf_00[date_list[callback.ind % len(itera)].date()].values 
    y2 = df_bf_01[date_list[callback.ind % len(itera)].date()].values
    today = date_list[callback.ind % len(itera)]
    
    indmin, indmax = np.searchsorted(x, (xmin, xmax))
    indmax = min(len(x) - 1, indmax)
    
    thisx = x[indmin:indmax]
    thisy1 = y1[indmin:indmax]
    thisy2 = y2[indmin:indmax]
    nplist = np.array([today.date() for i in range(len(thisx))])
        
    # save
    a123 = np.c_[nplist, thisx, thisy1, thisy2]
    global data123
    data123.extend(a123)

    
"""
# Connect key event to figure
fig.canvas.mpl_connect('key_press_event',presskey)
"""

#class1 = Onselect_1()

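# Note: from Matplotlib 3.5 on, these keyword arguments are named `props` and
# `interactive` instead of `rectprops` and `span_stays`.
spans1 = SpanSelector(ax1, onselect1, 'horizontal', useblit=False,
                      rectprops=dict(alpha=0.5, facecolor='red'), span_stays=True)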
span2 = SpanSelector(ax2, onselect2, 'horizontal', useblit=True,
                    rectprops=dict(alpha=0.5, facecolor='red'), span_stays=True )
span4 = SpanSelector(ax4, onselect4, 'horizontal', useblit=True,
                    rectprops=dict(alpha=0.5, facecolor='red') , span_stays=True)
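
The three `onselect` callbacks share the same core step: translating the span limits reported by `SpanSelector` into an index range with `np.searchsorted`. The sketch below isolates that step on toy data so that it can be checked without opening a plot window. `span_to_rows` is a hypothetical helper name, and the clamping mirrors the callbacks above.

```python
import numpy as np

def span_to_rows(x, y, day, xmin, xmax):
    """Return (day, x, y) rows for samples with xmin <= x < xmax.

    `x` must be sorted in ascending order. Because of the clamp below,
    the very last sample of the day is never included.
    """
    lo, hi = np.searchsorted(x, (xmin, xmax))
    hi = min(len(x) - 1, hi)          # same clamp as in the callbacks above
    return [(day, int(xi), float(yi)) for xi, yi in zip(x[lo:hi], y[lo:hi])]

x = np.arange(1, 11)                  # sample index 1..10
y = np.linspace(0.0, 0.9, 10)         # made-up sensor values
rows = span_to_rows(x, y, "2018-08-01", xmin=2.5, xmax=5.5)
print(rows)
```

A span dragged from 2.5 to 5.5 therefore selects the samples with index 3, 4 and 5.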

## Save raw labels   CHECK

In [12]:
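# Close the plot window before running this cell: data1, data2, ... are rebound
# from lists to DataFrames here, so a later span selection would fail on .extend().
# data3, sr2 and name_of_file_l1 are assumed to be defined for a three-sensor data set.
data1 = pd.DataFrame(data1, columns=['date','time_m', sr0])
data2 = pd.DataFrame(data2, columns=['date','time_m', sr1])
data3 = pd.DataFrame(data3, columns=['date','time_m', sr2])
data123 = pd.DataFrame(data123, columns=['date','time_m', sr0, sr1, sr2])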


data1.to_csv(os.path.join(save_path, name_of_file_l1+sr0 + ".csv") )
data2.to_csv(os.path.join(save_path, name_of_file_l1+sr1 + ".csv") )
data3.to_csv(os.path.join(save_path, name_of_file_l1+sr2 + ".csv") )
data123.to_csv(os.path.join(save_path, name_of_file_l1+sr0+sr1+sr2 + ".csv") )


Traceback (most recent call last):
  File "/Users/russoste/anaconda3/envs/ADAS_env/lib/python3.6/site-packages/matplotlib/cbook/__init__.py", line 215, in process
    func(*args, **kwargs)
  File "/Users/russoste/anaconda3/envs/ADAS_env/lib/python3.6/site-packages/matplotlib/widgets.py", line 1597, in release
    self._release(event)
  File "/Users/russoste/anaconda3/envs/ADAS_env/lib/python3.6/site-packages/matplotlib/widgets.py", line 1841, in _release
    self.onselect(vmin, vmax)
  File "<ipython-input-11-896b60f296ca>", line 159, in onselect1
    data1.extend(a1)
  File "/Users/russoste/anaconda3/envs/ADAS_env/lib/python3.6/site-packages/pandas/core/generic.py", line 5067, in __getattr__
    return object.__getattribute__(self, name)
AttributeError: 'DataFrame' object has no attribute 'extend'
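
The `AttributeError` above most likely arises because the save cell rebinds `data1` from a list to a `DataFrame` while the plot window is still open, so a later selection calls `.extend` on a `DataFrame`. One possible way to avoid this is to keep the raw selections in a dedicated container and build the `DataFrame` on request. The `SelectionStore` class below is a hypothetical sketch, not part of this notebook.

```python
import pandas as pd

class SelectionStore:
    """Accumulates selected rows; the underlying list is never rebound."""

    def __init__(self, columns):
        self.columns = list(columns)
        self._rows = []

    def add(self, rows):
        """Append an iterable of row tuples from one span selection."""
        self._rows.extend(rows)

    def to_frame(self):
        """Return a fresh DataFrame; safe to call repeatedly."""
        return pd.DataFrame(self._rows, columns=self.columns)

store = SelectionStore(["date", "time_m", "p3"])
store.add([("2018-08-01", 3, 91.6), ("2018-08-01", 4, 91.6)])
first = store.to_frame()
store.add([("2018-08-02", 10, 90.1)])   # still works after exporting
second = store.to_frame()
print(len(first), len(second))
```

With this design a callback would call `store.add(...)` instead of extending a global list, and exporting the labels would not interfere with further selections.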

In [24]:
data1

Unnamed: 0,date,time_m,bf_07


### Back to time

In [25]:
data1_l = data1.copy(deep=True)
data2_l = data2.copy(deep=True)
data3_l = data3.copy(deep=True)
data123_l = data123.copy(deep=True)

In [26]:
data1_l["time"] = None
data2_l["time"] = None 
data3_l["time"] = None 
data123_l["time"] = None 

for iSample in range(data1_l.shape[0]):
    entry = data1.loc[iSample,'time_m']
    time = str(timedelta(minutes=int(entry)))
    data1_l.loc[iSample,'time'] = time
data1_l = data1_l[['date', 'time_m', 'time', sr0]]
data1_l.drop(['time_m'], axis=1, inplace=True)
data1_l['date'] = pd.to_datetime((data1_l['date']), format='%Y-%m-%d')
data1_l['time'] = [x.time() for x in (pd.to_datetime([i for i in data1_l['time']], format='%H:%M:%S'))] 
    

for iSample in range(data2_l.shape[0]):
    entry = data2.loc[iSample,'time_m']
    time = str(timedelta(minutes=int(entry)))
    data2_l.loc[iSample,'time'] = time
data2_l = data2_l[['date', 'time_m', 'time', sr1]]
data2_l.drop(['time_m'], axis=1, inplace=True)
data2_l['date'] = pd.to_datetime((data2_l['date']), format='%Y-%m-%d')
data2_l['time'] = [x.time() for x in (pd.to_datetime([i for i in data2_l['time']], format='%H:%M:%S'))] 
    

for iSample in range(data3_l.shape[0]):
    entry = data3.loc[iSample,'time_m']
    time = str(timedelta(minutes=int(entry)))
    data3_l.loc[iSample,'time'] = time
data3_l = data3_l[['date', 'time_m', 'time', sr2]]
data3_l.drop(['time_m'], axis=1, inplace=True)
data3_l['date'] = pd.to_datetime((data3_l['date']), format='%Y-%m-%d')
data3_l['time'] = [x.time() for x in (pd.to_datetime([i for i in data3_l['time']], format='%H:%M:%S'))] 
   

for iSample in range(data123_l.shape[0]):
    entry = data123.loc[iSample,'time_m']
    time = str(timedelta(minutes=int(entry)))
    data123_l.loc[iSample,'time'] = time
data123_l = data123_l[['date', 'time_m', 'time', sr0, sr1, sr2]]
data123_l.drop(['time_m'], axis=1, inplace=True)
data123_l['date'] = pd.to_datetime((data123_l['date']), format='%Y-%m-%d')
data123_l['time'] = [x.time() for x in (pd.to_datetime([i for i in data123_l['time']], format='%H:%M:%S'))] 
data123_l

Unnamed: 0,date,time,bf_07,bl_ceb60,bf_08
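
The conversion above treats `time_m` as a number of minutes. For data sampled at a different interval, the sample index has to be scaled by the sampling period first. The sketch below assumes a 1-based sample index (as in `time_int`), a first sample at 00:00:00 and a default sampling period of 10 s, as in the August 2018 file shown earlier. `index_to_time` is a hypothetical helper.

```python
from datetime import datetime, timedelta

def index_to_time(sample_index, period_seconds=10):
    """Map a 1-based sample index to a time of day (index 1 -> 00:00:00)."""
    offset = timedelta(seconds=(int(sample_index) - 1) * period_seconds)
    return (datetime.min + offset).time()

print(index_to_time(1))                        # first sample of the day
print(index_to_time(8640))                     # last sample of a complete 10 s day
print(index_to_time(7, period_seconds=300))    # 5-minute data
```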



# Labelled data

In [67]:
data_lab_1 = pd.DataFrame(df_bf_00.copy())
#data_lab_1['Anomaly'] = '0'
data_lab_1 = data_lab_1.reset_index(level=[0,1])

data_lab_2 = pd.DataFrame(df_bf_01.copy())
#data_lab_2['Anomaly'] = '0'
data_lab_2 = data_lab_2.reset_index(level=[0,1])

data_lab_3 = pd.DataFrame(df_bf_02.copy())
#data_lab_3['Anomaly'] = '0'
data_lab_3 = data_lab_3.reset_index(level=[0,1])

data_lab_123 = pd.DataFrame(df_bf_00.copy())
data_lab_123[sr1] = pd.DataFrame(df_bf_01)
data_lab_123[sr2] = pd.DataFrame(df_bf_02)
#data_lab_123['Anomaly'] = '0'
data_lab_123 = data_lab_123.reset_index(level=[0,1])
data_lab_123

Unnamed: 0,date,time,bf_07,bl_ceb60,bf_08
0,2018-09-01,00:00:00,,19.916871,
1,2018-09-01,00:05:00,,19.908271,
2,2018-09-01,00:10:00,,19.790993,
3,2018-09-01,00:15:00,,19.933870,
4,2018-09-01,00:20:00,,18.531899,
5,2018-09-01,00:25:00,,19.640369,
6,2018-09-01,00:30:00,,,
7,2018-09-01,00:35:00,,18.799678,
8,2018-09-01,00:40:00,,18.584133,
9,2018-09-01,00:45:00,,19.077330,


In [68]:
#labels_df = pd.merge(data_lab_123, data123_l, on = ['date', 'time'], how='left', indicator=True)
labels_df_1 = pd.merge(data_lab_1, data1_l, how='left', indicator=True)
labels_df_2 = pd.merge(data_lab_2, data2_l, how='left', indicator=True)
labels_df_3 = pd.merge(data_lab_3, data3_l, how='left', indicator=True)
labels_df_123 = pd.merge(data_lab_123, data123_l, how='left', indicator=True)

In [69]:
labels_df_1

Unnamed: 0,date,time,bf_07,_merge
0,2018-09-01,00:00:00,,left_only
1,2018-09-01,00:05:00,,left_only
2,2018-09-01,00:10:00,,left_only
3,2018-09-01,00:15:00,,left_only
4,2018-09-01,00:20:00,,left_only
5,2018-09-01,00:25:00,,left_only
6,2018-09-01,00:30:00,,left_only
7,2018-09-01,00:35:00,,left_only
8,2018-09-01,00:40:00,,both
9,2018-09-01,00:45:00,,both


In [70]:
# ADD zeros and ones with dictionary mapping

mapper_dict = {'left_only': 0, 'both': 1}

def mp(entry):
    """
    Map a merge indicator ('left_only' or 'both') to 0 or 1;
    any other value is returned unchanged.
    """
    return mapper_dict[entry] if entry in mapper_dict else entry
mp = np.vectorize(mp)

In [71]:
labels_df_1['_merge'] = mp(labels_df_1['_merge'])
labels_df_2['_merge'] = mp(labels_df_2['_merge'])
labels_df_3['_merge'] = mp(labels_df_3['_merge'])
labels_df_123['_merge'] = mp(labels_df_123['_merge'])

In [72]:
labels_df_1 = labels_df_1.rename(index=str, columns={"_merge": "Anomaly"})
labels_df_2 = labels_df_2.rename(index=str, columns={"_merge": "Anomaly"})
labels_df_3 = labels_df_3.rename(index=str, columns={"_merge": "Anomaly"})
labels_df_123 = labels_df_123.rename(index=str, columns={"_merge": "Anomaly"})
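
The labelling step relies on `pd.merge(..., how='left', indicator=True)`: rows of the full data set that also appear in the selection are tagged `both`, all others `left_only`. The sketch below reproduces this on toy data and derives the 0/1 `Anomaly` column with a direct comparison rather than a dictionary lookup. The frames and values are made up.

```python
import pandas as pd

full = pd.DataFrame({
    "date": ["2018-09-01"] * 4,
    "time": ["00:00:00", "00:05:00", "00:10:00", "00:15:00"],
    "p3": [19.9, 19.8, 25.3, 19.7],
})
# Rows the expert selected as anomalous (same columns as `full`).
selected = full.iloc[[2]].copy()

# With no `on=` argument, merge joins on all shared columns.
labelled = pd.merge(full, selected, how="left", indicator=True)
labelled["Anomaly"] = (labelled["_merge"] == "both").astype(int)
labelled = labelled.drop(columns="_merge")
print(labelled)
```

If the same span is selected twice, the selection contains duplicate rows and a left merge repeats the matching rows of the full data set. Calling `drop_duplicates()` on the selection before merging avoids this.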


In [73]:
labels_df_1.to_csv(os.path.join(save_path, name_of_file_l1_time+sr0 + ".csv") )
labels_df_2.to_csv(os.path.join(save_path, name_of_file_l1_time+sr1 + ".csv") )
labels_df_3.to_csv(os.path.join(save_path, name_of_file_l1_time+sr2 + ".csv") )
labels_df_123.to_csv(os.path.join(save_path, name_of_file_l1_time+sr0+sr1+sr2 + ".csv") )            


In [76]:
labels_df_1

Unnamed: 0,date,time,bf_07,Anomaly
0,2018-09-01,00:00:00,,0
1,2018-09-01,00:05:00,,0
2,2018-09-01,00:10:00,,0
3,2018-09-01,00:15:00,,0
4,2018-09-01,00:20:00,,0
5,2018-09-01,00:25:00,,0
6,2018-09-01,00:30:00,,0
7,2018-09-01,00:35:00,,0
8,2018-09-01,00:40:00,,1
9,2018-09-01,00:45:00,,1


# Other useful commands 
### Back to multi-indexing

In [None]:
data_lab_1.set_index(['date','time'], inplace=True)
data_lab_2.set_index(['date','time'], inplace=True)
data_lab_3.set_index(['date','time'], inplace=True)
data_lab_123.set_index(['date','time'], inplace=True)

In [250]:
data1_ll = data1_l.copy(deep=True)
data1_ll.set_index(['date','time'], inplace=True)

data2_ll = data2_l.copy(deep=True)
data2_ll.set_index(['date','time'], inplace=True)

data3_ll = data3_l.copy(deep=True)
data3_ll.set_index(['date','time'], inplace=True)

data123_ll = data123_l.copy(deep=True)
data123_ll.set_index(['date','time'], inplace=True)


In [251]:
# Accessing dates
i_date = data1_ll.index.get_level_values(0)                                      # get all dates
idx_date = np.unique(data1_ll.index.get_level_values(0), return_index=True)[1]      # get index of unique dates
date_list = i_date[idx_date]   # get list of all unique dates

# Accessing dates
i_date2 = data_lab_1.index.get_level_values(0)                                      # get all dates
idx_date2 = np.unique(data_lab_1.index.get_level_values(0), return_index=True)[1]      # get index of unique dates
date_list2 = i_date2[idx_date2]   # get list of all unique dates
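
With a `(date, time)` MultiIndex in place, all samples of one day can be pulled out with `.loc` on the first level. A minimal sketch on toy data, with made-up values:

```python
import pandas as pd

frame = pd.DataFrame({
    "date": ["2018-08-01", "2018-08-01", "2018-08-02"],
    "time": ["00:00:00", "00:00:10", "00:00:00"],
    "p3": [91.6, 91.5, 90.2],
}).set_index(["date", "time"])

# Unique dates, in order of first appearance.
dates = frame.index.get_level_values("date").unique()

# All samples for one date; the result is indexed by time only.
one_day = frame.loc[dates[0]]
print(list(dates))
print(one_day)
```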