
&nbsp;
&nbsp;

&nbsp;



<img src="https://project-leo.co.uk/wp-content/themes/project-leo/assets/img/project-leo.png"/>


  
  
&nbsp;
&nbsp;
    

  
  # Time Syncing Notebook

  &nbsp;
  
  
  This Project LEO Notebook is to be used for the detection of any errors in energy data (for two datasets) where there may exist discrepancies through the lagging of power data. **This Notebook does assume some level of programming experience**. Please contact the University of Oxford [Data Coordinator](mailto:masao.ashtine@oerc.ox.ac.uk) if you would like any help with using this or any other notebooks. 
  
  This Notebook has a user-friendly version where the code is kept in the background, allowing you to move through the running of the cells (using the 'Run' button in the Menu above), only viewing the output graphs and information. If you would like this experience, use the cell below and follow the instructions. Alternatively, you can install the `jupyter_contrib_nbextensions` package to use the `codefolding` extenstion which allows the coding cells to be collapsible. 
  
  If you would like to control this notebook more (owing to error checking or otherwise) please follow the intructions in the cell below. Any visible cell can be controlled and edited by double-clicking the content. Use **Shift+Enter** to run the cell. **However, it is STRONGLY recommended to note any changes that you have made with commenting, using your initials as well. Eg: '# [MA]: The following line was indented for increased readability'

<details><summary><b>Want a more user-friendly experience? Click here and follow these instructions</b></summary>
<p>

The following code can be run to to hide all input coding cells to create a more user-friendly Notebook. To unhide all of the coding blocks to make any edits, simply select 'Kernel' in the Menu above, and then select 'Restart & Clear Output'. Please note that this will delete any created variables and information from memory and you will need to rerun the cells.

To run the following code, use the following cell to input the block of code below (starting wil '%', ending with '</style>'). Run this cell to then hide all input code cells.

```html
%%html
<style>
div.input {
    display:none;
}
</style>
```

</p>
</details>




In [None]:
%%html 
<style>
div.input {
    display:none;
}
</style>

## Importing the relevant libraries 

&nbsp;

The cell below, hidden or otherwise, will be used to import the libraries needed for the analysis of the data. Note, you may need to install some of these libraries manually and this can be done using the `pip install` functionality from your terminal. See [here](https://packaging.python.org/tutorials/installing-packages/) for more information. Some of these have been setup for running in the cell which follows the importation of the libraries.

This notebook also relies on the `ipywidgets` library and thus if you do not have this on a your local computer, you will have to follow the intructions found [here](https://towardsdatascience.com/interactive-controls-for-jupyter-notebooks-f5c94829aee6) to install it with relative ease! Some example plots using this and other libraries can be found [here](https://mybinder.org/v2/gh/WillKoehrsen/Data-Analysis/widgets-stable?filepath=widgets%2FWidgets-Overview.ipynb).

&nbsp;

In [1]:
# Importation of libraries 

# Importation of Numerical Libraries
from scipy import signal, fftpack, optimize
from pymongo import MongoClient, errors
import ipywidgets as widgets
from getpass import getpass
import pandas as pd
import numpy as np
import ssl, sys
import csv

# Importation of Visualization Libraries
from plotly.offline import iplot, init_notebook_mode, plot
from ipywidgets import interact, interact_manual
from plotly.subplots import make_subplots
from matplotlib import pyplot as plt
from IPython.display import display
import chart_studio.plotly as py
import plotly.graph_objs as go
import cufflinks as cf

# Configurations
cf.set_config_file(colorscale='plotly')
init_notebook_mode(connected=True)
cf.go_offline(connected=True)



In [None]:
# Use this cell to install any libraries that may be absent on the server.
# You may need to restart the kernel/reload the notebook after installation
!pip install chart-studio
!pip install dnspython

## Importing the Energy Data

&nbsp;

The cells below will ask you to input the file paths for the two datasets that you would like to compare. It is important to use the **FULL** and correct paths for the data. Please ensure that the data has been cleaned and you can use the scripts found on the [Project LEO Bitbucket Repository](https://bitbucket.org/projectleodata/project-leo-database/src/master/).
  
  Please note that if you need to read data from an Excel Spreadsheet, you will need to adjust the cell below to account for this using the Pandas `read_excel` function or one of your choosing.
  
  You will also be asked to input the position of any *Date* and *Time* columns in the datasets. If a *Date* column is the first, and *Time* the second, simply input `0, 1`. If you have a single column of date and time, enter `0` if it is the first column (or any subsequent position where appropriate).
  
  &nbsp;

In [3]:
# User input of the site data 

# This cell uses an interactive method to obtain the file paths from the user
# It also asks for the position of the date/time columns 
class data_input():
    def __init__(self, 
                 sites = 'Replace this with csv/txt path', 
                 times = 'Eg: 0, 1'
                ):
        
        # These widgets allow the user to input text options
        self.site1_data = widgets.Text(description = 'Site 1: ',value = sites)
        self.site2_data = widgets.Text(description = 'Site 2: ',value = sites)
        self.time1 = widgets.Text(description = 'Site 1 Time: ',value = times) 
        self.time2 = widgets.Text(description = 'Site 2 Time: ',value = times) 
        
        # The following allows for storage of the input data into a variable after entry
        # This relies on the 'handle_submit' function below.
        self.site1_data.on_submit(self.handle_submit)
        self.site2_data.on_submit(self.handle_submit)
        self.time1.on_submit(self.handle_submit)
        self.time2.on_submit(self.handle_submit)
        
        # Displays the widgets
        display(self.site1_data, self.site2_data, self.time1, self.time2)
        
    def handle_submit(self, text):
        self.v = text.value
        return self.v

# Initializes the class defined above
data_input = data_input()


Text(value='Replace this with csv/txt path', description='Site 1: ')

Text(value='Replace this with csv/txt path', description='Site 2: ')

Text(value='Eg: 0, 1', description='Site 1 Time: ')

Text(value='Eg: 0, 1', description='Site 2 Time: ')

In [4]:
# Loading the data into memory through Pandas 

# The data will now be read into Python using the Pandas framework
# First part separates the user input by the comma
time_cols1 = [int(p) for p in data_input.time1.value.split(", ")]
time_cols2 = [int(p) for p in data_input.time2.value.split(", ")]

# This part of the code automatically determines the delimiter of any submitted txt file, if one was submitted
if data_input.site1_data.value.split('.')[-1] == 'txt':
    with open(data_input.site1_data.value, 'r') as csvfile:
        dialect = csv.Sniffer().sniff(csvfile.read(1024))
    site1_df = pd.read_csv(data_input.site1_data.value, delimiter=dialect.delimiter, parse_dates={"timestamp":time_cols1})
    site1_df.set_index("timestamp", inplace=True)
else:
    site1_df = pd.read_csv(data_input.site1_data.value, parse_dates={"timestamp":time_cols1})
    site1_df.set_index("timestamp", inplace=True)

# Does the same for the second site
if data_input.site2_data.value.split('.')[-1] == 'txt':
    with open(data_input.site2_data.value, 'r') as csvfile:
        dialect = csv.Sniffer().sniff(csvfile.read(1024))
    site2_df = pd.read_csv(data_input.site2_data.value, delimiter=dialect.delimiter, parse_dates={"timestamp":time_cols2})
    site2_df.set_index("timestamp", inplace=True)
else:
    site2_df = pd.read_csv(data_input.site2_data.value, parse_dates={"timestamp":time_cols2})
    site2_df.set_index("timestamp", inplace=True)
    

## Sample of the datasets

  
  Here are the first 5 lines of  your first input dataset below:
  
  &nbsp;
  

In [5]:

site1_df.head()


Unnamed: 0_level_0,Voltage L1N Min,Voltage L1N Avg,Voltage L1N Max,Voltage L2N Min,Voltage L2N Avg,Voltage L2N Max,Voltage L3N Min,Voltage L3N Avg,Voltage L3N Max,Current L1 Min,...,THD A L1 Max,THD A L2 Min,THD A L2 Avg,THD A L2 Max,THD A L3 Min,THD A L3 Avg,THD A L3 Max,THD A N Min,THD A N Avg,THD A N Max
timestamp,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1,Unnamed: 11_level_1,Unnamed: 12_level_1,Unnamed: 13_level_1,Unnamed: 14_level_1,Unnamed: 15_level_1,Unnamed: 16_level_1,Unnamed: 17_level_1,Unnamed: 18_level_1,Unnamed: 19_level_1,Unnamed: 20_level_1,Unnamed: 21_level_1
2019-11-18 11:45:43,244.845,244.937,245.028,244.375,244.453,244.532,245.159,245.224,245.29,50.264,...,24.5,16.7,16.7,16.9,14.1,14.5,14.8,96.4,96.5,96.8
2019-11-18 11:45:48,244.793,244.976,245.159,244.218,244.323,244.401,245.055,245.264,245.394,50.032,...,24.7,15.4,16.1,17.3,14.2,14.4,15.0,95.4,96.1,96.7
2019-11-18 11:45:53,244.872,245.081,245.185,244.218,244.336,244.427,245.185,245.329,245.42,49.705,...,24.7,15.6,15.8,16.0,14.1,14.6,14.9,95.4,95.6,95.8
2019-11-18 11:45:58,245.107,245.211,245.316,244.48,244.571,244.663,244.898,245.041,245.342,49.445,...,25.2,16.7,16.9,17.2,12.3,13.2,15.1,96.4,98.3,99.4
2019-11-18 11:46:03,245.002,245.055,245.159,244.114,244.375,244.558,245.185,245.264,245.316,49.255,...,25.0,15.7,16.8,17.4,14.3,14.6,15.1,95.3,96.2,96.8


  ----------------------
  And here are the first 5 lines of your second input dataset below:
  
  ---------------------
  

In [6]:

site2_df.head()


Unnamed: 0_level_0,Vrms ph-n L1N Min,Vrms ph-n L1N Avg,Vrms ph-n L1N Max,Vrms ph-n L2N Min,Vrms ph-n L2N Avg,Vrms ph-n L2N Max,Vrms ph-n L3N Min,Vrms ph-n L3N Avg,Vrms ph-n L3N Max,Vrms ph-n NG Min,...,THD A L1 Max,THD A L2 Min,THD A L2 Avg,THD A L2 Max,THD A L3 Min,THD A L3 Avg,THD A L3 Max,THD A N Min,THD A N Avg,THD A N Max
timestamp,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1,Unnamed: 11_level_1,Unnamed: 12_level_1,Unnamed: 13_level_1,Unnamed: 14_level_1,Unnamed: 15_level_1,Unnamed: 16_level_1,Unnamed: 17_level_1,Unnamed: 18_level_1,Unnamed: 19_level_1,Unnamed: 20_level_1,Unnamed: 21_level_1
2019-11-17 23:59:04,246.48,246.5,246.52,245.98,246.0,246.02,245.96,245.96,245.98,0.04,...,15.75,12.13,12.18,12.24,11.7,11.73,11.75,265.54,266.25,266.94
2019-11-17 23:59:05,246.54,246.54,246.54,246.0,246.0,246.02,245.96,245.98,245.98,0.04,...,15.68,12.04,12.07,12.1,11.65,11.67,11.68,267.01,267.77,269.03
2019-11-17 23:59:06,246.5,246.52,246.54,245.98,245.98,246.0,245.94,245.96,245.98,0.04,...,15.64,12.01,12.02,12.03,11.64,11.64,11.65,269.44,270.26,271.18
2019-11-17 23:59:07,246.48,246.5,246.52,245.96,245.98,246.0,245.96,245.96,245.96,0.04,...,15.63,12.03,12.08,12.14,11.62,11.63,11.64,261.45,264.9,268.25
2019-11-17 23:59:08,246.48,246.48,246.5,245.94,245.98,246.0,245.92,245.94,245.96,0.04,...,15.9,12.11,12.22,12.35,11.58,11.69,11.83,261.01,262.81,265.47


## The Data at a First Glance
  
  &nbsp;
  
  Use the following to interact with the data. This section relies on the `ipywidgets` library and if this is not installed on your local system, you will need to do so as per the instructions in the **Importing the relevant libraries** section of this Notebook.
  
  &nbsp;

In [7]:
# Viewing the data for a particular site 

# Ask user for the site they would like to view data for
Site = input("Enter the Site number that you would like to see data for: ")

# Choose the dataset based on the user input
if Site == '1':
    df = site1_df
else:
    df = site2_df

Enter the Site number that you would like to see data for: 1


In [8]:
# Allow user input for the plotting of the data 

# This section will use the interact library to pull out the 
# data columns from the data for a dynamic plotting and visualization experience 
@interact
# TODO: We will eventually clean up these functions so that they are not repetitive 
def sample_line_plot(Variable1=widgets.Dropdown(options=list(df.select_dtypes('number').columns), 
                                        description='Variable 1'),
              Variable2=widgets.Dropdown(options=list(df.select_dtypes('number').columns), 
                                        description='Variable 2', value=df.columns.to_list()[1]),
              time=widgets.ToggleButtons(options=['Raw', 'Minute', 'Hour', 'Day', 'Month'], 
                                                    description='Time'),
              color_button=widgets.ToggleButtons(options=['white', 'polar', 'solar', 'henanigans'], 
                                                    description='Plot Theme'),
              color_scale=widgets.ToggleButtons(options=['polar', 'spectral', 'pastel1', 'dark2'], 
                                                    description='Colour Scale')):
    
    
    # The following resamples the data based on the user selection
    if time == 'Raw':
        new_df = df.resample('30S').mean()
    elif time == 'Minute':
        new_df = df.resample('T').mean()
    elif time == 'Hour':
        new_df = df.resample('H').mean()
    elif time == 'Day':
        new_df = df.resample('D').mean()
    else:
        new_df = df.resample('M').mean()
                        
    # The newly resampled data is then plotted based on the user's choice
    new_df.iplot(kind='line', y=[Variable1, Variable2], xTitle='Time', yTitle='',
                 title='Click on the legend to isolate parameters', mode='lines', theme=color_button, colorscale=color_scale)
    

interactive(children=(Dropdown(description='Variable 1', options=('Voltage L1N Min', 'Voltage L1N Avg', 'Volta…

## Resampling of Data
  
  &nbsp;
  
  This section will be used to resample the datasets (if needed) based on the data resolution as seen in the **Sample of the Datasets** section above. Please use the following to enter the resampling that you will like to use on a particular dataset. 
  
  **If you are entering the** `spline` **method, the value in the dropdown represents the order of the spline interpolation**
  
  &nbsp;

In [9]:
# Allow the user to resample a dataset. 
# This assumes that one dataset is resampled to the frequency of the other and thus 
# two datasets should not be resampled

# This class is used for the resampling of the data, allowing user input to then 
# treat the data depending on the input parameters.
class resample_input():
    def __init__(self):
        self.site = widgets.Dropdown(description = 'Site: ', options = ['1', '2'], value='1')
        self.resample = widgets.Text(description = 'Resample to: ', value='Eg: 1S, 1M, 10M etc')
        self.interp = widgets.Dropdown(description = 'Interpolation: ', 
                                       options = ['linear', 'spline, 2', 'spline, 3'], value='linear')
        
        self.resample.on_submit(self.handle_submit)
        display(self.site, self.resample, self.interp)
        
    def handle_submit(self, text):
        self.v = text.value
        return self.v

resamp_input = resample_input()


Dropdown(description='Site: ', options=('1', '2'), value='1')

Text(value='Eg: 1S, 1M, 10M etc', description='Resample to: ')

Dropdown(description='Interpolation: ', options=('linear', 'spline, 2', 'spline, 3'), value='linear')

In [10]:
# Perform the resampling based on the input above 

# This section will print a confirmation of the resampling and some sample data from the dataset
# First parses the user input and preallocates the dataframe objects 
interp = resamp_input.interp.value.split(", ")
site1_rs_df, site2_rs_df = None, None

# Display and resample based on the site that was selected
if resamp_input.site.value == '1':
    print("\nThe first dataset has been resampled to {} using the {} interpolation method based on your inputs"
          .format(resamp_input.resample.value, interp[0]))
    
    # Applies the order of the spline if this was selected
    if len(interp) > 1:
        site1_rs_df = site1_df.resample(resamp_input.resample.value).interpolate(method=interp[0], order=interp[1])
    else:
        site1_rs_df = site1_df.resample(resamp_input.resample.value).interpolate(method=interp[0])
        
    # Display function needed to force output
    display(site1_rs_df.head())

else:
    print("\nThe second dataset has been resampled to {} using the {} interpolation method based on your inputs"
          .format(resamp_input.resample.value, interp[0]))
    if len(interp) > 1:
        site2_rs_df = site2_df.resample(resamp_input.resample.value).interpolate(method=interp[0], order=interp[1])
    else:
        site2_rs_df = site2_df.resample(resamp_input.resample.value).interpolate(method=interp[0])
    display(site2_rs_df.head())
    


The first dataset has been resampled to 1S using the linear interpolation method based on your inputs


Unnamed: 0_level_0,Voltage L1N Min,Voltage L1N Avg,Voltage L1N Max,Voltage L2N Min,Voltage L2N Avg,Voltage L2N Max,Voltage L3N Min,Voltage L3N Avg,Voltage L3N Max,Current L1 Min,...,THD A L1 Max,THD A L2 Min,THD A L2 Avg,THD A L2 Max,THD A L3 Min,THD A L3 Avg,THD A L3 Max,THD A N Min,THD A N Avg,THD A N Max
timestamp,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1,Unnamed: 11_level_1,Unnamed: 12_level_1,Unnamed: 13_level_1,Unnamed: 14_level_1,Unnamed: 15_level_1,Unnamed: 16_level_1,Unnamed: 17_level_1,Unnamed: 18_level_1,Unnamed: 19_level_1,Unnamed: 20_level_1,Unnamed: 21_level_1
2019-11-18 11:45:43,244.845,244.937,245.028,244.375,244.453,244.532,245.159,245.224,245.29,50.264,...,24.5,16.7,16.7,16.9,14.1,14.5,14.8,96.4,96.5,96.8
2019-11-18 11:45:44,244.8346,244.9448,245.0542,244.3436,244.427,244.5058,245.1382,245.232,245.3108,50.2176,...,24.54,16.44,16.58,16.98,14.12,14.48,14.84,96.2,96.42,96.78
2019-11-18 11:45:45,244.8242,244.9526,245.0804,244.3122,244.401,244.4796,245.1174,245.24,245.3316,50.1712,...,24.58,16.18,16.46,17.06,14.14,14.46,14.88,96.0,96.34,96.76
2019-11-18 11:45:46,244.8138,244.9604,245.1066,244.2808,244.375,244.4534,245.0966,245.248,245.3524,50.1248,...,24.62,15.92,16.34,17.14,14.16,14.44,14.92,95.8,96.26,96.74
2019-11-18 11:45:47,244.8034,244.9682,245.1328,244.2494,244.349,244.4272,245.0758,245.256,245.3732,50.0784,...,24.66,15.66,16.22,17.22,14.18,14.42,14.96,95.6,96.18,96.72


## Time syncing data

  &nbsp;
  
  This section will look at the data from the two datasets (please note that their data times must overlap) to join the data based on columns of choice. This will allow the user to examine data from the two sites, giving the opportunity to compare the data of selected variables. 
  
  This section will also rely on correlation statistics to automatically sync the two datasets based on the input variables. For instance, if the average voltage from each site is chosen, the function can automatically detect where there is a lag in one site's data versus the other, for instance, by 5 seconds etc. The data will then be time shifted to remove these lags for further analysis. This is explained a bit more in the **Data Cleaning and Processes** document found on the [SharePoint](https://ssecom.sharepoint.com/sites/extranet-networks-engineering_NIC/Shared%20Documents/WP4/Deliverables).
  
  Unsynced data happens when there is a strong correlation between the signals in the power data but there is a shift owing to an existing time lag in data recordings between the two sites of interest. More information on the *Time Lagged Cross-Correlation (TLCC)* technique used can be found [here](https://towardsdatascience.com/four-ways-to-quantify-synchrony-between-time-series-data-b99136c4a9c9).
  
  **You must ensure that the datasets match in resolution (with an overlap of time periods in the data) using the sections above if needed**
  
  &nbsp;
  
  ### The data will first be concatenated based on the variables of interest
  
  &nbsp;


In [11]:
# Allow the user to concatenate the two datasets based on selected variables 

# This class allows the user to select the two variables in the datasets that they would like to use
# for the potential error detection in terms of unsynced data. 
class concat_data():
    def __init__(self):
               
        self.site1 = widgets.Dropdown(description='Site 1 Variable: ', 
                                     options=list(site1_df.select_dtypes('number').columns))
        self.site2 = widgets.Dropdown(description='Site 2 Variable: ', 
                                     options=list(site2_df.select_dtypes('number').columns))
        
        display(self.site1, self.site2)
        
        # Assumes only one or none of the datasets has been resampled for simplicity
        # thus allowing the concatenation of the right dataset
        if site1_rs_df is not None:
            self.site_df_comb = pd.concat([site1_rs_df[self.site1.value], site2_df[self.site2.value]], axis=1, join='inner', sort=False)
            self.site1_var, self.site2_var = site1_rs_df[self.site1.value], site2_df[self.site2.value]
        elif site2_rs_df is not None:
            self.site_df_comb = pd.concat([site1_df[self.site1.value], site2_rs_df[self.site2.value]], axis=1, join='inner', sort=False)
            self.site1_var, self.site2_var = site1_df[self.site1.value], site2_rs_df[self.site2.value]
        else:
            self.site_df_comb = pd.concat([site1_df[self.site1.value], site2_df[self.site2.value]], axis=1, join='inner', sort=False)
            self.site1_var, self.site2_var = site1_df[self.site1.value], site2_df[self.site2.value]
            
concat_input = concat_data()


Dropdown(description='Site 1 Variable: ', options=('Voltage L1N Min', 'Voltage L1N Avg', 'Voltage L1N Max', 'V…

Dropdown(description='Site 2 Variable: ', options=('Vrms ph-n L1N Min', 'Vrms ph-n L1N Avg', 'Vrms ph-n L1N Ma…

In [12]:
# Plot the data based on the input above 
@interact
def comb_line_plot(time=widgets.ToggleButtons(options=['Raw', 'Minute', 'Hour', 'Day', 'Month'], 
                                                    description='Time'),
              color_button=widgets.ToggleButtons(options=['white', 'polar', 'solar', 'henanigans'], 
                                                    description='Plot Theme'),
              color_scale=widgets.ToggleButtons(options=['polar', 'spectral', 'pastel1', 'dark2'], 
                                                    description='Colour Scale')):
    
    
    # The following resamples the data based on the user selection
    df=concat_input.site_df_comb
    
    if time == 'Raw':
        new_df = df.resample('30S').mean()
    elif time == 'Minute':
        new_df = df.resample('T').mean()
    elif time == 'Hour':
        new_df = df.resample('H').mean()
    elif time == 'Day':
        new_df = df.resample('D').mean()
    else:
        new_df = df.resample('M').mean()
                        
    # The newly resampled data is then plotted based on the user's choice
    new_df.iplot(kind='line', xTitle='Time', yTitle='',
                 title='Click on the legend to isolate parameters', mode='lines', theme=color_button, colorscale=color_scale)

interactive(children=(ToggleButtons(description='Time', options=('Raw', 'Minute', 'Hour', 'Day', 'Month'), val…

In [19]:
# Perform the cross-correlation analysis using the two input parameters 
def crosscorr(data1, data2, lag=0, wrap=False):
    
    # If there is a wrap flag, the following wraps the data so that the correlation 
    # values on the 'edges' of the data are still calculated by adding the data from the other side of the signal:
    if wrap:
        shifted2 = data2.shift(lag)
        shifted2.iloc[:lag] = data2.iloc[-lag:].values
        return data1.corr(shifted2)
    else: 
        # Otherwise, return the correlation value of the two datasets based on the lag
        return data1.corr(data2.shift(lag))

# The following will implement the function above, across a range of lag times to find 
# the lag where the correlation is the highest. The direction of the correlation (+/-) 
# indicates if the first site is lagging or leading in the signal.
data = concat_input.site_df_comb.copy()
d1 = data.iloc[:,0]
d2 = data.iloc[:,1]

# The 'lag_range' variable may need to be adjusted if the data is really not synced well
lag_range = 500
corr_vals = [crosscorr(d1, d2, lag) for lag in range(-int(lag_range),int(lag_range+1))]
max_shift = np.ceil(len(corr_vals)/2) - np.argmax(corr_vals)

# Find the frequency of the dataset in order to report the lag (if any in the data)
freq = (data.index[1] - data.index[0]).seconds
if freq < 3600:
    # report in minutes
    print("The '{}' and '{}' of Site 1 and 2 respectively have a lag of {:.2f} minutes in their signals and thus will be synced to remove this offset in the data".format(data.columns[0], data.columns[0], abs(max_shift/60)))

else:
    # report in hours (should be uncommon)
    print("The '{}' and '{}' of Site 1 and 2 respectively have a lag of {:.2f} hours in their signals and thus will be synced to remove this offset in the data".format(data.columns[0], data.columns[0], abs(max_shift/3600)))
    

class lag_options():
    def __init__(self):
               
        self.choice = widgets.Dropdown(description='Sync data?: ', 
                                     options=['Select an Option','Yes, sync based on offset above', 
                                              'No, leave in raw format'])
        display(self.choice)
        
sync_input = lag_options()

The 'Voltage L1N Min' and 'Voltage L1N Min' of Site 1 and 2 respectively have a lag of 3.20 minutes in their signals and thus will be synced to remove this offset in the data


Dropdown(description='Sync data?: ', options=('Select an Option', 'Yes, sync based on offset above', 'No, leav…

In [15]:
# Displays the newly synced data if time syncing is applied

if sync_input.choice.value == 'Yes, sync based on offset above':
    
    # If 'max_shift' is negative, the first site is the lead in the lag
    # and thus needs to be shifted forward by the 'max_shift' value
    if max_shift < 0:
        data_shift = concat_input.site_df_comb.copy()
        data_noshift = concat_input.site_df_comb.copy()

        data_shift.drop(data_shift.columns[1], axis=1, inplace=True)
        data_shift.index = data_shift.index + pd.Timedelta(seconds=max_shift)

        data_noshift.drop(data_noshift.columns[0], axis=1, inplace=True)
        sync_site_df_comb = pd.concat([data_noshift, data_shift], axis=1, join='inner', sort=False)
        
    else:
        data_shift = concat_input.site_df_comb.copy()
        data_noshift = concat_input.site_df_comb.copy()

        data_shift.drop(data_shift.columns[0], axis=1, inplace=True)
        data_shift.index = data_shift.index + pd.Timedelta(seconds=max_shift)

        data_noshift.drop(data_noshift.columns[1], axis=1, inplace=True)
        sync_site_df_comb = pd.concat([data_shift, data_noshift], axis=1, join='inner', sort=False)        
        
        
    @interact
    def sync_line_plot(time=widgets.ToggleButtons(options=['Raw', 'Minute', 'Hour', 'Day', 'Month'], 
                                                        description='Time'),
                       color_button=widgets.ToggleButtons(options=['white', 'polar', 'solar', 'henanigans'], 
                                                        description='Plot Theme'),
                       color_scale=widgets.ToggleButtons(options=['polar', 'spectral', 'pastel1', 'dark2'], 
                                                        description='Colour Scale')):


        # The following resamples the data based on the user selection
        new_df=sync_site_df_comb

        if time == 'Raw':
            new_df = new_df.resample('30S').mean()
        elif time == 'Minute':
            new_df = new_df.resample('T').mean()
        elif time == 'Hour':
            new_df = new_df.resample('H').mean()
        elif time == 'Day':
            new_df = new_df.resample('D').mean()
        else:
            new_df = new_df.resample('M').mean()
            
            
        # The newly resampled data is then plotted based on the user's choice
        new_df.iplot(kind='line', xTitle='Time', yTitle='',
                     mode='lines', theme=color_button, colorscale=color_scale)

else:
    print("The datasets will be left in their raw format")


interactive(children=(ToggleButtons(description='Time', options=('Raw', 'Minute', 'Hour', 'Day', 'Month'), val…

In [17]:
# Allow the user to compare the synced and unsynced datasets

@interact
def sync_line_plot(time=widgets.ToggleButtons(options=['Raw', 'Minute', 'Hour', 'Day', 'Month'], 
                                                    description='Time'),
                   color_button=widgets.ToggleButtons(options=['plotly_white', 'simple_white', 'ggplot2', 'seaborn'], 
                                                    description='Plot Theme')):


    # The following resamples the data based on the user selection
    new_df1=concat_input.site_df_comb
    new_df2=sync_site_df_comb

    if time == 'Raw':
        new_df1 = new_df1.resample('30S').mean()
        new_df2 = new_df2.resample('30S').mean()
    elif time == 'Minute':
        new_df1 = new_df1.resample('T').mean()
        new_df2 = new_df2.resample('T').mean()
    elif time == 'Hour':
        new_df1 = new_df1.resample('H').mean()
        new_df2 = new_df2.resample('H').mean()
    elif time == 'Day':
        new_df1 = new_df1.resample('D').mean()
        new_df2 = new_df2.resample('D').mean()
    else:
        new_df1 = new_df1.resample('M').mean()
        new_df2 = new_df2.resample('M').mean()


    # The newly resampled data is then plotted based on the user's choice
    fig = make_subplots(rows=2, cols=1, shared_xaxes=True)

    fig.add_trace(go.Line(x=new_df1.index, y=new_df1[new_df1.columns[0]], name=new_df1.columns[0]), row=1, col=1)
    fig.add_trace(go.Line(x=new_df1.index, y=new_df1[new_df1.columns[1]], name=new_df1.columns[1]), row=1, col=1)
    
    fig.add_trace(go.Line(x=new_df2.index, y=new_df2[new_df1.columns[1]], name=new_df2.columns[1]), row=2, col=1)
    fig.add_trace(go.Line(x=new_df2.index, y=new_df2[new_df1.columns[0]], name=new_df2.columns[0]), row=2, col=1)

    # Formatting
    fig.update_yaxes(title_text="Raw data", row=1, col=1)
    fig.update_yaxes(title_text="Synced data", row=2, col=1)

    fig.update_layout(height=800, width=950, title_text="Time Sync Comparison. Use 'Pan' tool to zoom in to specific times", template=color_button)
    iplot(fig)
    

interactive(children=(ToggleButtons(description='Time', options=('Raw', 'Minute', 'Hour', 'Day', 'Month'), val…

## Corrected data 
*(if errors found)*

  Further functionality for outputting the corrected data will eventually be streamline with the other error detection methods in Project LEO and updated here.