# TNT hand gestures study

# flow of program
DELETE BEFORE SUBMISSION

-   code to import data from 3 csv files held in the data folder
-   data is then combined in a dictionary and then converted to a pandas dataframe
-   the data is then cleaned and the columns are renamed
-   the data is then split into a training and testing set
-   a linear regression model is then fitted to the training data
-   the model is then used to predict the test data
-   the mean squared error and r2 score are then calculated
-   the results are then printed to the console
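
The steps above can be sketched end to end as follows. This is a minimal illustration rather than the project's actual pipeline: the three CSV files are replaced by small synthetic in-memory tables built by a hypothetical helper `make_fake_csv`, and the column names `ax`, `ay`, `az`, `x`, `y`, `z`, and `target` are placeholders, not the real dataset columns.

```python
import io

import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.model_selection import train_test_split


def make_fake_csv(seed):
    """Return CSV text for one synthetic recording (stand-in for a file in the data folder)."""
    rng = np.random.default_rng(seed)
    frame = pd.DataFrame(rng.normal(size=(50, 3)), columns=['ax', 'ay', 'az'])
    # synthetic target: a noisy linear combination of the three inputs
    noise = rng.normal(scale=0.01, size=50)
    frame['target'] = 2.0 * frame['ax'] - frame['ay'] + 0.5 * frame['az'] + noise
    return frame.to_csv(index=False)


# import the three "files" into a dictionary, then combine them into one dataframe
data = {'data_' + str(i + 1): pd.read_csv(io.StringIO(make_fake_csv(i))) for i in range(3)}
df = pd.concat(list(data.values()), ignore_index=True)

# clean the data and rename the columns
df = df.drop_duplicates().dropna()
df = df.rename(columns={'ax': 'x', 'ay': 'y', 'az': 'z'})

# split into a training and a testing set
X_train, X_test, y_train, y_test = train_test_split(
    df[['x', 'y', 'z']], df['target'], test_size=0.2, random_state=0
)

# fit a linear regression model on the training data and predict the test data
model = LinearRegression().fit(X_train, y_train)
y_pred = model.predict(X_test)

# calculate the mean squared error and the r2 score, then print the results
mse = mean_squared_error(y_test, y_pred)
r2 = r2_score(y_test, y_pred)
print('MSE:', mse)
print('R2 :', r2)
```

Because the synthetic target is almost exactly linear in the inputs, the error here is tiny; real sensor data should not be expected to fit this well.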

**Dependencies**

In [None]:
!pip install pandas
!pip install numpy
!pip install matplotlib
!pip install seaborn
!pip install scikit-learn
!pip install tensorflow

**Importing relevant libraries.**

In [None]:
'''
Topics covered till lab 7
read Excel file: pd.read_excel()
read CSV file: pd.read_csv()
rename columns: df.rename()
unique values from a column: df['column'].unique()
duplicated rows: df.duplicated()
drop duplicated rows: df.drop_duplicates()
quantile / percentile: df.quantile()
rows with NaN: df.isna()
drop rows with NaN: df.dropna()
determine the bin: pd.cut
assign numerical values to different categorical data: pd.get_dummies
determine data type: df.dtypes
plot regression plot: sns.regplot
calculate Pearson correlation: stats.pearsonr

In our previous labs, the main target was to find the most meaningful features for predicting the car price. In lab 7 we will try to develop different models to predict the car price using those meaningful features. The developed models will help us to understand how these variables are used to predict the result (the car price in our case).
The possible meaningful features in the car price prediction dataset are:
'''
#-----------------Information-----------------#

'''
    Title: Linear Regression Model for Predicting Absolute Acceleration
    Data Collection Declaration:

    This project is being developed for a Data Science and Machine Learning class.
    The data used in this project was collected by the student developers at the University of Nottingham. 

    Legal Aspects:

    The data collection process complied with all applicable laws and university policies. 
    Any personal data that was collected has been anonymized to protect the privacy of the individuals involved. 

    Please note that the use of this data must comply with all relevant data protection and privacy laws. 
    Unauthorized use, disclosure, or duplication of this data is strictly prohibited.
'''
'''
    Data Information:

    Data within the dataset being examined is of the format of a csv file with the following columns:
    Column Names and Types:
    
    '''

#-----------------Information-----------------#

#-----------------Importing Libraries-----------------#

import pandas as pd
import numpy as np
import os
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
from sklearn.metrics import r2_score
import math
# import k-nearest neighbors
from sklearn.neighbors import KNeighborsRegressor
from sklearn.preprocessing import StandardScaler
from sklearn.cluster import KMeans

# import DBSCAN
from sklearn.cluster import DBSCAN



#-----------------Importing Libraries-----------------#

In [None]:

#-----------------Flags-----------------#

SYS_MSG = True # Flag to toggle control over printing system messages to console 
PLOT = True # Flag to toggle control over plotting graphs
TEST_PLOT = False # Flag to toggle control over plotting test graphs

#-----------------Flags-----------------#

#-----------------Basic Functions-----------------#

# Function to print system messages
def print_sys_msg(msg):
    if SYS_MSG:
        print('-'*10+'System control message'+'-'*10+'\t\t'+msg)

# Function to print normal messages
def print_msg(msg):
    print('-'*10+'Message from program'+'-'*10+'\t'+msg)

#-----------------Basic Functions-----------------#

## Preparation Phase  
We create base classes to manage our data, visualisation, analysis, and prediction.  
First is the DataHandler class, which manages the import, in-memory manipulation, and final storage of data.  
\- [member name]
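
As a rough illustration of the preprocessing a class like this wraps, the sketch below applies duplicate removal, null removal, a time filter, min-max normalization, and standardization to a tiny hand-made table. The column name `Time (s)` follows the dataset; the `value` column and all numbers are invented for the example.

```python
import numpy as np
import pandas as pd

# tiny stand-in for a recording: one duplicate row, one missing value, one negative timestamp
raw = pd.DataFrame({
    'Time (s)': [-0.1, 0.05, 0.1, 0.1, 0.2, 0.3],
    'value': [5.0, 1.0, 2.0, 2.0, np.nan, 4.0],
})

clean = raw.drop_duplicates()           # removes the repeated (0.1, 2.0) row
clean = clean.dropna()                  # removes the row with the missing value
clean = clean[clean['Time (s)'] > 0]    # removes the row with the negative timestamp

# min-max normalization: (x - min) / (max - min), column by column, maps each column to [0, 1]
scaled = (clean - clean.min()) / (clean.max() - clean.min())

# standardization: (x - mean) / std, column by column, gives mean 0 and standard deviation 1
standardized = (clean - clean.mean()) / clean.std()

print(clean)
print(scaled)
print(standardized)
```

Note that both scalings are applied to every column, including the time column; whether that is wanted depends on how the frames are used later.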

In [None]:
#-----------------DataHandler Class-----------------#
'''
Class DataHandler
    purpose: import and manage data (data manipulation, data wrangling, and data preprocessing)

    initialization example:
        data = DataHandler()
    
    functions:
    - import_data: import data from csv files and combine them into a single dataframe
        dependencies used: pandas
        function call example: data.import_data(['data/member1.csv', 'data/member2.csv', 'data/member3.csv'])
        input: list of file names
        output: none
    
    - import_data_system: import data from the system
        dependencies used: pandas, os
        function call example: data.import_data_system(['data/member1/type_of_gesture/', 'data/member2/type_of_gesture/', 'data/member3/type_of_gesture/'])
        input: list of directory paths of the form data/<member>/<type_of_gesture>/
        output: none
        
    - data_shape: print the shape of the dataframe
        dependencies used: pandas
        function call example: data.data_shape()
        input: none
        output: none

    - data_head: print the first 5 rows of the dataframe
        dependencies used: pandas
        function call example: data.data_head()
        input: none
        output: none

    - data_info: print the information of the dataframe
        dependencies used: pandas
        function call example: data.data_info()
        input: none
        output: none

    - data_describe: print the description of the dataframe
        dependencies used: pandas
        function call example: data.data_describe()
        input: none
        output: none

    - data_null: print the null values in the dataframe
        dependencies used: pandas
        function call example: data.data_null()
        input: none
        output: none
    
    - data_corr: print the correlation matrix of the dataframe
        dependencies used: pandas
        function call example: data.data_corr()
        input: none
        output: none

    - drop_duplicates: drop duplicates        
        dependencies used: pandas
        function call example: data.drop_duplicates()
        input: none
        output: none

    - drop_null: drop null values     
        dependencies used: pandas
        function call example: data.drop_null()
        input: none
        output: none
    
    - drop_outliers: drop outliers (the method is currently commented out in the class)
        dependencies used: pandas
        function call example: data.drop_outliers()
        input: none
        output: none

    - drop_negative_time: drop negative time values
        dependencies used: pandas
        function call example: data.drop_negative_time()
        input: none
        output: none
        
- Managed by: Samarth
- Created on: 03/02/2024
- Modified on: 03/02/2024
- Contact:  psxs2@nottingham.ac.uk
'''
class DataHandler:
    number_of_files = 0
    data = None
    df = None

    def __init__(self):
        print_sys_msg('DataHandler:__init__: DataHandler object created')
    #-----------------Data Import Functions-----------------#
    
    # usage example - data = DataHandler(); data.import_data(['data/member1.csv', 'data/member2.csv', 'data/member3.csv'])
    def import_data(self, files):
        print_sys_msg('DataHandler:import_data: importing data from '+str(len(files))+' csv files and combining them into a single dataframe')
        self.number_of_files = len(files)
        if self.number_of_files == 0:
            print_sys_msg('DataHandler:import_data: no files given, nothing imported')
            return
        # read every file once and keep the individual dataframes in a dictionary
        self.data = {'data_'+str(i+1): pd.read_csv(file) for i, file in enumerate(files)}
        # combine the individual dataframes into a single dataframe
        self.df = pd.concat(list(self.data.values()), ignore_index=True)
        print_sys_msg('DataHandler:import_data: data imported successfully')

    # usage example - data = DataHandler(); data.import_data_system(['data/member1/type_of_gesture/', 'data/member2/type_of_gesture/', 'data/member3/type_of_gesture/'])
    def import_data_system(self, directorys):
        print_sys_msg('DataHandler:import_data_system: importing data from the system')

        # all files are stored in the format data/member1/type_of_gesture/gesture1/Raw Data.csv

        print_sys_msg('DataHandler:import_data_system: number of members: '+str(len(directorys)))

        if self.data is None:
            self.data = {}

        for directory in directorys:
            # member name and gesture type are the second and third components of the path
            parts = directory.split('/')
            member = parts[1]
            gesture = parts[2]
            # every sub-folder of the directory holds one recorded gesture; list them once, in sorted order
            gesture_names = sorted(os.listdir(directory))

            print_sys_msg('DataHandler:import_data_system: member: '+member)
            print_sys_msg('DataHandler:import_data_system: gesture name: '+gesture)
            print_sys_msg('DataHandler:import_data_system: number of gestures: '+str(len(gesture_names)))

            for gesture_name in gesture_names:
                # read the 'Raw Data.csv' file inside the gesture folder
                file_path = os.path.join(directory, gesture_name, 'Raw Data.csv')
                self.data['data_'+member+'_'+gesture+'_'+gesture_name] = pd.read_csv(file_path)

        print_sys_msg('DataHandler:import_data_system: data imported successfully')
        # print the dictionary names
        print_sys_msg('DataHandler:import_data_system: printing the dictionary names')
        print_sys_msg(str(self.data.keys()))
        

        
    #-----------------Data Import Functions-----------------#
                    
    
    #-----------------Basic Data Wrangling Functions-----------------#
    
    def data_shape(self):
        print_sys_msg('DataHandler:data_shape: printing the shape of the dataframe')
        print_sys_msg(str(self.df.shape))
    def data_head(self):
        print_sys_msg('DataHandler:data_head: printing the first 5 rows of the dataframe')
        print_sys_msg(str(self.df.head()))
    
    def data_info(self):
        print_sys_msg('DataHandler:data_info: printing the information of the dataframe')
        # DataFrame.info() prints its summary itself and returns None, so it is called directly
        if SYS_MSG:
            self.df.info()
                      
    def data_describe(self):
        print_sys_msg('DataHandler:data_describe: printing the description of the dataframe')
        print_sys_msg(str(self.df.describe()))
                      
    def data_null(self):
        print_sys_msg('DataHandler:data_null: printing the null values in the dataframe')
        print_sys_msg(str(self.df.isnull().sum()))
                      
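    def data_corr(self):
        print_sys_msg('DataHandler:data_corr: printing the correlation matrix of the dataframe')
        # numeric_only=True skips text columns, which would otherwise raise an error in pandas 2.x
        print_sys_msg(str(self.df.corr(numeric_only=True)))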
    
    def data_missing(self):
        print_sys_msg('DataHandler:data_missing: printing the missing values in the dataframe')
        print_sys_msg(str(self.df.isna().any(axis=1)))
    #-----------------Basic Data Wrangling Functions-----------------#
        
    #-----------------Data Preprocessing Functions-----------------#

    # add_to_data: add data to the dataframe
    def add_to_data(self, data):
        print_sys_msg('DataHandler:add_to_data: adding data to the dataframe')
        self.df = pd.concat([self.df, data], ignore_index=True)

    def drop_duplicates(self):
        print_sys_msg('DataHandler:drop_duplicates: dropping duplicates')
        self.df = self.df.drop_duplicates()
    
    def drop_null(self):
        print_sys_msg('DataHandler:drop_null: dropping null values')
        self.df = self.df.dropna()


    # def drop_outliers(self):
    #     print_sys_msg('DataHandler:drop_outliers: dropping outliers')
    #     self.df = self.df[(self.df['Linear Acceleration x (m/s^2)'] > -10) & (self.df['Linear Acceleration x (m/s^2)'] < 10)]
    #     self.df = self.df[(self.df['Linear Acceleration y (m/s^2)'] > -10) & (self.df['Linear Acceleration y (m/s^2)'] < 10)]
    #     self.df = self.df[(self.df['Linear Acceleration z (m/s^2)'] > -10) & (self.df['Linear Acceleration z (m/s^2)'] < 10)]
    #     self.df = self.df[(self.df['Absolute acceleration (m/s^2)'] > 0) & (self.df['Absolute acceleration (m/s^2)'] < 10)]

    def drop_negative_time(self):
        print_sys_msg('DataHandler:drop_negative_time: dropping negative time values')
        self.df = self.df[self.df['Time (s)'] > 0]

    # missing values handling - keep only the rows that have at least `threshold` non-missing values
    def drop_missing(self, threshold=3):
        print_sys_msg('DataHandler:drop_missing: dropping rows with fewer than '+str(threshold)+' non-missing values')
        self.df = self.df.dropna(thresh=threshold).copy()

    # missing values handling - fill missing values with mean of the column
    def fill_missing(self):
        print_sys_msg('DataHandler:fill_missing: filling missing values with mean of the column')
        # numeric_only=True restricts the means to numeric columns; text columns are left unchanged
        self.df = self.df.fillna(self.df.mean(numeric_only=True))

    # missing values handling - fill missing values with median of the column
    def fill_missing_median(self):
        print_sys_msg('DataHandler:fill_missing_median: filling missing values with median of the column')
        # numeric_only=True restricts the medians to numeric columns; text columns are left unchanged
        self.df = self.df.fillna(self.df.median(numeric_only=True))
    
    # missing values handling - fill missing values with mode of the column
    def fill_missing_mode(self):
        print_sys_msg('DataHandler:fill_missing_mode: filling missing values with mode of the column')
        self.df = self.df.fillna(self.df.mode().iloc[0])
    
    # missing values handling - fill missing values with bill depth of the column
    #-----------------------------------
    #-----------------------------------To be written
    #-----------------------------------
    
    # data normalization - min-max normalization
    def min_max_normalization(self):
        print_sys_msg('DataHandler:min_max_normalization: min-max normalization')
        self.df = (self.df - self.df.min()) / (self.df.max() - self.df.min())
    
    # data normalization - standardization
    def standardization(self):
        print_sys_msg('DataHandler:standardization: standardization')
        self.df = (self.df - self.df.mean()) / self.df.std()
    

    #-----------------Data Preprocessing Functions-----------------#


    #-----------------Data Splitting Functions-----------------#
    
    #-----------------Data Splitting Functions-----------------#
        
    #-----------------Storing Data Functions-----------------#

    def store_data_with_name(self, file_name):
        print_sys_msg('DataHandler:store_data_with_name: storing data to a csv file')
        self.df.to_csv(file_name, index=False)
    
    def store_data_with_current_date_time(self):
        print_sys_msg('DataHandler:store_data_with_current_date_time: storing data to a csv file with current date and time')
        # format the timestamp without spaces or colons so the file name is valid on every operating system
        self.df.to_csv('data_'+pd.Timestamp.now().strftime('%Y-%m-%d_%H-%M-%S')+'.csv', index=False)
    
    def store_data_with_index(self):
        print_sys_msg('DataHandler:store_data_with_index: storing data to a csv file with index')
        
        # data is stored with index only
        # getting the highest index of the data in the data folder and then incrementing it by 1
        # storing the data with the new index
        #-----------------------------------
        #-----------------------------------To be written

    #-----------------Storing Data Functions-----------------#

#-----------------DataHandler Class-----------------#

Next we make a basic visualization class that will manage the plotting and core visualization functions.
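
For orientation, grid helpers of this kind usually wrap `plt.subplots`. The sketch below shows that underlying matplotlib pattern directly, without the class, on synthetic data. It selects the non-interactive `Agg` backend so that it runs without a display, and the signal names are made up for the example.

```python
import matplotlib
matplotlib.use('Agg')  # render off-screen; this line is not needed inside a notebook
import matplotlib.pyplot as plt
import numpy as np

# synthetic signals standing in for sensor columns
time = np.linspace(0, 1, 100)
signals = {
    'sine': np.sin(2 * np.pi * time),
    'cosine': np.cos(2 * np.pi * time),
    'ramp': time,
    'square': np.sign(np.sin(2 * np.pi * time)),
}

# a 2 x 2 grid of subplots with a shared title
fig, ax = plt.subplots(2, 2, figsize=(20, 10))
fig.suptitle('Synthetic signals')

for position, (name, values) in enumerate(signals.items()):
    # convert the running position 0..3 into a (row, col) pair of the grid
    row, col = divmod(position, 2)
    ax[row, col].plot(time, values, c='cyan', linewidth=0.8)
    ax[row, col].set_xlabel('Time (s)')
    ax[row, col].set_ylabel(name)
    ax[row, col].set_title(name)
```

With more than one row and more than one column, `plt.subplots` returns a 2-D array of axes, which is why the subplot is addressed as `ax[row, col]`.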

In [None]:
#-----------------DataVisualization Class-----------------#
'''
Class DataVisualization

    purpose: visualize data (data visualization)
    charts included: line, scatter, bar, histogram, box plot, violin plot, bullet, table, sparkline, connected scatter plot, box, pie, doughnut, gauge, waffle
    
    functions:
    -  __init__: initialize the object
        dependencies used: none
        function call example: data_visualization = DataVisualization(index)
        input: index (the x-axis values, for example the time column)
        output: none

    - switch_to_seaborn: switch to seaborn
        dependencies used: none
        function call example: data_visualization.switch_to_seaborn(True)
        input: boolean flag
        output: none

    - add_index: add index
        dependencies used: none
        function call example: data_visualization.add_index(index)
        input: index
        output: none

    - add_data: add data
        dependencies used: none
        function call example: data_visualization.add_data(data)
        input: data
        output: none

    - plot_chart: plot chart
        dependencies used: matplotlib, seaborn
        function call example: data_visualization.plot_chart('line', 'cyan', 0.8)
        input: type, color, thickness
        output: none

    - set_grid_params: set grid parameters
        dependencies used: matplotlib
        function call example: data_visualization.set_grid_params(2, 2, 20, 5, 'Title')
        input: rows, cols, figsize_x, figsize_y, title
        output: none

    - plot_grid_1d: plot grid 1d
        dependencies used: matplotlib, seaborn
        function call example: data_visualization.plot_grid_1d(0, 'line', 'x', 'y', 'Title', 'cyan', 0.8)
        input: count, type, name_x, name_y, plot_title, color, thickness    
        output: none

    - plot_grid_2d: plot grid 2d
        dependencies used: matplotlib, seaborn
        function call example: data_visualization.plot_grid_2d(0, 0, 'line', 'x', 'y', 'Title', 'cyan', 0.8)
        input: row, col, type, name_x, name_y, plot_title, color, thickness
        output: none

    - set_x_label: set x label
        dependencies used: matplotlib
        function call example: data_visualization.set_x_label('x')
        input: label
        output: none

    - set_y_label: set y label
        dependencies used: matplotlib
        function call example: data_visualization.set_y_label('y')
        input: label
        output: none

    - set_x_tick_labels: set x tick labels
        dependencies used: matplotlib
        function call example: data_visualization.set_x_tick_labels(labels)
        input: labels
        output: none

    - set_y_tick_labels: set y tick labels
        dependencies used: matplotlib
        function call example: data_visualization.set_y_tick_labels(labels)
        input: labels
        output: none

    - figure_size: set figure size
        dependencies used: matplotlib
        function call example: data_visualization.figure_size(20, 5)
        input: width, height
        output: none

    - set_title: set title
        dependencies used: matplotlib
        function call example: data_visualization.set_title('Title')
        input: title
        output: none

    - clear_plot: clear plot
        dependencies used: matplotlib
        function call example: data_visualization.clear_plot()
        input: none
        output: none

    - show_plot: show plot
        dependencies used: matplotlib
        function call example: data_visualization.show_plot()
        input: none
        output: none
   
- Managed by: 
- Created on: 03/02/2024
- Modified on: 03/02/2024
- Contact: @nottingham.ac.uk
'''
class DataVisualization:
    
    # boolean flag to determine if the plot is matplotlib or seaborn
    is_seaborn = False

    def __init__(self, index):
        self.add_index(index)
        self.data = []

    def switch_to_seaborn(self, flag):
        self.is_seaborn = flag

    def add_index(self, index):
        self.index = index.copy()

    def add_data(self, data):
        self.data = data.copy()

    # def plot_chart(self, type, name_x, name_y, plot_title, c='cyan', thickness=0.8, legend=None):
    def plot_chart(self, type, c='cyan', thickness=0.8, legend=None):
        # every caller passes (type, c, thickness), so that signature is used; the chart is drawn on the current axes
        self._draw(plt.gca(), type, c, thickness, legend)

    # shared drawing routine: plot_chart, plot_grid_1d and plot_grid_2d differ only in the axes they draw on
    def _draw(self, ax, type, c='cyan', thickness=0.8, legend=None):
        # charts drawn: line, scatter, bar, histogram, box plot, violin plot, connected scatter plot, box, pie, doughnut, gauge, waffle
        if type == 'line':
            sns.lineplot(x=self.index, y=self.data, c=c, linewidth=thickness, ax=ax) if self.is_seaborn else ax.plot(self.index, self.data, c=c, linewidth=thickness)
        elif type == 'scatter':
            sns.scatterplot(x=self.index, y=self.data, c=c, ax=ax) if self.is_seaborn else ax.scatter(self.index, self.data, c=c)
        elif type == 'bar':
            # bars and histograms take 'color'; they do not accept the short 'c' alias
            sns.barplot(x=self.index, y=self.data, color=c, ax=ax) if self.is_seaborn else ax.bar(self.index, self.data, color=c)
        elif type == 'histogram':
            sns.histplot(x=self.data, color=c, ax=ax) if self.is_seaborn else ax.hist(self.data, color=c)
        elif type == 'box plot' or type == 'box':
            # matplotlib's boxplot and violinplot have no colour argument, so c only applies to the seaborn versions
            sns.boxplot(y=self.data, color=c, ax=ax) if self.is_seaborn else ax.boxplot(self.data)
        elif type == 'violin plot':
            sns.violinplot(y=self.data, color=c, ax=ax) if self.is_seaborn else ax.violinplot(self.data)
        elif type == 'connected scatter plot':
            sns.lineplot(x=self.index, y=self.data, sort=False, c=c, ax=ax) if self.is_seaborn else ax.plot(self.index, self.data, c=c)
        elif type == 'pie':
            ax.pie(self.data, labels=self.index)
        elif type == 'doughnut':
            ax.pie(self.data, labels=self.index, wedgeprops=dict(width=0.5))
        elif type == 'gauge':
            ax.pie(self.data, labels=self.index, wedgeprops=dict(width=0.2))
        elif type == 'waffle':
            ax.pie(self.data, labels=self.index, wedgeprops=dict(width=0.1))
        elif type == 'bullet' or type == 'table' or type == 'sparkline':
            # matplotlib and seaborn have no bullet or sparkline functions, and seaborn has no table function
            print_sys_msg('DataVisualization:_draw: chart type not implemented: '+type)
        else:
            print('Invalid chart type')

        if legend is not None:
            ax.legend(legend, loc='upper right')
    
    def set_grid_params(self, rows, cols, figsize_x, figsize_y, title):
        if rows <= 0 or cols <= 0:
            print_sys_msg('DataVisualization:set_grid_params: rows and cols should be greater than 0')
            return
        self.fig, self.ax = plt.subplots(rows, cols, figsize=(figsize_x, figsize_y))
        self.fig.suptitle(title)
    
    def plot_grid_1d(self, count, type, name_x, name_y, plot_title, c='cyan', thickness=0.8, legend=None):
        # count is the position of the subplot in a one-row or one-column grid;
        # np.atleast_1d also covers a 1 x 1 grid, where plt.subplots returns a single Axes instead of an array
        ax = np.atleast_1d(self.ax)[count]
        ax.set_xlabel(name_x)
        ax.set_ylabel(name_y)
        ax.set_title(plot_title)
        self._draw(ax, type, c, thickness, legend)

    def plot_grid_2d(self, row, col, type, name_x, name_y, plot_title, c='cyan', thickness=0.8, legend=None):
        # row and col address one subplot of a grid that has more than one row and more than one column
        ax = self.ax[row, col]
        ax.set_xlabel(name_x)
        ax.set_ylabel(name_y)
        ax.set_title(plot_title)
        self._draw(ax, type, c, thickness, legend)

    def set_x_label(self, label):
        plt.xlabel(label)

    def set_y_label(self, label):
        plt.ylabel(label)

    def set_x_tick_labels(self, labels):
        plt.xticks(self.index, labels)
    
    def set_y_tick_labels(self, labels):
        plt.yticks(self.index, labels)
    
    def figure_size(self, width = 20, height = 5):
        plt.figure(figsize=(width, height))
    
    def set_title(self, title):
        plt.title(title)

    def clear_plot(self):
        plt.clf()
        
    def show_plot(self):
        plt.show()
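
The next cell builds frames of rows with a sliding time window. As a hedged sketch of the idea (not the class itself), the hypothetical helper `sliding_frames` below returns only full windows of `window_size` rows, moving forward `step_size` rows each time. Whether trailing partial windows should be kept as well is a design choice that depends on what consumes the frames.

```python
import pandas as pd


def sliding_frames(data, window_size, step_size):
    """Return a list of full windows of window_size rows, taken every step_size rows."""
    # the last window must still fit completely inside the data
    last_start = len(data) - window_size
    return [data.iloc[start:start + window_size] for start in range(0, last_start + 1, step_size)]


# ten rows of made-up data
df = pd.DataFrame({'Time (s)': [i * 0.1 for i in range(10)], 'value': list(range(10))})

frames = sliding_frames(df, window_size=4, step_size=2)
for frame in frames:
    print(frame['value'].tolist())
```

With 10 rows, a window of 4 and a step of 2, the windows start at rows 0, 2, 4 and 6, which gives four frames of four rows each. Consecutive frames overlap by `window_size - step_size` rows.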

In [None]:
# class sliding time window. it will create frames of dataframes based on the sliding time window size. 
# window_size -> size of the window which represents the number of rows in the dataframe
# step_size -> size of the step which represents the number of rows to move the window by 
# data -> dataframe which will be used to create frames
# create_frames -> function to create frames of dataframes based on the sliding time window size
# get_frames -> function to get the frames


class SlidingTimeWindow:
    def __init__(self, data, window_size, step_size):
        self.data = data
        self.window_size = window_size
        self.step_size = step_size
        self.frames = []
    
    def create_frames(self):
        # start from an empty list so that calling create_frames twice does not duplicate the frames
        self.frames = []
        for i in range(0, len(self.data), self.step_size):
            # iloc stops at the end of the data, so the last frames can hold fewer than window_size rows
            self.frames.append(self.data.iloc[i:i+self.window_size])
        return self.frames
    
    def get_frames(self):
        return self.frames


# class DBSCANClustering. It groups the values of each column in the dataframe, except the time and gesture columns, using the DBSCAN clustering algorithm. It also has functions to visualize the result.

class DBSCANClustering:
    def __init__(self, data, eps, min_samples):
        self.data = data
        self.eps = eps
        self.min_samples = min_samples
        self.clusters = []
    
    def cluster(self):
        for column in self.data.columns:
            print_sys_msg('DBSCANClustering:cluster: column: '+column)
            if column != 'Time (s)' and column != 'Gesture':
                print_sys_msg('DBSCANClustering:cluster: clustering started for column: '+column)  
                clustering = DBSCAN(eps=self.eps, min_samples=self.min_samples).fit(self.data[column].values.reshape(-1, 1))
                self.clusters.append(clustering.labels_)
        print_sys_msg('DBSCANClustering:cluster: clustering done')
        print_sys_msg('DBSCANClustering:cluster: clusters: '+str(len(self.clusters)))
        return self.clusters
    
    def visualize_with_DataVisualization(self, include_original_data = False, sns = False):
        # self.clusters holds one entry per clustered column, so plot exactly the columns that cluster() visited
        clustered_columns = [column for column in self.data.columns if column != 'Time (s)' and column != 'Gesture']
        for column in clustered_columns[:len(self.clusters)]:
            dv = DataVisualization(self.data['Time (s)'])
            if sns == True:
                dv.switch_to_seaborn(True)
            dv.add_data(self.data[column])
            dv.plot_chart('scatter', 'cyan', 0.8)
            if include_original_data == True:
                dv.plot_chart('line', 'red', 0.3)
            dv.set_x_label('Time (s)')
            dv.set_y_label(column)
            dv.set_title(column)
            dv.show_plot()
    
    def visualize_with_DataVisualization_selected_column(self, column, include_original_data = False, sns = False): # column is the index of the column. 1- linear acceleration x, 2- linear acceleration y, 3- linear acceleration z, 4- absolute acceleration
        dv = DataVisualization(self.data['Time (s)'])
        if sns == True:
            dv.switch_to_seaborn(True)
        dv.add_data(self.data.iloc[:, column])
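        # column counts dataframe columns (1 to 4), while self.clusters is indexed from 0, hence column - 1
        dv.plot_chart('scatter', c=self.clusters[column - 1], thickness=0.8)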
        if include_original_data == True:
            dv.plot_chart('line', c='red', thickness=0.3)
        dv.set_x_label('Time (s)')
        dv.set_y_label(self.data.columns[column])
        dv.set_title(self.data.columns[column])
        dv.show_plot()
        if sns == True:
            dv.switch_to_seaborn(False)
        
    def visualize_with_DataVisualization_2d(self, topic = 'DBSCAN Clustering', include_original_data = False, sns = False, save=False, file_path_name='DBSCAN Clustering', visualize_in_terminal=False):
        # use dv.plot_grid_2d
        dv = DataVisualization(self.data['Time (s)'])
        if sns == True:
            dv.switch_to_seaborn(True)

        dv.set_grid_params(2, 2, 20, 10, topic)

        dv.add_data(self.data.iloc[:, 1])
        dv.plot_grid_2d(0, 0, 'scatter', 'Time (s)', self.data.columns[1], self.data.columns[1], c=self.clusters[0])
        if include_original_data == True:
            dv.plot_grid_2d(0, 0, 'line' , 'Time (s)', self.data.columns[1], self.data.columns[1], c='red', thickness=0.3)

        dv.add_data(self.data.iloc[:, 2])
        dv.plot_grid_2d(0, 1, 'scatter', 'Time (s)', self.data.columns[2], self.data.columns[2], c=self.clusters[1])
        if include_original_data == True:
            dv.plot_grid_2d(0, 1, 'line' , 'Time (s)', self.data.columns[2], self.data.columns[2], c='red', thickness=0.3)

        dv.add_data(self.data.iloc[:, 3])
        dv.plot_grid_2d(1, 0, 'scatter', 'Time (s)', self.data.columns[3], self.data.columns[3], c=self.clusters[2])
        if include_original_data == True:
            dv.plot_grid_2d(1, 0, 'line' , 'Time (s)', self.data.columns[3], self.data.columns[3], c='red', thickness=0.3)
            
        dv.add_data(self.data.iloc[:, 4])
        dv.plot_grid_2d(1, 1, 'scatter', 'Time (s)', self.data.columns[4], self.data.columns[4], c=self.clusters[3])
        if include_original_data == True:
            dv.plot_grid_2d(1, 1, 'line' , 'Time (s)', self.data.columns[4], self.data.columns[4], c='red', thickness=0.3)
        
        if visualize_in_terminal == True:
            dv.show_plot()
        if sns == True:
            dv.switch_to_seaborn(False)
        if save == True:
            file_path_name = 'cluster_data/'+file_path_name
            dv.fig.savefig(file_path_name)

    def Visualize_best_esp_min_samples(self, best_values_df, eps, min_samples, include_original_data = False, sns = False, save=False, file_path_name='DBSCAN Clustering'):
        # For each key in best_values_df, draw a 2 x 2 grid of heatmaps with eps as index and min_samples as columns.
        # Each heatmap shows the number of unique clusters found for one signal:
        # linear acceleration x, linear acceleration y, linear acceleration z, and absolute acceleration.
        # The sns parameter shadows the seaborn module inside this method, so seaborn is imported under a local alias.
        import seaborn as seaborn_module
        self.best_values_df = best_values_df
        dv = DataVisualization(self.best_values_df['eps'])
        keys = self.best_values_df['key'].unique()
        value_columns = [
            'unique_clusters_linear_acceleration_x', 'unique_clusters_linear_acceleration_y',
            'unique_clusters_linear_acceleration_z', 'unique_clusters_absolute_acceleration',
        ]
        for key in keys:
            key_df = self.best_values_df[self.best_values_df['key'] == key]
            dv.switch_to_seaborn(True)
            dv.set_grid_params(2, 2, 20, 15, key)
            # positions 0..3 map to the grid cells (0, 0), (0, 1), (1, 0), (1, 1)
            for position, value_column in enumerate(value_columns):
                ax = dv.ax[position // 2, position % 2]
                seaborn_module.heatmap(key_df.pivot(index='eps', columns='min_samples', values=value_column), ax=ax, cmap='coolwarm')
                ax.set_title(value_column)
            dv.show_plot()
            dv.switch_to_seaborn(False)
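
The `cluster` method above runs DBSCAN on one column at a time. The following is a minimal, self-contained sketch of that idea on a synthetic signal; the signal values, `eps=0.3` and `min_samples=5` are illustrative assumptions and not values taken from the study data.

```python
import numpy as np
from sklearn.cluster import DBSCAN

# Synthetic 1-D signal (assumption): a quiet stretch near 0, an active stretch near 5, and one outlier.
rng = np.random.default_rng(0)
quiet = rng.normal(0.0, 0.05, 50)
active = rng.normal(5.0, 0.05, 50)
signal = np.concatenate([quiet, active, [20.0]])

# DBSCAN expects a 2-D array of shape (n_samples, n_features), hence reshape(-1, 1) for a single column.
labels = DBSCAN(eps=0.3, min_samples=5).fit(signal.reshape(-1, 1)).labels_

# The label -1 marks noise points; it is not a cluster.
n_clusters = len(set(labels) - {-1})
n_noise = int(np.sum(labels == -1))
print(n_clusters, n_noise)
```

Note that `np.unique(labels)` also returns the noise label `-1` when noise is present, so its length can be one larger than the number of actual clusters.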


## DATA Initiation Phase  

### Data import  
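Importing 3 separate files containing the phyphox data of each member of the team. The cell below loads the four gesture folders (circles, come here, go away, wave) stored under `data/anakha/`.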
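
`DataHandler.import_data_system` is defined earlier in the notebook. As a separate illustration only, reading every CSV file in a folder into a dictionary of dataframes could look like the sketch below. The helper name `load_csv_folder`, the file name `recording_1.csv` and the demo values are hypothetical.

```python
import tempfile
from pathlib import Path

import pandas as pd


def load_csv_folder(folder):
    """Return {file name without extension: DataFrame} for every CSV file in folder."""
    return {path.stem: pd.read_csv(path) for path in sorted(Path(folder).glob('*.csv'))}


# Demo on a throwaway folder so the sketch runs without the study data.
with tempfile.TemporaryDirectory() as tmp:
    demo = pd.DataFrame({'Time (s)': [0.0, 0.01], 'Absolute acceleration (m/s^2)': [0.1, 0.2]})
    demo.to_csv(Path(tmp) / 'recording_1.csv', index=False)
    recordings = load_csv_folder(tmp)

print(list(recordings.keys()))
```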

In [None]:
data_circles = DataHandler()
data_circles.import_data_system(['data/anakha/circles/'])
data_circles.name='circles'

data_come_here = DataHandler()
data_come_here.import_data_system(['data/anakha/come here/'])
data_come_here.name='come here'

data_go_away = DataHandler()
data_go_away.import_data_system(['data/anakha/go away/'])
data_go_away.name='go away'

data_wave = DataHandler()
data_wave.import_data_system(['data/anakha/wave/'])
data_wave.name='wave'

In [None]:
if False: # set to True to visualize the raw data
    # show linear acceleration x, y, z and absolute acceleration in a 2 x 2 grid for each key in gesture.data
    for gesture in [data_circles, data_come_here, data_go_away, data_wave]:
        for key in gesture.data.keys():
            print_sys_msg(key) 
            dv = DataVisualization(gesture.data[key]['Time (s)'])

            dv.set_grid_params(2, 2, 20, 15, gesture.name+' -> '+key)
            dv.add_data(gesture.data[key]['Linear Acceleration x (m/s^2)'])
            dv.plot_grid_2d(0, 0, 'line', 'Time (s)', 'Linear Acceleration x (m/s^2)', 'Linear Acceleration x (m/s^2) vs Time (s)', "red")
            dv.add_data(gesture.data[key]['Linear Acceleration y (m/s^2)'])
            dv.plot_grid_2d(0, 1, 'line', 'Time (s)', 'Linear Acceleration y (m/s^2)', 'Linear Acceleration y (m/s^2) vs Time (s)', "green")
            dv.add_data(gesture.data[key]['Linear Acceleration z (m/s^2)'])
            dv.plot_grid_2d(1, 0, 'line', 'Time (s)', 'Linear Acceleration z (m/s^2)', 'Linear Acceleration z (m/s^2) vs Time (s)', "blue")
            dv.add_data(gesture.data[key]['Absolute acceleration (m/s^2)'])
            dv.plot_grid_2d(1, 1, 'line', 'Time (s)', 'Absolute acceleration (m/s^2)', 'Absolute acceleration (m/s^2) vs Time (s)')

            dv.show_plot()

In [None]:
for gesture in [data_circles, data_come_here, data_go_away, data_wave]:
# for gesture in [data_wave]:
    print_sys_msg("starting pre-labelling analysis for gesture: "+gesture.name)

    best_values = []  # List to store the data of the best values of eps and min_samples
    eps = [0.1, 0.2, 0.3, 0.4, 0.5]
    min_samples = [20, 30, 40, 50, 100, 200]
    
    for key in gesture.data.keys():
        print_sys_msg(key)
        # print_msg("printing sample data with eps: 0.3 and min_samples: 30")
        # dbscan = DBSCANClustering([gesture].data[key], 0.3, 30)
        # dbscan.cluster()
        # dbscan.visualize_with_DataVisualization_2d(key, include_original_data=True, sns=True)
        
        for e in eps:
            for m in min_samples:
                print_msg(key+' -> eps: '+str(e)+' min_samples: '+str(m))
    
                dbscan = DBSCANClustering(gesture.data[key], e, m)
                dbscan.cluster()
                print_msg('unique clusters found in linear acceleration x: '+str(np.unique(dbscan.clusters[0]))+' total clusters: '+str(len(np.unique(dbscan.clusters[0]))))
                print_msg('unique clusters found in linear acceleration x has the following cluster density: ')
                for cluster in np.unique(dbscan.clusters[0]):
                    print_msg('cluster: '+str(cluster)+' density: '+str(len(dbscan.clusters[0][dbscan.clusters[0] == cluster])))
                print_msg('unique clusters found in linear acceleration y: '+str(np.unique(dbscan.clusters[1]))+' total clusters: '+str(len(np.unique(dbscan.clusters[1]))))
                print_msg('unique clusters found in linear acceleration y has the following cluster density: ')
                for cluster in np.unique(dbscan.clusters[1]):
                    print_msg('cluster: '+str(cluster)+' density: '+str(len(dbscan.clusters[1][dbscan.clusters[1] == cluster])))
                print_msg('unique clusters found in linear acceleration z: '+str(np.unique(dbscan.clusters[2]))+' total clusters: '+str(len(np.unique(dbscan.clusters[2]))))
                print_msg('unique clusters found in linear acceleration z has the following cluster density: ')
                for cluster in np.unique(dbscan.clusters[2]):
                    print_msg('cluster: '+str(cluster)+' density: '+str(len(dbscan.clusters[2][dbscan.clusters[2] == cluster])))
                print_msg('unique clusters found in absolute acceleration: '+str(np.unique(dbscan.clusters[3]))+' total clusters: '+str(len(np.unique(dbscan.clusters[3]))))
                print_msg('unique clusters found in absolute acceleration has the following cluster density: ')
                for cluster in np.unique(dbscan.clusters[3]):
                    print_msg('cluster: '+str(cluster)+' density: '+str(len(dbscan.clusters[3][dbscan.clusters[3] == cluster])))
                # visualize a particular key in [gesture].data using DBSCAN clustering algorithm with different values of eps and min_samples
                if key == False: # key is a string, so this comparison is always False and visualization is skipped for faster processing. Replace False with a key name to visualize the data for that key in gesture.data.
                    dbscan.visualize_with_DataVisualization_2d(key+' '+str(e)+' '+str(m), include_original_data=True, sns=True, save=True, file_path_name=gesture.name+'/'+key+'_eps_'+str(e)+'_min_samples_'+str(m)+'.png', visualize_in_terminal=True)

                best_values.append({
                    'key': key, 'eps': e, 'min_samples': m,
                    'unique_clusters_linear_acceleration_x': len(np.unique(dbscan.clusters[0])),
                    'unique_clusters_linear_acceleration_y': len(np.unique(dbscan.clusters[1])),
                    'unique_clusters_linear_acceleration_z': len(np.unique(dbscan.clusters[2])),
                    'unique_clusters_absolute_acceleration': len(np.unique(dbscan.clusters[3])),
                    'cluster_density_linear_acceleration_x': [len(dbscan.clusters[0][dbscan.clusters[0] == cluster]) for cluster in np.unique(dbscan.clusters[0])],
                    'cluster_density_linear_acceleration_y': [len(dbscan.clusters[1][dbscan.clusters[1] == cluster]) for cluster in np.unique(dbscan.clusters[1])],
                    'cluster_density_linear_acceleration_z': [len(dbscan.clusters[2][dbscan.clusters[2] == cluster]) for cluster in np.unique(dbscan.clusters[2])],
                    'cluster_density_absolute_acceleration': [len(dbscan.clusters[3][dbscan.clusters[3] == cluster]) for cluster in np.unique(dbscan.clusters[3])]
                })
                
    gesture.best_values_df = pd.DataFrame(best_values)
    print_msg('gesture.best_values_df: '+str(gesture.best_values_df))
    
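    # write gesture.best_values_df to an xlsx file; the gesture name is part of the file name so that each gesture does not overwrite the previous one
    gesture.best_values_df.to_excel(gesture.name+'_best_values_df.xlsx', index=False)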
    gesture.tested_eps = eps
    gesture.tested_min_samples = min_samples
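
The grid search above records `len(np.unique(...))` for every `eps` and `min_samples` pair. The sketch below shows the same pattern on a synthetic signal and records the count with and without the noise label. The signal and the reduced parameter grid are assumptions chosen to keep the example small.

```python
import numpy as np
import pandas as pd
from sklearn.cluster import DBSCAN

# Synthetic single-column signal (assumption): two tight groups of 200 samples each.
rng = np.random.default_rng(1)
signal = np.concatenate([rng.normal(0.0, 0.05, 200), rng.normal(3.0, 0.05, 200)]).reshape(-1, 1)

rows = []
for e in [0.1, 0.3]:
    for m in [20, 50]:
        labels = DBSCAN(eps=e, min_samples=m).fit(signal).labels_
        rows.append({
            'eps': e,
            'min_samples': m,
            # np.unique counts the noise label -1 as well, if any point is noise
            'labels_including_noise': len(np.unique(labels)),
            # dropping -1 gives the number of actual clusters
            'clusters_excluding_noise': len(set(labels) - {-1}),
        })

grid = pd.DataFrame(rows)
print(grid)
```

Whether noise should count towards the target number of clusters is a design choice; the two columns make the difference visible for each parameter pair.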

In [None]:
for gesture in [data_circles, data_come_here, data_go_away, data_wave]:
# for gesture in [data_wave]:
    keys = gesture.best_values_df['key'].unique()

    # -----------------Data Visualization of eps and min_samples-----------------
    if False: # set to True to visualize the data
        # create a heatmap with index as eps and columns as min_samples and values as the number of unique clusters found in linear acceleration x, linear acceleration y, linear acceleration z, and absolute acceleration for each key in best_values_df.
        dv = DataVisualization(gesture.best_values_df['eps'])
        
        for key in keys:
            # create a heatmap for each key in gesture.best_values_df with index as eps and columns as min_samples and values as the number of unique clusters found in linear acceleration x, linear acceleration y, linear acceleration z, and absolute acceleration. Make 4 separate heatmaps, one per acceleration, in a 2 x 2 grid. Use sns.heatmap to create the heatmaps
            dv.switch_to_seaborn(True)
            dv.set_grid_params(2, 2, 20, 15, key)
            sns.heatmap(gesture.best_values_df[gesture.best_values_df['key'] == key].pivot(index='eps', columns='min_samples', values='unique_clusters_linear_acceleration_x'), ax=dv.ax[0, 0], cmap='coolwarm')
            dv.ax[0, 0].set_title('unique_clusters_linear_acceleration_x')
            sns.heatmap(gesture.best_values_df[gesture.best_values_df['key'] == key].pivot(index='eps', columns='min_samples', values='unique_clusters_linear_acceleration_y'), ax=dv.ax[0, 1], cmap='coolwarm')
            dv.ax[0, 1].set_title('unique_clusters_linear_acceleration_y')
            sns.heatmap(gesture.best_values_df[gesture.best_values_df['key'] == key].pivot(index='eps', columns='min_samples', values='unique_clusters_linear_acceleration_z'), ax=dv.ax[1, 0], cmap='coolwarm')
            dv.ax[1, 0].set_title('unique_clusters_linear_acceleration_z')
            sns.heatmap(gesture.best_values_df[gesture.best_values_df['key'] == key].pivot(index='eps', columns='min_samples', values='unique_clusters_absolute_acceleration'), ax=dv.ax[1, 1], cmap='coolwarm')
            dv.ax[1, 1].set_title('unique_clusters_absolute_acceleration')
            dv.show_plot()
            dv.switch_to_seaborn(False)
    # -----------------Data Visualization of eps and min_samples-----------------

    # -----------------Selecting the best values of eps and min_samples-----------------
    # select the best values of eps and min_samples based on the number of unique clusters found in linear acceleration x, linear acceleration y, linear acceleration z, and absolute acceleration for each key in gesture.best_values_df.

    gesture.aimed_unique_clusters = 3
    
    # -----------------Selecting the best values of eps and min_samples----------------

    # testing calculating the best values of eps and min_samples together.
    min_samples_and_eps_mean_unique_clusters = []
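    # use the values stored on the gesture in the previous cell instead of relying on leftover global variables
    for e in gesture.tested_eps:
        for m in gesture.tested_min_samples: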
            mean_unique_clusters = []
            for key in keys:
                mean_unique_clusters.append({
                    'key': key,
                    'mean_unique_clusters_linear_acceleration_x': gesture.best_values_df[(gesture.best_values_df['key'] == key) & (gesture.best_values_df['eps'] == e) & (gesture.best_values_df['min_samples'] == m)]['unique_clusters_linear_acceleration_x'].mean(),
                    'mean_unique_clusters_linear_acceleration_y': gesture.best_values_df[(gesture.best_values_df['key'] == key) & (gesture.best_values_df['eps'] == e) & (gesture.best_values_df['min_samples'] == m)]['unique_clusters_linear_acceleration_y'].mean(),
                    'mean_unique_clusters_linear_acceleration_z': gesture.best_values_df[(gesture.best_values_df['key'] == key) & (gesture.best_values_df['eps'] == e) & (gesture.best_values_df['min_samples'] == m)]['unique_clusters_linear_acceleration_z'].mean(),
                    'mean_unique_clusters_absolute_acceleration': gesture.best_values_df[(gesture.best_values_df['key'] == key) & (gesture.best_values_df['eps'] == e) & (gesture.best_values_df['min_samples'] == m)]['unique_clusters_absolute_acceleration'].mean()
                })
            # mean is calculated a second time to get the mean of the mean of the unique clusters for each key.
            min_samples_and_eps_mean_unique_clusters.append({
                'eps': e,
                'min_samples': m,
                'mean_unique_clusters_linear_acceleration_x': pd.DataFrame(mean_unique_clusters)['mean_unique_clusters_linear_acceleration_x'].mean(),
                'mean_unique_clusters_linear_acceleration_y': pd.DataFrame(mean_unique_clusters)['mean_unique_clusters_linear_acceleration_y'].mean(),
                'mean_unique_clusters_linear_acceleration_z': pd.DataFrame(mean_unique_clusters)['mean_unique_clusters_linear_acceleration_z'].mean(),
                'mean_unique_clusters_absolute_acceleration': pd.DataFrame(mean_unique_clusters)['mean_unique_clusters_absolute_acceleration'].mean()
            })
            
    gesture.min_samples_and_eps_mean_unique_clusters_df = pd.DataFrame(min_samples_and_eps_mean_unique_clusters)
    # print_msg('gesture.min_samples_and_eps_mean_unique_clusters_df: '+str(gesture.min_samples_and_eps_mean_unique_clusters_df))

    # select the best eps and min_samples for linear acceleration x
    gesture.best_eps_linear_acceleration_x = gesture.min_samples_and_eps_mean_unique_clusters_df.iloc[(gesture.min_samples_and_eps_mean_unique_clusters_df['mean_unique_clusters_linear_acceleration_x'] - gesture.aimed_unique_clusters).abs().argsort()[:1]]['eps'].values[0]
    gesture.best_min_samples_linear_acceleration_x = gesture.min_samples_and_eps_mean_unique_clusters_df.iloc[(gesture.min_samples_and_eps_mean_unique_clusters_df['mean_unique_clusters_linear_acceleration_x'] - gesture.aimed_unique_clusters).abs().argsort()[:1]]['min_samples'].values[0]
    print_msg(gesture.name+' -> best_eps_linear_acceleration_x: '+str(gesture.best_eps_linear_acceleration_x))
    print_msg(gesture.name+' -> best_min_samples_linear_acceleration_x: '+str(gesture.best_min_samples_linear_acceleration_x))

    # select the best eps and min_samples for linear acceleration y
    gesture.best_eps_linear_acceleration_y = gesture.min_samples_and_eps_mean_unique_clusters_df.iloc[(gesture.min_samples_and_eps_mean_unique_clusters_df['mean_unique_clusters_linear_acceleration_y'] - gesture.aimed_unique_clusters).abs().argsort()[:1]]['eps'].values[0]
    gesture.best_min_samples_linear_acceleration_y = gesture.min_samples_and_eps_mean_unique_clusters_df.iloc[(gesture.min_samples_and_eps_mean_unique_clusters_df['mean_unique_clusters_linear_acceleration_y'] - gesture.aimed_unique_clusters).abs().argsort()[:1]]['min_samples'].values[0]
    print_msg(gesture.name+' -> best_eps_linear_acceleration_y: '+str(gesture.best_eps_linear_acceleration_y))
    print_msg(gesture.name+' -> best_min_samples_linear_acceleration_y: '+str(gesture.best_min_samples_linear_acceleration_y))

    # select the best eps and min_samples for linear acceleration z
    gesture.best_eps_linear_acceleration_z = gesture.min_samples_and_eps_mean_unique_clusters_df.iloc[(gesture.min_samples_and_eps_mean_unique_clusters_df['mean_unique_clusters_linear_acceleration_z'] - gesture.aimed_unique_clusters).abs().argsort()[:1]]['eps'].values[0]
    gesture.best_min_samples_linear_acceleration_z = gesture.min_samples_and_eps_mean_unique_clusters_df.iloc[(gesture.min_samples_and_eps_mean_unique_clusters_df['mean_unique_clusters_linear_acceleration_z'] - gesture.aimed_unique_clusters).abs().argsort()[:1]]['min_samples'].values[0]
    print_msg(gesture.name+' -> best_eps_linear_acceleration_z: '+str(gesture.best_eps_linear_acceleration_z))
    print_msg(gesture.name+' -> best_min_samples_linear_acceleration_z: '+str(gesture.best_min_samples_linear_acceleration_z))

    # select the best eps and min_samples for absolute acceleration
    gesture.best_eps_absolute_acceleration = gesture.min_samples_and_eps_mean_unique_clusters_df.iloc[(gesture.min_samples_and_eps_mean_unique_clusters_df['mean_unique_clusters_absolute_acceleration'] - gesture.aimed_unique_clusters).abs().argsort()[:1]]['eps'].values[0]
    gesture.best_min_samples_absolute_acceleration = gesture.min_samples_and_eps_mean_unique_clusters_df.iloc[(gesture.min_samples_and_eps_mean_unique_clusters_df['mean_unique_clusters_absolute_acceleration'] - gesture.aimed_unique_clusters).abs().argsort()[:1]]['min_samples'].values[0]
    print_msg(gesture.name+' -> best_eps_absolute_acceleration: '+str(gesture.best_eps_absolute_acceleration))
    print_msg(gesture.name+' -> best_min_samples_absolute_acceleration: '+str(gesture.best_min_samples_absolute_acceleration))


    # -----------------Selecting the best values of eps and min_samples----------------
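
The selection above picks the row whose mean cluster count is closest to `aimed_unique_clusters` with `argsort()[:1]`. A compact equivalent uses `idxmin` on the absolute difference, sketched here on a toy table. The numbers in `grid` are made-up illustration values, not results from the study.

```python
import pandas as pd

# Toy grid of mean cluster counts per (eps, min_samples) pair (illustrative values only).
grid = pd.DataFrame({
    'eps': [0.1, 0.1, 0.3, 0.3],
    'min_samples': [20, 50, 20, 50],
    'mean_unique_clusters': [7.5, 5.0, 3.25, 2.0],
})
target = 3  # plays the role of gesture.aimed_unique_clusters

# idxmin returns the index label of the smallest absolute distance to the target.
best_row = grid.loc[(grid['mean_unique_clusters'] - target).abs().idxmin()]
best_eps = best_row['eps']
best_min_samples = int(best_row['min_samples'])
print(best_eps, best_min_samples)
```

Looking up the row once and reading both `eps` and `min_samples` from it avoids repeating the same distance computation for each value.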
    

In [None]:
# -----------------labelling the data using the best values of eps and min_samples----------------
# label the data using the best values of eps and min_samples for each key in gesture.data
for gesture in [data_circles, data_come_here, data_go_away, data_wave]:
# for gesture in [data_wave]:
    keys = gesture.data.keys()
    for key in keys:
        print_sys_msg(key)
        print_msg('columns being worked on are: '+str(gesture.data[key].columns))

        print_msg('best_eps_linear_acceleration_x: '+str(gesture.best_eps_linear_acceleration_x))
        print_msg('best_min_samples_linear_acceleration_x: '+str(gesture.best_min_samples_linear_acceleration_x))
        dbscan = DBSCANClustering(gesture.data[key], gesture.best_eps_linear_acceleration_x, gesture.best_min_samples_linear_acceleration_x)
        dbscan.cluster()
        gesture.data[key]['DBSCAN Clustering linear acceleration x'] = dbscan.clusters[0]

        print_msg('best_eps_linear_acceleration_y: '+str(gesture.best_eps_linear_acceleration_y))
        print_msg('best_min_samples_linear_acceleration_y: '+str(gesture.best_min_samples_linear_acceleration_y))
        dbscan = DBSCANClustering(gesture.data[key], gesture.best_eps_linear_acceleration_y, gesture.best_min_samples_linear_acceleration_y)
        dbscan.cluster()
        gesture.data[key]['DBSCAN Clustering linear acceleration y'] = dbscan.clusters[1]

        print_msg('best_eps_linear_acceleration_z: '+str(gesture.best_eps_linear_acceleration_z))
        print_msg('best_min_samples_linear_acceleration_z: '+str(gesture.best_min_samples_linear_acceleration_z))
        dbscan = DBSCANClustering(gesture.data[key], gesture.best_eps_linear_acceleration_z, gesture.best_min_samples_linear_acceleration_z)
        dbscan.cluster()
        gesture.data[key]['DBSCAN Clustering linear acceleration z'] = dbscan.clusters[2]

        print_msg('best_eps_absolute_acceleration: '+str(gesture.best_eps_absolute_acceleration))
        print_msg('best_min_samples_absolute_acceleration: '+str(gesture.best_min_samples_absolute_acceleration))
        dbscan = DBSCANClustering(gesture.data[key], gesture.best_eps_absolute_acceleration, gesture.best_min_samples_absolute_acceleration)
        dbscan.cluster()
        gesture.data[key]['DBSCAN Clustering absolute acceleration'] = dbscan.clusters[3]
        print_msg('columns after labelling are: '+str(gesture.data[key].columns))

        # combine the appropriate cluster labels for each key in gesture.data
        gesture.clusters = [gesture.data[key]['DBSCAN Clustering linear acceleration x'], gesture.data[key]['DBSCAN Clustering linear acceleration y'], gesture.data[key]['DBSCAN Clustering linear acceleration z'], gesture.data[key]['DBSCAN Clustering absolute acceleration']]
        
        if False: # set to True to visualize the labelled data
            dv = DataVisualization(gesture.data[key]['Time (s)'])
            dv.switch_to_seaborn(True)

            dv.set_grid_params(2, 2, 20, 10, gesture.name+' -> '+key+' -> DBSCAN Clustering')

            dv.add_data(gesture.data[key].iloc[:, 1])
            dv.plot_grid_2d(0, 0, 'scatter', 'Time (s)', gesture.data[key].columns[1], gesture.data[key].columns[1] + ' -> eps: '+str(gesture.best_eps_linear_acceleration_x)+' min_samples: '+str(gesture.best_min_samples_linear_acceleration_x), c=gesture.clusters[0])
            dv.plot_grid_2d(0, 0, 'line' , 'Time (s)', gesture.data[key].columns[1], gesture.data[key].columns[1] + ' -> eps: '+str(gesture.best_eps_linear_acceleration_x)+' min_samples: '+str(gesture.best_min_samples_linear_acceleration_x), c='red', thickness=0.3)

            dv.add_data(gesture.data[key].iloc[:, 2])
            dv.plot_grid_2d(0, 1, 'scatter', 'Time (s)', gesture.data[key].columns[2], gesture.data[key].columns[2] + ' -> eps: '+str(gesture.best_eps_linear_acceleration_y)+' min_samples: '+str(gesture.best_min_samples_linear_acceleration_y), c=gesture.clusters[1])
            dv.plot_grid_2d(0, 1, 'line' , 'Time (s)', gesture.data[key].columns[2], gesture.data[key].columns[2] + ' -> eps: '+str(gesture.best_eps_linear_acceleration_y)+' min_samples: '+str(gesture.best_min_samples_linear_acceleration_y), c='red', thickness=0.3)

            dv.add_data(gesture.data[key].iloc[:, 3])
            dv.plot_grid_2d(1, 0, 'scatter', 'Time (s)', gesture.data[key].columns[3], gesture.data[key].columns[3] + ' -> eps: '+str(gesture.best_eps_linear_acceleration_z)+' min_samples: '+str(gesture.best_min_samples_linear_acceleration_z), c=gesture.clusters[2])
            dv.plot_grid_2d(1, 0, 'line' , 'Time (s)', gesture.data[key].columns[3], gesture.data[key].columns[3] + ' -> eps: '+str(gesture.best_eps_linear_acceleration_z)+' min_samples: '+str(gesture.best_min_samples_linear_acceleration_z), c='red', thickness=0.3)

            dv.add_data(gesture.data[key].iloc[:, 4])
            dv.plot_grid_2d(1, 1, 'scatter', 'Time (s)', gesture.data[key].columns[4], gesture.data[key].columns[4] + ' -> eps: '+str(gesture.best_eps_absolute_acceleration)+' min_samples: '+str(gesture.best_min_samples_absolute_acceleration), c=gesture.clusters[3])
            dv.plot_grid_2d(1, 1, 'line' , 'Time (s)', gesture.data[key].columns[4], gesture.data[key].columns[4] + ' -> eps: '+str(gesture.best_eps_absolute_acceleration)+' min_samples: '+str(gesture.best_min_samples_absolute_acceleration), c='red', thickness=0.3)

            dv.show_plot()

            dv.switch_to_seaborn(False)

# -----------------Deciding sliding window size----------------
# using clusters to decide the sliding window size uniform across all keys and all accelerations
# the sliding window size is decided based on the number of unique clusters found in the data

# marking the vertical regions where the number of clusters are denser on the timeline to decide the sliding window size

for gesture in [data_circles, data_come_here, data_go_away, data_wave]:
    # initialise the accumulators inside the loop so that every gesture gets its own lists
    gesture.avg_sliding_window_size = []
    gesture.avg_sliding_window_step_size = []
    keys = gesture.data.keys()
    for key in keys:
        print_sys_msg(key)
        print_msg('columns being worked on are: '+str(gesture.data[key].columns))
        
        # selecting the cluster to focus on for the sliding window size. After a particular cluster is selected, we can follow its density on the timeline to decide the sliding window size.
        selected_cluster = 0
        gesture.selected_cluster = selected_cluster
        # Series.append was removed in pandas 2.0, so the per-signal selections are combined with pd.concat.
        # NOTE: the combined series holds cluster labels (every value equals selected_cluster), not timestamps.
        label_columns = ['DBSCAN Clustering linear acceleration x', 'DBSCAN Clustering linear acceleration y', 'DBSCAN Clustering linear acceleration z', 'DBSCAN Clustering absolute acceleration']
        gesture.selected_cluster_density = pd.concat([gesture.data[key][gesture.data[key][label_column] == selected_cluster][label_column] for label_column in label_columns])
        gesture.selected_cluster_density = gesture.selected_cluster_density.sort_values()
        gesture.selected_cluster_density = gesture.selected_cluster_density.reset_index(drop=True)
        print_msg('gesture.selected_cluster_density: '+str(gesture.selected_cluster_density))
        
        # calculating the sliding window size and step size based on the density of the selected cluster
        gesture.sliding_window_size = 0
        gesture.sliding_window_step_size = 0
        gesture.sliding_window_size_list = []
        gesture.sliding_window_step_size_list = []
        for i in range(1, len(gesture.selected_cluster_density)):
            gesture.sliding_window_size = gesture.selected_cluster_density[i] - gesture.selected_cluster_density[i-1]
            gesture.sliding_window_size_list.append(gesture.sliding_window_size)
            gesture.sliding_window_step_size = gesture.sliding_window_size
            gesture.sliding_window_step_size_list.append(gesture.sliding_window_step_size)
        gesture.avg_sliding_window_size.append(np.mean(gesture.sliding_window_size_list))
        gesture.avg_sliding_window_step_size.append(np.mean(gesture.sliding_window_step_size_list))
        print_msg('gesture.sliding_window_size: '+str(gesture.sliding_window_size))
        print_msg('gesture.sliding_window_step_size: '+str(gesture.sliding_window_step_size))
        print_msg('gesture.avg_sliding_window_size: '+str(gesture.avg_sliding_window_size))
        print_msg('gesture.avg_sliding_window_step_size: '+str(gesture.avg_sliding_window_step_size))

        # visualizing the selected cluster density on the timeline
        dv = DataVisualization(gesture.data[key]['Time (s)'])
        dv.switch_to_seaborn(True)
        dv.add_data(gesture.data[key].iloc[:, 1])
        dv.plot_chart('line', 'red', 0.3)
        dv.add_data(gesture.data[key].iloc[:, 2])
        dv.plot_chart('line', 'green', 0.3)
        dv.add_data(gesture.data[key].iloc[:, 3])
        dv.plot_chart('line', 'blue', 0.3)
        dv.add_data(gesture.data[key].iloc[:, 4])
        dv.plot_chart('line', 'black', 0.3)
        dv.add_data(gesture.selected_cluster_density)
        dv.plot_chart('scatter', 'cyan', 0.8)
        dv.set_x_label('Time (s)')
        dv.set_y_label('Selected Cluster Density')
        dv.set_title('Selected Cluster Density')
        dv.show_plot()
        dv.switch_to_seaborn(False)

        break

        
# -----------------Deciding sliding window size----------------
# -----------------labelling the data using the best values of eps and min_samples----------------



In [None]:
# create a visualization of moving sliding window over dataframe of each key in data.data.keys
# for gesture in [data_circles, data_come_here, data_go_away, data_wave]:
for gesture in [data_wave]:
    for key in gesture.data.keys():
        # build the window inside the loop so that key is defined and refers to the current recording
        gesture.stw = SlidingTimeWindow(gesture.data[key], 100, 10)
        print_sys_msg(key)
        
        gesture.stw.create_frames()
        frames = gesture.stw.get_frames()
        print_sys_msg('Number of frames: '+str(len(frames)))
        # continue
        for frame in frames:
            dv = DataVisualization(frame['Time (s)'])
            dv.add_data(frame['Linear Acceleration y (m/s^2)'])
            # dv.set_grid_params(1, 1, 20, 5, 'Linear Acceleration x (m/s^2) vs Time (s)')
            # dv.plot_grid_1d(1, 'line', 'Time (s)', 'Linear Acceleration x (m/s^2)', 'Linear Acceleration x (m/s^2) vs Time (s)', "red")
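            # plot_chart takes (chart type, colour, thickness) elsewhere in this notebook; labels are set separately and match the plotted y column
            dv.plot_chart('line', 'red', 0.3)
            dv.set_x_label('Time (s)')
            dv.set_y_label('Linear Acceleration y (m/s^2)')
            dv.set_title('Linear Acceleration y (m/s^2) vs Time (s)')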
            dv.show_plot()
        break


    # topics to cover to run the code are - DBSCAN, Sliding Time Window, Clustering analysis, Normalization,  standardization, test-train split.
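
As a reference for the sliding time window step, here is a minimal sketch that returns only full-length windows. It differs from `SlidingTimeWindow.create_frames`, which also keeps the shorter frames at the end of the data. The function name `sliding_windows` and the demo dataframe are hypothetical.

```python
import pandas as pd


def sliding_windows(df, window_size, step_size):
    """Return full-length windows of df, advancing step_size rows each time."""
    last_start = len(df) - window_size
    return [df.iloc[start:start + window_size] for start in range(0, last_start + 1, step_size)]


# Demo data (assumption): 10 samples at 0.01 s spacing.
demo = pd.DataFrame({'Time (s)': [i * 0.01 for i in range(10)], 'value': range(10)})
windows = sliding_windows(demo, window_size=4, step_size=2)
print(len(windows), [list(w['value']) for w in windows])
```

Keeping every window the same length is convenient when the windows are later stacked into a fixed-size feature array; if the data is shorter than one window, the function returns an empty list.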

In [None]:
# for data_wave, use cluster analysis to divide the data in each data key into two clusters based on the distance between the points in the data. Essentially, we are trying to distinguish between low change in acceleration and high change in acceleration to identify and label the sections as gesture and non-gesture sections.

for key in data_wave.data.keys():
    print_sys_msg(key)
    # print type of data_wave.data[key]
    print_sys_msg(str(data_wave.data[key].dtypes))

