 <div  style="color:#303030;font-family:'arial blACK', sans-serif,monospace; text-align: center; padding: 50px 0; vertical-align:middle;" > <img src="https://github.com/PIA-Group/ScientIST-notebooks/blob/master/_Resources/Images/Lightbulb.png?raw=true" style=" background:linear-gradient(to right,#FDC86E,#fbb144);border-radius:10px;width:150px;text-align:left; margin-left:10%"  /> <span style="position:relative; bottom:70px; margin-left:5%;font-size:170%;">    Classification of Human Activity Data </span> </div>

## <span style="color:#fbb144;"> Keywords: </span>

```Supervised Learning```,```Unsupervised Learning```, ```Accelerometry Data```

# I. Introduction
<br>
<div style="width:100%; background:linear-gradient(to right,#FDC86E,#fbb144);font-family:'arial black',monospace; text-align: center; padding: 7px 0; border-radius: 5px 50px;margin-top:-15px" >  </div>


In this work, we propose a framework for the automatic classification of 4 different human activity movements (right arm elevation at 90º and 180º on the coronal and sagittal planes) using accelerometer data acquired from a smartphone held in the hand. We do so by applying supervised and unsupervised machine learning algorithms.

<img src="https://github.com/PIA-Group/ScientIST-notebooks/blob/master/_Resources/Images/E.Classification_IMG/e003/arm-raise-new.png?raw=true" alt="arm-raise" border="0"/> 

## <div style="color:#fbb144;"> 1. Objectives</div>
* Learn to segment motion signals
* Extract meaningful information from accelerometer data
* Perform activity recognition through simple clustering

# II. Experimental
<br>
<div style="width:100%; background:linear-gradient(to right,#FDC86E,#fbb144);font-family:'arial black',monospace; text-align: center; padding: 7px 0; border-radius: 5px 50px;margin-top:-15px" >  </div>


## <div style="color:#fbb144;">  1. Requirements</div>


In this section, install the required libraries using the commands below:

In [2]:
%matplotlib notebook

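# the PyPI package is named scikit-learn (the "sklearn" alias is deprecated)
!pip install scikit-learn >/dev/null 2>&1
!pip install scipy >/dev/null 2>&1
!pip install biosppy >/dev/null 2>&1
# seaborn is used to draw the confusion matrices
!pip install seaborn >/dev/null 2>&1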

and imported:

In [None]:
# NumPy is the fundamental package for scientific computing with Python. 
import numpy as np 
# Matplotlib is a Python 2D plotting library
import matplotlib.pyplot as plt
# to help dealing with directories
import os 
# to manage tables
import pandas as pd 
# to create dictionaries that remember the insertion order of their keys
from collections import OrderedDict 
# signal-processing routines (peak detection, resampling)
import scipy.signal
# to load and save files keeping the original format
import pickle 

## <div style="color:#fbb144;">  2. Data</div>


## <div style="color:#fbb144;">  2.1. Load the data </div>
The data for this experiment was previously acquired by 1 user performing 180º and 90º arm movements on the coronal and sagittal planes. Each activity was saved in a different CSV file, which can be read as a DataFrame and stored in a dictionary. To speed up the next steps, all activities will be joined together in the same file. If you prefer, this new file can be saved in the "pickle" format.

### <div style="color:#fbb144;">2.1.1. Load csv files and create dictionary </div>

CSV files can easily be loaded with the pandas function "read_csv", which preserves the original table structure by storing it in a DataFrame. A dictionary is a simple way to keep all DataFrames together while leaving each one intact: each DataFrame is stored in the dictionary using its file name as the key. For the following steps to work, rename each file so that its name includes one of the following tags: "C 180", "C 90", "S 180" or "S 90".
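The loading step and the file-name requirement can be sketched as follows. This is a minimal example, assuming in-memory CSV text (through `io.StringIO`) in place of real files; the file names and values are hypothetical. With real files, `io.StringIO(text)` would be replaced by the file path, for instance `os.path.join(directory, name)`.

```python
import io

import pandas as pd

# hypothetical in-memory CSV contents standing in for the real files on disk
csv_sources = {
    'arm C 180.csv': 'AccX,AccY,AccZ\n0.1,9.7,0.2\n0.2,9.5,0.1\n',
    'arm C 90.csv': 'AccX,AccY,AccZ\n0.0,9.8,0.3\n0.1,9.6,0.2\n',
}

# one DataFrame per file, keyed by file name (a dictionary comprehension)
data = {name: pd.read_csv(io.StringIO(text)) for name, text in csv_sources.items()}

# check that every file name carries one of the expected activity tags
tags = ['C 180', 'C 90', 'S 180', 'S 90']
untagged = [name for name in data if not any(tag in name for tag in tags)]
print(list(data.keys()), untagged)
```

Listing the untagged files before moving on makes a missing or misspelled tag visible early, instead of failing later when the labels are built.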

In [None]:
## GET CSV FILES DIRECTORY

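# place your file directory here ('.' is the current working directory)
directory = '.' 
files_dir = os.listdir(directory)

# list only csv files
csv_files = [file for file in files_dir if file.endswith('.csv')] 


## LOAD FILES AND SAVE IN A DICTIONARY
# one DataFrame per file, using the file name as the key
data = {file: pd.read_csv(os.path.join(directory, file)) for file in csv_files}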

<div style="background:#48ba57;font-family:'arial', monospace; text-align: center; padding: 10px 0; border-radius:10px; width:70%; margin:auto " >
  <span style="font-size:20px;position:relative;color:white; ">  Note </span> <br>
  <div style="background:#9de3a6;font-size:12px"> 
    List comprehensions are a compact way of writing loops. In the previous cell, a list comprehension lists the CSV file names and a dictionary comprehension packs all files together into a single dictionary. Find out more here: https://docs.python.org/3/tutorial/datastructures.html   
</div>

### <div style="color:#fbb144;">2.1.2. Save and load pickle files </div>
Pickle files store Python objects, here the dictionary of DataFrames, in a binary format, so they can be loaded later with their original structure. Saving the joined data once avoids reading every CSV file again.
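A minimal sketch of the save/load round trip, assuming a hypothetical one-activity dictionary and a temporary file path (the name `activities.pickle` is only an example). Using `with` blocks ensures that the file handles are closed once the data is written or read:

```python
import os
import pickle
import tempfile

import pandas as pd

# hypothetical dictionary of DataFrames, standing in for `data`
data = {'arm C 90.csv': pd.DataFrame({'AccX': [0.1, 0.2], 'AccY': [9.7, 9.5]})}

# hypothetical output path inside a temporary directory
path = os.path.join(tempfile.mkdtemp(), 'activities.pickle')

# save, then load back; each `with` block closes its file when it ends
with open(path, 'wb') as f:
    pickle.dump(data, f)
with open(path, 'rb') as f:
    loaded = pickle.load(f)

print(loaded['arm C 90.csv'].equals(data['arm C 90.csv']))
```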

In [None]:

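# path of the pickle file to create (example name; change as needed)
new_directory = 'activities.pickle'

# save the dictionary; the `with` block closes the file afterwards
with open(new_directory, 'wb') as f:
    pickle.dump(data, f)

# to load it back later:
# with open(new_directory, 'rb') as f:
#     data = pickle.load(f)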

In [None]:
data[csv_files[0]]['AccX'].values

In [None]:
activities_dir = list(data.keys()) # activity names (these should be the same as csv_files)

## <div style="color:#fbb144;">  2.2. Data Reconstruction</div>


<div style="background:#fbb144;font-family:'arial', monospace; text-align: center; padding: 10px 0; border-radius:10px; width:70%; margin:auto " >
  <span style="font-size:20px;position:relative;color:white; ">  Warning! </span> <br>
  <div style="background:#ffd08a;font-size:12px"> 
    Some samples are missing from the data acquired with Google Science Journal (they appear as 'NaN'), so the missing values must be estimated. To do so, all 'NaN' entries are first removed and the remaining values are then resampled to the original signal length with scipy.signal.resample. The following plots compare the original and the reconstructed data.   
</div>

In [None]:
colors = ['#00bfc2','#5756d6','#fada5e', '#62d321', '#fe9b29']

from scipy.signal import resample
for act in activities_dir:
    plt.figure(figsize=(15,5))
    plt.suptitle(act)
    for a, axis in enumerate(data[act].columns, start=1):
        # keep an explicit copy of the original signal for plotting
        original_sig = data[act][axis].copy()
        
        # drop the NaN values and resample the rest back to the original length
        data[act][axis] = resample(data[act][axis].dropna().values, len(data[act][axis]))
        # bottom row: reconstructed signal
        plt.subplot(2,5,a+5)
        plt.plot(data[act][axis], color = colors[0], label='Reconstructed')
        plt.legend()
        # top row: original signal
        plt.subplot(2,5,a)
        plt.title(axis)
        plt.plot(original_sig, color = colors[1], label='Original')
        plt.legend()
    plt.show()
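The reconstruction described in the warning above can be isolated on a synthetic signal. This is a minimal sketch, assuming a hypothetical sine wave with 20 randomly removed samples in place of the real recordings: the NaN entries are dropped and the remaining values are resampled (`scipy.signal.resample` is FFT-based) back to the original length.

```python
import numpy as np
from scipy.signal import resample

# hypothetical signal: 200 samples of a slow oscillation with 20 missing values
n = 200
t = np.arange(n)
clean = np.sin(2 * np.pi * t / 100)
rng = np.random.default_rng(0)
signal = clean.copy()
signal[rng.choice(n, size=20, replace=False)] = np.nan

# 1) drop the NaN entries, 2) resample the remaining values back to the original length
valid = signal[~np.isnan(signal)]
reconstructed = resample(valid, n)

print(len(valid), len(reconstructed))
```

Like the reconstruction in the notebook, this treats the remaining samples as evenly spaced, so it is only an approximation when long gaps are present.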
        

<div style="background:#48ba57;font-family:'arial', monospace; text-align: center; padding: 10px 0; border-radius:10px; width:70%; margin:auto " >
  <span style="font-size:20px;position:relative;color:white; ">  Note </span> <br>
  <div style="background:#9de3a6;font-size:12px"> 
    Because the previous cell replaces the original signals with the reconstructed ones, running it again will show the same signal in both rows. To repeat the comparison, reload the original data (section 2.1) before running the reconstruction again.
</div>

## <div style="color:#fbb144;">  2.3. Data Segmentation </div>

Only the axes 'AccX', 'AccY' and 'AccZ' are used. These signals are segmented based on 'AccY': each movement starts when AccY reaches its minimum value and stops when the maximum of the corresponding cycle is reached. In the plots below, the activities are segmented between the vertical purple lines.
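The pairing of minima and maxima described above can be sketched on a synthetic signal. The cosine-shaped AccY stand-in and the resulting indices are hypothetical; the `find_peaks` settings (`height` at the signal mean, `distance` of 100 and 30 samples) mirror the ones used in the segmentation cell. A maximum with no preceding minimum is skipped.

```python
import numpy as np
from scipy.signal import find_peaks

# hypothetical AccY-like signal: 5 cycles of 200 samples each
t = np.arange(1000)
acc_y = -np.cos(2 * np.pi * t / 200)   # minima at 0, 200, ...; maxima at 100, 300, ...

# maxima (end of each movement) and minima, found as peaks of the inverted signal (start)
max_points = find_peaks(acc_y, height=np.mean(acc_y), distance=100)[0]
min_points = find_peaks(-acc_y, height=np.mean(-acc_y), distance=30)[0]

# pair every maximum with the closest minimum that precedes it
segments = []
for mx in max_points:
    preceding = min_points[min_points < mx]
    if len(preceding) > 0:
        segments.append((int(preceding[-1]), int(mx)))
print(segments)
```

Since `find_peaks` never reports the first or last sample as a peak, the minimum at index 0 is not detected, and the first maximum (index 100) has no partner.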





<div style="background:#fe9b29;font-family:'arial', monospace; text-align: center; padding: 10px 0; border-radius:10px; width:70%; margin:auto " >
  <span style="font-size:20px;position:relative;color:white; ">  Caution! </span> <br>
  <div style="background:#ffdab0;font-size:12px"> 
    The following code might need some adjustments to the distance parameter of find_peaks in order to find all peaks. If this value is too large, some peaks will be missed; if it is too small, too many peaks will be found. Check in the plots that all peaks are correctly found.
</div>

In [None]:
segment_idx = OrderedDict()
plt.figure(figsize=(20,10))

for i, act in enumerate(activities_dir, start=1):
    x = data[act]['AccY'].values
    # maxima of AccY mark the end of each movement (adjust `distance` if peaks are missed)
    max_points = scipy.signal.find_peaks(x, height=np.mean(x), distance=100)[0]
    # minima of AccY (peaks of the inverted signal) mark the start of each movement
    min_points = scipy.signal.find_peaks(-x, height=np.mean(-x), distance=30)[0]
    plt.subplot(2,2,i)
    plt.plot(x, color=colors[0])
    plt.title(act)
    # pair each maximum with the closest minimum that precedes it
    pairs = []
    for mx in max_points:
        preceding = min_points[min_points < mx]
        if len(preceding) > 0:
            pairs.append([preceding[-1], mx])
    print(act, '- maxima:', len(max_points), '- segments:', len(pairs))

    segment_idx[act] = np.array(pairs)
    # manual correction: the last segment of the second activity is discarded
    if act == activities_dir[1]:
        segment_idx[act] = segment_idx[act][:-1]
    plt.vlines(segment_idx[act].ravel(), np.min(x), np.max(x), color=colors[1])
plt.show()


In [None]:
segment_idx[act]

In [None]:
segmented_data = OrderedDict()
for act in activities_dir:
    # one list of segments per axis
    segmented_data[act] = OrderedDict()
    for axis in ['AccX', 'AccY', 'AccZ']:
        segments = []
        plt.figure()
        # crop the signal between each [start, end] pair of indices
        for start, end in segment_idx[act]:
            segments += [data[act][axis].iloc[start:end]]
            plt.plot(segments[-1], color=colors[0])
        plt.xlabel('Sample', color="#00a0e4")
        plt.ylabel('Amplitude (m/s^2)', color="#00a0e4")
        plt.title(act + ' - ' + axis)
        plt.show()
        # save the segments of each activity and axis
        segmented_data[act][axis] = segments
            

## <div style="color:#fbb144">   3.1. Feature Extraction </div>

In [None]:
segmented_data[act][axis][0]

In [None]:
from biosppy.signals import tools as bs_tools

feats_data = OrderedDict()
for act in activities_dir:
    axis_frames = []
    for axis in ['AccX', 'AccY', 'AccZ']:
        rows = []
        for seg in segmented_data[act][axis]:
            # statistical features of one segment, returned by biosppy as named values
            stats = bs_tools.signal_stats(seg.values).as_dict()
            rows.append({axis + '-' + name: value for name, value in stats.items()})
        # one row per segment, one column per feature of this axis
        axis_frames.append(pd.DataFrame(rows))
    # join the features of the three axes side by side
    feats_data[act] = pd.concat(axis_frames, axis=1)

<div style="background:#48ba57;font-family:'arial', monospace; text-align: center; padding: 10px 0; border-radius:10px; width:70%; margin:auto " >
  <span style="font-size:20px;position:relative;color:white; ">  Note </span> <br>
  <div style="background:#9de3a6;font-size:12px"> 
    Although the data has five columns, only AccX, AccY and AccZ are used. The relative_time column carries no information about the movement, and DecibelSource does not describe the activities, which do not depend on sound: sound was recorded only as a marker for segmentation. Even so, segmenting on the well-defined AccY signal proved more reliable than segmenting on sound.
</div>

In [None]:
feats_data

In [None]:
print('Features for activity ', activities_dir[1])
feats_data[activities_dir[1]]

## <div style="color:#fbb144;">   3.2. Feature Selection </div>

A few dozen features (one per statistic and axis) might not seem like a lot for a classification task; however, since we have very few samples, we could run into overfitting issues. There are several ways to decide which features are best for a particular problem. Visually inspecting how each feature behaves across the different activities is a good starting point.
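As a quantitative complement to visual inspection, offered here as one option rather than the notebook's method, features can be ranked with scikit-learn's `SelectKBest` and the ANOVA F-test (`f_classif`). The sketch below assumes hypothetical data with two informative features and two pure-noise features:

```python
import numpy as np
from sklearn.feature_selection import SelectKBest, f_classif

rng = np.random.default_rng(0)
# hypothetical data: 2 classes x 10 samples; features 0 and 1 separate the classes, 2 and 3 are noise
y = np.repeat(['C_90', 'C_180'], 10)
informative = np.where(y[:, None] == 'C_90', 0.0, 10.0) + rng.normal(0, 1, (20, 2))
noise = rng.normal(0, 1, (20, 2))
X = np.hstack([informative, noise])

# rank features by their ANOVA F-value and keep the 2 best
selector = SelectKBest(score_func=f_classif, k=2).fit(X, y)
selected = selector.get_support(indices=True)
print(selected, np.round(selector.scores_, 1))
```

With very few samples, univariate scores are noisy as well, so they are best used alongside the plots rather than instead of them.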

In [None]:
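# plot every feature, one line per activity, to compare their behaviour
for feat in feats_data[activities_dir[0]].columns:
    for ci, activity in enumerate(feats_data.keys()):
        plt.plot(feats_data[activity][feat], label=activity, color=colors[ci])
    plt.title(feat)
    plt.xlabel('Segment')
    plt.legend()
    plt.show()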

In [None]:
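# list here the names of the selected feature columns, e.g. 'AccY-mean'
# (at least four: the K-Means scatter plot below uses the 2nd and 4th features)
best_features = []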
for act in feats_data.keys():
    feats_data[act] = feats_data[act][best_features]

<div style="background:#946db2;font-family:'arial', monospace; text-align: center; padding: 10px 0; border-radius:10px; width:70%; margin:auto " >
  <span style="font-size:20px;position:relative;color:white; "> Explore </span> <br>
  <div style="background:#d0b3e6;font-size:12px"> 
    Explore different forms of feature selection. For more information go to: https://scikit-learn.org/stable/modules/feature_selection.html
        </div>

In [None]:
feats_data

Create the feature matrix X and the label vector Y by joining all the information in feats_data. For readability, the activity names are replaced by the simpler labels 'C_180', 'C_90', 'S_180' and 'S_90'.

In [None]:
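# stack all features together: one row per segment
X = np.vstack([feats_data[act] for act in feats_data.keys()])

# simplify the activity labels using the tag in each file name
# (os.listdir does not guarantee any order, so the tags are matched by name)
def short_label(name):
    for tag in ['C 180', 'C 90', 'S 180', 'S 90']:
        if tag in name:
            return tag.replace(' ', '_')
    raise ValueError('No activity tag found in file name: ' + name)

act_ = {act: short_label(act) for act in feats_data.keys()}
# stack all labels together using the new simplified names of each class
Y = np.hstack([[act_[act]]*len(feats_data[act]) for act in feats_data.keys()])
Y # show Y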

<div style="background:#48ba57;font-family:'arial', monospace; text-align: center; padding: 10px 0; border-radius:10px; width:70%; margin:auto " >
  <span style="font-size:20px;position:relative;color:white; ">  Note </span> <br>
  <div style="background:#9de3a6;font-size:12px"> 
    Alternatively, the dataset could be separated into a training set (to learn the model), a validation set (to tune hyperparameters) and a test set (to evaluate the model).
</div>

<div style="background:#946db2;font-family:'arial', monospace; text-align: center; padding: 10px 0; border-radius:10px; width:70%; margin:auto " >
  <span style="font-size:20px;position:relative;color:white; "> Explore </span> <br>
  <div style="background:#d0b3e6;font-size:12px"> 
    Explore different forms of train-test separation and cross-validation. For more information go to: https://scikit-learn.org/stable/modules/cross_validation.html#cross-validation
        </div>

<div style="background:#946db2;font-family:'arial', monospace; text-align: center; padding: 10px 0; border-radius:10px; width:70%; margin:auto " >
  <span style="font-size:20px;position:relative;color:white; "> Explore </span> <br>
  <div style="background:#d0b3e6;font-size:12px"> 
    Explore how different test sizes impact the classification results. 
</div>

<div style="background:#fbb144;font-family:'arial', monospace; text-align: center; padding: 10px 0; border-radius:10px; width:70%; margin:auto " >
  <span style="font-size:20px;position:relative;color:white; ">  Warning! </span> <br>
  <div style="background:#ffd08a;font-size:12px"> 
    When the data is highly imbalanced, set the stratify parameter of train_test_split to the ground-truth labels, so that each class is represented in the same proportion in the training and test sets.
</div>


## <div style="color:#fbb144">   4.1.   K-Means </div>

Movements performed at the same elevation angle (90º or 180º) lie close to each other, while the two angles are far apart. This hampers the separation of the 4 clusters at once, so it is more reliable to first separate the data into two large clusters and then split each of these clusters individually.
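The two-stage strategy described above can be sketched as follows. This is an illustration on hypothetical 2-D data, not the notebook's features: two far-apart groups (standing in for the two elevation angles), each holding two close clusters (standing in for the two planes). Stage 1 runs K-Means with 2 clusters; stage 2 runs it again with 2 clusters inside each group.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import v_measure_score

rng = np.random.default_rng(0)
# hypothetical 2-D features: groups at x = 0 and x = 20, each holding two close clusters
centers = np.array([[0, 0], [0, 3], [20, 0], [20, 3]])
true = np.repeat(np.arange(4), 25)
X = centers[true] + rng.normal(0, 0.2, (100, 2))

# stage 1: split the data into two large clusters
coarse = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X)

# stage 2: split each large cluster into two, giving labels 0 to 3
fine = np.empty(len(X), dtype=int)
for c in (0, 1):
    mask = coarse == c
    sub = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X[mask])
    fine[mask] = 2 * c + sub

print(v_measure_score(true, fine))
```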

### <div style="color:#fbb144">   4.1.1.   Clustering using K-means </div>
The KMeans algorithm clusters data by trying to separate samples into n groups of equal variance, minimizing a criterion known as the inertia or within-cluster sum of squares. This algorithm requires the number of clusters to be specified.
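For reference, the inertia minimized by K-Means, as written in the scikit-learn clustering user guide, is the sum of squared distances from each of the $n$ samples $x_i$ to the nearest centroid $\mu_j$ in the set of centroids $C$:

$$\sum_{i=1}^{n} \min_{\mu_j \in C} \left( \lVert x_i - \mu_j \rVert^2 \right)$$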

Create a KMeans clustering model by calling <b>KMeans()</b> and store it in the <b>kmeans</b> variable. Define the number of clusters.

Main hyperparameter:

* <b>n_clusters</b> int, default=8
The number of clusters to form as well as the number of centroids to generate.

For more information regarding the method go to https://scikit-learn.org/stable/modules/generated/sklearn.cluster.KMeans.html

In [None]:
# import kmeans package
from sklearn.cluster import KMeans 
# call KMeans cluster method, defining the number of clusters (one per activity)
kmeans = KMeans(n_clusters=4, n_init=10, random_state=0)

Compute cluster centers and predict cluster index for each sample by calling  <b>fit_predict(self, X[, y, sample_weight])</b>

In [None]:
y_pred = kmeans.fit_predict(X)

In [None]:
y_pred

In [None]:
# plot result
for pred in range(len(X)):
    plt.scatter(X[pred][1], X[pred][3], color=colors[y_pred[pred]], label=Y[pred])
plt.xlabel(best_features[1])
plt.ylabel(best_features[3])
plt.legend(bbox_to_anchor=(1.1, 1), ncol=4)
plt.show()

In [None]:
i=1
n_feats = len(best_features)
plt.figure(figsize=(15,15))

plt.subplots_adjust(wspace=0.4, hspace=0.3)

# scatter every ordered pair of selected features, coloured by cluster
for bf in range(n_feats):
    for bi in range(n_feats):
        if bf != bi:
            plt.subplot(n_feats, n_feats - 1, i)
            for pred in range(len(X)):
                plt.scatter(X[pred][bf], X[pred][bi], color=colors[y_pred[pred]], label=Y[pred])
            # x axis: feature bf, y axis: feature bi
            plt.xlabel(best_features[bf])
            plt.ylabel(best_features[bi])
            i+=1

plt.legend(bbox_to_anchor=(1.1, 1), ncol=4)
plt.show()

In [None]:
from sklearn.metrics import v_measure_score 

print('Score: ', v_measure_score(Y, y_pred))
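The V-measure compares two groupings rather than label names, which is why it can score K-Means cluster indices directly against activity names. A small sketch with hypothetical labels: a clustering with the same grouping but arbitrary cluster ids scores 1, while putting everything in a single cluster scores 0.

```python
from sklearn.metrics import v_measure_score

y_true = ['C_90', 'C_90', 'S_90', 'S_90']
# same grouping as y_true, but with arbitrary cluster ids
perfect = [1, 1, 0, 0]
# everything merged into a single cluster
merged = [0, 0, 0, 0]

print(v_measure_score(y_true, perfect), v_measure_score(y_true, merged))
```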

<div style="background:#946db2;font-family:'arial', monospace; text-align: center; padding: 10px 0; border-radius:10px; width:70%; margin:auto " >
  <span style="font-size:20px;position:relative;color:white; "> Explore </span> <br>
  <div style="background:#d0b3e6;font-size:12px"> 
    To find the most appropriate features, check which feature pairs are linearly separable in the pairwise plots above. Change the list best_features to the features that seem to achieve the best separation.    
</div>

<div style="background:#946db2;font-family:'arial', monospace; text-align: center; padding: 10px 0; border-radius:10px; width:70%; margin:auto " >
  <span style="font-size:20px;position:relative;color:white; "> Explore </span> <br>
  <div style="background:#d0b3e6;font-size:12px"> 
    Explore defining some of the KMeans hyperparameters and their impact on the clustering results. For more information go to: https://scikit-learn.org/stable/modules/generated/sklearn.cluster.KMeans.html
</div>

<div style="background:#946db2;font-family:'arial', monospace; text-align: center; padding: 10px 0; border-radius:10px; width:70%; margin:auto " >
  <span style="font-size:20px;position:relative;color:white; "> Explore </span> <br>
  <div style="background:#d0b3e6;font-size:12px"> 
    Explore different clustering algorithms. For more information go to: https://scikit-learn.org/stable/modules/clustering.html
</div>

## <div style="color:#fbb144;">   5. Supervised Learning </div>


## <div style="color:#fbb144;"> Train-test split
</div>

Split arrays or matrices into random train and test subsets using the method <b>train_test_split</b> https://scikit-learn.org/stable/modules/generated/sklearn.model_selection.train_test_split.html

Input Parameters:
* <b>arrays</b> sequence of indexables with same length / shape[0]
Allowed inputs are lists, numpy arrays, scipy-sparse matrices or pandas dataframes.


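* <b>test_size</b> float or int, default=None
If float, the proportion of the dataset to include in the test split; if int, the absolute number of test samples.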
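A short sketch of these parameters, assuming a hypothetical balanced dataset of 4 classes with 10 samples each. With `test_size=0.3`, 12 of the 40 samples go to the test set, and `stratify=Y` keeps the class proportions, so each class contributes 3 test samples:

```python
from collections import Counter

import numpy as np
from sklearn.model_selection import train_test_split

# hypothetical dataset: 4 classes with 10 samples each
X = np.arange(40).reshape(-1, 1)
Y = np.repeat(['C_180', 'C_90', 'S_180', 'S_90'], 10)

# 30% test split; stratify=Y keeps the class proportions in both subsets
X_train, X_test, y_train, y_test = train_test_split(X, Y, test_size=0.3, stratify=Y, random_state=0)
print(len(X_train), len(X_test), Counter(y_test))
```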


In [None]:
# import package
from sklearn.model_selection import train_test_split  

# train_test_split of X and Y with test size 30%
X_train, X_test, y_train, y_test = train_test_split(X, Y, test_size=0.3, random_state=0)

## <div style="color:#fbb144;">   5.1.  Decision Tree </div>

<b>DecisionTreeClassifier</b> is a class capable of performing multi-class classification on a dataset.

As with other classifiers, DecisionTreeClassifier takes two arrays as input: an array X, sparse or dense, of shape (n_samples, n_features) holding the training samples, and an array Y of integer or string values, of shape (n_samples,), holding the class labels for the training samples.
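A minimal sketch of this interface, assuming a hypothetical one-feature training set with string class labels such as the ones used in this notebook. The tree learns a single threshold between the two groups, and `classes_` lists the labels in sorted order:

```python
from sklearn.tree import DecisionTreeClassifier

# hypothetical one-feature training set with string class labels
X_train = [[0.0], [1.0], [10.0], [11.0]]
y_train = ['C_90', 'C_90', 'C_180', 'C_180']

clf = DecisionTreeClassifier(random_state=0).fit(X_train, y_train)
print(clf.predict([[0.5], [10.5]]), clf.classes_)
```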

### <div style="color:#fbb144;">   5.1.1. Create Decision Tree </div>

Create a Decision Tree classifier by calling <b>tree.DecisionTreeClassifier()</b> and store it in the <b>clf</b> variable.

For more information regarding the classifier go to https://scikit-learn.org/stable/modules/generated/sklearn.tree.DecisionTreeClassifier.html#sklearn.tree.DecisionTreeClassifier

In [None]:
# import package
from sklearn import tree 

# call decision tree classifier
clf = tree.DecisionTreeClassifier(random_state=0)

### <div style="color:#fbb144;">   5.1.2. Train Decision Tree </div>

Build a decision tree classifier from the training set (X_train, y_train) by calling the method <b>fit(self, X, y[, sample_weight, …])</b> on the decision tree classifier.

In [None]:
# train the classifier using fit(X_train, y_train)
clf = clf.fit(X_train, y_train)

After being fitted, the model can then be used to predict the class of samples.

Use the method <b>predict(self, X[, check_input])</b> to do so.

In [None]:
# use the classifier to infer predictions using predict(X_test)  # Predict class or regression value for X_test.

y_predicted = clf.predict(X_test)
print("Number of mislabeled points out of a total %d points : %d" % (X_test.shape[0], (y_test != y_predicted).sum()))

In [None]:
y_predicted

Compare the Decision Tree classifier predictions against the ground-truth labels.

To do so, use the function <b>classification_report</b> to build a text report showing the main classification metrics.

Input Parameters:
* <b>y_true</b> 1d array-like, or label indicator array / sparse matrix
Ground truth (correct) target values.

* <b>y_pred</b> 1d array-like, or label indicator array / sparse matrix
Estimated targets as returned by a classifier.
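The same metrics can also be retrieved programmatically: with `output_dict=True`, `classification_report` returns a nested dictionary instead of text. A small sketch with hypothetical labels, where one 'C_90' sample is misclassified as 'S_90':

```python
from sklearn.metrics import classification_report

y_true = ['C_90', 'C_90', 'S_90', 'S_90']
y_pred = ['C_90', 'S_90', 'S_90', 'S_90']

# output_dict=True returns the same metrics as a nested dictionary
report = classification_report(y_true, y_pred, output_dict=True)
print(report['C_90'], report['accuracy'])
```

Here 'C_90' has precision 1.0 (its only prediction is correct) but recall 0.5 (half of its samples are missed), and the overall accuracy is 0.75.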

In [None]:
from sklearn.metrics import classification_report

# classification report
print(classification_report(y_test, y_predicted))

<div style="background:#48ba57;font-family:'arial', monospace; text-align: center; padding: 10px 0; border-radius:10px; width:70%; margin:auto " >
  <span style="font-size:20px;position:relative;color:white; ">  Note </span> <br>
  <div style="background:#9de3a6;font-size:12px"> 
    In the case of highly imbalanced data, the per-class F1-score, precision and recall are more informative than the accuracy score, which does not take class imbalance into account.
</div>



In [None]:
from sklearn.metrics import confusion_matrix
import seaborn as sb

def plot_confusion_matrix(y_true, y_pred, true_labels=None, normalize=True, title=''):
    """

    :param y_true: type(array), contains true labels
    :param y_pred: type(array), contains predicted labels
    :param true_labels: list of unique labels
    :param normalize: boolean, if True each row is divided by its total
    :param title: string, title of the plot
    """
    if true_labels is None:
        true_labels = np.unique(y_true)
    cm = confusion_matrix(y_true, y_pred, labels=true_labels)
    if normalize:
        # divide each row (true class) by its own total, so every row sums to 1
        cm = np.round(cm / np.sum(cm, axis=1, keepdims=True), 2)
    plt.figure(figsize=(10,5))
    ax = plt.subplot(1,1,1)
    ax.set_title(title)
    # annot=True to annotate cells
    sb.heatmap(cm, annot=True, ax=ax, fmt='g', cmap='Blues')  
    # labels, title and ticks
    ax.set_xlabel('Predicted', fontsize=20)
    ax.xaxis.set_label_position('top')
    ax.xaxis.set_ticklabels(true_labels, fontsize=10)
    ax.xaxis.tick_top()
    ax.set_ylabel('True', fontsize=20)
    ax.yaxis.set_ticklabels(true_labels, fontsize=10)

Compute a confusion matrix to evaluate the accuracy of a classification in a table format. 

Do so by calling the <b>plot_confusion_matrix</b> function.

Input parameters:

* <b>y_true</b>: type(array), contains true labels

* <b>y_pred</b>: type(array), contains predicted labels

* <b>true_labels</b>: list of unique labels

* <b>normalize</b>: boolean
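Normalizing a confusion matrix per true class means dividing each row by its own total. A small sketch with hypothetical labels shows the broadcasting detail (`keepdims=True` keeps the row sums as a column vector) and the equivalent `normalize='true'` option of scikit-learn's `confusion_matrix`:

```python
import numpy as np
from sklearn.metrics import confusion_matrix

y_true = ['C_90', 'C_90', 'C_90', 'S_90']
y_pred = ['C_90', 'C_90', 'S_90', 'S_90']
labels = ['C_90', 'S_90']

# rows: true class, columns: predicted class
cm = confusion_matrix(y_true, y_pred, labels=labels)
# divide each row by its own total (keepdims keeps the sums as a column for broadcasting)
cm_norm = cm / cm.sum(axis=1, keepdims=True)
# scikit-learn can also normalize over the true labels directly
cm_sklearn = confusion_matrix(y_true, y_pred, labels=labels, normalize='true')
print(cm, cm_norm, cm_sklearn, sep='\n')
```

Dividing by `np.sum(cm, axis=1)` without `keepdims` would instead divide each column by a row total, which gives rows that do not sum to 1.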

In [None]:
# Confusion Matrix
target_names = np.unique(y_test)
plot_confusion_matrix(y_test, y_predicted, true_labels=np.unique(y_test), normalize=True)

<div style="background:#48ba57;font-family:'arial', monospace; text-align: center; padding: 10px 0; border-radius:10px; width:70%; margin:auto " >
  <span style="font-size:20px;position:relative;color:white; ">  Note </span> <br>
  <div style="background:#9de3a6;font-size:12px"> 
    The confusion matrix can be a highly informative visualization tool to observe which classes are getting confused, and which classes show high discrimination.
</div>


In [None]:
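# text representation of the learned decision rules
r = tree.export_text(clf, feature_names=best_features)
print(r)
# graphical representation of the tree (class names in the order used by the classifier)
plt.figure(figsize=(15,10))
tree.plot_tree(clf, feature_names=best_features, class_names=list(clf.classes_), filled=True)
plt.show()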

<div style="background:#946db2;font-family:'arial', monospace; text-align: center; padding: 10px 0; border-radius:10px; width:70%; margin:auto " >
  <span style="font-size:20px;position:relative;color:white; "> Explore </span> <br>
  <div style="background:#d0b3e6;font-size:12px"> 
    Explore changing some of the Decision Tree hyperparameters and observe the impact on the classification results. For more information go to: https://scikit-learn.org/stable/modules/generated/sklearn.tree.DecisionTreeClassifier.html#sklearn.tree.DecisionTreeClassifier</div>

## <div style="color:#fbb144;">   5.2.  Naive Bayes </div>

The <b>Naive Bayes</b> methods are a set of supervised learning algorithms based on applying Bayes’ theorem with the “naive” assumption of conditional independence between every pair of features given the value of the class variable. 

### <div style="color:#fbb144">   5.2.1.  Classification using Naive Bayes </div>

We will use the Gaussian Naive Bayes classifier (GaussianNB), in which the likelihood of the features is assumed to be Gaussian. The naive_bayes module of scikit-learn provides the following variants:

* naive_bayes.BernoulliNB(*[, alpha, …]) -- Naive Bayes classifier for multivariate Bernoulli models.
* naive_bayes.CategoricalNB(*[, alpha, …]) -- Naive Bayes classifier for categorical features.
* naive_bayes.ComplementNB(*[, alpha, …]) -- The Complement Naive Bayes classifier described in Rennie et al.
* naive_bayes.GaussianNB(*[, priors, …]) -- Gaussian Naive Bayes (GaussianNB).
* naive_bayes.MultinomialNB(*[, alpha, …]) -- Naive Bayes classifier for multinomial models.
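A minimal sketch of what GaussianNB learns, assuming hypothetical one-feature data with one class around 0 and another around 10. After fitting, `theta_` holds the per-class Gaussian means (rows follow the sorted order of `classes_`) and `class_prior_` the class frequencies in the training set:

```python
import numpy as np
from sklearn.naive_bayes import GaussianNB

# hypothetical one-feature data: class 'C_90' around 0, class 'C_180' around 10
X_train = np.array([[-1.0], [0.0], [1.0], [9.0], [10.0], [11.0]])
y_train = np.array(['C_90', 'C_90', 'C_90', 'C_180', 'C_180', 'C_180'])

clf = GaussianNB().fit(X_train, y_train)
# per-class Gaussian means (theta_) and class priors estimated from the training set
print(clf.classes_, clf.theta_.ravel(), clf.class_prior_)
print(clf.predict([[0.5], [9.5]]))
```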

In [None]:
# import the naive bayes module (it includes the GaussianNB classifier)
from sklearn import naive_bayes  

# call the classifier
clf = naive_bayes.GaussianNB()
# Fit Gaussian Naive Bayes according to X_train, y_train
clf = clf.fit(X_train, y_train)
# Perform classification on an array of test vectors X_test.
y_predicted = clf.predict(X_test)

print("Number of mislabeled points out of a total %d points : %d" % (X_test.shape[0], (y_test != y_predicted).sum()))

In [None]:
# Print the classification report
print(classification_report(y_test, y_predicted))

In [None]:
# Plot the Confusion Matrix
target_names = np.unique(y_test)
plot_confusion_matrix(y_test, y_predicted, true_labels=np.unique(y_test), normalize=True)

<div style="background:#946db2;font-family:'arial', monospace; text-align: center; padding: 10px 0; border-radius:10px; width:70%; margin:auto " >
  <span style="font-size:20px;position:relative;color:white; "> Explore </span> <br>
  <div style="background:#d0b3e6;font-size:12px"> 
    Explore different forms for the likelihood of the features and change some of the hyperparameters to observe the impact on the classification results. For more information go to: https://scikit-learn.org/stable/modules/naive_bayes.html
        </div>

Compare these results with those of the previous classifiers.

# III. Explore
<br>
<div style="width:100%; background:linear-gradient(to right,#FDC86E,#fbb144);font-family:'arial black',monospace; text-align: center; padding: 7px 0; border-radius: 5px 50px;margin-top:-15px" >  </div>


## <div style="color:#fbb144;">  1. Final Notes </div>

In this notebook, we performed activity recognition on data acquired with Google Science Journal, using unsupervised (K-Means) and supervised (Decision Tree and Gaussian Naive Bayes) learning methods. The experimental results showed high accuracy, paving the way for the development of systems capable of automatically identifying the activity solely from accelerometer data.

## <div style="color:#fbb144;">  2. Further Reading  </div>
1. Explore how to acquire your own data [OpenSignals](../A.Signal_Acquisition/A001 Open Signals.ipynb)
2. Explore signal-processing techniques to remove data noise [Noise](../C.Signal_Processing) 
3. Explore autoencoders to extract meaningful information from your data [AutoEncoders](../E.Classification/E001 Autoencoders Respiration.ipynb)
4. Explore other notebooks: <br>[ScienceJournal](../A.Signal_Acquisition/ScienceJournal.ipynb) - Acquire signals through the mobile phone <br>
[SignalClassification_using_SL](../E.Classification/E002 Signal Classification Using SL.ipynb)  - Machine Learning in Signal Classification <br>
[Hierarchical_Clustering] - Hierarchical Clustering of Activities

<div style="height:100px; background:white;border-radius:10px;text-align:center"> 

<a> <img src="https://github.com/PIA-Group/ScientIST-notebooks/blob/master/_Resources/Images/IT.png?raw=true" alt="it" style=" bottom: 0; width:250px;
    display: inline;
    left: 250px;
    position: absolute;"/> </a>
<img src="https://github.com/PIA-Group/ScientIST-notebooks/blob/master/_Resources/Images/IST.png?raw=true"
         alt="alternate text" 
         style="position: relative;   width:250px; float: left;
    position: absolute;
    display: inline;
    bottom: 0;
    right: 100;"/>
</div> 

<div style="width: 100%; ">
<div style="background:linear-gradient(to right,#FDC86E,#fbb144);color:white;font-family:'arial', monospace; text-align: center; padding: 50px 0; border-radius:10px; height:10px; width:100%; float:left " >
<span style="font-size:12px;position:relative; top:-25px">  Please provide us your feedback <span style="font-size:14px;position:relative;COLOR:WHITE"> <a href="https://forms.gle/C8TdLQUAS9r8BNJM8">here</a>.</span></span> 
<br>
<span style="font-size:17px;position:relative; top:-20px">  Suggestions are welcome! </span> 
</div>

```Contributors: Mariana Abreu, Patrícia Bota```