# Human Activity Recognition Using Smartphones

**Abstract:** In this project, we build a model with which a smartphone can accurately recognize its owner’s activity. The dataset was collected from 30 volunteers, each performing 6 different activities while wearing a Samsung Galaxy S II on the waist. Using the smartphone’s embedded sensors (the accelerometer and the gyroscope), 3-axial linear acceleration and 3-axial angular velocity were recorded. We use this sensor data to predict the user’s activity. The experiments were video-recorded so that the data could be labeled manually. The resulting dataset was randomly partitioned into two sets: 70% of the volunteers were selected to generate the training data and 30% the test data.

The sensor signals (accelerometer and gyroscope) were pre-processed by applying noise filters and then sampled in fixed-width sliding windows of 2.56 sec with 50% overlap (128 readings/window, which corresponds to a 50 Hz sampling rate). The acceleration signal, which has gravitational and body motion components, was separated into body acceleration and gravity using a Butterworth low-pass filter. Because the gravitational force is assumed to have only low-frequency components, a filter with a 0.3 Hz cutoff frequency was used. From each window, a vector of features was obtained by calculating variables from the time and frequency domains.
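
The windowing and gravity separation described above can be sketched as follows. This is a minimal illustration on a synthetic one-axis signal, not the dataset authors' pipeline: the filter order (3), the zero-phase filtering with `sosfiltfilt`, and the helper names `split_gravity` and `sliding_windows` are assumptions. Only the 0.3 Hz cutoff, the 128-reading window and the 50% overlap come from the text.

```python
import numpy as np
from scipy.signal import butter, sosfiltfilt

FS = 50.0            # sampling rate in Hz (128 readings / 2.56 s)
WINDOW = 128         # readings per window
STEP = WINDOW // 2   # 50% overlap


def split_gravity(total_acc, cutoff_hz=0.3, order=3, fs=FS):
    """Split total acceleration into (body, gravity) with a low-pass filter.

    order=3 and zero-phase filtering are assumptions; the text only
    states the 0.3 Hz cutoff.
    """
    sos = butter(order, cutoff_hz, btype="low", fs=fs, output="sos")
    gravity = sosfiltfilt(sos, total_acc)
    return total_acc - gravity, gravity


def sliding_windows(signal, size=WINDOW, step=STEP):
    """Cut a 1-D signal into fixed-width, overlapping windows."""
    starts = range(0, len(signal) - size + 1, step)
    return np.stack([signal[s:s + size] for s in starts])


# Synthetic signal: constant gravity (1 g) plus a 2 Hz body movement.
t = np.arange(512) / FS
total = 1.0 + 0.2 * np.sin(2 * np.pi * 2.0 * t)

body, gravity = split_gravity(total)
windows = sliding_windows(body)
print(windows.shape)
```

Each row of `windows` is one 2.56 s segment from which time- and frequency-domain features would then be computed.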

Attribute information: for each record in the dataset, the following is provided:
- Triaxial acceleration from the accelerometer (total acceleration) and the estimated body acceleration.
- Triaxial angular velocity from the gyroscope.
- A 561-feature vector with time and frequency domain variables.
- Its activity label.
- An identifier of the subject who carried out the experiment.
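
Because every record carries a subject identifier, the 70/30 partition can be made per volunteer rather than per record, so that no person appears in both sets. A minimal sketch, assuming subject ids 1 to 30 and an arbitrary seed (the actual assignment used by the dataset authors is not reproduced here):

```python
import numpy as np

rng = np.random.default_rng(0)       # seed chosen only for reproducibility
subjects = np.arange(1, 31)          # 30 volunteers; ids 1..30 are an assumption
shuffled = rng.permutation(subjects)

n_train = int(round(0.7 * len(subjects)))   # 70% of 30 volunteers = 21
train_subjects = set(shuffled[:n_train].tolist())
test_subjects = set(shuffled[n_train:].tolist())

print(len(train_subjects), len(test_subjects))
```

Records are then routed to the training or test set according to which group their subject id belongs to.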


## Overall Information
Project’s Website: http://archive.ics.uci.edu/ml/datasets/Human+Activity+Recognition+Using+Smartphones

Data Sources: https://www.kaggle.com/uciml/human-activity-recognition-with-smartphones

**Highest Achieved precision:** 98.8%

**Goal:** In this project we try to predict human activity (1-Walking, 2-Walking upstairs, 3-Walking downstairs, 4-Sitting, 5-Standing or 6-Laying) from the smartphone’s sensor data, so that the phone can detect what its owner is doing at any given moment.

## Citation

*Davide Anguita, Alessandro Ghio, Luca Oneto, Xavier Parra and Jorge L. Reyes-Ortiz. A Public Domain Dataset for Human Activity Recognition Using Smartphones. 21st European Symposium on Artificial Neural Networks, Computational Intelligence and Machine Learning, ESANN 2013. Bruges, Belgium 24-26 April 2013.*

# Libraries 

In [1]:
# python 3 environment
import numpy as np # linear algebra
import pandas as pd # data processing, CSV file I/O

# plotly libraries
import plotly.express as px
import plotly.graph_objects as go
import plotly.figure_factory as ff
from plotly.colors import n_colors
from plotly.subplots import make_subplots

# pycaret for classification
from pycaret.classification import *

# Step 1: Create Uniform Data From Data Set

The data source provides two CSV files, `train.csv` and `test.csv`, each holding the feature columns together with the `subject` and `Activity` columns. The following code combines them into a single DataFrame for the rest of the machine learning procedure.

In [2]:
ROOT = '../Human Activity Recognition Using Smartphones/Data Sources/'
train = pd.read_csv(ROOT+'train.csv')
test = pd.read_csv(ROOT+'test.csv')

In [3]:
data = pd.concat([train, test])
data.head(7)

| | tBodyAcc-mean()-X | tBodyAcc-mean()-Y | tBodyAcc-mean()-Z | tBodyAcc-std()-X | tBodyAcc-std()-Y | tBodyAcc-std()-Z | tBodyAcc-mad()-X | tBodyAcc-mad()-Y | tBodyAcc-mad()-Z | tBodyAcc-max()-X | ... | fBodyBodyGyroJerkMag-kurtosis() | angle(tBodyAccMean,gravity) | angle(tBodyAccJerkMean),gravityMean) | angle(tBodyGyroMean,gravityMean) | angle(tBodyGyroJerkMean,gravityMean) | angle(X,gravityMean) | angle(Y,gravityMean) | angle(Z,gravityMean) | subject | Activity |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0.288585 | -0.020294 | -0.132905 | -0.995279 | -0.983111 | -0.913526 | -0.995112 | -0.983185 | -0.923527 | -0.934724 | ... | -0.710304 | -0.112754 | 0.030400 | -0.464761 | -0.018446 | -0.841247 | 0.179941 | -0.058627 | 1 | STANDING |
| 1 | 0.278419 | -0.016411 | -0.123520 | -0.998245 | -0.975300 | -0.960322 | -0.998807 | -0.974914 | -0.957686 | -0.943068 | ... | -0.861499 | 0.053477 | -0.007435 | -0.732626 | 0.703511 | -0.844788 | 0.180289 | -0.054317 | 1 | STANDING |
| 2 | 0.279653 | -0.019467 | -0.113462 | -0.995380 | -0.967187 | -0.978944 | -0.996520 | -0.963668 | -0.977469 | -0.938692 | ... | -0.760104 | -0.118559 | 0.177899 | 0.100699 | 0.808529 | -0.848933 | 0.180637 | -0.049118 | 1 | STANDING |
| 3 | 0.279174 | -0.026201 | -0.123283 | -0.996091 | -0.983403 | -0.990675 | -0.997099 | -0.982750 | -0.989302 | -0.938692 | ... | -0.482845 | -0.036788 | -0.012892 | 0.640011 | -0.485366 | -0.848649 | 0.181935 | -0.047663 | 1 | STANDING |
| 4 | 0.276629 | -0.016570 | -0.115362 | -0.998139 | -0.980817 | -0.990482 | -0.998321 | -0.979672 | -0.990441 | -0.942469 | ... | -0.699205 | 0.123320 | 0.122542 | 0.693578 | -0.615971 | -0.847865 | 0.185151 | -0.043892 | 1 | STANDING |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 2942 | 0.310155 | -0.053391 | -0.099109 | -0.287866 | -0.140589 | -0.215088 | -0.356083 | -0.148775 | -0.232057 | 0.185361 | ... | -0.750809 | -0.337422 | 0.346295 | 0.884904 | -0.698885 | -0.651732 | 0.274627 | 0.184784 | 24 | WALKING_UPSTAIRS |
| 2943 | 0.363385 | -0.039214 | -0.105915 | -0.305388 | 0.028148 | -0.196373 | -0.373540 | -0.030036 | -0.270237 | 0.185361 | ... | -0.700274 | -0.736701 | -0.372889 | -0.657421 | 0.322549 | -0.655181 | 0.273578 | 0.182412 | 24 | WALKING_UPSTAIRS |
| 2944 | 0.349966 | 0.030077 | -0.115788 | -0.329638 | -0.042143 | -0.250181 | -0.388017 | -0.133257 | -0.347029 | 0.007471 | ... | -0.467179 | -0.181560 | 0.088574 | 0.696663 | 0.363139 | -0.655357 | 0.274479 | 0.181184 | 24 | WALKING_UPSTAIRS |
| 2945 | 0.237594 | 0.018467 | -0.096499 | -0.323114 | -0.229775 | -0.207574 | -0.392380 | -0.279610 | -0.289477 | 0.007471 | ... | -0.617737 | 0.444558 | -0.819188 | 0.929294 | -0.008398 | -0.659719 | 0.264782 | 0.187563 | 24 | WALKING_UPSTAIRS |


In [4]:
data.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 10299 entries, 0 to 2946
Columns: 563 entries, tBodyAcc-mean()-X to Activity
dtypes: float64(561), int64(1), object(1)
memory usage: 44.3+ MB
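
The `data.info()` output reports 10299 entries but an index that runs only from 0 to 2946: `pd.concat` keeps each frame's own row labels, so labels repeat in the combined frame. This does not affect the modeling steps here, but it matters for any label-based lookup. A small sketch on toy frames (`train_demo` and `test_demo` are hypothetical stand-ins for the real data) shows the effect and the `ignore_index=True` alternative:

```python
import pandas as pd

# Toy stand-ins for the real train/test frames
train_demo = pd.DataFrame({"x": [0.1, 0.2, 0.3]})
test_demo = pd.DataFrame({"x": [0.4, 0.5]})

stacked = pd.concat([train_demo, test_demo])                        # keeps original labels
reindexed = pd.concat([train_demo, test_demo], ignore_index=True)   # fresh 0..n-1 labels

print(stacked.index.tolist())     # [0, 1, 2, 0, 1]
print(reindexed.index.tolist())   # [0, 1, 2, 3, 4]
print(stacked.index.is_unique, reindexed.index.is_unique)  # False True
```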


# Step 2: Preprocessing 

In [5]:
def preprocessing(data):
    """Downcast float64 columns to float32 and object columns to category to save memory."""
    for column in data.columns:
        if data.dtypes[column] == 'float64':
            data[column] = data[column].astype('float32')
        elif data.dtypes[column] == 'object':
            data[column] = data[column].astype('category')
    return data

In [6]:
data = data.drop('subject', axis=1)  # volunteer identifier, not a sensor feature
data = preprocessing(data)
data.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 10299 entries, 0 to 2946
Columns: 562 entries, tBodyAcc-mean()-X to Activity
dtypes: category(1), float32(561)
memory usage: 22.1 MB
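
The two `data.info()` outputs show the memory footprint dropping from 44.3+ MB to 22.1 MB, which is what one would expect when 561 `float64` columns (8 bytes per value) become `float32` (4 bytes per value). The same effect can be reproduced on a small synthetic frame; the frame `demo` and its column names are made up for illustration:

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
demo = pd.DataFrame(rng.uniform(-1, 1, size=(1000, 5)), columns=list("abcde"))
demo["Activity"] = rng.choice(["SITTING", "STANDING", "WALKING"], size=1000)

before = demo.memory_usage(deep=True).sum()

# Same downcasting idea as preprocessing(): floats to float32, labels to category
float_cols = demo.select_dtypes(include="float64").columns
demo = demo.astype({col: "float32" for col in float_cols})
demo["Activity"] = demo["Activity"].astype("category")

after = demo.memory_usage(deep=True).sum()
print(f"{before} bytes -> {after} bytes")
```

The features in this dataset are shown in the range of roughly -1 to 1, so the reduced precision of `float32` is unlikely to matter, though that is a judgment call rather than something the notebook verifies.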


# Step 3: Model 

## 3.1 Setting up Environment
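`setup()` initializes the PyCaret environment and creates the transformation pipeline that prepares the data for modeling and deployment. `normalize=True` scales the features, `transformation=True` applies a power transform to make them more Gaussian-like, and `session_id` fixes the random seed for reproducibility.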

In [None]:
setup(data, 
      target='Activity', 
      normalize=True, 
      transformation=True, 
      session_id=707,
      sampling=False, 
      silent=True);
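
To give a feel for what `normalize=True` and `transformation=True` do, here is a rough scikit-learn analogue. It assumes PyCaret's default methods (z-score scaling and a Yeo-Johnson power transform). The order of the two steps and the synthetic data are assumptions of this sketch, and it does not reproduce PyCaret's actual internal pipeline.

```python
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PowerTransformer, StandardScaler

rng = np.random.default_rng(707)
X_demo = rng.exponential(scale=2.0, size=(500, 3))   # deliberately skewed features

pipe = make_pipeline(
    PowerTransformer(method="yeo-johnson", standardize=False),  # reduce skew
    StandardScaler(),                                           # zero mean, unit variance
)
X_out = pipe.fit_transform(X_demo)

print(X_out.mean(axis=0).round(3), X_out.std(axis=0).round(3))
```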

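## 3.2 Compare Models
`compare_models()` trains the models in the model library and scores them using K-fold cross-validation. The output is a score grid with Accuracy, AUC, Recall, Precision, F1, Kappa and MCC, averaged across the folds; the number of folds is set by the `fold` parameter. Here `whitelist` restricts the comparison to logistic regression, the ridge classifier and XGBoost.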

In [8]:
compare_models(whitelist=['lr', 'ridge', 'xgboost'])  # catboost is slightly better but requires far too much computation time in this context

| | Model | Accuracy | AUC | Recall | Prec. | F1 | Kappa | MCC | TT (Sec) |
|---|---|---|---|---|---|---|---|---|---|
| 0 | Extreme Gradient Boosting | 0.9903 | 0.0 | 0.9906 | 0.9904 | 0.9903 | 0.9883 | 0.9883 | 32.6213 |
| 1 | Logistic Regression | 0.9816 | 0.0 | 0.9824 | 0.9817 | 0.9815 | 0.9778 | 0.9778 | 2.359 |
| 2 | Ridge Classifier | 0.9784 | 0.0 | 0.9792 | 0.9785 | 0.9783 | 0.974 | 0.974 | 0.4328 |


OneVsRestClassifier(estimator=XGBClassifier(base_score=None, booster=None,
                                            colsample_bylevel=None,
                                            colsample_bynode=None,
                                            colsample_bytree=None, gamma=None,
                                            gpu_id=None, importance_type='gain',
                                            interaction_constraints=None,
                                            learning_rate=None,
                                            max_delta_step=None, max_depth=None,
                                            min_child_weight=None, missing=nan,
                                            monotone_constraints=None,
                                            n_estimators=100, n_jobs=-1,
                                            num_parallel_tree=None,
                                            objective='binary:logistic',
                                            ra
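
The kind of comparison performed in this step can be approximated directly in scikit-learn. The sketch below is not PyCaret's implementation: it uses synthetic data, only two of the three models, and hypothetical names (`candidates`, `results`, `ranking`). It scores each model with 10-fold stratified cross-validation and averages Accuracy, Kappa and MCC across the folds.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression, RidgeClassifier
from sklearn.metrics import cohen_kappa_score, make_scorer, matthews_corrcoef
from sklearn.model_selection import StratifiedKFold, cross_validate

# Synthetic 3-class problem standing in for the real feature matrix
X_demo, y_demo = make_classification(
    n_samples=600, n_features=20, n_informative=8, n_classes=3, random_state=707
)

cv = StratifiedKFold(n_splits=10, shuffle=True, random_state=707)
scoring = {
    "accuracy": "accuracy",
    "kappa": make_scorer(cohen_kappa_score),
    "mcc": make_scorer(matthews_corrcoef),
}
candidates = {
    "lr": LogisticRegression(max_iter=1000),
    "ridge": RidgeClassifier(),
}

results = {}
for name, estimator in candidates.items():
    scores = cross_validate(estimator, X_demo, y_demo, cv=cv, scoring=scoring)
    # Average each metric across the 10 folds
    results[name] = {metric: scores[f"test_{metric}"].mean() for metric in scoring}

# Rank models by mean accuracy, best first
ranking = sorted(results, key=lambda n: results[n]["accuracy"], reverse=True)
print(ranking, results)
```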

## 3.3 Create Model
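`create_model()` trains a single model and scores it using K-fold cross-validation. Here XGBoost is wrapped in a bagging ensemble (`ensemble=True, method='Bagging'`); the table below lists the scores of each of the 10 folds.

In [9]:
model = create_model('xgboost', ensemble=True, method='Bagging')  # bagged ensemble of XGBoost classifiers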

| | Accuracy | AUC | Recall | Prec. | F1 | Kappa | MCC |
|---|---|---|---|---|---|---|---|
| 0 | 0.9875 | 0.0 | 0.9868 | 0.9875 | 0.9875 | 0.985 | 0.985 |
| 1 | 0.9875 | 0.0 | 0.9883 | 0.9875 | 0.9875 | 0.985 | 0.985 |
| 2 | 0.9847 | 0.0 | 0.985 | 0.9847 | 0.9847 | 0.9816 | 0.9816 |
| 3 | 0.9861 | 0.0 | 0.987 | 0.9862 | 0.9861 | 0.9833 | 0.9833 |
| 4 | 0.9889 | 0.0 | 0.9892 | 0.989 | 0.9889 | 0.9867 | 0.9867 |
| 5 | 0.9861 | 0.0 | 0.9866 | 0.9861 | 0.9861 | 0.9833 | 0.9833 |
| 6 | 0.9903 | 0.0 | 0.9903 | 0.9904 | 0.9903 | 0.9883 | 0.9883 |
| 7 | 0.9903 | 0.0 | 0.9905 | 0.9905 | 0.9903 | 0.9883 | 0.9884 |
| 8 | 0.9875 | 0.0 | 0.9878 | 0.9876 | 0.9875 | 0.985 | 0.985 |
| 9 | 0.9944 | 0.0 | 0.9947 | 0.9945 | 0.9944 | 0.9933 | 0.9933 |
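
Bagging trains several copies of a base estimator on bootstrap samples of the training data and combines their predictions. The sketch below illustrates the idea with scikit-learn's `BaggingClassifier`. A decision tree is swapped in for XGBoost to keep the example dependency-free, and the synthetic data and the choice of 10 estimators are assumptions. The last lines average the ten fold accuracies listed in the table above.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import BaggingClassifier
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

# Synthetic 3-class problem standing in for the real feature matrix
X_demo, y_demo = make_classification(
    n_samples=600, n_features=20, n_informative=8, n_classes=3, random_state=707
)

# Decision tree used as a stand-in base estimator (the notebook bags XGBoost)
bagged = BaggingClassifier(
    DecisionTreeClassifier(random_state=707), n_estimators=10, random_state=707
)
fold_scores = cross_val_score(bagged, X_demo, y_demo, cv=10)  # one accuracy per fold
print(fold_scores.mean())

# Fold accuracies copied from the notebook's output table
reported = [0.9875, 0.9875, 0.9847, 0.9861, 0.9889,
            0.9861, 0.9903, 0.9903, 0.9875, 0.9944]
reported_mean = sum(reported) / len(reported)
print(round(reported_mean, 4))
```

The averaged accuracy of the notebook's bagged model comes out at about 0.988.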


## 3.4 Finalize Model
`finalize_model()` fits the estimator on the complete dataset passed to `setup()`, including the hold-out set. Its purpose is to prepare the final model for deployment once experimentation is finished.

In [10]:
final_model = finalize_model(model)
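
Conceptually, finalizing means refitting the chosen estimator on all available rows once the hold-out evaluation is done. A minimal scikit-learn sketch of that idea, assuming a 70/30 hold-out split and using logistic regression on synthetic data as a stand-in for the notebook's model:

```python
from sklearn.base import clone
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

X_demo, y_demo = make_classification(n_samples=400, n_features=10, random_state=707)

# Experimentation phase: fit on the training part, score on the hold-out part
X_train, X_hold, y_train, y_hold = train_test_split(
    X_demo, y_demo, train_size=0.7, stratify=y_demo, random_state=707
)
experiment_model = LogisticRegression(max_iter=1000).fit(X_train, y_train)
holdout_accuracy = experiment_model.score(X_hold, y_hold)

# Finalization: same hyperparameters, refit on every row
final_demo = clone(experiment_model).fit(X_demo, y_demo)
print(holdout_accuracy)
```

Once a model has been refit this way, the former hold-out rows are part of its training data, so scoring it on them no longer gives an unbiased estimate.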

## 3.5 Model Evaluation 
`evaluate_model()` displays a user interface with all of the available plots for a given estimator. It internally uses the `plot_model()` function.



In [11]:
evaluate_model(final_model)
