# Project Name: Human Activity Recognition (`HAR`) utilizing Long short-term memory (`LSTM`) for `PAMAP2`  & `OPPORTUNITY` Dataset

## 1. INTRODUCTION:

### 1.1. Introduction
Nowadays deep learning and machine learning have become an important field that is making a significant contribution to sensor-based human activity recognition. Wearables are an early part of understanding the richness of human activity recognition. It has a wide range of applications in many fields such as intelligent systems, ambient assisted living, security surveillance, the manufacturing industry, sports support, social science, and surveying systems. The primary motive for these applications is the ability to automatically and precisely detect human activities from often small sensors embedded in wearables.
Many approaches have been developed to solve the recognition problem and most machine learning algorithms and deep learning algorithms have been used in this field and evaluated through experiments. Wearables capture a person's activity dynamics by recording continuous measurements over time through different sensor channels and generating multi-channel time-series data streams. To analyze the recognition and classification of human physical activities (e.g. Walking, running, drinking, etc.) we propose that we implement different machine learning algorithms (Like Support Vector Machine, K-nearest Neighbor) and deep learning algorithms (such as Convolutional Neural Networks, LSTM) on 3 datasets which are OPPORTUNITY, UCI HAR, and PAMAP-2. With the help of these datasets our motive of comparing the accuracy and performance of the machine learning algorithm and deep learning algorithm in recognizing human daily physical activities.   
There are some existing papers that work on human activity recognition individually with different datasets. Since we want to do a comparative analysis, we don't have a data collection problem, we will take data for our analysis from different online sites, and we will train and test the data. After that, we will use the data with different machine learning and deep learning algorithms to find out the performance or accuracy, and then we will compare which one gives better performance. 
In our study, we found that Daniel Rogen and his teammates used the OPPORTUNITY and Skoda datasets and their algorithms were CNN and LSTM for their model. In their research, they demonstrated the benefits of a deep architecture based on a combination of convolutional and LSTM recurrent layers to perform activity recognition from wearable sensors.
Shaohua Wan and his teammates used a classification method based on a convolutional neural network (CNN), which uses a CNN to extract local features. Finally, CNN, LSTM, BLSTM, MLP, and SVM models are used on the UCI and Pamap2 datasets.

#### 1.2. Motivation
Many studies have been conducted on solving human activity recognition (HAR) problems with wearables, and Knowledge has been greatly improved with end-to-end deep learning paradigms. Human activity recognition (HAR) using wearable sensors is currently the latest topic in Deep neural networks due to its many applications. Our main goal is to compare and analyze the results obtained by running various machine learning and deep learning algorithms with multiple datasets. Inspired by previous HAR research, our main task is to propose a new HAR framework built on multiple datasets and demonstrate its performance realization using a variety of deep learning and machine learning-based algorithms that generalize to wearable sensor datasets. We have seen many papers that have predicted performance or accuracy with deep learning or machine learning but have not analyzed it on a comparative basis so the idea came from there we will implement and analyze both cases.

#### 1.3. Related Works
F. Attal et. al. (2015), suggest a review of different classification techniques used to recognize human activities from wearable inertial sensor data. Three main steps describe the activity recognition process: sensors' placement, data pre-processing and data classification. The HMM classifier is the one that gives the best results among unsupervised classification algorithms. <br>
According to the study of Ordóñez and Roggen (2016), deep convolutional neural networks are suited to automate feature extraction from raw sensor inputs. However, human activities are made of complex sequences of motor movements. The framework can be applied to homogeneous sensor modalities, but can also fuse multimodal sensors to improve performance.<br>
According to A. Murad et. al. (2017), adopting deep learning methods for human activity recognition has been effective in extracting discriminative features from raw input sequences acquired from body-worn sensors. We propose the use of deep recurrent neural networks for building recognition models that are capable of capturing long-range dependencies in variable-length input sequences.<br>
M.M. Hasan et. al. (2018) suggest, human activity recognition has grabbed considerable attention from pattern recognition and human–computer interaction researchers due to its prominent applications such as smart home health care. The proposed approach was compared with traditional expression recognition approaches such as typical multiclass Support Vector Machine (SVM) and Artificial Neural Network (ANN).<br>
Mobile edge computing is serving as a bridge to narrow the gaps between medical staff and patients. S. Wan’s (2020) paper designs a smartphone inertial accelerometer-based architecture for HAR. A real-time human activity classification method based on a convolutional neural network (CNN) is proposed.<br>
Activity recognition can be seen as a machine learning chain with its particular data preprocessing technique. J. Suto et. al. (2020) examine the efficiency of previously used machine learning methods in real time by an Android-based, self-learning, activity recognition application which has been designed for this study.<br>
Deep Stacked Multilayered Perceptron (DS-MLP) has been proposed in F. Rustam et. al. (2020) study for human activity recognition (HAR). This study uses sensor data from two low-cost sensors, gyroscope and accelerometer along with implementation of an Artificial Neural Network (ANN) for HAR.<br>
According to A. Murad et. al. (2021), Human Activity Recognition (HAR) employing inertial motion data has gained considerable momentum in recent years, both in research and industrial applications. The HAR method is evaluated on a public smartphone-based dataset of UCI-HAR through various combinations of sample generation processes and validation protocols.<br>
Many Artificial intelligence-based models are developed for activity recognition; however, these algorithms fail to extract spatial and temporal features. An extensive ablation study is performed over different machine learning and deep learning models to obtain the optimum solution for HAR. A new dataset is generated that is collected from 20 participants using the Kinect V2 sensor and contains 12 different classes of human activities in I. U. Khan’s study (2022).<br>
<br>
<img src=".\Data\Img\report_final.png" alt="Related Works" />

#### 1.4. Research Objective
The main objectives of this study is as follow: <br>
•	...............................

## **2. DATA ANALYSIS:**

#### **2.1. DATA PREPROCESSING:**

In [6]:
# import the necessary libraries
import pandas as pd
from matplotlib import pyplot as plt
from matplotlib.colors import LinearSegmentedColormap
import seaborn as sns
import numpy as np
from scipy import stats
from scipy import integrate
from IPython.display import HTML, display
from scipy.stats import norm
from scipy.stats import t as the
from sklearn import svm
from sklearn.metrics import classification_report, accuracy_score, precision_score, recall_score, f1_score
from sklearn import tree
%matplotlib inline

pd.set_option('display.max_rows', 20)
pd.set_option('display.max_columns', 70)

<img src=".\Data\Img\data_format.PNG" alt="Data Format" />

In [7]:
# Load data
list_of_files = ['PAMAP2_Dataset/Protocol/subject101.dat',
                 'PAMAP2_Dataset/Protocol/subject102.dat',
                 'PAMAP2_Dataset/Protocol/subject103.dat',
                 'PAMAP2_Dataset/Protocol/subject104.dat',
                 'PAMAP2_Dataset/Protocol/subject105.dat',
                 'PAMAP2_Dataset/Protocol/subject106.dat',
                 'PAMAP2_Dataset/Protocol/subject107.dat',
                 'PAMAP2_Dataset/Protocol/subject108.dat',
                 'PAMAP2_Dataset/Protocol/subject109.dat' ]

subjectID = [1,2,3,4,5,6,7,8,9]

activityIDdict = {0: 'transient',
              1: 'lying',
              2: 'sitting',
              3: 'standing',
              4: 'walking',
              5: 'running',
              6: 'cycling',
              7: 'Nordic_walking',
              9: 'watching_TV',
              10: 'computer_work',
              11: 'car driving',
              12: 'ascending_stairs',
              13: 'descending_stairs',
              16: 'vacuum_cleaning',
              17: 'ironing',
              18: 'folding_laundry',
              19: 'house_cleaning',
              20: 'playing_soccer',
              24: 'rope_jumping' }

colNames = ["timestamp", "activityID","heartrate"]

IMUhand = ['handTemperature', 
           'handAcc16_1', 'handAcc16_2', 'handAcc16_3', 
           'handAcc6_1', 'handAcc6_2', 'handAcc6_3', 
           'handGyro1', 'handGyro2', 'handGyro3', 
           'handMagne1', 'handMagne2', 'handMagne3',
           'handOrientation1', 'handOrientation2', 'handOrientation3', 'handOrientation4']

IMUchest = ['chestTemperature', 
           'chestAcc16_1', 'chestAcc16_2', 'chestAcc16_3', 
           'chestAcc6_1', 'chestAcc6_2', 'chestAcc6_3', 
           'chestGyro1', 'chestGyro2', 'chestGyro3', 
           'chestMagne1', 'chestMagne2', 'chestMagne3',
           'chestOrientation1', 'chestOrientation2', 'chestOrientation3', 'chestOrientation4']

IMUankle = ['ankleTemperature', 
           'ankleAcc16_1', 'ankleAcc16_2', 'ankleAcc16_3', 
           'ankleAcc6_1', 'ankleAcc6_2', 'ankleAcc6_3', 
           'ankleGyro1', 'ankleGyro2', 'ankleGyro3', 
           'ankleMagne1', 'ankleMagne2', 'ankleMagne3',
           'ankleOrientation1', 'ankleOrientation2', 'ankleOrientation3', 'ankleOrientation4']

columns = colNames + IMUhand + IMUchest + IMUankle  #all columns in one list

len(columns)

54

In [8]:
dataCollection = pd.DataFrame()
for file in list_of_files:
    procData = pd.read_table(file, header=None, sep='\s+')
    procData.columns = columns
    procData['subject_id'] = int(file[-5])
    dataCollection = pd.concat([dataCollection, procData], ignore_index=True)

dataCollection.reset_index(drop=True, inplace=True)
dataCollection.head()

FileNotFoundError: [Errno 2] No such file or directory: 'PAMAP2_Dataset/Protocol/subject101.dat'

In [None]:
dataCollection.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 2872533 entries, 0 to 2872532
Data columns (total 55 columns):
 #   Column             Dtype  
---  ------             -----  
 0   timestamp          float64
 1   activityID         int64  
 2   heartrate          float64
 3   handTemperature    float64
 4   handAcc16_1        float64
 5   handAcc16_2        float64
 6   handAcc16_3        float64
 7   handAcc6_1         float64
 8   handAcc6_2         float64
 9   handAcc6_3         float64
 10  handGyro1          float64
 11  handGyro2          float64
 12  handGyro3          float64
 13  handMagne1         float64
 14  handMagne2         float64
 15  handMagne3         float64
 16  handOrientation1   float64
 17  handOrientation2   float64
 18  handOrientation3   float64
 19  handOrientation4   float64
 20  chestTemperature   float64
 21  chestAcc16_1       float64
 22  chestAcc16_2       float64
 23  chestAcc16_3       float64
 24  chestAcc6_1        float64
 25  chestAcc6_2       

In [None]:
# Count the number of missing values in each column
dataCollection.isnull().sum()

timestamp                  0
activityID                 0
heartrate            2610265
handTemperature        13141
handAcc16_1            13141
                      ...   
ankleOrientation1      11749
ankleOrientation2      11749
ankleOrientation3      11749
ankleOrientation4      11749
subject_id                 0
Length: 55, dtype: int64

#### Impute missing values

In [None]:
# import the SimpleImputer class
from sklearn.impute import SimpleImputer

# create an instance of the SimpleImputer class
imputer = SimpleImputer(missing_values=np.nan, strategy='mean')

# select the columns you want to impute
columns_to_impute = dataCollection.columns[dataCollection.isnull().any()]

# fit the imputer to the data
imputer.fit(dataCollection[columns_to_impute])

# transform the data
dataCollection[columns_to_impute] = imputer.transform(dataCollection[columns_to_impute])

NameError: name 'np' is not defined

In [None]:
# reset the index
dataCollection.reset_index(drop = True, inplace = True)
dataCollection.sample(10)

NameError: name 'dataCollection' is not defined

In [3]:
# Count the number of missing values in each column
dataCollection.isnull().sum()

NameError: name 'dataCollection' is not defined

#### **2.2. Exploratory Data Analysis (EDA):**

Working on it.

#### **2.3. LSTM model:**

In [41]:
import pandas as pd
import numpy as np
from tensorflow import keras
from keras.layers import Dense
from keras.models import Sequential, load_model
from sklearn.model_selection import train_test_split
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import LSTM, Dense, Dropout
import math
from tensorflow.keras.preprocessing.sequence import pad_sequences

# set the number of time steps
n_steps = 128

# set the number of features
n_features = len(dataCollection.columns) - 1

# split the data into input and output
X = dataCollection.iloc[:, :-1].values
y = dataCollection.iloc[:, -1].values

# one-hot encode the output variable
from sklearn.preprocessing import LabelEncoder, OneHotEncoder

# encode labels
le = LabelEncoder()
y = le.fit_transform(y)

# convert labels to one-hot encoding
ohe = OneHotEncoder(sparse=False)
y = y.reshape(len(y), 1)
y = ohe.fit_transform(y)

# split the data into train and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# pad the sequences
X_train = pad_sequences(X_train, maxlen=n_steps, padding='pre', truncating='pre', value=0)
X_test = pad_sequences(X_test, maxlen=n_steps, padding='pre', truncating='pre', value=0)

# define the model architecture
model = Sequential()
model.add(LSTM(64, activation='relu', input_shape=(n_steps, n_features)))
model.add(Dropout(0.2))
model.add(Dense(32, activation='relu'))
model.add(Dense(y_train.shape[1], activation='softmax'))

# compile the model
model.compile(optimizer='adam', loss='categorical_crossentropy', metrics=['accuracy'])

In [42]:
# train the model
history = model.fit(X_train, y_train, epochs=10, batch_size=32, validation_data=(X_test, y_test))


ValueError: Error when checking input: expected lstm_3_input to have 3 dimensions, but got array with shape (2298026, 128)