# Human Activity Recognition with Smartphones 

Human Activity Recognition (HAR) is the task of predicting what a person is doing from sensor data. The dataset used here was built from recordings of 30 study participants who performed everyday activities while carrying a waist-mounted smartphone with embedded inertial sensors. The objective is to classify each record as one of six activities: WALKING, WALKING_UPSTAIRS, WALKING_DOWNSTAIRS, SITTING, STANDING, or LAYING.

The dataset can be downloaded from the following Kaggle link: 
https://www.kaggle.com/datasets/uciml/human-activity-recognition-with-smartphones 

After exploring the data and encoding the activity labels, I reduce the features with Linear Discriminant Analysis, standardize them, and use Logistic Regression for prediction.

In [1]:
import numpy as np 
import pandas as pd 
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis as LDA 
from sklearn.preprocessing import StandardScaler 
from sklearn.linear_model import LogisticRegression 
from sklearn.metrics import classification_report

## Data Exploration 

In [2]:
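# Raw strings keep the backslashes literal; otherwise '\t' in '\train-1.csv' and '\test.csv' is read as a tab
train = pd.read_csv(r'..\Downloads\train-1.csv')
test = pd.read_csv(r'..\Downloads\test.csv')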

In [3]:
train['Data'] = 'Train'
test['Data'] = 'Test'

In [4]:
#Combining the train and test datasets into one single dataset 
df = pd.concat([train, test]).reset_index(drop = True)
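
The `Data` marker is what lets the combined frame be split back into its two parts later. A minimal sketch of that round trip, using toy frames with made-up values (the names `toy_train`, `toy_test`, and `feature` are illustrative, not from the dataset):

```python
import pandas as pd

# Toy stand-ins for the real train/test frames (values are made up).
toy_train = pd.DataFrame({'feature': [0.1, 0.2, 0.3],
                          'Activity': ['WALKING', 'SITTING', 'LAYING']})
toy_test = pd.DataFrame({'feature': [0.4, 0.5],
                         'Activity': ['STANDING', 'WALKING']})

# Tag each frame with its origin before stacking them.
toy_train['Data'] = 'Train'
toy_test['Data'] = 'Test'
combined = pd.concat([toy_train, toy_test]).reset_index(drop=True)

# Recover each original split by filtering on the marker column.
recovered_train = combined[combined['Data'] == 'Train'].reset_index(drop=True)
recovered_test = combined[combined['Data'] == 'Test'].reset_index(drop=True)
```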

In [5]:
#Checking to find duplicate entries in the dataset 
df.duplicated().sum()

0

In [6]:
#Checking to find null values 
df.isnull().sum()[df.isnull().sum() != 0]

Series([], dtype: int64)

In [7]:
df.Activity.value_counts()

LAYING                1944
STANDING              1906
SITTING               1777
WALKING               1722
WALKING_UPSTAIRS      1544
WALKING_DOWNSTAIRS    1406
Name: Activity, dtype: int64
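
The counts above suggest the classes are reasonably balanced. A small sketch that turns those counts into shares and a max/min ratio; the numbers are copied from the output above, and `shares` and `imbalance_ratio` are names introduced here for illustration:

```python
import pandas as pd

# Counts copied from the value_counts() output above.
counts = pd.Series({
    'LAYING': 1944, 'STANDING': 1906, 'SITTING': 1777,
    'WALKING': 1722, 'WALKING_UPSTAIRS': 1544, 'WALKING_DOWNSTAIRS': 1406,
})

# Fraction of all rows that each activity accounts for.
shares = counts / counts.sum()
print(shares.round(3))

# Ratio between the most and least frequent class.
imbalance_ratio = counts.max() / counts.min()
print(round(imbalance_ratio, 2))
```

Each activity accounts for roughly 14% to 19% of the rows, and the largest class is about 1.4 times the size of the smallest.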

In [8]:
df.describe()

,tBodyAcc-mean()-X,tBodyAcc-mean()-Y,tBodyAcc-mean()-Z,tBodyAcc-std()-X,tBodyAcc-std()-Y,tBodyAcc-std()-Z,tBodyAcc-mad()-X,tBodyAcc-mad()-Y,tBodyAcc-mad()-Z,tBodyAcc-max()-X,...,fBodyBodyGyroJerkMag-skewness(),fBodyBodyGyroJerkMag-kurtosis(),"angle(tBodyAccMean,gravity)","angle(tBodyAccJerkMean),gravityMean)","angle(tBodyGyroMean,gravityMean)","angle(tBodyGyroJerkMean,gravityMean)","angle(X,gravityMean)","angle(Y,gravityMean)","angle(Z,gravityMean)",subject
count,10299.0,10299.0,10299.0,10299.0,10299.0,10299.0,10299.0,10299.0,10299.0,10299.0,...,10299.0,10299.0,10299.0,10299.0,10299.0,10299.0,10299.0,10299.0,10299.0,10299.0
mean,0.274347,-0.017743,-0.108925,-0.607784,-0.510191,-0.613064,-0.633593,-0.525697,-0.614989,-0.466732,...,-0.298592,-0.6177,0.007705,0.002648,0.017683,-0.009219,-0.496522,0.063255,-0.054284,16.146422
std,0.067628,0.037128,0.053033,0.438694,0.50024,0.403657,0.413333,0.484201,0.399034,0.538707,...,0.320199,0.308796,0.336591,0.447364,0.616188,0.48477,0.511158,0.305468,0.268898,8.679067
min,-1.0,-1.0,-1.0,-1.0,-1.0,-1.0,-1.0,-1.0,-1.0,-1.0,...,-1.0,-1.0,-1.0,-1.0,-1.0,-1.0,-1.0,-1.0,-1.0,1.0
25%,0.262625,-0.024902,-0.121019,-0.99236,-0.97699,-0.979137,-0.993293,-0.977017,-0.979064,-0.935788,...,-0.536174,-0.841847,-0.124694,-0.287031,-0.493108,-0.389041,-0.817288,0.002151,-0.13188,9.0
50%,0.277174,-0.017162,-0.108596,-0.94303,-0.835032,-0.850773,-0.948244,-0.84367,-0.845068,-0.874825,...,-0.33516,-0.703402,0.008146,0.007668,0.017192,-0.007186,-0.715631,0.182028,-0.003882,17.0
75%,0.288354,-0.010625,-0.097589,-0.250293,-0.057336,-0.278737,-0.302033,-0.087405,-0.288149,-0.014641,...,-0.113167,-0.487981,0.149005,0.29149,0.536137,0.365996,-0.521503,0.25079,0.10297,24.0
max,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,...,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,30.0


In [9]:
df.head()

,tBodyAcc-mean()-X,tBodyAcc-mean()-Y,tBodyAcc-mean()-Z,tBodyAcc-std()-X,tBodyAcc-std()-Y,tBodyAcc-std()-Z,tBodyAcc-mad()-X,tBodyAcc-mad()-Y,tBodyAcc-mad()-Z,tBodyAcc-max()-X,...,"angle(tBodyAccMean,gravity)","angle(tBodyAccJerkMean),gravityMean)","angle(tBodyGyroMean,gravityMean)","angle(tBodyGyroJerkMean,gravityMean)","angle(X,gravityMean)","angle(Y,gravityMean)","angle(Z,gravityMean)",subject,Activity,Data
0,0.288585,-0.020294,-0.132905,-0.995279,-0.983111,-0.913526,-0.995112,-0.983185,-0.923527,-0.934724,...,-0.112754,0.0304,-0.464761,-0.018446,-0.841247,0.179941,-0.058627,1,STANDING,Train
1,0.278419,-0.016411,-0.12352,-0.998245,-0.9753,-0.960322,-0.998807,-0.974914,-0.957686,-0.943068,...,0.053477,-0.007435,-0.732626,0.703511,-0.844788,0.180289,-0.054317,1,STANDING,Train
2,0.279653,-0.019467,-0.113462,-0.99538,-0.967187,-0.978944,-0.99652,-0.963668,-0.977469,-0.938692,...,-0.118559,0.177899,0.100699,0.808529,-0.848933,0.180637,-0.049118,1,STANDING,Train
3,0.279174,-0.026201,-0.123283,-0.996091,-0.983403,-0.990675,-0.997099,-0.98275,-0.989302,-0.938692,...,-0.036788,-0.012892,0.640011,-0.485366,-0.848649,0.181935,-0.047663,1,STANDING,Train
4,0.276629,-0.01657,-0.115362,-0.998139,-0.980817,-0.990482,-0.998321,-0.979672,-0.990441,-0.942469,...,0.12332,0.122542,0.693578,-0.615971,-0.847865,0.185151,-0.043892,1,STANDING,Train


In [10]:
df.dtypes.value_counts()

float64    561
object       2
int64        1
dtype: int64

In [11]:
df.columns[df.dtypes == 'object']

Index(['Activity', 'Data'], dtype='object')

In [12]:
df.columns[df.dtypes == 'int64']

Index(['subject'], dtype='object')

Of the two object-typed columns, only Activity needs to be converted from categorical labels to numeric codes. The Data column was added above only to mark which rows came from the train and test files, so it does not need encoding.

## Encoding Categorical Values 

In [13]:
# Keep the original labels, then replace the column with integer category codes
df['Activity'] = df['Activity'].astype('category')
activity_list = df['Activity'].tolist()
df['Activity'] = df['Activity'].cat.codes
# Map each integer code back to its activity name
activity_mapping = dict(zip(df['Activity'], activity_list))

In [14]:
activity_mapping 

{2: 'STANDING',
 1: 'SITTING',
 0: 'LAYING',
 3: 'WALKING',
 4: 'WALKING_DOWNSTAIRS',
 5: 'WALKING_UPSTAIRS'}
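
Category codes are assigned in the sorted order of the category labels, which is why LAYING becomes 0 and WALKING_UPSTAIRS becomes 5. A minimal sketch of a more direct way to build the same lookup from `cat.categories`, on a short toy series (`labels` and `code_to_label` are names introduced here):

```python
import pandas as pd

# Toy series containing each activity label at least once.
labels = pd.Series(
    ['STANDING', 'SITTING', 'LAYING', 'WALKING',
     'WALKING_DOWNSTAIRS', 'WALKING_UPSTAIRS', 'STANDING'],
    dtype='category',
)

# Integer code for each row, and the code -> label lookup table.
codes = labels.cat.codes
code_to_label = dict(enumerate(labels.cat.categories))

# Decoding the codes recovers the original labels.
decoded = codes.map(code_to_label)
print(code_to_label)
```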

## Model 

In [15]:
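# Training rows; df.columns[:-1] drops only 'Data', so 'subject' and the encoded 'Activity' target remain in X_train
X_train = df.loc[df['Data'] == 'Train', df.columns[:-1]]
Y_train = df.loc[df['Data'] == 'Train', 'Activity']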

In [16]:
# Test rows, selected with the same column slice as the training rows
X_test = df.loc[df['Data'] == 'Test', df.columns[:-1]]
Y_test = df.loc[df['Data'] == 'Test', 'Activity']
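
Note that the slice `df.columns[:-1]` removes only the `Data` column, so `subject` and the encoded `Activity` target are still part of the feature matrix. A minimal sketch, on a toy frame with made-up values, of how the split could keep only the sensor columns; `non_feature_cols` and the `_toy` names are introduced here, and this is an alternative selection rather than what the cells above do:

```python
import pandas as pd

# Toy frame mimicking the column layout of df: sensor columns, then subject, Activity, Data.
toy = pd.DataFrame({
    'tBodyAcc-mean()-X': [0.28, 0.27, 0.29, 0.26],
    'tBodyAcc-mean()-Y': [-0.02, -0.01, -0.03, -0.02],
    'subject': [1, 1, 2, 2],
    'Activity': [2, 1, 2, 0],
    'Data': ['Train', 'Train', 'Test', 'Test'],
})

# Columns that identify rows or hold the target, and so should not be used as inputs.
non_feature_cols = ['subject', 'Activity', 'Data']

train_mask = toy['Data'] == 'Train'
X_train_toy = toy.loc[train_mask].drop(columns=non_feature_cols)
y_train_toy = toy.loc[train_mask, 'Activity']
print(list(X_train_toy.columns))
```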

In [17]:
# Fit LDA on the training data only, then project both sets onto the discriminant axes
lda = LDA()
X_train_lda = lda.fit_transform(X_train, Y_train)
X_test_lda = lda.transform(X_test)
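
With its default settings, LDA projects the data onto at most (number of classes - 1) axes, so with six activities the transformed data should have five columns. A small sketch of that behaviour on synthetic Gaussian blobs (not the HAR data; all names ending in `_toy` are introduced here):

```python
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

rng = np.random.default_rng(0)
n_classes, n_features = 6, 20

# Synthetic data: each class is a Gaussian blob around its own random center.
y_toy = np.repeat(np.arange(n_classes), 50)
centers = rng.normal(scale=3.0, size=(n_classes, n_features))
X_toy = centers[y_toy] + rng.normal(size=(len(y_toy), n_features))

# Fit and project in one call; the output has n_classes - 1 columns.
lda_toy = LinearDiscriminantAnalysis()
X_toy_lda = lda_toy.fit_transform(X_toy, y_toy)
print(X_toy_lda.shape)  # (300, 5)
```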

In [18]:
slc = StandardScaler()
slc.fit(X_train_lda)
X_train_sc = slc.transform(X_train_lda)
X_test_sc = slc.transform(X_test_lda)

In [19]:
model = LogisticRegression()
model.fit(X_train_sc, Y_train) 

print("Training score: ", model.score(X_train_sc, Y_train))
print("Testing score: ", model.score(X_test_sc, Y_test))

Training score:  0.9869423286180631
Testing score:  0.9626739056667798
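
The LDA, scaling, and logistic regression steps are applied one after another, each fitted on the training data only. A minimal sketch of how the same sequence could be chained with scikit-learn's `make_pipeline`, shown on synthetic data rather than the HAR dataset; `max_iter=1000` and the `make_classification` settings are assumptions made for this example:

```python
from sklearn.datasets import make_classification
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the sensor features: 6 classes, 40 numeric columns.
X, y = make_classification(
    n_samples=600, n_features=40, n_informative=10,
    n_classes=6, n_clusters_per_class=1, random_state=0,
)

# Each step is fitted in order; the output of one feeds the next.
pipeline = make_pipeline(
    LinearDiscriminantAnalysis(),
    StandardScaler(),
    LogisticRegression(max_iter=1000),
)
pipeline.fit(X, y)

# Output of the LDA and scaling steps, i.e. what the classifier actually sees.
projected = pipeline[:-1].transform(X)
print(projected.shape)
print("Training score: ", pipeline.score(X, y))
```

Calling `pipeline.predict` or `pipeline.score` on held-out data would then apply the already fitted transforms without refitting them.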


In [20]:
Y_predict = model.predict(X_test_sc)

print(classification_report(Y_test, Y_predict))

              precision    recall  f1-score   support

           0       1.00      1.00      1.00       537
           1       0.95      0.87      0.91       491
           2       0.90      0.96      0.93       532
           3       0.97      0.99      0.98       496
           4       1.00      0.97      0.99       420
           5       0.97      0.97      0.97       471

    accuracy                           0.96      2947
   macro avg       0.96      0.96      0.96      2947
weighted avg       0.96      0.96      0.96      2947



In [21]:
res = pd.crosstab(Y_test, Y_predict)
res 

Activity (actual) \ predicted,0,1,2,3,4,5
0,537,0,0,0,0,0
1,0,429,60,0,0,2
2,0,20,512,0,0,0
3,0,0,0,492,0,4
4,0,0,0,3,409,8
5,0,1,0,12,0,458


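The model reaches 96% accuracy on the test set, against 98.7% on the training set. Most of the remaining errors are confusions between SITTING (1) and STANDING (2): 60 SITTING samples are predicted as STANDING and 20 STANDING samples as SITTING, while LAYING (0) is classified without error.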
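
The headline accuracy and the per-class figures in the classification report can be recomputed directly from the crosstab. A short sketch using NumPy, with the matrix values copied from the crosstab output (`confusion`, `recall`, and `precision` are names introduced here):

```python
import numpy as np

# Rows are true classes 0-5, columns are predicted classes 0-5 (copied from the crosstab).
confusion = np.array([
    [537,   0,   0,   0,   0,   0],
    [  0, 429,  60,   0,   0,   2],
    [  0,  20, 512,   0,   0,   0],
    [  0,   0,   0, 492,   0,   4],
    [  0,   0,   0,   3, 409,   8],
    [  0,   1,   0,  12,   0, 458],
])

# Correct predictions sit on the diagonal.
accuracy = np.trace(confusion) / confusion.sum()
# Recall divides by the row totals (true counts), precision by the column totals (predicted counts).
recall = np.diag(confusion) / confusion.sum(axis=1)
precision = np.diag(confusion) / confusion.sum(axis=0)

print(round(accuracy, 4))
print(recall.round(2))
print(precision.round(2))
```

SITTING (1) has the lowest recall and STANDING (2) the lowest precision, which reflects the 60 SITTING samples predicted as STANDING.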