In [2]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plot
import seaborn as sns
%matplotlib inline

In [3]:
df_kine=pd.read_csv("run_or_walk.csv")

DESCRIPTION

You are supposed to detect whether the person is running or walking based on sensor data collected from an iOS device. The dataset contains a single file of sensor data samples collected from the accelerometer and gyroscope of an iPhone 5c, at 10-second intervals and a frequency of ~5.4 samples per second.

Objective: Practice classification based on Naive Bayes algorithm. Identify the predictors that can be influential.

Actions to Perform:

1. Load the kinematics dataset as measured on mobile sensors from the file “run_or_walk.csv”.
2. List the columns in the dataset.
3. Let the target variable “y” be the activity, and assign all the columns after it to “x”.
4. Using Scikit-learn, fit a Gaussian Naive Bayes model and observe the accuracy.
5. Generate a classification report using Scikit-learn.
6. Repeat the model once using only the acceleration values as predictors and then using only the gyro values as predictors.
7. Comment on the difference in accuracy between both models.

In [4]:
df_kine.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 88588 entries, 0 to 88587
Data columns (total 11 columns):
 #   Column          Non-Null Count  Dtype  
---  ------          --------------  -----  
 0   date            88588 non-null  object 
 1   time            88588 non-null  object 
 2   username        88588 non-null  object 
 3   wrist           88588 non-null  int64  
 4   activity        88588 non-null  int64  
 5   acceleration_x  88588 non-null  float64
 6   acceleration_y  88588 non-null  float64
 7   acceleration_z  88588 non-null  float64
 8   gyro_x          88588 non-null  float64
 9   gyro_y          88588 non-null  float64
 10  gyro_z          88588 non-null  float64
dtypes: float64(6), int64(2), object(3)
memory usage: 7.4+ MB


In [9]:
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn import metrics
target=df_kine['activity']
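# Note: iloc[:, 5:10] is end-exclusive, so it selects columns 5-9 (acceleration_x through gyro_y)
# and leaves out gyro_z; iloc[:, 5:] would take every column after activity.
feature=df_kine.iloc[:,5:10]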
x_train,x_test,y_train,y_test=train_test_split(feature,target, random_state=1)
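
Positional slices are easy to get off by one because the end index is exclusive. The following is a minimal sketch of selecting predictors by column name instead. The one-row stand-in frame and the names `ACC_COLS`, `GYRO_COLS`, and `demo` are illustrative assumptions, not part of the original notebook.

```python
import pandas as pd

# Column names as listed by df_kine.info() above.
ACC_COLS = ["acceleration_x", "acceleration_y", "acceleration_z"]
GYRO_COLS = ["gyro_x", "gyro_y", "gyro_z"]

# Hypothetical one-row stand-in with the same column layout as run_or_walk.csv.
columns = ["date", "time", "username", "wrist", "activity"] + ACC_COLS + GYRO_COLS
demo = pd.DataFrame([[0] * len(columns)], columns=columns)

by_position = demo.iloc[:, 5:10]       # end-exclusive: stops at column 9
by_name = demo[ACC_COLS + GYRO_COLS]   # explicit: all six sensor columns

print(list(by_position.columns))
print(list(by_name.columns))
```

With the real data, `df_kine[ACC_COLS + GYRO_COLS]` would play the role of `by_name`.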

In [11]:
gb=GaussianNB()
gb.fit(x_train,y_train)
y_pred=gb.predict(x_test)
print(metrics.accuracy_score(y_test,y_pred))

0.954982616155687


In [12]:
print(metrics.confusion_matrix(y_test,y_pred))

[[10857    86]
 [  911 10293]]
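
As a cross-check, the accuracy and the per-class precision and recall can be recomputed by hand from the confusion matrix printed above. This is a small sketch that hard-codes those four counts; in scikit-learn's convention, rows are true classes and columns are predicted classes.

```python
import numpy as np

# Confusion matrix printed above (rows: true class, columns: predicted class).
cm = np.array([[10857, 86],
               [911, 10293]])

accuracy = np.trace(cm) / cm.sum()         # correct predictions / all predictions
precision = np.diag(cm) / cm.sum(axis=0)   # correct / all predicted as that class
recall = np.diag(cm) / cm.sum(axis=1)      # correct / all truly in that class

print(round(accuracy, 4))
print(precision.round(2), recall.round(2))
```

Most of the errors are the 911 samples of class 1 predicted as class 0, which is why recall for class 1 is lower than for class 0.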


In [13]:
print(metrics.classification_report(y_test,y_pred))

              precision    recall  f1-score   support

           0       0.92      0.99      0.96     10943
           1       0.99      0.92      0.95     11204

    accuracy                           0.95     22147
   macro avg       0.96      0.96      0.95     22147
weighted avg       0.96      0.95      0.95     22147



In [20]:
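# Note: iloc[:, 5:7] is end-exclusive, so it selects acceleration_x and acceleration_y only;
# iloc[:, 5:8] would also include acceleration_z.
feature_acc=df_kine.iloc[:,5:7]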
x_train,x_test,y_train,y_test=train_test_split(feature_acc,target, random_state=1)
gb=GaussianNB()
gb.fit(x_train,y_train)
y_pred_acc=gb.predict(x_test)
print("accuracy_score:",metrics.accuracy_score(y_test,y_pred_acc),"\n")
print("confusion matrix:\n",metrics.confusion_matrix(y_test,y_pred_acc),"\n")
print("classification report:\n",metrics.classification_report(y_test,y_pred_acc))

accuracy_score: 0.8949293358016888 

confusion matrix:
 [[10804   139]
 [ 2188  9016]] 

classification report:
               precision    recall  f1-score   support

           0       0.83      0.99      0.90     10943
           1       0.98      0.80      0.89     11204

    accuracy                           0.89     22147
   macro avg       0.91      0.90      0.89     22147
weighted avg       0.91      0.89      0.89     22147



In [22]:
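# Note: iloc[:, 8:10] is end-exclusive, so it selects gyro_x and gyro_y only;
# iloc[:, 8:11] would also include gyro_z.
feature_gyro=df_kine.iloc[:,8:10]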
x_train,x_test,y_train,y_test=train_test_split(feature_gyro,target, random_state=1)
gb=GaussianNB()
gb.fit(x_train,y_train)
y_pred_gyro=gb.predict(x_test)
print("accuracy_score:",metrics.accuracy_score(y_test,y_pred_gyro),"\n")
print("confusion matrix:\n",metrics.confusion_matrix(y_test,y_pred_gyro),"\n")
print("classification report:\n",metrics.classification_report(y_test,y_pred_gyro))

accuracy_score: 0.5738474736984693 

confusion matrix:
 [[7905 3038]
 [6400 4804]] 

classification report:
               precision    recall  f1-score   support

           0       0.55      0.72      0.63     10943
           1       0.61      0.43      0.50     11204

    accuracy                           0.57     22147
   macro avg       0.58      0.58      0.57     22147
weighted avg       0.58      0.57      0.56     22147



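### Comparison between acceleration and gyro:
Acceleration is far more useful than gyro for telling whether the activity is run or walk. The acceleration-only model reaches about 89.5% accuracy, close to the 95.5% of the combined model, while the gyro-only model reaches only about 57.4%, barely above chance for two nearly balanced classes (10943 vs. 11204 test samples). Gyro measures angular velocity, which on its own separates the two activities poorly here. Note that both subsets, as sliced above, cover only the x and y axes.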
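
The repeated cells above could be folded into one helper that fits a Gaussian Naive Bayes model per feature subset and reports test accuracy. This is a sketch under assumptions: `compare_feature_sets` and `FEATURE_SETS` are hypothetical names, and the subsets here use all three axes of each sensor, so the scores would not necessarily match the numbers printed earlier.

```python
import pandas as pd
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB

# Hypothetical named subsets; column names taken from df_kine.info().
FEATURE_SETS = {
    "acceleration": ["acceleration_x", "acceleration_y", "acceleration_z"],
    "gyro": ["gyro_x", "gyro_y", "gyro_z"],
}


def compare_feature_sets(df, feature_sets, target_col="activity", random_state=1):
    """Fit one GaussianNB per named feature subset; return test accuracy for each."""
    scores = {}
    for name, cols in feature_sets.items():
        # Same split settings as the notebook: default test size, fixed seed.
        x_train, x_test, y_train, y_test = train_test_split(
            df[cols], df[target_col], random_state=random_state
        )
        model = GaussianNB().fit(x_train, y_train)
        scores[name] = accuracy_score(y_test, model.predict(x_test))
    return scores
```

Calling `compare_feature_sets(df_kine, FEATURE_SETS)` would return a dictionary mapping each subset name to its accuracy, which makes the comparison explicit and avoids reusing `x_train` and `y_pred` across cells.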