# Activity Recognition using Machine Learning

In this project, I work with an activity recognition dataset from Kaggle. It contains sensor readings from 30 different individuals, each labelled with the type of activity being recorded. The goal is to classify the activity from the sensor readings.

## Import libraries

Let's start by importing the necessary libraries. I import `numpy` and `pandas` to manage arrays and the dataset, and `matplotlib` to create visualisations. For the machine learning algorithms, I import the SVM (`SVC`), Logistic Regression, K Nearest Neighbors and Random Forest classifiers from `sklearn`. I also import `accuracy_score` to calculate accuracy and `train_test_split` to split the data into training and testing sets.

In [1]:
import numpy as np
import pandas as pd

import matplotlib.pyplot as plt
%matplotlib inline

from sklearn.svm import SVC
from sklearn.linear_model import LogisticRegression
from sklearn.neighbors import KNeighborsClassifier
from sklearn.ensemble import RandomForestClassifier

from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
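
To show how the pieces imported above fit together, here is a minimal sketch that fits the four classifiers and scores each with `accuracy_score`. It runs on synthetic data from `make_classification`, which is an assumption made purely for illustration and is not the activity dataset. The names `X_demo`, `y_demo`, `models` and `scores` are hypothetical, and the hyperparameters are library defaults apart from `max_iter` and the fixed random seeds.

```python
from sklearn.datasets import make_classification
from sklearn.svm import SVC
from sklearn.linear_model import LogisticRegression
from sklearn.neighbors import KNeighborsClassifier
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in data (an assumption; not the activity dataset).
X_demo, y_demo = make_classification(
    n_samples=300, n_features=10, n_informative=5, n_classes=3, random_state=0
)

# Hold out 25% of the rows for testing.
X_train, X_test, y_train, y_test = train_test_split(
    X_demo, y_demo, test_size=0.25, random_state=0
)

# One instance of each classifier, mostly with default settings.
models = {
    "SVM": SVC(),
    "Logistic Regression": LogisticRegression(max_iter=1000),
    "K Nearest Neighbors": KNeighborsClassifier(),
    "Random Forest": RandomForestClassifier(random_state=0),
}

# Fit each model on the training split and score it on the test split.
scores = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    scores[name] = accuracy_score(y_test, model.predict(X_test))

for name, score in scores.items():
    print(f"{name}: {score:.3f}")
```

Keeping the models in a dictionary makes it easy to compare them under the same split; the accuracies printed here say nothing about how the models perform on the real sensor data.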

## Understand dataset

I first load the dataset using the pandas function `read_csv`.

In [2]:
dataset = pd.read_csv('dataset.csv')

Next, I check the dataset's shape to see the number of records and columns. I also check whether there are any null values.

In [3]:
print("Dataset: {}".format(dataset.shape))
print("Null values present: {}".format(dataset.isnull().values.any()))

Dataset: (7352, 563)
Null values present: False
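
The check above returns a single boolean for the whole frame. When that boolean is `True`, a per-column count shows where the gaps are. The sketch below illustrates this on a small toy frame; the frame `toy` and its column names `feature_a` and `feature_b` are hypothetical and only stand in for the real data.

```python
import numpy as np
import pandas as pd

# Toy frame with one missing value (hypothetical data, for illustration only).
toy = pd.DataFrame({
    "feature_a": [0.1, np.nan, 0.3],
    "feature_b": [1.0, 2.0, 3.0],
    "Activity": ["STANDING", "SITTING", "WALKING"],
})

print("Shape (rows, columns):", toy.shape)

# Single boolean: is anything missing anywhere in the frame?
any_missing = toy.isnull().values.any()
print("Null values present:", any_missing)

# Per-column count of missing values, filtered to columns that have any.
missing_per_column = toy.isnull().sum()
print(missing_per_column[missing_per_column > 0])
```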


There are 7352 records in total, each with 563 columns. There are no null values in the dataset, so no code is needed to handle missing data.

Next, I look at the first 5 rows of the dataset using the method `head(5)`.

In [4]:
dataset.head(5)

| Unnamed: 0 | tBodyAcc-mean()-X | tBodyAcc-mean()-Y | tBodyAcc-mean()-Z | tBodyAcc-std()-X | tBodyAcc-std()-Y | tBodyAcc-std()-Z | tBodyAcc-mad()-X | tBodyAcc-mad()-Y | tBodyAcc-mad()-Z | tBodyAcc-max()-X | ... | fBodyBodyGyroJerkMag-kurtosis() | angle(tBodyAccMean,gravity) | angle(tBodyAccJerkMean),gravityMean) | angle(tBodyGyroMean,gravityMean) | angle(tBodyGyroJerkMean,gravityMean) | angle(X,gravityMean) | angle(Y,gravityMean) | angle(Z,gravityMean) | subject | Activity |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 0 | 0.288585 | -0.020294 | -0.132905 | -0.995279 | -0.983111 | -0.913526 | -0.995112 | -0.983185 | -0.923527 | -0.934724 | ... | -0.710304 | -0.112754 | 0.0304 | -0.464761 | -0.018446 | -0.841247 | 0.179941 | -0.058627 | 1 | STANDING |
| 1 | 0.278419 | -0.016411 | -0.12352 | -0.998245 | -0.9753 | -0.960322 | -0.998807 | -0.974914 | -0.957686 | -0.943068 | ... | -0.861499 | 0.053477 | -0.007435 | -0.732626 | 0.703511 | -0.844788 | 0.180289 | -0.054317 | 1 | STANDING |
| 2 | 0.279653 | -0.019467 | -0.113462 | -0.99538 | -0.967187 | -0.978944 | -0.99652 | -0.963668 | -0.977469 | -0.938692 | ... | -0.760104 | -0.118559 | 0.177899 | 0.100699 | 0.808529 | -0.848933 | 0.180637 | -0.049118 | 1 | STANDING |
| 3 | 0.279174 | -0.026201 | -0.123283 | -0.996091 | -0.983403 | -0.990675 | -0.997099 | -0.98275 | -0.989302 | -0.938692 | ... | -0.482845 | -0.036788 | -0.012892 | 0.640011 | -0.485366 | -0.848649 | 0.181935 | -0.047663 | 1 | STANDING |
| 4 | 0.276629 | -0.01657 | -0.115362 | -0.998139 | -0.980817 | -0.990482 | -0.998321 | -0.979672 | -0.990441 | -0.942469 | ... | -0.699205 | 0.12332 | 0.122542 | 0.693578 | -0.615971 | -0.847865 | 0.185151 | -0.043892 | 1 | STANDING |


From the output above, I can see that each record contains a set of accelerometer and gyroscope sensor values. The last two columns are `subject`, which is the subject number, and `Activity`, which is the type of activity. `subject` is not needed for classifying the activity, so I drop it. The `Activity` column is the label `y` and all the remaining columns are the features `X`.

In [5]:
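# Drop the subject identifier; it is not used as a feature
dataset.drop(columns=['subject'], inplace=True)
# Separate the label column from the feature columns
y = dataset['Activity']
X = dataset.drop(columns=['Activity'])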
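
The same feature/label separation can be tried on a small toy frame, followed by a look at the class counts and a train/test split. This is a sketch under stated assumptions: the frame `toy` and its values are hypothetical, the drop is done without `inplace` so the original frame stays intact, and the `stratify` argument (which keeps class proportions similar in both splits) is an illustrative choice rather than something taken from the original notebook.

```python
import pandas as pd
from sklearn.model_selection import train_test_split

# Toy frame mimicking the layout above (hypothetical values).
toy = pd.DataFrame({
    "feature_a": [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8],
    "feature_b": [1.0, 0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3],
    "subject": [1, 1, 2, 2, 3, 3, 4, 4],
    "Activity": ["STANDING", "WALKING"] * 4,
})

# Non-inplace drop returns a new frame and leaves `toy` untouched.
features_and_label = toy.drop(columns=["subject"])
y_toy = features_and_label["Activity"]
X_toy = features_and_label.drop(columns=["Activity"])

# Number of records per activity, useful for spotting class imbalance.
print(y_toy.value_counts())

# Stratified split (an assumption): class proportions are preserved.
X_tr, X_te, y_tr, y_te = train_test_split(
    X_toy, y_toy, test_size=0.25, stratify=y_toy, random_state=0
)
print(X_tr.shape, X_te.shape)
```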