# Dataset

## Human Activity Recognition Using Smartphones Dataset

### Explanation

- Each person performed six activities (WALKING, WALKING_UPSTAIRS, WALKING_DOWNSTAIRS, SITTING, STANDING, LAYING).
- Using the smartphone's embedded accelerometer and gyroscope, we captured 3-axial linear acceleration and 3-axial angular velocity.
- 70% of the volunteers were selected for generating the training data and 30% for the test data.
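Splitting by volunteer rather than by individual sample keeps every recording of a given person on one side of the split, so the test score reflects performance on people the model has never seen. Below is a minimal sketch of such a 70/30 subject-wise split. It assumes integer subject IDs like those in the `subject` column. The helper name `split_subjects` and the seed are hypothetical. The Kaggle CSVs used below are already split, so this only illustrates the idea.

```python
import random

def split_subjects(subject_ids, train_frac=0.7, seed=0):
    """Hypothetical helper: split unique subject IDs into disjoint train/test sets."""
    subjects = sorted(set(subject_ids))      # unique volunteers, in a stable order
    random.Random(seed).shuffle(subjects)    # reproducible shuffle of volunteers
    n_train = round(train_frac * len(subjects))
    return set(subjects[:n_train]), set(subjects[n_train:])

# Toy example: 30 volunteers (IDs 1..30), each contributing 5 windows
rows = [s for s in range(1, 31) for _ in range(5)]
train_subj, test_subj = split_subjects(rows)
print(len(train_subj), len(test_subj))   # 21 9
print(train_subj.isdisjoint(test_subj))  # True
```

Every window of a volunteer follows that volunteer's assignment, so no person appears in both sets.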

### Given features

- Triaxial acceleration from the accelerometer (total acceleration) and the estimated body acceleration.
- Triaxial angular velocity from the gyroscope.
- A 561-feature vector with time and frequency domain variables. 
- Its activity label. 
- An identifier of the subject.

### Notes

- Features are normalized and bounded within [-1,1].
- Each feature vector is a row in the text file.
- Accelerations (total and body) are in units of g (standard gravity, 9.80665 m/s²).
- Gyroscope units are rad/s.
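The unit notes reduce to simple arithmetic: to convert an acceleration in g to m/s², multiply it by 9.80665. The notes do not say how the features were mapped into [-1,1]. The sketch below uses min-max scaling, which is one common way to do this, but that choice is an assumption, and the helper names are hypothetical.

```python
STANDARD_GRAVITY = 9.80665  # m/s^2 per g, as given in the dataset notes

def g_to_ms2(acc_g):
    """Convert an acceleration from units of g to m/s^2."""
    return acc_g * STANDARD_GRAVITY

def scale_to_unit_range(values):
    """Min-max scale numbers into [-1, 1] (assumed method, for illustration only)."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]  # constant feature: map everything to the midpoint
    return [2 * (v - lo) / (hi - lo) - 1 for v in values]

print(g_to_ms2(1.0))                          # 9.80665
print(scale_to_unit_range([0.0, 5.0, 10.0]))  # [-1.0, 0.0, 1.0]
```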


# Code


In [None]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt 
%matplotlib inline
import seaborn as sns

In [None]:
train = pd.read_csv("../input/human-activity-recognition-with-smartphones/train.csv")
test = pd.read_csv("../input/human-activity-recognition-with-smartphones/test.csv")

In [None]:
# Map each activity label to an integer code (1-6)
activity_codes = {
    "WALKING": 1,
    "WALKING_UPSTAIRS": 2,
    "WALKING_DOWNSTAIRS": 3,
    "SITTING": 4,
    "STANDING": 5,
    "LAYING": 6,
}

# Series.map looks each label up in the dict (an unknown label would become NaN)
train['Activity_num'] = train['Activity'].map(activity_codes)
test['Activity_num'] = test['Activity'].map(activity_codes)

In [None]:
y_train = train['Activity']
x_train = train.drop(['Activity', 'subject', 'Activity_num'], axis=1)
y_test = test['Activity']
x_test = test.drop(['Activity', 'subject', 'Activity_num'], axis=1)

In [None]:
print(x_train.shape)
print(y_train.shape)
print(x_test.shape)
print(y_test.shape)

In [None]:
# Class distribution in the training set, indexed by numeric code:
# 1 WALKING
# 2 WALKING_UPSTAIRS
# 3 WALKING_DOWNSTAIRS
# 4 SITTING
# 5 STANDING
# 6 LAYING
train['Activity_num'].value_counts().sort_index(axis=0)

## Classifiers

- Six activities in total must be classified.
- Five classifiers are used (naive Bayes, decision tree, k-nearest neighbors, random forest, SVM).
- The data are already preprocessed (features are normalized and bounded within [-1,1]).
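Because all features already share the [-1,1] range, distance-based models such as k-nearest neighbors can compare them directly. The pure-Python sketch below shows the idea behind the `KNeighborsClassifier(7)` used in this notebook: it computes the Euclidean distance to every training row, then takes a majority vote among the k closest rows. This is a simplified illustration, not scikit-learn's implementation. The function name and toy data are hypothetical, and ties are broken by whichever label `Counter.most_common` returns first.

```python
import math
from collections import Counter

def knn_predict(train_X, train_y, query, k=7):
    """Hypothetical helper: majority vote among the k nearest training rows."""
    # Euclidean distance from the query to every training row
    dists = [(math.dist(row, query), label) for row, label in zip(train_X, train_y)]
    dists.sort(key=lambda pair: pair[0])           # closest first
    votes = Counter(label for _, label in dists[:k])
    return votes.most_common(1)[0][0]              # most frequent label among the k

# Toy 2-D data already bounded in [-1, 1]
X = [[-0.9, -0.8], [-0.7, -0.9], [-0.8, -0.6], [0.8, 0.9], [0.7, 0.6], [0.9, 0.7]]
y = ["SITTING", "SITTING", "SITTING", "WALKING", "WALKING", "WALKING"]
print(knn_predict(X, y, [0.6, 0.8], k=3))  # WALKING
```

If one feature spanned a much larger range than the others, it would dominate `math.dist`. Bounded features avoid that.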

In [None]:
from sklearn.naive_bayes import GaussianNB
from sklearn.tree import DecisionTreeClassifier
from sklearn.neighbors import KNeighborsClassifier
from sklearn.ensemble import RandomForestClassifier
from sklearn.svm import SVC

from sklearn.metrics import accuracy_score
from sklearn.metrics import confusion_matrix

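# Five classifiers: KNN uses k = 7 neighbors, the random forest 100 trees.
# random_state fixes the tree-based models' randomness so scores are reproducible.
classifiers = [GaussianNB(), DecisionTreeClassifier(random_state=0), KNeighborsClassifier(7),
               SVC(), RandomForestClassifier(n_estimators=100, random_state=0)]
clf_names = []
scores = []
conf_matrices = []

for clf in classifiers:
    clf = clf.fit(x_train, y_train)   # train on the 561 features
    pred = clf.predict(x_test)        # predict activities for the test volunteers

    clf_names.append(clf.__class__.__name__)
    scores.append(accuracy_score(y_test, pred))
    conf_matrices.append(confusion_matrix(y_test, pred))  # rows: true, columns: predicted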

### Conclusion

- Of the five classifiers, SVM performed best (95.04% accuracy).

In [None]:
result = pd.DataFrame({'Classifier': clf_names, 'Score': scores}) \
            .sort_values(by=['Score'], axis=0, ascending=False)
result

- Confusion matrix of the SVM predictions on the test set

In [None]:
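svm_idx = clf_names.index('SVC')  # look up the SVM result by name, not list position
labels = sorted(y_test.unique())  # confusion_matrix orders classes by sorted label
htmap = pd.DataFrame(conf_matrices[svm_idx], index=labels, columns=labels)
sns.heatmap(htmap, annot=True, fmt='d')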
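Here is how to read the score table and the heatmap. Accuracy is the fraction of test samples whose predicted label equals the true label. In scikit-learn's `confusion_matrix`, row *i* counts samples whose true class is the *i*-th label, and column *j* counts those predicted as the *j*-th label. When no `labels` argument is passed, the labels are sorted. Below is a pure-Python sketch of both metrics, using hypothetical helper names and toy labels.

```python
def accuracy(y_true, y_pred):
    """Hypothetical helper: fraction of positions where prediction matches truth."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

def confusion(y_true, y_pred):
    """Hypothetical helper: rows = true label, columns = predicted, labels sorted."""
    labels = sorted(set(y_true) | set(y_pred))
    index = {label: i for i, label in enumerate(labels)}
    matrix = [[0] * len(labels) for _ in labels]
    for t, p in zip(y_true, y_pred):
        matrix[index[t]][index[p]] += 1
    return labels, matrix

y_true = ["SITTING", "STANDING", "STANDING", "WALKING"]
y_pred = ["STANDING", "STANDING", "STANDING", "WALKING"]
print(accuracy(y_true, y_pred))  # 0.75
labels, m = confusion(y_true, y_pred)
print(labels)  # ['SITTING', 'STANDING', 'WALKING']
print(m)       # [[0, 1, 0], [0, 2, 0], [0, 0, 1]]
```

The diagonal holds correct predictions, and off-diagonal cells show which activities get confused. In the toy data, one SITTING sample is predicted as STANDING.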