### Content

1. [Introduction](#1.-Introduction)
2. [Import](#2.-Import)
3. [Research data](#3.-Research-data)
4. [Visualization of Accelerometer Signals](#4.-Visualization-of-Accelerometer-Signals)
5. [Feature engineering](#5.-Feature-engineering)

### 1. Introduction

**What is an accelerometer?**

An **accelerometer** is a device that measures proper acceleration. Proper acceleration, being the acceleration (or rate of change of velocity) of a body in its own instantaneous rest frame, is not the same as coordinate acceleration, being the acceleration in a fixed coordinate system. For example, an accelerometer at rest on the surface of the Earth will measure an acceleration due to Earth's gravity, straight upwards (by definition) of g ≈ 9.81 m/s². By contrast, accelerometers in free fall (falling toward the center of the Earth at a rate of about 9.81 m/s²) will measure zero. [More...](https://en.wikipedia.org/wiki/Accelerometer)
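
To make the rest versus free-fall distinction concrete, here is a minimal sketch that computes the magnitude of a three-axis reading. The two sample readings and the helper name `acceleration_magnitude` are made up for illustration and are not taken from the dataset.

```python
import numpy as np

G = 9.81  # approximate gravitational acceleration, m/s²


def acceleration_magnitude(sample):
    """Euclidean norm of one (x, y, z) accelerometer reading."""
    return float(np.linalg.norm(sample))


# Hypothetical readings in m/s², chosen for illustration only.
at_rest = (0.0, 0.0, 9.81)    # phone lying flat on a table
free_fall = (0.0, 0.0, 0.0)   # phone in free fall

rest_magnitude = acceleration_magnitude(at_rest)
fall_magnitude = acceleration_magnitude(free_fall)
print(rest_magnitude, fall_magnitude)
```

A resting phone reports a magnitude close to g regardless of its orientation, because only the split across the three axes changes.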

**What is my goal?**

I want to collect accelerometer data from my smartphone and then build a model that predicts the class of my activity (*for example, running or walking*).


**How will I do it?**

I will collect the data with [AndroSensor](https://play.google.com/store/apps/details?id=com.fivasim.androsensor) and use it for modeling.

![app screen](https://github.com/OleksandrKosovan/activity-recognition/blob/master/img/andro-sensor.png?raw=true)

Path to the data-collection metadata: **data/metadata/AndroSensorSettings.xml**

### 2. Import 

In [None]:
import os
import datetime

In [None]:
import pandas as pd
import numpy as np

In [None]:
from tqdm import tqdm

In [None]:
from scipy.signal import find_peaks
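# cumtrapz was removed in SciPy 1.14; cumulative_trapezoid is its replacement
from scipy.integrate import cumulative_trapezoid as cumtrapz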

In [None]:
import matplotlib.pyplot as plt
from mpl_toolkits.mplot3d import Axes3D
plt.style.use('ggplot')

### 3. Research data


In [None]:
DATA_PATH = '../input/data-for-activity-recognition/data/data/'

In [None]:
running_folder = 'running'
idle_folder = 'idle'
walking_folder = 'walking'
stairs_folder = 'stairs'

activity_list = [running_folder, idle_folder, walking_folder, stairs_folder]

In [None]:
# Check how many recordings each activity folder contains

for activity in activity_list:
    file_names_list = os.listdir(os.path.join(DATA_PATH, activity))
    print(activity, ': ', len(file_names_list))

**Schema of data preparation:**

![img](https://github.com/OleksandrKosovan/activity-recognition/blob/master/img/data-preparation.jpg?raw=true)

### 4. Visualization of Accelerometer Signals

In [None]:
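def plot_3d_trajectory(x, y, z):
    """
    Plot an approximate 3D trajectory of the phone.

    Each acceleration axis is integrated twice (acceleration -> velocity
    -> position) with unit sample spacing, so the axes are in arbitrary
    units. Sensor bias and any gravity component are not removed, so the
    path drifts over time.
    """
    # First pass gives velocity, second pass gives position.
    x = cumtrapz(cumtrapz(x))
    y = cumtrapz(cumtrapz(y))
    z = cumtrapz(cumtrapz(z))

    # Create one figure holding a single 3D axes.
    fig3 = plt.figure()
    fig3.suptitle('3D Trajectory of phone', fontsize=20)
    ax = fig3.add_subplot(projection='3d')
    ax.plot3D(x, y, z, c='red', lw=1, label='phone trajectory')
    ax.set_xlabel('X position')
    ax.set_ylabel('Y position')
    ax.set_zlabel('Z position')
    plt.show()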
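
As a sanity check on recovering motion from acceleration by numerical integration, the sketch below integrates a constant acceleration twice and compares the result with the closed form $\tfrac{1}{2} a t^{2}$. The acceleration value and the 0.1 s sample spacing are assumptions chosen for the example, not properties of the recorded data.

```python
import numpy as np
from scipy.integrate import cumulative_trapezoid

# Assumed toy signal: 10 s of constant 2 m/s² sampled every 0.1 s.
t = np.linspace(0.0, 10.0, 101)
accel = np.full_like(t, 2.0)

# First integration: acceleration -> velocity (starting from rest).
velocity = cumulative_trapezoid(accel, t, initial=0)
# Second integration: velocity -> position (starting at the origin).
position = cumulative_trapezoid(velocity, t, initial=0)

expected_position = 0.5 * 2.0 * t[-1] ** 2
print(velocity[-1], position[-1], expected_position)
```

With real sensor data the same two steps apply, but small biases in the acceleration grow quadratically in the position, which is why integrated trajectories drift.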

In [None]:
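def plot_frequency_spectrum(x, y, z):
    """ Plot the magnitude spectrum of each axis """
    # Frequency axis in cycles per sample, because the sampling rate is not given here.
    freqs = np.fft.rfftfreq(len(x))
    # Subtract the mean so the constant offset does not dominate the plot.
    x_mag = np.abs(np.fft.rfft(x - np.mean(x)))
    y_mag = np.abs(np.fft.rfft(y - np.mean(y)))
    z_mag = np.abs(np.fft.rfft(z - np.mean(z)))
    fig4,[ax1,ax2,ax3] = plt.subplots(3,1,sharex=True,sharey=True)
    fig4.suptitle('Spectrum',fontsize=20)
    ax1.plot(freqs,x_mag,c='r',label='x')
    ax1.legend()
    ax2.plot(freqs,y_mag,c='b',label='y')
    ax2.legend()
    ax3.plot(freqs,z_mag,c='g',label='z')
    ax3.legend()
    ax3.set_xlabel('Frequency (cycles per sample)')
    plt.show()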
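
For reference, a frequency spectrum can be computed with NumPy's real FFT. The sketch below builds a synthetic signal and recovers its dominant frequency. The 50 Hz sampling rate and the 2 Hz oscillation are assumptions made for the example, since the sampling rate of the recordings is not stated here.

```python
import numpy as np

fs = 50.0                     # assumed sampling rate in Hz (hypothetical)
t = np.arange(500) / fs       # 10 s of samples
# Synthetic axis: constant offset plus a 2 Hz oscillation.
signal = 9.81 + np.sin(2 * np.pi * 2.0 * t)

# Remove the mean so the zero-frequency bin does not dominate.
centered = signal - signal.mean()
magnitude = np.abs(np.fft.rfft(centered))
freqs = np.fft.rfftfreq(len(centered), d=1.0 / fs)

dominant_hz = freqs[np.argmax(magnitude)]
print(dominant_hz)
```

Passing the true sample spacing as `d` is what turns the bin index into hertz; without it, `rfftfreq` returns cycles per sample.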

In [None]:
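def select_random_df(folder_name):
    """ Load one randomly chosen recording and return its x, y, z arrays """
    custom_path = os.path.join(DATA_PATH, folder_name)
    # Pick a random file instead of always taking the first one listed.
    file_name = np.random.choice(os.listdir(custom_path))
    data = pd.read_csv(os.path.join(custom_path, file_name))
    x = data.accelerometer_X.values
    y = data.accelerometer_Y.values
    z = data.accelerometer_Z.values
    return x, y, z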

In [None]:
# running
x,y,z = select_random_df(running_folder)
plot_3d_trajectory(x, y, z)

In [None]:
plot_frequency_spectrum(x, y, z)

In [None]:
# idle
x,y,z = select_random_df(idle_folder)
plot_3d_trajectory(x, y, z)

In [None]:
plot_frequency_spectrum(x, y, z)

In [None]:
# walking
x,y,z = select_random_df(walking_folder)
plot_3d_trajectory(x, y, z)

In [None]:
plot_frequency_spectrum(x, y, z)

In [None]:
# stairs
x,y,z = select_random_df(stairs_folder)
plot_3d_trajectory(x, y, z)

In [None]:
plot_frequency_spectrum(x, y, z)

### 5. Feature engineering

In this section we report the notation and the preliminary definitions that we use throughout the paper. For the sake of readability, they are also summarized in Table I. We denote by $f$ the index of a frame containing $N$ three-axis accelerometer samples $s_{j, n}^{f}$, where $s_{j, n}^{f}$ is the $n$-th sample, $n \in[1, N]$, of the $j$-th axis, $j \in\{x, y, z\}$, of the $f$-th frame. In the remainder of the paper, when it is not strictly necessary to distinguish among the three axis components, we omit the axis index $j$. Consequently, the accelerometer sample containing the three axis components $\left\{s_{x, n}^{f}, s_{y, n}^{f}, s_{z, n}^{f}\right\}$ is denoted by $\mathbf{s}_{n}^{f}$.

The features employed for each $j$-th axis are: i) mean $\left(\mu_{j}^{f}\right)$, ii) standard deviation $\left(\sigma_{j}^{f}\right)$, and iii) number of peaks $\left(P_{j}^{f}\right)$. Since the formulae for $\mu_{j}^{f}$ and $\sigma_{j}^{f}$ are well known, we only report the definition of $P_{j}^{f}$:
$$
\begin{array}{c}
P_{j}^{f}=\sum_{n} \rho_{j, n}^{f} \\
\rho_{j, n}^{f}=\left\{\begin{array}{ll}
1, & \text { if }\left(s_{j, n+1}^{f}-s_{j, n}^{f}\right)\left(s_{j, n}^{f}-s_{j, n-1}^{f}\right)<0,\left\|s_{j, n}^{f}\right\| \geq \epsilon \\
0, & \text { otherwise }
\end{array}\right.
\end{array}
$$

The quantity $\epsilon$ is a threshold employed to define a signal peak, empirically set to $\epsilon=0.75$ by means of practical trials.
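
The definition of $P_{j}^{f}$ can be written directly in NumPy. The sketch below is one literal reading of the formula for a single axis. The function name `count_peaks` and the toy frame are illustrative assumptions. Read this way, the condition counts both local maxima and local minima whose magnitude is at least $\epsilon$, whereas `scipy.signal.find_peaks` with default arguments returns only local maxima and applies no threshold.

```python
import numpy as np


def count_peaks(signal, eps=0.75):
    """Count interior samples that are local extrema with |s_n| >= eps."""
    s = np.asarray(signal, dtype=float)
    forward = s[2:] - s[1:-1]      # s_{n+1} - s_n
    backward = s[1:-1] - s[:-2]    # s_n - s_{n-1}
    is_extremum = forward * backward < 0   # slope changes sign at n
    is_large = np.abs(s[1:-1]) >= eps      # magnitude threshold
    return int(np.sum(is_extremum & is_large))


# Toy single-axis frame: extrema at 1.0, -1.0 and 0.5.
frame = [0.0, 1.0, 0.0, -1.0, 0.0, 0.5, 0.0]
n_peaks = count_peaks(frame)
print(n_peaks)
```

The first and last samples are never counted because the formula needs both neighbours of $s_{j, n}^{f}$.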

In [None]:
def mean_calculator(three_axis):
    """ Return the mean of each axis """
    three_axis = np.array(three_axis)
    vector_x = three_axis[:, 0]
    vector_y = three_axis[:, 1]
    vector_z = three_axis[:, 2]
    x_mean = np.mean(vector_x)
    y_mean = np.mean(vector_y)
    z_mean = np.mean(vector_z)
    return x_mean, y_mean, z_mean

In [None]:
def std_calculator(three_axis):
    """ Return the standard deviation of each axis """
    three_axis = np.array(three_axis)
    vector_x = three_axis[:, 0]
    vector_y = three_axis[:, 1]
    vector_z = three_axis[:, 2]
    x_std = np.std(vector_x)
    y_std = np.std(vector_y)
    z_std = np.std(vector_z)
    return x_std, y_std, z_std

In [None]:
def peaks_calculator(three_axis):
    """ Return the number of local maxima in each axis (scipy.signal.find_peaks, no threshold) """
    three_axis = np.array(three_axis)
    vector_x = three_axis[:, 0]
    vector_y = three_axis[:, 1]
    vector_z = three_axis[:, 2]
    x_peaks = len(find_peaks(vector_x)[0])
    y_peaks = len(find_peaks(vector_y)[0])
    z_peaks = len(find_peaks(vector_z)[0])
    return x_peaks, y_peaks, z_peaks

In [None]:
def feature_engineer(action, target, df):
    """ Compute the nine features for one recording and add them to df as a new row """
    x_mean, y_mean, z_mean = mean_calculator(action)
    x_std, y_std, z_std = std_calculator(action)
    x_peaks, y_peaks, z_peaks = peaks_calculator(action)
    dictionary = {
        'x_mean': x_mean,
        'y_mean': y_mean, 
        'z_mean': z_mean,
        'x_std': x_std, 
        'y_std': y_std,
        'z_std': z_std,
        'x_peaks': x_peaks, 
        'y_peaks': y_peaks, 
        'z_peaks': z_peaks,
        'target': target
    }
    # DataFrame.append was removed in pandas 2.0, so build a one-row frame and concatenate.
    row = pd.DataFrame([dictionary])
    if df.empty:
        return row
    return pd.concat([df, row], ignore_index=True)
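
Growing a DataFrame one row at a time copies the whole frame on every step. A common alternative is to collect plain dictionaries in a list and build the DataFrame once at the end. The sketch below shows that pattern with a reduced feature set. The helper `frame_features` and the two tiny frames are made up for illustration and stand in for the CSV recordings.

```python
import numpy as np
import pandas as pd


def frame_features(frame, target):
    """Per-axis mean and standard deviation for one (N, 3) frame."""
    frame = np.asarray(frame, dtype=float)
    means = frame.mean(axis=0)   # one value per column (x, y, z)
    stds = frame.std(axis=0)
    return {
        'x_mean': means[0], 'y_mean': means[1], 'z_mean': means[2],
        'x_std': stds[0], 'y_std': stds[1], 'z_std': stds[2],
        'target': target,
    }


# Two made-up frames standing in for recordings read from disk.
frames = [
    (np.array([[0.0, 0.0, 9.8], [0.0, 0.0, 9.8]]), 'idle'),
    (np.array([[1.0, 2.0, 9.0], [3.0, 4.0, 11.0]]), 'walking'),
]

# Collect one dictionary per frame, then build the DataFrame in one call.
rows = [frame_features(frame, target) for frame, target in frames]
features = pd.DataFrame(rows)
print(features)
```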

In [None]:
columns = [
    'x_mean', 'y_mean', 'z_mean', 
    'x_std', 'y_std', 'z_std', 
    'x_peaks', 'y_peaks', 'z_peaks',
    'target'
]
dataframe = pd.DataFrame(columns=columns)

In [None]:
for activity in activity_list:
    activity_files = os.listdir(os.path.join(DATA_PATH, activity))
    for file in activity_files:
        try:
            df = pd.read_csv(os.path.join(DATA_PATH, activity, file))
            array = df.to_numpy()
            dataframe = feature_engineer(
                action=array, 
                target=activity, 
                df=dataframe
            )
        except Exception as error:
            # Report which file failed and why instead of hiding the cause.
            print(f'Failed to process {activity}/{file}: {error}')

In [None]:
print(dataframe.shape)
dataframe.head()

In [None]:
dataframe.target.unique()

In [None]:
dataframe['target'].value_counts()

In [None]:
dataframe['target'].value_counts().plot(kind='barh')

In [None]:
# data frame to csv
# dataframe.to_csv('data/final_data.csv', index=False)

# Accelerometer Signals Classification for Activity and Movement Recognition

![img](https://camo.githubusercontent.com/ec2708af7a740e1a2396b777cf336b7c9804dc28/68747470733a2f2f7777772e616e64726f6964686976652e696e666f2f77702d636f6e74656e742f75706c6f6164732f323031372f31322f616e64726f69642d757365722d61637469766974792d7265636f676e6974696f6e2d7374696c6c2d77616c6b696e672d72756e6e696e672d64726976696e672e6a7067)

### Content
1. [Import Modules](#1.-Import-for-ML)
2. [Read data](#2.-Read-data)
3. [Data preparation](#3.-Data-preparation)
4. [Split data](#4.-Split-data)
5. [Modeling](#5.-Modeling)

 - 5.1. [Logistic Regression](#5.1.-Logistic-Regression)
 - 5.2. [Random Forest](#5.2.-Random-Forest)
 - 5.3. [Support Vector Classification](#5.3.-Support-Vector-Classification)
 - 5.4. [Decision Tree](#5.4.-Decision-Tree-Classifier)
 - 5.5. [Gradient Boosting Classifier](#5.5.-Gradient-Boosting-Classifier)

### 1. Import for ML

In [None]:
import sys

In [None]:
import numpy as np
import pandas as pd

In [None]:
from sklearn.model_selection import train_test_split
from sklearn.metrics import classification_report
from sklearn.metrics import confusion_matrix

In [None]:
from sklearn.ensemble import RandomForestClassifier
from sklearn.svm import SVC
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import GradientBoostingClassifier

In [None]:
import seaborn as sns
import matplotlib.pyplot as plt

### 2. Read data

In [None]:
df = dataframe

In [None]:
df.shape

In [None]:
df.head()

### 3. Data preparation

Shuffle the rows so that recordings of the same activity are no longer grouped together.

In [None]:
df = df.sample(frac=1).reset_index(drop=True)
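
If the shuffle should give the same order on every run, `DataFrame.sample` accepts a `random_state`. The sketch below uses a small made-up frame, and the seed value 42 is an arbitrary choice. Note that `train_test_split` also shuffles by default, so this step mainly affects how the rows look when inspected.

```python
import pandas as pd

# Made-up frame standing in for the feature table.
toy = pd.DataFrame({
    'x_mean': [0.1, 0.2, 0.3, 0.4],
    'target': ['idle', 'idle', 'running', 'walking'],
})

# The same seed gives the same permutation on every call.
shuffled_a = toy.sample(frac=1, random_state=42).reset_index(drop=True)
shuffled_b = toy.sample(frac=1, random_state=42).reset_index(drop=True)
print(shuffled_a)
```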

In [None]:
df.shape

In [None]:
df.head()

### 4. Split data

In [None]:
x_columns = [
    'x_mean', 'y_mean', 'z_mean', 
    'x_std', 'y_std', 'z_std', 
    'x_peaks', 'y_peaks', 'z_peaks'
]
X = df[x_columns]
y = df.target

In [None]:
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.33, random_state=42
)
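
When the classes are imbalanced, passing `stratify` to `train_test_split` keeps the class proportions the same in both parts. The sketch below demonstrates this on made-up labels. The array names, the 16-to-4 class ratio and the 25% test size are assumptions for the example only.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Made-up data: 20 samples, 16 'idle' and 4 'running'.
X_demo = np.arange(40).reshape(20, 2)
y_demo = np.array(['idle'] * 16 + ['running'] * 4)

X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.25, random_state=42, stratify=y_demo
)

# Both parts keep the 4:1 ratio of the full label array.
print(np.unique(y_tr, return_counts=True))
print(np.unique(y_te, return_counts=True))
```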

In [None]:
print('X train shape:', X_train.shape)
print('X test shape:', X_test.shape)
print('y train shape:', y_train.shape)
print('y test shape:', y_test.shape)

### 5. Modeling

In [None]:
# Sort the labels so they match the row/column order used by confusion_matrix.
labels = np.sort(df.target.unique())

In [None]:
def train_model(model):
    model.fit(X_train, y_train)
    y_pred = model.predict(X_test)
    print(classification_report(y_test, y_pred))
    return confusion_matrix(y_test, y_pred)

In [None]:
def visualize_confusion_matrix(cm, labels=labels):
    df_cm = pd.DataFrame(cm, columns=labels, index=labels)
    df_cm.index.name = 'Actual'
    df_cm.columns.name = 'Predicted'
    plt.figure(figsize = (10,7))
    sns.set(font_scale=1.4)#for label size
    sns.heatmap(df_cm, cmap="Blues", annot=True, annot_kws={"size": 16}, fmt='g')
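
When `labels` is not given, `confusion_matrix` orders its rows and columns by the sorted unique labels, so the tick labels passed to the heatmap need to be in that same order. The sketch below shows the default order and an explicit one on a few made-up predictions.

```python
import numpy as np
from sklearn.metrics import confusion_matrix

# Made-up ground truth and predictions for illustration.
y_true = np.array(['walking', 'idle', 'running', 'idle'])
y_pred = np.array(['walking', 'idle', 'idle', 'idle'])

# Default: rows and columns follow the sorted labels.
sorted_labels = np.unique(y_true)
cm_default = confusion_matrix(y_true, y_pred)

# Explicit order: rows and columns follow the list passed as `labels`.
order = ['walking', 'running', 'idle']
cm_ordered = confusion_matrix(y_true, y_pred, labels=order)

print(sorted_labels)
print(cm_default)
print(cm_ordered)
```

Passing the same list to both `confusion_matrix` and the plotting helper is a simple way to keep the matrix and its axis labels in agreement.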

###### 5.1. Logistic Regression

In [None]:
lr = LogisticRegression()
lr_cm = train_model(lr)

In [None]:
visualize_confusion_matrix(lr_cm)
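
Logistic regression is sensitive to feature scale, and the peak counts are on a different scale from the means and standard deviations. One option is to standardize the features inside a pipeline. The sketch below runs on synthetic data from `make_classification`, used as a stand-in for the nine features and four classes. The dataset parameters and `max_iter=1000` are assumptions, and the resulting score says nothing about the real recordings.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in: 9 features, 4 classes (not the accelerometer data).
X_syn, y_syn = make_classification(
    n_samples=400, n_features=9, n_informative=6, n_redundant=0,
    n_classes=4, random_state=0,
)
X_tr, X_te, y_tr, y_te = train_test_split(
    X_syn, y_syn, test_size=0.33, random_state=42
)

# The scaler is fitted on the training split only, then applied to the test split.
scaled_lr = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
scaled_lr.fit(X_tr, y_tr)
accuracy = scaled_lr.score(X_te, y_te)
print(accuracy)
```

Because the pipeline behaves like a single estimator, it could be passed to a helper such as `train_model` in place of a bare classifier.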

###### 5.2. Random Forest

In [None]:
rf = RandomForestClassifier()
rf_cm = train_model(rf)

In [None]:
visualize_confusion_matrix(rf_cm)

###### 5.3. Support Vector Classification

In [None]:
svc = SVC()
svc_cm = train_model(svc)

In [None]:
visualize_confusion_matrix(svc_cm)

###### 5.4. Decision Tree Classifier

In [None]:
dt = DecisionTreeClassifier()
dt_cm = train_model(dt)

In [None]:
visualize_confusion_matrix(dt_cm)

###### 5.5. Gradient Boosting Classifier

In [None]:
gb = GradientBoostingClassifier()
gb_cm = train_model(gb)

In [None]:
visualize_confusion_matrix(gb_cm)

### [To start of ML](#Accelerometer-Signals-Classification-for-Activity-and-Movement-Recognition)