# MotionSense Dataset
### Problem definition: predict a user's activity from smartphone sensor data

---
## Part 2:

* Problem Definition and Possible Applications
* Feature Extraction / Engineering
* Classic ML Models - Training and Statistical Evaluation
* Problems and the Need for "Real Data"



---
## Problem Definition and Applications

* Our problem is predicting a user's activity from phone sensor data
* This definition might be too wide, so we limit ourselves to predicting one of six possible activities
* Thus, we can define our problem as multiclass classification, where we label each data point as <br> sitting, standing, walking, jogging, going downstairs or going upstairs
* There are many applications for this kind of classification in various fields such as <br> healthcare, intelligence etc.
* We will further discuss some of these applications in a later part of our project

---
## Feature Extraction / Engineering

Our data is a time series - a sequence of measurements over time

* Thus, the meaning of a single data point depends on its context
* But classic ML algorithms/classifiers predict the output for a single input data point, independently of the adjacent input data points
* So, in order to use our data to train a classic ML model we will have to encode our features to represent context data
* We will present two different feature encoding methods - Sliding-Window and Raw-History

### Sliding Window Features


* In this method, we will encode each data sample as a concatenation of analytical functions (mean, sum, median, min, max, std) calculated over a window of a predefined number of recent samples
* For example, here we will use a context size of 10 (each function is calculated over the 10 most recent data points, the current one included)
* Notice that we cannot mix samples from different experiments, since they represent different activity labels
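To make the encoding concrete before the full class, here is a minimal, library-free sketch on a made-up one-dimensional signal. The helper name `sliding_window_features` and the toy values are assumptions for illustration only; the notebook itself relies on `pandas` rolling windows.

```python
from statistics import mean

def sliding_window_features(values, window_size):
    """Return one (mean, min, max) tuple per position that has a full window.

    The window ending at position t covers values[t - window_size + 1 : t + 1],
    i.e. the current sample and the window_size - 1 samples before it.
    """
    features = []
    for t in range(window_size - 1, len(values)):
        window = values[t - window_size + 1 : t + 1]
        features.append((mean(window), min(window), max(window)))
    return features

# Made-up toy signal, not taken from the data set.
signal = [1.0, 2.0, 4.0, 8.0, 16.0]
features = sliding_window_features(signal, window_size=3)
for row in features:
    print(row)
```

With `pandas`, `rolling(window=w)` yields its first complete value at row `w - 1`, while the class below slices from row `w`. It therefore seems to discard one additional complete row per recording, which is consistent with the 10 rows per experiment counted in the sanity check.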

In [1]:
import numpy as np
import pandas as pd
import os

PROJECT_MAIN_DIR = os.getcwd()

In [2]:
class SlidingWindow:
    
    def __init__(self, orig_df, window_size, num_experiments, num_participants, exclude, fnlist):
        exps = [i for i in range(1,num_experiments + 1) if i != exclude]
        parts = [i for i in range(1,num_participants + 1)]
        smp_df = self.create_sliding_df(orig_df, window_size, fnlist, exps, parts)
        self.window_size = window_size
        self.df = smp_df

    def create_sld_df_single_exp(self, orig_df, window_size, analytic_functions_list):
        # Build the rolling-window features for a single (participant, experiment) recording
        dfs_to_concate = []
        base_df = orig_df.drop('action', axis=1)
        for func in analytic_functions_list:
            # look up the rolling aggregation by name, e.g. rolling(...).mean
            method_to_call = getattr(base_df.rolling(window=window_size), func)
            analytic_df = method_to_call()
            analytic_df = analytic_df[window_size:] # drop the leading rows, which lack a full window
            analytic_df.columns = [col + "_sld_" + func for col in analytic_df.columns]
            dfs_to_concate.append(analytic_df)

        action_df = orig_df[['action']][window_size:] # [[]] syntax to return DataFrame and not Series
        dfs_to_concate.append(action_df)
        return pd.concat(dfs_to_concate,axis=1)

    def create_sliding_df(self, orig_df, window_size, analytic_functions_list, expirements, participants):
        dfs_to_concate = []
        cols_to_drop = ['partc', 'action_file_index']
        for e in expirements:
            for p in participants:
                exp_df = orig_df[(orig_df['partc'] == p) & (orig_df['action_file_index'] == e)]
                exp_df = exp_df.drop(cols_to_drop, axis=1)
                exp_roll_df = self.create_sld_df_single_exp(exp_df, window_size, analytic_functions_list)

                dfs_to_concate.append(exp_roll_df)
        return pd.concat(dfs_to_concate, axis=0, ignore_index=True)

In [3]:
df = pd.read_csv(os.path.join(PROJECT_MAIN_DIR,'full_data.gz'), compression='gzip') # we will load our data saved as a compressed csv file
df = df.drop(['Unnamed: 0'], axis=1).set_index('time')

In [4]:
# defining variables for the sliding window data frame creation
num_experiments = 16
num_participants = 24
exclude = 10
analytic_functions_list = ['mean', 'sum', 'median', 'min', 'max', 'std']
WINDOW_SIZE = 10

# create the sliding window data frame
win_df = SlidingWindow(df, WINDOW_SIZE, num_experiments, num_participants, exclude, analytic_functions_list)

Viewing our data and performing a sanity check

In [5]:
win_df.df.head(5)

Unnamed: 0,attitude.roll_sld_mean,attitude.pitch_sld_mean,attitude.yaw_sld_mean,gravity.x_sld_mean,gravity.y_sld_mean,gravity.z_sld_mean,rotationRate.x_sld_mean,rotationRate.y_sld_mean,rotationRate.z_sld_mean,userAcceleration.x_sld_mean,...,gravity.x_sld_std,gravity.y_sld_std,gravity.z_sld_std,rotationRate.x_sld_std,rotationRate.y_sld_std,rotationRate.z_sld_std,userAcceleration.x_sld_std,userAcceleration.y_sld_std,userAcceleration.z_sld_std,action
0,1.476032,-0.699698,0.659227,0.761074,0.643965,-0.072516,0.327435,-0.23759,0.125294,0.089179,...,0.003243,0.006475,0.029224,0.346436,0.590791,0.249107,0.083854,0.128267,0.114783,dws
1,1.464487,-0.697192,0.650675,0.761804,0.642056,-0.081426,0.344311,-0.346253,0.059212,0.058162,...,0.001706,0.004752,0.029167,0.377046,0.554298,0.172086,0.087612,0.140744,0.099833,dws
2,1.448353,-0.695176,0.63986,0.761559,0.64051,-0.093848,0.481461,-0.525592,0.033799,0.054865,...,0.002168,0.004552,0.032387,0.428049,0.712118,0.14147,0.090179,0.146393,0.097535,dws
3,1.4265,-0.692378,0.625654,0.760575,0.638354,-0.110722,0.602284,-0.699763,0.062317,0.055195,...,0.004032,0.005687,0.043814,0.439572,1.00645,0.168158,0.089927,0.148507,0.10431,dws
4,1.399383,-0.688014,0.609652,0.758815,0.634966,-0.131806,0.70538,-0.951931,0.111215,0.041147,...,0.007015,0.008968,0.062607,0.433246,1.32976,0.224856,0.074548,0.091135,0.131904,dws


Sanity check:

* There are 15 experiments (indices 1 to 16 with index 10 excluded) and 24 participants for each experiment
* With a sliding window of 10 samples we are losing 10 data samples from each experiment
* This sums up to 15 \* 24 \* 10 = 3600
* Indeed, the new data set has exactly 3600 rows fewer than the original data set
* Furthermore, the new data set has exactly 12 \* {num_analytical_functions} + label column = 12 \* 6 + 1 = 73 columns <br> (12 is the number of features in the original data set)

In [6]:
print(win_df.df.shape)
print(df.shape)

(1409265, 73)
(1412865, 15)
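The arithmetic of the sanity check can be replayed in a few lines. This is only a sketch: the constants are copied by hand from the cells and the printed shapes above, and the variable names are chosen here for illustration.

```python
# Values copied from the notebook cells and printed shapes above.
num_experiments = 15       # indices 1..16 with index 10 excluded
num_participants = 24
window_size = 10
num_raw_features = 12
num_functions = 6          # mean, sum, median, min, max, std
original_rows = 1412865    # first entry of df.shape

rows_lost = num_experiments * num_participants * window_size
expected_rows = original_rows - rows_lost
expected_cols = num_raw_features * num_functions + 1  # +1 for the label column

print(rows_lost, expected_rows, expected_cols)
```

The expected shape agrees with the `(1409265, 73)` printed for `win_df.df`.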


### Raw History Features

* In this method, we will simply encode each data sample as a concatenation of its own raw features and the raw features of its previous x data points
* For example, here we will use a context size of 10, i.e. it is aligned with our previous sliding window method, <br> but instead of calculating aggregations of analytical functions over the context features, here we simply encode them as one long vector
* Again, we cannot mix samples from different experiments, since they represent different activity labels
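As a minimal illustration of this encoding, the sketch below builds lagged rows from a made-up one-dimensional signal without any libraries. The helper name `raw_history_features` and the toy values are assumptions for illustration; the notebook's own implementation uses `DataFrame.shift`.

```python
def raw_history_features(values, history_length):
    """Encode each sample as [current, prev_1, ..., prev_history_length].

    Samples without a full history are skipped, mirroring the way the
    notebook drops the first history_length rows of every experiment.
    """
    rows = []
    for t in range(history_length, len(values)):
        # values[t], values[t-1], ..., values[t-history_length]
        rows.append([values[t - lag] for lag in range(history_length + 1)])
    return rows

# Made-up toy signal, not taken from the data set.
signal = [10, 11, 12, 13, 14]
rows = raw_history_features(signal, history_length=2)
for row in rows:
    print(row)
```

Each output row has `history_length + 1` entries per raw feature, which is where the factor (10 + 1) in the column count of the sanity check comes from.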

In [7]:
class RawHistory:
    
    def __init__(self, origin_df, history_length, num_experiments, num_participants, exclude):
        exps = [i for i in range(1,num_experiments + 1) if i != exclude]
        parts = [i for i in range(1,num_participants + 1)]
        smp_df = self.create_history_encoded_df(origin_df, history_length, expirements=exps, participants=parts)
        self.history_length = history_length
        self.df = smp_df

    def create_history_encoded_single_exp(self, orig_df, history_length):
        hist_df = orig_df.copy(deep=True) # later operations are "in place" so we need to avoid changing original dataframe
        columns_to_shift = hist_df.columns[:-1] # omit the action column, we don't want to duplicate it
        for i in range(1,history_length + 1):
            shift_df = orig_df.shift(i)
            for col_name in columns_to_shift:
                new_col_name = "prev_{0}_".format(i) + col_name
                hist_df[new_col_name] = shift_df[col_name] # add shifted column, aka history, as a column to orignal dataframe

        hist_df = hist_df[history_length:] # we don't return the first "history_length" sample - they have missing history data
        return hist_df

    def create_history_encoded_df(self, orig_df, history_length, expirements, participants):
        dfs_to_concate = []
        cols_to_drop = ['partc', 'action_file_index']
        for e in expirements:
            for p in participants:
                exp_df = orig_df[(orig_df['partc'] == p) & (orig_df['action_file_index'] == e)]
                exp_df = exp_df.drop(cols_to_drop, axis=1)
                exp_histoy_df = self.create_history_encoded_single_exp(exp_df, history_length)
                dfs_to_concate.append(exp_histoy_df)
        return pd.concat(dfs_to_concate, axis=0, ignore_index=True) 

In [8]:
# defining variables for the raw history data frame creation
num_experiments = 16
num_participants = 24
exclude = 10
HISTORY_LEN = 10

# create the raw history data frame
hist_df = RawHistory(df, HISTORY_LEN, num_experiments, num_participants, exclude)

**Sanity check:**

* There are 15 experiments and 24 participants for each experiment
* For history encoded data with a history length of 10 samples we are losing 10 data samples from each experiment
* This sums up to 15 \* 24 \* 10 = 3600
* Indeed, the new data set has exactly 3600 rows fewer than the original data set
* Furthermore, the new data set has exactly 12 \* {history_length + 1} + label column = 12 \* (10 + 1) + 1 = 133 columns <br> (the addition of one accounts for the features of the current sample)

In [9]:
print(hist_df.df.shape)
print(df.shape)

(1409265, 133)
(1412865, 15)


---
## Classic ML Models - Training and Statistical Evaluation

* Now we have two different encodings of our time series data as independent data points.
* We can feed them into an ML model and evaluate the performance of our predictions 
* First we will define a class to consolidate the functionality of splitting the data into training, development and test sets <br> and performing the evaluation using precision and recall for each activity label


In [10]:
from sklearn.metrics import classification_report, confusion_matrix

class DataProcessingEval():
    
    def __init__(self, origin_df, labels_dict):
        self.labels_dict = labels_dict
        self.classes_names = self.create_classes(labels_dict)
        self.df = origin_df
    
    def create_samples(self, division_ratio=(0.7, 0.1, 0.2)):
        # Define X, y
        df = self.df.sample(frac=1).reset_index(drop=True)
        X, y = df.drop(["action"], axis=1), df["action"]
        y = y.replace(self.labels_dict)

        # Divide to training, validation and test set
        train_ratio, dev_ratio = division_ratio[0], division_ratio[1]
        num_training = int(df.shape[0] * train_ratio)
        num_validation = int(df.shape[0] * dev_ratio)

        X_train, y_train = X[:num_training], y[:num_training]
        X_vald, y_vald = X[num_training:num_training + num_validation], y[num_training:num_training + num_validation]
        X_test, y_test = X[num_training + num_validation:], y[num_training + num_validation:]

        return X_train, y_train, X_vald, y_vald, X_test, y_test

    def create_classes(self, labels_dict):
        classes_indexs = labels_dict.items()
        classes_indexs = sorted(classes_indexs, key=lambda x: x[1])
        classes_names = [label for label, index in classes_indexs]
        return classes_names

    def evaluate_results(self, y_true, y_pred):
        print("---- Printing classification report ----")
        print(classification_report(y_true, y_pred, target_names=self.classes_names))

In [11]:
labels = {'wlk': 0, 'sit': 1, "std": 2, "ups": 3, "jog": 4, "dws": 5}

win_processor = DataProcessingEval(win_df.df, labels_dict=labels)
X_train_win, y_train_win, X_vald_win, y_vald_win, X_test_win, y_test_win = win_processor.create_samples()

hist_processor = DataProcessingEval(hist_df.df, labels_dict=labels)
X_train_hist, y_train_hist, X_vald_hist, y_vald_hist, X_test_hist, y_test_hist = hist_processor.create_samples()

**A few words about our evaluation metrics**

* Our evaluation metrics will be precision, recall and their harmonic mean, the F1 score
* These metrics are much more relevant to our problem than the model's overall accuracy, for a few reasons.
* First, our problem is imbalanced. We've already seen that the labels walk, sit and stand are about twice as frequent as going up/down stairs
* Second, we assume that some activities will be much harder to predict than others, <br> e.g. separating "sit" from "walk" should be much easier than separating "upstairs" from "downstairs"
* That is why we'll be interested in the model's performance on each activity on its own.
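The following sketch shows, on made-up labels, why overall accuracy can hide a class the model never predicts. The helper `per_class_metrics` and the toy label lists are assumptions for illustration; the notebook itself relies on `classification_report` from scikit-learn.

```python
def per_class_metrics(y_true, y_pred, label):
    """Precision, recall and F1 for one class, treating it as the positive class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == label and p == label)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t != label and p == label)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == label and p != label)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)) if precision + recall else 0.0
    return precision, recall, f1

# Made-up toy labels: 8 "sit" samples and 2 "ups" samples (imbalanced on purpose).
y_true = ["sit"] * 8 + ["ups"] * 2
# A lazy classifier that always answers "sit".
y_pred = ["sit"] * 10

accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
sit_metrics = per_class_metrics(y_true, y_pred, "sit")
ups_metrics = per_class_metrics(y_true, y_pred, "ups")
print(accuracy, sit_metrics, ups_metrics)
```

The lazy classifier reaches 80% accuracy on this toy set although its recall on "ups" is zero, which is the kind of failure the per-class report makes visible.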


**Logistic Regression Model**

* We will start with a simple linear model and evaluate its performance using our two different encodings
* We will use logistic regression with an L2 penalty (the loss itself is the multinomial log loss, not MSE), with the default C=1.0 <br> (C is the inverse of the regularization strength, so it sets the tradeoff between the regularization term and the data loss; smaller C means stronger regularization)
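For reference, the textbook form of the L2-penalized multinomial objective can be written as below. The notation is introduced here for illustration and is not taken from the notebook: $x_i$ is the feature vector of training sample $i$, $y_i$ its label, $w_k$ and $b_k$ the weights and intercept of class $k$, $K = 6$ the number of activities and $n$ the number of training samples. The exact scaling used internally by scikit-learn may differ between versions.

$$
\min_{W,\,b}\;\; \frac{1}{2}\,\lVert W \rVert_F^2 \;+\; C \sum_{i=1}^{n} \left[ -\log \frac{\exp\!\left(w_{y_i}^{\top} x_i + b_{y_i}\right)}{\sum_{k=1}^{K} \exp\!\left(w_k^{\top} x_i + b_k\right)} \right]
$$

Because $C$ multiplies the data term, lowering it from 1.0 to 0.1 gives the penalty term relatively more weight.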

In [12]:
from sklearn.linear_model import LogisticRegression
lr_win = LogisticRegression(multi_class='multinomial', solver='lbfgs', verbose=1, max_iter=300)
lr_win.fit(X_train_win, y_train_win)

[Parallel(n_jobs=1)]: Done   1 out of   1 | elapsed:  3.4min finished


LogisticRegression(C=1.0, class_weight=None, dual=False, fit_intercept=True,
          intercept_scaling=1, max_iter=300, multi_class='multinomial',
          n_jobs=1, penalty='l2', random_state=None, solver='lbfgs',
          tol=0.0001, verbose=1, warm_start=False)

We get a warning that our model didn't converge, i.e. the training stopped after reaching the maximum number of iterations instead of stopping because our halting criterion (the tolerance threshold) was met <br>
Let's evaluate its performance on the validation set

In [13]:
lr_win_prediction = lr_win.predict(X_vald_win)
win_processor.evaluate_results(y_vald_win, lr_win_prediction)

---- Printing classification report ----
             precision    recall  f1-score   support

        wlk       0.61      0.82      0.70     34388
        sit       0.99      0.99      0.99     33911
        std       0.97      0.98      0.97     30358
        ups       0.59      0.46      0.52     15940
        jog       0.85      0.83      0.84     13245
        dws       0.52      0.21      0.30     13084

avg / total       0.79      0.80      0.78    140926



The model performs quite well on the sit and stand labels but not so well on the other activities <br>
We can try training with stronger regularization (a smaller C) and evaluate the results:

In [14]:
lr_win_r = LogisticRegression(multi_class='multinomial', solver='lbfgs', verbose=1, max_iter=300, C=0.1)
lr_win_r.fit(X_train_win, y_train_win)
lr_win_r_prediction = lr_win_r.predict(X_vald_win)
win_processor.evaluate_results(y_vald_win, lr_win_r_prediction)

[Parallel(n_jobs=1)]: Done   1 out of   1 | elapsed:  3.4min finished


---- Printing classification report ----
             precision    recall  f1-score   support

        wlk       0.61      0.82      0.70     34388
        sit       0.99      0.99      0.99     33911
        std       0.97      0.98      0.98     30358
        ups       0.59      0.45      0.51     15940
        jog       0.85      0.83      0.84     13245
        dws       0.52      0.22      0.31     13084

avg / total       0.79      0.80      0.78    140926



That didn't change the results much, so we can keep our previous regularization strength

Now we will perform the same analysis over the raw history encoded samples

In [15]:
lr_hist = LogisticRegression(multi_class='multinomial', solver='lbfgs', verbose=1, max_iter=300)
lr_hist.fit(X_train_hist, y_train_hist)
lr_hist_prediction = lr_hist.predict(X_vald_hist)
hist_processor.evaluate_results(y_vald_hist, lr_hist_prediction)

[Parallel(n_jobs=1)]: Done   1 out of   1 | elapsed:  4.2min finished


---- Printing classification report ----
             precision    recall  f1-score   support

        wlk       0.49      0.68      0.57     34516
        sit       0.87      0.96      0.91     33832
        std       0.55      0.76      0.64     30528
        ups       0.45      0.17      0.25     15533
        jog       0.53      0.10      0.17     13438
        dws       0.45      0.14      0.21     13079

avg / total       0.59      0.60      0.56    140926



In [16]:
lr_r_hist = LogisticRegression(multi_class='multinomial', solver='lbfgs', verbose=1, max_iter=300, C=0.1)
lr_r_hist.fit(X_train_hist, y_train_hist)
lr_r_hist_prediction = lr_r_hist.predict(X_vald_hist)
hist_processor.evaluate_results(y_vald_hist, lr_r_hist_prediction)

[Parallel(n_jobs=1)]: Done   1 out of   1 | elapsed:  4.2min finished


---- Printing classification report ----
             precision    recall  f1-score   support

        wlk       0.49      0.68      0.57     34516
        sit       0.86      0.96      0.91     33832
        std       0.55      0.76      0.64     30528
        ups       0.45      0.17      0.25     15533
        jog       0.54      0.10      0.17     13438
        dws       0.45      0.14      0.21     13079

avg / total       0.59      0.60      0.56    140926



* We can see that the results of the same classifier with the raw history encoding are much worse than with the sliding window encoding, even for the relatively "easy" to predict activities sit and stand
* This suggests that the classes are not linearly separable under the raw history encoding, and the same may hold, to a lesser degree, for the sliding window aggregation encoding
* Next, we will try a stronger, non-linear model to capture more complex relations between our features

### Random Forest Classifier

* We will evaluate the performance of a non-linear random forest model over our two differently encoded data sets
* This model can be trained much faster thanks to the option to parallelize the training of independent trees
* We will run a few different configurations, varying the number of trees, and evaluate them on our validation set

In [17]:
from sklearn.ensemble import RandomForestClassifier
for i in range(10, 60, 10):
    rf_win = RandomForestClassifier(n_estimators=i, n_jobs=-1, verbose=1)
    rf_win.fit(X_train_win, y_train_win)
    rf_win_prediction = rf_win.predict(X_vald_win)
    print("----- Evaluating Random Forest model with {0} trees for {1} encoding ----".format(i, "sliding window"))
    win_processor.evaluate_results(y_vald_win, rf_win_prediction)

[Parallel(n_jobs=-1)]: Done   6 out of  10 | elapsed:   37.1s remaining:   24.8s
[Parallel(n_jobs=-1)]: Done  10 out of  10 | elapsed:   59.7s finished
[Parallel(n_jobs=8)]: Done   6 out of  10 | elapsed:    0.1s remaining:    0.1s
[Parallel(n_jobs=8)]: Done  10 out of  10 | elapsed:    0.2s finished


----- Evaluating Random Forest model with 10 trees for sliding window encoding ----
---- Printing classification report ----
             precision    recall  f1-score   support

        wlk       0.97      0.99      0.98     34388
        sit       1.00      1.00      1.00     33911
        std       1.00      1.00      1.00     30358
        ups       0.95      0.94      0.95     15940
        jog       0.99      0.98      0.98     13245
        dws       0.96      0.92      0.94     13084

avg / total       0.98      0.98      0.98    140926



[Parallel(n_jobs=-1)]: Done  20 out of  20 | elapsed:  1.7min finished
[Parallel(n_jobs=8)]: Done  20 out of  20 | elapsed:    0.3s finished


----- Evaluating Random Forest model with 20 trees for sliding window encoding ----
---- Printing classification report ----
             precision    recall  f1-score   support

        wlk       0.98      0.99      0.98     34388
        sit       1.00      1.00      1.00     33911
        std       1.00      1.00      1.00     30358
        ups       0.97      0.96      0.96     15940
        jog       0.99      0.98      0.99     13245
        dws       0.97      0.94      0.96     13084

avg / total       0.99      0.99      0.99    140926



[Parallel(n_jobs=-1)]: Done  30 out of  30 | elapsed:  2.2min finished
[Parallel(n_jobs=8)]: Done  30 out of  30 | elapsed:    0.4s finished


----- Evaluating Random Forest model with 30 trees for sliding window encoding ----
---- Printing classification report ----
             precision    recall  f1-score   support

        wlk       0.98      0.99      0.99     34388
        sit       1.00      1.00      1.00     33911
        std       1.00      1.00      1.00     30358
        ups       0.97      0.96      0.97     15940
        jog       0.99      0.99      0.99     13245
        dws       0.97      0.95      0.96     13084

avg / total       0.99      0.99      0.99    140926



[Parallel(n_jobs=-1)]: Done  40 out of  40 | elapsed:  2.9min finished
[Parallel(n_jobs=8)]: Done  40 out of  40 | elapsed:    0.6s finished


----- Evaluating Random Forest model with 40 trees for sliding window encoding ----
---- Printing classification report ----
             precision    recall  f1-score   support

        wlk       0.98      0.99      0.99     34388
        sit       1.00      1.00      1.00     33911
        std       1.00      1.00      1.00     30358
        ups       0.97      0.96      0.97     15940
        jog       0.99      0.99      0.99     13245
        dws       0.97      0.95      0.96     13084

avg / total       0.99      0.99      0.99    140926



[Parallel(n_jobs=-1)]: Done  34 tasks      | elapsed:  3.1min
[Parallel(n_jobs=-1)]: Done  50 out of  50 | elapsed:  4.2min finished
[Parallel(n_jobs=8)]: Done  34 tasks      | elapsed:    0.6s


----- Evaluating Random Forest model with 50 trees for sliding window encoding ----
---- Printing classification report ----
             precision    recall  f1-score   support

        wlk       0.98      0.99      0.99     34388
        sit       1.00      1.00      1.00     33911
        std       1.00      1.00      1.00     30358
        ups       0.97      0.96      0.97     15940
        jog       0.99      0.99      0.99     13245
        dws       0.97      0.96      0.97     13084

avg / total       0.99      0.99      0.99    140926



[Parallel(n_jobs=8)]: Done  50 out of  50 | elapsed:    0.8s finished


In [18]:
for i in range(10, 60, 10):
    rf_hist = RandomForestClassifier(n_estimators=i, n_jobs=-1, verbose=1)
    rf_hist.fit(X_train_hist, y_train_hist)
    rf_hist_prediction = rf_hist.predict(X_vald_hist)
    print("----- Evaluating Random Forest model with {0} trees for {1} encoding ----".format(i, "raw history"))
    hist_processor.evaluate_results(y_vald_hist, rf_hist_prediction)

[Parallel(n_jobs=-1)]: Done   6 out of  10 | elapsed:  1.2min remaining:   46.2s
[Parallel(n_jobs=-1)]: Done  10 out of  10 | elapsed:  1.9min finished
[Parallel(n_jobs=8)]: Done   6 out of  10 | elapsed:    0.1s remaining:    0.1s
[Parallel(n_jobs=8)]: Done  10 out of  10 | elapsed:    0.2s finished


----- Evaluating Random Forest model with 10 trees for raw history encoding ----
---- Printing classification report ----
             precision    recall  f1-score   support

        wlk       0.93      0.98      0.96     34516
        sit       1.00      1.00      1.00     33832
        std       1.00      1.00      1.00     30528
        ups       0.92      0.89      0.90     15533
        jog       0.98      0.97      0.97     13438
        dws       0.92      0.85      0.88     13079

avg / total       0.96      0.97      0.96    140926



[Parallel(n_jobs=-1)]: Done  20 out of  20 | elapsed:  3.2min finished
[Parallel(n_jobs=8)]: Done  20 out of  20 | elapsed:    0.4s finished


----- Evaluating Random Forest model with 20 trees for raw history encoding ----
---- Printing classification report ----
             precision    recall  f1-score   support

        wlk       0.95      0.99      0.97     34516
        sit       1.00      1.00      1.00     33832
        std       1.00      1.00      1.00     30528
        ups       0.94      0.91      0.93     15533
        jog       0.98      0.97      0.98     13438
        dws       0.94      0.90      0.92     13079

avg / total       0.97      0.97      0.97    140926



[Parallel(n_jobs=-1)]: Done  30 out of  30 | elapsed:  4.7min finished
[Parallel(n_jobs=8)]: Done  30 out of  30 | elapsed:    0.6s finished


----- Evaluating Random Forest model with 30 trees for raw history encoding ----
---- Printing classification report ----
             precision    recall  f1-score   support

        wlk       0.96      0.99      0.97     34516
        sit       1.00      1.00      1.00     33832
        std       1.00      1.00      1.00     30528
        ups       0.95      0.92      0.93     15533
        jog       0.99      0.97      0.98     13438
        dws       0.94      0.91      0.93     13079

avg / total       0.98      0.98      0.98    140926



[Parallel(n_jobs=-1)]: Done  40 out of  40 | elapsed:  6.2min finished
[Parallel(n_jobs=8)]: Done  40 out of  40 | elapsed:    0.8s finished


----- Evaluating Random Forest model with 40 trees for raw history encoding ----
---- Printing classification report ----
             precision    recall  f1-score   support

        wlk       0.96      0.99      0.97     34516
        sit       1.00      1.00      1.00     33832
        std       1.00      1.00      1.00     30528
        ups       0.95      0.92      0.93     15533
        jog       0.99      0.97      0.98     13438
        dws       0.94      0.91      0.93     13079

avg / total       0.98      0.98      0.98    140926



[Parallel(n_jobs=-1)]: Done  34 tasks      | elapsed:  6.1min
[Parallel(n_jobs=-1)]: Done  50 out of  50 | elapsed:  8.0min finished
[Parallel(n_jobs=8)]: Done  34 tasks      | elapsed:    0.7s


----- Evaluating Random Forest model with 50 trees for raw history encoding ----
---- Printing classification report ----
             precision    recall  f1-score   support

        wlk       0.96      0.99      0.97     34516
        sit       1.00      1.00      1.00     33832
        std       1.00      1.00      1.00     30528
        ups       0.95      0.93      0.94     15533
        jog       0.99      0.98      0.98     13438
        dws       0.94      0.92      0.93     13079

avg / total       0.98      0.98      0.98    140926



[Parallel(n_jobs=8)]: Done  50 out of  50 | elapsed:    1.0s finished


* As we can see, the differences due to the number of estimators are pretty much negligible
* The sliding window encoding performs slightly better, especially for the upstairs and downstairs labels
* We will choose the simpler model with only 10 tree estimators and the sliding window encoding
* Next, we will analyze the results of our best model so far (random forest with 10 trees using the sliding window encoding) over the test set portion we set aside

**Evaluate best model on the test set**

In [19]:
rf = RandomForestClassifier(n_estimators=10, n_jobs=-1, verbose=1)
rf.fit(X_train_win, y_train_win)
rf_win_test_predictions = rf.predict(X_test_win)
win_processor.evaluate_results(y_test_win, rf_win_test_predictions)

[Parallel(n_jobs=-1)]: Done   6 out of  10 | elapsed:   46.1s remaining:   30.8s
[Parallel(n_jobs=-1)]: Done  10 out of  10 | elapsed:  1.2min finished
[Parallel(n_jobs=8)]: Done   6 out of  10 | elapsed:    0.3s remaining:    0.2s
[Parallel(n_jobs=8)]: Done  10 out of  10 | elapsed:    0.4s finished


---- Printing classification report ----
             precision    recall  f1-score   support

        wlk       0.96      0.99      0.98     68697
        sit       1.00      1.00      1.00     67600
        std       1.00      1.00      1.00     61418
        ups       0.96      0.94      0.95     31245
        jog       0.99      0.98      0.98     26781
        dws       0.97      0.93      0.94     26113

avg / total       0.98      0.98      0.98    281854



---
## Problems and the Need for "Real Data"

* As we can see, even on our held-out test data the performance of the model is too good to be true.
* This is a reason to suspect that although we tested our model on data that was not used to train it, **we are over-fitting to the current data set as a whole**
* We suspect that **the generation process of the data was too "synthetic"**, not representing "real world" data obtained from phone sensors
* We used this experiment framework to **extract real data obtained from our activities** during the day and labeled them accordingly
* In the next notebook we will present our model's performance on our "real world" data and try to train more complex models to improve it
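One further factor that may be worth checking is offered here as an assumption, not as a finding of this notebook. `create_samples` shuffles all rows before splitting, and rows built from neighbouring time steps of the same recording are computed from almost the same raw samples. The sketch below uses a hypothetical helper, `window_indices`, to count that overlap for the window size used above.

```python
def window_indices(end, window_size):
    """Indices of the raw samples covered by the window ending at `end`."""
    return set(range(end - window_size + 1, end + 1))

WINDOW_SIZE = 10  # same value as in the notebook

# Two windows that end at consecutive time steps of the same recording.
a = window_indices(end=100, window_size=WINDOW_SIZE)
b = window_indices(end=101, window_size=WINDOW_SIZE)

shared = len(a & b)
print(f"consecutive windows share {shared} of {WINDOW_SIZE} raw samples")
```

If such near-duplicate rows end up on both sides of a random split, validation and test scores could be optimistic. Splitting by participant or by recording instead of by shuffled row would be one way to test this.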
