In [1]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.impute import SimpleImputer  # replaces Imputer, removed from sklearn.preprocessing in scikit-learn 0.22
from sklearn.preprocessing import PolynomialFeatures
from sklearn.preprocessing import StandardScaler
from sklearn.feature_selection import VarianceThreshold
from sklearn.feature_selection import SelectFromModel
from sklearn.utils import shuffle
from sklearn.ensemble import RandomForestClassifier
import pickle
%matplotlib inline

In [2]:
# Load the full dataset (the context manager closes the file handle afterwards)
with open("../data/full_dataset.p", "rb") as fh:
    df = pickle.load(fh)

### Data at first sight

Here is an excerpt of the data description from the file README.txt:

* Dataset Description: Inertial Measurement Unit Fall Detection Dataset (IMU Dataset)

* IMU Dataset is a dataset devised to benchmark fall detection and prediction algorithms based on <font color='red'>**acceleration, angular velocity and magnetic fields**</font> of body-worn APDM Opal IMU sensors at 7 body locations (right ankle, left ankle, right thigh, left thigh, head, sternum, and waist).

* Each of the 10 subjects underwent 60 trials (15 Activities of Daily Living - ADLs, 24 Falls, and 15 Near Falls)

#### Dataset columns
- Time: timestamp (the number of microseconds that have elapsed since 1 January 1970), unit = us
- r.ankle Acceleration X (m/s^2): Right ankle's acceleration along X axis, unit = m/s^2
- r.ankle Acceleration Y (m/s^2): Right ankle's acceleration along Y axis, unit = m/s^2
- r.ankle Acceleration Z (m/s^2): Right ankle's acceleration along Z axis, unit = m/s^2
- r.ankle Angular Velocity X (rad/s): Right ankle's angular velocity along X axis, unit = rad/s
- r.ankle Angular Velocity Y (rad/s): Right ankle's angular velocity along Y axis, unit = rad/s
- r.ankle Angular Velocity Z (rad/s): Right ankle's angular velocity along Z axis, unit = rad/s
- r.ankle Magnetic Field X (uT): Right ankle's magnetic field along X axis, unit = uT
- r.ankle Magnetic Field Y (uT): Right ankle's magnetic field along Y axis, unit = uT
- r.ankle Magnetic Field Z (uT): Right ankle's magnetic field along Z axis, unit = uT
- l.ankle Acceleration X (m/s^2): Left ankle's acceleration along X axis, unit = m/s^2
- l.ankle Acceleration Y (m/s^2): Left ankle's acceleration along Y axis, unit = m/s^2
- l.ankle Acceleration Z (m/s^2): Left ankle's acceleration along Z axis, unit = m/s^2
- l.ankle Angular Velocity X (rad/s): Left ankle's angular velocity along X axis, unit = rad/s
- l.ankle Angular Velocity Y (rad/s): Left ankle's angular velocity along Y axis, unit = rad/s
- l.ankle Angular Velocity Z (rad/s): Left ankle's angular velocity along Z axis, unit = rad/s
- l.ankle Magnetic Field X (uT): Left ankle's magnetic field along X axis, unit = uT
- l.ankle Magnetic Field Y (uT): Left ankle's magnetic field along Y axis, unit = uT
- l.ankle Magnetic Field Z (uT): Left ankle's magnetic field along Z axis, unit = uT
- r.thigh Acceleration X (m/s^2): Right thigh's acceleration along X axis, unit = m/s^2
- r.thigh Acceleration Y (m/s^2): Right thigh's acceleration along Y axis, unit = m/s^2
- r.thigh Acceleration Z (m/s^2): Right thigh's acceleration along Z axis, unit = m/s^2
- r.thigh Angular Velocity X (rad/s): Right thigh's angular velocity along X axis, unit = rad/s
- r.thigh Angular Velocity Y (rad/s): Right thigh's angular velocity along Y axis, unit = rad/s
- r.thigh Angular Velocity Z (rad/s): Right thigh's angular velocity along Z axis, unit = rad/s
- r.thigh Magnetic Field X (uT): Right thigh's magnetic field along X axis, unit = uT
- r.thigh Magnetic Field Y (uT): Right thigh's magnetic field along Y axis, unit = uT
- r.thigh Magnetic Field Z (uT): Right thigh's magnetic field along Z axis, unit = uT
- l.thigh Acceleration X (m/s^2): Left thigh's acceleration along X axis, unit = m/s^2
- l.thigh Acceleration Y (m/s^2): Left thigh's acceleration along Y axis, unit = m/s^2
- l.thigh Acceleration Z (m/s^2): Left thigh's acceleration along Z axis, unit = m/s^2
- l.thigh Angular Velocity X (rad/s): Left thigh's angular velocity along X axis, unit = rad/s
- l.thigh Angular Velocity Y (rad/s): Left thigh's angular velocity along Y axis, unit = rad/s
- l.thigh Angular Velocity Z (rad/s): Left thigh's angular velocity along Z axis, unit = rad/s
- l.thigh Magnetic Field X (uT): Left thigh's magnetic field along X axis, unit = uT
- l.thigh Magnetic Field Y (uT): Left thigh's magnetic field along Y axis, unit = uT
- l.thigh Magnetic Field Z (uT): Left thigh's magnetic field along Z axis, unit = uT
- head Acceleration X (m/s^2): Head's acceleration along X axis, unit = m/s^2
- head Acceleration Y (m/s^2): Head's acceleration along Y axis, unit = m/s^2
- head Acceleration Z (m/s^2): Head's acceleration along Z axis, unit = m/s^2
- head Angular Velocity X (rad/s): Head's angular velocity along X axis, unit = rad/s
- head Angular Velocity Y (rad/s): Head's angular velocity along Y axis, unit = rad/s
- head Angular Velocity Z (rad/s): Head's angular velocity along Z axis, unit = rad/s
- head Magnetic Field X (uT): Head's magnetic field along X axis, unit = uT
- head Magnetic Field Y (uT): Head's magnetic field along Y axis, unit = uT
- head Magnetic Field Z (uT): Head's magnetic field along Z axis, unit = uT
- sternum Acceleration X (m/s^2): Sternum's acceleration along X axis, unit = m/s^2
- sternum Acceleration Y (m/s^2): Sternum's acceleration along Y axis, unit = m/s^2
- sternum Acceleration Z (m/s^2): Sternum's acceleration along Z axis, unit = m/s^2
- sternum Angular Velocity X (rad/s): Sternum's angular velocity along X axis, unit = rad/s
- sternum Angular Velocity Y (rad/s): Sternum's angular velocity along Y axis, unit = rad/s
- sternum Angular Velocity Z (rad/s): Sternum's angular velocity along Z axis, unit = rad/s
- sternum Magnetic Field X (uT): Sternum's magnetic field along X axis, unit = uT
- sternum Magnetic Field Y (uT): Sternum's magnetic field along Y axis, unit = uT
- sternum Magnetic Field Z (uT): Sternum's magnetic field along Z axis, unit = uT
- waist Acceleration X (m/s^2): Waist's acceleration along X axis, unit = m/s^2
- waist Acceleration Y (m/s^2): Waist's acceleration along Y axis, unit = m/s^2
- waist Acceleration Z (m/s^2): Waist's acceleration along Z axis, unit = m/s^2
- waist Angular Velocity X (rad/s): Waist's angular velocity along X axis, unit = rad/s
- waist Angular Velocity Y (rad/s): Waist's angular velocity along Y axis, unit = rad/s
- waist Angular Velocity Z (rad/s): Waist's angular velocity along Z axis, unit = rad/s
- waist Magnetic Field X (uT): Waist's magnetic field along X axis, unit = uT
- waist Magnetic Field Y (uT): Waist's magnetic field along Y axis, unit = uT
- waist Magnetic Field Z (uT): Waist's magnetic field along Z axis, unit = uT
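
Since `Time` is described as microseconds since 1 January 1970, it can be converted to readable timestamps and used to estimate the sampling rate. The snippet below is a minimal sketch on made-up timestamps; the values and the 7812 us spacing are assumptions for illustration, not figures taken from the dataset.

```python
import pandas as pd

# Hypothetical timestamps in microseconds since 1 January 1970,
# spaced 7812 us apart (made-up values, not taken from the dataset).
time_us = pd.Series(
    [1_400_000_000_000_000 + i * 7_812 for i in range(5)], name="Time"
)

# Convert the integer microsecond counts to pandas timestamps.
timestamps = pd.to_datetime(time_us, unit="us")

# The median spacing between consecutive samples gives a sampling-rate estimate.
step_us = time_us.diff().dropna().median()
sampling_rate_hz = 1_000_000 / step_us

print(timestamps.iloc[0])
print(round(sampling_rate_hz, 1))
```

On the real data the same two calls would be applied to `df['Time']`, ideally one trial at a time so that gaps between trials do not distort the median.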

#### Notes
- Magnetic field data from two sensors (sternum and waist) seem to be noisier than the data from the other sensors.

#### Units
- Acceleration: $m/s^2$
- Angular Velocity: rad/s
- Magnetic Field: uT (microtesla)


### Checking duplicates

In [3]:
print(df.shape)
df = df.drop_duplicates()
print(df.shape)

(1190369, 67)
(1190369, 67)


##### Conclusion: No duplicates

### Overview of the dataset

In [4]:
df.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 1190369 entries, 0 to 1920
Data columns (total 67 columns):
Time                                  1190369 non-null int64
r.ankle Acceleration X (m/s^2)        1190369 non-null float64
r.ankle Acceleration Y (m/s^2)        1190369 non-null float64
r.ankle Acceleration Z (m/s^2)        1190369 non-null float64
r.ankle Angular Velocity X (rad/s)    1190369 non-null float64
r.ankle Angular Velocity Y (rad/s)    1190369 non-null float64
r.ankle Angular Velocity Z (rad/s)    1190369 non-null float64
r.ankle Magnetic Field X (uT)         1190369 non-null float64
r.ankle Magnetic Field Y (uT)         1190369 non-null float64
r.ankle Magnetic Field Z (uT)         1190369 non-null float64
l.ankle Acceleration X (m/s^2)        1190369 non-null float64
l.ankle Acceleration Y (m/s^2)        1190369 non-null float64
l.ankle Acceleration Z (m/s^2)        1190369 non-null float64
l.ankle Angular Velocity X (rad/s)    1190369 non-null float64
l.ankle An

### Defining Target Variable

##### target = 1 if Trial Type is equal to 'Falls' and target = 0 otherwise (Trial Type = ADLs or Near_Falls)

# <font color='red'> EVALUATE: Do we need to differentiate between ADLs and Near Falls? For the primary purpose of the project I think we don't, but if we are going to do the 'if students have time...' part, we will need it </font>

In [5]:
list(df['Trial Type'].unique())

['ADLs', 'Falls', 'Near_Falls']

In [6]:
df['target'] = np.where(df['Trial Type'] == 'Falls', 1, 0)
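
If ADLs and Near Falls do need to be kept apart later, one option is to store a three-class label next to the binary one. The sketch below uses a toy frame; the column name `target_3class` and the integer codes are hypothetical choices, not part of the original notebook.

```python
import pandas as pd

# Toy frame with the three 'Trial Type' values listed in the notebook.
toy = pd.DataFrame({"Trial Type": ["ADLs", "Falls", "Near_Falls", "Falls"]})

# Hypothetical integer codes; the ordering is an arbitrary choice.
class_codes = {"ADLs": 0, "Near_Falls": 1, "Falls": 2}
toy["target_3class"] = toy["Trial Type"].map(class_codes)

# The binary target can still be derived from the three-class column.
toy["target"] = (toy["target_3class"] == class_codes["Falls"]).astype(int)

print(toy)
```

Keeping both columns means the fall / non-fall model and a possible three-class model can share the same DataFrame.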

### Metadata

To facilitate data management, we'll store meta-information about the variables in a DataFrame. This will be helpful when we want to select specific variables for analysis, visualization, modeling, and so on.

Concretely we will store:

* Body Location: r.ankle, l.ankle, r.thigh, l.thigh, head, sternum, waist
* Axes: X, Y, Z
* Unit: m/s^2, rad/s, uT
* Measurements: acceleration, angular velocity, magnetic field
* dtype: int, float, str



In [18]:
data = []
for f in df.columns:

    measure = ''
    # Defining the measure
    if 'Acceleration' in f:
        measure = 'acceleration'
    elif 'Angular Velocity' in f:
        measure = 'angular velocity'
    elif 'Magnetic Field' in f:
        measure = 'magnetic field'
         
    # Defining the body location
    body_location = ''
    if 'r.ankle' in f:
        body_location = 'r.ankle'
    elif 'l.ankle' in f:
        body_location = 'l.ankle'
    elif 'r.thigh' in f:
        body_location = 'r.thigh'
    elif 'l.thigh' in f:
        body_location = 'l.thigh'
    elif 'head' in f:
        body_location = 'head'
    elif 'sternum' in f:
        body_location = 'sternum'
    elif 'waist' in f:
        body_location = 'waist'

    axis = ''
    # Defining the Axes
    if ' X ' in f:
        axis = 'X'
    elif ' Y ' in f:
        axis = 'Y'
    elif ' Z ' in f:
        axis = 'Z'       
        
    # Defining the unit
    unit = ''
    if 'rad/s' in f:
        unit = 'rad/s'
    elif 'm/s^2' in f:
        unit = 'm/s^2'
    elif 'uT' in f:
        unit = 'uT'           
    
    # Defining the data type 
    dtype = df[f].dtype
    
    # Creating a Dict that contains all the metadata for the variable
    f_dict = {
        'varname': f,
        'body_location': body_location,
        'axis': axis,
        'unit': unit,
        'measure': measure,
        'dtype': dtype
    }
    data.append(f_dict)
    
meta = pd.DataFrame(data, columns=['varname', 'body_location', 'axis', 'unit', 'measure', 'dtype'])
meta.set_index('varname', inplace=True)

In [19]:
# Save the metadata DataFrame (the context manager closes the file handle afterwards)
with open("metadata.p", "wb") as fh:
    pickle.dump(meta, fh)
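
As an illustration of how the metadata can be used to select variables, the sketch below builds a three-row stand-in with the same columns as `meta` and filters it. The name `meta_demo` and its rows are only for illustration.

```python
import pandas as pd

# Small stand-in for the metadata frame built above (three rows only).
meta_demo = pd.DataFrame(
    {
        "varname": [
            "waist Acceleration X (m/s^2)",
            "waist Magnetic Field X (uT)",
            "head Acceleration X (m/s^2)",
        ],
        "body_location": ["waist", "waist", "head"],
        "axis": ["X", "X", "X"],
        "unit": ["m/s^2", "uT", "m/s^2"],
        "measure": ["acceleration", "magnetic field", "acceleration"],
    }
).set_index("varname")

# Names of all acceleration columns, regardless of body location.
accel_cols = meta_demo[meta_demo["measure"] == "acceleration"].index.tolist()

# Number of variables per body location and measure.
counts = meta_demo.groupby(["body_location", "measure"]).size()

print(accel_cols)
print(counts)
```

With the real `meta`, a list such as `accel_cols` can be passed straight to `df[...]` to pull out the matching sensor columns.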

### Check Min and Max values per feature

Conclusion: the value ranges differ considerably between features, so the data needs to be standardized.

<font color='red'>Question: Should we standardize before or after splitting the dataset into train and test sets?</font>
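
A common way to avoid leaking test-set statistics into training is to split first, fit the scaler on the training portion only, and then apply the same fitted scaler to the test portion. The sketch below shows this on synthetic data; the arrays and the random row-wise split are assumptions for illustration, and for this dataset a split by subject or trial may be more appropriate.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Synthetic features on very different scales (stand-ins for the sensor columns).
rng = np.random.default_rng(0)
X = np.column_stack([rng.normal(0, 20, 1000), rng.normal(5, 0.5, 1000)])
y = rng.integers(0, 2, 1000)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)

# Fit the scaler on the training split only, then reuse its statistics on the test split.
scaler = StandardScaler().fit(X_train)
X_train_std = scaler.transform(X_train)
X_test_std = scaler.transform(X_test)

print(X_train_std.mean(axis=0), X_train_std.std(axis=0))
```

The standardized test split will generally not have exactly zero mean and unit variance, which is expected because its statistics were not used for fitting.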

In [13]:
data = []
for f in df.columns[1:64]:  # the 63 sensor columns (column 0 is Time)
    min_value = df[f].min()
    max_value = df[f].max()
    f_dict = {
        'column': f,
        'min_value': min_value,
        'max_value': max_value
    }
    data.append(f_dict)

min_max_df = pd.DataFrame(data, columns=['column','min_value','max_value'])
min_max_df

| | column | min_value | max_value |
|---|---|---|---|
| 0 | r.ankle Acceleration X (m/s^2) | -74.744122 | 73.109635 |
| 1 | r.ankle Acceleration Y (m/s^2) | -72.541823 | 73.010028 |
| 2 | r.ankle Acceleration Z (m/s^2) | -72.207326 | 73.885095 |
| 3 | r.ankle Angular Velocity X (rad/s) | -31.686683 | 37.780859 |
| 4 | r.ankle Angular Velocity Y (rad/s) | -12.539948 | 18.571146 |
| 5 | r.ankle Angular Velocity Z (rad/s) | -16.058934 | 22.525777 |
| 6 | r.ankle Magnetic Field X (uT) | -46.939510 | 112.029493 |
| 7 | r.ankle Magnetic Field Y (uT) | -81.755597 | 87.859931 |
| 8 | r.ankle Magnetic Field Z (uT) | -66.438300 | 83.543245 |
| 9 | l.ankle Acceleration X (m/s^2) | -73.692398 | 70.356544 |


### Handling imbalanced classes

Checking the class balance

In [18]:
df.target.value_counts()

0    786967
1    403402
Name: target, dtype: int64

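As we can see, the classes are not balanced: about 66% of the records have target=0 and about 34% have target=1. This can lead to a model that has high accuracy but does not add any value in practice (for example, by always predicting the majority class). Two possible strategies to deal with this problem are: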

* oversampling records with target=1
* undersampling records with target=0
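
Both strategies can be sketched with plain pandas. The example below uses a toy frame with a 2:1 imbalance; the frame, the column `x`, and the choice of random sampling (instead of a more elaborate resampling method) are assumptions for illustration.

```python
import pandas as pd

# Toy frame with a 2:1 imbalance, roughly like the counts above.
toy = pd.DataFrame({"x": range(9), "target": [0, 0, 0, 0, 0, 0, 1, 1, 1]})

class_sizes = toy["target"].value_counts()
n_min, n_max = class_sizes.min(), class_sizes.max()

# Undersampling: draw n_min rows from every class without replacement.
under = pd.concat(
    [group.sample(n=n_min, random_state=0) for _, group in toy.groupby("target")]
)

# Oversampling: resample the minority class (target=1 here) with replacement
# until it matches the size of the majority class.
minority = toy[toy["target"] == 1].sample(n=n_max, replace=True, random_state=0)
over = pd.concat([toy[toy["target"] == 0], minority])

print(under["target"].value_counts())
print(over["target"].value_counts())
```

If resampling is used, it should be applied to the training split only, so that the test split keeps the original class distribution.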

# EVALUATE WHETHER WE ARE GOING TO UNDER/OVERSAMPLE. THIS MAY WORK FOR THE FALL/NON-FALL CLASSIFICATION, BUT FOR THE FALL PREDICTION WE WILL NEED THE WHOLE DATASET (3 CLASSES)