# The SPHERE Challenge: Activity Recognition with Multimodal Sensor Data



We are excited to share the approach that won us third prize in the SPHERE Challenge. The challenge was organized by [DrivenData](http://www.drivendata.org), the [ECML-PKDD](http://www.ecmlpkdd2016.org) conference, and the [AARP](http://www.aarp.org/aarp-foundation/) foundation.

In this post, we walk through the steps for working with the provided data (https://www.drivendata.org/competitions/42/data/) to generate our best single model.

There are three main sections in this post:

1. Requirements.
2. Directory structure.
3. Generate best single model.




# Requirements
- Python 2.7
- Python libraries: the versions we used to produce our results are listed in `requirements.txt`.
- Hardware configuration:
    + Ubuntu 14.04
    + Memory: 16G
 

# Directory structure

```
├── README.md         
├── input
│   ├── public_data    <- The original data.
│   │   ├── accelerometer_axes.json
│   │   ├── access_point_names.json
│   │   ├── annotations.json
│   │   ├── pir_locations.json
│   │   ├── rooms.json
│   │   ├── sample_submission.csv
│   │   ├── train
│   │   │   ├── 00001
│   │   │   ├── ...
│   │   │   └── 00010
│   │   ├── test
│   │   │   ├── 00011
│   │   │   ├── ...
│   │   │   └── 00882
│   │   ├── video_feature_names.json
│   │   └── video_locations.json
│
├── sub               <- Submissions - predictions on test.
│
├── models             <- Trained models, predictions on train.
│
├── requirements.txt   <- The requirements file for reproducing.
│
├── src                <- Source code for use in this project.
│   │
│   ├── visualise_data.py <- Scripts to read raw data as sequence.
│   │
│   ├── feature_extraction_v5.py       <- Scripts to turn raw data into features for modeling.
│   ├── feature_extraction_v7.py       <- Scripts to turn raw data into features for modeling.
│   ├── feature_extraction_v8.py       <- Scripts to turn raw data into features for modeling.
│   │
│   └── xgb_v18a.py   <- Script to train the best single model and make predictions.
```

# Generate best single model

## Shortcut
Run the following cell to execute all of the source code in one go: feature engineering, model training, and submission generation. To run the steps one by one instead, proceed to the next section, 'Extracting Features'.
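
The shortcut cell itself is not reproduced in this post. As a minimal sketch, assuming the shortcut simply chains the scripts in the order used in the step-by-step sections (the three feature extraction scripts, then `xgb_v18a.py`) from inside `./src/`, a runner could look like this. The names `PIPELINE` and `run_pipeline` are hypothetical and not part of the repository.

```python
from __future__ import print_function
import subprocess
import sys

# Order taken from the step-by-step sections of this post (assumption:
# the shortcut runs the same scripts in the same order).
PIPELINE = [
    "feature_extraction_v5.py",
    "feature_extraction_v7.py",
    "feature_extraction_v8.py",
    "xgb_v18a.py",
]


def run_pipeline(scripts=PIPELINE, cwd="./src", runner=subprocess.check_call):
    """Run each script in order; stop at the first one that fails."""
    for script in scripts:
        print("Running", script)
        # check_call raises CalledProcessError on a non-zero exit code,
        # so a failed step aborts the rest of the pipeline.
        runner([sys.executable, script], cwd=cwd)
```

Calling `run_pipeline()` from the project root would then run all four scripts. The `runner` parameter exists only so the sequencing can be exercised without launching the real scripts.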

## Extracting Features

This cell runs the three feature extraction scripts in turn.

In [5]:
%%bash
cd ./src/
python feature_extraction_v5.py    
python feature_extraction_v7.py
python feature_extraction_v8.py    

Extracting features from training data.

Starting feature extraction for train/00010
50    100   150   200   250   300   350   400   450   500  
550   600   650   700   750   800   850   900   950   1000 
1050  1100  1150  1200  1250  1300  1350  1400  1450  1500 
1550  1600  1650  1700  1750 
Finished feature extraction for train/00010

Starting feature extraction for train/00009
50    100   150   200   250   300   350   400   450   500  
550   600   650   700   750   800   850   900   950   1000 
1050  1100  1150  1200  1250  1300  1350  1400  1450  1500 
1550  1600  1650  1700  1750  1800 
Finished feature extraction for train/00009

Starting feature extraction for train/00002
50    100   150   200   250   300   350   400   450   500  
550   600   650   700   750   800   850   900   950   1000 
1050  1100  1150  1200  1250  1300  1350  1400  1450  1500 
1550  1600  1650 
Finished feature extraction for train/00002

Starting feature extraction for train/00008
50    100   150   200   

## Verifying Feature Set

In [6]:
%%bash
ls ./input/public_data/train/00001

acceleration.csv
annotations_0.csv
annotations_1.csv
annotations_2.csv
columns.csv
columns_v11.csv
columns_v14.csv
columns_v16.csv
columns_v17.csv
columns_v3.csv
columns_v4.csv
columns_v5.csv
columns_v6.csv
columns_v7.csv
columns_v8.csv
columns_v9.csv
location_0.csv
location_1.csv
location_2.csv
meta.json
pir.csv
targets.csv
video_hallway.csv
video_kitchen.csv
video_living_room.csv
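
The same check can be scripted across every sequence directory instead of inspecting one by hand. The sketch below assumes that each `feature_extraction_vN.py` script writes a `columns_vN.csv` file into each sequence directory, which is what the listing suggests but is not stated explicitly. `missing_feature_files` is a hypothetical helper, not part of the repository.

```python
import os

# Assumption: feature_extraction_v5/v7/v8.py produce columns_v5/v7/v8.csv.
FEATURE_VERSIONS = ("v5", "v7", "v8")


def missing_feature_files(split_dir, versions=FEATURE_VERSIONS):
    """Return {sequence_id: [missing file names]} for one split directory."""
    missing = {}
    for seq in sorted(os.listdir(split_dir)):
        seq_dir = os.path.join(split_dir, seq)
        if not os.path.isdir(seq_dir):
            continue  # skip stray files next to the sequence directories
        absent = [
            "columns_%s.csv" % v
            for v in versions
            if not os.path.isfile(os.path.join(seq_dir, "columns_%s.csv" % v))
        ]
        if absent:
            missing[seq] = absent
    return missing
```

For example, `missing_feature_files("./input/public_data/train")` returns an empty dictionary when every training sequence has all three feature files.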


## Training and predicting

1. We run 10-fold cross-validation and use the fold models to predict every training instance, each one by a model that did not see it during training.
2. We then train a model on the full training data and use it to predict every test instance.
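
The two steps above can be sketched as follows. This is an illustration of the out-of-fold scheme only, not the code in `xgb_v18a.py`: `PriorModel` is a deliberately trivial stand-in for the real classifier (apparently XGBoost, judging by the script name), the labels are assumed to be integer class ids, and all function names are hypothetical.

```python
import random
from collections import Counter


def kfold_indices(n_samples, n_folds=10, seed=0):
    """Shuffle indices once and split them into n_folds disjoint folds."""
    idx = list(range(n_samples))
    random.Random(seed).shuffle(idx)
    return [idx[i::n_folds] for i in range(n_folds)]


class PriorModel(object):
    """Stand-in classifier: always predicts the training class frequencies."""

    def fit(self, X, y, n_classes):
        counts = Counter(y)
        total = float(len(y))
        self.proba_ = [counts.get(c, 0) / total for c in range(n_classes)]
        return self

    def predict_proba(self, X):
        return [list(self.proba_) for _ in X]


def fit_predict(train_X, train_y, test_X, n_classes, n_folds=10):
    # Step 1: out-of-fold predictions, one row per training instance.
    oof = [None] * len(train_X)
    for fold in kfold_indices(len(train_X), n_folds):
        held_out = set(fold)
        fit_idx = [i for i in range(len(train_X)) if i not in held_out]
        model = PriorModel().fit([train_X[i] for i in fit_idx],
                                 [train_y[i] for i in fit_idx], n_classes)
        preds = model.predict_proba([train_X[i] for i in fold])
        for i, p in zip(fold, preds):
            oof[i] = p
    # Step 2: refit on all training data and predict the test set.
    final = PriorModel().fit(train_X, train_y, n_classes)
    return oof, final.predict_proba(test_X)
```

Because every training row is predicted by a model that never saw it, the out-of-fold predictions can be scored or reused as inputs to a later ensemble without leaking labels.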



In [7]:
%%bash
cd ./src/
python xgb_v18a.py  # uses features v5, v7, v8

training time=  52.8491611481
testing time =  9.2025270462
training time=  81.6508729458
testing time =  16.639248848
[0]	train-mlogloss:2.95193	val-mlogloss:2.96467
training time=  11.4690439701
testing time =  0.135967969894
[0]	train-mlogloss:2.95209	val-mlogloss:2.95961
training time=  11.3325390816
testing time =  0.128569126129
[0]	train-mlogloss:2.95227	val-mlogloss:2.96131
training time=  11.4145541191
testing time =  0.128687858582
[0]	train-mlogloss:2.95184	val-mlogloss:2.95841
training time=  11.365776062
testing time =  0.134619951248
[0]	train-mlogloss:2.95246	val-mlogloss:2.96212
training time=  11.4367358685
testing time =  0.131473064423
[0]	train-mlogloss:2.95236	val-mlogloss:2.96458
training time=  12.1367428303
testing time =  0.138630151749
[0]	train-mlogloss:2.95271	val-mlogloss:2.95857
training time=  11.9255521297
testing time =  0.134116172791
[0]	train-mlogloss:2.95142	val-mlogloss:2.96067
training time=  11.8417429924
testing time =  0.135874032974
[0]	train-m

2016-09-27 13:53:35,742   INFO   Loading data - 2016-09-27 13:53:35.742242
2016-09-27 13:53:37,442   INFO   Training data shapes:
2016-09-27 13:53:37,442   INFO   train_x.shape: (16124, 861)
2016-09-27 13:53:37,442   INFO   train_y.shape: (16124, 20)
2016-09-27 13:53:37,731   INFO   Pre processing data - 2016-09-27 13:53:37.731456
2016-09-27 13:53:37,731   INFO   Building model - 2016-09-27 13:53:37.731633
2016-09-27 13:53:37,875   INFO   Cross validation... 
Traceback (most recent call last):
  File "/home/quy/anaconda/lib/python2.7/logging/__init__.py", line 861, in emit
    msg = self.format(record)
  File "/home/quy/anaconda/lib/python2.7/logging/__init__.py", line 734, in format
    return fmt.format(record)
  File "/home/quy/anaconda/lib/python2.7/logging/__init__.py", line 465, in format
    record.message = record.getMessage()
  File "/home/quy/anaconda/lib/python2.7/logging/__init__.py", line 329, in getMessage
    msg = msg % self.args
TypeError: not all arguments converted d
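
The traceback above ends inside the `logging` module's `getMessage`, at `msg = msg % self.args`. That line fails when a logging call receives positional arguments for which the message string has no `%` placeholders. The output does not show which call in `xgb_v18a.py` triggers it, so the messages below are hypothetical; the sketch only reproduces the failure mode and a common fix.

```python
from __future__ import print_function
import logging


class ListHandler(logging.Handler):
    """Collects formatted messages so the result is easy to inspect."""

    def __init__(self):
        logging.Handler.__init__(self)
        self.messages = []

    def emit(self, record):
        self.messages.append(record.getMessage())


handler = ListHandler()
logger = logging.getLogger("sphere_demo")  # hypothetical logger name
logger.setLevel(logging.INFO)
logger.propagate = False
logger.addHandler(handler)

n_folds = 10

# What logging effectively does for logger.info("Cross validation...", n_folds):
# the message has no placeholder for the extra argument.
format_error = None
try:
    "Cross validation..." % (n_folds,)
except TypeError as err:
    format_error = str(err)
print("formatting failed:", format_error)

# Fix: give every argument a placeholder and let logging do the formatting.
logger.info("Cross validation with %d folds...", n_folds)
print(handler.messages)
```

The standard stream handlers catch this exception and print the traceback to stderr rather than re-raising it, so the script keeps running and only the affected log message is lost.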

## Verifying Models

In [16]:
%%bash
ls ./models

knn_v5.val.txt
nn_v20.val.txt
rf_v13.val.txt
sgd_esb_v20.val.txt
xgb_v10.val.txt
xgb_v14.val.txt
xgb_v16.val.txt
xgb_v19.val.txt
xgb_v21.val.txt
xgb_v24.val.txt
xgb_v5.val.txt


## Verifying Final Submission

In [14]:
%%bash
ls ./sub

ens13.csv
knn_v5.tst.csv
nn_esb_v20.tst.csv
nn_v20.tst.csv
rf_v13.tst.csv
sgd_esb_v20.tst.csv
xgb_esb_v25.tst.csv
xgb_esb_v3.tst.csv
xgb_esb_v4.tst.csv
xgb_esb_v5.tst.csv
xgb_esb_v7.tst.csv
xgb_v10.tst.csv
xgb_v11.tst.csv
xgb_v12.tst.csv
xgb_v14.tst.csv
xgb_v14.valtst.csv
xgb_v15.tst.csv
xgb_v15.valtst.csv
xgb_v16.tst.csv
xgb_v16.valtst.csv
xgb_v18.tst.csv
xgb_v18.valtst.csv
xgb_v19.tst.csv
xgb_v19.valtst.csv
xgb_v20.tst.csv
xgb_v21.tst.csv
xgb_v21.valtst.csv
xgb_v22.tst.csv
xgb_v22.valtst.csv
xgb_v24.tst.csv
xgb_v25.tst.csv
xgb_v5.tst.csv
xgb_v8.tst.csv
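
Beyond checking that the files exist, it can help to confirm that a submission has the same layout as `sample_submission.csv` before uploading. The sketch below compares only the header and the row count; it makes no assumption about what the columns mean. `check_submission` is a hypothetical helper, and which file in `./sub` corresponds to `xgb_v18a.py` is not stated in this post.

```python
import csv


def read_header_and_count(path):
    """Return (header, number of data rows) for a CSV file."""
    with open(path) as f:
        reader = csv.reader(f)
        header = next(reader)
        n_rows = sum(1 for _ in reader)
    return header, n_rows


def check_submission(submission_path, sample_path):
    """Compare a submission with the sample file; return a list of problems."""
    sub_header, sub_rows = read_header_and_count(submission_path)
    ref_header, ref_rows = read_header_and_count(sample_path)
    problems = []
    if sub_header != ref_header:
        problems.append("header differs from sample submission")
    if sub_rows != ref_rows:
        problems.append("expected %d rows, found %d" % (ref_rows, sub_rows))
    return problems
```

For example, `check_submission("./sub/xgb_v18.tst.csv", "./input/public_data/sample_submission.csv")` returns an empty list when the two files agree on header and row count.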
