# The SPHERE Challenge: Activity Recognition with Multimodal Sensor Data



We are very excited to share the approach that won third prize in the SPHERE Challenge, which was organized by [DrivenData](http://www.drivendata.org), the [ECML-PKDD](http://www.ecmlpkdd2016.org) conference, and the [AARP](http://www.aarp.org/aarp-foundation/) Foundation.

In this post, we walk you through the steps to work with the [provided data](https://www.drivendata.org/competitions/42/data/) and generate our best single model.

There are three main sections in this post:

1. Requirements
2. Directory structure
3. Generating the best single model




# Requirements
- Python 2.7
- Python libraries: the versions we used to produce our results are listed in `requirements.txt`.
- System configuration:
    + Ubuntu 14.04
    + Memory: 16 GB
 

# Directory structure

```
├── README.md         
├── input
│   ├── public_data    <- The original data.
│   │   ├── accelerometer_axes.json
│   │   ├── access_point_names.json
│   │   ├── annotations.json
│   │   ├── pir_locations.json
│   │   ├── rooms.json
│   │   ├── sample_submission.csv
│   │   ├── train
│   │   │   ├── 00001
│   │   │   ├── ...
│   │   │   └── 00010
│   │   ├── test
│   │   │   ├── 00011
│   │   │   ├── ...
│   │   │   └── 00882
│   │   ├── video_feature_names.json
│   │   └── video_locations.json
│
├── sub               <- Submissions - predictions on test.
│
├── models             <- Trained models, predictions on train.
│
│
├── requirements.txt   <- The requirements file for reproducing.
│
├── src                <- Source code for use in this project.
│   │
│   ├── visualise_data.py <- Script to read the raw data as sequences.
│   │
│   ├── feature_extraction_v5.py       <- Script to turn raw data into features for modeling.
│   ├── feature_extraction_v7.py       <- Script to turn raw data into features for modeling.
│   ├── feature_extraction_v8.py       <- Script to turn raw data into features for modeling.
│   │
│   └── xgb_v18a.py <- Script to train the best single model and make predictions.
```
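Because the scripts read from `input/public_data` and write to `models/` and `sub/`, a quick layout check before a long run can catch a missing folder early. A minimal sketch, assuming the scripts do not create `models/` and `sub/` themselves (we have not verified this) and that the helper runs under Python 3 (`exist_ok` is not available in Python 2.7). The name `prepare_layout` is hypothetical.

```python
import os

# Paths taken from the directory tree above, relative to the project root.
REQUIRED_INPUTS = [
    os.path.join("input", "public_data", "train"),
    os.path.join("input", "public_data", "test"),
    os.path.join("input", "public_data", "sample_submission.csv"),
]
OUTPUT_DIRS = ["models", "sub"]


def prepare_layout(root="."):
    """Create the output directories and return the required inputs that are missing."""
    for name in OUTPUT_DIRS:
        # exist_ok=True makes this a no-op when the directory already exists
        os.makedirs(os.path.join(root, name), exist_ok=True)
    return [p for p in REQUIRED_INPUTS if not os.path.exists(os.path.join(root, p))]
```

Calling `prepare_layout()` from the project root should return an empty list once the competition data has been unpacked into `input/public_data`.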

# Generating the best single model

## Shortcut
Run the following cell to execute all of the source code in one go: feature engineering, model training, and submission generation. To run the steps one at a time, proceed to the next section, 'Extracting Features'.
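A minimal Python sketch of such a one-shot run is shown below. It assumes the sketch is launched from the project root, that the runner itself uses Python 3.5+ (for `subprocess.run`), and that the `python` executable it calls is the Python 2.7 interpreter listed under Requirements. The helper name `run_pipeline` is ours, not part of the project.

```python
import subprocess

# Scripts named in this post, in the order the walkthrough runs them.
PIPELINE = [
    "feature_extraction_v5.py",
    "feature_extraction_v7.py",
    "feature_extraction_v8.py",
    "xgb_v18a.py",
]


def run_pipeline(src_dir="./src", python="python", dry_run=False):
    """Run each script from inside src_dir, stopping at the first failure.

    Returns the list of commands, so a dry run can be inspected without executing anything.
    """
    commands = [[python, script] for script in PIPELINE]
    for cmd in commands:
        if dry_run:
            print("would run:", " ".join(cmd), "in", src_dir)
            continue
        # check=True raises CalledProcessError if a script exits with a non-zero status
        subprocess.run(cmd, cwd=src_dir, check=True)
    return commands
```

`run_pipeline(dry_run=True)` only prints the commands. `run_pipeline(python="python2.7")` points the run at a specific interpreter. Stopping on the first failure avoids training on a partially extracted feature set.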

## Extracting Features

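This cell runs the three feature-extraction scripts (`feature_extraction_v5.py`, `feature_extraction_v7.py` and `feature_extraction_v8.py`), which turn the raw sensor data into features for modeling.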

```
%%bash
cd ./src/
python feature_extraction_v5.py
python feature_extraction_v7.py
python feature_extraction_v8.py
```

```
Extracting features from training data.

Starting feature extraction for train/00010
50    100   150   200   250   300   350   400   450   500
550   600   650   700   750   800   850   900   950   1000
1050  1100  1150  1200  1250  1300  1350  1400  1450  1500
1550  1600  1650  1700  1750
Finished feature extraction for train/00010

Starting feature extraction for train/00009
50    100   150   200   250   300   350   400   450   500
550   600   650   700   750   800   850   900   950   1000
1050  1100  1150  1200  1250  1300  1350  1400  1450  1500
1550  1600  1650  1700  1750  1800
Finished feature extraction for train/00009

Starting feature extraction for train/00002
50    100   150   200   250   300   350   400   450   500
550   600   650   700   750   800   850   900   950   1000
1050  1100  1150  1200  1250  1300  1350  1400  1450  1500
1550  1600  1650
Finished feature extraction for train/00002

Starting feature extraction for train/00008
50    100   150   200
```
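The feature-extraction scripts above turn each raw sensor stream into a table of features. That table presumably has one row per labelled time window, since the training log further down reports the same number of rows for `train_x` and `train_y`. The sketch below illustrates the idea for the accelerometer only. It computes the mean, standard deviation, minimum and maximum of each axis over fixed windows. It assumes timestamps in seconds, an `(n, 3)` array of x, y, z readings and one-second windows. The window width, the statistics and the name `window_stats` are illustrative choices, not what `feature_extraction_v5.py`, `v7` or `v8` actually compute.

```python
import numpy as np


def window_stats(t, xyz, starts, width=1.0):
    """Per-window mean, std, min and max of each accelerometer axis.

    t      -- (n,) sample timestamps in seconds
    xyz    -- (n, 3) accelerometer readings (x, y, z)
    starts -- (w,) window start times; window i covers [starts[i], starts[i] + width)
    Returns a (w, 12) array; windows that contain no samples are left as NaN.
    """
    t = np.asarray(t, dtype=float)
    xyz = np.asarray(xyz, dtype=float)
    feats = np.full((len(starts), 12), np.nan)
    for i, start in enumerate(starts):
        mask = (t >= start) & (t < start + width)
        if mask.any():
            w = xyz[mask]
            feats[i] = np.concatenate([w.mean(axis=0), w.std(axis=0),
                                       w.min(axis=0), w.max(axis=0)])
    return feats
```

Empty windows are left as NaN rather than zero. Gradient-boosted trees such as XGBoost can route missing values, whereas a zero would look like a real reading.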

## Verifying Feature Set

```
%%bash
ls ./input/public_data/train/00001
```

```
acceleration.csv
annotations_0.csv
annotations_1.csv
annotations_2.csv
columns.csv
location_0.csv
location_1.csv
location_2.csv
meta.json
pir.csv
targets.csv
video_hallway.csv
video_kitchen.csv
video_living_room.csv
```
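Listing a single directory only checks one sequence. A minimal sketch that checks every training sequence in one pass, assuming each one should contain the same files as `train/00001` shown above. The helper `find_incomplete_sequences` is hypothetical.

```python
import os

# The files listed above for train/00001; we assume every training
# sequence directory should contain the same set.
EXPECTED_FILES = [
    "acceleration.csv", "annotations_0.csv", "annotations_1.csv",
    "annotations_2.csv", "columns.csv", "location_0.csv", "location_1.csv",
    "location_2.csv", "meta.json", "pir.csv", "targets.csv",
    "video_hallway.csv", "video_kitchen.csv", "video_living_room.csv",
]


def find_incomplete_sequences(train_dir, expected=EXPECTED_FILES):
    """Map each sequence directory under train_dir to the expected files it lacks."""
    problems = {}
    for seq in sorted(os.listdir(train_dir)):
        seq_path = os.path.join(train_dir, seq)
        if not os.path.isdir(seq_path):
            continue  # skip stray files next to the sequence folders
        present = set(os.listdir(seq_path))
        missing = [name for name in expected if name not in present]
        if missing:
            problems[seq] = missing
    return problems
```

`find_incomplete_sequences("./input/public_data/train")` returns an empty dict when every sequence has all of the listed files.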


## Training and predicting

1. We run 10-fold cross-validation and use the fold models to produce out-of-fold predictions for every training instance.
2. We train a model on the full training data and use it to predict all test instances.
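The two steps above can be sketched with plain NumPy. The sketch shuffles the training rows into 10 folds and fills an out-of-fold matrix with predictions from models that never saw those rows. It then refits on all training data to predict the test set. The model is passed in as `fit`/`predict` callables. The ridge-regression stand-in (`fit_ridge`, `predict_ridge`) is our placeholder for illustration, not the XGBoost model in `xgb_v18a.py`. The random row-level fold assignment is also an assumption; the post does not say how its folds were drawn.

```python
import numpy as np


def oof_and_test_predictions(X, Y, X_test, fit, predict, n_folds=10, seed=0):
    """Out-of-fold predictions for the training rows plus full-data test predictions.

    fit(X, Y) -> model and predict(model, X) -> (n, k) array are supplied by the caller.
    """
    n = X.shape[0]
    rng = np.random.RandomState(seed)
    folds = np.array_split(rng.permutation(n), n_folds)
    oof = np.zeros((n, Y.shape[1]))
    for val_idx in folds:
        train_idx = np.setdiff1d(np.arange(n), val_idx)
        model = fit(X[train_idx], Y[train_idx])
        # each training row is predicted exactly once, by a model that never saw it
        oof[val_idx] = predict(model, X[val_idx])
    # step 2: refit on all training data and predict the test set
    full_model = fit(X, Y)
    return oof, predict(full_model, X_test)


def fit_ridge(X, Y, alpha=1.0):
    # hypothetical stand-in model for illustration only
    d = X.shape[1]
    return np.linalg.solve(X.T.dot(X) + alpha * np.eye(d), X.T.dot(Y))


def predict_ridge(W, X):
    return X.dot(W)
```

The out-of-fold matrix can be scored against the training targets, or used as input to a second-level model, without any training row being predicted by a model that saw it.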



```
%%bash
cd ./src/
python xgb_v18a.py  # features v5, v7, v8
```

```
[0]	train-mlogloss:2.9492	val-mlogloss:2.94535
training time=  32.0644490719
testing time =  0.204326868057
[0]	train-mlogloss:2.947	val-mlogloss:2.96205
training time=  42.9950039387
testing time =  0.216506004333
[0]	train-mlogloss:2.94737	val-mlogloss:2.95932
training time=  43.2643151283
testing time =  0.265043973923
[0]	train-mlogloss:2.9483	val-mlogloss:2.95202
training time=  41.272758007
testing time =  0.246116876602
[0]	train-mlogloss:2.94823	val-mlogloss:2.94999
training time=  41.8277161121
testing time =  0.217398881912
[0]	train-mlogloss:2.94674	val-mlogloss:2.95841
training time=  41.4844288826
testing time =  0.239130020142
[0]	train-mlogloss:2.94827	val-mlogloss:2.94881
training time=  41.1928610802
testing time =  0.208256006241
[0]	train-mlogloss:2.94687	val-mlogloss:2.95997
training time=  41.6011600494
testing time =  0.219200849533
[0]	train-mlogloss:2.94761	val-mlogloss:2.95744
training time=  38.0068900585
testing time =  0.126241922379
[0]	train-mlogloss:2.947

2017-02-26 16:22:12,425   INFO   Loading data - 2017-02-26 16:22:12.425845
2017-02-26 16:22:16,135   INFO   Training data shapes:
2017-02-26 16:22:16,135   INFO   train_x.shape: (16124, 1402)
2017-02-26 16:22:16,135   INFO   train_y.shape: (16124, 20)
2017-02-26 16:22:17,374   INFO   Building model - 2017-02-26 16:22:17.374595
2017-02-26 16:22:17,374   INFO   Modelling with ntrees: 1
2017-02-26 16:22:17,374   INFO   Modelling with 5608 features ...
2017-02-26 16:22:17,374   INFO   Cross validation... 
2017-02-26 16:22:49,944   INFO   fold #0: 0.262713
2017-02-26 16:23:34,226   INFO   fold #1: 0.293333
2017-02-26 16:24:18,746   INFO   fold #2: 0.303516
2017-02-26 16:25:01,096   INFO   fold #3: 0.300505
2017-02-26 16:25:43,958   INFO   fold #4: 0.294613
2017-02-26 16:26:26,484   INFO   fold #5: 0.298577
2017-02-26 16:27:08,731   INFO   fold #6: 0.325428
2017-02-26 16:27:51,426   INFO   fold #7: 0.317490
2017-02-26 16:28:30,440   INFO   fold #8: 0.317672
2017-02-26 16:29:05,541   INFO   f
```
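The per-fold validation scores in the log can be summarised as a mean and spread once a run completes. A small sketch, assuming the log lines keep the `fold #<i>: <score>` format shown above. The helper `summarise_folds` is hypothetical, and since the log does not name the metric, the sketch only reports the mean and standard deviation.

```python
import re
import statistics

# Matches lines such as "... INFO   fold #3: 0.300505" from the training log.
FOLD_RE = re.compile(r"fold #(\d+): ([0-9.]+)")


def summarise_folds(log_text):
    """Return (per-fold scores, mean, sample standard deviation) parsed from the log.

    statistics.stdev needs at least two folds in the text.
    """
    scores = [float(m.group(2)) for m in FOLD_RE.finditer(log_text)]
    return scores, statistics.mean(scores), statistics.stdev(scores)
```

A large spread across folds would suggest that the validation score depends strongly on which rows land in each fold.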

## Verifying Models

```
%%bash
ls ./models
```

```
xgb_v18.val.txt
```


## Verifying Final Submission

```
%%bash
ls ./sub
```

```
xgb_v18.tst.csv
xgb_v18.valtst.csv
```
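Before uploading, it is worth confirming that a file in `sub/` lines up with `input/public_data/sample_submission.csv`. A minimal sketch using only the standard `csv` module (Python 3). It assumes a valid submission has exactly the same header and number of rows as the sample file. The helper name `check_submission` is ours.

```python
import csv


def check_submission(sub_path, sample_path):
    """Compare a submission's header and row count with the sample submission."""
    with open(sample_path, newline="") as f:
        sample = list(csv.reader(f))
    with open(sub_path, newline="") as f:
        sub = list(csv.reader(f))
    if not sub:
        return ["submission file is empty"]
    problems = []
    if sub[0] != sample[0]:
        problems.append("header differs from the sample submission")
    if len(sub) != len(sample):
        problems.append("row count %d != expected %d" % (len(sub), len(sample)))
    return problems
```

For example, `check_submission("./sub/xgb_v18.tst.csv", "./input/public_data/sample_submission.csv")` returns an empty list when the header and row count match.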
