## The Competition

Robots are smart… by design. To fully understand and properly navigate a task, however, they need input about their environment.

In this competition, you’ll help robots recognize the floor surface they’re standing on using data collected from Inertial Measurement Units (IMU sensors).

We’ve collected IMU sensor data while driving a small mobile robot over different floor surfaces on the university premises. The task is to predict which one of the nine floor types (such as carpet, tiles, or concrete) the robot is on, using sensor data such as acceleration and velocity. Succeed and you'll help improve the navigation of robots without assistance across many different surfaces, so they won’t fall down on the job.

### Libraries

In [1]:
import pandas as pd

### Load & check out the data

In [2]:
# Load sensor data
X_train = pd.read_csv('data/X_train.csv')

In [15]:
X_train.head(135)

| | row_id | series_id | measurement_number | orientation_X | orientation_Y | orientation_Z | orientation_W | angular_velocity_X | angular_velocity_Y | angular_velocity_Z | linear_acceleration_X | linear_acceleration_Y | linear_acceleration_Z |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0_0 | 0 | 0 | -0.75853 | -0.63435 | -0.104880 | -0.10597 | 0.107650 | 0.017561 | 0.000767 | -0.748570 | 2.103000 | -9.7532 |
| 1 | 0_1 | 0 | 1 | -0.75853 | -0.63434 | -0.104900 | -0.10600 | 0.067851 | 0.029939 | 0.003385 | 0.339950 | 1.506400 | -9.4128 |
| 2 | 0_2 | 0 | 2 | -0.75853 | -0.63435 | -0.104920 | -0.10597 | 0.007275 | 0.028934 | -0.005978 | -0.264290 | 1.592200 | -8.7267 |
| 3 | 0_3 | 0 | 3 | -0.75852 | -0.63436 | -0.104950 | -0.10597 | -0.013053 | 0.019448 | -0.008974 | 0.426840 | 1.099300 | -10.0960 |
| 4 | 0_4 | 0 | 4 | -0.75852 | -0.63435 | -0.104950 | -0.10596 | 0.005135 | 0.007652 | 0.005245 | -0.509690 | 1.468900 | -10.4410 |
| 5 | 0_5 | 0 | 5 | -0.75853 | -0.63439 | -0.104830 | -0.10580 | 0.059664 | 0.013043 | -0.013231 | -0.447450 | 0.992810 | -10.4020 |
| 6 | 0_6 | 0 | 6 | -0.75853 | -0.63441 | -0.104810 | -0.10569 | 0.082140 | 0.044356 | -0.002696 | -0.141630 | 0.734970 | -9.4296 |
| 7 | 0_7 | 0 | 7 | -0.75852 | -0.63444 | -0.104800 | -0.10561 | 0.056218 | 0.038162 | -0.022931 | -0.121600 | 0.075417 | -8.6088 |
| 8 | 0_8 | 0 | 8 | -0.75851 | -0.63445 | -0.104850 | -0.10559 | -0.012846 | 0.039004 | -0.007831 | 1.600000 | 0.816110 | -7.6426 |
| 9 | 0_9 | 0 | 9 | -0.75851 | -0.63443 | -0.104890 | -0.10567 | -0.090082 | 0.027299 | -0.009970 | 0.474960 | 0.909600 | -8.8120 |


In [12]:
X_train.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 487680 entries, 0 to 487679
Data columns (total 13 columns):
row_id                   487680 non-null object
series_id                487680 non-null int64
measurement_number       487680 non-null int64
orientation_X            487680 non-null float64
orientation_Y            487680 non-null float64
orientation_Z            487680 non-null float64
orientation_W            487680 non-null float64
angular_velocity_X       487680 non-null float64
angular_velocity_Y       487680 non-null float64
angular_velocity_Z       487680 non-null float64
linear_acceleration_X    487680 non-null float64
linear_acceleration_Y    487680 non-null float64
linear_acceleration_Z    487680 non-null float64
dtypes: float64(10), int64(2), object(1)
memory usage: 48.4+ MB


**X_[train/test].csv** - the input data, covering 10 sensor channels and 128 measurements per time series plus three ID columns:

* row_id: The ID for this row.
* series_id: ID number for the measurement series. Foreign key to y_train/sample_submission.
* measurement_number: Measurement number within the series.

The **10 sensor channels** are:

The **orientation** channels encode the robot's current orientation as a quaternion:
* orientation_X
* orientation_Y
* orientation_Z
* orientation_W

**Angular velocity** channels describe how fast the robot rotates about each axis:
* angular_velocity_X
* angular_velocity_Y
* angular_velocity_Z

**Linear acceleration** channels describe how the robot's velocity changes over time along each axis:
* linear_acceleration_X
* linear_acceleration_Y
* linear_acceleration_Z
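Quaternions are hard to read by eye, so it can help to convert the four orientation channels into roll, pitch and yaw. The following is only a sketch: the helper name `quaternion_to_euler` is made up for this notebook, and it assumes the columns are a unit quaternion in (X, Y, Z, W) order and that the common ZYX (yaw-pitch-roll) convention is acceptable. Neither assumption is confirmed by the competition description.

```python
import numpy as np

def quaternion_to_euler(x, y, z, w):
    """Convert unit quaternion(s) to (roll, pitch, yaw) in radians.

    Assumes the ZYX (yaw-pitch-roll) convention; works on scalars or arrays.
    """
    x, y, z, w = (np.asarray(v, dtype=float) for v in (x, y, z, w))
    roll = np.arctan2(2.0 * (w * x + y * z), 1.0 - 2.0 * (x * x + y * y))
    # Clip to guard against values slightly outside [-1, 1] from rounding.
    sin_pitch = np.clip(2.0 * (w * y - z * x), -1.0, 1.0)
    pitch = np.arcsin(sin_pitch)
    yaw = np.arctan2(2.0 * (w * z + x * y), 1.0 - 2.0 * (y * y + z * z))
    return roll, pitch, yaw

# First row of X_train as shown in the head() output above.
qx, qy, qz, qw = -0.75853, -0.63435, -0.10488, -0.10597

# A valid orientation quaternion should have a norm very close to 1.
norm = np.sqrt(qx**2 + qy**2 + qz**2 + qw**2)
print("norm:", norm)
print("roll, pitch, yaw (rad):", quaternion_to_euler(qx, qy, qz, qw))
```

On the real data the same function could be applied column-wise, for example `quaternion_to_euler(X_train.orientation_X, X_train.orientation_Y, X_train.orientation_Z, X_train.orientation_W)`.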

**_Note_**  
In the context of relational databases, a **foreign key** is a field (or collection of fields) in one table that uniquely identifies a row of another table or the same table.  
Source [Wikipedia - Foreign key](https://en.wikipedia.org/wiki/Foreign_key)
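In pandas, a foreign key is the column a join is performed on. As a rough sketch of how `series_id` links measurement rows to a label table, the example below uses two tiny stand-in frames (`X_demo` and `y_demo`, with made-up values that only mimic the column layout) rather than the real files.

```python
import pandas as pd

# Tiny stand-in frames (made-up values) that mimic the real column layout.
X_demo = pd.DataFrame({
    "row_id": ["0_0", "0_1", "1_0", "1_1"],
    "series_id": [0, 0, 1, 1],
    "measurement_number": [0, 1, 0, 1],
    "angular_velocity_X": [0.10, 0.07, -0.01, 0.02],
})
y_demo = pd.DataFrame({
    "series_id": [0, 1],
    "surface": ["fine_concrete", "concrete"],
})

# Many measurement rows point at one label row, hence "many_to_one".
# validate= makes pandas raise if series_id is not unique in y_demo.
labelled = X_demo.merge(y_demo, on="series_id", how="left", validate="many_to_one")
print(labelled[["row_id", "series_id", "surface"]])
```

The `validate` argument is optional, but it is a cheap way to confirm that the key really is unique on the label side.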

In [6]:
# Load label data
y_train = pd.read_csv('data/y_train.csv')

In [7]:
y_train.head()

| | series_id | group_id | surface |
|---|---|---|---|
| 0 | 0 | 13 | fine_concrete |
| 1 | 1 | 31 | concrete |
| 2 | 2 | 20 | concrete |
| 3 | 3 | 31 | concrete |
| 4 | 4 | 22 | soft_tiles |


In [13]:
y_train.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 3810 entries, 0 to 3809
Data columns (total 3 columns):
series_id    3810 non-null int64
group_id     3810 non-null int64
surface      3810 non-null object
dtypes: int64(2), object(1)
memory usage: 89.4+ KB


**y_train.csv** - the surface labels for the training set.

* series_id: ID number for the measurement series.
* group_id: ID number for all of the measurements taken in a recording session. Provided for the training set only, to enable more cross-validation strategies.
* surface: the target for this competition.
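One way `group_id` can be used for validation is to keep every series from a recording session on the same side of the split, so that a model is never validated on a session it has already seen. The sketch below is a simple group hold-out, not a full k-fold scheme; the helper `group_holdout_split` and its parameters are hypothetical names introduced here. The demo frame reuses the five label rows shown above. scikit-learn's `GroupKFold` provides a k-fold variant of the same idea.

```python
import numpy as np
import pandas as pd

def group_holdout_split(labels, group_col="group_id", valid_fraction=0.25, seed=0):
    """Split label rows so that no group appears in both parts."""
    groups = labels[group_col].unique()
    rng = np.random.default_rng(seed)
    shuffled = rng.permutation(groups)
    # Hold out at least one whole group for validation.
    n_valid = max(1, int(round(len(shuffled) * valid_fraction)))
    valid_groups = list(shuffled[:n_valid])
    is_valid = labels[group_col].isin(valid_groups)
    return labels[~is_valid], labels[is_valid]

# The five rows from y_train.head() shown above.
y_demo = pd.DataFrame({
    "series_id": [0, 1, 2, 3, 4],
    "group_id": [13, 31, 20, 31, 22],
    "surface": ["fine_concrete", "concrete", "concrete", "concrete", "soft_tiles"],
})

train_part, valid_part = group_holdout_split(y_demo)
print("train groups:", sorted(train_part["group_id"].unique()))
print("valid groups:", sorted(valid_part["group_id"].unique()))
```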

### Understanding the Data

#### Simplified Problem Description

A scientist walks a robot every day for a certain period of time on different floor types.  
During each "walk", several kinds of sensor data are collected (10 sensor channels in total).  
Each "walk" is $n$ steps (time intervals $\Delta t$) long, and after each interval every channel records one datum.  
The scientist walks the robot for $d$ days and, after each day, annotates (labels) the collected sensor data set with the corresponding floor type.

So our scientist should have $n \times d$ data points in total.  

Each "walk" can be described as a Time Series.  
We know from the description of the dataset that

* $n = 128$ from **X_train**
* $d = 3810$ from **y_train**

So the number of total data points should be

$$
n \times d = 128 \times 3810 = 487{,}680
$$

which matches the number of rows in **X_train** (487,680 entries according to `X_train.info()`).
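The same bookkeeping can be checked in code instead of by hand. The sketch below uses a hypothetical helper, `check_series_lengths`, and a small synthetic frame (3 series of 4 measurements) so that it runs on its own. On the real data one would pass `X_train` and keep the default of 128.

```python
import pandas as pd

def check_series_lengths(X, n_expected=128):
    """Return (number of series, whether every series has n_expected rows)."""
    counts = X.groupby("series_id").size()
    return counts.size, bool((counts == n_expected).all())

# Synthetic stand-in: 3 series with 4 measurements each.
X_demo = pd.DataFrame({
    "series_id": [s for s in range(3) for _ in range(4)],
    "measurement_number": [m for _ in range(3) for m in range(4)],
})

n_series, all_full = check_series_lengths(X_demo, n_expected=4)
print("series:", n_series, "| all complete:", all_full)
print("rows match n * d:", n_series * 4 == len(X_demo))

# The arithmetic from the text above.
assert 128 * 3810 == 487680
```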

In [9]:
sample_submission = pd.read_csv('data/sample_submission.csv')

In [10]:
sample_submission.head()

| | series_id | surface |
|---|---|---|
| 0 | 0 | concrete |
| 1 | 1 | concrete |
| 2 | 2 | concrete |
| 3 | 3 | concrete |
| 4 | 4 | concrete |


In [11]:
sample_submission.tail()

| | series_id | surface |
|---|---|---|
| 3811 | 3811 | concrete |
| 3812 | 3812 | concrete |
| 3813 | 3813 | concrete |
| 3814 | 3814 | concrete |
| 3815 | 3815 | concrete |
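The sample submission fixes the expected output format: one row per test `series_id` with a predicted `surface`. As a minimal sketch of producing a file in that shape, the hypothetical helper `majority_baseline` below predicts the most frequent training surface for every series. It is meant only as a format check and a floor to beat, not as a proposed solution, and the demo labels are the five rows from `y_train.head()`.

```python
import pandas as pd

def majority_baseline(y_labels, test_series_ids):
    """Predict the most frequent training surface for every test series."""
    most_common = y_labels["surface"].value_counts().idxmax()
    return pd.DataFrame({
        "series_id": list(test_series_ids),
        "surface": most_common,  # scalar is broadcast to every row
    })

# The five label rows from y_train.head() shown earlier.
y_demo = pd.DataFrame({
    "series_id": [0, 1, 2, 3, 4],
    "surface": ["fine_concrete", "concrete", "concrete", "concrete", "soft_tiles"],
})

submission_demo = majority_baseline(y_demo, range(3))
print(submission_demo)
```

With the real data, `test_series_ids` could be taken from `sample_submission.series_id`, and the result written out with `to_csv(..., index=False)`.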


### Questions to answer

### EDA

### Solution Approach

* What kind of problem?
* Which solutions could be used?