In [1]:
%load_ext autoreload
%autoreload 2

%matplotlib inline

In [2]:
%%html
<style>
table {float:left}
</style>

In [3]:
import pandas as pd

# Training Data

In [4]:
train_data = pd.read_csv('data/car_breakdown_train.tsv', sep='\t', header=0)
train_data.head()

| | vehicleId | days | ecoMode | cityMode | sportMode | s1 | s2 | s3 | s4 | s5 | ... | s12 | s13 | s14 | s15 | s16 | s17 | s18 | s19 | s20 | s21 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | 1 | -0.0007 | -0.0004 | 100 | 518.67 | 641.82 | 1589.7 | 1400.6 | 14.62 | ... | 521.66 | 2388.02 | 8138.62 | 8.4195 | 0.03 | 392 | 2388 | 100 | 39.06 | 23.419 |
| 1 | 1 | 2 | 0.0019 | -0.0003 | 100 | 518.67 | 642.15 | 1591.82 | 1403.14 | 14.62 | ... | 522.28 | 2388.07 | 8131.49 | 8.4318 | 0.03 | 392 | 2388 | 100 | 39.0 | 23.4236 |
| 2 | 1 | 3 | -0.0043 | 0.0003 | 100 | 518.67 | 642.35 | 1587.99 | 1404.2 | 14.62 | ... | 522.42 | 2388.03 | 8133.23 | 8.4178 | 0.03 | 390 | 2388 | 100 | 38.95 | 23.3442 |
| 3 | 1 | 4 | 0.0007 | 0.0 | 100 | 518.67 | 642.35 | 1582.79 | 1401.87 | 14.62 | ... | 522.86 | 2388.08 | 8133.83 | 8.3682 | 0.03 | 392 | 2388 | 100 | 38.88 | 23.3739 |
| 4 | 1 | 5 | -0.0019 | -0.0002 | 100 | 518.67 | 642.37 | 1582.85 | 1406.22 | 14.62 | ... | 522.19 | 2388.04 | 8133.8 | 8.4294 | 0.03 | 393 | 2388 | 100 | 38.9 | 23.4044 |


## How the training data is arranged.

|**Field**|**Description**|
|:---------|:---------------|
|**vehicleId**|unique id of the vehicle in the fleet|
|**days**|day number in the vehicle's time series (starts at 1)|
|**ecoMode**|eco mode knob setting used for the day|
|**cityMode**|city mode knob setting used for the day|
|**sportMode**|sport mode knob setting used for the day|
|**s1**|reading from sensor 1|
|**s2**|reading from sensor 2|
|**s3**|reading from sensor 3|
| ... | ... |
|**s20**|reading from sensor 20|
|**s21**|reading from sensor 21|
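The columns above fall into three groups: identifiers (**vehicleId**, **days**), the three mode settings, and the 21 sensor readings. A minimal sketch of selecting each group, assuming the column names match the TSV header shown in the output above; `ID_COLS`, `SETTING_COLS`, `SENSOR_COLS` and `split_columns` are hypothetical names introduced here for illustration.

```python
import pandas as pd

# Column groups taken from the field table (assumed to match the TSV header).
ID_COLS = ["vehicleId", "days"]
SETTING_COLS = ["ecoMode", "cityMode", "sportMode"]
SENSOR_COLS = [f"s{i}" for i in range(1, 22)]  # s1 ... s21


def split_columns(df: pd.DataFrame):
    """Split a car-breakdown frame into (ids, settings, sensors) sub-frames.

    Hypothetical helper: raises KeyError naming any expected column that is missing.
    """
    expected = ID_COLS + SETTING_COLS + SENSOR_COLS
    missing = [c for c in expected if c not in df.columns]
    if missing:
        raise KeyError(f"missing columns: {missing}")
    return df[ID_COLS], df[SETTING_COLS], df[SENSOR_COLS]
```

Usage: `ids, settings, sensors = split_columns(train_data)`. Failing loudly on a missing column is a design choice that catches a mismatched header early, before any modelling step silently drops a sensor.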

Each row holds the columns above. Rows are grouped by **vehicleId** and, within each vehicle, ordered by increasing **days**; each row records the state of the car on that day, so every vehicle forms its own time series.
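The layout described above can be checked mechanically before relying on it. A minimal sketch, assuming that **days** starts at 1 for every vehicle and has no gaps (as in the rows shown); `check_time_series_layout` is a hypothetical helper, not part of pandas.

```python
import pandas as pd


def check_time_series_layout(df: pd.DataFrame) -> bool:
    """Return True if rows are grouped by vehicleId and days run 1, 2, ..., n per vehicle."""
    ids = df["vehicleId"]
    # Rows of one vehicle must be contiguous: the number of runs of equal
    # consecutive ids must equal the number of distinct ids.
    runs = int((ids != ids.shift()).sum())
    if runs != ids.nunique():
        return False
    # Within each vehicle, days must be exactly 1, 2, ..., n in row order.
    for _, group in df.groupby("vehicleId", sort=False):
        if group["days"].tolist() != list(range(1, len(group) + 1)):
            return False
    return True
```

Usage: `check_time_series_layout(train_data)`. The contiguity test matters because `groupby` alone would happily collect interleaved rows and hide an ordering problem.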

The last recorded day for a given **vehicleId** is the day on which its state had deteriorated so badly that it broke down.
For example, in the following case **vehicleId** = 1 broke down on the 192nd day.

In [9]:
train_data[train_data["vehicleId"] == 1]

| | vehicleId | days | ecoMode | cityMode | sportMode | s1 | s2 | s3 | s4 | s5 | ... | s12 | s13 | s14 | s15 | s16 | s17 | s18 | s19 | s20 | s21 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | 1 | -0.0007 | -0.0004 | 100 | 518.67 | 641.82 | 1589.70 | 1400.60 | 14.62 | ... | 521.66 | 2388.02 | 8138.62 | 8.4195 | 0.03 | 392 | 2388 | 100 | 39.06 | 23.4190 |
| 1 | 1 | 2 | 0.0019 | -0.0003 | 100 | 518.67 | 642.15 | 1591.82 | 1403.14 | 14.62 | ... | 522.28 | 2388.07 | 8131.49 | 8.4318 | 0.03 | 392 | 2388 | 100 | 39.00 | 23.4236 |
| 2 | 1 | 3 | -0.0043 | 0.0003 | 100 | 518.67 | 642.35 | 1587.99 | 1404.20 | 14.62 | ... | 522.42 | 2388.03 | 8133.23 | 8.4178 | 0.03 | 390 | 2388 | 100 | 38.95 | 23.3442 |
| 3 | 1 | 4 | 0.0007 | 0.0000 | 100 | 518.67 | 642.35 | 1582.79 | 1401.87 | 14.62 | ... | 522.86 | 2388.08 | 8133.83 | 8.3682 | 0.03 | 392 | 2388 | 100 | 38.88 | 23.3739 |
| 4 | 1 | 5 | -0.0019 | -0.0002 | 100 | 518.67 | 642.37 | 1582.85 | 1406.22 | 14.62 | ... | 522.19 | 2388.04 | 8133.80 | 8.4294 | 0.03 | 393 | 2388 | 100 | 38.90 | 23.4044 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 187 | 1 | 188 | -0.0067 | 0.0003 | 100 | 518.67 | 643.75 | 1602.38 | 1422.78 | 14.62 | ... | 519.79 | 2388.23 | 8117.69 | 8.5207 | 0.03 | 396 | 2388 | 100 | 38.51 | 22.9588 |
| 188 | 1 | 189 | -0.0006 | 0.0002 | 100 | 518.67 | 644.18 | 1596.17 | 1428.01 | 14.62 | ... | 519.58 | 2388.33 | 8117.51 | 8.5183 | 0.03 | 395 | 2388 | 100 | 38.48 | 23.1127 |
| 189 | 1 | 190 | -0.0027 | 0.0001 | 100 | 518.67 | 643.64 | 1599.22 | 1425.95 | 14.62 | ... | 520.04 | 2388.35 | 8112.58 | 8.5223 | 0.03 | 398 | 2388 | 100 | 38.49 | 23.0675 |
| 190 | 1 | 191 | 0.0000 | -0.0004 | 100 | 518.67 | 643.34 | 1602.36 | 1425.77 | 14.62 | ... | 519.57 | 2388.30 | 8114.61 | 8.5174 | 0.03 | 394 | 2388 | 100 | 38.45 | 23.1295 |
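Because the last recorded day of each training vehicle is its breakdown day, a per-row remaining-useful-life target can be derived as (last day of that vehicle) minus (day of the row). A minimal sketch of that labelling; `add_train_rul` is a hypothetical helper, and naming the new column `RUL` is an assumption chosen to match the ground-truth file.

```python
import pandas as pd


def add_train_rul(df: pd.DataFrame) -> pd.DataFrame:
    """Return a copy of a training frame with a per-row `RUL` column.

    RUL = (last recorded day of the vehicle) - (days on this row),
    so the breakdown day itself gets RUL = 0.
    """
    out = df.copy()
    # Broadcast each vehicle's maximum day back onto all of its rows.
    last_day = out.groupby("vehicleId")["days"].transform("max")
    out["RUL"] = last_day - out["days"]
    return out
```

Usage: `train_labeled = add_train_rul(train_data)`. For **vehicleId** = 1, which broke down on day 192, the first row would get RUL = 191 and the last row RUL = 0. Counting the breakdown day as 0 is one convention; shifting by one is equally valid as long as train and test use the same rule.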


# Test Data

In [6]:
test_data = pd.read_csv('data/car_breakdown_test.tsv', sep='\t', header=0)
test_data.head()

| | vehicleId | days | ecoMode | cityMode | sportMode | s1 | s2 | s3 | s4 | s5 | ... | s12 | s13 | s14 | s15 | s16 | s17 | s18 | s19 | s20 | s21 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | 1 | 0.0023 | 0.0003 | 100 | 518.67 | 643.02 | 1585.29 | 1398.21 | 14.62 | ... | 521.72 | 2388.03 | 8125.55 | 8.4052 | 0.03 | 392 | 2388 | 100 | 38.86 | 23.3735 |
| 1 | 1 | 2 | -0.0027 | -0.0003 | 100 | 518.67 | 641.71 | 1588.45 | 1395.42 | 14.62 | ... | 522.16 | 2388.06 | 8139.62 | 8.3803 | 0.03 | 393 | 2388 | 100 | 39.02 | 23.3916 |
| 2 | 1 | 3 | 0.0003 | 0.0001 | 100 | 518.67 | 642.46 | 1586.94 | 1401.34 | 14.62 | ... | 521.97 | 2388.03 | 8130.1 | 8.4441 | 0.03 | 393 | 2388 | 100 | 39.08 | 23.4166 |
| 3 | 1 | 4 | 0.0042 | 0.0 | 100 | 518.67 | 642.44 | 1584.12 | 1406.42 | 14.62 | ... | 521.38 | 2388.05 | 8132.9 | 8.3917 | 0.03 | 391 | 2388 | 100 | 39.0 | 23.3737 |
| 4 | 1 | 5 | 0.0014 | 0.0 | 100 | 518.67 | 642.51 | 1587.19 | 1401.92 | 14.62 | ... | 522.15 | 2388.03 | 8129.54 | 8.4031 | 0.03 | 390 | 2388 | 100 | 38.99 | 23.413 |


## How the test data is arranged.

The test data has exactly the same schema as the training data. The only difference is that a test series does not run until failure: the last row for a given **vehicleId** is not the day of breakdown, which happens some time after the last recorded day.
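Two checks follow from the paragraph above: the test columns should equal the training columns, and, since a test series stops before the breakdown, the most recent known state of each vehicle is its last recorded row. A minimal sketch of both; `same_schema` and `last_observed` are hypothetical helpers.

```python
import pandas as pd


def same_schema(a: pd.DataFrame, b: pd.DataFrame) -> bool:
    """True if both frames have the same column names in the same order."""
    return list(a.columns) == list(b.columns)


def last_observed(df: pd.DataFrame) -> pd.DataFrame:
    """Return the last recorded row of each vehicle (one row per vehicleId)."""
    return (
        df.sort_values(["vehicleId", "days"])
        .groupby("vehicleId")
        .tail(1)  # keep the highest-day row within each vehicle
        .reset_index(drop=True)
    )
```

Usage: `assert same_schema(train_data, test_data)` and `test_last = last_observed(test_data)`. Sorting explicitly makes `last_observed` independent of the file's row order rather than trusting that rows are already in day order.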


## Ground truth test data

The ground truth file holds, for each **vehicleId**, the **Remaining Useful Life (RUL)** in days, counted from that vehicle's last recorded day in the test data.

In [7]:
test_truth_data = pd.read_csv('data/car_breakdown_test_truth.tsv', sep='\t', header=0)
test_truth_data.head()

| | vehicleId | RUL |
|---|---|---|
| 0 | 1 | 112 |
| 1 | 2 | 98 |
| 2 | 3 | 69 |
| 3 | 4 | 82 |
| 4 | 5 | 91 |


For example, **vehicleId** = 1 in the test data can run for another 112 days after its last recorded day before it fails.
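The ground truth gives one number per vehicle, but it can be spread onto every test row: the breakdown day is the last recorded day plus the RUL, and each row's RUL is that breakdown day minus its own **days**. A minimal sketch, assuming the truth file has the columns `vehicleId` and `RUL` shown above and that RUL is counted from the last recorded day; `add_test_rul` is a hypothetical helper.

```python
import pandas as pd


def add_test_rul(test_df: pd.DataFrame, truth_df: pd.DataFrame) -> pd.DataFrame:
    """Return a copy of the test frame with a per-row `RUL` column."""
    last_day = test_df.groupby("vehicleId")["days"].max()
    # Breakdown day = last recorded day + remaining days from the ground truth
    # (aligned on vehicleId; vehicles missing from either side become NaN).
    failure_day = truth_df.set_index("vehicleId")["RUL"].add(last_day)
    out = test_df.copy()
    out["RUL"] = out["vehicleId"].map(failure_day) - out["days"]
    return out
```

Usage: `test_labeled = add_test_rul(test_data, test_truth_data)`. With this rule the last recorded row of **vehicleId** = 1 gets RUL = 112, the row before it 113, and so on, which matches the convention where the breakdown day itself would be 0.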