In [1]:
import pandas as pd

In [2]:
faults = pd.read_csv("../data/J1939Faults.csv", nrows=100)
faults.head()

Unnamed: 0,RecordID,ESS_Id,EventTimeStamp,eventDescription,actionDescription,ecuSoftwareVersion,ecuSerialNumber,ecuModel,ecuMake,ecuSource,spn,fmi,active,activeTransitionCount,faultValue,EquipmentID,MCTNumber,Latitude,Longitude,LocationTimeStamp
0,1,990349,2015-02-21 10:47:13.000,Low (Severity Low) Engine Coolant Level,,unknown,unknown,unknown,unknown,0,111,17,True,2,,1439,105354361,38.857638,-84.626851,2015-02-21 11:34:25.000
1,2,990360,2015-02-21 11:34:34.000,,,unknown,unknown,unknown,unknown,11,629,12,True,127,,1439,105354361,38.857638,-84.626851,2015-02-21 11:35:10.000
2,3,990364,2015-02-21 11:35:31.000,Incorrect Data Steering Wheel Angle,,unknown,unknown,unknown,unknown,11,1807,2,False,127,,1369,105336226,41.42125,-87.767361,2015-02-21 11:35:26.000
3,4,990370,2015-02-21 11:35:33.000,Incorrect Data Steering Wheel Angle,,unknown,unknown,unknown,unknown,11,1807,2,True,127,,1369,105336226,41.421018,-87.767361,2015-02-21 11:36:08.000
4,5,990416,2015-02-21 11:39:41.000,,,22281684P01*22357957P01*22362082P01*,13063430,0USA13_13_0415_2238A,VOLVO,0,4364,17,False,2,,1674,105427130,38.416481,-89.442638,2015-02-21 11:39:37.000


In [21]:
drop_list = ['ESS_Id', 
             'actionDescription', 
             'ecuSoftwareVersion', 
             'ecuSerialNumber', 
             'ecuModel', 
             'ecuMake', 
             'ecuSource', 
             'faultValue',
             'Latitude',
             'Longitude',
             'MCTNumber']

faults_filtered = faults.drop(columns=drop_list)
faults_filtered.head()

Unnamed: 0,RecordID,EventTimeStamp,eventDescription,spn,fmi,active,activeTransitionCount,EquipmentID,LocationTimeStamp
0,1,2015-02-21 10:47:13.000,Low (Severity Low) Engine Coolant Level,111,17,True,2,1439,2015-02-21 11:34:25.000
1,2,2015-02-21 11:34:34.000,,629,12,True,127,1439,2015-02-21 11:35:10.000
2,3,2015-02-21 11:35:31.000,Incorrect Data Steering Wheel Angle,1807,2,False,127,1369,2015-02-21 11:35:26.000
3,4,2015-02-21 11:35:33.000,Incorrect Data Steering Wheel Angle,1807,2,True,127,1369,2015-02-21 11:36:08.000
4,5,2015-02-21 11:39:41.000,,4364,17,False,2,1674,2015-02-21 11:39:37.000


In [23]:
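# Keep only the faults recorded for a single truck.
# The result is not named `filter`, which would shadow the Python built-in.
var_filter = 1439
equipment_faults = faults_filtered[faults_filtered['EquipmentID'] == var_filter]
equipment_faults.head()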

Unnamed: 0,RecordID,EventTimeStamp,eventDescription,spn,fmi,active,activeTransitionCount,EquipmentID,LocationTimeStamp
0,1,2015-02-21 10:47:13.000,Low (Severity Low) Engine Coolant Level,111,17,True,2,1439,2015-02-21 11:34:25.000
1,2,2015-02-21 11:34:34.000,,629,12,True,127,1439,2015-02-21 11:35:10.000
11,12,2015-02-21 11:43:18.000,Low (Severity Low) Engine Coolant Level,111,17,False,2,1439,2015-02-21 11:43:13.000


In [5]:
faults.loc[0]

RecordID                                                       1
ESS_Id                                                    990349
EventTimeStamp                           2015-02-21 10:47:13.000
eventDescription         Low (Severity Low) Engine Coolant Level
actionDescription                                            NaN
ecuSoftwareVersion                                       unknown
ecuSerialNumber                                          unknown
ecuModel                                                 unknown
ecuMake                                                  unknown
ecuSource                                                      0
spn                                                          111
fmi                                                           17
active                                                      True
activeTransitionCount                                          2
faultValue                                                   NaN
EquipmentID              

Here is a breakdown of the important values, using the first record as an example.

* ESS_Id, actionDescription, ecuSoftwareVersion, ecuSerialNumber, ecuModel, ecuMake, ecuSource, faultValue, and MCTNumber are unlikely to provide any predictive value.
* We can see the time of the event in the **EventTimeStamp** column. Note that this may be different from the **LocationTimeStamp** value, which indicates when the Latitude/Longitude values were recorded.
* The **spn** and **fmi** columns together indicate the type of fault. The **eventDescription** column may describe that fault, although the description is sometimes missing.
* A fault is recorded both when the light turns on and when it turns off. The **active** column distinguishes the two: True means the light turned on, and False means it turned off. The number of times the code has been set or unset is in the **activeTransitionCount** column, although this value can be unreliable.
* Each truck has an identifier, the **EquipmentID** value.
* Each record can be linked to the on-board diagnostics data through the **RecordID** column.
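
A few of the points above can be sketched in code. This is a minimal illustration, assuming both timestamp columns parse cleanly with `pd.to_datetime`. The three rows are copied from the output shown earlier; the `fault_code` and `location_lag` columns are names introduced here for illustration, not part of the dataset.

```python
import pandas as pd

# Three rows copied from the faults output above (only the columns used here).
sample = pd.DataFrame({
    "RecordID": [1, 2, 3],
    "EventTimeStamp": [
        "2015-02-21 10:47:13.000",
        "2015-02-21 11:34:34.000",
        "2015-02-21 11:35:31.000",
    ],
    "LocationTimeStamp": [
        "2015-02-21 11:34:25.000",
        "2015-02-21 11:35:10.000",
        "2015-02-21 11:35:26.000",
    ],
    "spn": [111, 629, 1807],
    "fmi": [17, 12, 2],
    "active": [True, True, False],
    "EquipmentID": [1439, 1439, 1369],
})

# Parse both timestamp columns so they can be compared.
for col in ["EventTimeStamp", "LocationTimeStamp"]:
    sample[col] = pd.to_datetime(sample[col])

# Hypothetical helper column: one label per (spn, fmi) pair.
sample["fault_code"] = sample["spn"].astype(str) + "-" + sample["fmi"].astype(str)

# Hypothetical helper column: gap between the fault event and the location fix.
# A negative value means the location was recorded before the event.
sample["location_lag"] = sample["LocationTimeStamp"] - sample["EventTimeStamp"]
```

In this sample the first record's location was logged about 47 minutes after the event, while the third record's location was logged a few seconds before it, so the two timestamps should not be treated as interchangeable.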

In [None]:
diagnostics = pd.read_csv("../data/VehicleDiagnosticOnboardData.csv", nrows=100)

diagnostics.head()

To get the on-board diagnostics at the time of the fault code, we can match the **RecordID** to the **FaultId**.

In [None]:
diagnostics.loc[diagnostics['FaultId'] == 1]
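
To attach diagnostics to every fault at once rather than one **FaultId** at a time, a merge is one option. The sketch below uses toy frames: the **RecordID** and **EquipmentID** values come from the faults output above, but the diagnostics rows and the `Name` and `Value` column names are assumptions made for illustration.

```python
import pandas as pd

# Toy stand-in for the faults table (values taken from the output above).
faults_toy = pd.DataFrame({
    "RecordID": [1, 2, 3],
    "EquipmentID": [1439, 1439, 1369],
})

# Toy stand-in for the diagnostics table. The Name/Value columns and
# their contents are hypothetical; only FaultId is named in the text.
diagnostics_toy = pd.DataFrame({
    "FaultId": [1, 1, 2],
    "Name": ["reading_a", "reading_b", "reading_a"],
    "Value": ["10", "20", "30"],
})

# A left join keeps faults that have no diagnostics at all (RecordID 3 here).
joined = faults_toy.merge(
    diagnostics_toy,
    left_on="RecordID",
    right_on="FaultId",
    how="left",
)
```

Because a fault can match several diagnostic rows, the joined frame has more rows than the faults table, which is worth keeping in mind before counting faults on it.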

This data is in long format, so each FaultId can potentially have many diagnostic values.

**Note:** Not all diagnostic values are recorded for all faults, so you will have a large number of missing values.

For example, for the second fault code in our dataset, only the ignition status and lamp status were recorded.

In [None]:
diagnostics.loc[diagnostics['FaultId'] == 2]
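
If one row per fault is more convenient for modeling, the long table can be reshaped to wide. The sketch below is a minimal illustration on toy data; the `Name` and `Value` column names and the readings themselves are assumptions, not taken from the file.

```python
import pandas as pd

# Toy long-format diagnostics: one row per (FaultId, reading).
# Column names other than FaultId are hypothetical.
long_toy = pd.DataFrame({
    "FaultId": [1, 1, 2],
    "Name": ["reading_a", "reading_b", "reading_a"],
    "Value": ["10", "20", "30"],
})

# One row per FaultId, one column per reading name.
# Readings that were not recorded for a fault become NaN.
wide = long_toy.pivot(index="FaultId", columns="Name", values="Value")
```

`DataFrame.pivot` raises an error if a (FaultId, Name) pair appears more than once; in that case `pivot_table` with an explicit `aggfunc` such as `"first"` is a possible alternative.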

Finally, we can get a little more information about the different fault codes from the Service Fault Codes spreadsheet.

In [None]:
sfc = pd.read_excel("../data/Service Fault Codes_1_0_0_167.xlsx")
sfc.head()

Many fault codes have multiple records in this spreadsheet. For example, if we look at the rows for the first fault in our dataset, we see that there are two rows.

In [None]:
# Combine both conditions into one mask instead of chaining .loc calls.
sfc.loc[(sfc['SPN'] == 111) & (sfc['J1939 FMI'] == 17)]

Or even more.

In [None]:
# Combine both conditions into one mask instead of chaining .loc calls.
sfc.loc[(sfc['SPN'] == 629) & (sfc['J1939 FMI'] == 12)]
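
Rather than checking fault codes one pair at a time, the number of spreadsheet rows per (SPN, FMI) pair can be counted in one pass. The sketch below runs on a toy frame that reuses the `SPN` and `J1939 FMI` column names from the cells above; the row counts in it are illustrative only, and `n_rows` is a name introduced here.

```python
import pandas as pd

# Toy stand-in for sfc with only the two key columns.
# The number of rows per pair is made up for illustration.
sfc_toy = pd.DataFrame({
    "SPN": [111, 111, 629, 629, 629, 1807],
    "J1939 FMI": [17, 17, 12, 12, 12, 2],
})

# Count how many rows share each (SPN, FMI) pair.
pair_counts = (
    sfc_toy
    .groupby(["SPN", "J1939 FMI"])
    .size()
    .rename("n_rows")
    .reset_index()
)

# Pairs that appear more than once need deduplicating or aggregating
# before they can be joined one-to-one onto the faults table.
repeated = pair_counts[pair_counts["n_rows"] > 1]
```

Running the same chain on `sfc` itself would show how widespread the duplication is before deciding how to join the spreadsheet to the faults.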