In [1]:
import pandas as pd

In [2]:
faults = pd.read_csv("../data/J1939Faults.csv", nrows=100)
diagnostics = pd.read_csv("../data/VehicleDiagnosticOnboardData.csv", nrows=100)

In [3]:
faults_diagnostics = pd.merge(faults, diagnostics, 
                              left_on='RecordID', right_on='FaultId', 
                              how='left')

In [4]:
drop_list = ['ESS_Id', 
             'actionDescription', 
             'ecuSoftwareVersion', 
             'ecuSerialNumber', 
             'ecuModel', 
             'ecuMake', 
             'ecuSource', 
             'faultValue',
             'Latitude',
             'LocationTimeStamp',
             'Longitude',
             'Id',
             'MCTNumber']

faults_diagnostics = faults_diagnostics.drop(columns=drop_list)

In [19]:
# Keep only the rows whose diagnostic Value is not a boolean flag ('True'/'False').
var_filter = ['False', 'True']
filter = faults_diagnostics[~faults_diagnostics['Value'].isin(var_filter)]
filter.head(3)

Unnamed: 0,RecordID,EventTimeStamp,eventDescription,spn,fmi,active,activeTransitionCount,EquipmentID,Name,Value,FaultId
1,1,2015-02-21 10:47:13.000,Low (Severity Low) Engine Coolant Level,111,17,True,2,1439,EngineOilPressure,0.0,1.0
2,1,2015-02-21 10:47:13.000,Low (Severity Low) Engine Coolant Level,111,17,True,2,1439,EngineOilTemperature,96.74375,1.0
3,1,2015-02-21 10:47:13.000,Low (Severity Low) Engine Coolant Level,111,17,True,2,1439,TurboBoostPressure,0.0,1.0


In [11]:
faults_diagnostics['Value'].value_counts()

Value
1023        10
True         9
0            8
False        5
3276.75      4
            ..
13.6022      1
9480         1
96.74375     1
470381.4     1
32           1
Name: count, Length: 61, dtype: int64
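
The counts above show that **Value** mixes boolean flags (`True`/`False`) with numeric readings. As a minimal sketch, assuming the column is stored as text, `pd.to_numeric` with `errors='coerce'` can separate the two kinds. The toy frame, the `IgnitionStatus` name, and the `numeric_value` column are illustrative assumptions, not part of the dataset.

```python
import pandas as pd

# Toy stand-in for the merged frame; values are strings, as assumed for a mixed-type column.
# 'IgnitionStatus' is a hypothetical diagnostic name used only for illustration.
toy = pd.DataFrame({
    'Name': ['EngineOilPressure', 'IgnitionStatus', 'EngineOilTemperature', 'TurboBoostPressure'],
    'Value': ['0.0', 'True', '96.74375', '1023'],
})

# Entries that cannot be parsed as numbers (such as 'True') become NaN instead of raising.
toy['numeric_value'] = pd.to_numeric(toy['Value'], errors='coerce')

# Rows that hold numeric readings.
numeric_rows = toy[toy['numeric_value'].notna()]
print(numeric_rows)
```

This keeps the original text in **Value**, so the boolean rows can still be handled separately.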

In [None]:
filter.to_csv('equipmentID_1439.csv', index=False)

In [None]:
faults.loc[0]

Here is a breakdown of the important values in the first record.

* ESS_Id, actionDescription, ecuSoftwareVersion, ecuSerialNumber, ecuModel, ecuMake, ecuSource, faultValue, and MCTNumber are unlikely to provide any predictive value.
* We can see the time of the event in the **EventTimeStamp** column. Note that this may be different from the **LocationTimeStamp** value, which indicates when the Latitude/Longitude values were recorded.
* The **spn** and **fmi** columns together identify the type of fault. The **eventDescription** column may describe that fault, although the description is sometimes missing.
* A fault is recorded both when the light goes on and when it goes off. The **active** column indicates which: True means the light turned on, and False means it turned off. The number of times the code has been set or unset is in the **faultValue** column, although this value can be unreliable.
* Each truck has an identifier, the **EquipmentID** value.
* Each record can be linked to the on-board diagnostics data through the **RecordID** column.
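
The points above can be sketched on a few toy rows: combining **spn** and **fmi** into a single fault-type key, and keeping only the records where the light turned on. This is a minimal sketch; the `fault_code` column name and the toy values are assumptions for illustration, and it assumes **active** is read as a boolean column.

```python
import pandas as pd

# Toy rows using the column names described above; the values are illustrative.
toy_faults = pd.DataFrame({
    'RecordID': [1, 2, 3],
    'spn': [111, 629, 111],
    'fmi': [17, 12, 17],
    'active': [True, True, False],
    'EquipmentID': ['1439', '1439', '1439'],
})

# Combine spn and fmi into one label so each fault type has a single key.
# 'fault_code' is a hypothetical column name.
toy_faults['fault_code'] = (
    toy_faults['spn'].astype(str) + '-' + toy_faults['fmi'].astype(str)
)

# Keep only the records where the light turned on (active is True).
light_on = toy_faults[toy_faults['active']]
print(light_on[['RecordID', 'fault_code']])
```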

In [None]:
diagnostics.head()

To get the on-board diagnostics at the time of the fault code, we can match the **RecordID** to the **FaultId**.

In [None]:
diagnostics.loc[diagnostics['FaultId'] == 1]
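
One way to check how well the two tables line up is a left merge with `indicator=True`, which adds a `_merge` column marking the faults that have no diagnostic rows. The frames below are toy stand-ins with illustrative values, not the real data.

```python
import pandas as pd

# Toy stand-ins for the two tables; only the key columns matter here.
toy_faults = pd.DataFrame({'RecordID': [1, 2, 3]})
toy_diag = pd.DataFrame({
    'FaultId': [1, 1, 2],
    'Name': ['EngineOilPressure', 'EngineOilTemperature', 'EngineOilPressure'],
    'Value': ['0.0', '96.74375', '12.5'],
})

# The left merge keeps every fault; '_merge' records whether a match was found.
merged = pd.merge(
    toy_faults, toy_diag,
    left_on='RecordID', right_on='FaultId',
    how='left', indicator=True,
)

# Faults with no diagnostic readings at all.
unmatched = merged.loc[merged['_merge'] == 'left_only', 'RecordID']
print(unmatched.tolist())
```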

This data is in long format, so each FaultId can have many diagnostic values, one per row.

**Note:** Not all diagnostic values are recorded for all faults, so you will have a large number of missing values.

For example, for the second fault code in our dataset, only the ignition status and lamp status were recorded.

In [None]:
diagnostics.loc[diagnostics['FaultId'] == 2]
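
If one row per fault is more convenient, the long format can be reshaped to wide. The sketch below is one possible approach on toy data: it takes the first reading for each `FaultId`/`Name` pair and unstacks `Name` into columns, so readings that were never recorded appear as NaN. The toy values are illustrative.

```python
import pandas as pd

# Toy long-format diagnostics: fault 1 has three readings, fault 2 only one.
toy_diag = pd.DataFrame({
    'FaultId': [1, 1, 1, 2],
    'Name': ['EngineOilPressure', 'EngineOilTemperature',
             'TurboBoostPressure', 'EngineOilPressure'],
    'Value': ['0.0', '96.74375', '0.0', '12.5'],
})

# Take the first reading per (FaultId, Name), then spread Name across columns.
wide = (
    toy_diag
    .groupby(['FaultId', 'Name'])['Value']
    .first()
    .unstack()
)
print(wide)
```

Using `first()` is an assumption about how to resolve repeated readings for the same fault; another aggregation may suit the real data better.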

Finally, we can get a little bit more information about the different fault codes from the Service Fault Codes spreadsheet.

In [None]:
sfc = pd.read_excel("../data/Service Fault Codes_1_0_0_167.xlsx")
sfc.head()

Many fault codes have multiple records in this spreadsheet. For example, the first fault in our dataset (SPN 111, FMI 17) has two rows.

In [None]:
# Combine both conditions into a single mask so it aligns with sfc's index.
sfc.loc[(sfc['SPN'] == 111) & (sfc['J1939 FMI'] == 17)]

Other fault codes have even more rows.

In [None]:
# Combine both conditions into a single mask so it aligns with sfc's index.
sfc.loc[(sfc['SPN'] == 629) & (sfc['J1939 FMI'] == 12)]
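
To see how common repeated codes are, the rows can be counted per SPN/FMI pair. This is a minimal sketch on a toy frame that reuses the `SPN` and `J1939 FMI` column names from the spreadsheet; the row counts and the `n_rows` column name are illustrative assumptions.

```python
import pandas as pd

# Toy stand-in for the Service Fault Codes sheet; only the key columns are included.
toy_sfc = pd.DataFrame({
    'SPN': [111, 111, 629, 629, 629, 100],
    'J1939 FMI': [17, 17, 12, 12, 12, 1],
})

# Count how many rows each (SPN, FMI) pair has.
rows_per_code = (
    toy_sfc
    .groupby(['SPN', 'J1939 FMI'])
    .size()
    .reset_index(name='n_rows')
)

# Pairs that appear more than once would duplicate fault rows if merged as-is.
repeated = rows_per_code[rows_per_code['n_rows'] > 1]
print(repeated)
```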