# Chapter 1: Import and Clean
Import and clean data-logs collected from a COBB AccessPort v3 from [Cobb Tuning](http://www.cobbtuning.com/products/accessport/subaru-accessport-v3-ap3-sub-002)

## 1. Import a single datalog file:
note: Assume we are running jupyter from the root directory of the repo

Import the logfile into a pandas dataframe

### Errors and solutions:
1. UnicodeDecodeError: 'utf-8' codec can't decode byte 0xb0 in position 16: invalid start byte

    FIX: use file to guess the charset
        ```bash
        > file --mime raw_data/roxy_log_21072016_a.csv 
        raw_data/roxy_log_21072016_a.csv: text/plain; charset=iso-8859-1
        ```

In [75]:
import pandas as pd
df = pd.read_csv('../raw_data/roxy_log.201607210800.csv', encoding='iso-8859-1')
df.head()

Unnamed: 0,Time (sec),A/F Sens 1 Ratio (AFR),Boost (PSI),Calculated Load (g/rev),Feedback Knock (°),Fine Knock Learn (°),Gear Position (Gear),Ignition Timing (°),MAF (g/s),Primary Ign. (°),RPM (RPM),TD Burst (%),TD Continuous (%),Target Boost (PSI),Throttle Pos. (%),Wastegate Duty (%),Wastegate Max (%),AP Info:[AP3-SUB-001 v1.7.2.0-11034][USDM 2004 Impreza WRX][Reflash: EXPERIMENTAL-TP-cobb2.ptm - Realtime: EXPERIMENTAL-TP-cobb2.ptm]
0,0.0,14.47,-8.97,0.43,0.0,0.0,1,40.0,16.26,38.36,2278,0.0,5.0,-5.34,7,0.0,0.0,0
1,0.002,14.47,-8.97,0.42,0.0,0.0,1,40.0,16.12,38.36,2270,0.0,5.0,-5.34,7,0.0,0.5,0
2,0.003,14.59,-9.03,0.42,0.0,0.0,1,40.0,16.19,38.36,2320,0.0,5.0,-5.34,7,0.39,1.18,0
3,0.004,14.59,-9.09,0.42,0.0,0.0,1,40.0,16.19,38.01,2318,0.0,5.0,-5.34,7,1.18,1.05,0
4,0.016,14.47,-9.05,0.42,0.0,0.0,1,39.5,16.12,38.36,2274,0.0,5.0,-5.34,7,0.0,0.0,0


### Notes
The column names are verbose, and we will want to rename them to something easier to work with

The meta-data is tacked onto the last column heading, which creates a supurfluous column.

Datalogs will not always have the same columns, we will want to map column names dynamically.

In [76]:
# rename columns, remapping to friendlier names

col_map = {
    'Time (sec)': 'time',
    'A/F Sens 1 Ratio (AFR)': 'afr',
    'Boost (PSI)': 'boost',
    'Calculated Load (g/rev)': 'load',
    'Feedback Knock (°)': 'feedback_knock',
    'Fine Knock Learn (°)': 'fine_knock_learn',
    'Gear Position (Gear)': 'gear_position',
    'Ignition Timing (°)': 'timing',
    'MAF (g/s)': 'maf',
    'Primary Ign. (°)': 'primary_timing',
    'RPM (RPM)': 'rpm',
    'TD Burst (%)': 'td_burst',
    'TD Continuous (%)': 'td_continuous',
    'Target Boost (PSI)': 'boost_target',
    'Throttle Pos. (%)': 'throttle',
    'Wastegate Duty (%)': 'wg_duty',
    'Wastegate Max (%)': 'wg_max',
}

print( data.columns )

# inplace=True to change names without creating a new dataframe
df.rename(columns=col_map, inplace=True)
df.head()

Index(['time', 'afr', 'boost', 'load', 'feedback_knock', 'fine_knock_learn',
       'gear_position', 'timing', 'maf', 'primary_timing', 'rpm', 'td_burst',
       'td_continuous', 'boost_target', 'throttle', 'wg_duty', 'wg_max',
       'AP Info:[AP3-SUB-001 v1.7.2.0-11034][USDM 2004 Impreza WRX][Reflash: EXPERIMENTAL-TP-cobb2.ptm - Realtime: EXPERIMENTAL-TP-cobb2.ptm]',
       'time_stamp'],
      dtype='object')


Unnamed: 0,time,afr,boost,load,feedback_knock,fine_knock_learn,gear_position,timing,maf,primary_timing,rpm,td_burst,td_continuous,boost_target,throttle,wg_duty,wg_max,AP Info:[AP3-SUB-001 v1.7.2.0-11034][USDM 2004 Impreza WRX][Reflash: EXPERIMENTAL-TP-cobb2.ptm - Realtime: EXPERIMENTAL-TP-cobb2.ptm]
0,0.0,14.47,-8.97,0.43,0.0,0.0,1,40.0,16.26,38.36,2278,0.0,5.0,-5.34,7,0.0,0.0,0
1,0.002,14.47,-8.97,0.42,0.0,0.0,1,40.0,16.12,38.36,2270,0.0,5.0,-5.34,7,0.0,0.5,0
2,0.003,14.59,-9.03,0.42,0.0,0.0,1,40.0,16.19,38.36,2320,0.0,5.0,-5.34,7,0.39,1.18,0
3,0.004,14.59,-9.09,0.42,0.0,0.0,1,40.0,16.19,38.01,2318,0.0,5.0,-5.34,7,1.18,1.05,0
4,0.016,14.47,-9.05,0.42,0.0,0.0,1,39.5,16.12,38.36,2274,0.0,5.0,-5.34,7,0.0,0.0,0


## 2. Import and merge multiple datalogs

### get a list of files in the raw_data directory

In [77]:
from os import listdir
from os.path import isfile, join

raw_data_path = '../raw_data/'
logfiles = [f for f in listdir(raw_data_path) if isfile(join(raw_data_path, f))]
logfiles

['roxy_log.201607210800.csv',
 'roxy_log.201607211200.csv',
 'roxy_log.201607220800.csv']

### dealing with time-series
modified logfile names to include the start date-time

may use the start-time to convert "Time (sec)" to timestamps


In [78]:
# Get start_times from the logfile names

for log in logfiles:
    start_time = log.split('.')[1]
    print(start_time)
    
# then convert them to timestamps
pd.Timestamp('201607210800')
start_times = [pd.Timestamp(log.split('.')[1]) for log in logfiles]
start_times

201607210800
201607211200
201607220800


[Timestamp('2016-07-21 08:00:00'),
 Timestamp('2016-07-21 12:00:00'),
 Timestamp('2016-07-22 08:00:00')]

### change Time (sec) to pandas timestamps, starting @ start_time from the filename

In [79]:
# converting "Time (sec)"/'time' to timestamps
data = df
delta_t = pd.to_timedelta(data['time'], unit='s')
data['time'] = delta_t + start_times[0]
data.head()

Unnamed: 0,time,afr,boost,load,feedback_knock,fine_knock_learn,gear_position,timing,maf,primary_timing,rpm,td_burst,td_continuous,boost_target,throttle,wg_duty,wg_max,AP Info:[AP3-SUB-001 v1.7.2.0-11034][USDM 2004 Impreza WRX][Reflash: EXPERIMENTAL-TP-cobb2.ptm - Realtime: EXPERIMENTAL-TP-cobb2.ptm]
0,2016-07-21 08:00:00.000,14.47,-8.97,0.43,0.0,0.0,1,40.0,16.26,38.36,2278,0.0,5.0,-5.34,7,0.0,0.0,0
1,2016-07-21 08:00:00.002,14.47,-8.97,0.42,0.0,0.0,1,40.0,16.12,38.36,2270,0.0,5.0,-5.34,7,0.0,0.5,0
2,2016-07-21 08:00:00.003,14.59,-9.03,0.42,0.0,0.0,1,40.0,16.19,38.36,2320,0.0,5.0,-5.34,7,0.39,1.18,0
3,2016-07-21 08:00:00.004,14.59,-9.09,0.42,0.0,0.0,1,40.0,16.19,38.01,2318,0.0,5.0,-5.34,7,1.18,1.05,0
4,2016-07-21 08:00:00.016,14.47,-9.05,0.42,0.0,0.0,1,39.5,16.12,38.36,2274,0.0,5.0,-5.34,7,0.0,0.0,0


### reindex the dataframe, using the timestamps as the new index

In [83]:
# set the series as the new index
data.index = data.time
# and drop the duplicate column
data = data.drop('time', 1)

In [84]:
data.head()

Unnamed: 0_level_0,afr,boost,load,feedback_knock,fine_knock_learn,gear_position,timing,maf,primary_timing,rpm,td_burst,td_continuous,boost_target,throttle,wg_duty,wg_max,AP Info:[AP3-SUB-001 v1.7.2.0-11034][USDM 2004 Impreza WRX][Reflash: EXPERIMENTAL-TP-cobb2.ptm - Realtime: EXPERIMENTAL-TP-cobb2.ptm]
time,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1,Unnamed: 11_level_1,Unnamed: 12_level_1,Unnamed: 13_level_1,Unnamed: 14_level_1,Unnamed: 15_level_1,Unnamed: 16_level_1,Unnamed: 17_level_1
2016-07-21 08:00:00.000,14.47,-8.97,0.43,0.0,0.0,1,40.0,16.26,38.36,2278,0.0,5.0,-5.34,7,0.0,0.0,0
2016-07-21 08:00:00.002,14.47,-8.97,0.42,0.0,0.0,1,40.0,16.12,38.36,2270,0.0,5.0,-5.34,7,0.0,0.5,0
2016-07-21 08:00:00.003,14.59,-9.03,0.42,0.0,0.0,1,40.0,16.19,38.36,2320,0.0,5.0,-5.34,7,0.39,1.18,0
2016-07-21 08:00:00.004,14.59,-9.09,0.42,0.0,0.0,1,40.0,16.19,38.01,2318,0.0,5.0,-5.34,7,1.18,1.05,0
2016-07-21 08:00:00.016,14.47,-9.05,0.42,0.0,0.0,1,39.5,16.12,38.36,2274,0.0,5.0,-5.34,7,0.0,0.0,0


### resample the data
L = 1 millisecond, which is the highest resolution available in the logs.

In [98]:
d2 = data.resample('L').mean()
d2

Unnamed: 0_level_0,afr,boost,load,feedback_knock,fine_knock_learn,gear_position,timing,maf,primary_timing,rpm,td_burst,td_continuous,boost_target,throttle,wg_duty,wg_max,AP Info:[AP3-SUB-001 v1.7.2.0-11034][USDM 2004 Impreza WRX][Reflash: EXPERIMENTAL-TP-cobb2.ptm - Realtime: EXPERIMENTAL-TP-cobb2.ptm]
time,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1,Unnamed: 11_level_1,Unnamed: 12_level_1,Unnamed: 13_level_1,Unnamed: 14_level_1,Unnamed: 15_level_1,Unnamed: 16_level_1,Unnamed: 17_level_1
2016-07-21 08:00:00.000,14.47,-8.97,0.43,0.0,0.0,1.0,40.0,16.26,38.36,2278.0,0.0,5.0,-5.34,7.0,0.00,0.00,0.0
2016-07-21 08:00:00.001,,,,,,,,,,,,,,,,,
2016-07-21 08:00:00.002,14.47,-8.97,0.42,0.0,0.0,1.0,40.0,16.12,38.36,2270.0,0.0,5.0,-5.34,7.0,0.00,0.50,0.0
2016-07-21 08:00:00.003,14.59,-9.03,0.42,0.0,0.0,1.0,40.0,16.19,38.36,2320.0,0.0,5.0,-5.34,7.0,0.39,1.18,0.0
2016-07-21 08:00:00.004,14.59,-9.09,0.42,0.0,0.0,1.0,40.0,16.19,38.01,2318.0,0.0,5.0,-5.34,7.0,1.18,1.05,0.0
2016-07-21 08:00:00.005,,,,,,,,,,,,,,,,,
2016-07-21 08:00:00.006,,,,,,,,,,,,,,,,,
2016-07-21 08:00:00.007,,,,,,,,,,,,,,,,,
2016-07-21 08:00:00.008,,,,,,,,,,,,,,,,,
2016-07-21 08:00:00.009,,,,,,,,,,,,,,,,,
