# Chapter 1: Import and Clean
Import and clean datalogs collected with a COBB AccessPort v3 from [Cobb Tuning](http://www.cobbtuning.com/products/accessport/subaru-accessport-v3-ap3-sub-002).

## 1. Import a single datalog file:
Note: this assumes Jupyter is running from the root directory of the repo.

Import the logfile into a pandas dataframe.

### Errors and solutions:
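1. `UnicodeDecodeError: 'utf-8' codec can't decode byte 0xb0 in position 16: invalid start byte`

    Byte 0xb0 is the degree sign (°) in ISO-8859-1, which appears in several column headings.

    FIX: use the `file` utility to guess the charset, then pass that charset to `pd.read_csv` through the `encoding` argument.

    ```bash
    > file --mime raw_data/roxy_log_21072016_a.csv
    raw_data/roxy_log_21072016_a.csv: text/plain; charset=iso-8859-1
    ```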
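
When the `file` utility is not available, a similar check can be approximated from Python. The sketch below is only an illustration, not a real charset detector: `guess_encoding` is a hypothetical helper and the candidate list is an assumption. Because ISO-8859-1 maps every byte to a character, it never fails to decode and so only makes sense as the last fallback.

```python
def guess_encoding(raw, candidates=("utf-8", "iso-8859-1")):
    """Return the first candidate encoding that decodes `raw` without error."""
    for name in candidates:
        try:
            raw.decode(name)
        except UnicodeDecodeError:
            continue  # this codec cannot decode the bytes, try the next one
        return name
    raise ValueError("none of the candidate encodings could decode the data")


# A column heading with a degree sign, encoded the way the logger wrote it.
sample = "Feedback Knock (°)".encode("iso-8859-1")
encoding = guess_encoding(sample)
print(encoding)  # iso-8859-1
```

In practice the bytes would come from `open(path, 'rb').read()`, and the returned name would be passed to `pd.read_csv(..., encoding=...)`.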

In [24]:
import pandas as pd
df = pd.read_csv('../raw_data/roxy_log.201607210800.csv', encoding='iso-8859-1')
df.head()

Unnamed: 0,Time (sec),A/F Sens 1 Ratio (AFR),Boost (PSI),Calculated Load (g/rev),Feedback Knock (°),Fine Knock Learn (°),Gear Position (Gear),Ignition Timing (°),MAF (g/s),Primary Ign. (°),RPM (RPM),TD Burst (%),TD Continuous (%),Target Boost (PSI),Throttle Pos. (%),Wastegate Duty (%),Wastegate Max (%),AP Info:[AP3-SUB-001 v1.7.2.0-11034][USDM 2004 Impreza WRX][Reflash: EXPERIMENTAL-TP-cobb2.ptm - Realtime: EXPERIMENTAL-TP-cobb2.ptm]
0,0.0,14.47,-8.97,0.43,0.0,0.0,1,40.0,16.26,38.36,2278,0.0,5.0,-5.34,7,0.0,0.0,0
1,0.002,14.47,-8.97,0.42,0.0,0.0,1,40.0,16.12,38.36,2270,0.0,5.0,-5.34,7,0.0,0.5,0
2,0.003,14.59,-9.03,0.42,0.0,0.0,1,40.0,16.19,38.36,2320,0.0,5.0,-5.34,7,0.39,1.18,0
3,0.004,14.59,-9.09,0.42,0.0,0.0,1,40.0,16.19,38.01,2318,0.0,5.0,-5.34,7,1.18,1.05,0
4,0.016,14.47,-9.05,0.42,0.0,0.0,1,39.5,16.12,38.36,2274,0.0,5.0,-5.34,7,0.0,0.0,0


### Notes
The column names are verbose, so we will want to rename them to something easier to work with.

The metadata is tacked onto the last column heading, which creates a superfluous column.

Datalogs will not always have the same columns, so we will want to map column names dynamically.
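
One way to map column names dynamically is to keep a table of preferred names and fall back to a generated name for anything unknown. This is a minimal sketch under assumptions: `slugify` and `build_col_map` are hypothetical helpers, and `'Oil Temp (F)'` is a made-up column used only to show the fallback.

```python
import re


def slugify(column):
    """Drop a trailing '(unit)' and turn the rest into a snake_case name."""
    without_unit = re.sub(r"\s*\([^)]*\)\s*$", "", column)
    return re.sub(r"[^0-9a-z]+", "_", without_unit.lower()).strip("_")


def build_col_map(columns, known):
    """Prefer the hand-picked name from `known`, else fall back to slugify."""
    return {col: known.get(col, slugify(col)) for col in columns}


known_names = {"Time (sec)": "time", "Boost (PSI)": "boost"}
columns = ["Time (sec)", "Boost (PSI)", "Oil Temp (F)"]  # last one is hypothetical
mapping = build_col_map(columns, known_names)
print(mapping)
```

The resulting dictionary can be passed to `df.rename(columns=...)`. Note that `rename` already ignores mapping keys that are absent from the dataframe by default, so a fixed map also works when a log has fewer columns than expected.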

In [40]:
# rename columns, remapping to friendlier names

col_map = {
    'Time (sec)': 'time',
    'A/F Sens 1 Ratio (AFR)': 'afr',
    'Boost (PSI)': 'boost',
    'Calculated Load (g/rev)': 'load',
    'Feedback Knock (°)': 'feedback_knock',
    'Fine Knock Learn (°)': 'fine_knock_learn',
    'Gear Position (Gear)': 'gear_position',
    'Ignition Timing (°)': 'timing',
    'MAF (g/s)': 'maf',
    'Primary Ign. (°)': 'primary_timing',
    'RPM (RPM)': 'rpm',
    'TD Burst (%)': 'td_burst',
    'TD Continuous (%)': 'td_continuous',
    'Target Boost (PSI)': 'boost_target',
    'Throttle Pos. (%)': 'throttle',
    'Wastegate Duty (%)': 'wg_duty',
    'Wastegate Max (%)': 'wg_max',
}

print(df.columns)

# inplace=True to change names without creating a new dataframe
df.rename(columns=col_map, inplace=True)
df.head()

Index(['Time (sec)', 'A/F Sens 1 Ratio (AFR)', 'Boost (PSI)',
       'Calculated Load (g/rev)', 'Feedback Knock (°)', 'Fine Knock Learn (°)',
       'Gear Position (Gear)', 'Ignition Timing (°)', 'MAF (g/s)',
       'Primary Ign. (°)', 'RPM (RPM)', 'TD Burst (%)', 'TD Continuous (%)',
       'Target Boost (PSI)', 'Throttle Pos. (%)', 'Wastegate Duty (%)',
       'Wastegate Max (%)',
       'AP Info:[AP3-SUB-001 v1.7.2.0-11034][USDM 2004 Impreza WRX][Reflash: EXPERIMENTAL-TP-cobb2.ptm - Realtime: EXPERIMENTAL-TP-cobb2.ptm]'],
      dtype='object')


Unnamed: 0,time,afr,boost,load,feedback_knock,fine_knock_learn,gear_position,timing,maf,primary_timing,rpm,td_burst,td_continuous,boost_target,throttle,wg_duty,wg_max,AP Info:[AP3-SUB-001 v1.7.2.0-11034][USDM 2004 Impreza WRX][Reflash: EXPERIMENTAL-TP-cobb2.ptm - Realtime: EXPERIMENTAL-TP-cobb2.ptm]
0,0.0,14.47,-8.97,0.43,0.0,0.0,1,40.0,16.26,38.36,2278,0.0,5.0,-5.34,7,0.0,0.0,0
1,0.002,14.47,-8.97,0.42,0.0,0.0,1,40.0,16.12,38.36,2270,0.0,5.0,-5.34,7,0.0,0.5,0
2,0.003,14.59,-9.03,0.42,0.0,0.0,1,40.0,16.19,38.36,2320,0.0,5.0,-5.34,7,0.39,1.18,0
3,0.004,14.59,-9.09,0.42,0.0,0.0,1,40.0,16.19,38.01,2318,0.0,5.0,-5.34,7,1.18,1.05,0
4,0.016,14.47,-9.05,0.42,0.0,0.0,1,39.5,16.12,38.36,2274,0.0,5.0,-5.34,7,0.0,0.0,0
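
The renamed frame still carries the `AP Info:` column noted earlier. The sketch below shows one possible way to drop it while keeping the bracketed metadata from its heading. `split_metadata` is a hypothetical helper, and the assumption that the metadata heading always starts with `AP Info:` comes from this single log only.

```python
import re

import pandas as pd


def split_metadata(frame, prefix="AP Info:"):
    """Return (frame without metadata columns, list of bracketed metadata fields)."""
    meta_cols = [c for c in frame.columns if str(c).startswith(prefix)]
    fields = re.findall(r"\[([^\]]*)\]", meta_cols[0]) if meta_cols else []
    return frame.drop(columns=meta_cols), fields


# Small stand-in for the real log, with a shortened metadata heading.
demo = pd.DataFrame({
    "time": [0.0, 0.002],
    "AP Info:[AP3-SUB-001 v1.7.2.0-11034][USDM 2004 Impreza WRX]": [0, 0],
})
clean, metadata = split_metadata(demo)
print(list(clean.columns))
print(metadata)
```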


## 2. Import and merge multiple datalogs

### Get a list of files in the raw_data directory

In [23]:
from os import listdir
from os.path import isfile, join

raw_data_path = '../raw_data/'
logfiles = [f for f in listdir(raw_data_path) if isfile(join(raw_data_path, f))]
logfiles

['roxy_log.201607210800.csv',
 'roxy_log.201607211200.csv',
 'roxy_log.201607220800.csv']
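
`listdir` returns entries in arbitrary order and includes every file, not only logs. A possible variation, sketched here with `pathlib`, keeps only `.csv` files and sorts them so that the timestamped names come out in chronological order. `list_logfiles` is a hypothetical helper, and the temporary directory stands in for `raw_data`.

```python
import tempfile
from pathlib import Path


def list_logfiles(directory):
    """Return the sorted names of the .csv files directly inside `directory`."""
    return sorted(p.name for p in Path(directory).glob("*.csv") if p.is_file())


# Demonstrate on a throwaway directory instead of the real raw_data folder.
with tempfile.TemporaryDirectory() as tmp:
    for name in ("roxy_log.201607220800.csv", "roxy_log.201607210800.csv", "notes.txt"):
        Path(tmp, name).touch()
    found = list_logfiles(tmp)

print(found)
```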

### Dealing with time-series
The logfile names were modified to include the start date-time.

The start time can then be used to convert the "Time (sec)" offsets into absolute timestamps.


In [21]:
# Get start_times from the logfile names

for log in logfiles:
    start_time = log.split('.')[1]
    print(start_time)
    
# then convert them to timestamps; the compact YYYYMMDDHHMM form parses directly
start_times = [pd.Timestamp(log.split('.')[1]) for log in logfiles]
start_times

201607210800
201607211200
201607220800


[Timestamp('2016-07-21 08:00:00'),
 Timestamp('2016-07-21 12:00:00'),
 Timestamp('2016-07-22 08:00:00')]
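
Parsing the date-time with an explicit format is a slightly stricter alternative, since a malformed name then raises an error instead of being guessed at. This is a sketch using the standard library. `start_time_from_name` is a hypothetical helper, and the `name.YYYYMMDDHHMM.csv` layout is assumed from the file names listed above.

```python
from datetime import datetime


def start_time_from_name(filename, fmt="%Y%m%d%H%M"):
    """Parse the second dot-separated field of the file name as the start time."""
    stamp = filename.split(".")[1]
    return datetime.strptime(stamp, fmt)  # raises ValueError on a malformed stamp


start = start_time_from_name("roxy_log.201607210800.csv")
print(start)  # 2016-07-21 08:00:00
```

A `datetime` returned this way can be wrapped in `pd.Timestamp(...)` when pandas arithmetic is needed.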

In [52]:
# Try converting "Time (sec)" to timestamps
delta_t = df.time
df['time_stamp'] = pd.to_timedelta(delta_t, unit='s')
df['time_stamp'] += start_times[0]
df.head()

Unnamed: 0,time,afr,boost,load,feedback_knock,fine_knock_learn,gear_position,timing,maf,primary_timing,rpm,td_burst,td_continuous,boost_target,throttle,wg_duty,wg_max,AP Info:[AP3-SUB-001 v1.7.2.0-11034][USDM 2004 Impreza WRX][Reflash: EXPERIMENTAL-TP-cobb2.ptm - Realtime: EXPERIMENTAL-TP-cobb2.ptm],time_stamp
0,0.0,14.47,-8.97,0.43,0.0,0.0,1,40.0,16.26,38.36,2278,0.0,5.0,-5.34,7,0.0,0.0,0,2016-07-21 08:00:00.000
1,0.002,14.47,-8.97,0.42,0.0,0.0,1,40.0,16.12,38.36,2270,0.0,5.0,-5.34,7,0.0,0.5,0,2016-07-21 08:00:00.002
2,0.003,14.59,-9.03,0.42,0.0,0.0,1,40.0,16.19,38.36,2320,0.0,5.0,-5.34,7,0.39,1.18,0,2016-07-21 08:00:00.003
3,0.004,14.59,-9.09,0.42,0.0,0.0,1,40.0,16.19,38.01,2318,0.0,5.0,-5.34,7,1.18,1.05,0,2016-07-21 08:00:00.004
4,0.016,14.47,-9.05,0.42,0.0,0.0,1,39.5,16.12,38.36,2274,0.0,5.0,-5.34,7,0.0,0.0,0,2016-07-21 08:00:00.016
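
With per-file start times available, the same conversion can be applied to each log before combining them. The following is a minimal sketch of that idea, not the notebook's own merge step: `add_timestamps` is a hypothetical helper, and the two small frames are made-up stand-ins for real logs.

```python
import pandas as pd


def add_timestamps(frame, start_time, seconds_col="time"):
    """Return a copy with a 'time_stamp' column of absolute timestamps."""
    out = frame.copy()
    out["time_stamp"] = start_time + pd.to_timedelta(out[seconds_col], unit="s")
    return out


# Made-up stand-ins for two logs recorded four hours apart.
log_a = pd.DataFrame({"time": [0.0, 0.5], "rpm": [2278, 2270]})
log_b = pd.DataFrame({"time": [0.0, 0.5], "rpm": [900, 950]})

merged = pd.concat(
    [
        add_timestamps(log_a, pd.Timestamp("2016-07-21 08:00")),
        add_timestamps(log_b, pd.Timestamp("2016-07-21 12:00")),
    ],
    ignore_index=True,  # build one continuous row index across both logs
)
print(merged)
```

Keeping the original `time` column alongside `time_stamp` preserves the per-log offsets, which may still be useful after merging.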


In [55]:
# overwrite the 'time' column in place with absolute timestamps
df['time'] = pd.to_timedelta(df.time, unit='s') + start_times[0]
df.head()

OverflowError: Python int too large to convert to C long
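
The notebook does not diagnose this error. One plausible cause, stated here as an assumption, is that the cell was re-run after `time` had already been overwritten with timestamps, so `to_timedelta` received datetimes instead of seconds. The sketch below guards against that by converting only while the column is still numeric. `seconds_to_timestamps` is a hypothetical helper.

```python
import pandas as pd
from pandas.api.types import is_numeric_dtype


def seconds_to_timestamps(frame, start_time, col="time"):
    """Convert `col` from seconds to timestamps; do nothing if already converted."""
    if not is_numeric_dtype(frame[col]):
        return frame  # column no longer holds seconds, so leave it untouched
    out = frame.copy()
    out[col] = start_time + pd.to_timedelta(out[col], unit="s")
    return out


raw = pd.DataFrame({"time": [0.0, 0.5]})
start = pd.Timestamp("2016-07-21 08:00")
once = seconds_to_timestamps(raw, start)
twice = seconds_to_timestamps(once, start)  # second call is a no-op
print(twice)
```

Returning a copy rather than assigning back into the same column also means the original seconds remain available in `raw` if the cell needs to be run again.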