# 2-Import Data
In this notebook, we read the data from the downloaded CSV and Excel files and import it into a SQLite database.

**Requirements:**

- Please run the `1-download-ved.ipynb` notebook first.
- Recommended install: [ipywidgets](https://ipywidgets.readthedocs.io/en/stable/user_install.html)

In [1]:
import numpy as np
import pandas as pd
import sqlite3
import os

from tqdm.notebook import tqdm
from sqlapi import VedDb

Set the path to the directory that holds the data files.

In [2]:
data_path = "./data"

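The `read_data_frame` function reads data from a single CSV file into a Pandas DataFrame. Only the listed columns are loaded, and the vehicle, trip, and timestamp columns are forced to 64-bit integers.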

In [3]:
def read_data_frame(filename):
    """Read one CSV file, keeping only the listed signal columns."""
    columns = ['DayNum', 'VehId', 'Trip', 'Timestamp(ms)', 
               'Latitude[deg]', 'Longitude[deg]', 
               'Vehicle Speed[km/h]', 'MAF[g/sec]', 
               'Engine RPM[RPM]', 'Absolute Load[%]',
               'OAT[DegC]', 'Fuel Rate[L/hr]', 
               'Air Conditioning Power[kW]', 'Air Conditioning Power[Watts]',
               'Heater Power[Watts]', 'HV Battery Current[A]', 
               'HV Battery SOC[%]', 'HV Battery Voltage[V]',
               'Short Term Fuel Trim Bank 1[%]', 'Short Term Fuel Trim Bank 2[%]',
               'Long Term Fuel Trim Bank 1[%]', 'Long Term Fuel Trim Bank 2[%]'
              ]
    types = {'VehId': np.int64,
             'Trip': np.int64,
             'Timestamp(ms)': np.int64
            }
    df = pd.read_csv(filename, usecols=columns, dtype=types)
    return df
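
To illustrate how `usecols` and `dtype` interact in `pd.read_csv`, here is a minimal sketch that runs on an in-memory CSV instead of a real data file. The sample rows and the `Extra` column are made up for the example; only the column names `DayNum`, `VehId`, `Trip`, and `Timestamp(ms)` come from the function above.

```python
import io

import numpy as np
import pandas as pd

# Hypothetical sample data; the real files have many more columns and rows.
sample_csv = io.StringIO(
    "DayNum,VehId,Trip,Timestamp(ms),Extra\n"
    "1.5,8,706,0,ignored\n"
    "1.5,8,706,200,ignored\n"
)

sample_df = pd.read_csv(
    sample_csv,
    # Columns not listed here (such as "Extra") are not loaded.
    usecols=["DayNum", "VehId", "Trip", "Timestamp(ms)"],
    # Columns without an explicit type (such as "DayNum") are inferred.
    dtype={"VehId": np.int64, "Trip": np.int64, "Timestamp(ms)": np.int64},
)

print(sample_df.dtypes)
```

If a name in `usecols` is missing from the file, `pd.read_csv` raises a `ValueError`, so a malformed file fails early instead of being loaded silently.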

Enumerate the CSV files from the expanded data directory.

In [4]:
files = [os.path.join(data_path, file) for file in tqdm(os.listdir(data_path)) if file.endswith(".csv")]
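
Note that the progress bar above wraps `os.listdir`, so its total counts every directory entry rather than only the CSV files. The sketch below shows one possible alternative that filters first and sorts the result, since `os.listdir` returns entries in arbitrary order. The helper name `list_csv_files` is hypothetical and not part of this project.

```python
import os
import tempfile


def list_csv_files(directory):
    """Return the sorted paths of all .csv files directly inside directory."""
    return sorted(
        os.path.join(directory, name)
        for name in os.listdir(directory)
        if name.lower().endswith(".csv")
    )


# Self-contained demonstration in a throwaway directory.
with tempfile.TemporaryDirectory() as tmp_dir:
    for name in ("b.csv", "a.csv", "notes.txt"):
        open(os.path.join(tmp_dir, name), "w").close()
    found = [os.path.basename(path) for path in list_csv_files(tmp_dir)]

print(found)
```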


Create a `VedDb` object. This is the API to interface with the SQLite database.

In [5]:
db = VedDb()

Iterate through the data files, load each one into a Pandas DataFrame, and insert its signals into the database. The signals are bulk-inserted, so each DataFrame is first converted to a list of tuples.

In [6]:
for file in tqdm(files):
    df = read_data_frame(file)
    
    # Convert the DataFrame rows to a list of tuples for the bulk insert.
    signals = list(df.itertuples(index=False))
        
    db.insert_signals(signals)
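
The implementation of `VedDb.insert_signals` is not shown in this notebook. As a rough illustration of why a list of tuples is a convenient shape for a bulk insert, here is a minimal sketch using the standard `sqlite3` module. The four-column `signal` table and its column names are assumptions made for the example; the real schema behind `VedDb` may differ.

```python
import sqlite3

# Hypothetical, much-reduced schema; the real table lives behind VedDb.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE signal ("
    "day_num REAL, vehicle_id INTEGER, trip_id INTEGER, time_stamp INTEGER)"
)

# Each tuple supplies the values for one row, in column order.
rows = [(1.5, 8, 706, 0), (1.5, 8, 706, 200), (1.5, 8, 706, 400)]

# Using the connection as a context manager wraps the insert in one
# transaction: it commits on success and rolls back on an exception.
with conn:
    conn.executemany("INSERT INTO signal VALUES (?, ?, ?, ?)", rows)

row_count = conn.execute("SELECT COUNT(*) FROM signal").fetchone()[0]
print(row_count)
conn.close()
```

Passing all rows to a single `executemany` call inside one transaction is usually much faster than committing row by row.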


Now we can load the static data from the ICE & HEV vehicles.

In [7]:
df_ice_hev = pd.read_excel("./ved/Data/VED_Static_Data_ICE&HEV.xlsx").replace('NO DATA', np.nan)
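
The `replace` call turns the literal string `NO DATA` into a proper missing value. A minimal sketch of that step on a made-up two-row DataFrame (the sample values are assumptions for the example, not rows read from the spreadsheet):

```python
import numpy as np
import pandas as pd

# Hypothetical sample; the real data comes from the Excel file.
sample = pd.DataFrame(
    {"VehId": [2, 7], "Transmission": ["NO DATA", "AUTOMATIC"]}
)

# Every cell equal to the string "NO DATA" becomes NaN.
cleaned = sample.replace("NO DATA", np.nan)
missing_flags = cleaned["Transmission"].isna().tolist()

print(missing_flags)
```

`pd.read_excel` also accepts an `na_values` argument, which may be an alternative way to get the same result at load time.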

In [8]:
df_ice_hev.head()

| | VehId | Vehicle Type | Vehicle Class | Engine Configuration & Displacement | Transmission | Drive Wheels | Generalized_Weight |
|---|---|---|---|---|---|---|---|
| 0 | 2 | ICE | Car | 4-FI 2.0L T/C | NaN | NaN | 3500.0 |
| 1 | 5 | HEV | Car | 4-GAS/ELECTRIC 2.0L | NaN | NaN | 3500.0 |
| 2 | 7 | ICE | SUV | 6-FI 3.6L | AUTOMATIC | NaN | 4500.0 |
| 3 | 8 | ICE | Car | 4-FI 1.5L | 5-SP MANUAL | NaN | 2500.0 |
| 4 | 12 | ICE | Car | 4-FI 1.8L | NaN | NaN | 2500.0 |


We follow by reading the PHEV and EV static data.

In [9]:
df_phev_ev = pd.read_excel("./ved/Data/VED_Static_Data_PHEV&EV.xlsx").replace('NO DATA', np.nan)

In [10]:
df_phev_ev.head()

| | VehId | EngineType | Vehicle Class | Engine Configuration & Displacement | Transmission | Drive Wheels | Generalized_Weight |
|---|---|---|---|---|---|---|---|
| 0 | 9 | PHEV | Car | 4-GAS/ELECTRIC 1.4L | NaN | FWD | 4000 |
| 1 | 10 | EV | Car | ELECTRIC | NaN | FWD | 3500 |
| 2 | 11 | PHEV | Car | 4-GAS/ELECTRIC 2.0L | CVT | FWD | 4000 |
| 3 | 371 | PHEV | Car | 4-GAS/ELECTRIC 2.0L | CVT | FWD | 4000 |
| 4 | 379 | PHEV | Car | 4-GAS/ELECTRIC 1.4L | NaN | FWD | 4000 |


Next, we collect all vehicle definitions in one list and bulk insert it into the database.

In [12]:
# Rows are converted to tuples in column order, so the differing
# column names of the two sheets do not matter here.
vehicles = list(df_ice_hev.itertuples(index=False))
vehicles.extend(df_phev_ev.itertuples(index=False))
    
db.insert_vehicles(vehicles)
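
The two static tables name their second column differently (`Vehicle Type` and `EngineType`), but the rows can still be merged into one list because tuples are positional. A minimal sketch on two made-up single-row DataFrames, using `name=None` so that `itertuples` returns plain tuples instead of named tuples:

```python
import pandas as pd

# Hypothetical one-row stand-ins for the two static tables.
ice_hev = pd.DataFrame({"VehId": [2], "Vehicle Type": ["ICE"]})
phev_ev = pd.DataFrame({"VehId": [9], "EngineType": ["PHEV"]})

# Column names are ignored; only the column order matters.
vehicles = [
    *ice_hev.itertuples(index=False, name=None),
    *phev_ev.itertuples(index=False, name=None),
]

print(vehicles)
```

This only works as long as both tables keep their columns in the same order, which is the case for the two previews shown above.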

This creates the first image of the database. In subsequent notebooks, we will use it to analyse the data further and derive some (hopefully) interesting models.

In [13]:
db.generate_moves()