In [4]:
import sys
sys.path.insert(0, "../")

import h5py
# from make_dataset import *

# make_dataset pipeline
This notebook seeks to explain the data-processing pipeline of our make_dataset.py file. It will take you through each step, going from the raw data to the segmented data with computed KPIs, while explaining the functions and choices made at each step.

<div style="text-align:center">
    <img src="../reports/figures/jupyter/make_dataset/data_warehouse.png" style="width:90%">
</div>

For each step, we first state the command that runs the *step* and then explain the relevant parts of the code running behind it. Additionally, we showcase what happens to the data after each step. For a quick overview, all commands are listed here.

#### ***Step-by-step***  
**Step 1 - Convert:**       `python src/data/make_dataset.py convert`  
**Step 2 - Validate:**      `python src/data/make_dataset.py validate` (\*)    
**Step 3 - Segment:**       `python src/data/make_dataset.py segment`  
**Step 4 - Match:**         `python src/data/make_dataset.py match`  
**Step 5 - Resample:**      `python src/data/make_dataset.py resample` (\*)  
**Step 6 - KPI:**           `python src/data/make_dataset.py kpi`

Note that the steps marked with (\*) also accept a `--verbose` flag, which may display plots and additional tqdm progress bars.

#### ***Everything-at-once***  
If you wish to run everything at once, use the command `python src/data/make_dataset.py all`. Adding `--verbose` still enables the verbose output at the relevant steps.
#### ***Begin-from-and-do-the-rest***
We added functionality that allows you to continue from any point in the data pipeline. For example, if you have already run `convert`, `validate` and `segment` and wish to run everything else at once, use the command `python src/data/make_dataset.py --begin-from match`.
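
The `--begin-from` behaviour can be pictured as slicing an ordered list of steps. The sketch below is only illustrative: `PIPELINE_STEPS` and `steps_to_run` are hypothetical names, not taken from `make_dataset.py`, and the actual command-line parsing may well differ.

```python
# Hypothetical sketch: the step order follows the overview above,
# but these names are illustrative and not taken from make_dataset.py.
PIPELINE_STEPS = ["convert", "validate", "segment", "match", "resample", "kpi"]


def steps_to_run(begin_from):
    """Return the given step and every step that follows it."""
    if begin_from not in PIPELINE_STEPS:
        raise ValueError(f"Unknown step: {begin_from!r}")
    start = PIPELINE_STEPS.index(begin_from)
    return PIPELINE_STEPS[start:]


print(steps_to_run("match"))
```

Under these assumptions, beginning from `match` selects `match`, `resample` and `kpi`, in that order.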


# Raw data

Before we dive into the data pipeline, let's get an overview of the different datasets we are dealing with. The raw data consists of four datasets:

#### ***1. GM data (AutoPi and CAN)***  

The GM data contains the car sensor measurements (AutoPi and CAN) from the CPH1 route, stored in an hdf5 file with the following structure.

>platoon_CPH1.hdf5<br>
│── GM<br>
│&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;│── Car ID [int]<br>
│&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;│&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;│── Pass ID [int]<br>
│&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;│&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;│&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;│── acc.xyz (11359, 4)<br>
│&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;│&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;│&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;│── acc_long (16011, 2)<br>
│&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;│&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;│&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;│── acc_trans (16011, 2)<br>
│&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;│&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;│&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;│── $\ldots$<br>
│&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;│&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;│&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;└── whl_trq_pot_ri (16010, 2)<br>
│&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;│&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;$\cdots$<br>
│&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;$\cdots$<br>
$\cdots$ (Not used)


#### ***2. GoPro***

Honestly, this one is a bit of a slog.


#### ***3. ARAN***

Used for computing KPIs

In [2]:
import pandas as pd

aran_raw = pd.read_csv("../data/raw/ref_data/cph1_aran_hh.csv", sep=';', encoding='unicode_escape')
aran_raw.head()

Unnamed: 0,L_Route_ID,DCSTimeStamp,BeginChainage,EndChainage,Venstre IRI (m/km),Højre IRI (m/km),Rivninger MeanRI (cm³/m²),Rivninger MeanExistingRI (cm³/m²),Rivninger MeanRPI (cm³/m²),Rivninger MeanAVC (cm³/m²),...,Latitude To (rad),Longitude From (rad),Longitude To (rad),Heading (rad),Elevation (m),Lat,Lon,Alt,Heading,Bearing
0,9990001-0-HVB1,44055.48542,-53.642797,-52.642797,0.0,0.0,,,,,...,0.971321,0.217917,0.217917,1.548209,38.407361,55.652595,12.485727,38.407361,88.705816,89.282278
1,9990001-0-HVB1,44055.48542,-52.642797,-51.642797,,,,,,,...,0.971321,0.217917,0.217917,1.548716,38.409111,55.652596,12.485742,38.409111,88.734917,89.282278
2,9990001-0-HVB1,44055.48542,-51.642797,-50.642797,,,,,,,...,0.971321,0.217917,0.217918,1.549302,38.414436,55.652596,12.485758,38.414436,88.768446,89.271525
3,9990001-0-HVB1,44055.48542,-50.642797,-49.642797,,,28.416636,32.142211,2238.577759,2266.9944,...,0.971321,0.217918,0.217918,1.550076,38.423914,55.652596,12.485774,38.423914,88.812826,89.265799
4,9990001-0-HVB1,44055.48542,-49.642797,-48.642797,,,,,,,...,0.971321,0.217918,0.217918,1.550698,38.431759,55.652596,12.48579,38.431759,88.848426,89.300827


#### ***4. P79***

Not actually used for now, but it is a very precise laser measurement of the surface of the road.
Using a model such as the ***quarter car model*** one can simulate a vehicle driving on the road.

In [3]:
p79_raw = pd.read_csv("../data/raw/ref_data/cph1_zp_hh.csv", sep=';', encoding='unicode_escape')
p79_raw.head()

Unnamed: 0,Distance [m],Laser 1 [mm],Laser 2 [mm],Laser 3 [mm],Laser 4 [mm],Laser 5 [mm],Laser 6 [mm],Laser 7 [mm],Laser 8 [mm],Laser 9 [mm],...,Laser 22 [mm],Laser 23 [mm],Laser 24 [mm],Laser 25 [mm],Lat,Lon,Højde,GeoHøjde,Alt,Bearing
0,0.0,77.979963,76.129577,73.432504,72.343552,71.085527,70.748823,71.22128,69.414377,66.948905,...,47.506321,44.715228,42.602378,40.864356,55.652685,12.488391,12.8,38.400002,51.200002,87.366884
1,0.100709,77.577069,75.484964,72.750423,71.660123,70.377488,69.797778,70.464433,69.102455,66.336974,...,47.03387,44.391341,42.129228,41.160619,55.652685,12.488392,12.8,38.400002,51.200002,87.366884
2,0.201419,76.67461,74.928609,72.165267,70.970617,69.58861,69.407065,69.937898,68.307188,65.791922,...,46.652856,44.162898,42.144453,41.047513,55.652685,12.488394,12.8,38.400002,51.200002,87.366884
3,0.302128,76.192724,74.53589,71.536658,70.283476,69.134717,68.887796,69.609289,68.062037,65.154582,...,46.271888,43.681742,41.807248,40.788764,55.652685,12.488396,12.8,38.400002,51.200002,87.366884
4,0.402838,75.589468,73.560795,70.603412,69.809506,68.749661,68.331567,69.250653,67.711902,64.769605,...,45.98265,43.383675,41.542447,40.4437,55.652685,12.488397,12.8,38.400002,51.200002,87.309725


# Step 1 - Converting data 

**Command:** `python src/data/make_dataset.py convert`

**Code in notebook:**

In [4]:
from src.data.data_functions.converting import convert
from src.data.data_functions.extract_gopro import preprocess_gopro_data

skip_convert = True

if not skip_convert:
    # Convert measurements of GM data
    hh = '../data/raw/AutoPi_CAN/platoon_CPH1_HH.hdf5'
    vh = '../data/raw/AutoPi_CAN/platoon_CPH1_VH.hdf5'
    convert(hh, vh)

    # Convert file structure of GoPro data
    folder = "../data/raw/gopro/"
    preprocess_gopro_data(folder)

#### ***AutoPi and CAN data***  
The raw data is the CPH1 route from the LiRA data-set, as can be found in [Table 7](https://doi.org/10.1016/j.dib.2023.109426) of the LiRA-CD paper. Our first goal, as described in *section 3* of the paper, is to perform translation (conversion) of some of the car sensors, 

$$
\begin{align*}
    s = ((s_{\text{LiRA-CD}} - b^* \cdot r^*) - b) \cdot r,
\end{align*}
$$
where $s_{\text{LiRA-CD}}$ is the sensor signal stored in LiRA-CD, $b^*$ and $r^*$ are the offset and resolution values (obtained through the CanZE application), and $b$ and $r$ are the corrected offset and resolution values (found in the LiRA project). The values are taken from the [paper](https://doi.org/10.1016/j.dib.2023.109426) and are further specified below,

```Python
CONVERT_PARAMETER_DICT = {
    'acc_long':     {'bstar': 198,      'rstar': 1,     'b': 198,   'r': 0.05   },
    'acc_trans':    {'bstar': 32768,    'rstar': 1,     'b': 32768, 'r': 0.04   },
    'acc_yaw':      {'bstar': 2047,     'rstar': 1,     'b': 2047,  'r': 0.1    },
    'brk_trq_elec': {'bstar': 4096,     'rstar': -1,    'b': 4098,  'r': -1     },
    'whl_trq_est':  {'bstar': 12800,    'rstar': 0.5,   'b': 12700, 'r': 1      },
    'trac_cons':    {'bstar': 80,       'rstar': 1,     'b': 79,    'r': 1      },
    'trip_cons':    {'bstar': 0,        'rstar': 0.1,   'b': 0,     'r': 1      }
}
```
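
To make the formula concrete, the following minimal sketch applies it to a single sample. The helper `convert_sample` is a hypothetical name used only for illustration (it is not the pipeline's `convert` function); it follows the equation exactly as written above and copies the `acc_long` entry from the dictionary. The stored value of 400 is made up.

```python
# Illustrative only: convert_sample is a hypothetical helper, not the pipeline's convert().
ACC_LONG = {'bstar': 198, 'rstar': 1, 'b': 198, 'r': 0.05}


def convert_sample(s_lira_cd, bstar, rstar, b, r):
    """Apply s = ((s_LiRA-CD - b* * r*) - b) * r to one stored sample."""
    return ((s_lira_cd - bstar * rstar) - b) * r


# A made-up stored value for acc_long
converted = convert_sample(400, **ACC_LONG)
print(converted)
```
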
In addition to performing the conversion, we also smooth some of the car sensor signals, as they are prone to noise and can show a lot of sporadic behavior. To smooth the signals, we use Locally Weighted Scatterplot Smoothing (LOWESS).

```Python
SMOOTH_PARAMETER_DICT = {
    'acc.xyz':       {'kind': 'lowess', 'frac': 0.005},
    'spd_veh':       {'kind': 'lowess', 'frac': 0.005},
    'acc_long':      {'kind': 'lowess', 'frac': 0.005},
    'acc_trans':     {'kind': 'lowess', 'frac': 0.005}
}
```
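
To give an idea of what LOWESS does with the `frac` parameter, here is a simplified, dependency-free sketch. It is not the pipeline's implementation: it skips the robustness iterations of full LOWESS, and the function name `lowess` and its signature are chosen here only for illustration. For each point, it fits a weighted straight line through the nearest `frac` share of the data, using tricube weights.

```python
import math


def lowess(x, y, frac=0.3):
    """Smooth y over x with local weighted linear fits (tricube weights).

    Simplified sketch: no robustness iterations and a naive neighbour search.
    """
    n = len(x)
    k = max(2, math.ceil(frac * n))  # number of neighbours per local fit
    smoothed = []
    for xi in x:
        # Indices of the k points closest to xi
        nearest = sorted(range(n), key=lambda j: abs(x[j] - xi))[:k]
        h = max(abs(x[j] - xi) for j in nearest) or 1.0  # local bandwidth
        w = [(1 - (abs(x[j] - xi) / h) ** 3) ** 3 for j in nearest]
        sw = sum(w)
        mx = sum(wj * x[j] for wj, j in zip(w, nearest)) / sw
        my = sum(wj * y[j] for wj, j in zip(w, nearest)) / sw
        sxx = sum(wj * (x[j] - mx) ** 2 for wj, j in zip(w, nearest))
        sxy = sum(wj * (x[j] - mx) * (y[j] - my) for wj, j in zip(w, nearest))
        slope = sxy / sxx if sxx > 1e-12 else 0.0  # degenerate fit: use weighted mean
        smoothed.append(my + slope * (xi - mx))
    return smoothed


# Toy signal: a straight line with zig-zag noise
x = [float(i) for i in range(20)]
y = [2 * xi + (1 if i % 2 else -1) for i, xi in enumerate(x)]
y_smooth = lowess(x, y, frac=0.5)
```

A small `frac` such as 0.005 means each local fit only sees 0.5% of the samples, so the smoothing stays very local. In practice a library routine such as `statsmodels.nonparametric.smoothers_lowess.lowess`, which also takes a `frac` argument, would typically be preferred over a hand-written loop like this one.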

#### ***GoPro Data***  

The values of the GoPro data are not altered by any conversion, but we do change the structure of the files. Instead of storing the measurements according to each GoPro recording, the measurements are paired with the corresponding car (***16006***, ***16009*** or ***16011***), and all passes on the road in the given trip are joined in a single csv file for each measurement type (***accl***, ***gps5***, ***gyro***). This aligns the structures of the two possible input data sources, ***GM*** and ***GoPro***, which makes matching easier in the coming steps.


# Step 2 - Validation
**Command:** `python src/data/make_dataset.py validate` (`--verbose` can be added)

**Code in notebook:**

In [5]:
from src.data.data_functions.validating import validate

skip_validation = True

if not skip_validation:
    # Validate the GM measurements after conversion
    hh = "../data/interim/gm/converted_platoon_CPH1_HH.hdf5"
    vh = "../data/interim/gm/converted_platoon_CPH1_VH.hdf5"
    validate(hh=hh, vh=vh, threshold=0.9, verbose=False)

To validate that our conversions have been done correctly and that the smoothing with LOWESS has improved the signal, we compare the AutoPi data with the CAN data.

#### Process data for comparison
1. We fix the sampling frequency (used to calculate the time) to $f_s=10$.
2. Extract the speed distance, calculated from the vehicle speed in the GM data (`spd_veh`).
3. Extract the GPS (`gps`) longitude and latitude data.
4. Extract the odometer (`odo`) distance measure and add the fine distance measure (`f_dist`), all computed in meters.
5. Extract and normalise the AutoPi 3D accelerations (`acc.xyz`).
6. Extract and normalise the transverse (`acc_trans`) and longitudinal (`acc_long`) accelerations.
7. Resample time to 100 Hz.
8. Resample all extracted data to 100 Hz via interpolation in the function `clean_int(...)`.
9. Ensure accelerations are in $m/s^2$ and not $g$.
10. Determine the orientation of the sensors and reorient if needed, based on the correlation of the AutoPi 3D x- and y-accelerations with the CAN accelerations.

#### Compare and calculate correlation coefficients
1. Compare x-accelerations
2. Compare y-accelerations
3. Compare speed distance with gps and speed distance with odometer.
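
The comparison boils down to a correlation coefficient that is checked against a threshold. The sketch below shows one way this could look. The helper names `pearson` and `reorient_and_check` are hypothetical, and the reorientation here only handles a sign flip of a single axis, which is a simplifying assumption; the actual `validate` function may also handle swapped axes.

```python
import math


def pearson(a, b):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((ai - ma) * (bi - mb) for ai, bi in zip(a, b))
    var_a = sum((ai - ma) ** 2 for ai in a)
    var_b = sum((bi - mb) ** 2 for bi in b)
    return cov / math.sqrt(var_a * var_b)


def reorient_and_check(autopi, can, threshold):
    """Flip the AutoPi axis if it is anti-correlated with CAN, then test the threshold."""
    r = pearson(autopi, can)
    if r < 0:
        autopi = [-v for v in autopi]  # sensor is mounted in the opposite direction
        r = -r
    return autopi, r, r >= threshold


# Toy signals: the AutoPi axis is an inverted copy of the CAN signal
can_acc = [0.0, 1.0, 0.5, -0.5, 2.0]
autopi_acc = [-v for v in can_acc]
fixed, r, passed = reorient_and_check(autopi_acc, can_acc, threshold=0.9)
```
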

**NOTE**: This step does not change the structure of the data.

Below we see examples for each of the three comparisons made in our validation step.

<div style="text-align:center">
    <img src="../reports/figures/jupyter/make_dataset/validate_data.png" style="width:80%">
</div>

As seen in the plot above, the data from AutoPi and CAN follow each other relatively well, with correlation coefficients above our threshold of 0.8.

# Step 3 - Segment

**Command:** `python src/data/make_dataset.py segment` (`--verbose` can be added)

**Code in notebook:**

In [7]:
from src.data.data_functions.segmenting import segment

skip_segment = True

if not skip_segment:
    # Segment the GM measurements after conversion
    hh = "../data/interim/gm/converted_platoon_CPH1_HH.hdf5"
    vh = "../data/interim/gm/converted_platoon_CPH1_VH.hdf5"
    speed_threshold = 5
    time_threshold = 10
    segment(hh=hh, vh=vh, speed_threshold=speed_threshold, time_threshold=time_threshold)

The segmentation step is *only* done on the ***GM*** data. It arises from the observation that data from time-stamps where the car is (close to) stationary, e.g. sitting in traffic or at a traffic light, adds little to no information about the road profile. To account for this, we split all trips and passes for all cars into segments, and restructure the hdf5 tree into the following.

#### The restructured data
>segments.hdf5<br>
│── Segment ID [int]<br>
│&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;│── *attributes*<br>
│&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;│&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;│── *direction* [str]<br>
│&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;│&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;│── *trip_name* [str]<br>
│&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;│&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;└── *pass_name* [str]<br>
│&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;│── acc.xyz [array] (1805, 4)<br>
│&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;│── acc_long [array] (2550, 2)<br>
│&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;│── acc_trans [array] (2550, 2)<br>
│&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;│── $\ldots$<br>
│&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;└── whl_trq_pot_ri [array] (2548, 2)<br>
│── $\cdots$<br>
$\cdots$

As can be seen in the new data structure, we reformat the data such that the direction, trip name and pass name are all saved as attributes of each segment. Instead of a hierarchy going from `direction -> trip-name -> pass-name -> segment -> measurements`, we now have `segment -> {direction, trip-name, pass-name, measurements}`. This allows for a more comprehensible workflow when training machine learning models.
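
The restructuring can be illustrated with plain dictionaries standing in for the hdf5 groups. This is only a sketch under assumptions: `flatten_segments` is a hypothetical helper, and the direction, trip and pass names in the toy input are made up.

```python
def flatten_segments(tree):
    """Turn direction -> trip -> pass -> [segments] into a flat dict keyed by segment ID."""
    flat = {}
    seg_id = 0
    for direction, trips in tree.items():
        for trip_name, passes in trips.items():
            for pass_name, segments in passes.items():
                for measurements in segments:
                    flat[seg_id] = {
                        "attributes": {
                            "direction": direction,
                            "trip_name": trip_name,
                            "pass_name": pass_name,
                        },
                        "measurements": measurements,
                    }
                    seg_id += 1
    return flat


# Toy hierarchy with made-up names and values
nested = {
    "hh": {"trip_a": {"pass_1": [{"spd_veh": [1, 2]}, {"spd_veh": [3]}]}},
    "vh": {"trip_b": {"pass_1": [{"spd_veh": [4]}]}},
}
flat = flatten_segments(nested)
```

With the flat layout, a training loop can iterate over segment IDs directly and read the direction, trip and pass from the attributes when needed.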

**Now, how do we do it?** To assess whether a car is stationary or almost stationary, we say that if the speed is less than the `speed_threshold` for more than `time_threshold` seconds, we *clip* that part out and split the data into segments around it. For our data, we used,

- `speed_threshold = 5` 
- `time_threshold = 10` 
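
The clipping rule can be sketched as a single pass over the speed signal. The function `split_on_stops` below is a hypothetical illustration, not the pipeline's `segment` function; it assumes timestamps in seconds and returns index ranges rather than writing hdf5 groups.

```python
def split_on_stops(times, speeds, speed_threshold=5, time_threshold=10):
    """Return (start, end) index pairs (end exclusive) of the parts that remain
    after removing every slow run lasting longer than time_threshold seconds."""
    segments = []
    seg_start = 0      # first index of the segment currently being built
    run_start = None   # first index of the current slow run, if any
    n = len(times)
    for i in range(n + 1):
        slow = i < n and speeds[i] < speed_threshold
        if slow:
            if run_start is None:
                run_start = i
        elif run_start is not None:
            # The slow run covers indices run_start .. i - 1
            if times[i - 1] - times[run_start] > time_threshold:
                if run_start > seg_start:
                    segments.append((seg_start, run_start))
                seg_start = i  # the next segment starts after the stop
            run_start = None
    if seg_start < n:
        segments.append((seg_start, n))
    return segments


# Toy trip sampled once per second: drive 5 s, stand still 15 s, drive 10 s
times = [float(t) for t in range(30)]
speeds = [10.0] * 5 + [0.0] * 15 + [10.0] * 10
print(split_on_stops(times, speeds))
```

In this toy trip the 15-second stop is clipped out, leaving one segment before it and one after it. Slow runs shorter than `time_threshold` stay inside their segment.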

# Step 4 - Match
**Command:** `python src/data/make_dataset.py match`

**Code in notebook:**

In [8]:
from src.data.data_functions.matching import match_data

skip_match = True

if not skip_match:
    aran_hh = "../data/raw/ref_data/cph1_aran_hh.csv"
    aran_vh = "../data/raw/ref_data/cph1_aran_vh.csv"
    p79_hh = "../data/raw/ref_data/cph1_zp_hh.csv"
    p79_vh = "../data/raw/ref_data/cph1_zp_vh.csv"

    # Matching the reference data and the GoPro data with the segments of the GM data
    match_data(aran_hh, aran_vh, p79_hh, p79_vh)

Now that we have extracted the segments of the GM data which *hopefully* contain meaningful measurements, we simply need to extract the corresponding segments from the three other datasets, namely: ***GoPro***, ***ARAN*** and ***P79***.

In the case of the ***GoPro*** data, we cannot find the appropriate data for each segment using the longitude and latitude coordinates, since each car drove multiple laps on the route.
Therefore, we instead match the segments using the timestamps of the first and last measurements in each segment with the timestamps of the ***GoPro*** data.
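
Matching by time amounts to selecting the GoPro samples that fall inside the time window of a GM segment. A minimal sketch, assuming the GoPro timestamps are sorted and expressed on the same clock as the GM data (`slice_by_time` is a hypothetical helper name):

```python
from bisect import bisect_left, bisect_right


def slice_by_time(timestamps, t_start, t_end):
    """Return the index range [lo, hi) of sorted timestamps inside [t_start, t_end]."""
    lo = bisect_left(timestamps, t_start)
    hi = bisect_right(timestamps, t_end)
    return lo, hi


# Toy GoPro timestamps in seconds and a GM segment running from 0.5 s to 2.0 s
gopro_times = [0.0, 0.5, 1.0, 1.5, 2.0, 2.5]
lo, hi = slice_by_time(gopro_times, 0.5, 2.0)
matched = gopro_times[lo:hi]
```
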

The ***ARAN*** and ***P79*** data contain only a single lap of the route (which is a subsection of the entire route of the reference data cars). Therefore, we can match the reference data measurements with the GM segments by comparing the longitude and latitude coordinates of the reference data with the coordinates of the first and last measurements in each GM segment.
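
One way to picture the coordinate matching is a nearest-neighbour lookup on the reference coordinates. The sketch below uses the haversine distance and a brute-force search; the function names are hypothetical and the distance measure is an assumption, since the pipeline's `match_data` may compare coordinates differently. The reference coordinates are the `Lat` and `Lon` values from the `aran_raw.head()` output above, while the segment endpoints are made up.

```python
import math


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between two points given in degrees."""
    radius = 6371000.0  # mean Earth radius in metres
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = p2 - p1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2
    return 2 * radius * math.asin(math.sqrt(a))


def nearest_index(ref_lats, ref_lons, lat, lon):
    """Index of the reference row closest to (lat, lon)."""
    return min(range(len(ref_lats)),
               key=lambda i: haversine_m(ref_lats[i], ref_lons[i], lat, lon))


def match_segment(ref_lats, ref_lons, seg_start, seg_end):
    """Reference index range spanned by a segment's first and last (lat, lon)."""
    i = nearest_index(ref_lats, ref_lons, *seg_start)
    j = nearest_index(ref_lats, ref_lons, *seg_end)
    return min(i, j), max(i, j)


# Reference coordinates from the first five ARAN rows shown earlier
aran_lats = [55.652595, 55.652596, 55.652596, 55.652596, 55.652596]
aran_lons = [12.485727, 12.485742, 12.485758, 12.485774, 12.48579]
# Made-up first and last coordinates of a GM segment
first, last = match_segment(aran_lats, aran_lons,
                            (55.652596, 12.485743), (55.652596, 12.485773))
```
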

The matching segments for ***GoPro***, ***ARAN*** and ***P79*** are stored in separate `segments.hdf5` files within their respective folders, `interim/gopro`, `interim/aran` and `interim/p79`.

### Segment data structures

For complete clarity, we show the structure of the interim segment data for ***GoPro***, ***ARAN*** and ***P79***, in line with how ***GM*** was represented in the previous section.

#### ***GoPro***

>interim/gopro/segments.hdf5<br>
│── Segment ID [int]<br>
│&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;│── *attributes*<br>
│&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;│&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;│── *direction* [str]<br>
│&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;│&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;│── *trip_name* [str]<br>
│&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;│&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;└── *pass_name* [str]<br>
│&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;│── accl<br>
│&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;│&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;│── 'Accelerometer (x) [m_s2]' [array] (4911,)<br>
│&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;│&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;│── 'Accelerometer (y) [m_s2]' [array] (4911,)<br>
│&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;│&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;│── 'Accelerometer (z) [m_s2]' [array] (4911,)<br>
│&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;│&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;│── cts [array] (4911,)<br>
│&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;│&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;│── date [array] (4911,)<br>
│&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;│&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;└── 'temperature [°C]' [array] (4911,)<br>
│&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;│── gps5<br>
│&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;│&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;│── 'GPS (2D speed) [m_s]' [array] (442,)<br>
│&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;│&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;│── 'GPS (3D speed) [m_s]' [array] (442,)<br>
│&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;│&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;│── 'GPS (Alt.) [m]' [array] (442,)<br>
│&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;│&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;│── 'GPS (Lat.) [deg]' [array] (442,)<br>
│&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;│&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;│── 'GPS (Long.) [deg]' [array] (442,)<br>
│&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;│&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;│── cts [array] (442,)<br>
│&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;│&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;└── date [array] (442,)<br>
│&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;└── gyro<br>
│&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;│── 'Gyroscope (x) [rad_s]' [array] (4911,)<br>
│&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;│── 'Gyroscope (y) [rad_s]' [array] (4911,)<br>
│&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;│── 'Gyroscope (z) [rad_s]' [array] (4911,)<br>
│&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;│── cts [array] (4911,)<br>
│&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;│── date [array] (4911,)<br>
│&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;└── 'temperature [°C]' [array] (4911,)<br>
│── $\cdots$<br>
$\cdots$


#### ***ARAN***

>interim/aran/segments.hdf5<br>
│── Segment ID [int]<br>
│&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;│── *attributes*<br>
│&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;│&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;│── *direction* [str]<br>
│&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;│&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;│── *trip_name* [str]<br>
│&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;│&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;└── *pass_name* [str]<br>
│&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;│── Alt [array] (309,)<br>
│&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;│── Bearing [array] (309,)<br>
│&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;│── BeginChainage [array] (309,)<br>
│&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;│── $\ldots$<br>
│&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;└── 'Venstre Wheelpath Texture MPD (mm)' [array] (309,)<br>
│── $\cdots$<br>
$\cdots$


#### ***P79***

>interim/p79/segments.hdf5<br>
│── Segment ID [int]<br>
│&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;│── *attributes*<br>
│&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;│&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;│── *direction* [str]<br>
│&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;│&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;│── *trip_name* [str]<br>
│&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;│&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;└── *pass_name* [str]<br>
│&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;│── GeoHøjde [array] (3061,)<br>
│&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;│── Højde [array] (3061,)<br>
│&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;│── 'Laser 1 [mm]' [array] (3061,)<br>
│&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;│── $\ldots$<br>
│&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;└── Lon [array] (3061,)<br>
│── $\cdots$<br>
$\cdots$

# Step 5 - Resample
**Command:** `python src/data/make_dataset.py resample` (`--verbose` can be added)

**Code in notebook:**

In [4]:
from src.data.data_functions.resampling import resample

skip_resample = True

if not skip_resample:

    gm_file = "../data/interim/gm/segments.hdf5"
    gopro_file = "../data/interim/gopro/segments.hdf5"
    aran_file = "../data/interim/aran/segments.hdf5"
    p79_file = "../data/interim/p79/segments.hdf5"

    # Resampling the data to a common time base
    resample(gm_file, gopro_file, aran_file, p79_file, verbose=False)

Finally, we are ready to combine the four datasets into a single dataset with input data (***GM*** and ***GoPro***) and the corresponding targets (***ARAN*** and ***P79***).

However, as the input data currently stands, the sensors measure values at different frequencies. To line up the input data as $X_j = \begin{bmatrix}\boldsymbol{x_1}, \boldsymbol{x_2}, \ldots, \boldsymbol{x_n}\end{bmatrix}^T$, where $\boldsymbol{x_i}$ is a vector of a single measurement from each relevant sensor (*hopefully*) at the same location on the road, we need to resample the measurements from each sensor to a fixed frequency (chosen to be 250 Hz). This is achieved using simple linear interpolation with respect to the relative time driven in each segment.
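
Linear interpolation onto a fixed-frequency time grid can be sketched as follows. The function `resample_linear` is a hypothetical, dependency-free illustration rather than the pipeline's `resample` function; it assumes at least two samples with increasing timestamps given in seconds relative to the start of the segment.

```python
def resample_linear(times, values, freq):
    """Linearly interpolate (times, values) onto a uniform grid with `freq` samples per second."""
    t0, t1 = times[0], times[-1]
    n_out = int((t1 - t0) * freq) + 1
    new_times = [t0 + k / freq for k in range(n_out)]
    new_values = []
    j = 0  # index of the interval [times[j], times[j + 1]] used for the current t
    for t in new_times:
        while j < len(times) - 2 and times[j + 1] < t:
            j += 1
        span = times[j + 1] - times[j]
        alpha = (t - times[j]) / span if span > 0 else 0.0
        new_values.append(values[j] + alpha * (values[j + 1] - values[j]))
    return new_times, new_values


# Toy sensor with irregular timestamps, resampled to 2 Hz for readability
t_new, v_new = resample_linear([0.0, 1.0, 3.0], [0.0, 2.0, 6.0], freq=2)
```

Applying the same target grid to every sensor in a segment gives rows that share timestamps and can be stacked column by column. With NumPy available, `numpy.interp` performs the same kind of interpolation.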

Afterwards, each segment is split into data points $X_j$, called *bits*, each consisting of 250 resampled measurements of each sensor. For each *bit*, we then extract the corresponding target data from ***ARAN*** and ***P79*** by finding the closest longitude and latitude coordinates, as in Step 4 - Match.
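
Since the data is resampled to 250 Hz, a *bit* of 250 rows covers one second of driving. Splitting a resampled segment into bits could look like the sketch below. The helper `split_into_bits` is hypothetical, and dropping an incomplete trailing bit is an assumption made here, not something confirmed by the pipeline.

```python
def split_into_bits(samples, bit_size=250):
    """Split resampled rows into consecutive bits of `bit_size` rows.

    Assumption: an incomplete bit at the end of the segment is dropped.
    """
    n_bits = len(samples) // bit_size
    return [samples[k * bit_size:(k + 1) * bit_size] for k in range(n_bits)]


# Toy segment of 600 resampled rows (2.4 seconds at 250 Hz)
rows = list(range(600))
bits = split_into_bits(rows)
```
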

The end result is a single hdf5 file of processed and (*hopefully*) well-aligned data, which is stored in the file `processed/wo_kpis/segments.hdf5`. The name `wo_kpis` is short for *without KPIs*, as they still need to be computed from the ***ARAN*** *bits*. This process is described in the next step.

#### The processed data (without KPIs)
>processed/wo_kpis/segments.hdf5<br>
│── Segment ID [int]<br>
│&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;│── *attributes*<br>
│&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;│&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;│── *direction* [str]<br>
│&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;│&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;│── *trip_name* [str]<br>
│&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;│&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;└── *pass_name* [str]<br>
│&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;│── Second [int]<br>
│&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;│&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;│── aran [array] (14, 79)<br>
│&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;│&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;│&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;└── *attributes*<br>
│&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;│&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;│&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;└── *col_name* [str] : *col_idx* [int]<br>
│&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;│&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;│── gm [array] (250, 42)<br>
│&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;│&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;│&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;└── *attributes*<br>
│&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;│&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;│&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;└── *col_name* [str] : *col_idx* [int]<br>
│&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;│&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;│── gopro [array] (250, 15)<br>
│&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;│&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;│&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;└── *attributes*<br>
│&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;│&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;│&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;└── *col_name* [str] : *col_idx* [int]<br>
│&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;│&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;└── p79 [array] (128, 32)<br>
│&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;│&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;└── *attributes*<br>
│&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;│&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;└── *col_name* [str] : *col_idx* [int]<br>
│&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;│── Second [int]<br>
│&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;│── $\cdots$<br>
│── $\cdots$<br>
$\cdots$

# Step 6 - KPIs
**Command:** `python src/data/make_dataset.py kpi`

**Code in notebook:**

In [9]:
from src.data.data_functions.kpis import compute_kpis

skip_kpi = True

if not skip_kpi:
    segment_path = "../data/processed/wo_kpis/segments.hdf5"
    window_size = [1, 2]
    compute_kpis(segment_path=segment_path, window_size=window_size)

We are interested in certain Key Performance Indices (KPIs) that we wish to predict at the end of our pipeline with a machine learning model. For each second in each segment, we calculate KPIs based on a specified window size. We compute the KPIs based on the LiRA practical guide, but with some small variations.

In [Practical Guide], they present the following five distress types for which they calculate a Damage Index (DI) and a Key Performance Index (KPI):
\begin{align*}
    \text{CrackingSum} &=  (\text{LCS}^2+\text{LCM}^3+\text{LCL}^4 +\text{LCSe}^2+3 \text{TCS} +4 \text{TCM}+5 \text{TCL}+2\text{TCSe})^{0.1} \\
    \text{AlligatorSum} &=  (3\text{ACS} + 4\text{ACM} + 5\text{ACL})^{0.3} \\
    \text{PotholesSum} &=  (5\text{PAS}+7 \text{PAM}+10 \text{PAL}+5 \text{PAD})^{0.1} \\
    \text{RuttingMean} &= ((\text{RDL} + \text{RDR})/2)^{0.5} \\
    \text{IRIMean} &=  ((\text{IRL} + \text{IRR})/2)^{0.2}
\end{align*}
where the first two letters of each variable of the equations are used to identify the distress type or a road property: *LC* is Longitudinal Cracks, *TC* is Transversal Cracks, *AC* is Alligator Cracks, *PA* is Pothole Areas, *RD* is Rutting Depth, and *IR* is the International Roughness Index. The third letter defines the class: *S* is Small, *M* is Medium, *L* is Large, *Se* is Sealed, and *D* is Delamination; in the last two equations, the third letter specifies the wheel path: *L* stands for Left and *R* for Right.

Based on the information from (doi:10.1061/JPEODX.PVENG-1385) and conversations with our supervisor, it was deemed favourable to have an additional index for when patching occurs on the road. It is a recurring problem for "Vejdirektoratet" that they do not know where they have patched previous potholes. We therefore propose an additional metric called PatchIndex (PI), which in turn alters the CrackingSum metric,

\begin{align*}
    \text{PatchingSum} &= (\text{LCSe}^2 + 2\text{TCSe})^{0.4}\\
    \text{CrackingSum} &=  (\text{LCS}^2+\text{LCM}^3+\text{LCL}^4+3 \text{TCS} +4 \text{TCM}+5 \text{TCL})^{0.1} \\
    \text{AlligatorSum} &=  (3\text{ACS} + 4\text{ACM} + 5\text{ACL})^{0.3} \\
    \text{PotholesSum} &=  (5\text{PAS}+7 \text{PAM}+10 \text{PAL}+5 \text{PAD})^{0.1} \\
    \text{RuttingMean} &= ((\text{RDL} + \text{RDR})/2)^{0.5} \\
    \text{IRIMean} &= ((\text{IRL} + \text{IRR})/2)^{0.2}
\end{align*}
and our tailored KPIs then become
\begin{align*}
    \text{KPI}_{\text{DI}} &= \text{CrackingSum} + \text {AlligatorSum} + \text{PotholesSum} \\
    \text{KPI}_{\text{RUT}} &= \text{RuttingMean}\\
    \text{KPI}_{\text{PI}} &= \text{PatchingSum} \\
    \text{KPI}_{\text{IRI}} &= \text{IRIMean}\\
\end{align*}
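
As a rough illustration of how the tailored KPIs follow from the distress values of one window, consider the sketch below. The function `compute_kpis_sketch` and the dictionary keys are hypothetical and do not mirror the pipeline's `compute_kpis`. How the ARAN rows are aggregated over the window is not covered here. The sketch also reads the third longitudinal cracking term as large cracks (`LCL`), which is an assumption.

```python
def compute_kpis_sketch(d):
    """Compute the four tailored KPIs from a dict of distress values for one window."""
    patching = (d["LCSe"] ** 2 + 2 * d["TCSe"]) ** 0.4
    # Assumption: the third longitudinal term uses large cracks (LCL)
    cracking = (d["LCS"] ** 2 + d["LCM"] ** 3 + d["LCL"] ** 4
                + 3 * d["TCS"] + 4 * d["TCM"] + 5 * d["TCL"]) ** 0.1
    alligator = (3 * d["ACS"] + 4 * d["ACM"] + 5 * d["ACL"]) ** 0.3
    potholes = (5 * d["PAS"] + 7 * d["PAM"] + 10 * d["PAL"] + 5 * d["PAD"]) ** 0.1
    rutting = ((d["RDL"] + d["RDR"]) / 2) ** 0.5
    iri = ((d["IRL"] + d["IRR"]) / 2) ** 0.2
    return {
        "KPI_DI": cracking + alligator + potholes,
        "KPI_RUT": rutting,
        "KPI_PI": patching,
        "KPI_IRI": iri,
    }


# Made-up window: no cracks or potholes, some rutting, roughness and sealed cracks
names = ["LCS", "LCM", "LCL", "LCSe", "TCS", "TCM", "TCL", "TCSe",
         "ACS", "ACM", "ACL", "PAS", "PAM", "PAL", "PAD",
         "RDL", "RDR", "IRL", "IRR"]
window = {name: 0.0 for name in names}
window.update({"RDL": 4.0, "RDR": 4.0, "IRL": 1.0, "IRR": 1.0, "LCSe": 1.0})
kpis = compute_kpis_sketch(window)
```
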

**NOTE**: With these new KPI values for each second, we get the final structure of the processed data.

#### The complete processed data
>processed/w_kpis/segments.hdf5<br>
│── Segment ID [int]<br>
│&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;│── *attributes*<br>
│&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;│&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;│── *direction* [str]<br>
│&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;│&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;│── *trip_name* [str]<br>
│&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;│&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;└── *pass_name* [str]<br>
│&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;│── Second [int]<br>
│&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;│&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;│── aran [array] (14, 79)<br>
│&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;│&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;│&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;└── *attributes*<br>
│&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;│&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;│&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;└── *col_name* [str] : *col_idx* [int]<br>
│&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;│&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;│── gm [array] (250, 42)<br>
│&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;│&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;│&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;└── *attributes*<br>
│&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;│&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;│&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;└── *col_name* [str] : *col_idx* [int]<br>
│&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;│&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;│── gopro [array] (250, 15)<br>
│&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;│&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;│&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;└── *attributes*<br>
│&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;│&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;│&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;└── *col_name* [str] : *col_idx* [int]<br>
│&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;│&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;│── p79 [array] (128, 32)<br>
│&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;│&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;│&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;└── *attributes*<br>
│&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;│&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;│&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;└── *col_name* [str] : *col_idx* [int]<br>
│&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;│&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;└── kpis [array]<br>
│&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;│&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;│── window_size[0] [array] (4, )<br>
│&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;│&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;└── window_size[1] [array] (4, )<br>
│&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;│── Second [int]<br>
│&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;│── $\cdots$<br>
│── $\cdots$<br>
$\cdots$

## Training Data

#### ***NOTE:***
In the training process, we will only deal with the ***GM*** and (*possibly*) the ***GoPro*** parts as training data. The targets of the training process are the ***KPIs***, calculated in step 6, so the ***ARAN*** and ***P79*** parts are no longer relevant.