In [1]:
import pandas as pd
from sktime.utils.data_processing import from_multi_index_to_nested

import plotly.express as px

In [2]:
data = pd.read_csv('cleaned_data.csv', index_col=['vehicle_id', 'timestep_time'])

To work with the time-series algorithms from the `sktime` package, we need to transform our dataset into a *nested* one: a dataset in which each row contains an instance (vehicle ID), each column contains a feature, and each cell is an entire time series. To do this, we first index our dataframe by both `vehicle_id` and `timestep_time` (done above when reading the CSV), making it a multi-indexed dataframe. Then, the function `from_multi_index_to_nested` transforms this structure into a nested frame.

In [3]:
# keep the vehicle IDs so they can be restored as the index after the conversion
index = data.index.get_level_values('vehicle_id').unique()
# transform the dataframe into a nested structure of time series
data = from_multi_index_to_nested(data, instance_index='vehicle_id')
data.index = index

# after the transformation, each `vehicle_type` cell holds a series that repeats the same value;
# collapse it back into a single value per vehicle
data['vehicle_type'] = data['vehicle_type'].map(lambda vehicle_type_series: vehicle_type_series.iloc[0])
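
To make the nested layout concrete, here is a minimal sketch on hypothetical toy data that builds the same kind of structure with plain pandas. It is not how `from_multi_index_to_nested` is implemented. The values and the names `long`, `groups` and `nested` are made up for illustration, and only the column names follow the dataset above.

```python
import pandas as pd

# Hypothetical toy data: two vehicles observed at three timesteps each.
long = pd.DataFrame(
    {
        "vehicle_id": ["a", "a", "a", "b", "b", "b"],
        "timestep_time": [0, 1, 2, 0, 1, 2],
        "vehicle_speed": [0.0, 1.5, 3.0, 2.0, 2.0, 2.5],
        "vehicle_type": ["car", "car", "car", "bus", "bus", "bus"],
    }
).set_index(["vehicle_id", "timestep_time"])

# Split the long frame into one sub-frame per vehicle, indexed by timestep only.
groups = {
    vehicle_id: group.droplevel("vehicle_id")
    for vehicle_id, group in long.groupby(level="vehicle_id")
}

# One row per vehicle; each cell stores that vehicle's whole series for the column.
nested = pd.DataFrame(index=pd.Index(list(groups), name="vehicle_id"))
for column in long.columns:
    nested[column] = [group[column] for group in groups.values()]

# As in the cell above, collapse the constant vehicle type to a single value.
nested["vehicle_type"] = nested["vehicle_type"].map(lambda s: s.iloc[0])

print(nested.loc["a", "vehicle_speed"])  # an entire time series in one cell
print(nested["vehicle_type"])
```

Each `vehicle_speed` cell is itself a `pd.Series` indexed by `timestep_time`, which is why the feature code in the following cells can call series methods on every cell through `.map`.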

We'll now create some new features based on the speed and orientation of the vehicles and see how they behave with respect to the class each vehicle belongs to. To do so, we'll compute the mean and standard deviation of each vehicle's speed, as well as the absolute change in speed and angle between consecutive timesteps and the mean of those changes for each vehicle.

In [19]:
# speed mean and standard deviation
data['mean_vehicle_speed'] = data['vehicle_speed'].map(lambda speed: speed.mean())
data['std_vehicle_speed'] = data['vehicle_speed'].map(lambda speed: speed.std())

# speed and angle variation
data['speed_variation'] = data['vehicle_speed'].map(lambda speed: speed.diff().abs())
data['angle_variation'] = data['vehicle_angle'].map(lambda angle: angle.diff().abs())

# speed and angle variation mean
data['mean_speed_variation'] = data['speed_variation'].map(lambda speed_variation: speed_variation.mean())
data['mean_angle_variation'] = data['angle_variation'].map(lambda angle_variation: angle_variation.mean())
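
As a small worked example of these features, the sketch below applies the same `diff().abs()` and `mean()` calls to hypothetical toy series. It also illustrates one caveat: assuming `vehicle_angle` is a heading in degrees in the range [0, 360) (an assumption, not something verified on this dataset), a plain difference overstates the change when the heading crosses 0. The wrapped variant is a possible alternative, not what the cell above computes.

```python
import pandas as pd

# Hypothetical speed series for a single vehicle.
speed = pd.Series([0.0, 2.0, 5.0, 4.0])
speed_variation = speed.diff().abs()           # first entry is NaN (no previous timestep)
mean_speed_variation = speed_variation.mean()  # NaN is skipped: (2 + 3 + 1) / 3

# Hypothetical heading series, ASSUMED to be in degrees within [0, 360).
angle = pd.Series([350.0, 10.0, 20.0])

# Plain absolute difference, as in the cell above: 350 -> 10 counts as 340 degrees.
naive_variation = angle.diff().abs()

# Wrapped difference: map each change into [-180, 180) before taking abs,
# so that 350 -> 10 counts as a 20 degree turn.
wrapped_variation = ((angle.diff() + 180) % 360 - 180).abs()

print(mean_speed_variation)
print(naive_variation.tolist())
print(wrapped_variation.tolist())
```

Whether the wrapped version is appropriate depends on how the angle is actually encoded in the data, so it is worth checking the range of `vehicle_angle` before relying on either variant.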

In [78]:
for feature in ['mean_vehicle_speed', 'std_vehicle_speed', 'mean_speed_variation', 'mean_angle_variation']:

    fig = px.histogram(data, x=feature,
                       histnorm='percent', color='vehicle_type',
                       title=feature + ' percent distribution', height=300, width=700)
    fig.show()

Based on the plots above, looking at the data only in a tabular manner (that is, ignoring the *time* aspect) does not seem to be a good way to distinguish cars from buses. We'll therefore approach this problem using algorithms that take the time dimension into account.