# Multi-environment automotive QoS prediction using AI/ML
## Starter notebook

An example solution to the problem statement "Multi-environment automotive QoS prediction using AI/ML" from the [ITU AI/ML in 5G Challenge 2023](https://aiforgood.itu.int/about-ai-for-good/aiml-in-5g-challenge/).


In [1]:
from pathlib import Path
import pandas as pd
from sklearn.ensemble import RandomForestRegressor

### Data loading

The data is available at [IEEE Dataport](https://ieee-dataport.org/open-access/berlin-v2x).

You can load the preprocessed dataset `cellular_dataframe.parquet` directly. Alternatively, you can modify the preprocessing chain to boost performance (e.g. by using a finer time resolution, such as 1 s). For more information, check the [preprocessing notebooks](https://github.com/fraunhoferhhi/BerlinV2X/tree/main/preprocess).



In [2]:
# Load the preprocessed dataset (optionally, regenerate it from the raw sources first)

data_path = Path.cwd().parent/"data"
cell_df = pd.read_parquet(data_path/"cellular_dataframe.parquet")

### Problem setup

In this example, we focus on the downlink datarate as the QoS value to predict, and we take the training and test data from operators 1 and 2, respectively.

In [3]:
# Filter only for downlink measurements and a target datarate of 350 Mbps, which was set for DL datarate measurements

filtered_data = cell_df.query("direction == 'downlink' & target_datarate == 350000000")

# Remove incomplete measurements without datarate
filtered_data = filtered_data.dropna(subset=['datarate'])  # list form also works on older pandas versions

# Train and test split along operators
train_data = filtered_data.query("operator == 1")
test_data = filtered_data.query("operator == 2")
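
The filter-and-split logic above can be checked on a toy frame before running it on the full dataset. The sketch below is only an illustration: `toy_df` and its values are made up, and only the column names are taken from the cell above.

```python
import pandas as pd

# Toy stand-in for cell_df; all values are invented for illustration only.
toy_df = pd.DataFrame({
    "direction": ["downlink", "downlink", "uplink", "downlink", "downlink"],
    "target_datarate": [350_000_000, 350_000_000, 350_000_000, 75_000_000, 350_000_000],
    "datarate": [1.2e8, None, 3.0e7, 5.0e7, 2.1e8],
    "operator": [1, 2, 1, 2, 2],
})

# Same steps as in the notebook cell: filter, drop incomplete rows, split by operator.
toy_filtered = toy_df.query("direction == 'downlink' & target_datarate == 350000000")
toy_filtered = toy_filtered.dropna(subset=["datarate"])
toy_train = toy_filtered.query("operator == 1")
toy_test = toy_filtered.query("operator == 2")

# The two splits must not share any rows.
assert set(toy_train.index).isdisjoint(toy_test.index)
print(len(toy_train), len(toy_test))
```

Because the split follows the operator rather than a random shuffle, the test set comes from a different network environment than the training set, which is the multi-environment setting named in the problem statement.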

### Feature selection

In [5]:
qos_column = 'datarate'
feature_columns = [
     'PCell_RSRP_max',
     'PCell_RSRQ_max',
     'PCell_RSSI_max',
     'PCell_SNR_1',
     'PCell_SNR_2',
     'PCell_Downlink_Num_RBs',
     'PCell_Downlink_TB_Size',
     'PCell_Downlink_Average_MCS',
     'PCell_Downlink_bandwidth_MHz',
     'PCell_Cell_Identity',
     'PCell_freq_MHz',
     'SCell_RSRP_max',
     'SCell_RSRQ_max',
     'SCell_RSSI_max',
     'SCell_SNR_1',
     'SCell_SNR_2',
     'SCell_Downlink_Num_RBs',
     'SCell_Downlink_TB_Size',
     'SCell_Downlink_Average_MCS',
     'SCell_Downlink_bandwidth_MHz',
     'SCell_Cell_Identity',
     'SCell_freq_MHz',
     'Latitude',
     'Longitude',
     'Altitude',
     'speed_kmh',
     'COG',
     'precipIntensity',
     'precipProbability',
     'temperature',
     'apparentTemperature',
     'dewPoint',
     'humidity',
     'pressure',
     'windSpeed',
     'cloudCover',
     'uvIndex',
     'visibility',
     'Traffic Jam Factor']

x_train, y_train = train_data[feature_columns], train_data[qos_column]
x_test, y_test = test_data[feature_columns], test_data[qos_column]

# Impute missing values with zeros
x_train = x_train.fillna(0)
x_test = x_test.fillna(0)
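
Zero imputation is a simple baseline. A possible alternative, shown here only as a sketch and not as part of the original solution, is to fill gaps with per-column medians computed on the training set alone, so that no test statistics leak into training. The toy frames and their values are made up; whether this helps on the real data is untested.

```python
import pandas as pd

# Toy feature frames with gaps; the values are invented for illustration.
toy_x_train = pd.DataFrame({"PCell_RSRP_max": [-95.0, None, -105.0],
                            "SCell_RSRP_max": [None, -100.0, -110.0]})
toy_x_test = pd.DataFrame({"PCell_RSRP_max": [None, -90.0],
                           "SCell_RSRP_max": [-98.0, None]})

# Baseline from the notebook: replace every missing value with 0.
zero_filled = toy_x_train.fillna(0)

# Alternative (assumption, not the original method): medians from the training set only,
# applied to both splits so the test data does not influence the fill values.
train_medians = toy_x_train.median()
train_imputed = toy_x_train.fillna(train_medians)
test_imputed = toy_x_test.fillna(train_medians)

print(test_imputed)
```

Tree-based models such as random forests can often isolate a constant fill value with a split, so zero filling may already work reasonably well; comparing both variants on a validation set is the safer way to decide.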

### Prediction algorithm

You may refine your algorithm with domain adaptation or transfer learning techniques (see, for instance, the [ADAPT](https://adapt-python.github.io/) library).
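
As one illustration of what domain adaptation can look like, the sketch below uses importance weighting with a domain classifier, written with scikit-learn only. It is not the ADAPT API and not part of the original solution. The data is synthetic; in this notebook the source would correspond to `x_train` and the target to `x_test`, and using unlabelled target features this way assumes the challenge rules allow it.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

# Synthetic source (labelled) and target (unlabelled) features with shifted means.
x_source = rng.normal(loc=0.0, scale=1.0, size=(200, 3))
y_source = x_source.sum(axis=1) + rng.normal(scale=0.1, size=200)
x_target = rng.normal(loc=0.5, scale=1.0, size=(200, 3))

# 1. Train a classifier to tell source (label 0) from target (label 1) samples.
x_domain = np.vstack([x_source, x_target])
d_domain = np.concatenate([np.zeros(len(x_source)), np.ones(len(x_target))])
domain_clf = LogisticRegression(max_iter=1000).fit(x_domain, d_domain)

# 2. Weight each source sample by the odds that it looks like a target sample.
#    With equally sized domains this approximates the density ratio p_target / p_source.
p_target = domain_clf.predict_proba(x_source)[:, 1]
p_target = np.clip(p_target, 1e-3, 1 - 1e-3)  # avoid division by zero
weights = p_target / (1.0 - p_target)
weights *= len(weights) / weights.sum()  # normalise to mean 1

# 3. Fit the regressor on source data, emphasising target-like samples.
rf_weighted = RandomForestRegressor(n_estimators=50, random_state=0)
rf_weighted.fit(x_source, y_source, sample_weight=weights)
```

The weighting only corrects for a shift in the feature distribution; if the relation between features and datarate itself differs between operators, other techniques would be needed.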

#### Train

In [6]:
# Create algorithm with default hyperparameters
# (no random_state is set, so results vary slightly between runs)
rf = RandomForestRegressor()

# Train
rf.fit(x_train, y_train)


#### Compute test score

In [7]:
rf.score(x_test, y_test)  # coefficient of determination (R^2) on the test set

0.8207325503480274
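
For a scikit-learn regressor, `score` returns the coefficient of determination, R² = 1 − SS_res / SS_tot, where SS_res is the sum of squared prediction errors and SS_tot is the sum of squared deviations of the true values from their mean. The following dependency-free sketch shows the computation on made-up numbers; `r2_score_manual` is a hypothetical helper name, not a library function.

```python
def r2_score_manual(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))  # residual sum of squares
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)          # total sum of squares
    return 1.0 - ss_res / ss_tot


# Toy values for illustration only: SS_res = 1, SS_tot = 5, so R^2 = 1 - 1/5.
y_true_toy = [1.0, 2.0, 3.0, 4.0]
y_pred_toy = [1.0, 2.0, 3.0, 3.0]
print(r2_score_manual(y_true_toy, y_pred_toy))
```

A value of 1 means perfect prediction, 0 matches a model that always predicts the mean of the true values, and negative values are possible for models that do worse than that.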