![Traffic](traffic.png)

Traffic data fluctuates constantly and is strongly affected by time. Predicting it can be challenging, but this task will help sharpen your time-series skills. With deep learning, you can exploit abstract patterns in the data that may improve predictions.

Your task is to build a system that predicts traffic volume, that is, the number of vehicles passing a specific point at a specific time. Determining this can help reduce road congestion, support new designs for roads or intersections, improve safety, and more! Or you can use it to plan your commute and avoid traffic!

The dataset provided contains the hourly traffic volume on an interstate highway in Minnesota, USA. It also includes weather features and holidays, which often impact traffic volume.

Time to predict some traffic!

### The data:

The dataset is collected and maintained by the UCI Machine Learning Repository. The target variable is `traffic_volume`. The data has already been normalized and saved into training and test sets, which contain the following columns:

`train_scaled.csv`, `test_scaled.csv`

| Column | Type | Description |
|--------|------|-------------|
| `temp` | Numeric | Average temperature in kelvin |
| `rain_1h` | Numeric | Amount of rain in mm that occurred in the hour |
| `snow_1h` | Numeric | Amount of snow in mm that occurred in the hour |
| `clouds_all` | Numeric | Percentage of cloud cover |
| `date_time` | DateTime | Hour of the data collected in local CST time |
| `holiday_` (11 columns) | Categorical | US national holidays plus a regional holiday, the Minnesota State Fair |
| `weather_main_` (11 columns) | Categorical | Short textual description of the current weather |
| `weather_description_` (35 columns) | Categorical | Longer textual description of the current weather |
| `hour_of_day` | Numeric | The hour of the day |
| `day_of_week` | Numeric | The day of the week (0=Monday, 6=Sunday) |
| `day_of_month` | Numeric | The day of the month |
| `month` | Numeric | The number of the month |
| `traffic_volume` | Numeric | Hourly I-94 ATR 301 reported westbound traffic volume |
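
The calendar columns in the table can be derived from a timestamp and then min-max scaled. The sketch below is an illustration only: the timestamps are hypothetical, and the fixed ranges in `ranges` are an assumption about how the provided files were scaled, not a documented fact.

```python
import pandas as pd

# Hypothetical raw timestamps; the real date_time values are not shown in this notebook
raw = pd.DataFrame(
    {"date_time": pd.to_datetime(["2012-10-02 09:00", "2012-10-02 10:00", "2012-12-31 23:00"])}
)

# Derive the calendar features listed in the table
features = pd.DataFrame({
    "hour_of_day": raw["date_time"].dt.hour,
    "day_of_week": raw["date_time"].dt.dayofweek,  # 0=Monday, 6=Sunday
    "day_of_month": raw["date_time"].dt.day,
    "month": raw["date_time"].dt.month,
})

# Min-max scaling with fixed calendar ranges (assumed, see the note above)
ranges = {"hour_of_day": (0, 23), "day_of_week": (0, 6), "day_of_month": (1, 31), "month": (1, 12)}
scaled = pd.DataFrame(
    {col: (features[col] - lo) / (hi - lo) for col, (lo, hi) in ranges.items()}
)
print(scaled.round(6))
```

With these assumed ranges, an hour of 9 maps to 9/23 and a month of 10 maps to 9/11, which is the kind of value that appears in the scaled files.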

In [61]:
# Import the relevant libraries
import numpy as np
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

from sklearn.model_selection import train_test_split
from sklearn.preprocessing import MinMaxScaler
from sklearn.ensemble import RandomForestRegressor
from sklearn.neural_network import MLPRegressor
from sklearn.svm import SVR

from xgboost import XGBRegressor

import torch
import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim
from torch.utils.data import TensorDataset, DataLoader

In [3]:
# Read the traffic data from the CSV training and test files
train_scaled_df = pd.read_csv('train_scaled.csv')
test_scaled_df = pd.read_csv('test_scaled.csv')

# Convert the DataFrame to NumPy arrays
train_scaled = train_scaled_df.to_numpy()
test_scaled = test_scaled_df.to_numpy()

In [4]:
train_scaled_df

Unnamed: 0,temp,rain_1h,snow_1h,clouds_all,holiday_Christmas Day,holiday_Columbus Day,holiday_Independence Day,holiday_Labor Day,holiday_Martin Luther King Jr Day,holiday_Memorial Day,holiday_New Years Day,holiday_State Fair,holiday_Thanksgiving Day,holiday_Veterans Day,holiday_Washingtons Birthday,weather_main_Clear,weather_main_Clouds,weather_main_Drizzle,weather_main_Fog,weather_main_Haze,weather_main_Mist,weather_main_Rain,weather_main_Smoke,weather_main_Snow,weather_main_Squall,weather_main_Thunderstorm,weather_description_SQUALLS,weather_description_Sky is Clear,weather_description_broken clouds,weather_description_drizzle,weather_description_few clouds,weather_description_fog,weather_description_haze,weather_description_heavy intensity drizzle,weather_description_heavy intensity rain,weather_description_heavy snow,weather_description_light intensity drizzle,weather_description_light intensity shower rain,weather_description_light rain,weather_description_light rain and snow,weather_description_light shower snow,weather_description_light snow,weather_description_mist,weather_description_moderate rain,weather_description_overcast clouds,weather_description_proximity shower rain,weather_description_proximity thunderstorm,weather_description_proximity thunderstorm with drizzle,weather_description_proximity thunderstorm with rain,weather_description_scattered clouds,weather_description_shower drizzle,weather_description_shower snow,weather_description_sky is clear,weather_description_smoke,weather_description_snow,weather_description_thunderstorm,weather_description_thunderstorm with heavy rain,weather_description_thunderstorm with light drizzle,weather_description_thunderstorm with light rain,weather_description_thunderstorm with rain,weather_description_very heavy rain,hour_of_day,day_of_week,day_of_month,month,traffic_volume
0,0.935245,0.0,0.0,0.40,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.391304,0.166667,0.033333,0.818182,0.761676
1,0.938749,0.0,0.0,0.75,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.434783,0.166667,0.033333,0.818182,0.620330
2,0.939463,0.0,0.0,0.90,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.478261,0.166667,0.033333,0.818182,0.654808
3,0.941247,0.0,0.0,0.90,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.521739,0.166667,0.033333,0.818182,0.690385
4,0.944524,0.0,0.0,0.75,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.565217,0.166667,0.033333,0.818182,0.675549
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
34037,0.810537,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.826087,1.000000,1.000000,1.000000,0.411401
34038,0.810375,0.0,0.0,0.05,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.869565,1.000000,1.000000,1.000000,0.375137
34039,0.809791,0.0,0.0,0.05,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.913043,1.000000,1.000000,1.000000,0.341071
34040,0.809434,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.956522,1.000000,1.000000,1.000000,0.280357


In [5]:
# Correlation of every column with the target, for three correlation methods
methods = ["pearson", "spearman", "kendall"]
target = "traffic_volume"
threshold = 0.15
for method in methods:
    print(f"\n--- Correlation ({method}) ---")
    correlations = train_scaled_df.corr(method=method)[target]
    # Keep only columns whose absolute correlation exceeds the threshold
    filtered_correlations = correlations[correlations.abs() > threshold]
    print(filtered_correlations)



--- Correlation (pearson) ---
hour_of_day       0.35465
traffic_volume    1.00000
Name: traffic_volume, dtype: float64

--- Correlation (spearman) ---
hour_of_day       0.335520
day_of_week      -0.151113
traffic_volume    1.000000
Name: traffic_volume, dtype: float64

--- Correlation (kendall) ---
hour_of_day       0.171012
traffic_volume    1.000000
Name: traffic_volume, dtype: float64


In [37]:
model = XGBRegressor()
model.fit(train_scaled_df.drop(["traffic_volume"], axis = 1), train_scaled_df["traffic_volume"])

# Feature importances
importance = model.feature_importances_
features = train_scaled_df.drop(["traffic_volume"], axis=1).columns
sorted_idx = np.argsort(importance)

for i, value in enumerate(importance[sorted_idx]):
    if value > 0.15:
        print(features[sorted_idx][i], "              ", value)

hour_of_day                0.7510283


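Only `hour_of_day` and `day_of_week` show a meaningful link with traffic volume: `hour_of_day` passes the 0.15 threshold for all three correlation methods and for the XGBoost feature importance, while `day_of_week` passes it only for the Spearman correlation. We will use these two variables to predict traffic volume.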
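
The selection rule used above (keep a feature when the absolute value of its correlation with the target exceeds a threshold) can be wrapped in a small helper. This is a minimal sketch on synthetic data; the function name `select_by_correlation` and the toy columns `signal` and `noise` are hypothetical and are not part of the dataset.

```python
import numpy as np
import pandas as pd

def select_by_correlation(df, target, threshold=0.15, method="pearson"):
    """Return feature names whose |correlation| with the target exceeds the threshold."""
    corr = df.corr(method=method)[target].drop(target)
    return corr[corr.abs() > threshold].index.tolist()

# Synthetic data: one informative column and one unrelated column
rng = np.random.default_rng(0)
n = 500
signal = rng.uniform(size=n)
toy = pd.DataFrame({
    "signal": signal,
    "noise": rng.uniform(size=n),
    "traffic_volume": signal + 0.1 * rng.normal(size=n),
})

selected = select_by_correlation(toy, "traffic_volume")
print(selected)
```

Keep in mind that a correlation coefficient mostly measures monotonic association, so a feature with a cyclic effect (such as the hour of the day) can score lower than its real predictive value.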

In [48]:
train_scaled_df_to_predict = train_scaled_df[["hour_of_day", "day_of_week", "traffic_volume"]]
train_data = train_scaled_df_to_predict.to_numpy()

test_scaled_df_to_predict = test_scaled_df[["hour_of_day", "day_of_week", "traffic_volume"]]
test_data = test_scaled_df_to_predict.to_numpy()


# ---- 1. Extract X and y ----
X = np.delete(train_data, -1, axis=1)  # Drop the last column (the target)
y = train_data[:, -1]  # Keep only the last column (the target)

# ---- 2. SPLIT TRAIN/VALID ----
X_train, X_val, y_train, y_val = train_test_split(X, y, test_size=0.2, random_state=42)
X_test, y_test = test_data[:, :-1], test_data[:,-1]

We will first use an LSTM model to predict the traffic volume.

# 1. LSTM Model

In this section, we convert our data into tensors and data loaders so that it can be fed to the model.

In [7]:
# ---- 1. PYTORCH TENSORS ----
X_train_tensor = torch.tensor(X_train, dtype=torch.float32)
y_train_tensor = torch.tensor(y_train, dtype=torch.float32).unsqueeze(1)  # Add a dimension: (n,) -> (n, 1)
X_val_tensor = torch.tensor(X_val, dtype=torch.float32)
y_val_tensor = torch.tensor(y_val, dtype=torch.float32).unsqueeze(1)

# ---- 2. DATALOADERS ----
BATCH_SIZE = 32  
train_dataset = TensorDataset(X_train_tensor, y_train_tensor)
val_dataset = TensorDataset(X_val_tensor, y_val_tensor)

train_loader = DataLoader(train_dataset, batch_size=BATCH_SIZE, shuffle=True, drop_last = True)
val_loader = DataLoader(val_dataset, batch_size=BATCH_SIZE, shuffle=False, drop_last = True)

print(f"length train : {len(train_loader.dataset)} | Length val : {len(val_loader.dataset)}")


length train : 27233 | Length val : 6809
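
Note that `len(loader.dataset)` counts rows, not batches, and that `drop_last=True` discards the final incomplete batch. The sketch below uses placeholder tensors with the training size reported above to show how many batches and rows are actually iterated per epoch.

```python
import torch
from torch.utils.data import TensorDataset, DataLoader

n_rows, batch_size = 27233, 32  # training rows and batch size reported above
# Placeholder tensors with the same number of rows as the training set
dummy = TensorDataset(torch.zeros(n_rows, 2), torch.zeros(n_rows, 1))

loader_drop = DataLoader(dummy, batch_size=batch_size, drop_last=True)
loader_keep = DataLoader(dummy, batch_size=batch_size, drop_last=False)

n_batches_drop = len(loader_drop)  # full batches only
n_batches_keep = len(loader_keep)  # includes the final partial batch
rows_seen = n_batches_drop * batch_size
print(n_batches_drop, n_batches_keep, rows_seen)
```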


In [17]:
class LSTMModel(nn.Module):
    def __init__(self, input_dim, seq_length, BATCH_SIZE, hidden_dim=64, num_layers=2, dropout=0.1):
        super(LSTMModel, self).__init__()

        self.seq_length = seq_length

        # Define the LSTM layers
        self.lstm = nn.LSTM(input_dim, hidden_dim, num_layers=num_layers, batch_first=True, dropout=dropout)

        # Fully connected layer for the prediction
        # Note: its width is BATCH_SIZE rather than 1
        self.fc_out = nn.Linear(hidden_dim, BATCH_SIZE)

    def forward(self, x):
        # Intended input shape: (batch_size, seq_length, input_dim).
        # Note: the loaders above yield 2-D batches (batch_size, input_dim), which
        # nn.LSTM interprets as a single unbatched sequence of length batch_size.
        lstm_out, (hn, cn) = self.lstm(x)

        # Use the final hidden state of the last LSTM layer for the prediction
        x = hn[-1, :]  # (batch_size, hidden_dim) when the input is 3-D

        # Pass through the output layer
        x = self.fc_out(x)
        return x

# ---- Parameters ----
input_dim = X_train.shape[1]  # 2 features here (hour_of_day, day_of_week)

seq_length = 5
BATCH_SIZE = 16  # Note: the DataLoaders above were built with a batch size of 32
LR = 0.001
DEVICE = "cuda" if torch.cuda.is_available() else "cpu"


model = LSTMModel(input_dim=input_dim, BATCH_SIZE=BATCH_SIZE, seq_length=seq_length)

# Optimizer configuration
optimizer = optim.Adam(model.parameters(), lr=LR)
loss_fn = nn.MSELoss()

# ---- Move the model to the training device ----
model = model.to(DEVICE)


In [19]:
def train(model, train_loader, optimizer, loss_fn, epochs=50):
    for epoch in range(epochs):
        model.train()  # Training mode
        epoch_loss = 0

        for X_batch, y_batch in train_loader:
            X_batch = X_batch.to(DEVICE)
            y_batch = y_batch.to(DEVICE)

            # Zero gradients
            optimizer.zero_grad()

            # Forward pass
            y_pred = model(X_batch)

            # Loss calculation
            loss = loss_fn(y_pred, y_batch)
            epoch_loss += loss.item()

            # Backward pass
            loss.backward()

            # Optimization
            optimizer.step()

        avg_epoch_loss = epoch_loss / len(train_loader)
        print(f"Epoch {epoch+1}/{epochs}, Loss: {avg_epoch_loss:.4f}")

train(model, train_loader, optimizer, loss_fn, epochs=10)


Epoch 1/10, Loss: 0.0775
Epoch 2/10, Loss: 0.0748
Epoch 3/10, Loss: 0.0747
Epoch 4/10, Loss: 0.0747
Epoch 5/10, Loss: 0.0746


In [21]:
def evaluate(model, val_loader):
    model.eval()  # Evaluation mode
    val_loss = 0
    with torch.no_grad():  # No gradient computation
        for X_batch, y_batch in val_loader:
            X_batch = X_batch.to(DEVICE)
            y_batch = y_batch.to(DEVICE)

            # Prediction
            y_pred = model(X_batch)
            # Loss
            loss = loss_fn(y_pred, y_batch)
            val_loss += loss.item()

    avg_val_loss = val_loss / len(val_loader)
    print(f"Validation Loss: {avg_val_loss:.4f}")

evaluate(model, val_loader)


Validation Loss: 0.0749


Even after tuning the hyperparameters, making the model more complex, and trying both LSTM and Transformer architectures, we did not obtain satisfactory results: the training loss plateaus near 0.0747 from the second epoch and the validation loss is 0.0749. In this setup, the neural network does not capture the relationship between the features and the traffic volume. Let's try another model.
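
One possible factor is that the loaders feed the LSTM individual rows rather than windows of consecutive hours. A minimal sketch of how sliding windows of shape `(n_windows, seq_length, n_features)` could be built is shown below. The helper `make_windows` is hypothetical, the demo arrays are synthetic, and the approach assumes the rows are kept in chronological order, which the shuffled `train_test_split` above does not guarantee.

```python
import numpy as np

def make_windows(X, y, seq_length):
    """Stack overlapping windows of consecutive rows.

    Returns windows of shape (n_windows, seq_length, n_features) and the
    target aligned with the last row of each window.
    """
    n_windows = len(X) - seq_length + 1
    windows = np.stack([X[i:i + seq_length] for i in range(n_windows)])
    targets = y[seq_length - 1:]
    return windows, targets

# Synthetic stand-in: 10 chronological rows with 2 features each
X_demo = np.arange(20, dtype=np.float32).reshape(10, 2)
y_demo = np.arange(10, dtype=np.float32)

X_win, y_win = make_windows(X_demo, y_demo, seq_length=5)
print(X_win.shape, y_win.shape)
```

Windows of this shape match what `nn.LSTM(..., batch_first=True)` expects, and a final `nn.Linear(hidden_dim, 1)` layer would then produce one prediction per window.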

# 2. RandomForest

In [43]:
def mse(y_true, y_pred):
    return np.mean((np.array(y_true) - np.array(y_pred))**2)

In [44]:
model = RandomForestRegressor(n_estimators=100, random_state=42)
model.fit(X_train, y_train)
y_pred = model.predict(X_val)
mse_rf = mse(y_pred,y_val)

In [47]:
print(f"We got an mse for the rf model of {mse_rf}")

We got an mse for the rf model of 0.004748604420103246


In [45]:
print("Let's compare the y_pred and y_validation")
y_pred[:10], y_val[:10]

(array([0.10680628, 0.56844644, 0.37581648, 0.15642276, 0.35960718,
        0.10372203, 0.69745894, 0.68252516, 0.05795166, 0.68032369]),
 array([0.10824176, 0.56799451, 0.38832418, 0.14313187, 0.38763736,
        0.09464286, 0.69532967, 0.7271978 , 0.05041209, 0.62142857]))

Let's use our test data

In [51]:
y_pred = model.predict(X_test)
mse_rf = mse(y_pred,y_test)
print(f"We got an mse for the rf model of {mse_rf}")
# Note: this displays the validation targets; the matching test targets are y_test[:10]
y_pred[:10], y_val[:10]

We got an mse for the rf model of 0.004158103005777831


(array([0.08437096, 0.05313975, 0.03942879, 0.04635717, 0.10949877,
        0.35776978, 0.70048014, 0.81489135, 0.73209014, 0.63067366]),
 array([0.10824176, 0.56799451, 0.38832418, 0.14313187, 0.38763736,
        0.09464286, 0.69532967, 0.7271978 , 0.05041209, 0.62142857]))

The RandomForest model gives a good prediction of the traffic volume, with an MSE of about 0.0047 on the validation set and 0.0042 on the test set.

# 3. Perceptron

## 3.1 Without pytorch

In [52]:
model = MLPRegressor(hidden_layer_sizes=(100, 50), activation='relu', solver='adam', max_iter=500)
model.fit(X_train, y_train)
y_pred = model.predict(X_val)
mse_percep = mse(y_pred,y_val)
print(f"We got an mse for the perceptron model of {mse_percep}")
y_pred[:10], y_val[:10]

We got an mse for the perceptron model of 0.005501778067727286


(array([0.1258431 , 0.60157645, 0.38939445, 0.13704699, 0.32353024,
        0.08563507, 0.71483557, 0.71999945, 0.04756623, 0.66295536]),
 array([0.10824176, 0.56799451, 0.38832418, 0.14313187, 0.38763736,
        0.09464286, 0.69532967, 0.7271978 , 0.05041209, 0.62142857]))

In [54]:
print("In the test data we have the following results.")

y_pred = model.predict(X_test)
mse_percep = mse(y_pred,y_test)
print(f"We got an mse for the perceptron model of {mse_percep}")
# Note: this displays the validation targets; the matching test targets are y_test[:10]
y_pred[:10], y_val[:10]

In the test data we have the following results.
We got an mse for the perceptron model of 0.004913315939513915


(array([0.08351998, 0.06247367, 0.05279718, 0.04269203, 0.13407955,
        0.36447061, 0.69458166, 0.81737396, 0.73243737, 0.64684801]),
 array([0.10824176, 0.56799451, 0.38832418, 0.14313187, 0.38763736,
        0.09464286, 0.69532967, 0.7271978 , 0.05041209, 0.62142857]))

## 3.2 With pytorch

In [91]:
class MLP(nn.Module):
    def __init__(self, input_size, dropout_rate=0.5):
        super(MLP, self).__init__()
        self.hidden1 = nn.Linear(input_size, 50)
        self.hidden2 = nn.Linear(50, 50)
        
        self.dropout1 = nn.Dropout(dropout_rate)
        self.dropout2 = nn.Dropout(dropout_rate)
        
        self.output = nn.Linear(50, 1)
        self.relu = nn.ReLU()

    def forward(self, x):
        x = self.relu(self.hidden1(x))
        x = self.dropout1(x)  # Dropout after the first hidden layer
        
        x = self.relu(self.hidden2(x))
        x = self.dropout2(x)  # Dropout after the second hidden layer
        
        x = self.output(x)
        return x
    
    
X_train_tensor = torch.tensor(X_train, dtype=torch.float32)
y_train_tensor = torch.tensor(y_train, dtype=torch.float32).view(-1, 1)
X_val_tensor = torch.tensor(X_val, dtype=torch.float32)
y_val_tensor = torch.tensor(y_val, dtype=torch.float32).view(-1, 1)
X_test_tensor = torch.tensor(X_test, dtype=torch.float32)
y_test_tensor = torch.tensor(y_test, dtype=torch.float32).view(-1, 1)
    
    
input_size = X_train.shape[1]
model = MLP(input_size)
criterion = nn.MSELoss()
optimizer = optim.Adam(model.parameters(), lr=0.001)

In [93]:
# Training (full-batch: the whole training set is passed at every epoch)
epochs = 1000
for epoch in range(epochs):
    optimizer.zero_grad()         # Reset the gradients
    y_pred_train = model(X_train_tensor)  # Prediction
    loss = criterion(y_pred_train, y_train_tensor)  # Compute the MSE
    loss.backward()  # Backpropagation
    optimizer.step()  # Update the weights
    if epoch % 50 == 0:
        print(f"epoch {epoch}/{epochs}")
        print(f"We got a loss of {loss}")
print(f"epoch {epoch+1}/{epochs}")

epoch 0/1000
We got a loss of 0.027682336047291756
epoch 50/1000
We got a loss of 0.025823771953582764
epoch 100/1000
We got a loss of 0.02492072619497776
epoch 150/1000
We got a loss of 0.024510309100151062
epoch 200/1000
We got a loss of 0.024210751056671143
epoch 250/1000
We got a loss of 0.02348221279680729
epoch 300/1000
We got a loss of 0.02274532988667488
epoch 350/1000
We got a loss of 0.02252703718841076
epoch 400/1000
We got a loss of 0.022166889160871506
epoch 450/1000
We got a loss of 0.021985113620758057
epoch 500/1000
We got a loss of 0.021794702857732773
epoch 550/1000
We got a loss of 0.021038426086306572
epoch 600/1000
We got a loss of 0.020769309252500534
epoch 650/1000
We got a loss of 0.020838811993598938
epoch 700/1000
We got a loss of 0.02053014002740383
epoch 750/1000
We got a loss of 0.020498789846897125
epoch 800/1000
We got a loss of 0.020295361056923866
epoch 850/1000
We got a loss of 0.020103152841329575
epoch 900/1000
We got a loss of 0.01986355148255825
ep

In [94]:
# Validation

# Predictions on the validation data
# Note: model.eval() is not called, so dropout (p=0.5) stays active for these predictions
y_pred_val = model(X_val_tensor).detach().numpy()
y_val_numpy = y_val_tensor.numpy()

# Compute the mean squared error (MSE)
mse_percep = np.mean((y_pred_val - y_val_numpy) ** 2)

# Display the results
print(f"We got an MSE for the perceptron model of {mse_percep}")
print("First 10 predictions:", y_pred_val[:10].flatten())
print("First 10 actual values:", y_val_numpy[:10].flatten())

We got an MSE for the perceptron model of 0.01924896612763405
First 10 predictions: [0.1194335  0.52425367 0.2713595  0.22906551 0.26318723 0.14323276
 0.6237337  0.8541675  0.19013226 0.6503935 ]
First 10 actual values: [0.10824176 0.56799453 0.38832417 0.14313187 0.38763738 0.09464286
 0.69532967 0.7271978  0.05041209 0.62142855]


In [95]:
# Predictions on the test data

# Note: model.eval() is not called, so dropout (p=0.5) stays active for these predictions
y_pred_test = model(X_test_tensor).detach().numpy()
y_test_numpy = y_test_tensor.numpy()

# Compute the mean squared error (MSE)
mse_percep = np.mean((y_pred_test - y_test_numpy) ** 2)

# Display the results
print(f"We got an MSE for the perceptron model of {mse_percep}")
print("First 10 predictions:", y_pred_test[:10].flatten())
print("First 10 actual values:", y_test_numpy[:10].flatten())

We got an MSE for the perceptron model of 0.019152067601680756
First 10 predictions: [0.08832495 0.01784754 0.08288351 0.15581873 0.3027485  0.31637144
 0.65479517 0.67319405 0.8661939  0.858151  ]
First 10 actual values: [0.20302197 0.1934066  0.12788461 0.08035714 0.05233517 0.05961538
 0.09423077 0.0929945  0.1114011  0.15412088]
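
Because the MLP uses dropout, its predictions depend on whether the module is in training or evaluation mode. The sketch below uses a small stand-in network and random inputs (not the traffic data or the trained model) to show the difference: in training mode two forward passes on the same input generally differ, while in evaluation mode they are identical.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Small stand-in network with the same dropout rate as the MLP above
net = nn.Sequential(nn.Linear(2, 50), nn.ReLU(), nn.Dropout(0.5), nn.Linear(50, 1))
x = torch.rand(8, 2)  # synthetic inputs

net.train()  # dropout active: units are randomly zeroed on each pass
with torch.no_grad():
    train_a, train_b = net(x), net(x)

net.eval()  # dropout disabled: the forward pass is deterministic
with torch.no_grad():
    eval_a, eval_b = net(x), net(x)

same_in_train = torch.equal(train_a, train_b)
same_in_eval = torch.equal(eval_a, eval_b)
print(f"identical in train mode: {same_in_train} | identical in eval mode: {same_in_eval}")
```

Calling `model.eval()` before computing validation or test predictions, and `model.train()` before resuming training, is the usual pattern for networks that contain dropout.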


# 4. SVR

In [58]:
model = SVR(kernel="rbf", C=1.0, gamma="scale")
model.fit(X_train, y_train)
y_pred = model.predict(X_val)
mse_SVR = mse(y_pred,y_val)
print(f"We got an mse for the svr model of {mse_SVR}")
y_pred[:10], y_val[:10]

We got an mse for the svr model of 0.009191844964729082


(array([0.05697215, 0.63388069, 0.45887852, 0.20621173, 0.28817294,
        0.14383606, 0.75516033, 0.69452207, 0.05344295, 0.71852822]),
 array([0.10824176, 0.56799451, 0.38832418, 0.14313187, 0.38763736,
        0.09464286, 0.69532967, 0.7271978 , 0.05041209, 0.62142857]))

In [59]:
print("In the test data we have the following results.")

y_pred = model.predict(X_test)
mse_SVR = mse(y_pred,y_test)
print(f"We got an mse for the svr model of {mse_SVR}")
# Note: this displays the validation targets; the matching test targets are y_test[:10]
y_pred[:10], y_val[:10]

In the test data we have the following results.
We got an mse for the svr model of 0.008442426161720407


(array([ 0.08355288, -0.04192276, -0.02759946,  0.08959908,  0.26212587,
         0.4419421 ,  0.59019814,  0.68416771,  0.71982636,  0.70960643]),
 array([0.10824176, 0.56799451, 0.38832418, 0.14313187, 0.38763736,
        0.09464286, 0.69532967, 0.7271978 , 0.05041209, 0.62142857]))

# Conclusion


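Random forest gives the best predictions among the four model families tested (five models when counting the PyTorch perceptron separately). Its test MSE is about 0.0042, compared with 0.0049 for the scikit-learn perceptron, 0.0084 for SVR, and 0.0192 for the PyTorch perceptron.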
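
To make the comparison explicit, the test-set MSE values reported in the sections above can be ranked and converted to RMSE, which is expressed in the same scaled units as `traffic_volume`. This is a small sketch; the dictionary labels are chosen here for readability, and the LSTM is left out because only its validation loss was reported.

```python
import math

# Test-set MSE values reported in the sections above (scaled units)
test_mse = {
    "RandomForest": 0.004158103005777831,
    "MLPRegressor (scikit-learn)": 0.004913315939513915,
    "SVR": 0.008442426161720407,
    "MLP (PyTorch)": 0.019152067601680756,
}

# Sort from lowest to highest error
ranking = sorted(test_mse.items(), key=lambda kv: kv[1])
for name, value in ranking:
    print(f"{name:<28} MSE={value:.4f}  RMSE={math.sqrt(value):.4f}")

best_model = ranking[0][0]
```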