# Model: LSTM

## Objective:
Our goal is to predict the **road condition type** based on vehicle sensor readings over time.
Specifically, we want to classify whether the vehicle is on:
- Asphalt
- Cobblestone
- Dirt Road

We will be using the **cleaned dataset** prepared from `01_data_cleaning.ipynb`.

## Dataset:
- Path: `dataset/cleaned_master_dataset.csv`
- Shape: 1,080,905 rows, 81 columns (after cleaning)

## Tasks Overview:
- Load the cleaned dataset
- Basic data exploration (optional, feel free to plot if needed)
- Reshape data into sequences for LSTM input
- Train **LSTM Model**
- Evaluate performance (Accuracy, Confusion Matrix, etc.)

In [21]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

import tensorflow as tf
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import LSTM, Dropout, Dense  # needed by the base model cell
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.utils.class_weight import compute_class_weight
from sklearn.metrics import confusion_matrix, classification_report

### Run the cell below ONLY if you are running the LSTM on a Mac with an M2 chip

In [15]:
import tensorflow as tf

print("TensorFlow version:", tf.__version__)
print("List of Physical Devices:", tf.config.list_physical_devices())
print("Is GPU available?", tf.config.list_physical_devices('GPU'))

# Disable GPU acceleration (force CPU execution)
tf.config.set_visible_devices([], 'GPU')

print("Running TensorFlow on CPU only")

TensorFlow version: 2.16.1
List of Physical Devices: [PhysicalDevice(name='/physical_device:CPU:0', device_type='CPU'), PhysicalDevice(name='/physical_device:GPU:0', device_type='GPU')]
Is GPU available? [PhysicalDevice(name='/physical_device:GPU:0', device_type='GPU')]
Running TensorFlow on CPU only


In [17]:
# Load the cleaned master dataset
df = pd.read_csv('dataset/cleaned_master_dataset.csv')

# Quick check
# print(df.shape)
# print(df.head())

## 📌 Building a Simple LSTM Model Before Optimization

#### To understand the impact of hyperparameter tuning, we first implement a basic LSTM model using default parameters. This serves as a benchmark to compare against our optimized model. The base model uses a simple architecture with minimal tuning, demonstrating the initial accuracy and loss before enhancements are applied. We will later analyze how modifications such as layer adjustments, dropout rates, and learning rate scheduling affect performance.

In [18]:
# Select Features (Time-Series Sensor Example)
features = df[['acc_x_dashboard_left', 'acc_y_dashboard_left', 'acc_z_dashboard_left']].values
target = df['dirt_road'].values  # Example: Predicting dirt road (0 or 1)

# Reshape data for LSTM [samples, time_steps, features]
# Here we use a simple window approach, e.g., 10 time steps per sample
sequence_length = 10

X = []
y = []

for i in range(len(features) - sequence_length):
    X.append(features[i:i + sequence_length])
    y.append(target[i + sequence_length])

X = np.array(X)
y = np.array(y)

print(f"X shape: {X.shape}, y shape: {y.shape}")

# Split into train and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Build LSTM Model
model = Sequential()
model.add(LSTM(units=50, return_sequences=True, input_shape=(sequence_length, X.shape[2])))
model.add(Dropout(0.2))
model.add(LSTM(units=50))
model.add(Dropout(0.2))
model.add(Dense(units=1, activation='sigmoid'))  # Binary classification

# Compile Model
model.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])

# Train Model
history = model.fit(X_train, y_train, epochs=3, batch_size=64, validation_split=0.1)

# Evaluate Model
loss, accuracy = model.evaluate(X_test, y_test)
print(f"Test Accuracy: {accuracy:.4f}")

X shape: (1080895, 10, 3), y shape: (1080895,)
Epoch 1/3
12161/12161 ━━━━━━━━━━━━━━━━━━━━ 81s 7ms/step - accuracy: 0.7923 - loss: 0.4184 - val_accuracy: 0.8178 - val_loss: 0.3751
Epoch 2/3
12161/12161 ━━━━━━━━━━━━━━━━━━━━ 80s 7ms/step - accuracy: 0.8161 - loss: 0.3764 - val_accuracy: 0.8239 - val_loss: 0.3658
Epoch 3/3
12161/12161 ━━━━━━━━━━━━━━━━━━━━ 80s 7ms/step - accuracy: 0.8201 - loss: 0.3677 - val_accuracy: 0.8259 - val_loss: 0.3587
6756/6756 ━━━━━━━━━━━━━━━━━━━━ 7s 1ms/step - accuracy: 0.8234 - loss: 0.3607
Test Accuracy: 0.8238


## 🔬 Base Model Performance and Initial Observations

#### The base LSTM model achieved a test accuracy of about 82% on the binary dirt-road task. While this is a reasonable starting point, there is room for improvement. The model was trained for only three epochs with untuned hyperparameters, and validation loss improved only slightly over that time (from 0.3751 to 0.3587), which suggests that adjustments such as modifying the learning rate, dropout values, or batch size could enhance performance. In the next section, we explore hyperparameter tuning to maximize accuracy while maintaining a stable and generalizable model.
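To put an accuracy figure like this in context, it helps to compare it with the accuracy of always predicting the most frequent class. The sketch below uses synthetic labels, since the class balance of the real `dirt_road` column is not shown here; `majority_baseline_accuracy` is a hypothetical helper name.

```python
import numpy as np

def majority_baseline_accuracy(y):
    """Accuracy obtained by always predicting the most frequent label in y."""
    counts = np.bincount(y)
    return counts.max() / counts.sum()

# Synthetic binary labels (NOT the real dataset): 70% class 0, 30% class 1
y_example = np.array([0] * 70 + [1] * 30)
baseline = majority_baseline_accuracy(y_example)
print(f"Majority-class baseline accuracy: {baseline:.2f}")
```

Running the same function on `y_test` from the base model cell would show how much of the 82% is explained by class imbalance alone.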


---

## 🛠️ Enhancing the LSTM Model Through Hyperparameter Tuning

#### To further improve accuracy and generalization, we now optimize the LSTM model by adjusting key hyperparameters. This includes:
- Increasing the sequence length from 10 to 20 for better temporal learning.
- Using StandardScaler to normalize sensor data.
- Implementing learning rate scheduling for dynamic learning.
- Adding class weights to balance the dataset.
- Stacking three LSTM layers with tapering widths (64, 32, and 16 units) in place of the two 50-unit layers.
- Incorporating early stopping and learning rate reduction for better convergence.

This enhanced model aims to achieve higher accuracy and lower validation loss while preventing overfitting.
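The class-weighting idea in the list above can be illustrated with plain NumPy. This sketch assumes one-hot labels and reproduces the "balanced" heuristic documented for scikit-learn's `compute_class_weight`, namely `n_samples / (n_classes * class_count)`. The helper name and the label counts are made up for illustration.

```python
import numpy as np

def balanced_class_weights(y_onehot):
    """Per-class weights: n_samples / (n_classes * class_count)."""
    labels = np.argmax(y_onehot, axis=1)  # one-hot rows -> class indices
    n_classes = y_onehot.shape[1]
    counts = np.bincount(labels, minlength=n_classes)
    # Note: a class with zero samples would cause a division by zero here
    weights = len(labels) / (n_classes * counts)
    return dict(enumerate(weights))

# Synthetic one-hot labels (not the real dataset): 6 / 3 / 1 samples per class
y_example = np.eye(3, dtype=int)[[0] * 6 + [1] * 3 + [2] * 1]
weights = balanced_class_weights(y_example)
print(weights)
```

Rarer classes receive larger weights, so their errors contribute more to the loss when the dictionary is passed to `model.fit(..., class_weight=...)`.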

In [20]:
import numpy as np
import tensorflow as tf
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import LSTM, Dropout, Dense
from tensorflow.keras.optimizers import Adam
from tensorflow.keras.callbacks import EarlyStopping, ReduceLROnPlateau
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.utils.class_weight import compute_class_weight

# ✅ Correct Number of Classes
num_classes = 3  # We have three road types: asphalt, cobblestone, dirt

# ✅ Feature Selection (Same as GRU)
features = df[[
    'acc_x_dashboard_left', 'acc_y_dashboard_left', 'acc_z_dashboard_left',
    'acc_x_dashboard_right', 'acc_y_dashboard_right', 'acc_z_dashboard_right',
    'gyro_x_dashboard_left', 'gyro_y_dashboard_left', 'gyro_z_dashboard_left'
]].values

target = df[['asphalt_road', 'cobblestone_road', 'dirt_road']].values  # ✅ Multi-class labels

# Normalize the features
scaler = StandardScaler()
features = scaler.fit_transform(features)

# Create sequences for LSTM
sequence_length = 20  # Ensure it matches GRU
X, y = [], []

for i in range(len(features) - sequence_length):
    X.append(features[i:i + sequence_length])
    y.append(target[i + sequence_length])

X = np.array(X)
y = np.array(y)  # ✅ No `to_categorical(y)`, it's already multi-class

# ✅ Split Data
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# ✅ Calculate Class Weights
class_weights = compute_class_weight('balanced', classes=np.unique(np.argmax(y_train, axis=1)), y=np.argmax(y_train, axis=1))
class_weight_dict = dict(enumerate(class_weights))

# ✅ Learning Rate Schedule
initial_learning_rate = 0.001
decay_steps = 1000
decay_rate = 0.9
learning_rate_schedule = tf.keras.optimizers.schedules.ExponentialDecay(
    initial_learning_rate, decay_steps, decay_rate
)
optimizer = tf.keras.optimizers.Adam(learning_rate=learning_rate_schedule)

# ✅ Build Updated LSTM Model
model = Sequential([
    LSTM(units=64, return_sequences=True, input_shape=(sequence_length, features.shape[1])),  # First LSTM layer
    Dropout(0.3), # Regularization to reduce overfitting
    LSTM(units=32, return_sequences=True),  # Second LSTM layer for feature extraction
    Dropout(0.3),
    LSTM(units=16),  # Final LSTM layer before Dense layer
    Dropout(0.3),
    Dense(units=16, activation='relu'),  # Fully connected layer
    Dense(num_classes, activation='softmax')  # ✅ Fix: Multi-class output (3 road types)
])

# ✅ Compile Model (Fix Loss Function)
model.compile(
    optimizer=optimizer,
    loss='categorical_crossentropy',  # ✅ Fix: Multi-class classification
    metrics=['accuracy', tf.keras.metrics.AUC(), tf.keras.metrics.Precision(), tf.keras.metrics.Recall()]
)

# ✅ Callbacks
callbacks = [
    tf.keras.callbacks.EarlyStopping(
        monitor='val_loss',
        patience=5,
        restore_best_weights=True
    ),
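    # Caution: the optimizer already uses an ExponentialDecay schedule. Keras may be
    # unable to override a schedule-driven learning rate, so if this callback fires it
    # could raise an error; consider keeping only one of the two mechanisms.
    tf.keras.callbacks.ReduceLROnPlateau(
        monitor='val_loss',
        factor=0.2,
        patience=3,
        min_lr=1e-6
    )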
]

# ✅ Train Model
print("\nTraining the model...")
history = model.fit(
    X_train, y_train,
    epochs=10,  # Reduced from 50
    batch_size=64,  # Increased from 32 for faster training
    validation_split=0.2,
    callbacks=callbacks,
    class_weight=class_weight_dict,
    verbose=1
)


Training the model...
Epoch 1/10
10809/10809 ━━━━━━━━━━━━━━━━━━━━ 160s 15ms/step - accuracy: 0.8254 - auc: 0.9169 - loss: 0.3553 - precision: 0.6378 - recall: 0.8683 - val_accuracy: 0.8894 - val_auc: 0.9610 - val_loss: 0.2472 - val_precision: 0.7481 - val_recall: 0.9016 - learning_rate: 3.2019e-04
Epoch 2/10
10809/10809 ━━━━━━━━━━━━━━━━━━━━ 149s 14ms/step - accuracy: 0.8854 - auc: 0.9615 - loss: 0.2468 - precision: 0.7340 - recall: 0.9186 - val_accuracy: 0.8923 - val_auc: 0.9675 - val_loss: 0.2387 - val_precision: 0.7425 - val_recall: 0.9314 - learning_rate: 1.0252e-04
Epoch 3/10
10809/10809 ━━━━━━━━━━━━━━━━━━━━ 148s 14ms/step - accuracy: 0.8937 - auc: 0.9667 - loss: 0.2294 - precision: 0.7490 - recall: 0.9266 - val_accuracy: 0.8960 - val_auc: 0.9695 - val_loss: 0.2294 - val_precision: 0.7498 - val_recall: 0.9336 - learning_rate: 3.2826e-05
Epoch 4/10
10809/10809 ━━━━━━━━

## 🔍 Final Optimized LSTM Performance and Key Findings

#### After applying hyperparameter tuning, our optimized LSTM model reached a validation accuracy of approximately 90%, with a validation loss of about 0.2 (a cross-entropy value, not a percentage). Compared to the base model, this is a notable improvement, although the base model was a binary dirt-road classifier trained on three features, so the comparison is indicative rather than like-for-like. The key improvements observed include:
- Higher accuracy due to better feature selection and normalization.
- More stable training with ExponentialDecay learning rate scheduling.
- Class balancing through compute_class_weight.
- Stronger generalization with dropout adjustments and layer modifications.
- Early stopping prevented unnecessary training cycles, reducing computation time.

This final model provides an effective trade-off between accuracy and efficiency, making it well-suited for real-world deployment scenarios.
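The `learning_rate` values in the training log follow directly from the exponential decay formula, `initial_lr * decay_rate ** (step / decay_steps)`. The sketch below recomputes them in plain Python, assuming the non-staircase form of the schedule and 10809 optimizer steps per epoch (the batch count shown in the log).

```python
def exponential_decay(initial_lr, decay_rate, decay_steps, step):
    """Non-staircase exponential decay of the learning rate."""
    return initial_lr * decay_rate ** (step / decay_steps)

steps_per_epoch = 10809  # batches per epoch, read from the training log
for epoch in (1, 2, 3):
    lr = exponential_decay(0.001, 0.9, 1000, epoch * steps_per_epoch)
    print(f"after epoch {epoch}: learning rate ~ {lr:.4e}")
```

With `decay_steps=1000`, the rate drops by roughly a factor of three per epoch on a dataset this size, so by the third epoch the model is already training at about 3% of the initial rate. A larger `decay_steps` would keep the rate higher for longer.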

## 📊 Model Evaluation Metrics

To better understand our model's performance, we use multiple evaluation metrics:

- **Accuracy**: Measures the percentage of correctly classified instances.
- **AUC (Area Under the Curve)**: Evaluates the ability of the model to distinguish between classes.
- **Precision**: The proportion of true positives among all predicted positives. High precision means fewer false positives.
- **Recall**: The proportion of actual positives correctly identified by the model. High recall means fewer false negatives.
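As a worked example of these definitions, the sketch below derives accuracy and per-class precision and recall from a confusion matrix. The matrix values are made up for illustration and the helper name is hypothetical. Rows are true classes and columns are predicted classes, the same layout `sklearn.metrics.confusion_matrix` returns.

```python
import numpy as np

def per_class_precision_recall(cm):
    """cm[i, j] = number of samples with true class i predicted as class j."""
    true_positives = np.diag(cm).astype(float)
    precision = true_positives / cm.sum(axis=0)  # column sums = predicted counts
    recall = true_positives / cm.sum(axis=1)     # row sums = actual counts
    return precision, recall

# Made-up 3-class confusion matrix (rows: true, columns: predicted)
cm = np.array([
    [8, 1, 1],
    [2, 6, 2],
    [0, 1, 9],
])
precision, recall = per_class_precision_recall(cm)
accuracy = np.trace(cm) / cm.sum()
print("precision:", precision)
print("recall:   ", recall)
print("accuracy: ", accuracy)
```

Here the second class has the lowest recall (6 of its 10 samples are found) even though its precision matches the third class, which a single accuracy number would hide.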

In our case, we use the following metrics:
```python
metrics=['accuracy', tf.keras.metrics.AUC(), tf.keras.metrics.Precision(), tf.keras.metrics.Recall()]
```

In [None]:
# Plot training history
plt.figure(figsize=(12, 4))

plt.subplot(1, 2, 1)
plt.plot(history.history['accuracy'])
plt.plot(history.history['val_accuracy'])
plt.title('Model Accuracy')
plt.ylabel('Accuracy')
plt.xlabel('Epoch')
plt.legend(['Train', 'Validation'], loc='upper left')

plt.subplot(1, 2, 2)
plt.plot(history.history['loss'])
plt.plot(history.history['val_loss'])
plt.title('Model Loss')
plt.ylabel('Loss')
plt.xlabel('Epoch')
plt.legend(['Train', 'Validation'], loc='upper left')

plt.tight_layout()
plt.show()

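# Evaluate and print detailed metrics
print("\nEvaluating the model on test data...")
# return_dict=True labels every compiled metric by name, so we do not depend
# on model.metrics_names matching the order of the returned values
test_results = model.evaluate(X_test, y_test, return_dict=True)
print("\nTest Results:")
for metric_name, value in test_results.items():
    print(f"{metric_name}: {value:.4f}")

# Generate predictions and confusion matrix
# The model outputs softmax probabilities over 3 classes and y_test is one-hot,
# so convert both to class indices with argmax (a 0.5 threshold would produce
# multilabel arrays, which confusion_matrix does not accept)
class_names = ['asphalt', 'cobblestone', 'dirt']  # same order as the target columns
y_pred = model.predict(X_test)
y_pred_classes = np.argmax(y_pred, axis=1)
y_true_classes = np.argmax(y_test, axis=1)

cm = confusion_matrix(y_true_classes, y_pred_classes)

plt.figure(figsize=(8, 6))
sns.heatmap(cm, annot=True, fmt='d', cmap='Blues',
            xticklabels=class_names, yticklabels=class_names)
plt.title('Confusion Matrix')
plt.ylabel('True Label')
plt.xlabel('Predicted Label')
plt.show()

print("\nClassification Report:")
print(classification_report(y_true_classes, y_pred_classes, target_names=class_names))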

In [14]:
# Save the model (optional)
model.save('../api/models/lstm_road_condition_model_optimized.keras')
print("\nModel saved as 'lstm_road_condition_model_optimized.keras'")


Model saved as 'lstm_road_condition_model_optimized.keras'
