# Project in Learning Based Inertial Sensing

This is a final project in the course "Learning Based Inertial Sensing" at the Technion, Israel.
This project aims to classify the road surface type which a car is driving upon.

The project is based on an existing research: 
J. Menegazzo and A. von Wangenheim, "Multi-Contextual and Multi-Aspect Analysis for Road Surface Type Classification Through Inertial Sensors and Deep Learning," 2020 X Brazilian Symposium on Computing Systems Engineering (SBESC), Florianopolis, 2020, pp. 1-8, doi: 10.1109/SBESC51047.2020.9277846.

We will explore various machine learning tricks to try to better our understanding and intuition on neural networks, and hopefully outperform the existing research on the matter.

### <img src="https://img.icons8.com/bubbles/50/000000/information.png" style="height:50px;display:inline"> Students Information
---


|Name     |Campus Email| ID  |
|---------|--------------------------------|----------|
|Eyal Kaldor| eyalkaldor@campus.technion.ac.il| 205907330|
|Snir Carmeli| snircarmeli@campus.technion.ac.il| 318880234|
|Dolev Freund| dolev@campus.technion.ac.il| 316216605|

<!-- Add Headline -->


Importing the relevant packages:

In [116]:
# import numpy, pytorch
import numpy as np
import torch
import torch.nn as nn
from torch.utils.data import TensorDataset, DataLoader
import torchvision
import matplotlib.pyplot as plt
from sklearn import preprocessing
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression

Making sure we are using the GPU for training:

In [117]:
# Work on GPU
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
print('Using device:', device)

Using device: cuda


In [118]:
# Loading the data
col_names = ['timestamp', 'acc_x_dashboard', 'acc_y_dashboard', 'acc_z_dashboard', 'acc_x_above_suspension', 'acc_y_above_suspension', 'acc_z_above_suspension',
              'acc_x_below_suspension', 'acc_y_below_suspension', 'acc_z_below_suspension', 'gyro_x_dashboard', 'gyro_y_dashboard', 'gyro_z_dashboard', 
              'gyro_x_above_suspension', 'gyro_y_above_suspension', 'gyro_z_above_suspension', 'gyro_x_below_suspension', 'gyro_y_below_suspension',
                'gyro_z_below_suspension', 'mag_x_dashboard', 'mag_y_dashboard', 'mag_z_dashboard', 'mag_x_above_suspension', 'mag_y_above_suspension',
                'mag_z_above_suspension', 'temp_dashboard', 'temp_above_suspension', 'temp_below_suspension']
features = col_names # all columns are features
labels_names = ['paved_road', 'unpaved_road', 'dirt_road', 'cobblestone_road', 'asphalt_road', 'no_speed_bump', 'speed_bump_asphalt', 'speed_bump_cobblestone',
           'good_road_left', 'regular_road_left', 'bad_road_left', 'good_road_right', 'regular_road_right', 'bad_road_right']

Loading the dataset:
Every data points is all the rows of a single time stamp from all 4 files.
Between every folder data we will add padding of 1000 rows of zeros to make sure the model will not learn the transition between the folders.

In [119]:
# There are 9 folders with data: PVS 1-9. Each folder contains these data files: 
# 1. dataset_gps_mpu_left.csv
# 2. dataset_gps_mpu_right.csv
# 3. dataset_mpu_left.csv
# 4. dataset_mpu_right.csv
# The output (labels) are in the file dataset_labels.csv
data_files = ['dataset_gps_mpu_left.csv', 'dataset_gps_mpu_right.csv', 'dataset_mpu_left.csv', 'dataset_mpu_right.csv']
data_folders = ['PVS 1', 'PVS 2', 'PVS 3', 'PVS 4', 'PVS 5', 'PVS 6', 'PVS 7', 'PVS 8', 'PVS 9']
labels_file = 'dataset_labels.csv'
# Arrange data in an array: take a window of x samples from all files and stack them together as a row in the array
# Print every k rows to see the progress
k_print = 10000
# Padding for the data array: Each file will be separated by this many rows of zeros
padding = 1e4


data = []
labels_data = []


for folder in data_folders:
    cnt = 0
    files_data = np.hstack([np.genfromtxt(folder + '/' + file, delimiter=',', skip_header=1) for file in data_files])
    total_columns = sum([np.genfromtxt(folder + '/' + file, delimiter=',', skip_header=1).shape[1] for file in data_files])
    files_labels = np.genfromtxt(folder + '/' + labels_file, delimiter=',', skip_header=1)
    print("Loading data from folder: ", folder, "\n")

    num_samples = files_data.shape[0]
    # Preallocate space for data and labels
    folder_data = np.zeros((num_samples, files_data.shape[1]))
    folder_labels = np.zeros((num_samples, files_labels.shape[1]))

    for i in range(num_samples):
        folder_data[i] = files_data[i, :]
        folder_labels[i] = files_labels[i, :]
        cnt += 1
        if cnt % k_print == 0:
            print("Processed {:.2f}% of data from folder: {}".format(100 * cnt / num_samples, folder))

    # Append the folder data and labels to the main lists
    data.extend(folder_data)
    labels_data.extend(folder_labels)
        
    # Add padding if this is not the last folder
    if folder != data_folders[-1]:
        zero_rows_data = np.zeros((int(padding), len(data[0])))
        zero_rows_labels = np.zeros((int(padding), len(labels_data[0])))
        data.extend(zero_rows_data)
        labels_data.extend(zero_rows_labels)    
        print()
        print("Added padding to data from folder: ", folder, "\n")

# Show shape of data and labels without turning them into torch tensors or numpy arrays
print("\nData shape:", len(data), len(data[0]))
print("Labels shape:", len(labels_data), len(labels_data[0]))


Loading data from folder:  PVS 1 

Processed 6.94% of data from folder: PVS 1
Processed 13.89% of data from folder: PVS 1
Processed 20.83% of data from folder: PVS 1
Processed 27.77% of data from folder: PVS 1
Processed 34.71% of data from folder: PVS 1
Processed 41.66% of data from folder: PVS 1
Processed 48.60% of data from folder: PVS 1
Processed 55.54% of data from folder: PVS 1
Processed 62.48% of data from folder: PVS 1
Processed 69.43% of data from folder: PVS 1
Processed 76.37% of data from folder: PVS 1
Processed 83.31% of data from folder: PVS 1
Processed 90.26% of data from folder: PVS 1
Processed 97.20% of data from folder: PVS 1

Added padding to data from folder:  PVS 1 

Loading data from folder:  PVS 2 

Processed 8.02% of data from folder: PVS 2
Processed 16.04% of data from folder: PVS 2
Processed 24.06% of data from folder: PVS 2
Processed 32.08% of data from folder: PVS 2
Processed 40.10% of data from folder: PVS 2
Processed 48.12% of data from folder: PVS 2
Process

In [120]:
# Convert data and labels to torch tensors
data = torch.tensor(data, dtype=torch.float)
labels_data = torch.tensor(labels_data, dtype=torch.float)

# Show shape of data and labels after turning them into torch
print("\nData shape:", data.shape)
print("Labels shape:", labels_data.shape)



Data shape: torch.Size([1160905, 120])
Labels shape: torch.Size([1160905, 14])


In [121]:
# Function to split data into train, validation and test sets
def train_val_test_split(data, labels, train_size=0.7, val_size=0.15, test_size=0.15, random_state=None):
    assert train_size + val_size + test_size == 1, "Train, validation and test sizes must sum to 1"
    
    data_train, data_temp, labels_train, labels_temp = train_test_split(data, labels, test_size=(val_size + test_size), random_state=random_state)
    data_val, data_test, labels_val, labels_test = train_test_split(data_temp, labels_temp, test_size=test_size/(test_size + val_size), random_state=random_state)
    
    return data_train, data_val, data_test, labels_train, labels_val, labels_test

In [122]:
# Divide the data into windows of 100 samples each
window_size = 100
data_windows = data.unfold(0, window_size, window_size)
# Create empty torch tensor to store the windows of labels
labels_windows = torch.zeros((data_windows.shape[0], labels_data.shape[1]))

cnt = 0
# Per each window, take the mean of the labels every window_size samples
for i in range(len(labels_data)):
    if i + window_size <= labels_windows.shape[0]:
        tmp_mean = torch.tensor([labels_data[i:i+window_size, j].mean() for j in range(labels_data.shape[1])])
        labels_windows[cnt] = tmp_mean
        cnt += 1
        i += window_size


# Show shape of data and labels after dividing them into windows
print("\nData windows shape:", data_windows.shape)
print("Labels windows shape:", labels_windows.shape)



Data windows shape: torch.Size([11609, 120, 100])
Labels windows shape: torch.Size([11609, 14])


In [123]:
# Split the data into training, validation and test sets based on the following ratios
train_size = 0.6
val_size = 0.15
test_size = 0.25

# Split the data into training, validation and test sets
data_train, data_val, data_test, labels_train, labels_val, labels_test = train_val_test_split(data_windows, labels_windows, train_size=train_size, val_size=val_size, test_size=test_size, random_state=42)

# Show shape of data and labels after splitting them into training, validation and test sets
print("\nData train shape:", data_train.shape)
print("Labels train shape:", labels_train.shape)
print("\nData validation shape:", data_val.shape)
print("Labels validation shape:", labels_val.shape)
print("\nData test shape:", data_test.shape)
print("Labels test shape:", labels_test.shape)


Data train shape: torch.Size([6965, 120, 100])
Labels train shape: torch.Size([6965, 14])

Data validation shape: torch.Size([1741, 120, 100])
Labels validation shape: torch.Size([1741, 14])

Data test shape: torch.Size([2903, 120, 100])
Labels test shape: torch.Size([2903, 14])


In [None]:
# Create a baseline FC neural network model.
# The model will have 10 hidden layers with 100 neurons each, and a ReLU activation function.
# The input size is the number of features * window_size, and the output size is the number of labels.

class FCNN(nn.Module):
    def __init__(self, input_size, output_size, hidden_size, num_hidden_layers):
        super(FCNN, self).__init__()
        self.input_size = input_size
        self.output_size = output_size
        self.hidden_size = hidden_size
        self.num_hidden_layers = num_hidden_layers
        
        self.layers = nn.ModuleList()
        self.layers.append(nn.Linear(self.input_size, self.hidden_size))
        self.layers.append(nn.ReLU())
        for i in range(self.num_hidden_layers):
            self.layers.append(nn.Linear(self.hidden_size, self.hidden_size))
            self.layers.append(nn.ReLU())
        self.layers.append(nn.Linear(self.hidden_size, self.output_size))
        self.layers.append(nn.Sigmoid())
        
    def forward(self, x):
        for layer in self.layers:
            x = layer(x)
        return x
    
# Define the hyperparameters
input_size = data_train.shape[1] * data_train.shape[2]
output_size = labels_train.shape[1]
hidden_size = 10
num_hidden_layers = 10
# Create the model
model = FCNN(input_size, output_size, hidden_size, num_hidden_layers)
model.to(device)

# Define the loss function (Cross-Entropy loss) and the optimizer (Adam)
criterion = nn.BCELoss()
optimizer = torch.optim.Adam(model.parameters(), lr=0.001)

# Define the batch size and the number of epochs for stochastic gradient descent
batch_size = 256
num_epochs = 1e2

# Create DataLoaders for the training and validation data
train_dataset = TensorDataset(data_train, labels_train)
train_loader = DataLoader(dataset=train_dataset, batch_size=batch_size, shuffle=True)

val_dataset = TensorDataset(data_val, labels_val)
val_loader = DataLoader(dataset=val_dataset, batch_size=batch_size, shuffle=False)

# Lists to store loss values
train_loss_values = []
val_loss_values = []

# Train the model
for epoch in range(int(num_epochs)):
    model.train()
    epoch_train_loss = 0
    for i, (data, labels) in enumerate(train_loader):
        data = data.to(device)
        labels = labels.to(device)
        
        # Flatten the input data
        data = data.view(data.size(0), -1)
        
        # Forward pass
        outputs = model(data)
        loss = criterion(outputs, labels)
        
        # Backward pass and optimization
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
        
        epoch_train_loss += loss.item()
        
    # Average training loss for the epoch
    epoch_train_loss /= len(train_loader)
    train_loss_values.append(epoch_train_loss)
    
    # Calculate validation loss
    model.eval()
    epoch_val_loss = 0
    with torch.no_grad():
        for data, labels in val_loader:
            data = data.to(device)
            labels = labels.to(device)
            
            # Flatten the input data
            data = data.view(data.size(0), -1)
            
            # Forward pass
            outputs = model(data)
            loss = criterion(outputs, labels)
            
            epoch_val_loss += loss.item()
    
    # Average validation loss for the epoch
    epoch_val_loss /= len(val_loader)
    val_loss_values.append(epoch_val_loss)
    
    if (epoch + 1) % 10 == 0:
        print('Epoch [{}/{}], Train Loss: {:.4f}, Validation Loss: {:.4f}'.format(epoch + 1, int(num_epochs), epoch_train_loss, epoch_val_loss))

# Plot the loss values
plt.plot(range(int(num_epochs)), train_loss_values, label='Train Loss')
plt.plot(range(int(num_epochs)), val_loss_values, label='Validation Loss')
plt.xlabel('Epochs')
plt.ylabel('Loss')
plt.title('Train and Validation Loss vs Epochs')
plt.legend()
plt.show()


Epoch [10/100], Train Loss: 0.8217, Validation Loss: 0.8932
Epoch [20/100], Train Loss: 0.8519, Validation Loss: 0.8932
Epoch [30/100], Train Loss: 0.8253, Validation Loss: 0.8932
Epoch [40/100], Train Loss: 0.8192, Validation Loss: 0.8932


In [None]:
# Evaluate the model on the validation set - calculate the accuracy
model.eval()
val_correct = 0
val_total = 0
with torch.no_grad():
    for data, labels in val_loader:
        data = data.to(device)
        labels = labels.to(device)
        
        # Flatten the input data
        data = data.view(data.size(0), -1)
        
        # Forward pass
        outputs = model(data)
        
        # Apply threshold to get predicted labels
        predicted = (outputs > 0.5).float()
        
        # Calculate accuracy
        val_correct += (predicted == labels).all(dim=1).sum().item()
        val_total += labels.size(0)

val_accuracy = val_correct / val_total
print('Validation Accuracy: {:.2f}%'.format(100 * val_accuracy))

# Evaluate the model on the test set - calculate the accuracy
test_dataset = TensorDataset(data_test, labels_test)
test_loader = DataLoader(dataset=test_dataset, batch_size=batch_size, shuffle=False)

model.eval()
test_correct = 0
test_total = 0
with torch.no_grad():
    for data, labels in test_loader:
        data = data.to(device)
        labels = labels.to(device)
        
        # Flatten the input data
        data = data.view(data.size(0), -1)
        
        # Forward pass
        outputs = model(data)
        
        # Apply threshold to get predicted labels
        predicted = (outputs > 0.5).float()
        
        # Calculate accuracy
        test_correct += (predicted == labels).all(dim=1).sum().item()
        test_total += labels.size(0)

test_accuracy = test_correct / test_total
print('Test Accuracy: {:.2f}%'.format(100 * test_accuracy))




Validation Accuracy: 96.27%
Test Accuracy: 96.18%
