# **Lab 3 - Activity Recognition with Machine Learning**

This notebook implements a machine learning workflow to recognize physical activities from Respeck and Thingy sensor data. The dataset consists of multiple 30-second recordings of different activities (e.g., ascending stairs, shuffle walking, sitting-standing), stored as CSV files in a separate folder for each activity.

You will then deploy the model you develop here inside your Android app for live classification.

This week you do not yet have access to the full dataset. You can still complete this lab by combining the data that you and your group mates collected in Coursework 1, as a proof of concept for when you receive the full dataset.


# Imports

In [465]:
# Importing libraries that will be used
import pandas as pd
import numpy as np
import glob
import os
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import OneHotEncoder
from sklearn.metrics import accuracy_score, classification_report
import tensorflow as tf
from tensorflow.keras.models import Sequential, Model
from tensorflow.keras.layers import Conv1D, MaxPooling1D, Flatten, Dense, Dropout, LSTM, Input, concatenate



# Reading Files
Reading files from your dataset

In [466]:
# # Put in the path of your dataset here
# respeck_dataset_path = "C:/Users/nikit/Documents/University/Year4/PDIOT/CW3/dataset/Respeck/DailyActivities/"
# thingy_dataset_path = "C:/Users/nikit/Documents/University/Year4/PDIOT/CW3/dataset/Thingy/"

The cells below use the `glob` module to find all paths that match a pattern: `glob.glob()` returns a list of the file and folder paths matching the given pattern. `respeck_dataset_path` and `thingy_dataset_path` should be the directories where your dataset files are located.

The `*` is a wildcard that matches any string of characters, so `respeck_dataset_path + "*"` retrieves every entry (each activity folder, plus any stray files) directly inside that directory.

Below is just an example of what your dataset folder can look like. You should refer to the Coursework 3 instructions on what classes your model(s) are expected to be able to classify. Within your dataset directory, there should be subfolders, each representing a class of activity.
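As a self-contained illustration of these patterns, the sketch below builds a throwaway directory tree with `tempfile` (the folder and file names are hypothetical) and lists it with `glob`. It uses `os.path.join` rather than string concatenation so path separators are handled for you, and it sorts the results because `glob.glob` does not guarantee any particular order.

```python
import glob
import os
import tempfile


def list_dataset(root):
    """Return (top-level entries, CSV files inside each subfolder) under root, sorted."""
    # "*" matches every entry (folders and files) directly inside root
    entries = sorted(glob.glob(os.path.join(root, "*")))
    # "*/*.csv" matches CSV files one level down, i.e. inside each activity folder
    csv_files = sorted(glob.glob(os.path.join(root, "*", "*.csv")))
    return entries, csv_files


# Build a tiny hypothetical dataset tree: one subfolder per activity
with tempfile.TemporaryDirectory() as root:
    for activity in ["ascending", "running"]:
        os.makedirs(os.path.join(root, activity))
        for i in range(2):
            open(os.path.join(root, activity, f"rec{i}.csv"), "w").close()
    open(os.path.join(root, "notes.txt"), "w").close()  # a stray non-CSV file

    entries, csv_files = list_dataset(root)
    print([os.path.basename(p) for p in entries])  # ['ascending', 'notes.txt', 'running']
    print(len(csv_files))  # 4
```

Note that the plain `*` pattern also returns `notes.txt`, which is why the loaders in this notebook filter on the `.csv` extension.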

In [467]:
# glob.glob(thingy_dataset_path + "*")

In [468]:
# glob.glob(respeck_dataset_path + "*")

To see the files in each subfolder you can similarly do:

In [469]:
# activity_folder = ""
# glob.glob(respeck_dataset_path + "/"+activity_folder+"/*")

In [470]:
# activity_folder = ""
# glob.glob(thingy_dataset_path + "/"+activity_folder+"/*")

# Functions

## Load list of files in an activity folder

In [471]:
# def load_files_from_folder(folder_path):
#     """
#     Load all CSV files from a folder and return a list of file paths.

#     Parameters:
#     folder_path (str): The path to the folder containing CSV files.

#     Returns:
#     list: A list of file paths for all CSV files in the folder.
#     """

#     # Initialize an empty list to store the full file paths of the CSV files
#     file_paths = []

#     # Loop through all the files in the given folder
#     for file_name in os.listdir(folder_path):
#         # Check if the file has a .csv extension (ignores other files)
#         if file_name.endswith('.csv'):
#             # Construct the full file path by joining the folder path and the file name
#             full_file_path = os.path.join(folder_path, file_name)

#             # Append the full file path to the file_paths list
#             file_paths.append(full_file_path)

#     # Return the complete list of CSV file paths
#     return file_paths

In [472]:
import os

def load_files_from_multiple_folders(base_path, subfolders):
    """
    Load all CSV files from multiple folders and return a combined list of file paths.
    
    Parameters:
    base_path (str): The directory that contains the subfolders.
    subfolders (list): List of subfolder paths relative to base_path (e.g., an activity name).
    
    Returns:
    list: A list of file paths for all CSV files in the specified folders.
    """
    file_paths = []
    
    for subfolder in subfolders:
        folder_path = os.path.join(base_path, subfolder)
        # Warn and skip instead of crashing if a folder is missing (e.g., an activity absent from one year's data)
        if not os.path.isdir(folder_path):
            print(f"Warning: folder not found, skipping: {folder_path}")
            continue
        # Sort so the file order (and therefore the random split) is reproducible
        for file_name in sorted(os.listdir(folder_path)):
            if file_name.endswith('.csv'):
                file_paths.append(os.path.join(folder_path, file_name))
                
    return file_paths


In [473]:
# Define the base path
base_path = "C:/Users/nikit/Documents/University/Year4/PDIOT/CW3/pdiotapp/datasets"

# Subfolders for each dataset type
respeck_subfolders = ["year1/Respeck/DailyActivities", "year2/Respeck/DailyActivities"]
thingy_subfolders = ["year1/Thingy", "year2/Thingy"]

# Prepare paths to be used by `process_activity`
respeck_paths = [os.path.join(base_path, subfolder) for subfolder in respeck_subfolders]
thingy_paths = [os.path.join(base_path, subfolder) for subfolder in thingy_subfolders]


## Train and test set split from list of files

In [474]:
def split_files(file_list, test_size=0.2, random_state=None):
    """
    Split the list of files into training and test sets.

    Parameters:
    file_list (list): List of file paths to be split into train and test sets.
    test_size (float): The proportion of files to allocate to the test set.
                       Default is 0.2, meaning 20% of the files will be used for testing.
    random_state (int or None): Seed for the shuffle. Pass an integer to get the same split
                                on every run; None (the default) gives a different split each time.

    Returns:
    tuple:
        - train_files (list): List of file paths for the training set.
        - test_files (list): List of file paths for the test set.
    """

    # Split the file list into training and test sets using train_test_split from scikit-learn
    # test_size defines the proportion of the data to use as the test set (default is 20%)
    # shuffle=True ensures that the files are shuffled randomly before splitting
    train_files, test_files = train_test_split(
        file_list, test_size=test_size, shuffle=True, random_state=random_state
    )

    # Return the train and test file lists
    return train_files, test_files

## Sliding Window

In time series Activity Recognition, a sliding window is a commonly used technique to segment continuous sensor data (such as accelerometer readings) into smaller, fixed-length overlapping or non-overlapping time intervals, or windows. Each window contains a sequence of sensor measurements that represent a short period of time, and this segmented data is used to extract features or make predictions about the activity happening within that window.

### Key Concepts of a Sliding Window
1.   **Window Size:** This refers to the length of each segment or window, typically defined in terms of the number of time steps or the duration (e.g., 2 seconds). The window should be long enough to capture at least one full cycle of the activity (e.g., a complete stride), but not so long that it delays live predictions or spans more than one activity.
2.   **Step Size:** The step size determines how far the window moves forward after each step. If the step size is smaller than the window size, the windows will overlap. For example, if the window size is 5 seconds and the step size is 2 seconds, there will be a 3-second overlap between consecutive windows. Overlapping windows provide more data for analysis and can help smooth out predictions by capturing transitional activities.
3.   **Non-Overlapping Windows:** If the step size is equal to the window size, the windows do not overlap. This method provides distinct segments of data but may miss transitional phases between activities.
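The arithmetic behind these concepts can be checked with a short NumPy sketch. With windows of `window_size` samples moved forward by `step_size`, consecutive windows share `window_size - step_size` samples, and a recording of `num_samples` rows yields `(num_samples - window_size) // step_size + 1` full windows (leftover samples at the end are dropped). The helper names and the 750-sample recording are hypothetical; the window size of 100 and step of 50 mirror the defaults of `process_activity` further down. `np.lib.stride_tricks.sliding_window_view` requires NumPy 1.20 or later.

```python
import numpy as np


def count_windows(num_samples, window_size, step_size):
    """Number of windows produced by range(0, num_samples - window_size + 1, step_size)."""
    if num_samples < window_size:
        return 0
    return (num_samples - window_size) // step_size + 1


def sliding_windows(data, window_size, step_size):
    """Vectorized equivalent of a loop that slices data[i:i + window_size] every step_size rows.

    data: array of shape (num_samples, num_features)
    returns: array of shape (num_windows, window_size, num_features)
    """
    num_samples, num_features = data.shape
    if num_samples < window_size:
        # Too short for even one window: return an empty but correctly shaped array
        return np.empty((0, window_size, num_features), dtype=data.dtype)
    # sliding_window_view appends the window axis last: (num_samples - window_size + 1, num_features, window_size)
    views = np.lib.stride_tricks.sliding_window_view(data, window_size, axis=0)
    # Keep every step_size-th start position, then move the window axis before the features
    return views[::step_size].transpose(0, 2, 1)


# Hypothetical recording: 750 samples of 3-axis accelerometer data
recording = np.random.default_rng(0).normal(size=(750, 3))
windows = sliding_windows(recording, window_size=100, step_size=50)
print(count_windows(750, 100, 50), windows.shape)  # 14 (14, 100, 3)
```

Here consecutive windows overlap by 50 samples (50%); setting `step_size=100` would give non-overlapping windows and only 7 of them.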

### Why Sliding Windows for Activity Recognition?

* Segmentation of Continuous Data: Activity recognition systems work with continuous streams of sensor data, and the sliding window helps segment these into manageable pieces to classify activities within specific intervals.

* Context Capturing: Human activities are often complex and spread across time. By using a sliding window, you can capture context across a short duration, which may include transitions or small fluctuations in the activity (e.g., a person moving from sitting to standing).

* Feature Extraction: Within each window, features such as mean, variance, frequency domain features, etc., can be extracted to help classify the activity.

* Real-Time Recognition: In real-time systems, the sliding window allows for continuous monitoring and updating of predictions as new data arrives.
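For the real-time case, the same windowing can be applied to a live stream by keeping only the most recent `window_size` samples in a buffer and emitting a window every `step_size` new samples, so live windows line up with the ones the model was trained on. The `StreamingWindow` class below is a hypothetical Python sketch of that idea, not part of any library; the app itself would need an equivalent written in its own language.

```python
from collections import deque

import numpy as np


class StreamingWindow:
    """Hypothetical helper: buffers live samples and emits overlapping windows."""

    def __init__(self, window_size, step_size):
        self.window_size = window_size
        self.step_size = step_size
        # A deque with maxlen silently drops the oldest sample once it is full
        self.buffer = deque(maxlen=window_size)
        self.count = 0  # total samples received so far

    def push(self, sample):
        """Add one sample (e.g. [x, y, z]); return a (window_size, num_features) array or None."""
        self.buffer.append(sample)
        self.count += 1
        # Emit after the first window_size samples, then every step_size samples,
        # matching training windows that start at 0, step_size, 2 * step_size, ...
        if self.count >= self.window_size and (self.count - self.window_size) % self.step_size == 0:
            return np.array(self.buffer)
        return None


stream = StreamingWindow(window_size=100, step_size=50)
samples = np.arange(300 * 3, dtype=float).reshape(300, 3)  # stand-in for live accelerometer readings
emitted = [w for w in (stream.push(s) for s in samples) if w is not None]
print(len(emitted), emitted[0].shape)  # 5 (100, 3)
```

Each emitted window could then be passed to the trained model, giving a new prediction every `step_size` samples.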



In [475]:
def load_and_apply_sliding_windows(file_paths, window_size, step_size, label):
    """
    Load the data from each file, apply sliding windows, and return the windows and labels.

    Parameters:
    file_paths (list): List of file paths to CSV files. Each file contains sensor data (e.g., accelerometer, gyroscope).
    window_size (int): The size of each sliding window (number of time steps).
    step_size (int): The step size (stride) between consecutive windows.
    label (int or str): The label for the activity corresponding to the folder.
                        This label will be assigned to each sliding window extracted from the data.

    Returns:
    tuple:
        - windows (numpy.ndarray): A 3D array of sliding windows, where each window has the shape
                                   (num_windows, window_size, num_features).
        - labels (numpy.ndarray): A 1D array of labels, where each label corresponds to a sliding window.
    """
    # Initialize lists to store sliding windows and their corresponding labels
    windows = []
    labels = []

    # Loop through each file in the provided file paths
    for file_path in file_paths:
        # Load the CSV file into a pandas DataFrame
        data = pd.read_csv(file_path)

        # Select the columns containing the sensor data used as model input (accelerometer readings only)
        # These columns might vary depending on your dataset's structure
        data = data[['accel_x', 'accel_y', 'accel_z']]

        # Convert the DataFrame into a numpy array for faster processing in the sliding window operation
        data = data.to_numpy()

        # Get the number of samples (rows) and features (columns) in the data
        num_samples, num_features = data.shape

        # Apply sliding windows to the data
        # The range function defines the start of each window, moving step_size increments at a time
        for i in range(0, num_samples - window_size + 1, step_size):
            # Extract a window of size 'window_size' from the current position 'i'
            window = data[i:i + window_size, :]

            # Append the window to the windows list
            windows.append(window)

            # Assign the activity label to the window and append it to the labels list
            labels.append(label)

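    # Convert the lists of windows and labels into numpy arrays for efficient numerical operations.
    # The reshape keeps the windows array 3D, (0, window_size, 3), even when no file was long
    # enough for a single window, so np.concatenate in combine_data does not fail on that activity.
    # (3 = the number of accelerometer columns selected above.)
    return np.array(windows).reshape(-1, window_size, 3), np.array(labels)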

## Load and Split Train Test for Each Activity Folder

This function processes the sensor data for a single activity (e.g., 'normalWalking' or 'running'), stored in that activity's folder. It first splits the recording files into training and test sets and only then applies sliding windows, so overlapping windows cut from the same recording cannot end up in both sets. Every window is labelled with the activity's label. Calling the function once per activity prepares all the data for training and evaluation.

In [476]:
def process_activity(activity, label, respeck_paths, thingy_paths, window_size=100, step_size=50, test_size=0.2):
    """
    Load data from multiple folders for a specific activity, apply sliding windows, 
    and split into train and test sets.
    
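    Parameters:
    activity (str): The name of the activity (also the name of its subfolder).
    label (int): The label associated with the activity.
    respeck_paths (list): List of paths for Respeck data folders (e.g., both year1 and year2 folders).
    thingy_paths (list): List of paths for Thingy data folders.
    window_size (int): Number of time steps per sliding window (default 100).
    step_size (int): Stride between consecutive windows (default 50, i.e. 50% overlap).
    test_size (float): Proportion of files allocated to the test set (default 0.2).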
    
    Returns:
    tuple: Train and test sliding windows and labels for both Respeck and Thingy.
    """
    # Load all files for Respeck and Thingy from specified folders
    respeck_files = []
    for path in respeck_paths:
        respeck_files.extend(load_files_from_multiple_folders(path, [activity]))
        
    thingy_files = []
    for path in thingy_paths:
        thingy_files.extend(load_files_from_multiple_folders(path, [activity]))

    # Split files into train and test sets
    respeck_train_files, respeck_test_files = split_files(respeck_files, test_size)
    thingy_train_files, thingy_test_files = split_files(thingy_files, test_size)

    # Load and apply sliding windows on Respeck files
    respeck_train_windows, respeck_train_labels = load_and_apply_sliding_windows(respeck_train_files, window_size, step_size, label)
    respeck_test_windows, respeck_test_labels = load_and_apply_sliding_windows(respeck_test_files, window_size, step_size, label)

    # Load and apply sliding windows on Thingy files
    thingy_train_windows, thingy_train_labels = load_and_apply_sliding_windows(thingy_train_files, window_size, step_size, label)
    thingy_test_windows, thingy_test_labels = load_and_apply_sliding_windows(thingy_test_files, window_size, step_size, label)

    return (respeck_train_windows, respeck_train_labels, respeck_test_windows, respeck_test_labels,
            thingy_train_windows, thingy_train_labels, thingy_test_windows, thingy_test_labels)


## Combine Data
The function combines the sliding window data and their corresponding labels from multiple activities (e.g., walking, running, etc.) into single arrays.
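The concatenation in `combine_data` stacks arrays along axis 0, the window axis, so every per-activity array must share the same `(window_size, num_features)` trailing shape; an activity with no windows still has to be an array of shape `(0, window_size, num_features)`. A toy sketch with hypothetical data follows. The result is ordered activity by activity, which is fine for training because Keras's `model.fit` shuffles the training data each epoch by default.

```python
import numpy as np

# Hypothetical per-activity outputs of load_and_apply_sliding_windows
per_activity = {
    "running": {"windows": np.zeros((3, 100, 3)), "labels": np.full(3, 7)},
    "ascending": {"windows": np.ones((2, 100, 3)), "labels": np.full(2, 9)},
}

# Stack along axis 0 (the window axis); window_size and num_features must match
windows = np.concatenate([d["windows"] for d in per_activity.values()], axis=0)
labels = np.concatenate([d["labels"] for d in per_activity.values()], axis=0)
print(windows.shape, labels)  # (5, 100, 3) [7 7 7 9 9]
```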

In [477]:
def combine_data(train_test_data, data_type):
    """
    Combines the sliding windows and labels from all activities into a single array for either training or testing.
    
    Args:
        train_test_data (dict): Dictionary containing sliding window data for all activities.
        data_type (str): Either 'train' or 'test' to specify which data to combine ('train_windows' or 'test_windows').
    
    Returns:
        tuple: 
            - respeck_windows (numpy.ndarray): Concatenated Respeck windows.
            - respeck_labels (numpy.ndarray): Concatenated Respeck labels.
            - thingy_windows (numpy.ndarray): Concatenated Thingy windows.
            - thingy_labels (numpy.ndarray): Concatenated Thingy labels.
    """
    
    # Extract sliding windows and labels for Respeck sensor
    respeck_windows_list = [train_test_data[activity][f'respeck_{data_type}_windows'] for activity in train_test_data]
    respeck_labels_list = [train_test_data[activity][f'respeck_{data_type}_labels'] for activity in train_test_data]
    
    # Extract sliding windows and labels for Thingy sensor
    thingy_windows_list = [train_test_data[activity][f'thingy_{data_type}_windows'] for activity in train_test_data]
    thingy_labels_list = [train_test_data[activity][f'thingy_{data_type}_labels'] for activity in train_test_data]
    
    # Concatenate windows and labels separately for Respeck
    respeck_windows = np.concatenate(respeck_windows_list, axis=0)
    respeck_labels = np.concatenate(respeck_labels_list, axis=0)
    
    # Concatenate windows and labels separately for Thingy
    thingy_windows = np.concatenate(thingy_windows_list, axis=0)
    thingy_labels = np.concatenate(thingy_labels_list, axis=0)
    
    return respeck_windows, respeck_labels, thingy_windows, thingy_labels

## 1D CNN Model

This function, `build_1d_cnn_model`, creates and compiles a 1D Convolutional Neural Network (CNN) for multi-class classification tasks.

### Function Overview

Input Parameters
* `input_shape`: The shape of a single input sample, (timesteps, features). Timesteps is the number of samples in each sliding window (e.g., 100, matching `window_size`), and features is the number of measurements per time step (e.g., 3 for the x, y and z accelerometer axes).
* `num_classes`: The number of output classes for the classification problem. For example, if you're classifying six different activities, num_classes would be 6.

Returns
* The function returns a compiled 1D CNN model that is ready to be trained on your data.

<hr>

### Function Breakdown
1.   Model Initialization:
    * `model = Sequential()`: Initializes a Sequential model, which means layers will be stacked on top of each other in a linear fashion.
2.   First Convolutional Layer
    * `Conv1D(filters=64, kernel_size=3, activation='relu', input_shape=input_shape)`
        * This is the first 1D convolutional layer
        * `filters=64`: The layer applies 64 filters (or kernels) over the input data.
        * `kernel_size=3`: Each filter will cover 3 timesteps at a time (a window of 3).
        * `activation='relu'`: The Rectified Linear Unit (ReLU) activation function introduces non-linearity and helps the model learn complex patterns.
        * `input_shape=input_shape`: Specifies the shape of the input data.
    * `MaxPooling1D(pool_size=2)`: This pooling layer halves the length of the sequence by keeping the maximum value in each non-overlapping 2-timestep window (`pool_size=2`). This reduces computation and keeps the strongest filter responses.
3. Second Convolutional Layer:
    * `Conv1D(filters=128, kernel_size=3, activation='relu')`
        * This is the second convolutional layer, similar to the first, but with 128 filters, which allow the network to learn more complex features from the data.
        * `kernel_size=3` and `activation='relu'` work in the same way as in the first Conv1D layer.
    * `MaxPooling1D(pool_size=2)`: Another pooling layer to downsample the output, further reducing the data’s dimensionality.
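4. LSTM and Flatten Layers:
    * `LSTM(64, return_sequences=True)`: A recurrent layer with 64 units that reads the pooled feature sequence one timestep at a time. With `return_sequences=True` it outputs a 64-value vector for every timestep rather than only for the last one.
    * `Flatten()`: Converts the 2D (timesteps, features) output of the LSTM into a 1D vector per sample. This is necessary because the next layer is fully connected and expects a 1D input.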
5. Fully Connected Layer:
    * `Dense(128, activation='relu')`: This is a fully connected layer with 128 units/neurons. Each neuron is connected to every input from the flattened output. The ReLU activation function is used again to introduce non-linearity and help the model learn complex relationships.
6. Dropout Layer:
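    * `Dropout(0.5)`: During training, this layer sets each of its input units to zero with probability 0.5 (scaling up the rest), which discourages the network from relying on any single unit and helps it generalize to unseen data. Dropout is inactive at inference time.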
7. Output Layer:
    * `Dense(num_classes, activation='softmax')`: This is the output layer with num_classes neurons, one for each class in the classification problem. The softmax activation function ensures the output values represent probabilities that sum to 1, useful for multi-class classification.
8. Compiling the model
    * `model.compile(optimizer='adam', loss='categorical_crossentropy', metrics=['accuracy'])`:
        * Optimizer: `'adam'`: Adam is a variant of gradient descent that adapts the step size of each parameter using running averages of the gradients and of their squares.
        * Loss: 'categorical_crossentropy': This loss function is used for multi-class classification problems where the target variable is one-hot encoded (i.e., represented as a vector of 0s and 1s).
        * Metrics: ['accuracy']: The accuracy metric is used to evaluate the model’s performance during training and testing.
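The shapes in this breakdown can be checked by hand. The sketch below, with hypothetical helper names, applies the Keras defaults (`padding='valid'` for `Conv1D` and `MaxPooling1D`, pooling stride equal to `pool_size`) to an assumed input of `(100, 3)`, i.e. 100-sample windows of three accelerometer axes. The numbers should agree with what `model.summary()` prints for `build_1d_cnn_model`.

```python
def conv1d_valid_length(length, kernel_size):
    """Output length of a stride-1 Conv1D with padding='valid' (the Keras default)."""
    return length - kernel_size + 1


def maxpool1d_length(length, pool_size):
    """Output length of MaxPooling1D with strides=pool_size and padding='valid' (the defaults)."""
    return (length - pool_size) // pool_size + 1


def conv1d_params(in_channels, filters, kernel_size):
    """Conv1D parameters: a (kernel_size x in_channels) kernel per filter, plus one bias per filter."""
    return kernel_size * in_channels * filters + filters


def lstm_params(input_dim, units):
    """Keras LSTM parameters: 4 gates, each with input weights, recurrent weights and a bias."""
    return 4 * (units * (input_dim + units) + units)


timesteps, features = 100, 3
t = maxpool1d_length(conv1d_valid_length(timesteps, 3), 2)  # Conv1D(64) -> 98, pool -> 49
t = maxpool1d_length(conv1d_valid_length(t, 3), 2)          # Conv1D(128) -> 47, pool -> 23
flatten_size = t * 64                                       # LSTM(64, return_sequences=True) -> (23, 64)

print(t, flatten_size)                                      # 23 1472
print(conv1d_params(features, 64, 3), conv1d_params(64, 128, 3))  # 640 24704
print(lstm_params(128, 64))                                 # 49408
print(flatten_size * 128 + 128)                             # Dense(128): 188544
```

Most of the parameters sit in the first `Dense` layer, because `Flatten` hands it every timestep of the LSTM output.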


In [478]:
def build_1d_cnn_model(input_shape, num_classes):
    """
    Builds and compiles a 1D CNN model for multi-class classification.

    Args:
        input_shape (tuple): The shape of the input data (timesteps, features).
        num_classes (int): The number of output classes.

    Returns:
        model (Sequential): Compiled 1D CNN model.
    """
    model = Sequential()

    # First Conv1D layer
    # You can try experimenting with different filters, kernel_size values and activation functions
    model.add(Conv1D(filters=64, kernel_size=3, activation='relu', input_shape=input_shape))
    model.add(MaxPooling1D(pool_size=2))

    # Second Conv1D layer
    # You can try experimenting with different filters, kernel_size values and activation functions
    model.add(Conv1D(filters=128, kernel_size=3, activation='relu'))
    model.add(MaxPooling1D(pool_size=2))

    # Add LSTM layer
    # return_sequences=True outputs one 64-value vector per timestep instead of only the last one
    model.add(LSTM(64, return_sequences=True))

    # Flatten the LSTM output into a 1D vector per sample
    model.add(Flatten())

    # Fully connected layer
    model.add(Dense(128, activation='relu'))

    # Dropout layer for regularization
    # You can try experimenting with different dropout rates
    model.add(Dropout(0.5))

    # Output layer with softmax for multi-class classification
    model.add(Dense(num_classes, activation='softmax'))

    # Compile the model
    model.compile(optimizer='adam', loss='categorical_crossentropy', metrics=['accuracy'])

    # Print a detailed summary of the model, showing the layers, their output shapes, and the number of trainable parameters
    model.summary()

    return model

In [479]:
# from tensorflow.keras.models import Model
# from tensorflow.keras.layers import Input, Conv1D, MaxPooling1D, GlobalAveragePooling1D
# from tensorflow.keras.layers import Dense, Dropout, concatenate
# from tensorflow.keras.optimizers import Adam
# from tensorflow.keras.regularizers import l2

# def build_dual_input_cnn_model(respeck_input_shape, thingy_input_shape, num_classes):
#     respeck_input = Input(shape=respeck_input_shape, name='respeck_input')
#     thingy_input = Input(shape=thingy_input_shape, name='thingy_input')
    
#     # Respeck branch with Global Average Pooling
#     x_respeck = Conv1D(64, 3, activation='relu')(respeck_input)
#     x_respeck = MaxPooling1D(2)(x_respeck)
#     x_respeck = Conv1D(128, 3, activation='relu')(x_respeck)
#     x_respeck = MaxPooling1D(2)(x_respeck)
#     x_respeck = GlobalAveragePooling1D()(x_respeck)

#     # Thingy branch with Global Average Pooling
#     x_thingy = Conv1D(64, 3, activation='relu')(thingy_input)
#     x_thingy = MaxPooling1D(2)(x_thingy)
#     x_thingy = Conv1D(128, 3, activation='relu')(x_thingy)
#     x_thingy = MaxPooling1D(2)(x_thingy)
#     x_thingy = GlobalAveragePooling1D()(x_thingy)

#     # Combine branches
#     combined_output = concatenate([x_respeck, x_thingy])
    
#     # Dense layers with L2 regularization and Dropout
#     combined_output = Dense(128, activation='relu', kernel_regularizer=l2(0.001))(combined_output)  
#     combined_output = Dropout(0.5)(combined_output)

#     # Output layer with softmax for multi-class classification
#     output_layer = Dense(num_classes, activation='softmax')(combined_output)

#     # Compile model
#     model = Model(inputs=[respeck_input, thingy_input], outputs=output_layer)
    
#     model.compile(
#         optimizer=Adam(),
#         loss='categorical_crossentropy',
#         metrics=['accuracy']
#     )
    
#     return model

In [480]:
from tensorflow.keras.models import Model
from tensorflow.keras.layers import Input, Conv1D, MaxPooling1D, GlobalAveragePooling1D
from tensorflow.keras.layers import Dense, Dropout, concatenate, BatchNormalization
from tensorflow.keras.optimizers import Adam
from tensorflow.keras.regularizers import l2
from tensorflow.keras.callbacks import ReduceLROnPlateau

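def build_dual_input_cnn_model(respeck_input_shape, thingy_input_shape, num_classes):
    """
    Builds and compiles a two-branch 1D CNN: one convolutional branch per sensor,
    whose pooled features are concatenated and passed to a shared classifier.

    Args:
        respeck_input_shape (tuple): (timesteps, features) of each Respeck window.
        thingy_input_shape (tuple): (timesteps, features) of each Thingy window.
        num_classes (int): The number of output classes.

    Returns:
        model (Model): Compiled Keras functional model taking [respeck_input, thingy_input].
    """
    # Respeck branch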
    respeck_input = Input(shape=respeck_input_shape, name='respeck_input')
    x_respeck = Conv1D(128, 5, activation='relu', padding='same')(respeck_input)
    x_respeck = BatchNormalization()(x_respeck)
    x_respeck = MaxPooling1D(2)(x_respeck)
    x_respeck = Dropout(0.3)(x_respeck)

    x_respeck = Conv1D(256, 3, activation='relu', padding='same')(x_respeck)
    x_respeck = BatchNormalization()(x_respeck)
    x_respeck = MaxPooling1D(2)(x_respeck)
    x_respeck = Dropout(0.3)(x_respeck)

    x_respeck = GlobalAveragePooling1D()(x_respeck)

    # Thingy branch
    thingy_input = Input(shape=thingy_input_shape, name='thingy_input')
    x_thingy = Conv1D(128, 5, activation='relu', padding='same')(thingy_input)
    x_thingy = BatchNormalization()(x_thingy)
    x_thingy = MaxPooling1D(2)(x_thingy)
    x_thingy = Dropout(0.3)(x_thingy)

    x_thingy = Conv1D(256, 3, activation='relu', padding='same')(x_thingy)
    x_thingy = BatchNormalization()(x_thingy)
    x_thingy = MaxPooling1D(2)(x_thingy)
    x_thingy = Dropout(0.3)(x_thingy)

    x_thingy = GlobalAveragePooling1D()(x_thingy)

    # Concatenate branches directly without attention
    combined = concatenate([x_respeck, x_thingy])

    # Fully connected layer with dropout and L2 regularization
    combined = Dense(256, activation='relu', kernel_regularizer=l2(0.001))(combined)
    combined = Dropout(0.5)(combined)

    # Output layer for multi-class classification
    output_layer = Dense(num_classes, activation='softmax')(combined)

    # Compile model
    model = Model(inputs=[respeck_input, thingy_input], outputs=output_layer)
    model.compile(
        optimizer=Adam(learning_rate=0.001),
        loss='categorical_crossentropy',
        metrics=['accuracy']
    )
    
    return model

# Reduce the learning rate when val_loss stops improving
# (this only takes effect if passed to model.fit, e.g. callbacks=[lr_scheduler])
lr_scheduler = ReduceLROnPlateau(monitor='val_loss', factor=0.5, patience=3, min_lr=1e-6)


# Classification Pipeline

## Step 1: Prepare and Preprocess the Data

In [481]:
# Define activity folders and corresponding labels
# Each key is the name of the physical activity, and the corresponding value is the numeric label
# These labels will be used as the target variable for classification.
activities = {
    'sittingStanding': 0,
    'lyingBack': 1,
    'lyingLeft': 2,
    'lyingRight': 3,
    'lyingStomach': 4,
    'miscMovement': 5,
    'normalWalking': 6,
    'running': 7,
    'shuffleWalking': 8,
    'ascending': 9,
    'descending': 10,
}

In [482]:
# Dictionary to store sliding windows and labels for both train and test sets for each activity
train_test_data = {}

# Loop through each activity folder and process the data
for activity, label in activities.items():
    train_test_data[activity] = {}

    # Call process_activity() with respeck_paths and thingy_paths
    (train_test_data[activity]['respeck_train_windows'], train_test_data[activity]['respeck_train_labels'],
     train_test_data[activity]['respeck_test_windows'], train_test_data[activity]['respeck_test_labels'],
     train_test_data[activity]['thingy_train_windows'], train_test_data[activity]['thingy_train_labels'],
     train_test_data[activity]['thingy_test_windows'], train_test_data[activity]['thingy_test_labels']) = process_activity(
        activity, label, respeck_paths, thingy_paths)

# Explanation:
# - `respeck_paths` and `thingy_paths` now contain both year1 and year2 folders for each dataset type.
# - `process_activity` will handle loading, splitting, and processing data across all specified folders.


Now that each activity has been processed and stored in `train_test_data`, we combine the sliding windows and labels from all activities into unified arrays for each sensor (one set for training and one for testing).

In [483]:
X_train_respeck, y_train_respeck, X_train_thingy, y_train_thingy = combine_data(train_test_data, 'train')
X_test_respeck, y_test_respeck, X_test_thingy, y_test_thingy = combine_data(train_test_data, 'test')

# Print shapes to verify correctness
print(f"X_train_respeck shape: {X_train_respeck.shape}, y_train_respeck shape: {y_train_respeck.shape}")
print(f"X_train_thingy shape: {X_train_thingy.shape}, y_train_thingy shape: {y_train_thingy.shape}")
print(f"X_test_respeck shape: {X_test_respeck.shape}, y_test_respeck shape: {y_test_respeck.shape}")
print(f"X_test_thingy shape: {X_test_thingy.shape}, y_test_thingy shape: {y_test_thingy.shape}")

X_train_respeck shape: (16432, 100, 3), y_train_respeck shape: (16432,)
X_train_thingy shape: (17428, 100, 3), y_train_thingy shape: (17428,)
X_test_respeck shape: (4197, 100, 3), y_test_respeck shape: (4197,)
X_test_thingy shape: (4442, 100, 3), y_test_thingy shape: (4442,)


In [484]:
# Re-check the shapes; no per-class equalization is applied, so all windows are kept as they are

# Print shapes of training data to verify correctness
print(f"X_train_respeck shape: {X_train_respeck.shape}, y_train_respeck shape: {y_train_respeck.shape}")
print(f"X_train_thingy shape: {X_train_thingy.shape}, y_train_thingy shape: {y_train_thingy.shape}")

# Print shapes of testing data to verify correctness
print(f"X_test_respeck shape: {X_test_respeck.shape}, y_test_respeck shape: {y_test_respeck.shape}")
print(f"X_test_thingy shape: {X_test_thingy.shape}, y_test_thingy shape: {y_test_thingy.shape}")

X_train_respeck shape: (16432, 100, 3), y_train_respeck shape: (16432,)
X_train_thingy shape: (17428, 100, 3), y_train_thingy shape: (17428,)
X_test_respeck shape: (4197, 100, 3), y_test_respeck shape: (4197,)
X_test_thingy shape: (4442, 100, 3), y_test_thingy shape: (4442,)


### One-Hot Encode Labels (for multi-class classification)
Because the model is compiled with `categorical_crossentropy` loss, the integer labels must be one-hot encoded. (Keras's `sparse_categorical_crossentropy` loss accepts integer labels directly, if you prefer to skip this step.)

One-hot encoding converts each categorical label into a binary vector with a 1 in the position of the correct class and 0 everywhere else. With 11 activities, label 3 becomes a vector of length 11 with a 1 at index 3.
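A minimal NumPy sketch of what the encoder produces and why the format matters for the loss: indexing an identity matrix with integer labels gives the one-hot rows, and categorical cross-entropy for one sample is `-sum(y_true * log(y_pred))`, which with a one-hot `y_true` reduces to minus the log of the probability predicted for the true class. The labels and predicted probabilities are hypothetical. For integer labels 0 to 10, `OneHotEncoder` orders its columns by sorted category, so column *i* corresponds to label *i*, as in this sketch.

```python
import numpy as np

labels = np.array([0, 2, 1, 2])  # hypothetical integer activity labels
num_classes = 3

# Row i of the identity matrix is the one-hot vector for class i
one_hot = np.eye(num_classes)[labels]
print(one_hot)

# Categorical cross-entropy of one sample: -sum(y_true * log(y_pred))
y_pred = np.array([0.7, 0.2, 0.1])  # hypothetical softmax output for the first sample (true class 0)
loss = -np.sum(one_hot[0] * np.log(y_pred))
print(round(float(loss), 4))  # 0.3567, i.e. -ln(0.7)
```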

In [485]:
from sklearn.preprocessing import OneHotEncoder
import numpy as np

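# Initialize separate OneHotEncoders for Respeck and Thingy
# (sparse_output=False returns dense arrays; scikit-learn versions before 1.2 call this argument `sparse`)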
encoder_respeck = OneHotEncoder(sparse_output=False)
encoder_thingy = OneHotEncoder(sparse_output=False)

# Fit encoders on training data only (to avoid data leakage)
encoder_respeck.fit(y_train_respeck.reshape(-1, 1))
encoder_thingy.fit(y_train_thingy.reshape(-1, 1))

# One-hot encode Respeck labels (train and test)
y_train_respeck_one_hot = encoder_respeck.transform(y_train_respeck.reshape(-1, 1))
y_test_respeck_one_hot = encoder_respeck.transform(y_test_respeck.reshape(-1, 1))

# One-hot encode Thingy labels (train and test)
y_train_thingy_one_hot = encoder_thingy.transform(y_train_thingy.reshape(-1, 1))
y_test_thingy_one_hot = encoder_thingy.transform(y_test_thingy.reshape(-1, 1))

# Print shapes of one-hot encoded labels to verify correctness
print(f"y_train_respeck_one_hot shape: {y_train_respeck_one_hot.shape}, y_test_respeck_one_hot shape: {y_test_respeck_one_hot.shape}")
print(f"y_train_thingy_one_hot shape: {y_train_thingy_one_hot.shape}, y_test_thingy_one_hot shape: {y_test_thingy_one_hot.shape}")

y_train_respeck_one_hot shape: (16432, 11), y_test_respeck_one_hot shape: (4197, 11)
y_train_thingy_one_hot shape: (17428, 11), y_test_thingy_one_hot shape: (4442, 11)


In [486]:
# Verify Respeck input shape
print(f"X_train_respeck shape: {X_train_respeck.shape}")  # Should be (num_samples, window_size, num_features)

# Verify Thingy input shape
print(f"X_train_thingy shape: {X_train_thingy.shape}")  # Should be (num_samples, window_size, num_features)

X_train_respeck shape: (16432, 100, 3)
X_train_thingy shape: (17428, 100, 3)


In [487]:
from sklearn.utils import resample

# Find the maximum sample size between Respeck and Thingy
max_samples = max(X_train_respeck.shape[0], X_train_thingy.shape[0])

# Resample Respeck data to match Thingy's sample size (upsampling)
# Caution: rows are drawn at random, so Respeck and Thingy rows are no longer label-aligned
X_train_respeck, y_train_respeck_one_hot = resample(
    X_train_respeck, y_train_respeck_one_hot,
    replace=True,  # Upsampling with replacement
    n_samples=max_samples,
    random_state=42
)

# Thingy is not resampled because it already has more windows (this cell assumes Respeck is the smaller set)
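The upsampling above equalizes the number of rows, but it draws Respeck rows at random, so row *i* of `X_train_respeck` no longer carries the same label as row *i* of `X_train_thingy`. A dual-input model treats each row pair as one example with one label, so the two inputs need to agree. One possible fix, sketched below under the assumption that windows only need to match by activity label (not by recording or timestamp), is to upsample and pair the two sensors class by class. `pair_windows_by_label` is a hypothetical helper; it expects the integer label arrays (`y_train_respeck`, `y_train_thingy`) from before one-hot encoding.

```python
import numpy as np


def pair_windows_by_label(X_a, y_a, X_b, y_b, seed=42):
    """Hypothetical helper: pair windows from two sensors so row i of both outputs shares a label.

    X_a, X_b: arrays of shape (n, window_size, num_features); y_a, y_b: 1D integer labels.
    For each class present in both sensors, the smaller set of windows is upsampled
    (with replacement) to the size of the larger one.
    """
    rng = np.random.default_rng(seed)
    xa_out, xb_out, y_out = [], [], []
    for cls in np.intersect1d(y_a, y_b):
        idx_a = np.flatnonzero(y_a == cls)
        idx_b = np.flatnonzero(y_b == cls)
        n = max(len(idx_a), len(idx_b))
        # Top up the smaller index set with randomly repeated indices
        if len(idx_a) < n:
            idx_a = np.concatenate([idx_a, rng.choice(idx_a, n - len(idx_a), replace=True)])
        if len(idx_b) < n:
            idx_b = np.concatenate([idx_b, rng.choice(idx_b, n - len(idx_b), replace=True)])
        # Shuffle within the class so each Respeck window meets a random Thingy window of the same class
        xa_out.append(X_a[rng.permutation(idx_a)])
        xb_out.append(X_b[rng.permutation(idx_b)])
        y_out.append(np.full(n, cls))
    X_a_paired = np.concatenate(xa_out)
    X_b_paired = np.concatenate(xb_out)
    y_paired = np.concatenate(y_out)
    # One shared permutation across all three arrays keeps them aligned
    order = rng.permutation(len(y_paired))
    return X_a_paired[order], X_b_paired[order], y_paired[order]
```

Used in place of the independent `resample` call, for example `X_r, X_t, y = pair_windows_by_label(X_train_respeck, y_train_respeck, X_train_thingy, y_train_thingy)`, a single label array `y` then serves both inputs and only needs to be one-hot encoded once. The same pairing would apply to the test arrays.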

In [488]:
# Verify Respeck input shape
print(f"X_train_respeck shape: {X_train_respeck.shape}")  # Should be (num_samples, window_size, num_features)

# Verify Thingy input shape
print(f"X_train_thingy shape: {X_train_thingy.shape}")  # Should be (num_samples, window_size, num_features)

X_train_respeck shape: (17428, 100, 3)
X_train_thingy shape: (17428, 100, 3)


In [489]:
# Verify number of output classes
print(f"Number of classes (Respeck): {y_train_respeck_one_hot.shape[1]}")
print(f"Number of classes (Thingy): {y_train_thingy_one_hot.shape[1]}")

Number of classes (Respeck): 11
Number of classes (Thingy): 11


In [490]:
from sklearn.utils import resample

# Resample Respeck test data to match Thingy's sample size
# (sampling with replacement repeats some test windows and leaves others out)
X_test_respeck, y_test_respeck_one_hot = resample(
    X_test_respeck, y_test_respeck_one_hot,
    replace=True,
    n_samples=X_test_thingy.shape[0],
    random_state=42
)

In [491]:
class_weights_dict = {
    0: 0.8,  # Well-performing class
    1: 0.7,  # Well-performing class
    2: 1.5,  # Slightly underperforming
    3: 1.0,  
    4: 0.8,  # Well-performing class
    5: 2.5,  # Underperforming class (F1 ~65%)
    6: 4.0,  # Underperforming class (F1 ~87%)
    7: 1.3,
    8: 4.0,  # Underperforming class (F1 ~75%)
    9: 2.5,
    10: 1.5
}
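The weights above appear to be set by hand from per-class F1 scores. A common baseline to compare against is the "balanced" heuristic used by scikit-learn's `compute_class_weight(class_weight='balanced', ...)`. It weights each class by `n_samples / (n_classes * count_of_class)`, so rarer classes get larger weights. The sketch below implements that formula in NumPy. `balanced_class_weights` is a hypothetical helper and the label counts are made up for illustration.

```python
import numpy as np

def balanced_class_weights(y_int, num_classes):
    """Weight each class by n_samples / (num_classes * class_count),
    the same formula as scikit-learn's 'balanced' mode.
    Assumes every class appears at least once in y_int."""
    counts = np.bincount(y_int, minlength=num_classes)
    weights = len(y_int) / (num_classes * counts.astype(float))
    return {c: float(w) for c, w in enumerate(weights)}

# Made-up integer labels: class 0 appears 6 times, class 1 three times, class 2 once
y_int = np.array([0, 0, 0, 0, 0, 0, 1, 1, 1, 2])
weights = balanced_class_weights(y_int, num_classes=3)
print(weights)  # the rarest class (2) receives the largest weight
```

With the notebook's data, the integer labels would come from `np.argmax(y_train_respeck_one_hot, axis=1)`, and the resulting dictionary can be passed to `class_weight` in the same way as `class_weights_dict`.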

## Step 2: Build the 1D-CNN Model
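Call our `build_dual_input_cnn_model` function to build and compile the model. It takes the Respeck and Thingy input shapes separately, along with the number of output classes.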

In [492]:
# Determine the input shape for the model
respeck_input_shape = (X_train_respeck.shape[1], X_train_respeck.shape[2])
thingy_input_shape = (X_train_thingy.shape[1], X_train_thingy.shape[2])

# Determine the number of output classes
num_classes = y_train_respeck_one_hot.shape[1]  # This should be the same for both sensors

# Build and compile the model
model = build_dual_input_cnn_model(respeck_input_shape, thingy_input_shape, num_classes)

In [493]:
print(f"X_test_respeck shape: {X_test_respeck.shape}")
print(f"X_test_thingy shape: {X_test_thingy.shape}")
print(f"y_test_respeck_one_hot shape: {y_test_respeck_one_hot.shape}")

X_test_respeck shape: (4442, 100, 3)
X_test_thingy shape: (4442, 100, 3)
y_test_respeck_one_hot shape: (4442, 11)


## Step 3: Train the CNN Model

Train the dual-input 1D CNN on the training data and validate it on the test data. The model learns to map each pair of Respeck and Thingy sliding windows to the corresponding activity label.

`model.fit()` trains the neural network. The call below uses these parameters:
* `[X_train_respeck, X_train_thingy]`: The two training inputs (sliding windows), each with shape (num_samples, window_size, num_features).
* `y_train_respeck_one_hot`: The one-hot encoded training labels, with shape (num_samples, num_classes).
* `epochs`: The number of times the entire training set is passed through the model. Here the model trains for 20 epochs, so it sees the whole training set 20 times. You can try adjusting the number of epochs and compare the resulting model performance.
* `batch_size`: The number of samples processed before the model's weights are updated. Here it is 32, so the model processes 32 samples at a time before updating its parameters.
* `validation_data`: The test inputs and labels, `([X_test_respeck, X_test_thingy], y_test_respeck_one_hot)`, used to evaluate the model's performance after each epoch.
* `class_weight`: The `class_weights_dict` defined above, which maps each class index to a weight that Keras uses to weight the loss function during training only.
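There are 17,428 training windows after upsampling, and `batch_size=32`. Each epoch therefore runs ceil(17428 / 32) = 545 weight updates, which matches the `545/545` counter in the training log. The `class_weight` argument scales each training sample's loss by the weight of its true class. The NumPy sketch below illustrates that weighting for categorical cross-entropy on a toy batch. It is a simplified illustration of the idea, not Keras's internal implementation. `weighted_categorical_crossentropy` is a hypothetical helper, and the probabilities and weights are made up.

```python
import math
import numpy as np

# Weight updates per epoch: the last, partial batch still counts as a step
steps_per_epoch = math.ceil(17428 / 32)
print(steps_per_epoch)  # 545

def weighted_categorical_crossentropy(y_true_one_hot, y_pred_probs, class_weights):
    """Per-sample cross-entropy, multiplied by the weight of each sample's
    true class, then averaged over the batch (simplified illustration)."""
    eps = 1e-7
    per_sample = -np.sum(y_true_one_hot * np.log(np.clip(y_pred_probs, eps, 1.0)), axis=1)
    true_classes = np.argmax(y_true_one_hot, axis=1)
    sample_weights = np.array([class_weights[c] for c in true_classes])
    return float(np.mean(per_sample * sample_weights))

# Toy batch of two samples over three classes (made-up values)
y_true = np.eye(3)[[0, 2]]
y_pred = np.array([[0.7, 0.2, 0.1],
                   [0.3, 0.3, 0.4]])
unweighted = weighted_categorical_crossentropy(y_true, y_pred, {0: 1.0, 1: 1.0, 2: 1.0})
weighted = weighted_categorical_crossentropy(y_true, y_pred, {0: 1.0, 1: 1.0, 2: 4.0})
print(unweighted < weighted)  # True: the class-2 sample's loss now counts 4x
```

Keras documents `class_weight` as applying during training only, so the weighted training `loss` and the unweighted `val_loss` are not directly comparable. This may partly explain why the training loss in the log is higher than the validation loss in early epochs.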

In [494]:
# history = model.fit(
#     [X_train_respeck, X_train_thingy],  # Two inputs: Respeck and Thingy features
#     y_train_respeck_one_hot,
#     epochs=20,
#     batch_size=32,
#     validation_data=([X_test_respeck, X_test_thingy], y_test_respeck_one_hot)  # Validation inputs and labels
# )

In [495]:
# Train the model with class weights
history = model.fit(
    [X_train_respeck, X_train_thingy],   # Two inputs: Respeck and Thingy features
    y_train_respeck_one_hot,             # One-hot encoded labels
    epochs=20,
    batch_size=32,
    validation_data=([X_test_respeck, X_test_thingy], y_test_respeck_one_hot),  # Validation inputs and labels
    class_weight=class_weights_dict      # Pass computed class weights here
)

Epoch 1/20
545/545 ━━━━━━━━━━━━━━━━━━━━ 15s 22ms/step - accuracy: 0.5739 - loss: 3.0956 - val_accuracy: 0.6824 - val_loss: 1.4673
Epoch 2/20
545/545 ━━━━━━━━━━━━━━━━━━━━ 12s 22ms/step - accuracy: 0.7983 - loss: 1.6007 - val_accuracy: 0.8222 - val_loss: 0.9781
Epoch 3/20
545/545 ━━━━━━━━━━━━━━━━━━━━ 13s 23ms/step - accuracy: 0.8664 - loss: 1.1497 - val_accuracy: 0.8593 - val_loss: 0.9181
Epoch 4/20
545/545 ━━━━━━━━━━━━━━━━━━━━ 13s 24ms/step - accuracy: 0.8942 - loss: 0.9369 - val_accuracy: 0.8307 - val_loss: 1.0663
Epoch 5/20
545/545 ━━━━━━━━━━━━━━━━━━━━ 13s 24ms/step - accuracy: 0.9021 - loss: 0.8578 - val_accuracy: 0.8424 - val_loss: 1.0074
Epoch 6/20
545/545 ━━━━━━━━━━━━━━━━━━━━ 13s 24ms/step - accuracy: 0.9207 - loss: 0.7210 - val_accuracy: 0.8352 - val_loss: 0.9623
Epoch 7/20
545/545

## Step 4: Evaluate the Model
After training, you can evaluate the model on the test set:

In [496]:
# # Get predicted probabilities for the test set
# y_pred_probs = model.predict([X_test_respeck, X_test_thingy])

# # Convert the predicted probabilities to class labels
# y_pred_classes = np.argmax(y_pred_probs, axis=1)

# # Convert the true test labels from one-hot encoding back to class labels
# y_true_classes_respeck = np.argmax(y_test_respeck_one_hot, axis=1)
# y_true_classes_thingy = np.argmax(y_test_thingy_one_hot, axis=1)

# # Combine the true classes from both sensors
# y_true_classes = np.concatenate([y_true_classes_respeck, y_true_classes_thingy])

# # Generate the classification report
# report = classification_report(y_true_classes, y_pred_classes, digits=4)
# print(report)

In [497]:
# Get predicted probabilities for the test set
y_pred_probs = model.predict([X_test_respeck, X_test_thingy])

# Convert the predicted probabilities to class labels
y_pred_classes = np.argmax(y_pred_probs, axis=1)

# Convert the true test labels from one-hot encoding back to class labels
# Use only one set of true labels (e.g., Respeck)
y_true_classes = np.argmax(y_test_respeck_one_hot, axis=1)

# Generate the classification report
report = classification_report(y_true_classes, y_pred_classes, digits=4)
print(report)

139/139 ━━━━━━━━━━━━━━━━━━━━ 1s 7ms/step
              precision    recall  f1-score   support

           0     0.9361    0.9455    0.9408       697
           1     1.0000    0.9924    0.9962       395
           2     0.8225    0.9104    0.8642       346
           3     1.0000    0.8349    0.9100       327
           4     0.9017    1.0000    0.9483       413
           5     0.9015    0.8444    0.8720       347
           6     0.8244    0.8934    0.8575       394
           7     0.9225    0.9835    0.9520       363
           8     0.7210    0.8005    0.7587       381
           9     0.9262    0.8289    0.8748       409
          10     0.8914    0.7324    0.8042       370

    accuracy                         0.8935      4442
   macro avg     0.8952    0.8878    0.8890      4442
weighted avg     0.8976    0.8935    0.8933      4442



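The report shows an overall accuracy of 0.8935 and a macro-averaged F1 score of 0.8890, but there is still clear room for improvement. Performance is uneven across classes: class 8 has the lowest F1 score (0.7587), and class 10 has the lowest recall (0.7324). For Coursework 3, your group should explore and experiment with various models, parameters, and techniques to improve your model's performance.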
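A confusion matrix is a useful place to start. It shows which classes the weakest ones are mistaken for, which the per-class report does not. `sklearn.metrics.confusion_matrix(y_true_classes, y_pred_classes)` computes one directly. The NumPy sketch below shows what that matrix counts, using made-up toy labels. `confusion_matrix_np` is a hypothetical helper.

```python
import numpy as np

def confusion_matrix_np(y_true, y_pred, num_classes):
    """Rows are true classes, columns are predicted classes
    (same layout as sklearn.metrics.confusion_matrix)."""
    cm = np.zeros((num_classes, num_classes), dtype=int)
    np.add.at(cm, (y_true, y_pred), 1)  # count each (true, predicted) pair
    return cm

# Toy labels for three classes (made up for illustration)
y_true = np.array([0, 0, 1, 1, 2, 2, 2])
y_pred = np.array([0, 1, 1, 1, 2, 0, 2])
cm = confusion_matrix_np(y_true, y_pred, num_classes=3)
print(cm)
# Off-diagonal entries are errors; e.g. cm[2, 0] counts class-2 windows predicted as class 0
per_class_recall = cm.diagonal() / cm.sum(axis=1)
print(per_class_recall)
```

With the notebook's data, `confusion_matrix_np(y_true_classes, y_pred_classes, num_classes)` would give an 11 x 11 matrix. The off-diagonal entries in row 8 would then show where class 8's errors go.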

# Exporting your model to TFLite

You can use TensorFlow's `TFLiteConverter` class to convert your trained Keras model into the TensorFlow Lite (TFLite) format. TFLite is designed for deployment on edge devices, such as mobile phones, embedded systems, IoT devices, and microcontrollers, where computational resources and power are limited. Exporting to this format is necessary because you will run your ML models on your Android devices to perform live classification.

In [498]:
# # Convert the trained Keras model to TensorFlow Lite format
# converter = tf.lite.TFLiteConverter.from_keras_model(model)  # model is your trained Keras model
# tflite_model = converter.convert()

# # Save the converted model to a .tflite file
# with open('model.tflite', 'wb') as f:
#     f.write(tflite_model)

# print("Model successfully exported to model.tflite")

In [499]:
# Convert the trained Keras model to TensorFlow Lite format
try:
    converter = tf.lite.TFLiteConverter.from_keras_model(model)  # 'model' is your trained Keras model
    tflite_model = converter.convert()

    # Specify the path where the .tflite model should be saved
    tflite_model_path = 'C:/Users/nikit/Documents/University/Year4/PDIOT/CW3/models/activities_model.tflite'

    # Save the converted model to the specified .tflite file
    with open(tflite_model_path, 'wb') as f:
        f.write(tflite_model)

    print(f"Model successfully exported to {tflite_model_path}")

except Exception as e:
    print(f"An error occurred during model export: {e}")


INFO:tensorflow:Assets written to: C:\Users\nikit\AppData\Local\Temp\tmpzq2w_upw\assets



Saved artifact at 'C:\Users\nikit\AppData\Local\Temp\tmpzq2w_upw'. The following endpoints are available:

* Endpoint 'serve'
  args_0 (POSITIONAL_ONLY): List[TensorSpec(shape=(None, 100, 3), dtype=tf.float32, name='respeck_input'), TensorSpec(shape=(None, 100, 3), dtype=tf.float32, name='thingy_input')]
Output Type:
  TensorSpec(shape=(None, 11), dtype=tf.float32, name=None)
Captures:
  3193277457584: TensorSpec(shape=(), dtype=tf.resource, name=None)
  3193277716032: TensorSpec(shape=(), dtype=tf.resource, name=None)
  3193277454768: TensorSpec(shape=(), dtype=tf.resource, name=None)
  3193277463392: TensorSpec(shape=(), dtype=tf.resource, name=None)
  3193277716384: TensorSpec(shape=(), dtype=tf.resource, name=None)
  3193277713568: TensorSpec(shape=(), dtype=tf.resource, name=None)
  3193277712336: TensorSpec(shape=(), dtype=tf.resource, name=None)
  3193277714272: TensorSpec(shape=(), dtype=tf.resource, name=None)
  3193277455120: TensorSpec(shape=(), dtype=tf.resource, name=None)

Next, we would like to look at the average accuracies of static and dynamic activities separately.
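One way to compute these averages is to take each class's accuracy (its recall: correctly predicted windows divided by that class's windows) and average it within each group. The grouping below is hypothetical. This section does not state which label indices correspond to static or dynamic activities, so replace `STATIC_CLASSES` and `DYNAMIC_CLASSES` with the indices from your own label encoder. `group_average_accuracy` is likewise a hypothetical helper.

```python
import numpy as np

# Hypothetical split of label indices; take the real mapping from your encoder
STATIC_CLASSES = [0, 1]
DYNAMIC_CLASSES = [2, 3]

def group_average_accuracy(y_true, y_pred, classes):
    """Mean of per-class accuracies (recalls) over the given class indices."""
    accs = []
    for c in classes:
        mask = y_true == c
        if mask.any():  # skip classes absent from the test set
            accs.append(np.mean(y_pred[mask] == c))
    return float(np.mean(accs)) if accs else float("nan")

# Toy labels (made up); with the notebook's data use y_true_classes and y_pred_classes
y_true = np.array([0, 0, 1, 1, 2, 2, 3, 3])
y_pred = np.array([0, 0, 1, 0, 2, 3, 3, 2])
print("static:", group_average_accuracy(y_true, y_pred, STATIC_CLASSES))    # static: 0.75
print("dynamic:", group_average_accuracy(y_true, y_pred, DYNAMIC_CLASSES))  # dynamic: 0.5
```

Averaging per-class accuracies, rather than pooling all windows in a group, stops a large class (e.g. class 0, with 697 test windows) from dominating its group's average.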

# Good job!
This is the end of Lab 3. In the next lab, you will focus on deploying your machine learning model onto your Android App in order to classify activities in real-time.