# EEE4114F ML Project Code Part 2: Classifying

From the **(A) DeviceMotion_data** folder, we use two sensor groups:

- **Attitude (roll, pitch, yaw)**: the device orientation (e.g., facing up or down)
- **Gravity (x, y, z)**: the static acceleration component, i.e. the device orientation with respect to gravity
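
The two sensor groups are closely related, because the gravity vector depends only on roll and pitch. The sketch below is a minimal illustration, assuming a common roll/pitch rotation convention that is not stated in this notebook. The helper name `gravity_from_attitude` is hypothetical. The input values are the first row of the `dataset.head()` output shown at the end of this notebook.

```python
import numpy as np

def gravity_from_attitude(roll, pitch):
    """Unit gravity vector implied by roll and pitch in radians (assumed convention)."""
    gx = np.cos(pitch) * np.sin(roll)
    gy = -np.sin(pitch)
    gz = -np.cos(pitch) * np.cos(roll)
    return np.array([gx, gy, gz])

# First row of the dataset.head() output in this notebook
roll, pitch = 1.528132, -0.733896
recorded_gravity = np.array([0.741895, 0.669768, -0.031672])

estimated_gravity = gravity_from_attitude(roll, pitch)
# Largest per-axis difference between the estimate and the recorded gravity
max_abs_diff = np.max(np.abs(estimated_gravity - recorded_gravity))
print(estimated_gravity, max_abs_diff)
```

If this relation holds for the rest of the data, the gravity columns are largely redundant with roll and pitch, which may be worth keeping in mind when judging feature importance.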



In [1]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

### Useful functions
Here are some useful functions copied from motionsense.ipynb. We did not copy get_ds_infos() because we do not need the subject information. Instead, we edited the code to use subject IDs 1 to 24 directly, without reading the subject info file.

In [2]:
def set_data_types(data_types=["userAcceleration"]):
    """
    Select the sensors whose columns will be read into the final dataset.

    Args:
        data_types: A list of sensor data types, chosen from: [attitude, gravity, rotationRate, userAcceleration]

    Returns:
        A list of column-name lists (one per sensor) to use for creating time-series from files.
    """
    dt_list = []
    for t in data_types:
        if t != "attitude":
            dt_list.append([t+".x",t+".y",t+".z"])
        else:
            dt_list.append([t+".roll", t+".pitch", t+".yaw"])
    return dt_list

ACT_LABELS = ["dws","ups", "wlk", "jog", "std", "sit"]
TRIAL_CODES = {
    ACT_LABELS[0]:[1,2,11],
    ACT_LABELS[1]:[3,4,12],
    ACT_LABELS[2]:[7,8,15],
    ACT_LABELS[3]:[9,16],
    ACT_LABELS[4]:[6,14],
    ACT_LABELS[5]:[5,13]
}
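
To make the two structures above concrete, here is a small sketch of what they produce for the sensors used in this notebook. The function body is repeated only so that the snippet runs on its own, and the `folders` list is an illustrative name, not part of the original code.

```python
def set_data_types(data_types=("userAcceleration",)):
    # Same column-naming rule as above, repeated so this snippet runs alone
    dt_list = []
    for t in data_types:
        if t != "attitude":
            dt_list.append([t + ".x", t + ".y", t + ".z"])
        else:
            dt_list.append([t + ".roll", t + ".pitch", t + ".yaw"])
    return dt_list

ACT_LABELS = ["dws", "ups", "wlk", "jog", "std", "sit"]
TRIAL_CODES = {
    "dws": [1, 2, 11], "ups": [3, 4, 12], "wlk": [7, 8, 15],
    "jog": [9, 16], "std": [6, 14], "sit": [5, 13],
}

dt_list = set_data_types(["attitude", "gravity"])
print(dt_list)
# [['attitude.roll', 'attitude.pitch', 'attitude.yaw'], ['gravity.x', 'gravity.y', 'gravity.z']]

# Each trial lives in a folder named "<activity>_<trial>", e.g. "dws_1"
folders = [f"{act}_{trial}" for act in ACT_LABELS for trial in TRIAL_CODES[act]]
print(len(folders))  # 15
```

Each inner list holds the three columns of one sensor, and the 15 folder names are the trial directories that the loader visits for every subject.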

In [3]:
def create_time_series(dt_list, act_labels, trial_codes, subject_ids=None, mode="mag", labeled=True):
    """
    Defines what data to include for a given set, using selected sensors and subjects.

    Args:
        dt_list: List of sensor columns to include.
        act_labels: List of activity labels (e.g. ["dws", "ups", "wlk"...]).
        trial_codes: Dictionary mapping activity to trial numbers.
        subject_ids: List of subject IDs to include. Example: [1, 2, ..., 24]
        mode: "raw" = keep all sensor components; "mag" = magnitude only.
        labeled: True to include activity labels.

    Returns:
        A pandas DataFrame containing time-series sensor data.
    """
    if subject_ids is None:
        subject_ids = list(range(1, 25))  # Default: use all subjects

    num_data_cols = len(dt_list) if mode == "mag" else len(dt_list) * 3
    dataset = np.zeros((0, num_data_cols + 1)) if labeled else np.zeros((0, num_data_cols))

    print("[INFO] -- Creating Time-Series")
    for sub_id in subject_ids:
        for act_id, act in enumerate(act_labels):
            for trial in trial_codes[act]:
                # Hard-coded path to the local copy of A_DeviceMotion_data; change it to match your machine
                fname = f'/Users/olivekschonfeldt/Library/CloudStorage/OneDrive-UniversityofCapeTown/EEE4114F DSP/ML Project 2025/motion-sense-master/data/A_DeviceMotion_data/{act}_{trial}/sub_{sub_id}.csv'
                try:
                    raw_data = pd.read_csv(fname)
                    raw_data = raw_data.drop(['Unnamed: 0'], axis=1)
                    vals = np.zeros((len(raw_data), num_data_cols))
                    for x_id, axes in enumerate(dt_list):
                        if mode == "mag":
                            vals[:, x_id] = (raw_data[axes] ** 2).sum(axis=1) ** 0.5
                        else:
                            vals[:, x_id * 3:(x_id + 1) * 3] = raw_data[axes].values
                        vals = vals[:, :num_data_cols]
                    if labeled:
                        lbls = np.array([[act_id]] * len(raw_data))
                        vals = np.concatenate((vals, lbls), axis=1)
                    dataset = np.append(dataset, vals, axis=0)
                except FileNotFoundError:
                    print(f"[WARNING] File not found: {fname}. Skipping.")
                    continue

    cols = []
    for axes in dt_list:
        # In "mag" mode, name the column after the sensor (e.g. "attitude"), not a truncated axis name
        cols += axes if mode == "raw" else [axes[0].split(".")[0]]

    if labeled:
        cols += ["act"]

    dataset = pd.DataFrame(data=dataset, columns=cols)
    return dataset
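
The difference between the two `mode` values is easiest to see on a tiny example. The sketch below uses made-up numbers (not MotionSense data) and applies the same two expressions that the function uses for one sensor.

```python
import numpy as np
import pandas as pd

# Synthetic two-sample frame with made-up values
raw_data = pd.DataFrame({
    "gravity.x": [3.0, 0.0],
    "gravity.y": [4.0, 0.0],
    "gravity.z": [0.0, 2.0],
})
axes = ["gravity.x", "gravity.y", "gravity.z"]

# mode="mag": one column per sensor, the Euclidean norm of its three axes
magnitude = ((raw_data[axes] ** 2).sum(axis=1) ** 0.5).to_numpy()

# mode="raw": three columns per sensor, copied unchanged
components = raw_data[axes].to_numpy()

print(magnitude)         # [5. 2.]
print(components.shape)  # (2, 3)
```

Magnitude mode discards direction, so it suits features that should not depend on how the phone is held. Raw mode keeps the per-axis values, which is what this notebook uses.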

### Loading the data
Here we extract the data to build our new **dataset**. We extract only attitude (roll, pitch, yaw) and gravity (x, y, z), for all activity types (see Part 1 for why).
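
The loader above reads from an absolute path on one machine. As a hedged sketch of one way to make this portable, the snippet below builds the same `{act}_{trial}/sub_{sub_id}.csv` layout from a configurable base folder. `DATA_DIR` and `build_trial_path` are hypothetical names, and the base folder shown is only a placeholder for wherever the dataset is stored.

```python
from pathlib import Path

# Hypothetical location; point this at your local copy of the dataset
DATA_DIR = Path("motion-sense-master") / "data" / "A_DeviceMotion_data"

def build_trial_path(data_dir, act, trial, sub_id):
    """Return <data_dir>/<act>_<trial>/sub_<sub_id>.csv (hypothetical helper)."""
    return Path(data_dir) / f"{act}_{trial}" / f"sub_{sub_id}.csv"

fname = build_trial_path(DATA_DIR, "dws", 1, 1)
print(fname.as_posix())
```

A `Path` object can be passed directly to `pd.read_csv`, so a helper like this could replace the f-string inside the loop without other changes.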

In [7]:
# Set the parameters for building a labeled time series from "(A) DeviceMotion_data"

sdt = ["attitude", "gravity"]
print("[INFO] -- Selected sensor data types: "+str(sdt))
act_labels = ACT_LABELS  # includes all six activities
print("[INFO] -- Selected activities: "+str(act_labels))
trial_codes = {act: TRIAL_CODES[act] for act in act_labels}
dt_list = set_data_types(sdt)
dataset = create_time_series(dt_list, act_labels, trial_codes, mode="raw", labeled=True)
print("[INFO] -- Shape of time-Series dataset:"+str(dataset.shape))
dataset.head()

[INFO] -- Selected sensor data types: ['attitude', 'gravity']
[INFO] -- Selected activities: ['dws', 'ups', 'wlk', 'jog', 'std', 'sit']
[INFO] -- Creating Time-Series
[INFO] -- Shape of time-Series dataset:(1412865, 7)


| | attitude.roll | attitude.pitch | attitude.yaw | gravity.x | gravity.y | gravity.z | act |
|---|---|---|---|---|---|---|---|
| 0 | 1.528132 | -0.733896 | 0.696372 | 0.741895 | 0.669768 | -0.031672 | 0.0 |
| 1 | 1.527992 | -0.716987 | 0.677762 | 0.753099 | 0.657116 | -0.032255 | 0.0 |
| 2 | 1.527765 | -0.706999 | 0.670951 | 0.759611 | 0.649555 | -0.032707 | 0.0 |
| 3 | 1.516768 | -0.704678 | 0.675735 | 0.760709 | 0.647788 | -0.04114 | 0.0 |
| 4 | 1.493941 | -0.703918 | 0.672994 | 0.760062 | 0.64721 | -0.05853 | 0.0 |
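
The `act` column stores each activity as a float index into `ACT_LABELS`, so `0.0` in the rows above means `dws`. As a small sketch, the snippet below maps those indices back to label names and counts samples per activity. It uses a made-up stand-in frame called `demo` so that it runs without the dataset files. The same two lines should work on `dataset` itself.

```python
import pandas as pd

ACT_LABELS = ["dws", "ups", "wlk", "jog", "std", "sit"]

# Tiny stand-in for `dataset` (made-up rows); only the "act" column matters here
demo = pd.DataFrame({"act": [0.0, 0.0, 2.0, 5.0]})

# Convert the float index to int, then look up the label name
demo["act_name"] = demo["act"].astype(int).map(lambda i: ACT_LABELS[i])

# Number of samples per activity
counts = demo["act_name"].value_counts()
print(counts)
```

Checking these counts on the full dataset is a quick way to see how balanced the six classes are before training a classifier.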
