# TDOST-Based HAR Model using Aruba Dataset

This notebook implements a TDOST-based Human Activity Recognition (HAR) model on the Aruba dataset from the CASAS repository. In Aruba, each activity is annotated only on its first and last sensor events, which carry 'begin' and 'end' markers; the sensor events in between have no activity label of their own, so their labels must be inferred from the surrounding markers.

Steps:
1. Load and preprocess the dataset
2. Group sensor events into activity-based segments
3. Generate TDOST descriptions
4. Encode descriptions and labels
5. Train the HAR model
6. Test the model on new embeddings
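Step 2 hinges on the begin/end annotation described above. A minimal sketch, assuming events are time-sorted, activities never overlap, and unlabeled rows hold empty strings in the two label columns. `label_segments`, the `'Other'` label, and the `Segment_Label`/`Segment_ID` columns are hypothetical names, and the toy frame is illustrative, not real Aruba output:

```python
import pandas as pd


def label_segments(df: pd.DataFrame) -> pd.DataFrame:
    """Propagate each 'begin' label to the rows up to its 'end' marker.

    Assumes rows are sorted by time, activities do not overlap, and
    unlabeled rows hold '' in Activity_Type and Begin_Or_End.
    """
    labels, ids = [], []
    # 'Other' / -1 mark events outside any activity (hypothetical convention)
    current, seg_id, next_id = 'Other', -1, 0
    for activity, marker in zip(df['Activity_Type'], df['Begin_Or_End']):
        if marker == 'begin':
            current, seg_id = activity, next_id
            next_id += 1
            labels.append(current)
            ids.append(seg_id)
        elif marker == 'end':
            # The end row still belongs to the activity it closes
            labels.append(activity)
            ids.append(seg_id)
            current, seg_id = 'Other', -1
        else:
            # Intermediate row: inherit the currently open activity, if any
            labels.append(current)
            ids.append(seg_id)
    out = df.copy()
    out['Segment_Label'] = labels
    out['Segment_ID'] = ids
    return out


# Toy events (illustrative only): one Sleeping activity, then an unlabeled event
toy = pd.DataFrame({
    'Sensor': ['M003', 'M003', 'T002', 'M003', 'M004'],
    'Value': ['ON', 'OFF', '21.5', 'OFF', 'ON'],
    'Activity_Type': ['Sleeping', '', '', 'Sleeping', ''],
    'Begin_Or_End': ['begin', '', '', 'end', ''],
})
print(label_segments(toy)[['Sensor', 'Segment_Label', 'Segment_ID']])
```

With segment IDs attached, `groupby('Segment_ID')` yields one group of sensor events per activity instance. Nested or overlapping activities would need a stack or per-activity bookkeeping instead of a single `current` label.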

In [1]:
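# Step 1: Load and Preprocess Dataset
import pandas as pd
import numpy as np

# Define file path
aruba_data_path = '/Users/harrisonkirstein/Desktop/CSCI-4380-Honors-Option-Repo/CSCI 4380 Honors Option Project/Datasets/aruba/data'

# Load the whitespace-delimited file. Rows without an activity annotation have
# only four fields; pandas pads the two missing columns with NaN.
# sep=r'\s+' replaces delim_whitespace=True, which is deprecated in recent pandas.
aruba_data = pd.read_csv(
    aruba_data_path,
    header=None,
    names=['Date', 'Time', 'Sensor', 'Value', 'Activity_Type', 'Begin_Or_End'],
    sep=r'\s+',
)

# Combine Date and Time into a single timestamp column.
# format='ISO8601' accepts timestamps with and without fractional seconds.
aruba_data['Timestamp'] = pd.to_datetime(
    aruba_data['Date'] + ' ' + aruba_data['Time'],
    format='ISO8601',
    errors='coerce',
)
aruba_data.drop(columns=['Date', 'Time'], inplace=True)

# Replace NaN with empty strings in the label columns only, so unlabeled rows
# are easy to test for and Timestamp keeps its datetime dtype
label_cols = ['Activity_Type', 'Begin_Or_End']
aruba_data[label_cols] = aruba_data[label_cols].fillna('')

# Preview the dataset
aruba_data.head()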


|   | Sensor | Value | Activity_Type | Begin_Or_End | Timestamp |
|---|--------|-------|---------------|--------------|-----------|
| 0 | M003 | ON | Sleeping | begin | 2010-11-04 00:03:50.209589 |
| 1 | M003 | OFF | | | 2010-11-04 00:03:57.399391 |
| 2 | T002 | 21.5 | | | 2010-11-04 00:15:08.984841 |
| 3 | T003 | 21 | | | 2010-11-04 00:30:19.185547 |
| 4 | T004 | 21 | | | 2010-11-04 00:30:19.385336 |
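The loading step relies on pandas padding short rows with NaN when explicit column names are given. A small sketch on three raw lines shaped like the preview above (the raw layout `date time sensor value [activity begin|end]` is inferred from the parsed columns, not copied from the file) shows that behavior with `sep=r'\s+'`, the non-deprecated equivalent of `delim_whitespace=True`:

```python
import io

import pandas as pd

# Three raw lines modeled on the preview rows: only the first carries a label
sample = io.StringIO(
    "2010-11-04 00:03:50.209589 M003 ON Sleeping begin\n"
    "2010-11-04 00:03:57.399391 M003 OFF\n"
    "2010-11-04 00:15:08.984841 T002 21.5\n"
)
df = pd.read_csv(
    sample,
    header=None,
    names=['Date', 'Time', 'Sensor', 'Value', 'Activity_Type', 'Begin_Or_End'],
    sep=r'\s+',
)

# Short rows get NaN in the trailing label columns
df['Timestamp'] = pd.to_datetime(df['Date'] + ' ' + df['Time'], format='ISO8601', errors='coerce')
df = df.drop(columns=['Date', 'Time'])
label_cols = ['Activity_Type', 'Begin_Or_End']
df[label_cols] = df[label_cols].fillna('')
print(df)
```

Filling only the label columns keeps `Timestamp` as `datetime64`, whereas calling `fillna('')` on the whole frame would push any `NaT` cells (and thus the column) toward `object` dtype.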


In [5]:
# Show every row that closes an activity
aruba_data[aruba_data['Begin_Or_End'] == 'end']

|   | Sensor | Value | Activity_Type | Begin_Or_End | Timestamp |
|---|--------|-------|---------------|--------------|-----------|
| 48 | M003 | OFF | Sleeping | end | 2010-11-04 05:40:43.642664 |
| 63 | M004 | OFF | Bed_to_Toilet | end | 2010-11-04 05:43:30.279021 |
| 172 | M003 | OFF | Sleeping | end | 2010-11-04 08:01:12.282970 |
| 520 | M018 | OFF | Meal_Preparation | end | 2010-11-04 08:27:02.801314 |
| 707 | M018 | OFF | Meal_Preparation | end | 2010-11-04 08:35:45.822482 |
| ... | ... | ... | ... | ... | ... |
| 1718826 | M009 | OFF | Relax | end | 2011-06-11 18:14:30.112460 |
| 1719050 | M009 | OFF | Relax | end | 2011-06-11 20:33:44.453476 |
| 1719347 | M009 | OFF | Relax | end | 2011-06-11 21:18:10.632466 |
| 1719431 | M009 | OFF | Relax | end | 2011-06-11 22:05:07.486416 |
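Each 'end' row above closes an activity opened by an earlier 'begin' row. A minimal sketch for pairing them into intervals, assuming the same activity is never open twice at once. `activity_intervals` is a hypothetical helper, unmatched 'end' markers are silently skipped, and the toy frame reuses the first Sleeping begin and end timestamps shown in the outputs above purely for illustration:

```python
import pandas as pd


def activity_intervals(df: pd.DataFrame) -> pd.DataFrame:
    """Pair each 'begin' row with the next 'end' row of the same activity."""
    open_starts = {}  # activity name -> start timestamp of its open interval
    rows = []
    for activity, marker, ts in zip(df['Activity_Type'], df['Begin_Or_End'], df['Timestamp']):
        if marker == 'begin':
            open_starts[activity] = ts
        elif marker == 'end' and activity in open_starts:
            start = open_starts.pop(activity)
            rows.append({'Activity': activity, 'Start': start, 'End': ts, 'Duration': ts - start})
    return pd.DataFrame(rows, columns=['Activity', 'Start', 'End', 'Duration'])


# Toy frame (illustrative only): a begin row, one unlabeled event, an end row
toy = pd.DataFrame({
    'Activity_Type': ['Sleeping', '', 'Sleeping'],
    'Begin_Or_End': ['begin', '', 'end'],
    'Timestamp': pd.to_datetime([
        '2010-11-04 00:03:50.209589',
        '2010-11-04 00:03:57.399391',
        '2010-11-04 05:40:43.642664',
    ]),
})
print(activity_intervals(toy))
```

Keying open intervals by activity name tolerates interleaved markers of different activities, but a second 'begin' for an activity that is already open overwrites the earlier start time.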
