
# Human Activity Recognition (HAR) with Textual Descriptions of Sensor Triggers (TDOST)
This notebook demonstrates a layout-agnostic Human Activity Recognition (HAR) model built with the TDOST (Textual Descriptions of Sensor Triggers) methodology.
Sensor events are converted into natural language descriptions before training, with the aim of improving generalizability across different smart home layouts.

The data provided in `hh101.ann.txt` includes sensor triggers with associated activities, which we'll use to generate TDOST embeddings for activity recognition.


In [1]:

import pandas as pd

# Load the dataset
data_path = '/Users/harrisonkirstein/Desktop/CSCI-4380-Honors-Option-Repo/CSCI 4380 Honors Option Project/hh101/hh101.ann.txt'
# Each row has seven tab-separated fields, so all seven need a name; with only
# six names the labels end up misaligned with the values.
# 'room' is an assumed label for the first of the two location fields.
columns = ['timestamp', 'sensor_id', 'room', 'location', 'value', 'sensor_type', 'activity']
df = pd.read_csv(data_path, sep='\t', header=None, names=columns)

# Display the data
df.head()


timestamp,sensor_id,room,location,value,sensor_type,activity
2012-07-20 10:38:54.512364,M001,OutsideDoor,Entry,ON,Control4-Motion,Step_Out
2012-07-20 10:38:54.653634,LS001,Ignore,Ignore,49,Control4-LightSensor,Step_Out
2012-07-20 10:38:57.448892,LS001,Ignore,Ignore,7,Control4-LightSensor,Step_Out
2012-07-20 10:38:58.385068,LS001,Ignore,Ignore,50,Control4-LightSensor,Step_Out
2012-07-20 10:38:59.335432,LS001,Ignore,Ignore,7,Control4-LightSensor,Step_Out
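
The sample rows above each carry seven fields. As a rough sanity check, a single raw line can be parsed by hand to see how the fields line up. This is only a sketch: it assumes tab-separated fields (as in the `read_csv` call), and the names in `FIELDS`, including `room`, are hypothetical labels rather than anything defined by the dataset.

```python
from datetime import datetime

# Hypothetical labels for the seven tab-separated fields of one event line
FIELDS = ["timestamp", "sensor_id", "room", "location",
          "value", "sensor_type", "activity"]

def parse_event(line):
    """Split one raw line into a dict and parse its timestamp."""
    parts = line.rstrip("\n").split("\t")
    if len(parts) != len(FIELDS):
        raise ValueError(f"expected {len(FIELDS)} fields, got {len(parts)}")
    event = dict(zip(FIELDS, parts))
    event["timestamp"] = datetime.strptime(
        event["timestamp"], "%Y-%m-%d %H:%M:%S.%f"
    )
    return event

# First sample row from the output above, rebuilt as a tab-separated line
sample = ("2012-07-20 10:38:54.512364\tM001\tOutsideDoor\tEntry\tON"
          "\tControl4-Motion\tStep_Out")
event = parse_event(sample)
print(event["sensor_id"], event["location"], event["value"], event["activity"])
```

The explicit field-count check makes a mismatch between names and columns fail immediately instead of silently shifting values under the wrong label.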



## Generate TDOST Descriptions
We will create natural language descriptions for each sensor event by incorporating contextual information from the sensor type, location, and value. 
This will result in sentences that can be processed by a language model to produce embeddings for classification.


In [None]:

# Function to create a TDOST sentence for each sensor event
def generate_tdost(row):
    return f"{row['sensor_type']} sensor in {row['location']} fired with value {row['value']}"

# Apply the function to generate TDOST sentences
df['tdost_description'] = df.apply(generate_tdost, axis=1)
df[['tdost_description', 'activity']].head()
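
To make the sentence template concrete, the sketch below applies the same function to two hand-written events. The values are taken from the sample rows shown earlier, and plain dicts stand in for DataFrame rows (the function only indexes by key, so both work); it assumes `location` holds the sensor's location, such as `Entry`.

```python
def generate_tdost(row):
    """Same template as the notebook cell: sensor type, location, value."""
    return (f"{row['sensor_type']} sensor in {row['location']} "
            f"fired with value {row['value']}")

# Illustrative events; keys mirror the DataFrame column names
events = [
    {"sensor_type": "Control4-Motion", "location": "Entry", "value": "ON"},
    {"sensor_type": "Control4-LightSensor", "location": "Ignore", "value": "49"},
]

sentences = [generate_tdost(e) for e in events]
for s in sentences:
    print(s)
```

The second sentence reads "Control4-LightSensor sensor in Ignore fired with value 49", which shows that placeholder locations such as `Ignore` pass straight into the text. Whether to filter or rephrase such rows is a design decision this notebook leaves open.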



## Text Embedding with Pre-trained Sentence Encoder
Using a pre-trained sentence encoder (here, the `all-MiniLM-L6-v2` model from the Sentence-Transformers library), we convert the TDOST descriptions into embeddings.
These embeddings represent the contextual information within each description and serve as input for the activity classifier.


In [None]:

from sentence_transformers import SentenceTransformer

# Load a pre-trained sentence transformer model
model = SentenceTransformer('all-MiniLM-L6-v2')

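# Generate embeddings for all TDOST descriptions in one batched call,
# which is much faster than encoding one sentence at a time
embeddings = model.encode(df['tdost_description'].tolist(), show_progress_bar=True)
df['embedding'] = list(embeddings)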
df.head()
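
Because the same sensor tends to fire repeatedly with the same value, many TDOST descriptions are likely to be exact duplicates. One possible optimization, sketched here under that assumption, is to encode each distinct sentence once and map the vectors back to the rows. `embed_unique` and `toy_encode` are hypothetical helpers, not part of Sentence-Transformers; `toy_encode` is a stand-in so the example runs without downloading a model.

```python
def embed_unique(sentences, encode_fn):
    """Encode each distinct sentence once, then map vectors back to every row."""
    unique = list(dict.fromkeys(sentences))   # order-preserving de-duplication
    vectors = encode_fn(unique)               # single batched call
    lookup = dict(zip(unique, vectors))
    return [lookup[s] for s in sentences]

# Stand-in encoder for illustration only; in the notebook, pass model.encode
def toy_encode(batch):
    return [[float(len(s))] for s in batch]

descriptions = [
    "Control4-LightSensor sensor in Ignore fired with value 7",
    "Control4-LightSensor sensor in Ignore fired with value 50",
    "Control4-LightSensor sensor in Ignore fired with value 7",
]
vectors = embed_unique(descriptions, toy_encode)
print(len(vectors), vectors[0] == vectors[2])
```

In the notebook this would be called as `df['embedding'] = embed_unique(df['tdost_description'].tolist(), model.encode)`, since `model.encode` accepts a list of sentences and returns one vector per sentence.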



## Model Definition and Training
We define a simple logistic regression classifier to predict activities from the TDOST embeddings.
The model is trained using labeled activities in the dataset.


In [None]:

from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import classification_report

# Convert embeddings to list for training
X = list(df['embedding'])
y = df['activity']

# Train-test split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Train a classifier
clf = LogisticRegression(max_iter=1000)
clf.fit(X_train, y_train)

# Evaluate on test set
y_pred = clf.predict(X_test)
print(classification_report(y_test, y_pred))



## Summary
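This notebook demonstrated the TDOST approach to layout-agnostic HAR: sensor triggers were converted into natural language descriptions, embedded with a pre-trained language model, and classified by activity.
Because the classifier sees descriptions rather than layout-specific sensor IDs, the approach is intended to generalize across different smart home layouts without additional retraining. This notebook trains and evaluates on a single home (`hh101`), so it does not test that claim directly.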

Future improvements could include experimenting with different sentence encoders and with deep learning classifiers to improve performance.
