
# Human Activity Recognition (HAR) with Textual Descriptions of Sensor Triggers (TDOST)
This notebook demonstrates a layout-agnostic HAR model built with the TDOST methodology.
The model is trained on sensor events that have been converted into natural language descriptions, with the aim of generalizing across different smart home layouts.

The data provided in `hh101.ann.txt` includes sensor triggers with associated activities, which we'll use to generate TDOST embeddings for activity recognition.


In [2]:

import pandas as pd

# Load the dataset
# data_path = '/Users/harrisonkirstein/Desktop/CSCI-4380-Honors-Option-Repo/CSCI 4380 Honors Option Project/hh101/hh101.ann.txt'
data_path = '/Users/harrisonkirstein/Documents/GitHub/CSCI-4380-Honors-Option-Repo/CSCI 4380 Honors Option Project/hh101/hh101.ann.txt'
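# NOTE: the displayed rows carry seven fields but only six names are given here, so pandas
# uses the first field (the timestamp) as the index. As a result the columns labeled
# 'timestamp' and 'sensor_id' actually hold the sensor ID and the room. The columns used
# later ('location', 'value', 'sensor_type', 'activity') do match their contents.
columns = ['timestamp', 'sensor_id', 'location', 'value', 'sensor_type', 'activity']
df = pd.read_csv(data_path, sep='\t', header=None, names=columns)

# Draw a 100,000-row sample for inspection; the cells below build descriptions
# and embeddings from the full df, and sample_df is only displayed.
sample_df = df.sample(n=100000, random_state=42)  # random_state ensures reproducibility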


# Display the data
sample_df.head()


| | timestamp | sensor_id | location | value | sensor_type | activity |
|---|---|---|---|---|---|---|
| 2012-09-13 18:19:55.348630 | M008 | LivingRoom | Chair | ON | Control4-Motion | Watch_TV |
| 2012-07-21 23:43:20.142586 | M003 | Kitchen | Kitchen | ON | Control4-Motion | Other_Activity |
| 2012-07-21 19:13:48.817447 | M008 | LivingRoom | Chair | ON | Control4-Motion | Watch_TV |
| 2012-09-12 21:31:43.649158 | LS003 | Ignore | Ignore | 17 | Control4-LightSensor | Cook |
| 2012-09-09 07:08:42.323352 | MA015 | Bathroom | Bathroom | ON | Control4-MotionArea | Toilet |
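
The header and rows above suggest that each record has seven fields while `read_csv` was given six names. The sketch below illustrates the effect using only the standard library. It assumes the raw line is tab-separated in the order shown in the first displayed row, and the seventh name, `room`, is a hypothetical label that does not appear in the notebook.

```python
import csv
import io

# One raw line reconstructed from the first displayed row (assumed tab-separated).
raw = "2012-09-13 18:19:55.348630\tM008\tLivingRoom\tChair\tON\tControl4-Motion\tWatch_TV\n"
fields = next(csv.reader(io.StringIO(raw), delimiter="\t"))

# The six names used in the notebook.
six_names = ["timestamp", "sensor_id", "location", "value", "sensor_type", "activity"]
# With one name too few, pandas treats the leading field as the index,
# so the six names attach to the last six fields.
as_loaded = dict(zip(six_names, fields[1:]))

# Hypothetical seven-name layout; 'room' is an assumed label, not from the notebook.
seven_names = ["timestamp", "sensor_id", "room", "location", "value", "sensor_type", "activity"]
aligned = dict(zip(seven_names, fields))

print(as_loaded["timestamp"])  # M008 (a sensor ID, not a time)
print(aligned["timestamp"])    # 2012-09-13 18:19:55.348630
```

Under either layout the `location`, `value`, `sensor_type`, and `activity` labels point at the same fields, which is why the descriptions generated later are unaffected.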



## Generate TDOST Descriptions
We will create natural language descriptions for each sensor event by incorporating contextual information from the sensor type, location, and value. 
This will result in sentences that can be processed by a language model to produce embeddings for classification.


In [8]:

# Function to create a TDOST sentence for each sensor event
def generate_tdost(row):
    return f"{row['sensor_type']} sensor in {row['location']} fired with value {row['value']}"

# Apply the function to generate TDOST sentences
df['tdost_description'] = df.apply(generate_tdost, axis=1)
df[['tdost_description', 'activity']].head()


| | tdost_description | activity |
|---|---|---|
| 2012-07-20 10:38:54.512364 | Control4-Motion sensor in Entry fired with val... | Step_Out |
| 2012-07-20 10:38:54.653634 | Control4-LightSensor sensor in Ignore fired wi... | Step_Out |
| 2012-07-20 10:38:57.448892 | Control4-LightSensor sensor in Ignore fired wi... | Step_Out |
| 2012-07-20 10:38:58.385068 | Control4-LightSensor sensor in Ignore fired wi... | Step_Out |
| 2012-07-20 10:38:59.335432 | Control4-LightSensor sensor in Ignore fired wi... | Step_Out |
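
The `head()` output truncates each description. As a minimal check of the template, the sketch below applies the same f-string to plain dictionaries whose values are copied from rows displayed earlier in this notebook, so the complete sentences are visible. The dictionaries stand in for DataFrame rows; no pandas is needed.

```python
def generate_tdost(row):
    """Same template as the notebook cell, applied to any mapping with these keys."""
    return f"{row['sensor_type']} sensor in {row['location']} fired with value {row['value']}"

# Values copied from two rows of the sample shown earlier.
events = [
    {"sensor_type": "Control4-Motion", "location": "Chair", "value": "ON"},
    {"sensor_type": "Control4-LightSensor", "location": "Ignore", "value": "17"},
]

sentences = [generate_tdost(e) for e in events]
for s in sentences:
    print(s)
# Control4-Motion sensor in Chair fired with value ON
# Control4-LightSensor sensor in Ignore fired with value 17
```

The second sentence shows that the `Ignore` placeholder is passed through verbatim, so light-sensor events are described as being "in Ignore".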



## Text Embedding with Pre-trained Sentence Encoder
Using a pre-trained sentence encoder (here the `all-MiniLM-L6-v2` model from the `sentence-transformers` library), we convert the TDOST descriptions into 384-dimensional embeddings.
These embeddings represent the contextual information within each description and serve as input for the activity classifier.


In [9]:

from sentence_transformers import SentenceTransformer

# Load a pre-trained sentence transformer model
model = SentenceTransformer('all-MiniLM-L6-v2')

# Generate embeddings for each TDOST description in the full DataFrame (batched)
embeddings = model.encode(df['tdost_description'].tolist(), batch_size=64, show_progress_bar=True)

# Earlier per-row variant on the sample, kept for reference:
# sample_df['embedding'] = sample_df['tdost_description'].apply(lambda x: model.encode(x))
sample_df.head()


Batches:   0%|          | 0/5023 [00:00<?, ?it/s]

| | timestamp | sensor_id | location | value | sensor_type | activity | tdost_description |
|---|---|---|---|---|---|---|---|
| 2012-09-13 18:19:55.348630 | M008 | LivingRoom | Chair | ON | Control4-Motion | Watch_TV | Control4-Motion sensor in Chair fired with val... |
| 2012-07-21 23:43:20.142586 | M003 | Kitchen | Kitchen | ON | Control4-Motion | Other_Activity | Control4-Motion sensor in Kitchen fired with v... |
| 2012-07-21 19:13:48.817447 | M008 | LivingRoom | Chair | ON | Control4-Motion | Watch_TV | Control4-Motion sensor in Chair fired with val... |
| 2012-09-12 21:31:43.649158 | LS003 | Ignore | Ignore | 17 | Control4-LightSensor | Cook | Control4-LightSensor sensor in Ignore fired wi... |
| 2012-09-09 07:08:42.323352 | MA015 | Bathroom | Bathroom | ON | Control4-MotionArea | Toilet | Control4-MotionArea sensor in Bathroom fired w... |
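
Because the template combines only sensor type, location, and value, many events are likely to share the same sentence (an assumption; the notebook does not count distinct descriptions). If so, each distinct sentence could be encoded once and the result reused. The sketch below shows the idea with a stand-in encoder so that it runs without any model; `encode_unique` and `toy_encoder` are hypothetical names, not part of `sentence-transformers`.

```python
from typing import Callable, Dict, List, Sequence

Vector = List[float]

def encode_unique(descriptions: Sequence[str],
                  encode_fn: Callable[[List[str]], Sequence[Vector]]) -> List[Vector]:
    """Encode each distinct sentence once, then expand back to the input order."""
    unique = list(dict.fromkeys(descriptions))   # distinct sentences, first-seen order
    vectors = encode_fn(unique)                  # single encoder call on the distinct set
    lookup: Dict[str, Vector] = dict(zip(unique, vectors))
    return [lookup[d] for d in descriptions]

calls = []

def toy_encoder(sentences: List[str]) -> List[Vector]:
    """Stand-in for a real sentence encoder; records how many sentences it receives."""
    calls.append(len(sentences))
    return [[float(len(s)), float(s.count(" "))] for s in sentences]

descriptions = [
    "Control4-Motion sensor in Chair fired with value ON",
    "Control4-Motion sensor in Kitchen fired with value ON",
    "Control4-Motion sensor in Chair fired with value ON",
]
vectors = encode_unique(descriptions, toy_encoder)
print(len(vectors), calls, vectors[0] == vectors[2])  # 3 [2] True
```

With the real model, `encode_fn` could be something like `lambda s: model.encode(s, batch_size=64)`; the returned array can be zipped row by row in the same way.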



## Model Definition and Training
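We train a logistic regression classifier (scikit-learn's `LogisticRegression`) to predict activities from the TDOST embeddings.
Each sensor event is classified independently from its own embedding, using the labeled activities in the dataset with an 80/20 train-test split.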


In [11]:
len(embeddings)

321457

In [12]:

from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import classification_report

# Features are the embedding matrix (a NumPy array, one row per description); labels are the activities
X = embeddings
y = df['activity']

# Train-test split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Train a classifier
clf = LogisticRegression(max_iter=1000)
clf.fit(X_train, y_train)

# Evaluate on test set
y_pred = clf.predict(X_test)
print(classification_report(y_test, y_pred))




                       precision    recall  f1-score   support

                Bathe       0.33      0.93      0.49      3290
Bed_Toilet_Transition       0.00      0.00      0.00       152
                 Cook       0.00      0.00      0.00       525
       Cook_Breakfast       0.28      0.75      0.41      3315
          Cook_Dinner       0.00      0.00      0.00      1106
           Cook_Lunch       0.00      0.00      0.00       661
                Dress       0.00      0.00      0.00      2806
                Drink       0.00      0.00      0.00       769
                  Eat       0.00      0.00      0.00       103
        Eat_Breakfast       0.00      0.00      0.00       705
           Eat_Dinner       0.00      0.00      0.00       207
            Eat_Lunch       0.00      0.00      0.00       169
           Enter_Home       0.23      0.05      0.09       507
     Entertain_Guests       0.00      0.00      0.00       530
         Evening_Meds       0.00      0.00      0.00  
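
The f1-score column in the report is the harmonic mean of precision and recall. As a quick sanity check, the sketch below recomputes it from the rounded precision and recall values copied from three rows of the report. Because the inputs are already rounded, recomputed values for other rows may differ from the report by 0.01.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall; 0.0 when both are zero."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

# Rounded (precision, recall) pairs copied from the report above.
rows = {
    "Bathe": (0.33, 0.93),
    "Cook_Breakfast": (0.28, 0.75),
    "Dress": (0.00, 0.00),
}

for label, (p, r) in rows.items():
    print(f"{label}: {f1_score(p, r):.2f}")
# Bathe: 0.49
# Cook_Breakfast: 0.41
# Dress: 0.00
```

A recall of 0.00 means that no test event of that class was predicted correctly. When a class is never predicted at all, its precision is undefined and scikit-learn reports it as 0.00.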




## Summary
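This notebook walked through the TDOST pipeline for HAR on the `hh101` data: sensor triggers were converted into natural language descriptions, embedded with a pre-trained sentence encoder, and classified with logistic regression.
Describing events by sensor type, location, and value rather than by sensor ID is intended to let a model transfer across smart home layouts without additional retraining, but this notebook evaluates on a single home, so that claim is not tested here. In the portion of the classification report shown, most activities score 0.00 on precision, recall, and F1, which suggests that a single-event description carries too little context for this classifier.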

Future improvements can include experimenting with different sentence embeddings and deep learning models for enhanced performance.
