## Task and Models

Our task, at least for now, is activity recognition. We will build a few models to establish a baseline.

#### Random Forest
A random forest may be able to predict activities accurately, as ensembles of trees tend to cope well with data that is somewhat complex and noisy.

#### KNN
K-Nearest Neighbors is likely a poor choice because of the curse of dimensionality: with this many features, distances between points become less informative.
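
To make the dimensionality concern concrete, here is a small sketch on synthetic uniform data (not the project's data). As the number of features grows, the nearest and farthest neighbors of a query point end up at almost the same distance, which is what undermines KNN. The function name `distance_contrast` is illustrative, and the dimensions are chosen for illustration: 41 and 2460 mirror the per-step and flattened feature counts that appear in the output later in this notebook.

```python
import numpy as np

def distance_contrast(n_features, n_points=500, seed=0):
    """Ratio of nearest to farthest Euclidean distance from one random query point."""
    rng = np.random.default_rng(seed)
    points = rng.random((n_points, n_features))   # synthetic data in the unit cube
    query = rng.random(n_features)
    dists = np.linalg.norm(points - query, axis=1)
    return dists.min() / dists.max()

# A ratio near 0 means neighbors are well separated; near 1 means they are
# almost indistinguishable by distance.
for d in (2, 41, 2460):
    print(f"{d:>5} features: nearest/farthest = {distance_contrast(d):.3f}")
```

Real sensor data is not uniformly distributed, so the effect may be weaker in practice, but the trend is the reason KNN is set aside here.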

#### Logistic/Linear Regression
The relationships here aren't linear, so this likely wouldn't be a good choice.

#### Decision Trees
A single decision tree may be a good option given the breadth of the data, but the data set is very large, which may cause issues such as long training times.

#### DNN
Deep neural networks handle high-dimensional data well but do not take sequence order into account, so a plain DNN is not a good choice here.

#### CNN
Convolutional neural networks are good for grid-like structures (images) and for spotting patterns in local regions of the data. A CNN is a possible option here, but it likely wouldn't perform well. Its main use cases involve image, video, or natural language processing.

#### LSTM
Long Short-Term Memory models are good at modeling sequential data, context, and long-term dependencies. This is likely the best choice for the job (out of the deep models).
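
The practical difference between the sequence-aware and non-sequential models shows up in the input shape. As a rough sketch with random stand-in data (the sizes 60 and 41 mirror the window length and feature count that appear later in this notebook, and `batch` is a hypothetical name), a sequence model consumes each window as a `(timesteps, features)` matrix, while tree models and plain dense networks need it flattened into one long row, which drops the explicit time axis:

```python
import numpy as np

rng = np.random.default_rng(0)
batch = rng.random((8, 60, 41))        # (samples, timesteps, features): LSTM-style input

flat = batch.reshape(len(batch), -1)   # (samples, timesteps * features): tree/dense-style input

print(batch.shape)  # (8, 60, 41)
print(flat.shape)   # (8, 2460)

# Column k of a flattened row maps back to (timestep, feature) = divmod(k, 41),
# so time order survives only implicitly, in the column position.
t, f = divmod(100, 41)
assert flat[0, 100] == batch[0, t, f]
```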


In [18]:
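def create_windows(X, y, window_size, step_size):
    """Slice X into overlapping windows; label each with the step right after it."""
    X_win, y_win = [], []
    # Stop early enough that the label index i + window_size stays in range.
    for i in range(0, len(X) - window_size, step_size):
        window = X.iloc[i:i + window_size].values  # shape: (window_size, n_features)
        label = y.iloc[i + window_size]            # target at the step following the window
        X_win.append(window)
        y_win.append(label)
    return np.array(X_win), np.array(y_win)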
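
As a quick check of what `create_windows` returns, the sketch below runs it on a tiny made-up data set. The toy frame `X_toy`, the series `y_toy`, and the small window and step sizes are assumptions for illustration only; the function body is repeated so the snippet runs on its own.

```python
import numpy as np
import pandas as pd

def create_windows(X, y, window_size, step_size):
    # Same logic as the cell above, repeated so this snippet is self-contained.
    X_win, y_win = [], []
    for i in range(0, len(X) - window_size, step_size):
        X_win.append(X.iloc[i:i + window_size].values)
        y_win.append(y.iloc[i + window_size])
    return np.array(X_win), np.array(y_win)

# Toy data: 10 time steps, 2 features; the label equals the row number.
X_toy = pd.DataFrame({"a": np.arange(10), "b": np.arange(10) * 10})
y_toy = pd.Series(np.arange(10))

X_win, y_win = create_windows(X_toy, y_toy, window_size=4, step_size=2)
print(X_win.shape)  # (3, 4, 2)
print(y_win)        # [4 6 8]
```

Each window covers rows `i` to `i + 3` and is labeled with row `i + 4`, so the model predicts the activity at the step immediately after the history it sees. Trailing rows that cannot supply both a full window and a following label are skipped.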

# LSTM

In [11]:
import pandas as pd
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.metrics import classification_report
import tensorflow as tf
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import LSTM, Dense, Dropout
from tensorflow.keras.utils import to_categorical

In [12]:

FILE_PATH = "../processed_data/final_processed_data_ALL_DAYS.csv"
TARGET_COLUMN = 'activity_user_1'  # We will predict the activity for User 1
WINDOW_SIZE = 60  # How many past time steps to look at (e.g., 60 * 2s = 120 seconds of history)
STEP_SIZE = 30    # How far to slide the window forward each time

print("Loading final processed data...")
df = pd.read_csv(FILE_PATH)

print("Separating data...")
df.dropna(inplace=True)
X = df.drop(columns=[col for col in df.columns if 'activity' in col])
y = df[TARGET_COLUMN].astype(int)
num_classes = len(y.unique())
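print(f"Creating sliding windows (size={WINDOW_SIZE}, step={STEP_SIZE})...")
# Window the integer labels, then one-hot encode once per window. This avoids
# building a Series of Python lists from the full one-hot matrix, which is slow
# and memory-hungry on a data set this size.
X_win, y_win_int = create_windows(X, y, WINDOW_SIZE, STEP_SIZE)
y_win = to_categorical(y_win_int, num_classes=num_classes)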
print(f"  - Windowed X shape: {X_win.shape}")
print(f"  - Windowed y shape: {y_win.shape}")

print("Splitting data into training and test sets...")
X_train, X_test, y_train, y_test = train_test_split(X_win, y_win, test_size=0.2, random_state=42)
print(f"  - Training set size: {len(X_train)}")
print(f"  - Test set size: {len(X_test)}")

print("Building the LSTM model...")
model = Sequential([
    # The input layer must match the shape of our windows (WINDOW_SIZE, num_features)
    LSTM(64, input_shape=(X_train.shape[1], X_train.shape[2]), return_sequences=True),
    Dropout(0.5),
    LSTM(64),
    Dropout(0.5),
    Dense(num_classes, activation='softmax') # The output layer has one neuron per activity
])

model.compile(optimizer='adam', loss='categorical_crossentropy', metrics=['accuracy'])
model.summary()

print("\nTraining the model...")
history = model.fit(
    X_train, y_train,
    epochs=10,          # Start with a few epochs to see how it goes
    batch_size=128,
    validation_split=0.1, # Use part of the training data for validation
    verbose=1
)
model.save('LSTM_first_iteration.keras')

print("\nEvaluating the model on the test set...")
loss, accuracy = model.evaluate(X_test, y_test, verbose=0)
print(f"\nTest Accuracy: {accuracy * 100:.2f}%")
y_pred_probs = model.predict(X_test)
y_pred = np.argmax(y_pred_probs, axis=1)
y_test_labels = np.argmax(y_test, axis=1)

print("\nClassification Report:")
print(classification_report(y_test_labels, y_pred))

Loading final processed data...
Separating data...
Creating sliding windows (size=60, step=30)...


KeyboardInterrupt: 
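
The run above appears to have been interrupted while windows were being created. If the Python loop in `create_windows` turns out to be a bottleneck, one possible alternative is NumPy's `sliding_window_view` (available in NumPy 1.20 and later), sketched below. The function name `create_windows_fast` is hypothetical, and the sketch assumes `X` and `y` can be converted to NumPy arrays (for example via `X.to_numpy()`). It is meant to reproduce the same windows and labels as the loop, not to replace the implementation used in this notebook.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def create_windows_fast(X, y, window_size, step_size):
    """Vectorized sketch: same windows and labels as the loop-based version."""
    X = np.asarray(X)
    y = np.asarray(y)
    # Window start indices; the label for each window sits at start + window_size.
    starts = np.arange(0, len(X) - window_size, step_size)
    # The view has shape (n_rows - window_size + 1, n_features, window_size).
    view = sliding_window_view(X, window_shape=window_size, axis=0)
    # Pick the strided starts, then move the time axis ahead of the feature axis.
    X_win = view[starts].transpose(0, 2, 1)
    y_win = y[starts + window_size]
    return X_win, y_win

X_demo = np.arange(20).reshape(10, 2)
y_demo = np.arange(10)
X_win, y_win = create_windows_fast(X_demo, y_demo, window_size=4, step_size=2)
print(X_win.shape)  # (3, 4, 2)
print(y_win)        # [4 6 8]
```

Because `y` is indexed with an array of positions, the same function also works when `y` is already a one-hot matrix.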

# Random Forest

In [15]:
from sklearn.metrics import accuracy_score
from sklearn.ensemble import RandomForestClassifier
import joblib

FILE_PATH = "../processed_data/final_processed_data_ALL_DAYS.csv"
TARGET_COLUMN = 'activity_user_1'
WINDOW_SIZE = 60  # 120 seconds of history
STEP_SIZE = 30

print("Loading final processed data...")
df = pd.read_csv(FILE_PATH)

print("Separating Features and Target...")
df.dropna(inplace=True)
X = df.drop(columns=[col for col in df.columns if 'activity' in col])
y = df[TARGET_COLUMN].astype(int)

print(f"Creating sliding windows (size={WINDOW_SIZE}, step={STEP_SIZE})...")
X_win, y_win = create_windows(X, y, WINDOW_SIZE, STEP_SIZE)
print(f"  - Initial windowed X shape: {X_win.shape}")

print("Flattening window data...")
n_samples, n_timesteps, n_features = X_win.shape
X_flattened = X_win.reshape((n_samples, n_timesteps * n_features))
print(f"  - Flattened X shape: {X_flattened.shape}")

print("Splitting data into training and test sets...")
X_train, X_test, y_train, y_test = train_test_split(X_flattened, y_win, test_size=0.2, random_state=42)
print(f"  - Training set size: {len(X_train)}")
print(f"  - Test set size: {len(X_test)}")

print("\nBuilding and training the Random Forest model...")
# n_estimators is the number of trees in the forest.
# n_jobs=-1 uses all available CPU cores for faster training.
model = RandomForestClassifier(n_estimators=100, random_state=42, n_jobs=-1)

model.fit(X_train, y_train)

print("\nEvaluating the model on the test set...")
y_pred = model.predict(X_test)
joblib.dump(model, "../models/RandomForest_first_iteration.joblib")

accuracy = accuracy_score(y_test, y_pred)
print(f"\nTest Accuracy: {accuracy * 100:.2f}%")

print("\nClassification Report:")
# You may need to create a mapping from integer back to activity name for readability
activity_names = ["BATHROOM ACTIVITY", "CHORES", "COOK", "DISHWASHING", "DRESS", "EAT", "LAUNDRY",
                  "MAKE SIMPLE FOOD", "OUT HOME", "PET", "READ", "RELAX", "SHOWER", "SLEEP",
                  "TAKE MEDS", "WATCH TV", "WORK", "OTHER"]
print(classification_report(y_test, y_pred, target_names=activity_names))

Loading final processed data...
Separating Features and Target...
Creating sliding windows (size=60, step=30)...
  - Initial windowed X shape: (67467, 60, 41)
Flattening window data...
  - Flattened X shape: (67467, 2460)
Splitting data into training and test sets...
  - Training set size: 53973
  - Test set size: 13494

Building and training the Random Forest model...

Evaluating the model on the test set...

Test Accuracy: 97.90%

Classification Report:
                   precision    recall  f1-score   support

BATHROOM ACTIVITY       0.94      0.92      0.93       381
           CHORES       0.99      0.88      0.93        83
             COOK       0.96      0.89      0.93       139
      DISHWASHING       1.00      0.88      0.93        16
            DRESS       0.75      0.36      0.49        33
              EAT       0.92      0.98      0.95       609
          LAUNDRY       0.00      0.00      0.00         1
 MAKE SIMPLE FOOD       0.93      0.72      0.81       104
        

  _warn_prf(average, modifier, f"{metric.capitalize()} is", len(result))
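
The `_warn_prf` output above comes from scikit-learn's handling of metrics that are undefined for a class, which is what happens when a class such as LAUNDRY (support of 1) is never predicted. Passing `zero_division=0` to `classification_report` keeps the same 0.00 values without the warning, and passing `labels` explicitly keeps `target_names` aligned even if a class is missing from a split. A minimal sketch on made-up labels (the three class names are borrowed from the list above; the label values are invented for illustration):

```python
from sklearn.metrics import classification_report

names = ["COOK", "EAT", "LAUNDRY"]
y_true = [0, 0, 1, 1, 1, 2]
y_pred = [0, 0, 1, 1, 1, 1]   # class 2 (LAUNDRY) is never predicted

report = classification_report(
    y_true, y_pred,
    labels=[0, 1, 2],       # fixes the row order regardless of which classes appear
    target_names=names,
    zero_division=0,        # report 0.0 instead of warning when a metric is undefined
    output_dict=True,       # return a dict so individual values can be inspected
)
print(report["LAUNDRY"])
```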

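One caveat worth checking on the accuracy above: with `WINDOW_SIZE = 60` and `STEP_SIZE = 30`, consecutive windows share half of their rows, so a shuffled `train_test_split` can place near-duplicate windows on both sides of the split. A chronological split with a small gap is one way to rule this out. The sketch below is an assumption about how that could look, not something this notebook does; `chronological_split` and the demo arrays are hypothetical names.

```python
import numpy as np

def chronological_split(X_win, y_win, test_size=0.2, window_size=60, step_size=30):
    """Split windows in time order, dropping the windows that straddle the boundary."""
    n = len(X_win)
    split = int(n * (1 - test_size))
    # Window k covers rows [k*step, k*step + window_size) and is labeled with row
    # k*step + window_size. Skipping this many windows after the split ensures test
    # windows share no rows with training windows or their label rows.
    gap = window_size // step_size
    return X_win[:split], X_win[split + gap:], y_win[:split], y_win[split + gap:]

# Demo on windows of row indices, so any overlap can be checked directly.
rows = np.arange(1000)
starts = np.arange(0, len(rows) - 60, 30)
X_idx = np.stack([rows[s:s + 60] for s in starts])   # (n_windows, 60)
y_idx = rows[starts + 60]                            # label row for each window

X_tr, X_te, y_tr, y_te = chronological_split(X_idx, y_idx)
shared = np.intersect1d(np.append(X_tr.ravel(), y_tr), X_te.ravel())
print(len(X_tr), len(X_te), shared.size)
```

If accuracy on such a split stays close to the shuffled-split figure, the result is more convincing; a large drop would point to overlap between training and test windows.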

# Decision Tree

In [16]:
from sklearn.tree import DecisionTreeClassifier

FILE_PATH = "../processed_data/final_processed_data_ALL_DAYS.csv"
TARGET_COLUMN = 'activity_user_1'
WINDOW_SIZE = 60
STEP_SIZE = 30

print("Loading final processed data...")
df = pd.read_csv(FILE_PATH)

print("Separating Features and Target...")
df.dropna(inplace=True)
X = df.drop(columns=[col for col in df.columns if 'activity' in col])
y = df[TARGET_COLUMN].astype(int)

print(f"Creating sliding windows (size={WINDOW_SIZE}, step={STEP_SIZE})...")
X_win, y_win = create_windows(X, y, WINDOW_SIZE, STEP_SIZE)
print(f"  - Initial windowed X shape: {X_win.shape}")

print("Flattening window data...")
n_samples, n_timesteps, n_features = X_win.shape
X_flattened = X_win.reshape((n_samples, n_timesteps * n_features))
print(f"  - Flattened X shape: {X_flattened.shape}")

print("Splitting data into training and test sets...")
X_train, X_test, y_train, y_test = train_test_split(X_flattened, y_win, test_size=0.2, random_state=42)
print(f"  - Training set size: {len(X_train)}")
print(f"  - Test set size: {len(X_test)}")

print("Building and training the Decision Tree model...")
model = DecisionTreeClassifier(random_state=42)

model.fit(X_train, y_train)
print("  - Model training complete!")

print("Evaluating the model on the test set...")
y_pred = model.predict(X_test)

accuracy = accuracy_score(y_test, y_pred)
print(f"\nTest Accuracy: {accuracy * 100:.2f}%")

print("\nClassification Report:")
activity_names = ["BATHROOM ACTIVITY", "CHORES", "COOK", "DISHWASHING", "DRESS", "EAT", "LAUNDRY",
                  "MAKE SIMPLE FOOD", "OUT HOME", "PET", "READ", "RELAX", "SHOWER", "SLEEP",
                  "TAKE MEDS", "WATCH TV", "WORK", "OTHER"]
print(classification_report(y_test, y_pred, target_names=activity_names))

print("Saving the Decision Tree model...")
joblib.dump(model, "../models/DecisionTree_first_iteration.joblib")
print("  - Model saved successfully!")

Loading final processed data...
Separating Features and Target...
Creating sliding windows (size=60, step=30)...
  - Initial windowed X shape: (67467, 60, 41)
Flattening window data...
  - Flattened X shape: (67467, 2460)
Splitting data into training and test sets...
  - Training set size: 53973
  - Test set size: 13494
Building and training the Decision Tree model...
  - Model training complete!
Evaluating the model on the test set...

Test Accuracy: 95.90%

Classification Report:
                   precision    recall  f1-score   support

BATHROOM ACTIVITY       0.90      0.89      0.89       381
           CHORES       0.81      0.78      0.80        83
             COOK       0.81      0.78      0.80       139
      DISHWASHING       0.82      0.56      0.67        16
            DRESS       0.23      0.24      0.24        33
              EAT       0.88      0.90      0.89       609
          LAUNDRY       0.00      0.00      0.00         1
 MAKE SIMPLE FOOD       0.70      0.64  