
# MET ML

This notebook downloads the Heterogeneity Activity Recognition (HHAR) dataset from https://archive.ics.uci.edu/dataset/344/heterogeneity+activity+recognition, prepares it for modeling, and trains and evaluates four kinds of classifiers: Random Forest, Support Vector Machine, Gradient Boosting, and Neural Network models. We compare their performance, select the best model, and export the selected model together with the feature extraction/post-processing pipeline.

## Data preparation

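Download the dataset and extract the files. The outer archive contains two nested archives, `Activity recognition exp.zip` and `Still exp.zip`, which are unzipped into `HHAR/Activity_Recognition_exp` and `HHAR/Still_exp`. Only the phone accelerometer readings (`Phones_accelerometer.csv`) are loaded.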
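The `!wget` and `!unzip` lines in the next cell are IPython shell escapes, so they only run inside Jupyter or Colab. As a hedged alternative for plain Python, the same download-and-extract steps can be sketched with the standard library's `urllib.request` and `zipfile`. The helper names `download` and `extract_nested` are hypothetical; the URL and folder layout are the ones used in the cell.

```python
import urllib.request
import zipfile
from pathlib import Path

# URL and nested-archive layout taken from the notebook cell; helper names are hypothetical
DATASET_URL = "https://archive.ics.uci.edu/static/public/344/heterogeneity+activity+recognition.zip"
NESTED = {
    "Activity recognition exp.zip": "HHAR/Activity_Recognition_exp",
    "Still exp.zip": "HHAR/Still_exp",
}


def download(url: str, dest: Path) -> Path:
    """Download `url` to `dest` unless the file is already present."""
    if not dest.exists():
        urllib.request.urlretrieve(url, dest)
    return dest


def extract_nested(outer_zip: Path, workdir: Path, nested: dict = NESTED) -> None:
    """Unzip the outer archive, then unzip each nested archive into its own folder."""
    with zipfile.ZipFile(outer_zip) as zf:
        zf.extractall(workdir)
    for inner_name, target in nested.items():
        target_dir = workdir / target
        target_dir.mkdir(parents=True, exist_ok=True)
        with zipfile.ZipFile(workdir / inner_name) as zf:
            zf.extractall(target_dir)


# Usage (downloads about 800 MB):
# outer = download(DATASET_URL, Path("heterogeneity+activity+recognition.zip"))
# extract_nested(outer, Path("."))
```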



In [1]:
import pandas as pd
import os

# dataset path
dataset_path = 'HHAR/Activity_Recognition_exp/Activity recognition exp/Phones_accelerometer.csv'

# Check if the dataset file already exists
if not os.path.exists(dataset_path):
    print("Dataset not found locally. Downloading and unzipping...")
    # Download the dataset
    !wget https://archive.ics.uci.edu/static/public/344/heterogeneity+activity+recognition.zip -O heterogeneity+activity+recognition.zip

    # Unzip the dataset
    !unzip heterogeneity+activity+recognition.zip

    # Create directories for unzipping nested files
    os.makedirs('HHAR/Activity_Recognition_exp', exist_ok=True)
    os.makedirs('HHAR/Still_exp', exist_ok=True)

    # Unzip the nested zip files
    !unzip 'Activity recognition exp.zip' -d HHAR/Activity_Recognition_exp
    !unzip 'Still exp.zip' -d HHAR/Still_exp
else:
    print("Dataset found locally. Skipping download and unzip.")

try:
    df_full = pd.read_csv(dataset_path)
    print("Full DataFrame loaded successfully.")

    print("\nFirst 5 rows of the full DataFrame:")
    display(df_full.head())
    print("\nFull DataFrame Info:")
    df_full.info()  # info() prints its summary directly and returns None
except FileNotFoundError:
    print(f"Error: Dataset not found at {dataset_path}. Please make sure the file is in the correct directory.")
    df_full = None # Ensure df_full is None if loading fails

Dataset not found locally. Downloading and unzipping...
--2025-08-21 08:59:59--  https://archive.ics.uci.edu/static/public/344/heterogeneity+activity+recognition.zip
Resolving archive.ics.uci.edu (archive.ics.uci.edu)... 128.195.10.252
Connecting to archive.ics.uci.edu (archive.ics.uci.edu)|128.195.10.252|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: unspecified
Saving to: ‘heterogeneity+activity+recognition.zip’

heterogeneity+activ     [     <=>            ] 784.01M  26.8MB/s    in 22s     

2025-08-21 09:00:22 (35.5 MB/s) - ‘heterogeneity+activity+recognition.zip’ saved [822098071]

Archive:  heterogeneity+activity+recognition.zip
 extracting: Activity recognition exp.zip  
 extracting: Still exp.zip           
Archive:  Activity recognition exp.zip
   creating: HHAR/Activity_Recognition_exp/Activity recognition exp/
  inflating: HHAR/Activity_Recognition_exp/Activity recognition exp/.DS_Store  
   creating: HHAR/Activity_Recognition_exp/__MACOSX/
   crea

| | Index | Arrival_Time | Creation_Time | x | y | z | User | Model | Device | gt |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0 | 1424696633908 | 1424696631913248572 | -5.958191 | 0.688065 | 8.135345 | a | nexus4 | nexus4_1 | stand |
| 1 | 1 | 1424696633909 | 1424696631918283972 | -5.95224 | 0.670212 | 8.136536 | a | nexus4 | nexus4_1 | stand |
| 2 | 2 | 1424696633918 | 1424696631923288855 | -5.995087 | 0.653549 | 8.204376 | a | nexus4 | nexus4_1 | stand |
| 3 | 3 | 1424696633919 | 1424696631928385290 | -5.942718 | 0.676163 | 8.128204 | a | nexus4 | nexus4_1 | stand |
| 4 | 4 | 1424696633929 | 1424696631933420691 | -5.991516 | 0.641647 | 8.135345 | a | nexus4 | nexus4_1 | stand |



Full DataFrame Info:
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 13062475 entries, 0 to 13062474
Data columns (total 10 columns):
 #   Column         Dtype  
---  ------         -----  
 0   Index          int64  
 1   Arrival_Time   int64  
 2   Creation_Time  int64  
 3   x              float64
 4   y              float64
 5   z              float64
 6   User           object 
 7   Model          object 
 8   Device         object 
 9   gt             object 
dtypes: float64(3), int64(3), object(4)
memory usage: 996.6+ MB



In [2]:
display(df_full.head())

| | Index | Arrival_Time | Creation_Time | x | y | z | User | Model | Device | gt |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0 | 1424696633908 | 1424696631913248572 | -5.958191 | 0.688065 | 8.135345 | a | nexus4 | nexus4_1 | stand |
| 1 | 1 | 1424696633909 | 1424696631918283972 | -5.95224 | 0.670212 | 8.136536 | a | nexus4 | nexus4_1 | stand |
| 2 | 2 | 1424696633918 | 1424696631923288855 | -5.995087 | 0.653549 | 8.204376 | a | nexus4 | nexus4_1 | stand |
| 3 | 3 | 1424696633919 | 1424696631928385290 | -5.942718 | 0.676163 | 8.128204 | a | nexus4 | nexus4_1 | stand |
| 4 | 4 | 1424696633929 | 1424696631933420691 | -5.991516 | 0.641647 | 8.135345 | a | nexus4 | nexus4_1 | stand |


In [3]:
# Check for missing values
print("\nMissing values per column:")
display(df_full.isnull().sum())

# Handle missing values in the 'gt' column by dropping rows
df_full.dropna(subset=['gt'], inplace=True)

print("\nMissing values after dropping rows with missing 'gt':")
display(df_full.isnull().sum())

# Check data types
print("\nData types of columns:")
display(df_full.dtypes)


Missing values per column:


| Column | Missing values |
|---|---|
| Index | 0 |
| Arrival_Time | 0 |
| Creation_Time | 0 |
| x | 0 |
| y | 0 |
| z | 0 |
| User | 0 |
| Model | 0 |
| Device | 0 |
| gt | 1783200 |



Missing values after dropping rows with missing 'gt':


| Column | Missing values |
|---|---|
| Index | 0 |
| Arrival_Time | 0 |
| Creation_Time | 0 |
| x | 0 |
| y | 0 |
| z | 0 |
| User | 0 |
| Model | 0 |
| Device | 0 |
| gt | 0 |



Data types of columns:


| Column | Dtype |
|---|---|
| Index | int64 |
| Arrival_Time | int64 |
| Creation_Time | int64 |
| x | float64 |
| y | float64 |
| z | float64 |
| User | object |
| Model | object |
| Device | object |
| gt | object |
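The 1,783,200 missing `gt` values deserve a closer look. By default, `pandas.read_csv` parses a set of strings, including `null`, as NaN. If, as assumed here, HHAR marks samples without an activity with the literal label `null`, those rows appear as missing rather than as a seventh class. A small self-contained check on a toy CSV (the rows are made up for illustration):

```python
import io

import pandas as pd

# Toy CSV in a layout similar to Phones_accelerometer.csv; the "null" label is an assumption
csv_text = "x,y,z,gt\n0.1,0.2,9.8,stand\n0.3,0.1,9.7,null\n"

# Default parsing: "null" is one of pandas' built-in NA strings, so it becomes NaN
default = pd.read_csv(io.StringIO(csv_text))
print(default["gt"].isna().tolist())  # [False, True]

# keep_default_na=False (with no na_values) keeps "null" as an ordinary string label
raw = pd.read_csv(io.StringIO(csv_text), keep_default_na=False)
print(raw["gt"].tolist())  # ['stand', 'null']
```

Either way, these rows are excluded from training: dropping them as NaN (as the cell does) or filtering `gt != 'null'` after reading them as strings gives the same labeled subset.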


In [4]:

window_size = 50  # number of consecutive samples in each rolling window

# Rolling statistics (mean, variance, standard deviation) over the last `window_size` samples.
# The window slides one row at a time across the whole DataFrame, so the first windows of each
# recording also contain samples from the preceding user, device, or activity.
df_full['x_mean'] = df_full['x'].rolling(window=window_size).mean()
df_full['y_mean'] = df_full['y'].rolling(window=window_size).mean()
df_full['z_mean'] = df_full['z'].rolling(window=window_size).mean()

df_full['x_var'] = df_full['x'].rolling(window=window_size).var()
df_full['y_var'] = df_full['y'].rolling(window=window_size).var()
df_full['z_var'] = df_full['z'].rolling(window=window_size).var()

# The standard deviation is the square root of the variance, so these features are redundant
df_full['x_std'] = df_full['x'].rolling(window=window_size).std()
df_full['y_std'] = df_full['y'].rolling(window=window_size).std()
df_full['z_std'] = df_full['z'].rolling(window=window_size).std()

# Drop the first `window_size - 1` rows as they will have NaN for the rolling features
df_full.dropna(inplace=True)

print("\nDataFrame with extracted features:")
display(df_full.head())


DataFrame with extracted features:


| | Index | Arrival_Time | Creation_Time | x | y | z | User | Model | Device | gt | x_mean | y_mean | z_mean | x_var | y_var | z_var | x_std | y_std | z_std |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 49 | 49 | 1424696634164 | 1424696632160105261 | -5.947479 | 0.733292 | 8.157959 | a | nexus4 | nexus4_1 | stand | -5.936744 | 0.694944 | 8.128228 | 0.001139 | 0.002862 | 0.001706 | 0.033755 | 0.053497 | 0.041298 |
| 50 | 50 | 1424696634165 | 1424696632165140662 | -5.902252 | 0.744003 | 8.097259 | a | nexus4 | nexus4_1 | stand | -5.935625 | 0.696063 | 8.127466 | 0.001153 | 0.002909 | 0.001723 | 0.033956 | 0.053933 | 0.041515 |
| 51 | 51 | 1424696634165 | 1424696632170176062 | -5.930817 | 0.705917 | 8.107971 | a | nexus4 | nexus4_1 | stand | -5.935197 | 0.696777 | 8.126895 | 0.001148 | 0.002897 | 0.001729 | 0.033877 | 0.05382 | 0.041584 |
| 52 | 52 | 1424696634175 | 1424696632175241980 | -5.904633 | 0.665451 | 8.097259 | a | nexus4 | nexus4_1 | stand | -5.933387 | 0.697015 | 8.124753 | 0.00109 | 0.002878 | 0.00162 | 0.033018 | 0.053651 | 0.040249 |
| 53 | 53 | 1424696634178 | 1424696632180277380 | -5.89154 | 0.664261 | 8.072266 | a | nexus4 | nexus4_1 | stand | -5.932364 | 0.696777 | 8.123634 | 0.001123 | 0.002891 | 0.001675 | 0.033512 | 0.053772 | 0.040923 |
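Because the rolling window above runs over all recordings concatenated, the first `window_size - 1` windows of each recording mix in samples from whatever precedes it. One way to restart the window at every change of user, device, or activity is to label contiguous runs of identical `(User, Device, gt)` values and compute the statistics within each run. This is a sketch of an alternative, not what the notebook ran; it assumes rows are in recording order, and the function name `segment_rolling_features` is hypothetical.

```python
import pandas as pd


def segment_rolling_features(df: pd.DataFrame, window_size: int = 50,
                             group_cols=('User', 'Device', 'gt')) -> pd.DataFrame:
    """Rolling mean/var/std per axis, restarted at every change of user, device or activity."""
    keys = df[list(group_cols)]
    # True on the first row of each contiguous run of identical key values
    new_segment = (keys != keys.shift()).any(axis=1)
    segment_id = new_segment.cumsum()

    out = df.copy()
    grouped = out.groupby(segment_id)
    for stat in ('mean', 'var', 'std'):
        for axis in ('x', 'y', 'z'):
            # Windows never cross a segment boundary; the first window_size - 1 rows
            # of every segment get NaN and can be dropped afterwards
            out[f'{axis}_{stat}'] = grouped[axis].transform(
                lambda s: getattr(s.rolling(window=window_size), stat)()
            )
    return out


# Two segments (stand, then walk) of three rows each, with a window of 2 samples
demo = pd.DataFrame({
    'x': [1.0, 2.0, 3.0, 10.0, 20.0, 30.0],
    'y': [0.0] * 6,
    'z': [9.8] * 6,
    'User': ['a'] * 6,
    'Device': ['nexus4_1'] * 6,
    'gt': ['stand'] * 3 + ['walk'] * 3,
})
print(segment_rolling_features(demo, window_size=2)['x_mean'].tolist())
```

On the full data this would replace the rolling cell with `df_full = segment_rolling_features(df_full).dropna()`. Since the unlabeled rows are dropped first, two runs of the same activity separated only by unlabeled samples are merged into one segment; computing the segment ids before that drop would keep them apart.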


## Adaptation of MET values


Approximate MET values:
- Sitting: 1.0-1.3 (Sedentary)
- Standing: 1.3-1.8 (Sedentary to Light)
- Walking: 2.0-5.0 (Light to Moderate)
- Biking: 3.0-8.0+ (Moderate to Vigorous)
- Stair Up: 4.0-8.0+ (Moderate to Vigorous)
- Stair Down: 3.0-6.0 (Moderate)

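We define a simplified mapping from the activity labels in the dataset (`gt`) to the four required MET classes:
- Sedentary (< 1.5 METs)
- Light (1.5–3 METs)
- Moderate (3–6 METs)
- Vigorous (> 6 METs)

Each activity is assigned to the class that contains the lower end of its approximate range above. As a result, no activity maps to Vigorous, and the models below are trained on three classes.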
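The class thresholds can be written as a small function. Applying it to the lower end of each activity's range reproduces the `activity_to_met_class` dictionary in the next cell, and makes it visible why Vigorous never occurs. How values exactly on a boundary (1.5, 3.0, 6.0) are classified is an assumption, since the ranges listed above share their endpoints.

```python
def met_to_class(met: float) -> str:
    """Map a MET value to one of the four intensity classes."""
    # Boundary handling is an assumption: 1.5 and 3.0 go to the higher class, 6.0 stays Moderate
    if met < 1.5:
        return 'Sedentary'
    if met < 3.0:
        return 'Light'
    if met <= 6.0:
        return 'Moderate'
    return 'Vigorous'


# Lower end of each activity's approximate MET range, from the list above
met_lower_bound = {
    'sit': 1.0, 'stand': 1.3, 'walk': 2.0,
    'bike': 3.0, 'stairsup': 4.0, 'stairsdown': 3.0,
}
derived = {activity: met_to_class(met) for activity, met in met_lower_bound.items()}
print(derived)
```

Using the upper end of each range instead would move walking, biking, and climbing stairs up into higher classes, so the choice of reference point directly shapes the class balance.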

In [5]:
activity_to_met_class = {
    'sit': 'Sedentary',
    'stand': 'Sedentary',
    'walk': 'Light',
    'bike': 'Moderate',
    'stairsup': 'Moderate',
    'stairsdown': 'Moderate'
}

# Map each activity label in `gt` to its MET class
df_full['met_class'] = df_full['gt'].map(activity_to_met_class)

# Check the distribution of MET classes
print("\nDistribution of MET classes:")
display(df_full['met_class'].value_counts())

print("\nDataFrame with 'met_class' column:")
display(df_full.head())


Distribution of MET classes:


| met_class | count |
|---|---|
| Moderate | 5243463 |
| Sedentary | 3843362 |
| Light | 2192401 |



DataFrame with 'met_class' column:


| | Index | Arrival_Time | Creation_Time | x | y | z | User | Model | Device | gt | x_mean | y_mean | z_mean | x_var | y_var | z_var | x_std | y_std | z_std | met_class |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 49 | 49 | 1424696634164 | 1424696632160105261 | -5.947479 | 0.733292 | 8.157959 | a | nexus4 | nexus4_1 | stand | -5.936744 | 0.694944 | 8.128228 | 0.001139 | 0.002862 | 0.001706 | 0.033755 | 0.053497 | 0.041298 | Sedentary |
| 50 | 50 | 1424696634165 | 1424696632165140662 | -5.902252 | 0.744003 | 8.097259 | a | nexus4 | nexus4_1 | stand | -5.935625 | 0.696063 | 8.127466 | 0.001153 | 0.002909 | 0.001723 | 0.033956 | 0.053933 | 0.041515 | Sedentary |
| 51 | 51 | 1424696634165 | 1424696632170176062 | -5.930817 | 0.705917 | 8.107971 | a | nexus4 | nexus4_1 | stand | -5.935197 | 0.696777 | 8.126895 | 0.001148 | 0.002897 | 0.001729 | 0.033877 | 0.05382 | 0.041584 | Sedentary |
| 52 | 52 | 1424696634175 | 1424696632175241980 | -5.904633 | 0.665451 | 8.097259 | a | nexus4 | nexus4_1 | stand | -5.933387 | 0.697015 | 8.124753 | 0.00109 | 0.002878 | 0.00162 | 0.033018 | 0.053651 | 0.040249 | Sedentary |
| 53 | 53 | 1424696634178 | 1424696632180277380 | -5.89154 | 0.664261 | 8.072266 | a | nexus4 | nexus4_1 | stand | -5.932364 | 0.696777 | 8.123634 | 0.001123 | 0.002891 | 0.001675 | 0.033512 | 0.053772 | 0.040923 | Sedentary |



DataFrame sampled to 0.1% of original size.

First 5 rows of the sampled DataFrame:


| | Index | Arrival_Time | Creation_Time | x | y | z | User | Model | Device | gt | x_mean | y_mean | z_mean | x_var | y_var | z_var | x_std | y_std | z_std | met_class |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 316643 | 1424783461561 | 1424783465042895362 | -0.401215 | -1.370956 | 9.120819 | d | nexus4 | nexus4_1 | bike | -0.632253 | -1.465338 | 10.105031 | 0.238998 | 0.199082 | 0.805968 | 0.488874 | 0.446186 | 0.897757 | Moderate |
| 1 | 40227 | 1424777131681 | 2801692456000 | -3.983902 | -0.612908 | 8.580712 | i | samsungold | samsungold_1 | walk | -1.087912 | -1.179848 | 10.664599 | 3.747219 | 2.045266 | 11.220813 | 1.935773 | 1.430128 | 3.349748 | Light |
| 2 | 34639 | 1424779171336 | 1424779174809449676 | -1.414062 | 0.08345 | 10.141998 | f | nexus4 | nexus4_1 | stand | -1.402018 | 0.049625 | 10.150948 | 0.001216 | 0.001274 | 0.001808 | 0.034866 | 0.035694 | 0.042519 | Sedentary |
| 3 | 135151 | 1424695257852 | 13131151454000 | 5.323507 | -0.37589 | 8.319851 | c | s3mini | s3mini_1 | bike | 4.285334 | -1.488739 | 8.41854 | 0.851397 | 1.070869 | 0.222422 | 0.922712 | 1.034828 | 0.471616 | Moderate |
| 4 | 184378 | 1424789370842 | 353877453473000 | 0.497994 | 2.260126 | 10.371682 | e | s3 | s3_2 | bike | 1.865754 | -0.927993 | 9.109076 | 0.542268 | 1.694502 | 2.862883 | 0.736388 | 1.30173 | 1.692006 | Moderate |
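The cell that produced this sample is not shown. For orientation: 0.1% of the 11,279,226 rows left after feature extraction is about 11,279 rows, which matches the 9,023 + 2,256 train/test rows below, and the index in the output starts at 0 again. A minimal sketch that yields a sample of that shape, assuming uniform random row sampling with `DataFrame.sample` followed by an index reset (the function name and the seed are assumptions, not necessarily what the original notebook used):

```python
import pandas as pd


def sample_rows(df: pd.DataFrame, frac: float = 0.001, seed: int = 42) -> pd.DataFrame:
    """Uniformly sample a fraction of rows and renumber the index from 0."""
    # seed=42 is an assumption; any fixed seed makes the sample reproducible
    return df.sample(frac=frac, random_state=seed).reset_index(drop=True)


# Toy stand-in for df_full; on the real frame: df_sampled = sample_rows(df_full)
toy = pd.DataFrame({'met_class': ['Moderate', 'Sedentary', 'Light', 'Moderate'] * 25_000})
toy_sampled = sample_rows(toy)
print(len(toy_sampled))  # 100 rows, i.e. 0.1% of 100,000
```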


In [9]:
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import LabelEncoder

feature_columns = ['x_mean', 'y_mean', 'z_mean', 'x_var', 'y_var', 'z_var', 'x_std', 'y_std', 'z_std']
X = df_sampled[feature_columns]
y = df_sampled['met_class']

# Encode the 'met_class' to numerical representation with labelEncoder
label_encoder = LabelEncoder()
y_encoded = label_encoder.fit_transform(y)

# Split the data into training and testing sets (80/20 split)
X_train, X_test, y_train, y_test = train_test_split(X, y_encoded, test_size=0.2, random_state=42, stratify=y_encoded)

print("Shape of X_train:", X_train.shape)
print("Shape of X_test:", X_test.shape)
print("Shape of y_train:", y_train.shape)
print("Shape of y_test:", y_test.shape)
print("\nDistribution of MET classes in y_train:")
display(pd.Series(y_train).value_counts(normalize=True))
print("\nDistribution of MET classes in y_test:")
display(pd.Series(y_test).value_counts(normalize=True))

Shape of X_train: (9023, 9)
Shape of X_test: (2256, 9)
Shape of y_train: (9023,)
Shape of y_test: (2256,)

Distribution of MET classes in y_train:


| Encoded label | Proportion |
|---|---|
| 1 | 0.462706 |
| 2 | 0.345451 |
| 0 | 0.191843 |



Distribution of MET classes in y_test:


| Encoded label | Proportion |
|---|---|
| 1 | 0.462766 |
| 2 | 0.345301 |
| 0 | 0.191933 |
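The distributions above are reported with the encoded integer labels. `LabelEncoder` assigns codes in sorted order of the class names, so the mapping can be read off `label_encoder.classes_`. A self-contained illustration with the three class names that occur here:

```python
from sklearn.preprocessing import LabelEncoder

encoder = LabelEncoder()
codes = encoder.fit_transform(['Sedentary', 'Light', 'Moderate', 'Sedentary'])

# classes_ is sorted alphabetically; each class's code is its position in that array
code_of = {str(name): code for code, name in enumerate(encoder.classes_)}
print(code_of)         # {'Light': 0, 'Moderate': 1, 'Sedentary': 2}
print(codes.tolist())  # [2, 0, 1, 2]

# inverse_transform turns integer predictions back into class names
print(encoder.inverse_transform([1, 2]).tolist())  # ['Moderate', 'Sedentary']
```

Read against the tables above, code 1 (about 46% of rows) is Moderate, 2 is Sedentary, and 0 is Light.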


## ML models

### Random Forest

In [20]:
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score

# A small forest of 10 trees keeps training fast; n_jobs=-1 uses all CPU cores
rf_model = RandomForestClassifier(n_estimators=10, random_state=42, n_jobs=-1)

# Train the model
print("Training Random Forest model...")
rf_model.fit(X_train, y_train)
print("Training complete.")

# Make predictions
print("Making predictions on testing data...")
y_pred_rf = rf_model.predict(X_test)
print("Predictions complete.")

# Evaluate the model
accuracy_rf = accuracy_score(y_test, y_pred_rf)
precision_rf = precision_score(y_test, y_pred_rf, average='weighted')
recall_rf = recall_score(y_test, y_pred_rf, average='weighted')
f1_rf = f1_score(y_test, y_pred_rf, average='weighted')

print("\nRandom Forest Model Performance:")
print(f"Accuracy: {accuracy_rf:.4f}")
print(f"Precision: {precision_rf:.4f}")
print(f"Recall: {recall_rf:.4f}")
print(f"F1-score: {f1_rf:.4f}")

Training Random Forest model...
Training complete.
Making predictions on testing data...
Predictions complete.

Random Forest Model Performance:
Accuracy: 0.8741
Precision: 0.8740
Recall: 0.8741
F1-score: 0.8741


In [21]:
import joblib

model_filename = 'random_forest_model.joblib'

# save model
joblib.dump(rf_model, model_filename)

print(f"Random Forest model saved to '{model_filename}'")

Random Forest model saved to 'random_forest_model.joblib'
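Reusing an exported model only takes `joblib.load`. A minimal round-trip sketch on a toy forest; the random data stands in for the nine rolling-window features and this is not the notebook's trained model:

```python
import tempfile
from pathlib import Path

import joblib
import numpy as np
from sklearn.ensemble import RandomForestClassifier

# Toy training data with 9 features, like feature_columns
rng = np.random.default_rng(0)
X_toy = rng.normal(size=(200, 9))
y_toy = (X_toy[:, 0] > 0).astype(int)
model = RandomForestClassifier(n_estimators=10, random_state=42).fit(X_toy, y_toy)

with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / 'random_forest_model.joblib'
    joblib.dump(model, path)
    restored = joblib.load(path)

# The restored forest makes identical predictions
print((restored.predict(X_toy) == model.predict(X_toy)).all())  # True
```

When loading `random_forest_model.joblib` for inference, the inputs must be the nine features computed as in the rolling-window cell and passed in the order of `feature_columns`. The predictions are integer codes, so the fitted `label_encoder` would need to be saved as well to map them back to class names. Pickled scikit-learn models are safest to load with the same scikit-learn version that saved them.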


### Support Vector Machine (SVM)



In [12]:
from sklearn.linear_model import SGDClassifier
from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score
import time

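# SGDClassifier with loss='hinge' (and its default L2 penalty) fits a linear SVM with
# stochastic gradient descent, which scales well to large datasets.
# early_stopping holds out 10% of the training data and stops once the validation score
# stops improving; max_iter=1000 is the default.
svm_model = SGDClassifier(loss='hinge', random_state=42, n_jobs=-1, early_stopping=True, max_iter=1000)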

# Train the model
print("Training SGDClassifier (Linear SVM) model...")
start_time = time.time()
svm_model.fit(X_train, y_train)
end_time = time.time()
print(f"Training complete in {end_time - start_time:.2f} seconds.")

# Make predictions
print("Making predictions on testing data...")
start_time = time.time()
y_pred_svm = svm_model.predict(X_test)
end_time = time.time()
print(f"Predictions complete in {end_time - start_time:.2f} seconds.")

# Evaluate the model
# Use zero_division=0 to handle cases where a class has no predicted samples
accuracy_svm = accuracy_score(y_test, y_pred_svm)
precision_svm = precision_score(y_test, y_pred_svm, average='weighted', zero_division=0)
recall_svm = recall_score(y_test, y_pred_svm, average='weighted', zero_division=0)
f1_svm = f1_score(y_test, y_pred_svm, average='weighted', zero_division=0)

print("\nSGDClassifier (Linear SVM) Model Performance:")
print(f"Accuracy: {accuracy_svm:.4f}")
print(f"Precision: {precision_svm:.4f}")
print(f"Recall: {recall_svm:.4f}")
print(f"F1-score: {f1_svm:.4f}")

Training SGDClassifier (Linear SVM) model...
Training complete in 0.04 seconds.
Making predictions on testing data...
Predictions complete in 0.00 seconds.

SGDClassifier (Linear SVM) Model Performance:
Accuracy: 0.7496
Precision: 0.6456
Recall: 0.7496
F1-score: 0.6776
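Linear SVMs are sensitive to feature scale, and the features here span several orders of magnitude (variances around 0.001 for stationary rows versus `z_mean` values around 8 to 10 in the tables above). Standardizing the features before `SGDClassifier` is the usual recommendation; whether it closes the gap to the tree models on this data has not been measured in this notebook. A sketch on synthetic data, assuming a `StandardScaler` in a scikit-learn `Pipeline`:

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import SGDClassifier
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic 3-class data standing in for the 9 rolling-window features
X_demo, y_demo = make_classification(n_samples=2000, n_features=9, n_informative=5,
                                     n_classes=3, random_state=42)
X_demo[:, 0] *= 1000  # exaggerate a scale mismatch between features
X_tr, X_te, y_tr, y_te = train_test_split(X_demo, y_demo, test_size=0.2,
                                          random_state=42, stratify=y_demo)

# The scaler is fitted on the training split only, as part of the pipeline
scaled_svm = make_pipeline(
    StandardScaler(),
    SGDClassifier(loss='hinge', random_state=42, early_stopping=True),
)
scaled_svm.fit(X_tr, y_tr)
print(f"Accuracy with scaling: {scaled_svm.score(X_te, y_te):.4f}")
```

Keeping the scaler inside the pipeline also means that exporting `scaled_svm` with `joblib.dump` exports the scaling step with it.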


In [14]:
import joblib

model_filename_svm = 'svm_model.joblib'

# Save the SVM model
joblib.dump(svm_model, model_filename_svm)

print(f"SVM model saved to '{model_filename_svm}'")

SVM model saved to 'svm_model.joblib'


### Gradient Boosting

In [15]:
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score
import time

# Initialize the Gradient Boosting Classifier
# 100 shallow trees (max_depth=3) with learning_rate=0.1; these are scikit-learn's defaults
gb_model = GradientBoostingClassifier(n_estimators=100, learning_rate=0.1, max_depth=3, random_state=42)

# Train the model
print("Training Gradient Boosting model...")
start_time = time.time()
gb_model.fit(X_train, y_train)
end_time = time.time()
print(f"Training complete in {end_time - start_time:.2f} seconds.")

# Make predictions
print("Making predictions on testing data...")
start_time = time.time()
y_pred_gb = gb_model.predict(X_test)
end_time = time.time()
print(f"Predictions complete in {end_time - start_time:.2f} seconds.")

# Evaluate the model
# Use zero_division=0 to handle cases where a class has no predicted samples
accuracy_gb = accuracy_score(y_test, y_pred_gb)
precision_gb = precision_score(y_test, y_pred_gb, average='weighted', zero_division=0)
recall_gb = recall_score(y_test, y_pred_gb, average='weighted', zero_division=0)
f1_gb = f1_score(y_test, y_pred_gb, average='weighted', zero_division=0)

print("\nGradient Boosting Model Performance:")
print(f"Accuracy: {accuracy_gb:.4f}")
print(f"Precision: {precision_gb:.4f}")
print(f"Recall: {recall_gb:.4f}")
print(f"F1-score: {f1_gb:.4f}")

Training Gradient Boosting model...
Training complete in 18.86 seconds.
Making predictions on testing data...
Predictions complete in 0.01 seconds.

Gradient Boosting Model Performance:
Accuracy: 0.8688
Precision: 0.8657
Recall: 0.8688
F1-score: 0.8659


In [16]:
import joblib

model_filename_gb = 'gradient_boosting_model.joblib'

# Save the Gradient Boosting model
joblib.dump(gb_model, model_filename_gb)

print(f"Gradient Boosting model saved to '{model_filename_gb}'")

Gradient Boosting model saved to 'gradient_boosting_model.joblib'


### Neural Network models

In [17]:
import tensorflow as tf
from tensorflow import keras
from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score
import numpy as np
import time

# Define the Neural Network model: a small fully connected network for efficiency
model = keras.Sequential([
    keras.Input(shape=(X_train.shape[1],)), # Input layer matching the number of features
    keras.layers.Dense(64, activation='relu'), # Hidden layer with ReLU activation
    keras.layers.Dropout(0.2), # Dropout for regularization
    keras.layers.Dense(32, activation='relu'), # Another hidden layer
    keras.layers.Dropout(0.2), # Dropout for regularization
    keras.layers.Dense(len(np.unique(y_train)), activation='softmax') # Output layer with softmax for multi-class classification
])

# Compile the model
model.compile(optimizer='adam',
              loss='sparse_categorical_crossentropy', # Use sparse_categorical_crossentropy for integer labels
              metrics=['accuracy'])

# Train the model
print("Training Neural Network model...")
start_time = time.time()
history = model.fit(X_train, y_train, epochs=20, batch_size=32, validation_split=0.2, verbose=1) # validation_split holds out the last 20% of the training rows for validation
end_time = time.time()
print(f"Training complete in {end_time - start_time:.2f} seconds.")

# Evaluate the model
print("Evaluating Neural Network model...")
start_time = time.time()
loss, accuracy_nn = model.evaluate(X_test, y_test, verbose=0)
end_time = time.time()
print(f"Evaluation complete in {end_time - start_time:.2f} seconds.")

# Make predictions
y_pred_nn_probs = model.predict(X_test)
y_pred_nn = np.argmax(y_pred_nn_probs, axis=1)


# Evaluate the model using scikit-learn metrics
precision_nn = precision_score(y_test, y_pred_nn, average='weighted', zero_division=0)
recall_nn = recall_score(y_test, y_pred_nn, average='weighted', zero_division=0)
f1_nn = f1_score(y_test, y_pred_nn, average='weighted', zero_division=0)


print("\nNeural Network Model Performance:")
print(f"Accuracy: {accuracy_nn:.4f}")
print(f"Precision: {precision_nn:.4f}")
print(f"Recall: {recall_nn:.4f}")
print(f"F1-score: {f1_nn:.4f}")

Training Neural Network model...
Epoch 1/20
226/226 ━━━━━━━━━━━━━━━━━━━━ 3s 5ms/step - accuracy: 0.6155 - loss: 0.8292 - val_accuracy: 0.7773 - val_loss: 0.4187
Epoch 2/20
226/226 ━━━━━━━━━━━━━━━━━━━━ 2s 3ms/step - accuracy: 0.7740 - loss: 0.4733 - val_accuracy: 0.8161 - val_loss: 0.3628
Epoch 3/20
226/226 ━━━━━━━━━━━━━━━━━━━━ 1s 2ms/step - accuracy: 0.8061 - loss: 0.4095 - val_accuracy: 0.8133 - val_loss: 0.3479
Epoch 4/20
226/226 ━━━━━━━━━━━━━━━━━━━━ 1s 3ms/step - accuracy: 0.8136 - loss: 0.3794 - val_accuracy: 0.8349 - val_loss: 0.3401
Epoch 5/20
226/226 ━━━━━━━━━━━━━━━━━━━━ 1s 3ms/step - accuracy: 0.8302 - loss: 0.3587 - val_accuracy: 0.8410 - val_loss: 0.3305
Epoch 6/20
226/226 ━━━━━━━━━━━━━━━━━━━━ 1s 3ms/step - accuracy: 0.8309 - loss: 0.3642 - val_accuracy: 0.8343 - val_loss: 0.3309
Epoch 7/20
226/226 ━━━━━━━

In [18]:
# Define the filename for the Keras model
model_filename_nn = 'neural_network_model.keras'

# Save the model in the native Keras format
model.save(model_filename_nn)

print(f"Neural Network model saved to '{model_filename_nn}'")

Neural Network model saved to 'neural_network_model.keras'


In [23]:
import pandas as pd

# Create a dictionary to store the performance metrics
performance_data = {
    'Model': ['Random Forest', 'Linear SVM (SGDClassifier)', 'Gradient Boosting', 'Neural Network'],
    'Accuracy': [accuracy_rf, accuracy_svm, accuracy_gb, accuracy_nn],
    'Precision (Weighted)': [precision_rf, precision_svm, precision_gb, precision_nn],
    'Recall (Weighted)': [recall_rf, recall_svm, recall_gb, recall_nn],
    'F1-score (Weighted)': [f1_rf, f1_svm, f1_gb, f1_nn]
}

# Create a pandas DataFrame from the performance data
performance_df = pd.DataFrame(performance_data)

# Display the performance comparison table
print("Model Performance Comparison:")
display(performance_df)

Model Performance Comparison:


| | Model | Accuracy | Precision (Weighted) | Recall (Weighted) | F1-score (Weighted) |
|---|---|---|---|---|---|
| 0 | Random Forest | 0.874113 | 0.874034 | 0.874113 | 0.874057 |
| 1 | Linear SVM (SGDClassifier) | 0.749557 | 0.645567 | 0.749557 | 0.67755 |
| 2 | Gradient Boosting | 0.868794 | 0.865678 | 0.868794 | 0.865933 |
| 3 | Neural Network | 0.864805 | 0.864993 | 0.864805 | 0.850731 |
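In this table, weighted recall equals accuracy for every model. That is an identity, not a coincidence: each class's recall is its correct predictions divided by its support, and weighting it by that support's share of the test set leaves the total number of correct predictions divided by the number of test rows, which is accuracy. Since the classes are imbalanced (Light is about 19% of the rows), macro-averaged scores are a useful complement. A sketch of a hypothetical `evaluate` helper that reports both and could replace the repeated metric code in each model cell:

```python
import math

from sklearn.metrics import accuracy_score, f1_score, precision_score, recall_score


def evaluate(y_true, y_pred) -> dict:
    """Accuracy plus weighted and macro averages of precision, recall and F1."""
    scores = {'accuracy': accuracy_score(y_true, y_pred)}
    for avg in ('weighted', 'macro'):
        scores[f'precision_{avg}'] = precision_score(y_true, y_pred, average=avg, zero_division=0)
        scores[f'recall_{avg}'] = recall_score(y_true, y_pred, average=avg, zero_division=0)
        scores[f'f1_{avg}'] = f1_score(y_true, y_pred, average=avg, zero_division=0)
    return scores


scores = evaluate([0, 1, 1, 2, 2, 2], [0, 1, 2, 2, 2, 1])
# Weighted recall reduces to accuracy (4 of 6 correct here)
print(math.isclose(scores['recall_weighted'], scores['accuracy']))  # True
```

Macro recall in this toy example is (1 + 1/2 + 2/3) / 3, lower than accuracy because the smaller classes are predicted less reliably.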


## Model Selection

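Based on the performance comparison, the **Random Forest** model achieved the highest score on every evaluated metric (accuracy and weighted precision, recall, and F1-score) on the sampled dataset, so it is selected as the best of the four models for this activity recognition task. Its lead is modest: its weighted F1-score is 0.874, versus 0.866 for Gradient Boosting and 0.851 for the Neural Network, measured on a single test split of 2,256 windows. The linear SVM trails clearly at 0.678.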
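One caveat on this comparison: the train/test split is random at the row level, so windows from the same user and device appear on both sides. The scores therefore describe performance on users the model has already seen and may be optimistic for new users. A group-aware split keeps each user entirely in training or entirely in testing; below is a sketch with scikit-learn's `GroupShuffleSplit` on toy data. On the real data, passing `groups=df_sampled['User']` would be the analogous call, which is an assumption and was not evaluated in this notebook.

```python
import numpy as np
from sklearn.model_selection import GroupShuffleSplit

# Toy data: 12 windows from 4 users (users plays the role of the HHAR `User` column)
X_demo = np.arange(24, dtype=float).reshape(12, 2)
y_demo = np.array([0, 1, 2] * 4)
users = np.array(['a'] * 3 + ['b'] * 3 + ['c'] * 3 + ['d'] * 3)

splitter = GroupShuffleSplit(n_splits=1, test_size=0.25, random_state=42)
train_idx, test_idx = next(splitter.split(X_demo, y_demo, groups=users))

# Every user ends up entirely in train or entirely in test
print(set(users[train_idx]).isdisjoint(users[test_idx]))  # True
print(sorted(set(users[test_idx])))  # one held-out user
```

With only nine users in HHAR, a single group split is noisy; repeating it (for example with `GroupKFold`) and averaging gives a steadier estimate.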