**Project Overview**

The goal of this project is to collect motion data, extract meaningful features, and use a Hidden Markov Model (HMM) to infer one of four activity states: standing, walking, jumping, or still.

Sensors Recorded:
- **Accelerometer:** x, y, z axes
- **Gyroscope:** x, y, z axes

Activities Performed:

| Activity | Duration | Notes |
|-----------|-----------|--------|
| Standing | 5–10 s | Keep the phone steady at waist level |
| Walking | 5–10 s | Maintain a consistent pace |
| Jumping | 5–10 s | Perform 3–5 continuous jumps |
| Still | 5–10 s | Place the phone on a flat surface |

Each recording was taken at a sampling rate of **70 Hz**, and the collected data was saved as `.csv` files with timestamps.
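
Because the files carry timestamps, the nominal sampling rate can be checked directly from the data. The snippet below is a minimal sketch on synthetic timestamps; it assumes timestamps are in seconds, and the helper name `estimate_sampling_rate` is hypothetical rather than part of the original notebook.

```python
import numpy as np

def estimate_sampling_rate(timestamps_s):
    """Estimate the sampling rate in Hz from timestamps given in seconds."""
    t = np.asarray(timestamps_s, dtype=float)
    dt = np.diff(t)               # spacing between consecutive samples
    return 1.0 / np.median(dt)    # median is robust to occasional dropped samples

# Synthetic timestamps: 5 s of data at a nominal 70 Hz (stand-in for a real recording)
t = np.arange(0, 5, 1 / 70)
fs = estimate_sampling_rate(t)
print(f"Estimated sampling rate: {fs:.1f} Hz")
```

On a real recording, a large gap between the estimate and the nominal rate would suggest dropped samples or timestamps in a different unit (for example milliseconds).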


**Feature Extraction Summary**

From the raw accelerometer and gyroscope readings, features were extracted in both **time** and **frequency** domains to capture movement patterns.

Examples of extracted features:
- **Time-domain features:** mean, variance, standard deviation, signal magnitude area (SMA), correlations between axes.
- **Frequency-domain features:** dominant frequency, spectral energy, FFT components.
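
To make the feature list concrete, here is a minimal sketch of how a few of these quantities could be computed for a single window of three-axis data. The function name `window_features`, the 2-second window length, the SMA formula, and the spectral-energy normalization are assumptions for illustration; the notebook does not show the exact definitions used to build `features.csv`.

```python
import numpy as np

def window_features(window, fs):
    """Compute a few time- and frequency-domain features for one window.

    window: array of shape (n_samples, 3) holding x, y, z readings.
    fs: sampling rate in Hz.
    """
    window = np.asarray(window, dtype=float)
    feats = {}
    for i, axis in enumerate("xyz"):
        feats[f"{axis}_mean"] = window[:, i].mean()
        feats[f"{axis}_std"] = window[:, i].std()
        feats[f"{axis}_var"] = window[:, i].var()

    # Signal magnitude area: window average of |x| + |y| + |z| (one common definition)
    feats["sma"] = np.abs(window).sum(axis=1).mean()

    # Pearson correlation between axis pairs
    corr = np.corrcoef(window, rowvar=False)
    feats["corr_xy"] = corr[0, 1]
    feats["corr_xz"] = corr[0, 2]
    feats["corr_yz"] = corr[1, 2]

    # Frequency domain, computed on the resultant (vector magnitude) signal
    mag = np.linalg.norm(window, axis=1)
    mag = mag - mag.mean()                        # remove the DC component
    spectrum = np.abs(np.fft.rfft(mag))
    freqs = np.fft.rfftfreq(len(mag), d=1.0 / fs)
    feats["dom_freq"] = freqs[np.argmax(spectrum)]
    feats["spec_energy"] = np.sum(spectrum ** 2) / len(mag)
    return feats

# Synthetic 2-second window at 70 Hz with a 2 Hz oscillation (not real sensor data)
fs = 70
t = np.arange(140) / fs
window = np.column_stack([
    0.2 * np.sin(2 * np.pi * 2 * t),          # x
    0.1 * np.cos(2 * np.pi * 2 * t),          # y
    9.81 + 2.0 * np.sin(2 * np.pi * 2 * t),   # z, including gravity
])
feats = window_features(window, fs)
print({name: round(float(value), 3) for name, value in feats.items()})
```

Applying such a function to each window of each recording, and attaching the activity label, would yield one row per window.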

The extracted features were saved in a file called `features.csv`, which contains all four activities combined with their corresponding labels.


**Load Extracted Data**

In [9]:
import pandas as pd

# Load preprocessed features
df = pd.read_csv("features.csv")

# Standardize column name to 'label'
df.rename(columns={'activity': 'label', 'Activity': 'label', 'Label': 'label'}, inplace=True)

# Display basic information
print("Dataset shape:", df.shape)
df.head()

Dataset shape: (948, 94)


| | accel_x_mean | accel_x_std | accel_x_var | accel_x_mad | accel_y_mean | accel_y_std | accel_y_var | accel_y_mad | accel_z_mean | accel_z_std | ... | acc_res_dom_freq | acc_res_spec_energy | acc_res_fft_top1_mag | acc_res_fft_top1_freq | acc_res_fft_top2_mag | acc_res_fft_top2_freq | acc_res_fft_top3_mag | acc_res_fft_top3_freq | label | start_time |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | -0.325147 | 1.005483 | 1.010997 | 0.715388 | -0.125287 | 1.95071 | 3.805271 | 1.288558 | -0.052421 | 7.376959 | ... | 0.78125 | 50.754583 | 248.492811 | 0.78125 | 224.096697 | 0.390625 | 147.161871 | 3.90625 | jumping | 1761134000.0 |
| 1 | -0.333427 | 1.21263 | 1.470472 | 0.896227 | -0.036922 | 2.340061 | 5.475887 | 1.697479 | 0.170994 | 8.71132 | ... | 3.90625 | 65.81295 | 183.212361 | 3.90625 | 164.318067 | 1.953125 | 137.923398 | 0.78125 | jumping | 1761134000.0 |
| 2 | -0.150583 | 1.298404 | 1.685853 | 0.915027 | -0.09291 | 1.918203 | 3.679501 | 1.403808 | -0.045061 | 8.365517 | ... | 3.90625 | 64.546681 | 193.348538 | 3.90625 | 176.854559 | 1.953125 | 164.362721 | 5.859375 | jumping | 1761134000.0 |
| 3 | -0.137955 | 1.175404 | 1.381574 | 0.768837 | -0.24509 | 1.596382 | 2.548437 | 1.141766 | 0.054054 | 7.954364 | ... | 0.78125 | 58.464855 | 275.642796 | 0.78125 | 175.347922 | 3.90625 | 158.849126 | 0.390625 | jumping | 1761134000.0 |
| 4 | -0.134741 | 0.959452 | 0.920549 | 0.607052 | -0.155564 | 1.428992 | 2.042019 | 0.890355 | 0.201396 | 6.758013 | ... | 0.390625 | 41.634446 | 379.684283 | 0.390625 | 128.71818 | 3.90625 | 118.783268 | 3.515625 | jumping | 1761134000.0 |


**Data Inspection**

In [10]:
# Check for missing values and class distribution
print("Missing values:\n", df.isna().sum())
print("\nClass distribution:\n", df['label'].value_counts())

Missing values:
 accel_x_mean             0
accel_x_std              0
accel_x_var              0
accel_x_mad              0
accel_y_mean             0
                        ..
acc_res_fft_top2_freq    0
acc_res_fft_top3_mag     0
acc_res_fft_top3_freq    0
label                    0
start_time               0
Length: 94, dtype: int64

Class distribution:
 label
walking     266
still       247
jumping     227
standing    208
Name: count, dtype: int64


**Defining Model Components**

In a Hidden Markov Model (HMM), we define the following key elements:

| Element | Description |
|----------|--------------|
| **Hidden States (Z)** | The underlying activities (e.g., standing, walking, jumping, still). |
| **Observations (X)** | Feature vectors derived from accelerometer and gyroscope signals. |
| **Transition Probabilities (A)** | Probability of transitioning from one activity to another. |
| **Emission Probabilities (B)** | Probability (or probability density, for continuous features) of observing a given feature vector in a given activity. |
| **Initial State Probabilities (π)** | Likelihood of starting in a specific activity. |
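
When the hidden states are labelled, as they are here, one simple way to obtain π, A, and B is to count and average directly. The sketch below illustrates that idea on a toy sequence. It is an assumption about how the components could be estimated, not the notebook's confirmed method: the function name `estimate_hmm_parameters`, the additive smoothing, and the diagonal-Gaussian emission model are all illustrative choices.

```python
import numpy as np

def estimate_hmm_parameters(state_seq, X, n_states, smoothing=1.0):
    """Estimate pi, A, and Gaussian emission parameters from one labelled sequence.

    Every state must occur at least once in state_seq.
    """
    state_seq = np.asarray(state_seq)
    X = np.asarray(X, dtype=float)

    # Initial probabilities: a single sequence shows only one start state,
    # so additive smoothing keeps the other states possible.
    pi = np.full(n_states, smoothing, dtype=float)
    pi[state_seq[0]] += 1
    pi /= pi.sum()

    # Transition matrix: count i -> j moves between consecutive time steps
    A = np.full((n_states, n_states), smoothing, dtype=float)
    for i, j in zip(state_seq[:-1], state_seq[1:]):
        A[i, j] += 1
    A /= A.sum(axis=1, keepdims=True)   # normalize so each row sums to 1

    # Emissions: one diagonal Gaussian per state (per-feature mean and variance)
    means = np.array([X[state_seq == k].mean(axis=0) for k in range(n_states)])
    variances = np.array([X[state_seq == k].var(axis=0) + 1e-6 for k in range(n_states)])
    return pi, A, means, variances

# Toy data: two states, one feature (not taken from features.csv)
z = np.array([0, 0, 0, 1, 1, 0, 0, 1, 1, 1])
x = np.array([[0.1], [0.0], [0.2], [2.1], [1.9], [0.1], [-0.1], [2.0], [2.2], [1.8]])
pi, A, means, variances = estimate_hmm_parameters(z, x, n_states=2)
print("pi =", pi)
print("A =\n", A)
print("means =", means.ravel())
```

Counting transitions only makes sense if the rows are in temporal order within each recording. An unsupervised alternative would be to learn the same parameters with the Baum-Welch (EM) algorithm.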

We will now define these components based on our dataset.

**Preparing Data for HMM**

In [11]:
import numpy as np

# Separate features (X) and labels (y)
# Note: only 'label' is dropped, so 'start_time' remains in X as a feature column
X = df.drop(columns=['label']).values
y = df['label'].values

# Define hidden states
states = sorted(df['label'].unique())
n_states = len(states)

# Map labels to numeric states
state_map = {name: i for i, name in enumerate(states)}
y_encoded = np.array([state_map[label] for label in y])

print("Hidden States:", states)
print("Feature Matrix Shape:", X.shape)

Hidden States: ['jumping', 'standing', 'still', 'walking']
Feature Matrix Shape: (948, 93)
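
The columns of the feature matrix span very different scales (means near zero, FFT magnitudes in the hundreds, and `start_time` around 1.7e9), which can dominate a Gaussian emission model. A minimal sketch of per-column standardization, shown on a toy matrix; the helper `standardize` is hypothetical and this step is not part of the original notebook.

```python
import numpy as np

def standardize(X, eps=1e-12):
    """Scale each column to zero mean and unit variance."""
    X = np.asarray(X, dtype=float)
    mean = X.mean(axis=0)
    std = X.std(axis=0)
    std = np.where(std < eps, 1.0, std)   # constant columns map to 0 instead of dividing by 0
    return (X - mean) / std, mean, std

# Toy matrix: columns on very different scales, plus one constant column
X_toy = np.array([[1.0, 1000.0, 5.0],
                  [2.0, 3000.0, 5.0],
                  [3.0, 2000.0, 5.0]])
X_scaled, mean, std = standardize(X_toy)
print(X_scaled)
```

If the data is later split into training and test sets, the mean and standard deviation would normally be computed on the training portion only and reused for the test portion.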
