# Hypothesis Operationalization & Validation Planning

This section documents the finalized hypotheses, maps them to dataset variables, defines how each hypothesis will be validated, and specifies evaluation metrics, assumptions, and limitations.


In [1]:
import pandas as pd
import numpy as np

In [2]:
import sys
from pathlib import Path

PROJECT_ROOT = Path().resolve().parent
sys.path.append(str(PROJECT_ROOT / "src"))

print(PROJECT_ROOT)
print(PROJECT_ROOT / "src")

D:\Data Analytics\TERM 5\Industrial Machine Prediction Capstone Project\Capstone_Project_Damo-699
D:\Data Analytics\TERM 5\Industrial Machine Prediction Capstone Project\Capstone_Project_Damo-699\src


In [3]:
from data_loader import load_cleaned_data, get_basic_info
df = load_cleaned_data()
df.head()

,UDI,Product ID,Type,Air temperature [K],Process temperature [K],Rotational speed [rpm],Torque [Nm],Tool wear [min],Machine failure,TWF,HDF,PWF,OSF,RNF
0,1,M14860,M,298.1,308.6,1551,42.8,0,0,0,0,0,0,0
1,2,L47181,L,298.2,308.7,1408,46.3,3,0,0,0,0,0,0
2,3,L47182,L,298.1,308.5,1498,49.4,5,0,0,0,0,0,0
3,4,L47183,L,298.2,308.6,1433,39.5,7,0,0,0,0,0,0
4,5,L47184,L,298.2,308.7,1408,40.0,9,0,0,0,0,0,0


## Finalized Hypotheses

### H1 — Predictive Capability
- **H01 (Null):** Operational and sensor-based variables do not significantly predict machine failure.
- **H11 (Alternative):** Operational and sensor-based variables significantly predict machine failure.

### H2 — Sensor Impact
- **H02 (Null):** Temperature, torque, rotational speed, and tool wear have no significant association with machine failure.
- **H12 (Alternative):** Temperature, torque, rotational speed, and tool wear have a significant association with machine failure.

### H3 — Model Performance Comparison
- **H03 (Null):** Tree-based machine learning models do not outperform logistic regression in predicting machine failure.
- **H13 (Alternative):** Tree-based machine learning models outperform logistic regression in predicting machine failure.

In [4]:
TARGET_COL = "Machine failure"

if TARGET_COL not in df.columns:
    raise ValueError(f"Target column '{TARGET_COL}' not found. Available columns: {list(df.columns)}")

df[TARGET_COL].value_counts()

Machine failure
0    9661
1     339
Name: count, dtype: int64

## Map Hypotheses to Dataset Variables

### Target Variable (Dependent Variable)
- **Machine failure** (0 = No Failure, 1 = Failure)

### Predictor Variables (Independent Variables)

**Core sensor variables (H2):**
- Air temperature [K]
- Process temperature [K]
- Rotational speed [rpm]
- Torque [Nm]
- Tool wear [min]

**Additional categorical variable (H1):**
- Type (Product type)

### Derived / Engineered Variable (to be created in Sprint 2)
- **Temperature difference** = Process temperature − Air temperature

This mapping ensures each hypothesis is testable using specific dataset features.
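As a minimal sketch of the derived variable, the snippet below computes the temperature difference on a three-row toy frame whose values are copied from the `df.head()` output above. The column name `Temp_diff [K]` is an assumption made for illustration; the final name is to be decided in Sprint 2.

```python
import pandas as pd

# Toy frame: first three rows of the two temperature columns shown in df.head()
toy = pd.DataFrame({
    "Air temperature [K]": [298.1, 298.2, 298.1],
    "Process temperature [K]": [308.6, 308.7, 308.5],
})

# "Temp_diff [K]" is a hypothetical column name, not one defined by the dataset
toy["Temp_diff [K]"] = toy["Process temperature [K]"] - toy["Air temperature [K]"]

print(toy["Temp_diff [K]"].round(1).tolist())
```

Applied to the full `df`, the same subtraction adds one numeric column and leaves the original sensor columns unchanged.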

In [5]:
# Adjust these names if your columns differ
SENSOR_COLS = [
    "Air temperature [K]",
    "Process temperature [K]",
    "Rotational speed [rpm]",
    "Torque [Nm]",
    "Tool wear [min]",
]

CATEGORICAL_COLS = ["Type"]

missing = [c for c in SENSOR_COLS + CATEGORICAL_COLS if c not in df.columns]
print("Missing columns:", missing)

print("Sensors:", SENSOR_COLS)
print("Categorical:", CATEGORICAL_COLS)

Missing columns: []
Sensors: ['Air temperature [K]', 'Process temperature [K]', 'Rotational speed [rpm]', 'Torque [Nm]', 'Tool wear [min]']
Categorical: ['Type']


## Validation Approach Per Hypothesis

### H1 — Predictive Capability
**Goal:** Determine whether operational/sensor variables can predict machine failure.
**Approach:**
- Train a baseline **Logistic Regression** model.
- Evaluate performance using imbalance-aware metrics (Recall, F1, ROC-AUC).
- If predictive performance is meaningfully better than a random or naive (majority-class) baseline, reject H01.
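The H1 check could be sketched as follows. This is an illustration on synthetic stand-in data, not the project dataset. The `DummyClassifier` baseline, the 25% test size, and the seed are all assumptions.

```python
import numpy as np
from sklearn.dummy import DummyClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import recall_score, roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in data (NOT the project dataset): three numeric features and
# a rare binary label whose log-odds depend on the first two features.
rng = np.random.default_rng(42)
n = 4000
X = rng.normal(size=(n, 3))
log_odds = -4.0 + 1.5 * X[:, 0] + 1.0 * X[:, 1]
y = rng.binomial(1, 1.0 / (1.0 + np.exp(-log_odds)))

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=42
)

# Naive baseline: always predicts the majority class ("no failure")
naive = DummyClassifier(strategy="most_frequent").fit(X_train, y_train)

# Baseline model from the plan: logistic regression on standardized features
model = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
model.fit(X_train, y_train)

results = {}
for name, clf in {"naive": naive, "logistic_regression": model}.items():
    scores = clf.predict_proba(X_test)[:, 1]  # predicted probability of failure
    results[name] = {
        "recall": recall_score(y_test, clf.predict(X_test)),
        "roc_auc": roc_auc_score(y_test, scores),
    }

print(results)
```

The naive baseline gives every row the same score, so its ROC-AUC is 0.5 and its recall is 0. The logistic regression has to beat those reference points.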

### H2 — Sensor Impact
**Goal:** Determine whether key sensors are associated with machine failure.
**Approach:**
- Interpret **logistic regression coefficients** (direction and strength).
- Use **tree-based feature importance** to confirm influential sensors.
- Use **failure rate by sensor bins** (risk trend evidence).
- If these lines of evidence consistently show meaningful sensor influence, reject H02.
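The "failure rate by sensor bins" step can be illustrated with a short pandas sketch. The data here is synthetic, with a wear-to-failure relationship chosen purely for demonstration. The five quantile bins and the `wear_bin` column name are also assumptions.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in data (NOT the project dataset): failure probability is
# assumed to rise with tool wear, purely to make the trend visible.
rng = np.random.default_rng(0)
n = 5000
tool_wear = rng.uniform(0, 250, size=n)
p_fail = 0.01 + 0.08 * (tool_wear / 250) ** 3

demo = pd.DataFrame({
    "Tool wear [min]": tool_wear,
    "Machine failure": rng.binomial(1, p_fail),
})

# Split the sensor into 5 equal-sized quantile bins ("wear_bin" is a hypothetical name)
demo["wear_bin"] = pd.qcut(demo["Tool wear [min]"], q=5)

# Failure rate and row count per bin; a monotone trend is risk-trend evidence
rate_by_bin = demo.groupby("wear_bin", observed=True)["Machine failure"].agg(
    failure_rate="mean", n="count"
)
print(rate_by_bin)
```

Reporting the row count next to each rate helps flag bins where the rate rests on only a few failures.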

### H3 — Model Performance
**Goal:** Check whether tree-based models outperform logistic regression.
**Approach:**
- Train and evaluate Logistic Regression vs Tree-based models (Random Forest / Gradient Boosting).
- Use the **same split** and the **same metrics** for fairness.
- If tree-based models materially improve ROC-AUC and/or F1 over logistic regression on the same test set, reject H03.
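A like-for-like comparison could be organized as below. This is a sketch on synthetic stand-in data. The interaction term, hyperparameters, test size, and seed are assumptions, and nothing in the output says anything about the project dataset.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import f1_score, roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

SEED = 42  # assumed seed, reused for the split and the forest

# Synthetic stand-in data (NOT the project dataset): failures are more likely
# when two features are jointly high, a pattern a linear model cannot express directly.
rng = np.random.default_rng(SEED)
n = 3000
X = rng.normal(size=(n, 4))
log_odds = -4.0 + 2.5 * ((X[:, 0] > 0.5) & (X[:, 1] > 0.5)) + 0.8 * X[:, 2]
y = rng.binomial(1, 1.0 / (1.0 + np.exp(-log_odds)))

# One split, shared by every model
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=SEED
)

models = {
    "logistic_regression": make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000)),
    "random_forest": RandomForestClassifier(n_estimators=100, random_state=SEED),
}

# Same metrics for every model, computed on the same held-out rows
comparison = {}
for name, clf in models.items():
    clf.fit(X_train, y_train)
    comparison[name] = {
        "f1": f1_score(y_test, clf.predict(X_test), zero_division=0),
        "roc_auc": roc_auc_score(y_test, clf.predict_proba(X_test)[:, 1]),
    }

print(comparison)
```

Because both models see the same train and test rows, any metric gap reflects the models and not the split.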

## Selected Evaluation Metrics

Because machine failure is rare (339 of 10,000 observations, about 3.4%), **accuracy alone is not reliable**.
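The class counts reported earlier in this notebook (9,661 non-failures and 339 failures) make this concrete. The sketch below scores a hypothetical model that always predicts "no failure".

```python
# Class counts taken from the value_counts() output earlier in the notebook
n_no_failure, n_failure = 9661, 339

# A hypothetical model that predicts 0 ("no failure") for every row
tp, fn = 0, n_failure      # every true failure is missed
tn, fp = n_no_failure, 0   # every non-failure is correctly labeled

accuracy = (tp + tn) / (tp + tn + fp + fn)
recall = tp / (tp + fn)

print(f"accuracy={accuracy:.4f}, recall={recall:.2f}")
# prints: accuracy=0.9661, recall=0.00
```

This model scores about 96.6% accuracy while detecting no failures, which is why recall-oriented metrics lead the evaluation.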

### Primary Metrics (Imbalance-aware)
- **Recall (Sensitivity):** ability to catch failures (critical)
- **F1-score:** balances precision and recall
- **ROC-AUC:** overall discrimination capability
- **Precision-Recall AUC (optional):** especially useful for rare events

### Secondary Metrics
- Confusion Matrix
- Precision
- Specificity

**Justification:** In predictive maintenance, missing a true failure (false negative) is typically more costly than a false alarm (false positive).
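The following sketch shows how these metrics could be computed with scikit-learn on a tiny hand-made example. The labels, scores, and the 0.5 decision threshold are illustrative assumptions. Specificity has no dedicated scikit-learn scorer, so it is derived from the confusion matrix here.

```python
import numpy as np
from sklearn.metrics import (
    average_precision_score,
    confusion_matrix,
    f1_score,
    precision_score,
    recall_score,
    roc_auc_score,
)

# Hand-made example: 7 non-failures, 3 failures, with model scores in [0, 1]
y_true = np.array([0, 0, 0, 0, 0, 0, 1, 1, 1, 0])
y_score = np.array([0.1, 0.2, 0.3, 0.6, 0.1, 0.2, 0.9, 0.4, 0.7, 0.05])
y_pred = (y_score >= 0.5).astype(int)  # 0.5 threshold is an assumption

# Rows are true classes, columns are predicted classes
tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()

metrics = {
    "recall": recall_score(y_true, y_pred),              # tp / (tp + fn)
    "precision": precision_score(y_true, y_pred),        # tp / (tp + fp)
    "f1": f1_score(y_true, y_pred),
    "specificity": tn / (tn + fp),                       # true negative rate
    "roc_auc": roc_auc_score(y_true, y_score),           # threshold-free
    "pr_auc": average_precision_score(y_true, y_score),  # threshold-free
}
print(metrics)
```

Recall, precision, F1, and specificity depend on the chosen threshold. ROC-AUC and PR-AUC are computed from the raw scores and do not.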

In [6]:
counts = df[TARGET_COL].value_counts()
perc = df[TARGET_COL].value_counts(normalize=True) * 100

summary = pd.DataFrame({"count": counts, "percent": perc.round(2)})
summary.index = summary.index.map({0: "No Failure (0)", 1: "Failure (1)"})
summary

Machine failure,count,percent
No Failure (0),9661,96.61
Failure (1),339,3.39


## Assumptions and Limitations

### Assumptions
- Sensor readings are accurate and consistent.
- The failure label represents the true failure condition.
- Observations are independent (no time-dependence assumed in Sprint 1).
- Product types and operating conditions reflect realistic machine behavior.

### Limitations
- The dataset is **highly imbalanced** (failures are rare), which can bias models toward predicting “no failure.”
- The dataset does not provide a time-series sequence; predictive maintenance in real life often benefits from temporal patterns.
- The dataset is a public/synthetic benchmark and may not capture all real-world operational complexities.
- The dataset includes few contextual variables (e.g., maintenance history, operator behavior, environment).

## Validation Plan Documentation (Sprint 2 Roadmap)

### Data Splitting Strategy
- Use a **stratified train/test split** so that the failure rate (about 3.4%) is preserved in both sets.
- Set a random seed for reproducibility.
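A minimal sketch of the split, using a label vector with the same class counts as this dataset. The 80/20 ratio, the seed value, and the placeholder feature matrix are assumptions.

```python
import numpy as np
from sklearn.model_selection import train_test_split

RANDOM_STATE = 42  # assumed seed; any fixed integer gives reproducibility
TEST_SIZE = 0.2    # assumed 80/20 split

# Label vector with the same class counts as the dataset (9661 / 339)
y = np.array([0] * 9661 + [1] * 339)
X = np.arange(len(y)).reshape(-1, 1)  # placeholder feature, for illustration only

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=TEST_SIZE, stratify=y, random_state=RANDOM_STATE
)

print(f"train failure rate: {y_train.mean():.4f}")
print(f"test failure rate:  {y_test.mean():.4f}")
```

With `stratify=y`, both subsets keep a failure rate close to the overall 3.39%. Without it, a small test set could by chance contain noticeably more or fewer failures.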

### Modeling Plan
1. Build baseline Logistic Regression model (interpretability).
2. Build tree-based models (Random Forest / Gradient Boosting).
3. Compare models using the same metrics and split strategy.
4. Apply imbalance handling strategies if needed:
   - class weights
   - oversampling/SMOTE (optional)
5. Select the best model based on **Recall, F1, and ROC-AUC** considered together (not accuracy).
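For the class-weight option in step 4, the sketch below shows the weights that scikit-learn's "balanced" heuristic would assign, given this dataset's class counts. Whether weighting is needed at all is left open by the plan; this only illustrates the mechanism.

```python
import numpy as np
from sklearn.utils.class_weight import compute_class_weight

# Label vector with the same class counts as the dataset (9661 / 339)
y = np.array([0] * 9661 + [1] * 339)
classes = np.array([0, 1])

# "balanced" weights are n_samples / (n_classes * count_per_class)
weights = compute_class_weight(class_weight="balanced", classes=classes, y=y)
class_weight = {int(c): float(w) for c, w in zip(classes, weights)}

print(class_weight)
```

The failure class receives a weight roughly 28 times larger than the non-failure class, so each missed failure costs more during training. `LogisticRegression` and `RandomForestClassifier` accept such a dictionary (or the string `"balanced"`) through their `class_weight` parameter. `GradientBoostingClassifier` has no such parameter, so passing per-sample weights to `fit` would be the analogous route.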

### Evidence for Hypothesis Testing
- **H1:** predictive performance above baseline → reject H01
- **H2:** sensors show meaningful influence (coefficients + importance + bin trends) → reject H02
- **H3:** tree models outperform logistic regression on target metrics → reject H03

In [7]:
# Machine-readable summary of the plan documented above
validation_plan = {
    "target": TARGET_COL,
    "sensor_features": SENSOR_COLS,
    "categorical_features": CATEGORICAL_COLS,
    "derived_features_next_sprint": ["Temp_diff = Process temperature [K] - Air temperature [K]"],
    "models_next_sprint": ["Logistic Regression", "Random Forest", "Gradient Boosting"],
    "primary_metrics": ["Recall", "F1-score", "ROC-AUC"],
    "secondary_metrics": ["Precision", "Specificity", "Confusion Matrix"],
    "split_strategy": "Stratified Train/Test split",
}

validation_plan

{'target': 'Machine failure',
 'sensor_features': ['Air temperature [K]',
  'Process temperature [K]',
  'Rotational speed [rpm]',
  'Torque [Nm]',
  'Tool wear [min]'],
 'categorical_features': ['Type'],
 'derived_features_next_sprint': ['Temp_diff = Process temperature [K] - Air temperature [K]'],
 'models_next_sprint': ['Logistic Regression',
  'Random Forest',
  'Gradient Boosting'],
 'primary_metrics': ['Recall', 'F1-score', 'ROC-AUC'],
 'secondary_metrics': ['Precision', 'Specificity', 'Confusion Matrix'],
 'split_strategy': 'Stratified Train/Test split'}