# University of Warsaw - Machine Learning in Finance I
## Final Project: Predictive Maintenance

Objective: Build a model to predict machine failure (Binary Classification).

---

### Step 1: Importing Libraries

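We import the standard stack (pandas, NumPy, Matplotlib, seaborn, scikit-learn) and, most importantly, SMOTE from imblearn (the imbalanced-learn package). We need it because our dataset is highly imbalanced (only about 3.4% of machines fail), and a classifier trained on such data without rebalancing tends to favor the majority class.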

```python
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import LabelEncoder
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import roc_auc_score, confusion_matrix, classification_report
from imblearn.over_sampling import SMOTE

# importing everything at once so I don't forget later
```

### Step 2: Loading Data

Loading the AI4I 2020 predictive maintenance dataset (ai4i2020.csv) from the project's data folder.

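```python
# Load the CSV from the project's data folder.
# Relative path: assumes the notebook is run from the project root (ML_project),
# instead of a hard-coded absolute path that only works on one machine.
path = "data/ai4i2020.csv"
df = pd.read_csv(path)

# Confirm the file loaded and preview the first rows
print(f"Loaded dataset from {path}")
df.head()
```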

### Step 3: Data Cleaning (Removing Leakage)

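We drop the identifier columns (UDI and Product ID) because they label rows rather than describe the machine's condition. More importantly, we must drop the five failure-type columns (TWF, HDF, PWF, OSF, RNF).

These columns cause data leakage. According to the dataset documentation, Machine failure is set to 1 whenever at least one of these five failure modes occurs, so the flags encode the answer itself. If we kept HDF (Heat Dissipation Failure), the model would not be predicting anything; it would simply read the diagnosis flag. Excluding them forces the model to learn from the sensor and process readings instead.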

```python
# Drop the identifiers (UDI, Product ID) and the five failure-mode flags, which leak the target:
# if HDF were kept, the model would "know" a machine failed straight from the flag.
df = df.drop(['UDI', 'Product ID', 'TWF', 'HDF', 'PWF', 'OSF', 'RNF'], axis=1)

# Check the shape to confirm the columns are gone
print("New shape:", df.shape)
```
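To see why the flags must go, here is a small experiment on hypothetical toy data (not the AI4I file): five rare binary flags, a label defined as "any flag set" (mirroring how the dataset documentation defines Machine failure), and one pure-noise sensor column. A decision tree that can see the flags catches essentially every held-out failure; without them it has nothing real to learn from. The sample size, flag probability and the helper name holdout_recall are illustrative assumptions.

```python
import numpy as np
import pandas as pd
from sklearn.metrics import recall_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(0)
n = 5000
# Hypothetical toy data: five rare failure-mode flags (1% each) and one pure-noise sensor
flags = pd.DataFrame((rng.random((n, 5)) < 0.01).astype(int),
                     columns=["TWF", "HDF", "PWF", "OSF", "RNF"])
sensor = pd.Series(rng.normal(size=n), name="sensor")
# Label defined as "any flag set", mirroring how Machine failure is derived from the flags
y_toy = flags.any(axis=1).astype(int)

def holdout_recall(X):
    """Failure-class recall of a decision tree on a stratified 20% holdout (hypothetical helper)."""
    X_tr, X_te, y_tr, y_te = train_test_split(X, y_toy, test_size=0.2, stratify=y_toy, random_state=0)
    model = DecisionTreeClassifier(random_state=0).fit(X_tr, y_tr)
    return recall_score(y_te, model.predict(X_te))

recall_leaky = holdout_recall(pd.concat([flags, sensor], axis=1))  # flags kept
recall_clean = holdout_recall(sensor.to_frame())                   # flags dropped
print(f"failure recall with flags: {recall_leaky:.2f}, without: {recall_clean:.2f}")
```

The near-perfect score with the flags is not skill; it is the tree reading the label's own definition.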

### Step 3.5: Feature Engineering (Bonus)

Adding a new feature: Temperature Difference.

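One way machines fail is by not dissipating heat: heat dissipation failure (HDF) is one of the five failure modes in this dataset, and the AI4I 2020 documentation defines it through exactly this gap (a failure occurs when the difference between process and air temperature is below 8.6 K and the rotational speed is below 1380 rpm). A tree splits on one feature at a time, so a rule on the difference of two columns is hard to approximate from the raw temperatures; supplying the difference directly lets the model capture it with a single split.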

```python
# Feature engineering: machines fail when they can't shed heat,
# so the gap between process and air temperature (both in kelvin) is added as its own column.
df['Temp_Diff'] = df['Process temperature [K]'] - df['Air temperature [K]']

print("Added new feature: Temp_Diff")
df.head()
```
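The value of an explicit difference feature for a tree can be illustrated on synthetic data. In this sketch the toy label depends only on the process-minus-air gap (the 8.6 K cut-off mirrors the heat-dissipation rule in the AI4I 2020 documentation; the temperature ranges and sample size are assumptions). A single split on the difference separates the classes perfectly, while a single split on either raw temperature cannot.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(0)
n = 2000
# Synthetic temperatures in kelvin (illustrative ranges, not fitted to the real data)
air = rng.uniform(295, 305, n)
process = air + rng.uniform(7, 12, n)
# Toy label: "heat dissipation problem" when the process-air gap is below 8.6 K
y_toy = (process - air < 8.6).astype(int)

raw = np.column_stack([air, process])     # the two raw temperatures
diff = (process - air).reshape(-1, 1)     # the engineered difference

# A depth-1 tree (a single split), scored on its own training data
stump = DecisionTreeClassifier(max_depth=1, random_state=0)
acc_raw = stump.fit(raw, y_toy).score(raw, y_toy)
acc_diff = stump.fit(diff, y_toy).score(diff, y_toy)
print(f"one split on raw temperatures: {acc_raw:.3f}, on the difference: {acc_diff:.3f}")
```

A deeper forest can still approximate the rule from raw temperatures with many staircase splits; the engineered column simply makes it cheaper to learn.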

### Step 4: Preprocessing (Encoding & Splitting)

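We use LabelEncoder to convert the Type column (L, M or H, for low, medium and high product quality) to integers, because scikit-learn's RandomForestClassifier only accepts numeric input. Note that LabelEncoder assigns codes alphabetically (H=0, L=1, M=2), so the codes do not follow the quality order; a tree can still separate the three variants, but it may need an extra split to do so.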
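Alphabetical and quality-ordered codes are easiest to compare side by side. This sketch encodes a few hypothetical Type values with LabelEncoder, as the notebook does, and with scikit-learn's OrdinalEncoder given an explicit L < M < H order (an assumption based on the low/medium/high quality variants). Switching the notebook to the ordinal version would change the inputs, and therefore the results reported later.

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder, OrdinalEncoder

types = pd.Series(["L", "M", "H", "L"], name="Type")  # hypothetical sample values

# LabelEncoder sorts the class names alphabetically: H=0, L=1, M=2
label_codes = LabelEncoder().fit_transform(types)

# OrdinalEncoder with an explicit category order keeps low < medium < high: L=0, M=1, H=2
ordinal_codes = OrdinalEncoder(categories=[["L", "M", "H"]]).fit_transform(types.to_frame()).ravel()

print(label_codes.tolist())    # [1, 2, 0, 1]
print(ordinal_codes.tolist())  # [0.0, 1.0, 2.0, 0.0]
```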

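For splitting, we use a stratified split (stratify=y). Failures are rare (about 3.4% of rows, which leaves only 68 failures among the 2,000 test rows), so a purely random split can noticeably shift the failure rate between the training and test sets from one seed to the next. Stratifying keeps the rate the same in both sets, so the test metrics reflect the model rather than an unlucky split.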

```python
# RandomForestClassifier needs numbers, so encode Type (L/M/H) as integers
le = LabelEncoder()
df['Type'] = le.fit_transform(df['Type'])

X = df.drop('Machine failure', axis=1)
y = df['Machine failure']

# Stratified 80/20 split (stratify=y is required for the project, per Professor Wozniak):
# keeps the failure rate the same in the training and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, stratify=y, random_state=42)

print("Training data shape:", X_train.shape)
```
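The effect of stratify=y can be checked on hypothetical labels with the same 3.4% failure rate: repeat a 20% split over 200 seeds and count how many failures land in the test set. Without stratification the count changes from seed to seed; with it, every split gets the same count. The sample size, seeds and the helper name failures_in_test are illustrative assumptions.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Hypothetical labels with a 3.4% failure rate: 34 failures in 1,000 rows
y_demo = np.r_[np.ones(34, dtype=int), np.zeros(966, dtype=int)]
row_ids = np.arange(len(y_demo))

def failures_in_test(seed, stratify):
    """Count the failures that land in a 20% test split for one random seed."""
    _, test_ids = train_test_split(row_ids, test_size=0.2, random_state=seed,
                                   stratify=y_demo if stratify else None)
    return int(y_demo[test_ids].sum())

random_counts = [failures_in_test(s, stratify=False) for s in range(200)]
stratified_counts = [failures_in_test(s, stratify=True) for s in range(200)]
print("random split:    ", min(random_counts), "to", max(random_counts), "failures in test")
print("stratified split:", min(stratified_counts), "to", max(stratified_counts), "failures in test")
```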

### Step 5: Handling Imbalance (SMOTE)

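We apply SMOTE (Synthetic Minority Over-sampling Technique) to the training data only. Trained on the raw data, the model is biased heavily towards the majority class (No Failure), because predicting "no failure" is right about 96.6% of the time. SMOTE creates synthetic failure examples by interpolating between a real failure and one of its nearest failure neighbors (k_neighbors=5 by default in imblearn), which gives the model more minority-class points to learn the decision boundary from instead of predicting zero every time. The test set is left untouched, so evaluation uses only real machines.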
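The interpolation step at the heart of SMOTE can be sketched in a few lines of NumPy. This is a simplified illustration, not imblearn's implementation: the function name smote_sketch is hypothetical, it uses brute-force distances, and it only handles a small array of minority samples. Each synthetic point lies on the segment between a real minority sample and one of its k nearest minority neighbors.

```python
import numpy as np

def smote_sketch(X_min, n_new, k=5, seed=0):
    """Hypothetical, simplified SMOTE: new points on segments between minority neighbors."""
    rng = np.random.default_rng(seed)
    # Pairwise Euclidean distances between minority samples (brute force)
    dist = np.linalg.norm(X_min[:, None, :] - X_min[None, :, :], axis=-1)
    np.fill_diagonal(dist, np.inf)                # a sample is not its own neighbor
    neighbors = np.argsort(dist, axis=1)[:, :k]   # indices of the k nearest minority samples

    synthetic = np.empty((n_new, X_min.shape[1]))
    for i in range(n_new):
        a = rng.integers(len(X_min))          # pick a random minority sample
        b = neighbors[a, rng.integers(k)]     # and one of its k nearest neighbors
        gap = rng.random()                    # random position along the segment, in [0, 1)
        synthetic[i] = X_min[a] + gap * (X_min[b] - X_min[a])
    return synthetic

# Usage on made-up 2-D "failure" points
X_failures = np.random.default_rng(1).normal(size=(20, 2))
X_synthetic = smote_sketch(X_failures, n_new=50)
print(X_synthetic.shape)  # (50, 2)
```

Because every new point is a convex combination of two real failures, SMOTE never extrapolates outside the region the real failures already occupy.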

```python
# Oversample the failure class in the training set only; the test set stays untouched
smote = SMOTE(random_state=42)
X_train_resampled, y_train_resampled = smote.fit_resample(X_train, y_train)

# With the default sampling strategy, both classes should now have the same count
print("Class counts after SMOTE:")
print(pd.Series(y_train_resampled).value_counts())
```
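For comparison, random undersampling (the alternative weighed in Step 6) keeps every failure and only an equal-sized random subset of healthy rows. This sketch applies it to hypothetical labels with 272 failures in 8,000 rows, a 3.4% rate chosen to resemble this project's training set, to show how much data that approach would discard. imblearn also provides RandomUnderSampler; plain NumPy is used here to keep the mechanics visible.

```python
import numpy as np

rng = np.random.default_rng(42)
# Hypothetical training labels: 272 failures in 8,000 rows (3.4%)
y_demo = np.r_[np.ones(272, dtype=int), np.zeros(7728, dtype=int)]

failure_ids = np.flatnonzero(y_demo == 1)
healthy_ids = np.flatnonzero(y_demo == 0)
# Random undersampling: keep every failure plus an equal-sized random subset of healthy rows
kept_healthy = rng.choice(healthy_ids, size=len(failure_ids), replace=False)
kept_ids = np.concatenate([failure_ids, kept_healthy])

discarded_share = 1 - len(kept_healthy) / len(healthy_ids)
print(f"rows kept: {len(kept_ids)} of {len(y_demo)}; healthy rows discarded: {discarded_share:.1%}")
# rows kept: 544 of 8000; healthy rows discarded: 96.5%
```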

### Step 6: Design Choices (Why we did it this way)

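1. Why SMOTE over Undersampling?
- Undersampling: Randomly deleting healthy-machine rows until they match the number of failures. With a ~3.4% failure rate, that means discarding about 96% of the healthy rows, which throws away valuable information.
- SMOTE: Keeps all original data and adds synthetic failure examples. This preserves the information in the majority class; the trade-off is that synthetic points may not perfectly reflect real failures.

2. Why Stratified Split over Random Split?
- Random Split: The share of failures in the test set varies from split to split, so metrics computed on only ~68 failures can swing with the random seed.
- Stratified: Keeps the failure rate in both sets at the dataset's ~3.4% (68 of the 2,000 test rows).

3. Why Random Forest?
- Random Forest combines "Bagging" (each tree is trained on a bootstrap sample and the trees' votes are averaged) with a random subset of features at each split. Averaging many decorrelated trees reduces variance, which makes it more robust to noise and overfitting than a single deep Decision Tree.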
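Point 3 can be illustrated on synthetic, deliberately noisy data (make_classification with flip_y=0.05; the sizes and seeds are illustrative assumptions). A single unpruned tree fits the label noise and outputs essentially hard 0/1 probabilities, while averaging 100 randomized trees gives smoother scores and typically a clearly higher held-out ROC-AUC.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic data; flip_y=0.05 assigns 5% of the labels at random to add noise
X_demo, y_demo = make_classification(n_samples=2000, n_features=20, n_informative=5,
                                     flip_y=0.05, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X_demo, y_demo, test_size=0.25, random_state=0)

# One unpruned tree vs. 100 bagged trees that also sample features at each split
tree = DecisionTreeClassifier(random_state=0).fit(X_tr, y_tr)
forest = RandomForestClassifier(n_estimators=100, random_state=0).fit(X_tr, y_tr)

tree_auc = roc_auc_score(y_te, tree.predict_proba(X_te)[:, 1])
forest_auc = roc_auc_score(y_te, forest.predict_proba(X_te)[:, 1])
print(f"single tree AUC: {tree_auc:.3f}, random forest AUC: {forest_auc:.3f}")
```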

```python
# Train a Random Forest on the SMOTE-resampled training data.
# max_depth=10 caps tree depth to limit overfitting (memorizing the noise).
rf = RandomForestClassifier(n_estimators=100, max_depth=10, random_state=42)

rf.fit(X_train_resampled, y_train_resampled)
print("Model trained. Hope it learned something.")
```

Output:

```
Model trained. Hope it learned something.
```


### Step 7: Evaluation (Standard 50% Threshold)

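We evaluate using ROC-AUC and Recall. ROC-AUC measures how well the predicted probabilities rank failures above healthy machines, independent of any threshold. Accuracy is misleading under this imbalance: a model that never predicts a failure would still be about 96.6% accurate on this test set (1,932 of 2,000 rows). We care most about Recall on the failure class (the share of true failures we catch), because in predictive maintenance a False Negative (missed failure) is much more expensive than a False Positive (unnecessary inspection).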

```python
y_pred = rf.predict(X_test)               # hard labels at the default 0.5 threshold
y_prob = rf.predict_proba(X_test)[:, 1]   # predicted probability of failure (class 1)

print("ROC-AUC Score:")
print(roc_auc_score(y_test, y_prob))

print("\nClassification Report (Standard 0.5 Threshold):")
print(classification_report(y_test, y_pred))
```

Output:

```
ROC-AUC Score:
0.952350505419559

Classification Report (Standard 0.5 Threshold):
              precision    recall  f1-score   support

           0       0.99      0.95      0.97      1932
           1       0.34      0.78      0.47        68

    accuracy                           0.94      2000
   macro avg       0.67      0.86      0.72      2000
weighted avg       0.97      0.94      0.95      2000
```
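The accuracy trap from Step 7 can be checked with scikit-learn's DummyClassifier, configured here to always predict the majority class. On hypothetical labels with the same class counts as this test set (1,932 healthy, 68 failures), it scores 96.6% accuracy, higher than the forest's 0.94, while catching zero failures. The features are placeholders, since this baseline ignores them.

```python
import numpy as np
from sklearn.dummy import DummyClassifier
from sklearn.metrics import accuracy_score, recall_score

# Same class counts as the test set: 1,932 healthy machines, 68 failures
y_demo = np.r_[np.zeros(1932, dtype=int), np.ones(68, dtype=int)]
X_demo = np.zeros((len(y_demo), 1))  # placeholder features; the baseline ignores them

baseline = DummyClassifier(strategy="most_frequent").fit(X_demo, y_demo)
y_base = baseline.predict(X_demo)

print("accuracy:", accuracy_score(y_demo, y_base))           # 0.966
print("recall (failures):", recall_score(y_demo, y_base))    # 0.0
```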



### Step 8: Advanced Tuning (Threshold & Visualization)

Threshold Tuning (Business Logic):
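The default threshold is 0.5: a machine is flagged when its predicted failure probability exceeds it. Because breakdowns are so expensive, we prefer to be more cautious and lower the threshold to 0.3. This increases Recall (catching more failures) at the cost of some Precision (more false alarms). The value 0.3 is a judgment call rather than the result of an optimization.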
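If rough costs for a missed failure and a false alarm were known, the threshold could be chosen to minimize total cost instead of being fixed by hand. The sketch below does this on synthetic scores; the helper name best_threshold, the cost ratio (one missed failure costing as much as ten inspections) and the data are all hypothetical. In practice the threshold should be picked on validation data rather than the test set, so that test metrics stay unbiased.

```python
import numpy as np

def best_threshold(y_true, y_score, cost_fn, cost_fp, grid=None):
    """Hypothetical helper: return the grid threshold with the lowest total misclassification cost."""
    if grid is None:
        grid = np.linspace(0.05, 0.95, 19)  # candidate thresholds 0.05, 0.10, ..., 0.95
    costs = []
    for t in grid:
        flagged = y_score >= t
        missed = np.sum((y_true == 1) & ~flagged)       # false negatives
        false_alarms = np.sum((y_true == 0) & flagged)  # false positives
        costs.append(cost_fn * missed + cost_fp * false_alarms)
    best = int(np.argmin(costs))
    return grid[best], costs[best]

# Synthetic validation data: ~3.4% failures, with failures scoring higher on average
rng = np.random.default_rng(0)
y_val = (rng.random(2000) < 0.034).astype(int)
y_score = np.clip(0.5 * y_val + rng.normal(0.2, 0.15, size=2000), 0, 1)

t_equal, _ = best_threshold(y_val, y_score, cost_fn=1, cost_fp=1)
t_costly, _ = best_threshold(y_val, y_score, cost_fn=10, cost_fp=1)
print(f"best threshold, equal costs: {t_equal:.2f}; missed failure 10x costlier: {t_costly:.2f}")
```

Raising the cost of a missed failure can only move the cost-minimizing threshold down (or leave it unchanged), which is the formal version of "better safe than sorry".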

Feature Importance:
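We plot the Random Forest's impurity-based feature importances (the mean decrease in Gini impurity each input provides across all trees) to see which inputs the model relies on most. These are computed from the training data and tend to favor features with many distinct values, so they describe what the model uses rather than what physically causes failures.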

```python
# Threshold tuning (business logic): missing a failure is expensive, so flag a machine
# when its predicted failure probability is at least 0.3 instead of 0.5. Better safe than sorry.
custom_threshold = 0.3
y_pred_tuned = (y_prob >= custom_threshold).astype(int)

print(f"\n--- Results with Custom Threshold ({custom_threshold}) ---")
print(confusion_matrix(y_test, y_pred_tuned))  # rows: actual [0, 1], columns: predicted [0, 1]
print(classification_report(y_test, y_pred_tuned))

# Impurity-based importances, sorted so the most influential input is on top
importances = pd.Series(rf.feature_importances_, index=X.columns).sort_values(ascending=False)

plt.figure(figsize=(10, 6))
sns.barplot(x=importances.values, y=importances.index)
plt.title("What breaks the machine?")
plt.xlabel("Mean decrease in impurity")
plt.tight_layout()
plt.show()
```
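As a cross-check on the impurity-based plot, scikit-learn's permutation_importance measures how much a held-out score drops when one feature's values are shuffled. This is a swapped-in technique, not what the notebook plots. The sketch uses synthetic data where only features 0 and 1 drive the label, so those should come out on top; the data and seeds are assumptions.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.inspection import permutation_importance
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
X_demo = rng.normal(size=(1000, 4))
# Synthetic label driven only by features 0 and 1; features 2 and 3 are pure noise
y_demo = (X_demo[:, 0] + X_demo[:, 1] > 0).astype(int)
X_tr, X_te, y_tr, y_te = train_test_split(X_demo, y_demo, test_size=0.3, random_state=0)
forest = RandomForestClassifier(n_estimators=100, random_state=0).fit(X_tr, y_tr)

# Shuffle each feature 10 times on the held-out set and record the mean drop in accuracy
result = permutation_importance(forest, X_te, y_te, n_repeats=10, random_state=0)
for i, (mean, std) in enumerate(zip(result.importances_mean, result.importances_std)):
    print(f"feature {i}: {mean:.3f} +/- {std:.3f}")
```

In the notebook, the analogous call would be permutation_importance(rf, X_test, y_test, scoring="roc_auc", n_repeats=10, random_state=42); its results have not been computed here.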