# 🧠 AI-Driven Predictive Maintenance for Water Utility Assets

This notebook trains a random forest classifier to predict asset failure events in a water utility context, using asset condition data such as vibration readings. The goal is to reduce downtime and support maintenance planning with data-driven insights.

## 📦 Import Required Libraries

In [None]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestClassifier
from sklearn.preprocessing import StandardScaler
from sklearn.metrics import classification_report, confusion_matrix
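import warnings

# Silence only FutureWarning noise (library deprecations) rather than every warning.
warnings.filterwarnings('ignore', category=FutureWarning)
sns.set_theme(style='whitegrid')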


## 📥 Load and Preview Dataset

In [None]:
df = pd.read_csv("../data/sample_asset_data_full.csv")
df.head()


## 🔍 Data Overview

In [None]:
df.info()


In [None]:
df.describe()


### 📊 Failure Distribution

In [None]:
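# Passing palette without hue is deprecated; the hue/legend form needs seaborn >= 0.13.
sns.countplot(x='Failure', hue='Failure', data=df, palette='Set2', legend=False)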
plt.title("Failure Class Distribution")
plt.xlabel("Failure (0 = No, 1 = Yes)")
plt.ylabel("Count")
plt.show()
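
The count plot shows the class balance visually; the same information as numbers helps when choosing metrics later. The sketch below uses a small hypothetical `failure` series as a stand-in for `df["Failure"]`, so the values are illustrative only.

```python
import pandas as pd

# Hypothetical stand-in for df["Failure"]; the real column comes from the CSV.
failure = pd.Series([0, 0, 0, 0, 0, 0, 0, 0, 1, 1], name="Failure")

# Absolute counts and relative shares per class, aligned on the class label.
counts = failure.value_counts().sort_index()
shares = failure.value_counts(normalize=True).sort_index()
summary = pd.DataFrame({"count": counts, "share": shares})
print(summary)

# Ratio of the majority class to the minority class.
imbalance_ratio = counts.max() / counts.min()
print(f"Imbalance ratio: {imbalance_ratio:.1f}")
```

A large ratio suggests that accuracy alone will be misleading and that per-class precision and recall deserve more attention.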


### 📦 Asset Conditions by Type

In [None]:
plt.figure(figsize=(12, 6))
sns.boxplot(x='Asset_Type', y='Vibration', data=df)
plt.title("Vibration Distribution by Asset Type")
plt.show()


## 📈 Correlation Analysis

In [None]:
plt.figure(figsize=(10, 6))
sns.heatmap(df.corr(numeric_only=True), annot=True, cmap='coolwarm')
plt.title("Correlation Heatmap")
plt.show()


## 🧹 Data Preprocessing & Encoding

In [None]:
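# One-hot encode categorical columns; Asset_ID is an identifier, not a feature.
df_encoded = pd.get_dummies(df.drop("Asset_ID", axis=1), drop_first=True)
X = df_encoded.drop("Failure", axis=1)
y = df_encoded["Failure"]

# Split first so the test set stays unseen while the scaler is fitted.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42)

# Fit the scaler on the training split only, then apply it to both splits.
scaler = StandardScaler()
X_train = scaler.fit_transform(X_train)
X_test = scaler.transform(X_test)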
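
One way to keep the scaler's statistics tied to the training split is to wrap scaling and the classifier in a scikit-learn `Pipeline`. The following is a minimal sketch on synthetic data: the `demo` frame, its column values, and the failure rule are assumptions made for illustration and are not taken from the notebook's dataset.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the asset data (hypothetical values and failure rule).
rng = np.random.default_rng(42)
n = 200
demo = pd.DataFrame({
    "Vibration": rng.normal(5.0, 1.5, n),
    "Temperature": rng.normal(60.0, 10.0, n),
    "Asset_Type": rng.choice(["Pump", "Valve", "Motor"], n),
})
demo["Failure"] = (demo["Vibration"] + rng.normal(0, 0.5, n) > 6.0).astype(int)

# Encode categoricals as floats so every feature column is numeric.
X_demo = pd.get_dummies(demo.drop("Failure", axis=1), drop_first=True, dtype=float)
y_demo = demo["Failure"]

X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.2, stratify=y_demo, random_state=42)

# The pipeline fits the scaler on X_tr only and reuses it when scoring X_te.
pipe = Pipeline([
    ("scaler", StandardScaler()),
    ("rf", RandomForestClassifier(n_estimators=100, random_state=42)),
])
pipe.fit(X_tr, y_tr)
test_accuracy = pipe.score(X_te, y_te)
print(f"Test accuracy: {test_accuracy:.3f}")
```

Tree ensembles such as random forests are largely insensitive to feature scaling, so the scaler matters mostly if the classifier is later swapped for a scale-sensitive model.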


## 🧠 Model Training: Random Forest Classifier

In [None]:
model = RandomForestClassifier(n_estimators=100, random_state=42)
model.fit(X_train, y_train)
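
A single train/test split gives one estimate of performance. Stratified cross-validation is a common way to check how stable that estimate is. The sketch below uses `make_classification` to generate a synthetic, imbalanced dataset as a stand-in for the asset data. It also sets `class_weight="balanced"`, which is an assumption about how one might handle rare failures, not a setting used in the notebook.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import StratifiedKFold, cross_val_score

# Synthetic, imbalanced stand-in data: roughly 20% positive (failure) samples.
X_demo, y_demo = make_classification(
    n_samples=300, n_features=6, n_informative=3,
    weights=[0.8, 0.2], random_state=42)

# Stratified folds keep the failure share similar in every fold.
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)
clf = RandomForestClassifier(
    n_estimators=100, class_weight="balanced", random_state=42)

# F1 on the positive class is more informative than accuracy for rare failures.
scores = cross_val_score(clf, X_demo, y_demo, cv=cv, scoring="f1")
print(f"F1 per fold: {scores.round(3)}")
print(f"Mean F1: {scores.mean():.3f} (std {scores.std():.3f})")
```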


## ✅ Model Evaluation

In [None]:
y_pred = model.predict(X_test)
print("Classification Report:")
print(classification_report(y_test, y_pred))


In [None]:
plt.figure(figsize=(6, 4))
sns.heatmap(confusion_matrix(y_test, y_pred), annot=True, fmt='d', cmap='Blues')
plt.title("Confusion Matrix")
plt.xlabel("Predicted")
plt.ylabel("Actual")
plt.show()
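
The classification report and confusion matrix evaluate hard 0/1 predictions at the default threshold. Threshold-independent scores computed from `predict_proba` can complement them. This is a minimal sketch on synthetic data from `make_classification`, not on the notebook's dataset.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import average_precision_score, roc_auc_score
from sklearn.model_selection import train_test_split

# Synthetic, imbalanced stand-in data (hypothetical, for illustration only).
X_demo, y_demo = make_classification(
    n_samples=400, n_features=6, n_informative=3,
    weights=[0.8, 0.2], random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.25, stratify=y_demo, random_state=0)

clf = RandomForestClassifier(n_estimators=100, random_state=0).fit(X_tr, y_tr)

# Column 1 of predict_proba holds the estimated probability of class 1 (failure).
failure_proba = clf.predict_proba(X_te)[:, 1]

roc_auc = roc_auc_score(y_te, failure_proba)
avg_precision = average_precision_score(y_te, failure_proba)
print(f"ROC AUC: {roc_auc:.3f}")
print(f"Average precision: {avg_precision:.3f}")
```

Average precision tends to be the more sensitive of the two when failures are rare, because it ignores true negatives.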


## 📌 Feature Importance

In [None]:
importances = model.feature_importances_
feature_names = X.columns
indices = np.argsort(importances)[::-1]

plt.figure(figsize=(10, 6))
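top = indices[:10]  # positions of the ten most important features
# Passing palette without hue is deprecated; the hue/legend form needs seaborn >= 0.13.
sns.barplot(x=importances[top], y=feature_names[top], hue=feature_names[top],
            palette='viridis', legend=False)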
plt.title("Top 10 Important Features")
plt.xlabel("Importance Score")
plt.ylabel("Feature")
plt.show()
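
The impurity-based `feature_importances_` scores are computed on the training data and can favor high-cardinality features. Permutation importance on held-out data is a common cross-check. The sketch below uses synthetic data in which only `Vibration` drives failure; the `Pressure` column and the failure rule are hypothetical.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.inspection import permutation_importance
from sklearn.model_selection import train_test_split

# Synthetic stand-in: failure depends on Vibration only (hypothetical rule).
rng = np.random.default_rng(0)
n = 400
X_demo = pd.DataFrame({
    "Vibration": rng.normal(5.0, 1.5, n),
    "Temperature": rng.normal(60.0, 10.0, n),
    "Pressure": rng.normal(3.0, 0.5, n),
})
y_demo = (X_demo["Vibration"] > 6.0).astype(int)

X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.25, stratify=y_demo, random_state=0)
clf = RandomForestClassifier(n_estimators=100, random_state=0).fit(X_tr, y_tr)

# Shuffle one column at a time on the test set and measure the drop in accuracy.
result = permutation_importance(clf, X_te, y_te, n_repeats=10, random_state=0)
ranking = pd.Series(
    result.importances_mean, index=X_demo.columns
).sort_values(ascending=False)
print(ranking)
```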


## 💾 Save Trained Model

In [None]:
from joblib import dump
dump(model, "../models/rf_model.pkl")
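
The saved model was trained on scaled features, so scoring new data later also requires the fitted scaler. One option, sketched below under assumptions, is to persist both objects in a single file. The bundle layout, the `rf_bundle.joblib` file name, and the synthetic data are hypothetical; a temporary directory is used so that the example does not touch `../models/`.

```python
import os
import tempfile

import numpy as np
from joblib import dump, load
from sklearn.ensemble import RandomForestClassifier
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in data (hypothetical).
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(100, 3))
y_demo = (X_demo[:, 0] > 0).astype(int)

scaler = StandardScaler().fit(X_demo)
model = RandomForestClassifier(n_estimators=50, random_state=0)
model.fit(scaler.transform(X_demo), y_demo)

with tempfile.TemporaryDirectory() as tmp_dir:
    # Hypothetical file name; store scaler and model together.
    path = os.path.join(tmp_dir, "rf_bundle.joblib")
    dump({"scaler": scaler, "model": model}, path)
    bundle = load(path)

# New readings must pass through the same scaler before prediction.
new_readings = rng.normal(size=(5, 3))
restored_pred = bundle["model"].predict(bundle["scaler"].transform(new_readings))
original_pred = model.predict(scaler.transform(new_readings))
print(np.array_equal(restored_pred, original_pred))
```

Files written by `joblib` are pickle-based, so they should only be loaded from trusted sources and with compatible library versions.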


## 📊 Final Insights & Operational Value

- High vibration and temperature levels are key indicators of potential asset failure.
- Predictive modeling can proactively inform **maintenance scheduling**, **SLA compliance checks**, and **contract scoring**.
- This AI approach supports smart water management strategies and aims to improve uptime, in line with results reported from national-scale deployments.
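
To illustrate how predictions could feed maintenance scheduling, the sketch below ranks assets by predicted failure probability so that the riskiest are inspected first. The asset IDs, sensor values, and the rule used to generate the training data are all hypothetical.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier

# Synthetic history (hypothetical): failure when vibration or temperature is high.
rng = np.random.default_rng(1)
n = 300
history = pd.DataFrame({
    "Vibration": rng.normal(5.0, 1.5, n),
    "Temperature": rng.normal(60.0, 10.0, n),
})
history["Failure"] = (
    (history["Vibration"] > 6.0) | (history["Temperature"] > 75.0)
).astype(int)

features = ["Vibration", "Temperature"]
clf = RandomForestClassifier(n_estimators=100, random_state=1)
clf.fit(history[features], history["Failure"])

# Hypothetical current readings for four assets.
current = pd.DataFrame({
    "Asset_ID": ["A-001", "A-002", "A-003", "A-004"],
    "Vibration": [4.2, 7.8, 5.1, 5.0],
    "Temperature": [55.0, 82.0, 61.0, 58.0],
})
current["failure_probability"] = clf.predict_proba(current[features])[:, 1]

# Highest predicted risk first: a simple inspection priority list.
schedule = current.sort_values(
    "failure_probability", ascending=False
).reset_index(drop=True)
print(schedule[["Asset_ID", "failure_probability"]])
```

In practice the ranking would be combined with asset criticality and crew availability; the probability alone is only one input.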
