# Smart Fault Detection System for Electrical Transformers

## Project Overview
This notebook demonstrates an AI-powered predictive maintenance system for electrical transformers. The system analyzes sensor data to flag developing faults before they lead to equipment failure, helping power companies reduce downtime and avoid costly repairs or replacements.

**Business Impact:**
- Early fault detection reduces unplanned downtime by up to 70%
- Prevents expensive transformer failures (cost: $500K - $2M per unit)
- Enables condition-based maintenance scheduling
- Improves grid reliability and power quality

**Technical Approach:**
- Machine Learning Classification (Random Forest)
- Multi-class prediction: Normal, Warning, Critical
- Real-time sensor data analysis
- Feature engineering for time-series patterns
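
As a rough illustration of the approach listed above, the sketch below derives trailing-window features from two sensor channels and fits a Random Forest on them. It runs on synthetic data: the column names `oil_temperature` and `load_current`, the helper `add_rolling_features`, the six-sample window, and the tercile-based labels are all assumptions made for this example, not details of the actual dataset or pipeline.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier


def add_rolling_features(frame, columns, window=6):
    """Append trailing-window mean, std, and first-difference columns (hypothetical helper)."""
    out = frame.copy()
    for col in columns:
        rolled = out[col].rolling(window=window, min_periods=1)
        out[f"{col}_roll_mean"] = rolled.mean()
        # The std of a single observation is undefined, so the first row is set to 0
        out[f"{col}_roll_std"] = rolled.std().fillna(0.0)
        out[f"{col}_delta"] = out[col].diff().fillna(0.0)
    return out


# Synthetic stand-in for the sensor data (assumed column names)
rng = np.random.default_rng(42)
n = 300
demo = pd.DataFrame({
    "oil_temperature": 60 + np.cumsum(rng.normal(0, 0.5, n)),
    "load_current": 100 + rng.normal(0, 5, n),
})

# Toy labels from temperature terciles, for illustration only
demo["fault_status"] = pd.qcut(
    demo["oil_temperature"], q=3, labels=["Normal", "Warning", "Critical"]
).astype(str)

features = add_rolling_features(demo, ["oil_temperature", "load_current"])
X = features.drop(columns="fault_status")
y = features["fault_status"]

# Chronological split: train on the first 80% of rows, evaluate on the rest
split = int(n * 0.8)
clf = RandomForestClassifier(n_estimators=50, random_state=42)
clf.fit(X.iloc[:split], y.iloc[:split])
pred = clf.predict(X.iloc[split:])
```

The rolling windows are trailing, so each feature uses only current and past readings. Splitting by time rather than at random keeps the evaluation rows strictly after the training rows.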

## 1. Import Libraries and Setup

In [None]:
# Data manipulation and analysis
import pandas as pd
import numpy as np
from datetime import datetime, timedelta

# Visualization libraries
import matplotlib.pyplot as plt
import seaborn as sns

# Machine Learning
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import StandardScaler, LabelEncoder
from sklearn.metrics import (
    classification_report, confusion_matrix, accuracy_score,
    roc_auc_score, roc_curve, f1_score, precision_score, recall_score
)

# Model persistence
import joblib

# Suppress library warnings to keep the notebook output readable
import warnings
warnings.filterwarnings('ignore')

# Set style for better plots
plt.style.use('default')
sns.set_palette("husl")

print("✅ All libraries imported successfully!")
print(f"📊 Analysis started at: {datetime.now().strftime('%Y-%m-%d %H:%M:%S')}")

## 2. Data Loading and Initial Exploration

In [None]:
# Load the transformer sensor data; parse timestamps so min/max compare chronologically
df = pd.read_csv('../data/transformer_sensor_data.csv', parse_dates=['timestamp'])

print("🔍 Dataset Overview:")
print(f"Dataset shape: {df.shape}")
print(f"Time period: {df['timestamp'].min()} to {df['timestamp'].max()}")
print("\n📈 Fault Status Distribution:")
print(df['fault_status'].value_counts())
print("\n📊 Fault Status Percentages:")
print(df['fault_status'].value_counts(normalize=True) * 100)

# Display first few rows
print("\n🔍 First 5 rows of the dataset:")
df.head()
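
If the distribution printed above turns out to be skewed toward one class, per-class weights are one common way to compensate. The sketch below computes "balanced" weights with scikit-learn's `compute_class_weight`. The 80/15/5 label counts are invented for illustration and do not come from the dataset.

```python
import numpy as np
import pandas as pd
from sklearn.utils.class_weight import compute_class_weight

# Hypothetical, deliberately skewed labels (not taken from the real data)
labels = pd.Series(
    ["Normal"] * 80 + ["Warning"] * 15 + ["Critical"] * 5, name="fault_status"
)
y = labels.to_numpy()

# "balanced" weight for a class = n_samples / (n_classes * class_count)
classes = np.unique(y)
weights = compute_class_weight(class_weight="balanced", classes=classes, y=y)
class_weights = dict(zip(classes, weights))

for name, weight in class_weights.items():
    print(f"{name}: {weight:.3f}")
```

The resulting dictionary can be passed to `RandomForestClassifier(class_weight=...)`. Passing `class_weight="balanced"` directly applies the same formula to the training labels.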

In [None]:
# Data types and missing values
print("📋 Data Info:")
df.info()  # prints the summary itself and returns None, so it is not wrapped in print()
print("\n🔍 Missing Values:")
print(df.isnull().sum())
print("\n📊 Statistical Summary:")
df.describe()
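
If the missing-value check above reports gaps in the sensor columns, one possible treatment is time-based interpolation with a cap on how many consecutive readings are filled. The sketch below uses a small invented hourly series. The column name `oil_temperature`, the timestamps, and the two-reading cap are assumptions, and the right strategy depends on the actual sensors.

```python
import numpy as np
import pandas as pd

# Invented hourly readings with one short gap and one long gap
idx = pd.date_range("2024-01-01", periods=8, freq="h")
readings = pd.DataFrame(
    {"oil_temperature": [60.0, np.nan, 62.0, np.nan, np.nan, np.nan, np.nan, 70.0]},
    index=idx,
)

# Interpolate along the DatetimeIndex, filling at most 2 consecutive NaNs per gap;
# the remainder of a longer gap stays NaN so it can be reviewed or dropped
filled = readings.interpolate(method="time", limit=2, limit_area="inside")

print(filled)
print("Remaining missing values:", int(filled["oil_temperature"].isna().sum()))
```

`method="time"` requires a `DatetimeIndex`, so with the notebook's data the `timestamp` column would first need to be parsed as datetimes and set as the index.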