# Predictive Maintenance for Smart Factory Equipment

**Copyright (c) 2024 Shrikara Kaudambady. All rights reserved.**

This notebook demonstrates how to build a predictive maintenance model using machine learning. The goal is to predict whether a piece of equipment will fail within a specific future time window, enabling proactive maintenance and reducing unplanned downtime in a smart factory setting.

### 1. Installing and Importing Libraries

In [None]:
!pip install pandas numpy scikit-learn matplotlib seaborn

In [None]:
import pandas as pd
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import classification_report, confusion_matrix, accuracy_score
import matplotlib.pyplot as plt
import seaborn as sns

print("Libraries imported successfully.")

### 2. Data Generation

For this demonstration, we'll generate a synthetic dataset that mimics sensor readings from multiple machines over time. The data is inspired by NASA's Turbofan Engine Degradation Simulation dataset (C-MAPSS), a widely used benchmark for remaining-useful-life prediction.

Each machine operates for a variable number of cycles before it fails. Every sensor reading drifts upward as the machine approaches failure, and higher-numbered sensors drift more strongly than lower-numbered ones.

In [None]:
def generate_synthetic_data(n_machines=100, max_cycles=250, n_sensors=5):
    data = []
    np.random.seed(42)
    for machine_id in range(1, n_machines + 1):
        # Each machine has a random lifespan
        lifespan = np.random.randint(120, max_cycles)
        # Generate sensor data for the lifespan of the machine
        for cycle in range(1, lifespan + 1):
            row = {'machine_id': machine_id, 'cycle': cycle}
            # Generate sensor readings
            for sensor in range(1, n_sensors + 1):
                # Add a degradation trend and noise
                base_reading = np.random.normal(100, 10)
                degradation = (cycle / lifespan) * np.random.uniform(10, 20) * (sensor * 0.5)
                noise = np.random.normal(0, 2)
                row[f'sensor_{sensor}'] = base_reading + degradation + noise
            data.append(row)
    return pd.DataFrame(data)

df = generate_synthetic_data()
print("Synthetic dataset created. Shape:", df.shape)
df.head()
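
The generator above builds each reading from a base value, a degradation term that grows with the fraction of life consumed, and noise. As a rough check on how strong the trend is for each sensor, the mean reading implied by those distributions can be written out. `expected_reading` is a hypothetical helper introduced here for illustration; it is not part of the original notebook.

```python
def expected_reading(cycle, lifespan, sensor):
    """Mean sensor value implied by the distributions in generate_synthetic_data.

    base  ~ Normal(100, 10)  -> mean 100
    scale ~ Uniform(10, 20)  -> mean 15
    noise ~ Normal(0, 2)     -> mean 0
    """
    life_fraction = cycle / lifespan
    return 100.0 + life_fraction * 15.0 * (sensor * 0.5)


# Mean reading at the final cycle of a machine, for each of the five sensors.
# The lifespan of 200 is an arbitrary illustrative value.
end_of_life_means = {s: expected_reading(200, 200, s) for s in range(1, 6)}
print(end_of_life_means)
```

Because the base value is redrawn every cycle with a standard deviation of 10, the average end-of-life drift of sensor 1 (7.5) is smaller than the cycle-to-cycle scatter, while sensor 5 drifts by 37.5 on average.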

### 3. Feature Engineering and Labeling

We need to calculate the Remaining Useful Life (RUL) for each machine. Then, we'll frame this as a classification problem: will the machine fail within the next `N` cycles? 

- `RUL`: For each row, RUL is the number of cycles remaining until that machine's final failure.
- `label`: A binary label. `1` if RUL <= `N` (e.g., 30 cycles), `0` otherwise.

In [None]:
# Calculate RUL
max_cycles = df.groupby('machine_id')['cycle'].max().reset_index()
max_cycles.columns = ['machine_id', 'max_cycle']
df = pd.merge(df, max_cycles, on='machine_id')
df['RUL'] = df['max_cycle'] - df['cycle']

# Create binary label
warning_window = 30
df['label'] = (df['RUL'] <= warning_window).astype(int)

# Drop helper columns
df = df.drop(columns=['max_cycle', 'RUL'])

print("Labels created. Value counts:")
print(df['label'].value_counts())
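
To make the labeling rule concrete, here is a small worked example on a toy machine. The data and the window of 2 cycles are hypothetical, and the sketch uses `groupby(...).transform('max')`, which should give the same RUL as the merge above without a helper table.

```python
import pandas as pd

# Hypothetical toy data: one machine that runs for 5 cycles
toy = pd.DataFrame({'machine_id': [1] * 5, 'cycle': [1, 2, 3, 4, 5]})

# RUL = last observed cycle of the machine minus the current cycle
toy['RUL'] = toy.groupby('machine_id')['cycle'].transform('max') - toy['cycle']

# With a warning window of 2, rows with RUL 2, 1 and 0 are labeled 1
toy_window = 2
toy['label'] = (toy['RUL'] <= toy_window).astype(int)
print(toy)
```

Since RUL reaches 0 on the final cycle and the comparison is `<=`, a window of `N` marks `N + 1` rows per machine as positive. With `warning_window = 30`, that is 31 positive rows per machine.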

### 4. Exploratory Data Analysis (EDA)

Let's visualize the sensor data for a single machine to see the degradation trend.

In [None]:
machine_to_plot = df[df['machine_id'] == 1]
plt.figure(figsize=(16, 6))
for i in range(1, 6):
    plt.plot(machine_to_plot['cycle'], machine_to_plot[f'sensor_{i}'], label=f'Sensor {i}')

plt.title('Sensor Readings for Machine 1 Over Time')
plt.xlabel('Cycle')
plt.ylabel('Sensor Value')
plt.legend()
plt.grid(True)
plt.show()
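
The raw curves are noisy, so the trend can be hard to see by eye. One common option is a per-machine rolling mean, sketched below. The helper name `add_rolling_means`, the window of 10 cycles, and the stand-in data are assumptions made for this illustration.

```python
import numpy as np
import pandas as pd


def add_rolling_means(frame, sensor_cols, window=10):
    """Return a copy with a per-machine rolling mean for each sensor column."""
    out = frame.sort_values(['machine_id', 'cycle']).copy()
    for col in sensor_cols:
        # Rolling windows are computed within each machine, never across machines
        out[f'{col}_rollmean'] = (
            out.groupby('machine_id')[col]
               .transform(lambda s: s.rolling(window, min_periods=1).mean())
        )
    return out


# Small stand-in frame (hypothetical values) so the sketch runs on its own
rng = np.random.default_rng(0)
demo = pd.DataFrame({
    'machine_id': [1] * 50,
    'cycle': np.arange(1, 51),
    'sensor_5': 100 + 0.75 * np.arange(1, 51) + rng.normal(0, 10, 50),
})
smoothed = add_rolling_means(demo, ['sensor_5'], window=10)
print(smoothed[['cycle', 'sensor_5', 'sensor_5_rollmean']].tail())
```

On the notebook's data, the call would be `add_rolling_means(df, features)` once `features` is defined, and the `_rollmean` columns could be plotted in place of the raw ones.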

### 5. Data Preparation for Modeling

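- Define features (`X`, the sensor columns) and target (`y`, the binary label).
- Split the rows into training (80%) and test (20%) sets, stratified by label.
- Scale the features using `StandardScaler`, fitted on the training set only so that no test-set statistics leak into training.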

In [None]:
features = [col for col in df.columns if 'sensor' in col]
X = df[features]
y = df['label']

# Split the data
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42, stratify=y)

# Scale the features
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)

print(f"Training data shape: {X_train_scaled.shape}")
print(f"Test data shape: {X_test_scaled.shape}")
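
The split above is made at the row level, so cycles from the same machine can land in both the training and the test set. If the aim is to estimate performance on machines the model has never seen, one alternative is to split by `machine_id`. The sketch below uses scikit-learn's `GroupShuffleSplit` on a hypothetical stand-in frame; it is an alternative technique, not the notebook's method.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import GroupShuffleSplit

# Hypothetical stand-in for df: 10 machines with 20 cycles each
rng = np.random.default_rng(0)
demo = pd.DataFrame({
    'machine_id': np.repeat(np.arange(1, 11), 20),
    'sensor_1': rng.normal(100, 10, 200),
    'label': np.tile([0] * 15 + [1] * 5, 10),
})

# Hold out 20% of the machines rather than 20% of the rows
splitter = GroupShuffleSplit(n_splits=1, test_size=0.2, random_state=42)
train_idx, test_idx = next(
    splitter.split(demo[['sensor_1']], demo['label'], groups=demo['machine_id'])
)

train_machines = set(demo['machine_id'].iloc[train_idx])
test_machines = set(demo['machine_id'].iloc[test_idx])
print("Machines in both sets:", train_machines & test_machines)
```

`GroupShuffleSplit` does not stratify by label, so the class ratio in each set may differ slightly from the overall ratio.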

### 6. Training the Classification Model

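We'll use a Random Forest classifier, an ensemble of decision trees that captures nonlinear relationships between sensors and usually works reasonably well without much tuning. Tree-based models are insensitive to feature scale, so the scaling from the previous step is not strictly required here.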

In [None]:
model = RandomForestClassifier(n_estimators=100, random_state=42, n_jobs=-1)
model.fit(X_train_scaled, y_train)

print("Random Forest model trained successfully.")
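
Only the last cycles of each machine carry label 1, so the classes are imbalanced. One option worth trying is `class_weight='balanced'`, a documented `RandomForestClassifier` parameter. The sketch below shows the weights that setting implies for a hypothetical label vector with a 15% minority class; that ratio is an assumption, not a figure taken from the notebook's output.

```python
import numpy as np
from sklearn.utils.class_weight import compute_class_weight

# Hypothetical label vector with the minority class at 15%
y_demo = np.array([0] * 85 + [1] * 15)

# 'balanced' weights follow n_samples / (n_classes * count_per_class)
weights = compute_class_weight('balanced', classes=np.array([0, 1]), y=y_demo)
print(dict(zip([0, 1], weights)))
```

Passing `class_weight='balanced'` to `RandomForestClassifier` applies this weighting during training. Whether it helps on this dataset is something to check against the classification report rather than assume.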

### 7. Evaluating Model Performance

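We will evaluate the model on the held-out test rows using a confusion matrix and a classification report. Because label 1 is the minority class, precision and recall for that class are more informative than overall accuracy.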

In [None]:
y_pred = model.predict(X_test_scaled)

print("Model Accuracy:", accuracy_score(y_test, y_pred))
print("\nClassification Report:")
print(classification_report(y_test, y_pred))

# Plotting the confusion matrix
cm = confusion_matrix(y_test, y_pred)
plt.figure(figsize=(8, 6))
class_names = ['Will Not Fail', 'Will Fail']
sns.heatmap(cm, annot=True, fmt='d', cmap='Blues',
            xticklabels=class_names, yticklabels=class_names)
plt.title('Confusion Matrix')
plt.xlabel('Predicted Label')
plt.ylabel('True Label')
plt.show()
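
`model.predict` assigns label 1 when the forest's predicted probability for that class is the larger of the two. In maintenance settings a missed failure often costs more than a false alarm, so it can be useful to lower the decision threshold and inspect the trade-off. The sketch below does this with `predict_proba` on hypothetical data from `make_classification`; the thresholds 0.5 and 0.3 are illustrative choices.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import precision_score, recall_score
from sklearn.model_selection import train_test_split

# Hypothetical imbalanced data standing in for the sensor features
X_demo, y_demo = make_classification(
    n_samples=2000, n_features=5, n_informative=3, n_redundant=0,
    weights=[0.85, 0.15], random_state=0,
)
X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.2, random_state=0, stratify=y_demo
)

clf = RandomForestClassifier(n_estimators=100, random_state=0).fit(X_tr, y_tr)
proba_fail = clf.predict_proba(X_te)[:, 1]  # predicted probability of class 1

recalls = {}
for threshold in (0.5, 0.3):
    pred = (proba_fail >= threshold).astype(int)
    recalls[threshold] = recall_score(y_te, pred)
    precision = precision_score(y_te, pred, zero_division=0)
    print(f"threshold={threshold}: precision={precision:.2f}, recall={recalls[threshold]:.2f}")
```

Lowering the threshold can only keep or increase recall for class 1, typically at the cost of precision. The right operating point depends on the relative cost of missed failures and unnecessary inspections.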

### 8. Feature Importance

Let's see which sensor features contributed most to the model's predictions, using the forest's impurity-based `feature_importances_`.

In [None]:
feature_importances = pd.Series(model.feature_importances_, index=features).sort_values(ascending=False)

plt.figure(figsize=(10, 6))
sns.barplot(x=feature_importances, y=feature_importances.index)
plt.title('Feature Importances')
plt.xlabel('Importance Score')
plt.ylabel('Features')
plt.show()
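
Impurity-based importances are computed from the training data. A complementary check is permutation importance on held-out data, which measures how much the score drops when one feature is shuffled. The sketch below uses `sklearn.inspection.permutation_importance` on hypothetical data; the feature names are reused only for illustration.

```python
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.inspection import permutation_importance
from sklearn.model_selection import train_test_split

# Hypothetical data standing in for the five sensor features
names = [f'sensor_{i}' for i in range(1, 6)]
X_demo, y_demo = make_classification(
    n_samples=1000, n_features=5, n_informative=3, n_redundant=0, random_state=0
)
X_tr, X_te, y_tr, y_te = train_test_split(X_demo, y_demo, test_size=0.2, random_state=0)

clf = RandomForestClassifier(n_estimators=50, random_state=0).fit(X_tr, y_tr)

# Shuffle each feature 5 times on the test set and record the mean drop in accuracy
result = permutation_importance(clf, X_te, y_te, n_repeats=5, random_state=0)
perm = pd.Series(result.importances_mean, index=names).sort_values(ascending=False)
print(perm)
```

In the notebook, the equivalent call would be `permutation_importance(model, X_test_scaled, y_test, ...)` with `features` as the index.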

### 9. Conclusion

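This notebook demonstrates a complete workflow for building a predictive maintenance model: generating sensor data, labeling it by remaining useful life, training a Random Forest classifier, and evaluating it. The figures to judge the classifier by are the precision and recall for the 'failure imminent' class (label 1) in the classification report, since accuracy alone can look high while failures are still missed.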

A model like this could be deployed in a smart factory to monitor equipment in real time. By scoring live sensor data, it can raise an alert when a machine is predicted to fail within the next 30 cycles, giving the maintenance team time to act before the failure occurs. This reduces costly unplanned downtime and helps schedule maintenance more efficiently.
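
As a minimal sketch of such an alerting step, the function below scores a single reading and compares the predicted failure probability to a threshold. `should_alert`, `SENSOR_NAMES`, the threshold, and the training data are all hypothetical; a real deployment would load the fitted `scaler` and `model` from this notebook instead.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.preprocessing import StandardScaler

SENSOR_NAMES = [f'sensor_{i}' for i in range(1, 6)]


def should_alert(model, scaler, reading, threshold=0.5):
    """Return True when the predicted failure probability meets the threshold.

    `reading` is a dict mapping each sensor name to its latest value.
    """
    row = np.array([[reading[name] for name in SENSOR_NAMES]])
    proba_fail = model.predict_proba(scaler.transform(row))[0, 1]
    return bool(proba_fail >= threshold)


# Hypothetical training data: healthy readings near 100, failing readings near 130
rng = np.random.default_rng(0)
healthy = rng.normal(100, 5, size=(200, 5))
failing = rng.normal(130, 5, size=(200, 5))
X_demo = np.vstack([healthy, failing])
y_demo = np.array([0] * 200 + [1] * 200)

demo_scaler = StandardScaler().fit(X_demo)
demo_model = RandomForestClassifier(n_estimators=50, random_state=0)
demo_model.fit(demo_scaler.transform(X_demo), y_demo)

latest = {name: 131.0 for name in SENSOR_NAMES}
print(should_alert(demo_model, demo_scaler, latest))
```

The same scaler that was fitted at training time must be applied to live readings, and the sensor order must match the order of the training columns.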
