# Industrial Manufacturing KPI Prediction

This notebook demonstrates a complete predictive analytics workflow for an industrial Quality KPI.

**Steps:**
- Load synthetic manufacturing data (production volume, downtime, scrap rate, line speed, cycle time, operator load)
- Explore correlations
- Train a Random Forest regression model
- Evaluate model performance (MAE, R²)
- Inspect feature importance
- Visualize actual vs predicted Quality_KPI


In [None]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_absolute_error, r2_score

# Display all columns
pd.set_option('display.max_columns', None)


In [None]:
# Load data (replace the path with your own if needed)
df = pd.read_csv('industrial_kpi_data.csv')
df.head()
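If `industrial_kpi_data.csv` is not available, a stand-in frame with a similar shape lets the rest of the notebook run. The sketch below is hypothetical. Only `Quality_KPI` is confirmed by this notebook. The other column names, units, and the linear relationship used to build the target are illustrative assumptions, not the real dataset's schema.

```python
import os

import numpy as np
import pandas as pd


def make_synthetic_kpi_data(n_rows=500, seed=42):
    """Generate a HYPOTHETICAL stand-in for industrial_kpi_data.csv.

    Column names (except Quality_KPI), units and the target formula are
    assumptions for illustration only.
    """
    rng = np.random.default_rng(seed)
    df = pd.DataFrame({
        'Production_Volume': rng.normal(1000, 150, n_rows),  # units per shift (assumed)
        'Downtime': rng.exponential(30, n_rows),             # minutes per shift (assumed)
        'Scrap_Rate': rng.uniform(0.01, 0.08, n_rows),        # fraction of units scrapped
        'Line_Speed': rng.normal(60, 8, n_rows),              # units per minute (assumed)
        'Cycle_Time': rng.normal(45, 5, n_rows),              # seconds (assumed)
        'Operator_Load': rng.uniform(0.5, 1.0, n_rows),       # utilisation fraction
    })
    # Assumed relationship: quality falls with scrap, downtime and operator load
    noise = rng.normal(0, 1.5, n_rows)
    df['Quality_KPI'] = (
        95
        - 120 * df['Scrap_Rate']
        - 0.05 * df['Downtime']
        - 8 * df['Operator_Load']
        + 0.02 * df['Line_Speed']
        + noise
    )
    return df


# Use the real file when present, otherwise fall back to the synthetic stand-in
path = 'industrial_kpi_data.csv'
df = pd.read_csv(path) if os.path.exists(path) else make_synthetic_kpi_data()
print(df.shape)
```

Because the generator uses a fixed seed, the fallback data is reproducible across runs.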


In [None]:
# Summary statistics (count, mean, std, min/max, quartiles) for numeric columns
df.describe()
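`describe()` does not report data types or how many values are missing, both of which matter before fitting a model (scikit-learn's `RandomForestRegressor` may reject `NaN` inputs depending on the installed version). A small per-column quality report is one way to check this. The helper name `data_quality_report` and the tiny sample frame are hypothetical, included only to show the output.

```python
import numpy as np
import pandas as pd


def data_quality_report(df):
    """Summarise dtype, missing values and distinct values for each column."""
    return pd.DataFrame({
        'dtype': df.dtypes.astype(str),
        'n_missing': df.isna().sum(),
        'pct_missing': (df.isna().mean() * 100).round(1),
        'n_unique': df.nunique(),  # NaN is not counted as a distinct value
    })


# Hypothetical sample with one gap, just to illustrate the report
sample = pd.DataFrame({
    'Downtime': [12.0, np.nan, 30.5, 8.0],
    'Scrap_Rate': [0.02, 0.03, 0.05, 0.02],
    'Quality_KPI': [91.2, 89.7, 85.1, 92.4],
})
report = data_quality_report(sample)
print(report)
```

On the notebook's data this would be called as `data_quality_report(df)`.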


In [None]:
# Correlation heatmap
corr = df.corr(numeric_only=True)
fig, ax = plt.subplots(figsize=(6, 4))
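# Diverging colormap pinned to [-1, 1] so colors map consistently to correlation strength
im = ax.imshow(corr.values, cmap='coolwarm', vmin=-1, vmax=1)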
ax.set_xticks(range(len(corr.columns)))
ax.set_yticks(range(len(corr.columns)))
ax.set_xticklabels(corr.columns, rotation=45, ha='right')
ax.set_yticklabels(corr.columns)
fig.colorbar(im, ax=ax)
ax.set_title('Correlation Heatmap')
plt.tight_layout()
plt.show()
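The heatmap shows every pair, but for prediction the row that matters most is each feature's correlation with `Quality_KPI`. The sketch below ranks features by absolute Pearson correlation with the target. The demo frame and its column names are hypothetical. Pearson correlation only captures linear relationships, so a low value does not rule out a feature the Random Forest can still use.

```python
import numpy as np
import pandas as pd


def target_correlations(df, target='Quality_KPI'):
    """Pearson correlation of each numeric column with target, strongest first."""
    corr = df.corr(numeric_only=True)[target].drop(target)
    return corr.reindex(corr.abs().sort_values(ascending=False).index)


# Hypothetical demo: Scrap_Rate drives the KPI, Line_Speed is unrelated
rng = np.random.default_rng(0)
n = 300
scrap = rng.uniform(0.01, 0.08, n)
speed = rng.normal(60, 8, n)
demo = pd.DataFrame({
    'Scrap_Rate': scrap,
    'Line_Speed': speed,
    'Quality_KPI': 95 - 150 * scrap + rng.normal(0, 0.5, n),
})
ranked = target_correlations(demo)
print(ranked)
```

Applied to the notebook's data, `target_correlations(df)` lists the features most linearly associated with the KPI first.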


In [None]:
# Separate features from the target, hold out 20% for testing, train, and evaluate
X = df.drop('Quality_KPI', axis=1)
y = df['Quality_KPI']

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

model = RandomForestRegressor(
    n_estimators=200,
    max_depth=8,
    random_state=42
)
model.fit(X_train, y_train)
pred = model.predict(X_test)

mae = mean_absolute_error(y_test, pred)
r2 = r2_score(y_test, pred)
print('MAE:', round(mae, 3))
print('R²:', round(r2, 3))
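A single 80/20 split gives one noisy estimate, and an MAE value means little without a reference point. One common check, sketched below, is 5-fold cross-validation comparing the Random Forest against a mean-predicting baseline and a linear model. The data here is a hypothetical synthetic stand-in with assumed column names. On the notebook's data, the same loop would run on `X` and `y`.

```python
import numpy as np
import pandas as pd
from sklearn.dummy import DummyRegressor
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import KFold, cross_val_score

# Hypothetical synthetic data (column names and formula are assumptions)
rng = np.random.default_rng(42)
n = 400
X = pd.DataFrame({
    'Scrap_Rate': rng.uniform(0.01, 0.08, n),
    'Downtime': rng.exponential(30, n),
    'Operator_Load': rng.uniform(0.5, 1.0, n),
})
y = (95 - 120 * X['Scrap_Rate'] - 0.05 * X['Downtime']
     - 8 * X['Operator_Load'] + rng.normal(0, 1.5, n))

models = {
    'Mean baseline': DummyRegressor(strategy='mean'),
    'Linear Regression': LinearRegression(),
    'Random Forest': RandomForestRegressor(n_estimators=200, max_depth=8, random_state=42),
}
cv = KFold(n_splits=5, shuffle=True, random_state=42)
results = {}
for name, est in models.items():
    # scikit-learn maximises scores, so MAE is exposed as its negative
    scores = -cross_val_score(est, X, y, cv=cv, scoring='neg_mean_absolute_error')
    results[name] = (scores.mean(), scores.std())
    print(f'{name:18s} MAE = {scores.mean():.3f} ± {scores.std():.3f}')
```

A model is only useful if its MAE is clearly below the mean baseline. On this linear synthetic target, Linear Regression may match or beat the forest, which is exactly the kind of result the comparison is meant to surface.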


In [None]:
# Feature importance
fi = pd.Series(model.feature_importances_, index=X.columns)
fig, ax = plt.subplots(figsize=(6, 4))
fi.sort_values().plot(kind='barh', ax=ax)
ax.set_title('Feature Importance')
ax.set_xlabel('Importance')
plt.tight_layout()
plt.show()
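`feature_importances_` is impurity-based. It is computed on the training data and can overstate features with many distinct values. Permutation importance on the held-out set is a common complement: shuffle one column at a time and measure how much the test score drops. The sketch below uses scikit-learn's `permutation_importance` on hypothetical synthetic data that includes a deliberately irrelevant `Random_Noise` column.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestRegressor
from sklearn.inspection import permutation_importance
from sklearn.model_selection import train_test_split

# Hypothetical synthetic data (column names and formula are assumptions)
rng = np.random.default_rng(7)
n = 500
X = pd.DataFrame({
    'Scrap_Rate': rng.uniform(0.01, 0.08, n),
    'Downtime': rng.exponential(30, n),
    'Random_Noise': rng.normal(0, 1, n),  # deliberately unrelated to the target
})
y = 95 - 120 * X['Scrap_Rate'] - 0.05 * X['Downtime'] + rng.normal(0, 1.0, n)

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
model = RandomForestRegressor(n_estimators=200, max_depth=8, random_state=42).fit(X_train, y_train)

# Shuffle each test column 10 times and record the mean drop in R²
# (R² is the default score for regressors)
result = permutation_importance(model, X_test, y_test, n_repeats=10, random_state=42)
perm_fi = pd.Series(result.importances_mean, index=X.columns).sort_values(ascending=False)
print(perm_fi)
```

A feature whose permutation importance is near zero (or slightly negative, as `Random_Noise` may be) contributes little on unseen data, even if its impurity-based importance looks non-trivial.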


In [None]:
# Actual vs Predicted
fig, ax = plt.subplots(figsize=(6, 4))
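ax.scatter(y_test, pred, alpha=0.6)
# Dashed y = x reference line: points on it are perfect predictions
lims = [min(y_test.min(), pred.min()), max(y_test.max(), pred.max())]
ax.plot(lims, lims, 'k--', linewidth=1)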
ax.set_xlabel('Actual Quality_KPI')
ax.set_ylabel('Predicted Quality_KPI')
ax.set_title('Actual vs Predicted Quality_KPI')
plt.tight_layout()
plt.show()
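The scatter plot shows overall fit, but tree ensembles such as Random Forest tend to pull extreme predictions toward the training mean. Splitting residuals by quantile of the actual KPI makes that pattern measurable. The helper `residual_summary` is a hypothetical sketch, and the demo uses illustrative predictions that shrink toward the mean by construction. On the notebook's results it would be called as `residual_summary(y_test, pred)`.

```python
import numpy as np
import pandas as pd


def residual_summary(y_true, y_pred, n_bins=4):
    """Bias (mean residual) and MAE per quantile bin of the actual values.

    Residual is defined as actual - predicted, so a negative bias means
    the model over-predicts within that bin.
    """
    frame = pd.DataFrame({'actual': np.asarray(y_true), 'pred': np.asarray(y_pred)})
    frame['residual'] = frame['actual'] - frame['pred']
    frame['bin'] = pd.qcut(frame['actual'], q=n_bins)
    return frame.groupby('bin', observed=True)['residual'].agg(
        bias='mean',
        mae=lambda r: r.abs().mean(),
        count='count',
    )


# Illustrative predictions that pull every value 30% of the way toward 85
rng = np.random.default_rng(1)
actual = rng.normal(85, 4, 400)
predicted = 85 + 0.7 * (actual - 85)
summary = residual_summary(actual, predicted)
print(summary)
```

In this demo, the lowest bin shows negative bias (low KPIs over-predicted) and the highest bin shows positive bias (high KPIs under-predicted), the signature of shrinkage toward the mean.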


### Notes
- The dataset is synthetic and built to mimic real manufacturing behavior, so the metrics above describe this simulated data rather than any specific production line.
- You can extend this notebook by adding:
  - Hyperparameter tuning (GridSearchCV / RandomizedSearchCV)
  - Additional KPIs
  - Model comparison (Random Forest vs XGBoost vs Linear Regression)
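As a starting point for the hyperparameter-tuning extension, the sketch below runs `RandomizedSearchCV` over a few Random Forest parameters, scoring by MAE. The search ranges, iteration count, and synthetic data (with hypothetical column names) are assumptions chosen to keep the run short, not recommended settings.

```python
import numpy as np
import pandas as pd
from scipy.stats import randint
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import RandomizedSearchCV

# Hypothetical synthetic data (column names and formula are assumptions)
rng = np.random.default_rng(3)
n = 200
X = pd.DataFrame({
    'Scrap_Rate': rng.uniform(0.01, 0.08, n),
    'Downtime': rng.exponential(30, n),
    'Operator_Load': rng.uniform(0.5, 1.0, n),
})
y = (95 - 120 * X['Scrap_Rate'] - 0.05 * X['Downtime']
     - 8 * X['Operator_Load'] + rng.normal(0, 1.5, n))

# Illustrative search space: integer ranges are sampled, lists are picked from
param_distributions = {
    'n_estimators': randint(50, 200),
    'max_depth': [4, 6, 8, 12, None],
    'min_samples_leaf': randint(1, 10),
    'max_features': [1.0, 'sqrt', 0.5],
}
search = RandomizedSearchCV(
    RandomForestRegressor(random_state=42),
    param_distributions=param_distributions,
    n_iter=8,
    cv=3,
    scoring='neg_mean_absolute_error',
    random_state=42,
)
search.fit(X, y)
print('Best params:', search.best_params_)
print('CV MAE:', round(-search.best_score_, 3))
```

In the notebook, the search should be fit on `X_train, y_train` only, with `search.best_estimator_` then evaluated once on `X_test`, so the reported test MAE is not biased by the tuning.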
