# Paint Manufacturing Quality Analysis

## Mission
Use first-principles analysis and systems thinking to identify the root causes of the paint production quality crisis: the pass rate fell from 99% to 67% after automation.

## Approach
1. **First Principles Decomposition**: Break down quality failures into fundamental components
2. **Systems Thinking**: Map relationships and interactions between components
3. **Business Impact**: Focus on actionable insights with quantified impact

## Key Questions to Answer
- What are the top 3 failure drivers?
- Which dosing stations need immediate attention?
- What's the optimal operating range for temperature?
- How does recipe complexity interact with other factors?
- If the plant manager could only fix ONE thing tomorrow, what should it be?

In [None]:
# Import required libraries
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
import plotly.express as px
import plotly.graph_objects as go
from plotly.subplots import make_subplots
from scipy import stats
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score, classification_report, confusion_matrix
from sklearn.preprocessing import StandardScaler
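import warnings

# Silence only noisy deprecation notices; a blanket 'ignore' would also hide
# warnings that matter for this analysis (e.g. model convergence problems).
warnings.filterwarnings('ignore', category=FutureWarning)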

# Set plotting style
plt.style.use('default')
sns.set_palette("husl")

# Configure pandas display
pd.set_option('display.max_columns', None)
pd.set_option('display.width', None)
pd.set_option('display.max_colwidth', None)

## Phase 1: Data Loading and Initial Validation

In [None]:
# Load the paint production data
df = pd.read_csv('../data/paint_production_data.csv')

print("Dataset Shape:", df.shape)
print("\nColumn Information:")
# info() prints its report directly and returns None, so it is not wrapped in print()
df.info()
print("\nFirst few rows:")
df.head()
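
The loading cell above reports shape and dtypes but does not yet check anything. A minimal sketch of an initial validation step follows, assuming the analysis depends on a known set of columns. The helper name `validate_frame` and the column names in the toy frame are hypothetical and are not taken from the actual dataset.

```python
import pandas as pd


def validate_frame(df: pd.DataFrame, required_columns: list[str]) -> list[str]:
    """Return a list of human-readable problems; an empty list means all checks passed."""
    problems = []
    if df.empty:
        problems.append("dataset has no rows")
    # Columns the later analysis expects but the file does not contain.
    missing = [col for col in required_columns if col not in df.columns]
    if missing:
        problems.append(f"missing columns: {missing}")
    # Columns that are entirely null usually point to a parsing or export issue.
    all_null = [col for col in df.columns if df[col].isna().all()]
    if all_null and not df.empty:
        problems.append(f"columns with no values: {all_null}")
    return problems


# Toy frame with hypothetical column names, for illustration only.
toy = pd.DataFrame({"batch_id": [1, 2, 3], "temperature": [None, None, None]})
print(validate_frame(toy, ["batch_id", "passed"]))
```

In the notebook this could be called as `validate_frame(df, [...])` with the real column list, stopping early if the returned list is non-empty.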

## Phase 2: Data Understanding and Quality Assessment

In [None]:
# Basic data quality checks
print("Missing Values:")
print(df.isnull().sum())
print("\nDuplicate Rows:", df.duplicated().sum())
print("\nUnique Values per Column:")
print(df.nunique())
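
The separate printouts above can be hard to compare side by side. As one possible alternative, the sketch below gathers the same checks into a single per-column table. The helper name `summarize_quality` and the toy column names are hypothetical; the real dataset's columns may differ.

```python
import pandas as pd


def summarize_quality(df: pd.DataFrame) -> pd.DataFrame:
    """Build a one-row-per-column summary of dtype, missingness, and cardinality."""
    summary = pd.DataFrame({
        "dtype": df.dtypes.astype(str),
        "missing": df.isna().sum(),
        "missing_pct": (df.isna().mean() * 100).round(2),
        "unique": df.nunique(),  # NaN is not counted as a distinct value
    })
    # Worst columns first, so gaps are visible without scrolling.
    return summary.sort_values("missing", ascending=False)


# Toy frame with hypothetical column names, for illustration only.
toy = pd.DataFrame({
    "station": ["A", "B", "A", None],
    "temperature": [21.5, 22.0, None, None],
})
print(summarize_quality(toy))
```

Sorting by missing count is a design choice that surfaces the most incomplete columns first; sorting by `unique` instead would help spot constant or identifier-like columns.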