# Synthetic Warranty Claim Pipeline (Demonstration Version)

This notebook presents a **clean, modular, synthetic demonstration** of a warranty-claim
processing pipeline, inspired by a real-world automotive DS/AI workflow.

All data shown here is **synthetic**, but the overall pipeline structure mirrors a genuine
industrial process:

1. Load synthetic warranty claims  
2. Clean & normalize fields  
3. Engineer features  
4. Classify claims with rules  
5. Export results  

The goal of this notebook is to showcase:
- Data engineering thinking  
- Pipeline modularization  
- Clear documentation  
- Visual explanations  
- Predictive rule logic  

The design shown in this notebook was used as an end-to-end pipeline by a Quality Management Engineering team, followed by a downstream Power BI analysis and report.


In [None]:
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

from pipeline.load_data import load_synthetic_data
from pipeline.clean_normalize import clean_data
from pipeline.feature_engineering import build_features
from pipeline.classifier import classify
from pipeline.exporter import export_results

# set_theme is the current name; sns.set is kept only as an alias
sns.set_theme(style="whitegrid")


In [None]:
df_raw = load_synthetic_data(n=300)
df_raw.head()
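
The `pipeline.load_data` module is not shown in this notebook. As a rough illustration only, a generator of this kind might look like the sketch below. The `mileage_km` and `component` columns come from the plots in this notebook; the `seed` parameter, the `claim_id` and `months_in_service` columns, the component labels, and the value ranges are all assumptions made for the example.

```python
import numpy as np
import pandas as pd

# Hypothetical component labels; the real module's categories are not shown.
COMPONENTS = ["engine", "transmission", "brakes", "electronics"]


def load_synthetic_data(n: int = 300, seed: int = 42) -> pd.DataFrame:
    """Generate n synthetic warranty claims (illustrative sketch only)."""
    rng = np.random.default_rng(seed)  # fixed seed keeps the demo reproducible
    return pd.DataFrame(
        {
            "claim_id": np.arange(1, n + 1),  # assumed column
            "mileage_km": rng.integers(1_000, 200_000, size=n),
            "component": rng.choice(COMPONENTS, size=n),
            "months_in_service": rng.integers(1, 61, size=n),  # assumed column
        }
    )


df_raw = load_synthetic_data(n=300)
print(df_raw.head())
```

Seeding the generator is a design choice for demonstrations: repeated runs produce the same rows, so plots and downstream counts stay comparable.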


In [None]:
plt.figure(figsize=(10,5))
sns.histplot(df_raw["mileage_km"], kde=True, bins=30)
plt.title("Distribution of Synthetic Mileage (km)")
plt.show()

plt.figure(figsize=(10,5))
sns.countplot(data=df_raw, x="component")
plt.title("Component Distribution")
plt.show()


In [None]:
df_clean = clean_data(df_raw)
df_clean.head()
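
The cleaning rules inside `pipeline.clean_normalize` are not shown here. The following is a minimal sketch of what a cleaning step of this kind could do, assuming it normalizes the `component` labels and discards rows with unusable `mileage_km` values. The specific rules and the `messy` sample frame are hypothetical.

```python
import pandas as pd


def clean_data(df: pd.DataFrame) -> pd.DataFrame:
    """Normalize text fields and drop unusable rows (illustrative sketch only)."""
    out = df.copy()
    # Normalize component labels: trim whitespace and lowercase.
    out["component"] = out["component"].str.strip().str.lower()
    # Coerce mileage to numeric; unparseable values become NaN.
    out["mileage_km"] = pd.to_numeric(out["mileage_km"], errors="coerce")
    # Drop rows with a missing mileage or component, then negative mileage.
    out = out.dropna(subset=["mileage_km", "component"])
    out = out[out["mileage_km"] >= 0]
    return out.reset_index(drop=True)


# Hypothetical messy input to exercise each rule.
messy = pd.DataFrame(
    {
        "mileage_km": ["12000", "n/a", -5, 48000],
        "component": [" Engine", "brakes", "Brakes ", None],
    }
)
cleaned = clean_data(messy)
print(cleaned)
```

Working on a copy keeps `df_raw` untouched, so the raw and cleaned frames can be compared side by side in later cells.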

In [None]:
df_feat = build_features(df_clean)
df_feat.head()
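
The plots below use two engineered columns, `usage_intensity` and `is_high_usage`, whose definitions live in `pipeline.feature_engineering` and are not shown. One plausible reading, sketched here purely as an assumption, is distance driven per month in service with a fixed cutoff. The `months_in_service` column and the `HIGH_USAGE_KM_PER_MONTH` threshold are hypothetical.

```python
import pandas as pd

# Hypothetical cutoff; the real threshold is not given in this notebook.
HIGH_USAGE_KM_PER_MONTH = 2_000


def build_features(df: pd.DataFrame) -> pd.DataFrame:
    """Add usage-based features (illustrative sketch only)."""
    out = df.copy()
    # Treat vehicles with zero months in service as one month to avoid dividing by zero.
    months = out["months_in_service"].clip(lower=1)
    out["usage_intensity"] = out["mileage_km"] / months
    # Binary flag: 1 when usage exceeds the assumed cutoff, else 0.
    out["is_high_usage"] = (out["usage_intensity"] > HIGH_USAGE_KM_PER_MONTH).astype(int)
    return out


sample = pd.DataFrame(
    {"mileage_km": [12_000, 90_000, 500], "months_in_service": [12, 30, 0]}
)
features = build_features(sample)
print(features)
```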

In [None]:
plt.figure(figsize=(10,5))
sns.histplot(df_feat["usage_intensity"], bins=30, kde=True)
plt.title("Usage Intensity Feature")
plt.show()

sns.boxplot(data=df_feat, x="is_high_usage", y="usage_intensity")
plt.title("High Usage vs. Usage Intensity")
plt.show()

In [None]:
df_pred = classify(df_feat)
df_pred.head()
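
The actual rules in `pipeline.classifier` are not shown. As a sketch of what an interpretable rule-based classifier can look like, the example below approves a claim (`prediction = 1`) only when two conditions hold, and rejects it (`prediction = 0`) otherwise. Both rules and the `WARRANTY_MILEAGE_LIMIT_KM` constant are assumptions chosen for illustration.

```python
import pandas as pd

# Hypothetical warranty limit; not taken from the original pipeline.
WARRANTY_MILEAGE_LIMIT_KM = 100_000


def classify(df: pd.DataFrame) -> pd.DataFrame:
    """Apply simple approve/reject rules (illustrative sketch only)."""
    out = df.copy()
    # Rule 1: the vehicle is still within the assumed mileage limit.
    within_limit = out["mileage_km"] <= WARRANTY_MILEAGE_LIMIT_KM
    # Rule 2: the vehicle was not flagged as high usage.
    normal_usage = out["is_high_usage"] == 0
    # Approve (1) only when both rules hold; otherwise reject (0).
    out["prediction"] = (within_limit & normal_usage).astype(int)
    return out


sample = pd.DataFrame(
    {"mileage_km": [40_000, 150_000, 60_000], "is_high_usage": [0, 0, 1]}
)
predicted = classify(sample)
print(predicted)
```

Keeping each rule in a named boolean Series makes it easy to report which rule caused a rejection, which is one reason rule-based models stay interpretable.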

In [None]:
plt.figure(figsize=(8,4))
sns.countplot(data=df_pred, x="prediction")
plt.title("Prediction Distribution (0 = Reject, 1 = Approve)")
plt.show()

In [None]:
export_path = export_results(df_pred)
export_path
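
`export_results` returns the location of the written file, but its format and destination are not shown. A minimal sketch, assuming a CSV export suitable for a downstream reporting tool: the `out_dir` and `filename` parameters and their defaults are hypothetical, and the usage example writes to a temporary directory so it leaves nothing behind.

```python
import tempfile
from pathlib import Path

import pandas as pd


def export_results(
    df: pd.DataFrame,
    out_dir: str = "output",  # assumed default location
    filename: str = "warranty_predictions.csv",  # assumed file name
) -> Path:
    """Write predictions to CSV and return the file path (illustrative sketch only)."""
    target = Path(out_dir)
    target.mkdir(parents=True, exist_ok=True)  # create the folder if needed
    path = target / filename
    df.to_csv(path, index=False)  # omit the index so the file has only data columns
    return path


# Write to a temporary directory and read the file back to check the round trip.
with tempfile.TemporaryDirectory() as tmp:
    export_path = export_results(pd.DataFrame({"prediction": [1, 0]}), out_dir=tmp)
    round_trip = pd.read_csv(export_path)

print(round_trip)
```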

## Pipeline Summary (Synthetic Version)

This notebook demonstrated a clean DS-oriented pipeline:

- Synthetic dataset generation  
- Basic cleaning  
- Feature engineering  
- Interpretable rule-based model  
- Visualization and analysis  
- Export step  

The structure mirrors a modern DS workflow with separation of concerns:

    pipeline/
        load_data.py
        clean_normalize.py
        feature_engineering.py
        classifier.py
        exporter.py

This synthetic demonstration intentionally avoids domain-specific logic, but the
architecture closely resembles a real industrial warranty-claim processing system.