# CR_Score Playbook 04: Complete Workflow

**Level:** Intermediate  
**Time:** 25-30 minutes  
**Goal:** Master the end-to-end scorecard development process

## What You'll Learn

- Complete 10-step scorecard workflow
- Data validation and EDA
- Optimal binning
- WoE encoding
- Model training and evaluation
- Calibration and scaling
- Production deployment

## Prerequisites

- Completed Playbooks 01-03

## Complete 10-Step Workflow

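This notebook walks through the full scorecard workflow step by step. Steps 1 and 2 call the individual modules directly; Steps 3-9 and the evaluation in Step 10 run through `ScorecardPipeline` for brevity. Compare this with the 3-line approach in Playbook 01!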

In [None]:
import pandas as pd
import numpy as np
import sys
from pathlib import Path

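# Make the package importable by adding <project root>/src to sys.path
# (assumes the notebook is run from a directory one level below the project root)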
project_root = Path.cwd().parent
sys.path.insert(0, str(project_root / 'src'))

from cr_score.data.validation import DataQualityChecker
from cr_score.eda import UnivariateAnalyzer, BivariateAnalyzer
from cr_score.binning import OptBinningWrapper
from cr_score.encoding import WoEEncoder
from cr_score.features import StepwiseSelector
from cr_score.model import LogisticScorecard
from cr_score.calibration import InterceptCalibrator
from cr_score.scaling import PDOScaler

print("[OK] All modules imported!")

## Step 1: Load and Validate Data

In [None]:
# Load
train_df = pd.read_csv('data/train.csv')
test_df = pd.read_csv('data/test.csv')

print(f"Data loaded: {len(train_df)} train, {len(test_df)} test")

# Data quality checks
dq_checker = DataQualityChecker()
dq_results = dq_checker.check(train_df)

# Report the findings; review the counts before moving on
print("\nData Quality:")
print(f"  Missing values: {dq_results['missing_count']}")
print(f"  Duplicates: {dq_results['duplicate_count']}")
print("[OK] Data validation completed!")
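
To make the two reported counts concrete, here is a minimal pandas sketch of what a missing-value and duplicate check might compute. The helper `basic_quality_summary` and the toy data are hypothetical and only illustrate the idea; `DataQualityChecker` may run different or additional checks.

```python
import numpy as np
import pandas as pd


def basic_quality_summary(df: pd.DataFrame) -> dict:
    """Hypothetical helper: count missing cells and fully duplicated rows."""
    return {
        "missing_count": int(df.isna().sum().sum()),      # all NaN cells
        "duplicate_count": int(df.duplicated().sum()),    # repeats of an earlier row
        "missing_by_column": df.isna().sum().to_dict(),   # NaN cells per column
    }


# Toy data (made up): one missing age, and the last row repeats the second
toy = pd.DataFrame(
    {
        "age": [25, 40, np.nan, 40],
        "income": [3000.0, 5200.0, 4100.0, 5200.0],
        "default": [0, 1, 0, 1],
    }
)

summary = basic_quality_summary(toy)
print(summary)
```

A per-column breakdown such as `missing_by_column` is often more useful than a single total, because it shows which features need imputation or a dedicated "missing" bin.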

## Step 2: Exploratory Data Analysis

In [None]:
# Univariate analysis
uni_analyzer = UnivariateAnalyzer()
uni_results = uni_analyzer.analyze(train_df, target='default')

print("Feature Statistics:")
for feat, stats in list(uni_results.items())[:3]:
    print(f"  {feat}: mean={stats.get('mean', 'N/A')}")

print("[OK] EDA completed!")
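
The cell above only prints univariate statistics. As a rough illustration of the bivariate side of EDA, the sketch below computes the default rate per income bucket with plain pandas. The column names and toy values are assumptions for this example, and it does not use `BivariateAnalyzer`, whose interface is not shown in this playbook.

```python
import pandas as pd

# Toy data (made up for illustration)
toy = pd.DataFrame(
    {
        "income": [1000, 1500, 2000, 2500, 4000, 4500, 5000, 6000],
        "default": [1, 1, 0, 1, 0, 0, 1, 0],
    }
)

# Univariate view: summary statistics of a single feature
print(toy["income"].describe())

# Bivariate view: split the feature at its median and compare default rates
toy["income_bucket"] = pd.qcut(
    toy["income"], q=2, labels=["lower_half", "upper_half"]
)
bad_rate = toy.groupby("income_bucket", observed=True)["default"].agg(
    ["count", "mean"]
)
print(bad_rate)
```

A clear difference in default rate between buckets is an early hint that the feature will carry signal once it is binned and WoE-encoded.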

## Steps 3-9: Binning, Encoding, Selection, Modeling, Calibration, Scaling

In [None]:
# For brevity, we'll use the pipeline (which does all these steps)
# See the actual implementation in each module for details

from cr_score import ScorecardPipeline

pipeline = ScorecardPipeline(max_n_bins=5, pdo=20, base_score=600)
pipeline.fit(train_df, target_col='default')

print("[OK] Steps 3-9 completed via pipeline!")
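
Because the pipeline hides the intermediate steps, it may help to see what binning followed by WoE encoding produces. The sketch below is a simplified stand-in, not the `OptBinningWrapper` or `WoEEncoder` implementation: it uses a single hand-picked cut point instead of optimal binning, assumes the convention WoE = ln(%good / %bad) with `default == 1` as "bad", and adds a small smoothing constant `eps` to avoid taking the log of zero. The function name `woe_table` and the toy data are hypothetical.

```python
import numpy as np
import pandas as pd


def woe_table(bins: pd.Series, target: pd.Series, eps: float = 0.5) -> pd.DataFrame:
    """Per-bin Weight of Evidence and Information Value contribution.

    Assumes a binary target with 1 = bad (default) and 0 = good.
    `eps` is added to every count so empty cells do not produce log(0).
    """
    counts = pd.crosstab(bins, target)   # rows: bins, columns: 0 / 1
    good = counts[0] + eps
    bad = counts[1] + eps
    pct_good = good / good.sum()         # share of all goods in each bin
    pct_bad = bad / bad.sum()            # share of all bads in each bin
    woe = np.log(pct_good / pct_bad)
    iv = (pct_good - pct_bad) * woe      # each bin's contribution to IV
    return pd.DataFrame(
        {"good": counts[0], "bad": counts[1], "woe": woe, "iv": iv}
    )


# Toy data (made up): low incomes default more often than high incomes
income = pd.Series([1000, 1500, 1800, 2000, 4000, 4500, 5000, 6000])
default = pd.Series([1, 1, 1, 0, 0, 0, 0, 1])

# A hand-picked cut point stands in for the optimal binning step
bins = pd.Series(np.where(income < 3000, "low", "high"))

table = woe_table(bins, default)
encoded = bins.map(table["woe"])         # replace each bin label by its WoE
total_iv = float(table["iv"].sum())
print(table)
print(f"Information Value: {total_iv:.3f}")
```

Under this convention, bins with a higher share of defaults get a negative WoE, and the encoded column is what a logistic regression would be trained on in the modeling step.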

## Step 10: Evaluate and Deploy

In [None]:
# Evaluate
metrics = pipeline.evaluate(test_df, target_col='default')

print("Final Performance:")
print(f"  AUC:  {metrics['auc']:.3f}")
print(f"  Gini: {metrics['gini']:.3f}")
print(f"  KS:   {metrics['ks']:.3f}")

# Save for production
# (loading this pickle later requires cr_score to be importable; only unpickle files you trust)
import pickle
with open('production_scorecard.pkl', 'wb') as f:
    pickle.dump(pipeline, f)

print("\n[OK] Scorecard ready for production!")
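
For reference, the three metrics printed in Step 10 can be computed by hand. The sketch below is an assumption about their usual definitions, not a description of how `pipeline.evaluate` computes them: AUC as the probability that a random defaulter gets a higher predicted risk than a random non-defaulter, Gini as `2 * AUC - 1`, and KS as the largest gap between the two cumulative score distributions. The function name `auc_gini_ks` and the toy values are hypothetical.

```python
import numpy as np


def auc_gini_ks(y_true, p_default) -> dict:
    """AUC, Gini and KS for a binary target (1 = default) and predicted risk."""
    y = np.asarray(y_true)
    p = np.asarray(p_default, dtype=float)
    pos = p[y == 1]   # predicted risk of actual defaulters
    neg = p[y == 0]   # predicted risk of non-defaulters

    # AUC: share of (defaulter, non-defaulter) pairs ranked correctly,
    # counting ties as half. O(n^2) memory, fine for a small illustration.
    diff = pos[:, None] - neg[None, :]
    auc = (np.sum(diff > 0) + 0.5 * np.sum(diff == 0)) / (len(pos) * len(neg))

    # KS: maximum distance between the two empirical CDFs
    thresholds = np.unique(p)
    cdf_pos = np.array([(pos <= t).mean() for t in thresholds])
    cdf_neg = np.array([(neg <= t).mean() for t in thresholds])
    ks = np.max(np.abs(cdf_pos - cdf_neg))

    return {"auc": float(auc), "gini": float(2 * auc - 1), "ks": float(ks)}


# Toy predictions (made up for illustration)
y_toy = [0, 0, 1, 0, 1, 1]
p_toy = [0.1, 0.2, 0.3, 0.4, 0.7, 0.9]

toy_metrics = auc_gini_ks(y_toy, p_toy)
print(toy_metrics)
```

Since Gini is a linear transform of AUC, the two always move together; KS adds a view of the single threshold at which goods and bads are best separated.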

## Summary

This playbook covered the complete 10-step workflow:
1. Data Loading
2. Data Validation
3. EDA
4. Optimal Binning
5. WoE Encoding
6. Feature Selection
7. Model Training
8. Calibration
9. PDO Scaling
10. Evaluation & Deployment
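
Step 9 in the list above, PDO scaling, converts a predicted default probability into a score. A common formulation, sketched here under stated assumptions, sets `factor = pdo / ln(2)` and `offset = base_score - factor * ln(base_odds)`, so that the score rises by `pdo` points each time the good:bad odds double. The `pdo=20` and `base_score=600` defaults mirror the pipeline call in this playbook; `base_odds=50` and the function name `pdo_score` are hypothetical, and `PDOScaler` may use different defaults or conventions.

```python
import math


def pdo_score(
    p_default: float,
    pdo: float = 20.0,
    base_score: float = 600.0,
    base_odds: float = 50.0,   # assumed good:bad odds at the base score
) -> float:
    """Map a default probability to a score where +pdo points doubles the odds."""
    factor = pdo / math.log(2)
    offset = base_score - factor * math.log(base_odds)
    odds = (1.0 - p_default) / p_default   # good:bad odds
    return offset + factor * math.log(odds)


score_at_base = pdo_score(1 / 51)      # odds of 50:1
score_at_double = pdo_score(1 / 101)   # odds of 100:1
print(round(score_at_base, 2), round(score_at_double, 2))
```

In this sketch, a borrower at the base odds receives the base score, and doubling the odds of being good adds exactly `pdo` points.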

**Next:** Playbook 05 for advanced production features!