# CR_Score Playbook 05: Advanced Topics

**Level:** Advanced  
**Time:** 30-40 minutes  
**Goal:** Master enterprise-grade production features

## What You'll Learn

- Config-driven development (YAML)
- Spark-based compression (optional)
- Reject inference
- Drift detection
- MCP tools for AI agents
- Artifact versioning
- Production patterns

## Prerequisites

- Completed Playbooks 01-04
- PySpark installed (optional): `pip install "pyspark>=3.4.0"` (the quotes stop the shell from treating `>` as a redirect)

## Check PySpark Availability

In [None]:
import sys
from pathlib import Path

# Check PySpark
try:
    import spark_helper
    spark = spark_helper.get_spark_session()
    if spark:
        print("[OK] PySpark is available - full features enabled!")
    else:
        print("[INFO] PySpark not available - some features limited")
except Exception:  # spark_helper is missing or Spark failed to start
    print("[INFO] Running without PySpark - this is fine!")

# Add project root to path
project_root = Path.cwd().parent
sys.path.insert(0, str(project_root / 'src'))
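
The check above relies on the project's `spark_helper` module. Where that helper is not on the path, a standard-library check is a reasonable fallback. The sketch below uses the hypothetical helper name `pyspark_available`; it only tests whether the `pyspark` package can be located and does not start a Spark session.

```python
import importlib.util


def pyspark_available() -> bool:
    """Return True if the pyspark package can be located, without importing it."""
    return importlib.util.find_spec("pyspark") is not None


if pyspark_available():
    print("[OK] pyspark is installed")
else:
    print("[INFO] pyspark is not installed - continuing with pandas only")
```

Locating the package this way is cheap, so it can run at the top of any notebook before deciding which code path to take.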

## Topic 1: Config-Driven Development

In [None]:
import yaml
import pandas as pd

# Example configuration
config = {
    'project': {
        'name': 'credit_scorecard_v1',
        'description': 'Production credit scorecard'
    },
    'binning': {
        'method': 'optbinning',
        'max_n_bins': 5,
        'min_bin_size': 0.05
    },
    'model': {
        'type': 'logistic',
        'max_iter': 1000,
        'solver': 'lbfgs'
    },
    'scaling': {
        'pdo': 20,
        'base_score': 600,
        'base_odds': 50
    }
}

# Save config
with open('scorecard_config.yaml', 'w') as f:
    yaml.dump(config, f)

print("[OK] Config-driven workflow enabled!")
print("\nConfig:")
print(yaml.dump(config))
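
To make the `scaling` block concrete, here is a minimal sketch of how `pdo`, `base_score`, and `base_odds` are conventionally turned into score points. It assumes the standard points-to-double-the-odds formulation with good:bad odds; this playbook does not confirm that CR_Score uses exactly this form, and the function names `scaling_params` and `score_from_odds` are illustrative only.

```python
import math


def scaling_params(pdo: float, base_score: float, base_odds: float):
    """Return (factor, offset) for the conventional scorecard scaling.

    Assumption: score = offset + factor * ln(odds), where `odds` is good:bad.
    """
    factor = pdo / math.log(2)
    offset = base_score - factor * math.log(base_odds)
    return factor, offset


def score_from_odds(odds: float, factor: float, offset: float) -> float:
    """Map good:bad odds to a score using the scaling parameters."""
    return offset + factor * math.log(odds)


# Values taken from the example config above
factor, offset = scaling_params(pdo=20, base_score=600, base_odds=50)
for odds in (25, 50, 100):
    print(f"odds {odds}:1 -> score {score_from_odds(odds, factor, offset):.1f}")
```

Under these assumptions, odds of 50:1 map to the base score of 600, and each doubling of the odds adds `pdo` (20) points.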

## Topic 2: Reject Inference

In [None]:
from cr_score.reject_inference import ParcelingMethod, ReweightingMethod

# Load data
train_df = pd.read_csv('data/train.csv')

# Simulate rejects (declined applications whose true outcome is unknown).
# In practice you would use actual reject data.
rejects_df = train_df.sample(frac=0.2, random_state=42).copy()
rejects_df['default'] = -1  # Unknown

# Remove the simulated rejects so the accepted and rejected sets do not overlap
accepted_df = train_df.drop(rejects_df.index)

print(f"Accepted: {len(accepted_df)} applications")
print(f"Rejected: {len(rejects_df)} applications (unknown outcomes)")

# Apply parceling method
parceling = ParcelingMethod()
inferred_df = parceling.infer(
    accepted_df=accepted_df,
    rejected_df=rejects_df,
    features=['age', 'income', 'debt_to_income_ratio']
)

print("\n[OK] Reject inference completed!")
print(f"Inferred {len(inferred_df)} reject outcomes")
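
For intuition about what parceling does, the following is a minimal standard-library sketch of the general technique, not the `ParcelingMethod` implementation. It assumes each application already has a predicted default probability from a model fitted on accepted applications. Rejects are assigned good/bad labels at random, in proportion to the observed bad rate of accepted applications in the same score band. The function names, the `multiplier` parameter, and the toy data are all hypothetical.

```python
import bisect
import random


def band_index(score, edges):
    """Return the score band index; `edges` are sorted interior cut points."""
    return bisect.bisect_right(edges, score)


def parcel_rejects(accepted, reject_scores, edges, multiplier=1.0, seed=42):
    """Assign inferred outcomes to rejects band by band.

    accepted: list of (score, default) pairs with default in {0, 1}
    reject_scores: scores of rejected applications
    multiplier: optional uplift on the band bad rate, since rejects are
        usually assumed to be riskier than accepts with the same score
    """
    rng = random.Random(seed)
    n_bands = len(edges) + 1
    totals = [0] * n_bands
    bads = [0] * n_bands
    for score, default in accepted:
        band = band_index(score, edges)
        totals[band] += 1
        bads[band] += default

    # Fall back to the overall bad rate for bands with no accepted cases
    overall = sum(bads) / sum(totals)
    bad_rates = []
    for band in range(n_bands):
        rate = bads[band] / totals[band] if totals[band] else overall
        bad_rates.append(min(1.0, multiplier * rate))

    inferred = [
        (score, 1 if rng.random() < bad_rates[band_index(score, edges)] else 0)
        for score in reject_scores
    ]
    return bad_rates, inferred


# Toy data: (predicted default probability, observed default)
accepted = [(0.10, 0), (0.15, 0), (0.20, 0), (0.25, 1),
            (0.60, 1), (0.70, 1), (0.80, 0), (0.90, 1)]
reject_scores = [0.30, 0.55, 0.65, 0.95]

bad_rates, inferred = parcel_rejects(accepted, reject_scores, edges=[0.5])
print("Band bad rates:", bad_rates)
print("Inferred reject outcomes:", inferred)
```

The inferred rejects can then be appended to the accepted set before refitting, which is the usual purpose of reject inference.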

## Topic 3: Drift Detection

In [None]:
from cr_score.eda import DriftDetector

# Simulate production data (test set as "new" data)
baseline_df = pd.read_csv('data/train.csv')
production_df = pd.read_csv('data/test.csv')

# Detect drift
drift_detector = DriftDetector()
drift_results = drift_detector.detect_psi(
    baseline_df=baseline_df,
    production_df=production_df,
    features=['age', 'income', 'debt_to_income_ratio']
)

print("Population Stability Index (PSI):")
for feat, psi in drift_results.items():
    status = "STABLE" if psi < 0.1 else "WARNING" if psi < 0.25 else "ALERT"
    print(f"  {feat:25s}: PSI={psi:.4f} [{status}]")

print("\n[OK] Drift detection completed!")
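
The PSI compares the share of records in each bin between the baseline and production samples: PSI = Σ (aᵢ − eᵢ) · ln(aᵢ / eᵢ), where eᵢ is the baseline share and aᵢ the production share of bin i. The sketch below is a standard-library illustration of that formula with bins taken from baseline quantiles. It is not the `DriftDetector` implementation; the helper names, the bin count, and the small `eps` floor for empty bins are assumptions.

```python
import bisect
import math


def quantile_edges(values, n_bins):
    """Interior cut points that split the baseline into roughly equal bins."""
    ordered = sorted(values)
    n = len(ordered)
    edges = {ordered[min(n - 1, (i * n) // n_bins)] for i in range(1, n_bins)}
    return sorted(edges)


def bin_shares(values, edges, eps=1e-6):
    """Share of values per bin, floored at eps to avoid log(0)."""
    counts = [0] * (len(edges) + 1)
    for value in values:
        counts[bisect.bisect_right(edges, value)] += 1
    return [max(count / len(values), eps) for count in counts]


def psi(baseline, production, n_bins=10):
    """Population Stability Index of production relative to baseline."""
    edges = quantile_edges(baseline, n_bins)
    expected = bin_shares(baseline, edges)
    actual = bin_shares(production, edges)
    return sum((a - e) * math.log(a / e) for a, e in zip(actual, expected))


def psi_status(value):
    """Thresholds used in this playbook: below 0.1 stable, below 0.25 warning."""
    if value < 0.1:
        return "STABLE"
    if value < 0.25:
        return "WARNING"
    return "ALERT"


baseline = list(range(100))
shifted = [v + 30 for v in baseline]  # simulated upward shift
print(psi_status(psi(baseline, baseline)))
print(psi_status(psi(baseline, shifted)))
```

Bin edges are always derived from the baseline, so the same cut points are applied to both samples.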

## Topic 4: MCP Tools for AI Agents

In [None]:
from cr_score.tools import mcp_tools

# MCP tools enable AI agents to interact with scorecards
print("Available MCP Tools:")
print("  1. score_predict_tool - Score applications")
print("  2. model_evaluate_tool - Evaluate model performance")
print("  3. feature_select_tool - Run feature selection")
print("  4. binning_analyze_tool - Analyze binning results")

print("\n[OK] MCP tools ready for AI agent integration!")
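
The cell above only lists tool names, so a sketch of what a tool looks like to an agent may help. In MCP, a tool is advertised with a name, a description, and a JSON Schema for its arguments, and the agent calls it by name with an arguments object. The code below is a hypothetical, simplified illustration of that pattern in plain Python. It does not use the `cr_score.tools.mcp_tools` API, the handler is a placeholder, and the return shape is reduced for readability.

```python
# Hypothetical descriptor in the shape MCP uses when listing tools
SCORE_PREDICT_TOOL = {
    "name": "score_predict_tool",
    "description": "Score applications",
    "inputSchema": {
        "type": "object",
        "properties": {
            "applications": {"type": "array", "items": {"type": "object"}},
        },
        "required": ["applications"],
    },
}


def score_predict(arguments):
    """Placeholder handler: a real one would load the scorecard and score rows."""
    return {"n_received": len(arguments["applications"])}


# Registry mapping tool names to (descriptor, handler)
TOOLS = {SCORE_PREDICT_TOOL["name"]: (SCORE_PREDICT_TOOL, score_predict)}


def call_tool(name, arguments):
    """Dispatch a tool call after checking the required arguments."""
    if name not in TOOLS:
        return {"isError": True, "message": f"unknown tool: {name}"}
    descriptor, handler = TOOLS[name]
    required = descriptor["inputSchema"]["required"]
    missing = [key for key in required if key not in arguments]
    if missing:
        return {"isError": True, "message": f"missing arguments: {missing}"}
    return {"isError": False, "result": handler(arguments)}


print(call_tool("score_predict_tool", {"applications": [{"age": 35}]}))
print(call_tool("score_predict_tool", {}))
```

Returning errors as data rather than raising lets the agent read the message and retry with corrected arguments.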

## Topic 5: Artifact Versioning

In [None]:
from cr_score.core.registry import ArtifactIndex, RunRegistry
from cr_score.core.hashing import hash_content
import json

# Create artifact registry
artifact_index = ArtifactIndex(registry_path='./artifacts')

# Example: register a model artifact
model_artifact = {
    'artifact_id': 'model_v1',
    'artifact_type': 'model',
    'content_hash': hash_content({'model': 'logistic', 'version': 1}),
    'file_path': 'production_scorecard.pkl',
    'metadata': {
        'auc': 0.850,
        'created_at': '2026-01-16',
        'author': 'Your Name'
    }
}

artifact_index.register(model_artifact)

print("[OK] Artifact versioning enabled!")
print("\nArtifact registered:")
print(json.dumps(model_artifact, indent=2))
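
Content hashes are only useful for versioning if the same content always produces the same hash. The sketch below shows one common way to get that property: serialize to canonical JSON with sorted keys, then apply SHA-256. It is an illustration under that assumption, not the implementation of `hash_content`, and `stable_hash` is a hypothetical name.

```python
import hashlib
import json


def stable_hash(content) -> str:
    """SHA-256 hex digest of a JSON-serializable object, independent of key order."""
    canonical = json.dumps(content, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


v1 = stable_hash({"model": "logistic", "version": 1})
v1_reordered = stable_hash({"version": 1, "model": "logistic"})
v2 = stable_hash({"model": "logistic", "version": 2})

print("Same content, different key order:", v1 == v1_reordered)
print("Different content:", v1 != v2)
```

Sorting keys and fixing the separators removes the two usual sources of accidental hash changes in dictionaries: insertion order and whitespace.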

## Summary

You mastered advanced production topics:
- Config-driven development with YAML
- Reject inference for declined applications with unknown outcomes
- Drift detection with per-feature PSI
- MCP tools for AI agent integration
- Artifact versioning and lineage

**Congratulations!** You've completed all CR_Score playbooks!

### What's Next?

1. **Build your own scorecard** with real data
2. **Deploy to production** using these patterns
3. **Contribute** to the CR_Score project
4. **Share** your learnings with the community

**You're now a CR_Score expert!** 🎉