# Production Promotion — Fraud Detection Model

This notebook demonstrates the complete workflow for evaluating a trained model,
comparing it against a production baseline, and promoting the winning configuration
to the production pipeline.

**Workflow steps:**
1. Train an XGBoost model on the fraud detection dataset
2. Evaluate the model with standard metrics and visualizations
3. Compare against the production baseline
4. Check production quality thresholds
5. Validate and write hyperparameters to Parameter Store
6. Generate and write a production configuration file to S3
7. Trigger the production pipeline for retraining
8. Deploy a challenger endpoint for A/B testing

**Requirements covered:** 6.1–6.5 (Model Evaluation), 8.1–8.3 (Parameter Store),
9.1–9.3 (Configuration Files), 10.1 (Pipeline Trigger), 11.1 (A/B Testing)

## 1. Setup and Imports

In [None]:
import sys
import io
import json

import boto3
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from xgboost import XGBClassifier

# Add project src to path
sys.path.insert(0, '../src')
from model_evaluation import ModelEvaluator
from production_integration import ProductionIntegrator
from ab_testing import ABTestingManager
from experiment_tracking import ExperimentTracker

sns.set_theme(style='whitegrid')
%matplotlib inline

print('All modules imported successfully.')

## 2. Load Data from S3 and Train a Model

Load the processed fraud detection dataset from the `fraud-detection-data` bucket
and train an XGBoost model with the hyperparameters we want to promote.

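The promoted hyperparameters use the names written to Parameter Store (`num_round`, `eta`). `XGBClassifier` expects `n_estimators` and `learning_rate` instead. The training cell below does this translation inline. A small helper would keep the promoted dict as the single source of truth. This is a minimal sketch: the name `to_xgbclassifier_kwargs` and the alias table are hypothetical, and the table covers only the keys this notebook uses.

```python
from typing import Any, Dict

# Hypothetical alias table: promoted (Parameter Store) name -> XGBClassifier keyword.
# Only the keys used in this notebook are covered; anything else passes through unchanged.
_SKLEARN_ALIASES = {
    'num_round': 'n_estimators',
    'eta': 'learning_rate',
}


def to_xgbclassifier_kwargs(params: Dict[str, Any]) -> Dict[str, Any]:
    """Translate promoted hyperparameter names into XGBClassifier keyword arguments."""
    kwargs: Dict[str, Any] = {}
    for name, value in params.items():
        key = _SKLEARN_ALIASES.get(name, name)
        if key in kwargs:
            # Guard against a dict that sets both an alias and its target (e.g. eta and learning_rate).
            raise ValueError(f'duplicate hyperparameter after aliasing: {key}')
        kwargs[key] = value
    return kwargs


promoted = {
    'objective': 'binary:logistic',
    'num_round': 150,
    'max_depth': 7,
    'eta': 0.15,
    'subsample': 0.8,
    'colsample_bytree': 0.8,
}
print(to_xgbclassifier_kwargs(promoted))
```

With this helper, the training call could become `XGBClassifier(**to_xgbclassifier_kwargs(hyperparameters), eval_metric='logloss')`.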
In [None]:
BUCKET_NAME = 'fraud-detection-data'
DATA_PREFIX = 'processed'

s3_client = boto3.client('s3')


def load_parquet_from_s3(bucket: str, key: str) -> pd.DataFrame:
    """Load a Parquet file from S3 into a pandas DataFrame."""
    response = s3_client.get_object(Bucket=bucket, Key=key)
    return pd.read_parquet(io.BytesIO(response['Body'].read()))


train_df = load_parquet_from_s3(BUCKET_NAME, f'{DATA_PREFIX}/train.parquet')
test_df = load_parquet_from_s3(BUCKET_NAME, f'{DATA_PREFIX}/test.parquet')

TARGET = 'Class'
FEATURES = [c for c in train_df.columns if c != TARGET]

X_train = train_df[FEATURES]
y_train = train_df[TARGET]
X_test = test_df[FEATURES]
y_test = test_df[TARGET]

print(f'Training set:  {X_train.shape[0]:,} rows, {X_train.shape[1]} features')
print(f'Test set:      {X_test.shape[0]:,} rows, {X_test.shape[1]} features')

In [None]:
# Hyperparameters to promote
hyperparameters = {
    'objective': 'binary:logistic',
    'num_round': 150,
    'max_depth': 7,
    'eta': 0.15,
    'subsample': 0.8,
    'colsample_bytree': 0.8,
}

# Train XGBoost model
model = XGBClassifier(
    objective=hyperparameters['objective'],
    n_estimators=hyperparameters['num_round'],
    max_depth=hyperparameters['max_depth'],
    learning_rate=hyperparameters['eta'],
    subsample=hyperparameters['subsample'],
    colsample_bytree=hyperparameters['colsample_bytree'],
    # use_label_encoder is omitted: it is deprecated in recent XGBoost releases.
    eval_metric='logloss',
)

model.fit(X_train, y_train)
print('XGBoost model trained successfully.')

In [None]:
# Generate predictions for evaluation
y_pred = model.predict(X_test)
y_pred_proba = model.predict_proba(X_test)[:, 1]

print(f'Predictions generated for {len(y_pred):,} test samples.')

## 3. Model Evaluation

Use the `ModelEvaluator` to calculate standard classification metrics and generate
diagnostic visualizations.

**Requirement 6.1**: Calculate accuracy, precision, recall, F1 score, and AUC-ROC  
**Requirement 6.2**: Generate confusion matrices  
**Requirement 6.3**: Generate ROC curves and precision-recall curves

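`ModelEvaluator`'s internals are not shown here. The sketch below uses the standard textbook definitions of the Requirement 6.1 metrics, in pure Python, to make explicit what each number means. It assumes the fraud class is labelled `1`. AUC-ROC is computed in its rank (Mann-Whitney) form: the probability that a randomly chosen positive scores higher than a randomly chosen negative, with ties counted as half. The function names and sample values are illustrative.

```python
from typing import Dict, Sequence


def classification_metrics(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, float]:
    """Accuracy, precision, recall and F1 for the positive (fraud = 1) class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {
        'accuracy': (tp + tn) / len(y_true),
        'precision': precision,
        'recall': recall,
        'f1_score': f1,
    }


def auc_roc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as P(score of a random positive > score of a random negative); ties count 0.5.

    Pairwise O(P * N) version for clarity; only suitable for small samples.
    """
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError('AUC needs at least one positive and one negative example')
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


y_true = [0, 0, 1, 1, 0, 1]
y_score = [0.1, 0.4, 0.35, 0.8, 0.2, 0.9]
y_hat = [1 if s >= 0.5 else 0 for s in y_score]  # 0.5 cut-off, as model.predict uses
print(classification_metrics(y_true, y_hat))
print(auc_roc(y_true, y_score))
```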
In [None]:
evaluator = ModelEvaluator()

# Calculate all metrics (Req 6.1)
metrics = evaluator.calculate_metrics(y_test, y_pred, y_pred_proba)

print('Model Metrics:')
for name, value in metrics.items():
    print(f'  {name:15s}: {value:.4f}')

In [None]:
# Confusion matrix (Req 6.2)
cm = evaluator.plot_confusion_matrix(y_test, y_pred, save_path='confusion_matrix.png')

print('Confusion Matrix:')
print(cm)

In [None]:
# ROC curve (Req 6.3)
fpr, tpr, auc_score = evaluator.plot_roc_curve(y_test, y_pred_proba, save_path='roc_curve.png')

print(f'AUC-ROC: {auc_score:.4f}')

In [None]:
# Precision-recall curve (Req 6.3)
precision_vals, recall_vals = evaluator.plot_precision_recall_curve(
    y_test, y_pred_proba, save_path='pr_curve.png'
)

print('Precision-recall curve saved to pr_curve.png')

## 4. Baseline Comparison

Compare the current model against the production baseline to quantify improvement.

**Requirement 6.4**: Compare experiment results against baseline metrics from production

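The comparison table reads five fields per metric: `current`, `baseline`, `difference`, `percent_change`, and `improved`. Below is a sketch of how such a comparison could be computed. It is not the actual `compare_to_baseline` implementation. It assumes percent change is measured relative to the baseline and that "improved" means strictly higher, which suits all five metrics here because higher is better for each. The candidate values are made up for illustration.

```python
from typing import Dict


def compare_to_baseline(current: Dict[str, float], baseline: Dict[str, float]) -> Dict[str, dict]:
    """Per-metric comparison for metrics where higher is better.

    Only metrics present in both dicts are compared.
    """
    result = {}
    for name in baseline:
        if name not in current:
            continue
        cur, base = current[name], baseline[name]
        diff = cur - base
        result[name] = {
            'current': cur,
            'baseline': base,
            'difference': diff,
            # Relative change in percent; undefined (NaN) when the baseline is zero.
            'percent_change': diff / base * 100 if base else float('nan'),
            'improved': cur > base,
        }
    return result


baseline = {'accuracy': 0.952, 'precision': 0.89, 'recall': 0.85}
candidate = {'accuracy': 0.961, 'precision': 0.87, 'recall': 0.88}  # illustrative only
for name, row in compare_to_baseline(candidate, baseline).items():
    print(f"{name:10s} {row['difference']:+.4f} {row['percent_change']:+.2f}% improved={row['improved']}")
```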
In [None]:
# Production baseline metrics (from the current deployed model)
baseline_metrics = {
    'accuracy': 0.952,
    'precision': 0.89,
    'recall': 0.85,
    'f1_score': 0.87,
    'auc_roc': 0.94,
}

comparison = evaluator.compare_to_baseline(metrics, baseline_metrics)

print('Comparison to Production Baseline:')
print(f'{"Metric":15s} {"Current":>10s} {"Baseline":>10s} {"Diff":>10s} {"Change":>10s} {"Improved":>10s}')
print('-' * 70)
for metric_name, comp in comparison.items():
    print(
        f'{metric_name:15s} '
        f'{comp["current"]:10.4f} '
        f'{comp["baseline"]:10.4f} '
        f'{comp["difference"]:+10.4f} '
        f'{comp["percent_change"]:+9.2f}% '
        f'{"✓" if comp["improved"] else "✗":>10s}'
    )

## 5. Production Threshold Check

Verify that the model meets the minimum production quality threshold (accuracy >= 0.90).

**Requirement 6.5**: Mark models meeting accuracy >= 0.90 as production-quality

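Fraud labels are usually heavily imbalanced, so a model that never flags fraud can clear an accuracy floor. A quick sanity check is to compare the 0.90 threshold (Requirement 6.5) with the accuracy of always predicting the majority class. The sketch below does this with hypothetical helper names and illustrative label counts. On the notebook's data, the equivalent check would be `majority_class_accuracy(y_test.tolist())`.

```python
from collections import Counter
from typing import Sequence

PRODUCTION_ACCURACY_THRESHOLD = 0.90  # Requirement 6.5


def majority_class_accuracy(labels: Sequence[int]) -> float:
    """Accuracy of a trivial model that always predicts the most common label."""
    if not labels:
        raise ValueError('labels must be non-empty')
    most_common_count = Counter(labels).most_common(1)[0][1]
    return most_common_count / len(labels)


def meets_production_threshold(accuracy: float,
                               threshold: float = PRODUCTION_ACCURACY_THRESHOLD) -> bool:
    return accuracy >= threshold


# Illustrative labels: 5 fraud cases in 1,000 transactions.
labels = [1] * 5 + [0] * 995
trivial = majority_class_accuracy(labels)
print(f'Always-"not fraud" accuracy: {trivial:.3f}')
print('Passes accuracy gate:', meets_production_threshold(trivial))
```

If the trivial accuracy already exceeds the threshold, the precision, recall, and AUC-ROC comparison from the baseline section carries most of the signal.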
In [None]:
# Full model evaluation with threshold check
eval_results = evaluator.evaluate_model(
    model, X_test, y_test, baseline_metrics=baseline_metrics
)

meets_threshold = eval_results['meets_production_threshold']
accuracy = eval_results['metrics']['accuracy']

if meets_threshold:
    print(f'✓ Model MEETS production threshold (accuracy={accuracy:.4f} >= 0.90)')
    print('  Proceeding with production promotion.')
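else:
    print(f'✗ Model DOES NOT meet production threshold (accuracy={accuracy:.4f} < 0.90)')
    # Stop here so "Run All" cannot push a sub-threshold model to production.
    raise RuntimeError('Model is below the production accuracy threshold; tune further before promoting.')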

## 6. Hyperparameter Validation

Before writing to Parameter Store, validate that all hyperparameters have correct
names and values within acceptable ranges.

**Requirement 8.2**: Validate parameter names and value formats before writing

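The following is a minimal sketch of name-and-range validation in the spirit of Requirement 8.2. The allowed names are the six promoted in this notebook. The `max_depth` upper bound of 20 matches the comment in the invalid-parameter example below. Every other range, and the single allowed objective, is an illustrative assumption rather than the bounds `ProductionIntegrator.validate_hyperparameters` actually enforces.

```python
from typing import Any, Dict, Tuple

# Hypothetical schema: name -> (accepted type(s), min, max). Only max_depth <= 20
# comes from this notebook; the other bounds are illustrative assumptions.
SCHEMA: Dict[str, Tuple[Any, float, float]] = {
    'num_round': (int, 1, 10_000),
    'max_depth': (int, 1, 20),
    'eta': ((int, float), 0.001, 1.0),
    'subsample': ((int, float), 0.1, 1.0),
    'colsample_bytree': ((int, float), 0.1, 1.0),
}
ALLOWED_OBJECTIVES = {'binary:logistic'}


def validate_hyperparameters(params: Dict[str, Any]) -> None:
    """Raise ValueError on unknown names, wrong types or out-of-range values."""
    unknown = set(params) - set(SCHEMA) - {'objective'}
    if unknown:
        raise ValueError(f'unknown hyperparameter(s): {sorted(unknown)}')
    if 'objective' in params and params['objective'] not in ALLOWED_OBJECTIVES:
        raise ValueError(f"unsupported objective: {params['objective']!r}")
    for name, value in params.items():
        if name == 'objective':
            continue
        expected_type, low, high = SCHEMA[name]
        # bool is a subclass of int, so reject it explicitly.
        if isinstance(value, bool) or not isinstance(value, expected_type):
            raise ValueError(f'{name} has wrong type: {type(value).__name__}')
        if not low <= value <= high:
            raise ValueError(f'{name}={value} outside [{low}, {high}]')


try:
    validate_hyperparameters({'max_depth': 25})
except ValueError as err:
    print('rejected:', err)
```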
In [None]:
# Initialize production integrator with experiment tracker
tracker = ExperimentTracker(region_name='us-east-1')
integrator = ProductionIntegrator(experiment_tracker=tracker)

# Validate hyperparameters (Req 8.2)
try:
    integrator.validate_hyperparameters(hyperparameters)
    print('✓ Hyperparameters validated successfully.')
    print(f'  Parameters: {list(hyperparameters.keys())}')
except ValueError as e:
    print(f'✗ Validation failed: {e}')
    raise  # Do not continue to the Parameter Store write with invalid values.

In [None]:
# Demonstrate validation catching invalid parameters
invalid_params = {
    'objective': 'binary:logistic',
    'num_round': 150,
    'max_depth': 25,  # Out of range (max 20)
    'eta': 0.15,
    'subsample': 0.8,
    'colsample_bytree': 0.8,
}

try:
    integrator.validate_hyperparameters(invalid_params)
    print('Validation passed (unexpected).')
except ValueError as e:
    print(f'✓ Validation correctly caught error: {e}')

## 7. Parameter Store Update

Write the validated hyperparameters to AWS Systems Manager Parameter Store.
A backup of the current values is created automatically before overwriting.

**Requirement 8.1**: Write hyperparameters to Parameter Store paths matching production pipeline  
**Requirement 8.3**: Create a backup of previous values with timestamp

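The following sketches the write-with-backup pattern using the boto3 SSM client's `get_parameter` and `put_parameter` calls. It makes several assumptions:

- There is one `String` parameter per hyperparameter under `/fraud-detection/hyperparameters/`, the prefix printed below.
- The backup is returned as a timestamped dict rather than persisted, because where `ProductionIntegrator` stores its backup is not shown.
- `write_with_backup` and `InMemorySSM` are hypothetical names.

The client is passed in so the sketch can run offline against the in-memory stand-in. With AWS credentials you would pass `boto3.client('ssm')` instead.

```python
from datetime import datetime, timezone
from typing import Any, Dict

PREFIX = '/fraud-detection/hyperparameters/'


def write_with_backup(ssm, params: Dict[str, Any]) -> Dict[str, Any]:
    """Snapshot current values, then overwrite them; returns the snapshot."""
    previous: Dict[str, Any] = {}
    for name in params:
        try:
            resp = ssm.get_parameter(Name=PREFIX + name)
            previous[name] = resp['Parameter']['Value']
        except ssm.exceptions.ParameterNotFound:
            previous[name] = None  # First promotion: nothing to back up.
    backup = {
        'timestamp': datetime.now(timezone.utc).strftime('%Y%m%dT%H%M%SZ'),
        'values': previous,
    }
    # All reads happen before any write, so a failed read never leaves
    # a half-updated set of parameters without a backup.
    for name, value in params.items():
        ssm.put_parameter(Name=PREFIX + name, Value=str(value), Type='String', Overwrite=True)
    return backup


class InMemorySSM:
    """Offline stand-in exposing only the two SSM calls used above."""

    class exceptions:  # mirrors the shape of client.exceptions.ParameterNotFound
        class ParameterNotFound(Exception):
            pass

    def __init__(self):
        self.store: Dict[str, str] = {}

    def get_parameter(self, Name):
        if Name not in self.store:
            raise self.exceptions.ParameterNotFound(Name)
        return {'Parameter': {'Name': Name, 'Value': self.store[Name]}}

    def put_parameter(self, Name, Value, Type, Overwrite):
        self.store[Name] = Value


ssm = InMemorySSM()
first = write_with_backup(ssm, {'max_depth': 6, 'eta': 0.1})
second = write_with_backup(ssm, {'max_depth': 7, 'eta': 0.15})
print(second['values'])
```

Parameter Store holds strings, so values come back as `'7'` or `'0.15'` and need casting when read into a training job.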
In [None]:
# Write hyperparameters to Parameter Store (Req 8.1, 8.3)
backup_key = integrator.write_hyperparameters_to_parameter_store(hyperparameters)

print(f'\nBackup saved to: {backup_key}')
print('\nParameter Store paths updated:')
for param_name in hyperparameters:
    print(f'  /fraud-detection/hyperparameters/{param_name} = {hyperparameters[param_name]}')

## 8. Configuration File Generation

Generate a production configuration file in YAML format and write it to S3.
The config includes the algorithm, hyperparameters, performance metrics, test date,
and approver name.

**Requirement 9.1**: Generate production configuration files in YAML format  
**Requirement 9.2**: Include algorithm, hyperparameters, metrics, test date, approver  
**Requirement 9.3**: Write config to `s3://fraud-detection-config/production-model-config.yaml`

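Requirement 9.2 lists the facts the config must carry. Below is a stdlib sketch of assembling and checking such a config. The top-level field names (`algorithm`, `experiment_id`, `hyperparameters`, `metrics`, `test_date`, `approver`) are assumptions, because the layout `generate_production_config` emits is not shown. The metric value in the usage line is illustrative. Serializing to YAML would typically use PyYAML, for example `yaml.safe_dump(config, sort_keys=False)`. That step is left out so the sketch needs only the standard library.

```python
from datetime import date
from typing import Any, Dict

# Assumed top-level layout; Requirement 9.2 only fixes which facts must be present.
REQUIRED_FIELDS = {
    'algorithm': str,
    'experiment_id': str,
    'hyperparameters': dict,
    'metrics': dict,
    'test_date': str,
    'approver': str,
}


def build_config(experiment_id: str, hyperparameters: Dict[str, Any],
                 metrics: Dict[str, float], approver: str,
                 algorithm: str = 'xgboost') -> Dict[str, Any]:
    return {
        'algorithm': algorithm,
        'experiment_id': experiment_id,
        'hyperparameters': dict(hyperparameters),
        # Cast to float so numpy scalars don't leak into the YAML/JSON output.
        'metrics': {k: float(v) for k, v in metrics.items()},
        'test_date': date.today().isoformat(),
        'approver': approver,
    }


def validate_config(config: Dict[str, Any]) -> None:
    """Raise ValueError if a required field is missing, empty or of the wrong type."""
    for field, expected in REQUIRED_FIELDS.items():
        if field not in config:
            raise ValueError(f'missing field: {field}')
        value = config[field]
        if not isinstance(value, expected) or not value:
            raise ValueError(f'invalid value for {field}: {value!r}')


cfg = build_config('exp-xgboost-optimized-20240115', {'max_depth': 7, 'eta': 0.15},
                   {'accuracy': 0.96}, approver='data-science-team')
validate_config(cfg)
print(sorted(cfg))
```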
In [None]:
EXPERIMENT_ID = 'exp-xgboost-optimized-20240115'
APPROVER = 'data-science-team'

# Generate production config (Req 9.1, 9.2)
config = integrator.generate_production_config(
    experiment_id=EXPERIMENT_ID,
    hyperparameters=hyperparameters,
    metrics=metrics,
    approver=APPROVER,
)

print('Generated production config:')
print(json.dumps(config, indent=2, default=str))

In [None]:
# Validate config schema before writing
integrator.validate_config_schema(config)
print('✓ Configuration schema validated.')

In [None]:
# Write config to S3 (Req 9.3)
integrator.write_config_to_s3(config)

print('\nConfig written to: s3://fraud-detection-config/production-model-config.yaml')
print('Previous config archived with timestamp.')

## 9. Pipeline Trigger

Trigger the production pipeline (Step Functions) to retrain the model with the
newly promoted hyperparameters.

**Requirement 10.1**: Trigger the production pipeline Step Functions execution

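The trigger and status cells below map onto the Step Functions API. `start_execution` returns an `executionArn`, and `describe_execution` reports a `status` such as `RUNNING`, `SUCCEEDED`, or `FAILED`. Below is a minimal sketch of both calls under these assumptions:

- The state machine ARN is a placeholder.
- The execution input shape `{"experiment_id": ...}` is hypothetical.
- Execution names are restricted to a conservative character set and capped at 80 characters.

The client is injected so the sketch runs offline with a stand-in. With AWS credentials you would pass `boto3.client('stepfunctions')`.

```python
import json
import re
from datetime import datetime, timezone

# Placeholder ARN: substitute the production pipeline's state machine.
STATE_MACHINE_ARN = 'arn:aws:states:us-east-1:123456789012:stateMachine:fraud-detection-pipeline'


def execution_name(experiment_id: str) -> str:
    """Unique execution name using only letters, digits, '-' and '_', at most 80 chars."""
    stamp = datetime.now(timezone.utc).strftime('%Y%m%dT%H%M%SZ')
    safe_id = re.sub(r'[^A-Za-z0-9_-]', '-', experiment_id)
    return f'{safe_id}-{stamp}'[-80:]  # keep the tail so the timestamp survives truncation


def trigger_pipeline(sfn, experiment_id: str) -> str:
    resp = sfn.start_execution(
        stateMachineArn=STATE_MACHINE_ARN,
        name=execution_name(experiment_id),
        input=json.dumps({'experiment_id': experiment_id}),  # hypothetical input shape
    )
    return resp['executionArn']


def pipeline_status(sfn, execution_arn: str) -> str:
    return sfn.describe_execution(executionArn=execution_arn)['status']


class FakeStepFunctions:
    """Offline stand-in for the two calls above."""

    def __init__(self):
        self.started = []

    def start_execution(self, stateMachineArn, name, input):
        self.started.append({'name': name, 'input': json.loads(input)})
        return {'executionArn': f'{stateMachineArn.replace(":stateMachine:", ":execution:")}:{name}'}

    def describe_execution(self, executionArn):
        return {'executionArn': executionArn, 'status': 'RUNNING'}


sfn = FakeStepFunctions()
arn = trigger_pipeline(sfn, 'exp-xgboost-optimized-20240115')
print(arn, pipeline_status(sfn, arn))
```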
In [None]:
# Trigger production pipeline retraining (Req 10.1)
execution_arn = integrator.trigger_production_pipeline(EXPERIMENT_ID)

print(f'Pipeline execution ARN: {execution_arn}')

In [None]:
# Check pipeline status
status = integrator.check_pipeline_status(execution_arn)

print('Pipeline Status:')
for key, value in status.items():
    print(f'  {key}: {value}')

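## 10. Full Promotion Workflow (Single Call)

The `promote_to_production` method orchestrates the entire promotion workflow in a
single call: Parameter Store update, config file generation, S3 write, and optional
pipeline trigger. It is an alternative to running Sections 7–9 step by step. Running
both repeats the writes, creating another backup, and starts a second pipeline execution.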

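In a single-call promotion, step order matters: validate first, write the config only after Parameter Store succeeds, and trigger last. Below is a sketch of such an orchestrator, with each step passed in as a callable so it runs without AWS. The step order and the fields of the returned `promotion_event` (mirroring the keys printed below) are assumptions about `promote_to_production`, not its actual code. The demo names (`exp-demo`, `backup/demo.json`, `arn:demo`) are placeholders.

```python
from datetime import datetime, timezone
from typing import Any, Callable, Dict, Optional


def promote(experiment_id: str, hyperparameters: Dict[str, Any], metrics: Dict[str, float],
            approver: str, *,
            validate: Callable[[Dict[str, Any]], None],
            write_params: Callable[[Dict[str, Any]], str],
            write_config: Callable[[Dict[str, Any]], None],
            trigger: Optional[Callable[[str], str]] = None) -> Dict[str, Any]:
    """Run promotion steps in order; any exception stops the remaining steps."""
    validate(hyperparameters)                   # 1. fail fast, before touching production
    backup_key = write_params(hyperparameters)  # 2. Parameter Store; returns backup location
    write_config({'experiment_id': experiment_id, 'hyperparameters': hyperparameters,
                  'metrics': metrics, 'approver': approver})  # 3. config file
    execution_arn = trigger(experiment_id) if trigger else None  # 4. optional retrain
    return {
        'promotion_event': {
            'experiment_id': experiment_id,
            'timestamp': datetime.now(timezone.utc).isoformat(),
            'approver': approver,
            'backup_key': backup_key,
        },
        'execution_arn': execution_arn,
    }


calls = []
# Each stub records its step; `list.append(...) or value` returns value since append returns None.
result = promote(
    'exp-demo', {'max_depth': 7}, {'accuracy': 0.96}, 'data-science-team',
    validate=lambda p: calls.append('validate'),
    write_params=lambda p: calls.append('params') or 'backup/demo.json',
    write_config=lambda c: calls.append('config'),
    trigger=lambda e: calls.append('trigger') or 'arn:demo',
)
print(calls, result['execution_arn'])
```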
In [None]:
# Complete promotion workflow
result = integrator.promote_to_production(
    experiment_id=EXPERIMENT_ID,
    hyperparameters=hyperparameters,
    metrics=metrics,
    approver=APPROVER,
    trigger_pipeline=True,
)

print('\nPromotion Result:')
print(f'  Experiment ID:  {result["promotion_event"]["experiment_id"]}')
print(f'  Timestamp:      {result["promotion_event"]["timestamp"]}')
print(f'  Approver:       {result["promotion_event"]["approver"]}')
print(f'  Backup Key:     {result["promotion_event"]["backup_key"]}')
print(f'  Execution ARN:  {result["execution_arn"]}')

## 11. A/B Testing Workflow

Deploy a challenger model endpoint alongside the production champion and compare
their performance before fully switching over.

**Requirement 11.1**: Deploy a challenger model endpoint alongside the production champion

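Here the champion and challenger are separate endpoints. One way to realize a percentage split, such as the 1% → 10% → 50% → 100% stages listed under Next Steps, is deterministic client-side routing. Hash a stable request key into 100 buckets and send the lowest N buckets to the challenger. A given transaction then always reaches the same model within a stage. This is a sketch under assumptions:

- Client-side routing itself is an assumption. The config from `generate_traffic_split_config` might instead drive SageMaker production variants on a single endpoint.
- Using a transaction ID as the routing key is also an assumption.
- `route`, `bucket`, and the `fraud-detection-challenger` name are hypothetical.

```python
import hashlib


def bucket(key: str) -> int:
    """Map a stable request key to a bucket in [0, 100) using SHA-256."""
    digest = hashlib.sha256(key.encode('utf-8')).digest()
    return int.from_bytes(digest[:8], 'big') % 100


def route(key: str, challenger_percent: int, champion: str, challenger: str) -> str:
    """Send roughly challenger_percent% of keys to the challenger, deterministically."""
    if not 0 <= challenger_percent <= 100:
        raise ValueError('challenger_percent must be in [0, 100]')
    return challenger if bucket(key) < challenger_percent else champion


CHAMPION, CHALLENGER = 'fraud-detection-production', 'fraud-detection-challenger'
keys = [f'txn-{i}' for i in range(10_000)]
for pct in (1, 10, 50, 100):
    share = sum(route(k, pct, CHAMPION, CHALLENGER) == CHALLENGER for k in keys) / len(keys)
    print(f'{pct:3d}% stage -> {share:.1%} of sample routed to challenger')
```

Because the rule is "bucket < N", widening a stage only adds keys to the challenger. A transaction routed to the challenger at 10% stays there at 50%, which keeps per-customer behaviour stable during the rollout.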
In [None]:
ab_manager = ABTestingManager()

# Deploy challenger endpoint (Req 11.1)
MODEL_DATA_URL = 's3://fraud-detection-models/xgboost/model.tar.gz'

challenger_endpoint = ab_manager.deploy_challenger_endpoint(
    model_data_url=MODEL_DATA_URL,
    experiment_id=EXPERIMENT_ID,
    instance_type='ml.m5.xlarge',
)

print(f'Challenger endpoint deployed: {challenger_endpoint}')

In [None]:
# Generate traffic split configuration for gradual rollout
CHAMPION_ENDPOINT = 'fraud-detection-production'

traffic_config = ab_manager.generate_traffic_split_config(
    champion_endpoint=CHAMPION_ENDPOINT,
    challenger_endpoint=challenger_endpoint,
)

print('Traffic Split Configuration:')
print(json.dumps(traffic_config, indent=2, default=str))

print('\nRollout Plan:')
for stage in traffic_config.get('rollout_plan', []):
    print(f'  Stage {stage["stage"]}: {stage["challenger_traffic"]}% challenger traffic '
          f'for {stage["duration_hours"]}h')

In [None]:
# Compare champion and challenger endpoints with test data
test_records = X_test.head(100).to_dict(orient='records')

comparison_result = ab_manager.compare_endpoints(
    champion_endpoint=CHAMPION_ENDPOINT,
    challenger_endpoint=challenger_endpoint,
    test_data=test_records,
)

print('Endpoint Comparison:')
print(json.dumps(comparison_result, indent=2, default=str))

In [None]:
# Promote challenger to champion (when A/B test results are positive)
# Uncomment the lines below to execute the promotion:
# ab_manager.promote_challenger_to_champion(
#     champion_endpoint=CHAMPION_ENDPOINT,
#     challenger_endpoint=challenger_endpoint,
# )

print('To promote the challenger to champion, uncomment the call above and re-run this cell.')
print('This will update the production endpoint and clean up the challenger.')

## 12. Summary and Next Steps

### What We Accomplished

1. **Trained** an XGBoost model with the candidate hyperparameters
2. **Evaluated** the model using accuracy, precision, recall, F1, and AUC-ROC
3. **Visualized** results with confusion matrix, ROC curve, and precision-recall curve
4. **Compared** against the production baseline to quantify the change in each metric
5. **Validated** hyperparameters before promotion
6. **Updated** Parameter Store with new hyperparameters (with automatic backup)
7. **Generated** a production configuration file and wrote it to S3
8. **Triggered** the production pipeline for retraining
9. **Deployed** a challenger endpoint for A/B testing

### Next Steps

- **Monitor the A/B test** — track champion vs challenger metrics over the rollout
  stages (1% → 10% → 50% → 100%).
- **Promote the challenger** — once the challenger consistently outperforms the
  champion, call `promote_challenger_to_champion()` to complete the switch.
- **Iterate** — use the other notebooks to explore new features, algorithms, or
  hyperparameter configurations and repeat this promotion workflow.
- **Rollback if needed** — use `integrator.rollback_parameter_store(backup_key)` to
  restore previous Parameter Store values if issues arise.
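
"Once the challenger consistently outperforms the champion" can be made concrete as a gating rule over the rollout stages. In this hypothetical sketch, the rule advances one stage when the challenger's metric over the current stage is at least the champion's minus a tolerance. Otherwise it falls back to 0% challenger traffic. The metric, the tolerance value, and the fall-back-to-zero policy are all assumptions, not part of `ABTestingManager`.

```python
from typing import Sequence

STAGES = (1, 10, 50, 100)  # challenger traffic percentages from the rollout plan


def next_challenger_percent(current: int, champion_metric: float, challenger_metric: float,
                            tolerance: float = 0.005, stages: Sequence[int] = STAGES) -> int:
    """Return the challenger traffic share for the next stage.

    Advance one stage if the challenger trails the champion by no more than
    `tolerance`; otherwise roll back to 0% (all traffic to the champion).
    """
    if current not in stages:
        raise ValueError(f'{current}% is not a rollout stage: {list(stages)}')
    if challenger_metric < champion_metric - tolerance:
        return 0
    idx = stages.index(current)
    return stages[min(idx + 1, len(stages) - 1)]


print(next_challenger_percent(10, champion_metric=0.87, challenger_metric=0.88))
print(next_challenger_percent(50, champion_metric=0.87, challenger_metric=0.84))
```

Once the rule holds at 100%, `promote_challenger_to_champion()` completes the switch.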