## Deploying the Attrition Model: A Production Plan

This document outlines a complete, production-grade deployment plan for the XGBoost attrition model. The objective is to turn the model into a reliable, scalable, and secure business tool that delivers real-time insights.

### 1. Deployment Strategy & Objectives

We will deploy the model as a **real-time REST API**. This approach supports on-demand use cases, such as integrating predictions directly into an HR portal to provide managers with immediate employee attrition risk scores.

**Key Performance Indicators (KPIs) & Targets:**
* **Latency Target**: p95 latency of ≤ 200ms for end-to-end predictions.
* **Traffic Estimate**: Approximately 500 prediction requests per day.
* **Cost Envelope**: Target of ≈ $80/month, covering compute, monitoring, and storage.
* **Update Cadence**: Features will be refreshed daily, with a full model retrain scheduled monthly.
* **Fallback Plan**: If the real-time API is unavailable for more than 5 minutes, the system will fall back to serving nightly cached batch scores.
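The fallback rule comes down to tracking how long the real-time API has been failing. Below is a minimal sketch. It assumes the caller records the result of each API health check. The class name `ScoreSource` and the `FALLBACK_AFTER_SECONDS` constant are illustrative and not part of an existing system.

```python
import time
from typing import Optional

FALLBACK_AFTER_SECONDS = 5 * 60  # "unavailable for more than 5 minutes"


class ScoreSource:
    """Hypothetical helper: chooses between live API scores and nightly cached batch scores."""

    def __init__(self) -> None:
        # Monotonic timestamp of the first failed health check in the current outage.
        self.outage_started: Optional[float] = None

    def record_health(self, api_ok: bool, now: Optional[float] = None) -> None:
        now = time.monotonic() if now is None else now
        if api_ok:
            self.outage_started = None  # API recovered: reset the outage clock
        elif self.outage_started is None:
            self.outage_started = now  # first failure starts the outage clock

    def source(self, now: Optional[float] = None) -> str:
        now = time.monotonic() if now is None else now
        if self.outage_started is not None and now - self.outage_started > FALLBACK_AFTER_SECONDS:
            return "batch"  # outage longer than 5 minutes: serve cached nightly scores
        return "live"
```

Suppose a failure is recorded at t=0. `source(now=120.0)` still returns `"live"`, and `source(now=301.0)` returns `"batch"`. A successful health check switches it back to `"live"`.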

### 2. Technology Stack & Architecture

The architecture is designed for scalability, maintainability, and cost-efficiency using a modern MLOps stack.

| Component | Technology | Purpose |
|---|---|---|
| **Model Serving** | **FastAPI (Python)** | High-performance, asynchronous API for the `/predict` endpoint. |
| **Containerization** | **Docker** | Creates a portable, consistent, and lightweight (~120 MB) runtime environment. |
| **Orchestration** | **AWS EKS (Fargate Profile)** | Serverless Kubernetes hosting: pods run on Fargate without nodes to manage, are billed per pod, and are scaled by a Horizontal Pod Autoscaler. |
| **Feature Store** | **Delta Lake on S3** | Ensures training-serving parity and enables point-in-time correct joins. |
| **CI/CD** | **Helm + ArgoCD** | Helm packages the Kubernetes manifests, and ArgoCD syncs them from Git to the cluster. Blue-green/canary rollouts need an add-on such as Argo Rollouts. |
| **Model Registry** | **MLflow Registry** | Versions models and their artifacts; manages promotion through Staging to Production. |
| **Monitoring** | **Prometheus & Grafana** | Scrapes and visualizes service health metrics like latency and error rates. |
| **Drift Detection** | **Evidently AI** | Monitors data and prediction distributions to detect drift and trigger alerts. |
| **Secret Management** | **HashiCorp Vault** | Securely stores and manages access to secrets like API keys and database credentials. |

#### **High-Level Architecture Diagram**

```
HR Portal (client)
        |
        v
API Gateway (OAuth2 auth)
        |
        v
EKS Fargate service (FastAPI container) ---> Feature Store (live features)
        |
        +---> Prometheus / Grafana (metrics)
        +---> Evidently AI (drift)

CI/CD: MLflow Registry ---> ArgoCD ---> EKS Fargate service
```

### 3. API Endpoint Design

The API will expose a `/predict` endpoint for generating attrition scores.

* **Endpoint:** `/predict`
* **Method:** `POST`
* **Authentication:** OAuth2 scopes managed via an API Gateway.
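The API Gateway enforces scopes, but the check itself is simple set containment. Below is a minimal sketch. It assumes the gateway has already validated the token's signature and exposes its claims as a dict. It also assumes scopes arrive as a space-delimited `scope` claim, which is the usual OAuth2 convention. The scope name `attrition:predict` is hypothetical.

```python
from typing import Iterable, Mapping

REQUIRED_SCOPES = {"attrition:predict"}  # hypothetical scope name for POST /predict


def granted_scopes(claims: Mapping[str, object]) -> set:
    """Parse the space-delimited 'scope' claim of an already-validated access token."""
    raw = claims.get("scope", "")
    return set(raw.split()) if isinstance(raw, str) else set()


def is_authorized(claims: Mapping[str, object], required: Iterable[str] = REQUIRED_SCOPES) -> bool:
    """Allow the request only if every required scope was granted."""
    return set(required) <= granted_scopes(claims)
```

A token whose claims are `{"scope": "openid attrition:predict"}` passes. A token with only `openid profile` is rejected.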

#### **API Skeleton (FastAPI)**
Below is a minimal code example for the API server.

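```python
# Illustrative serving code; not for execution in this notebook.
# Assumes it is saved as app.py in the container and started with: uvicorn app:app
import os

import joblib
import pandas as pd
from fastapi import FastAPI
from pydantic import BaseModel

# MODEL_PATH and MODEL_VERSION are hypothetical environment variables set by the deployment.
MODEL_PATH = os.getenv("MODEL_PATH", "models/attrition_pipeline.joblib")
MODEL_VERSION = os.getenv("MODEL_VERSION", "1.0.0")  # version string from the MLflow registry

app = FastAPI(title="Attrition Predictor", version="1.0")

# Load the fitted pipeline once at startup, not on every request.
model = joblib.load(MODEL_PATH)


class Employee(BaseModel):
    Age: int
    JobRole: str
    MonthlyIncome: float
    YearsAtCompany: int
    OverTime: str


class Prediction(BaseModel):
    attrition_probability: float
    model_version: str


@app.post("/predict", response_model=Prediction, summary="Predict employee attrition probability")
def predict(emp: Employee) -> Prediction:
    """Accepts employee data and returns the attrition probability."""
    # In a real app, these inputs would be enriched with features from the feature store.
    X = pd.DataFrame([emp.model_dump()])  # Pydantic v2; use emp.dict() on Pydantic v1
    proba = float(model.predict_proba(X)[0, 1])  # column 1 = probability of attrition
    return Prediction(attrition_probability=round(proba, 4), model_version=MODEL_VERSION)
```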
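On the HR portal side, a prediction call is a JSON `POST` with an OAuth2 bearer token. The sketch below builds that request with the standard library but does not send it. The base URL, the token string, the helper name `build_predict_request`, and the sample employee values are all illustrative placeholders. The field names mirror the `Employee` schema of the API skeleton.

```python
import json
import urllib.request


def build_predict_request(base_url: str, token: str, employee: dict) -> urllib.request.Request:
    """Build (but do not send) a POST /predict request carrying an OAuth2 bearer token."""
    body = json.dumps(employee).encode("utf-8")
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/predict",
        data=body,
        method="POST",
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {token}",
        },
    )


# Illustrative input only; not real employee data.
employee = {
    "Age": 34,
    "JobRole": "Sales Executive",
    "MonthlyIncome": 5200.0,
    "YearsAtCompany": 3,
    "OverTime": "Yes",
}
req = build_predict_request("https://api.example.com/attrition", "<access-token>", employee)
# Sending it: urllib.request.urlopen(req, timeout=2) returns an HTTP response whose body is the JSON prediction.
```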

### 4. Monitoring, Maintenance & Retraining

A multi-layered monitoring and retraining strategy ensures the model remains performant, fair, and reliable.

* **Service Health**: p95 latency and 5XX error rates are scraped by **Prometheus** and visualized in **Grafana**. Alerts fire when p95 latency exceeds the 200 ms target or when 5XX errors spike.
* **Model & Data Drift**: **Evidently AI** continuously compares live traffic against a training data baseline. A Population Stability Index (PSI) > 0.2 on key features triggers an alert to MLOps via Slack and PagerDuty.
* **Model Performance**: Ground truth (actual employee terminations) is joined with predictions to track recall. A drop of more than 5 percentage points from the recall measured at deployment automatically triggers the retraining pipeline.
* **Fairness & Bias**: A quarterly batch job computes the difference in recall between groups defined by protected attributes (e.g., gender, age). An alert is sent if the gap exceeds 5 percentage points.
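PSI compares a feature's binned distribution in live traffic with its training baseline: PSI = Σ (a_i − e_i) · ln(a_i / e_i), where e_i and a_i are the baseline and live proportions in bin i. Evidently computes drift metrics internally. The NumPy sketch below only shows the arithmetic behind the 0.2 threshold. It assumes decile bins taken from the baseline and a small epsilon to avoid log(0). Both choices are illustrative.

```python
import numpy as np

PSI_ALERT_THRESHOLD = 0.2


def psi(baseline: np.ndarray, live: np.ndarray, bins: int = 10, eps: float = 1e-6) -> float:
    """Population Stability Index of `live` relative to `baseline` for one numeric feature."""
    # Bin edges from baseline quantiles, so each baseline bin holds roughly equal mass.
    edges = np.unique(np.quantile(baseline, np.linspace(0, 1, bins + 1)))
    # Clip live values into the baseline range so out-of-range values land in the edge bins.
    live = np.clip(live, edges[0], edges[-1])
    expected = np.histogram(baseline, bins=edges)[0] / len(baseline)
    actual = np.histogram(live, bins=edges)[0] / len(live)
    # Floor empty bins at eps so the log term stays finite.
    expected = np.clip(expected, eps, None)
    actual = np.clip(actual, eps, None)
    return float(np.sum((actual - expected) * np.log(actual / expected)))
```

An identical sample gives a PSI of 0. A live sample whose mean has shifted by one standard deviation lands well above 0.2 and would raise the alert.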

#### **Automated Retraining Pipeline (Airflow)**
An Airflow DAG, scheduled monthly and also triggered by the recall-drop alert, manages the model update process:
1.  **Extract & Update**: Pull the latest data from the HRIS to update the feature store snapshot.
2.  **Train & Evaluate**: Train a new candidate model and evaluate it against a hold-out dataset.
3.  **Log & Register**: Log all artifacts, metrics, and parameters to **MLflow**.
4.  **Promotion Gate**: Automatically compare the candidate to the production model. To be promoted, the candidate must achieve higher recall and precision than the production model on the same hold-out set, and its bias metric must stay within the ±2 pp threshold.
5.  **Deploy**: If the gate is passed, the model's stage in MLflow is updated to "Production", the new model version is committed to the Helm values in Git, and **ArgoCD** syncs the updated Helm chart to the EKS cluster.
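The promotion gate in step 4 reduces to a comparison of hold-out metrics. Below is a minimal sketch, assuming both models were scored on the same hold-out set. It also assumes that "bias metric" means the largest recall gap between protected groups, and it reads "±2 pp" as an absolute gap of at most 0.02. These readings, the `EvalResult` fields, and the function name are all assumptions.

```python
from dataclasses import dataclass

BIAS_THRESHOLD = 0.02  # ±2 percentage points, expressed as a fraction (assumed reading)


@dataclass(frozen=True)
class EvalResult:
    recall: float
    precision: float
    recall_gap: float  # assumed bias metric: max recall difference between protected groups


def passes_promotion_gate(candidate: EvalResult, production: EvalResult) -> bool:
    """True if the candidate beats production on recall and precision and stays within the bias threshold."""
    better = candidate.recall > production.recall and candidate.precision > production.precision
    fair = abs(candidate.recall_gap) <= BIAS_THRESHOLD
    return better and fair
```

This gate requires strict improvement on both metrics, matching the wording of step 4. A tolerance margin would be a design change for the team to decide.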

### 5. Risk, Governance & Compliance

Managing risk is critical for any HR-related AI system.

* **Security**: All traffic is encrypted with TLS. Secrets and credentials are managed externally in **HashiCorp Vault**. Role-based access control (RBAC) is enforced with OAuth2 scopes.
* **Privacy & Data Retention**: Input features are logged for audit but purged after 180 days to respect privacy. The system includes a flag to honor employee opt-out requests.
* **Ethical Use**: The model is a decision-support tool, not an automated decision-maker. A human manager is always in the loop and must record a reason for any action taken.
* **Regulatory Compliance**: The system and its data handling procedures are designed for compliance with GDPR, India's Digital Personal Data Protection (DPDP) Act 2023, and EEOC guidance on AI in employment decisions, with compliance checks performed quarterly.
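The 180-day retention rule and the opt-out flag can both be applied as one filter over the audit log. Below is a minimal sketch. It assumes each record carries an `employee_id` and a timezone-aware `logged_at` timestamp, and that opt-outs are kept as a set of employee IDs. These names are hypothetical. The sketch also assumes an opt-out removes an employee's existing records. Whether it should instead only stop future logging is a policy decision.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import Iterable, List, Set

RETENTION = timedelta(days=180)


@dataclass
class AuditRecord:
    employee_id: str     # hypothetical field names
    logged_at: datetime  # timezone-aware UTC timestamp
    features: dict


def purge_audit_log(records: Iterable[AuditRecord], now: datetime, opted_out: Set[str]) -> List[AuditRecord]:
    """Keep only records inside the retention window that do not belong to opted-out employees."""
    cutoff = now - RETENTION
    return [r for r in records if r.logged_at >= cutoff and r.employee_id not in opted_out]
```

A nightly job could run this filter and rewrite the log partition with only the records it keeps.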

### 6. Conclusion and Next Steps

The deployment design is built to meet the latency, cost, and governance targets set out above while following established DevOps and MLOps patterns. It provides a clear path to a production-ready system.

The next milestone is to implement the **Infrastructure as Code (IaC)** using Terraform and Helm. Following this, a pilot program will be launched with the Sales and R&D teams in **Q3 2025**.