## Deploying the Attrition Model: A Production Plan (Managed Stack)

This document outlines a complete, production-grade deployment plan for the XGBoost attrition model. The objective is to turn the model into a reliable, scalable, and secure business tool that delivers real-time insights using the fully managed AWS SageMaker stack.

---

### 1. Deployment Strategy & Objectives

We will deploy the model as a **real-time REST API**. This approach supports on-demand use cases, such as integrating predictions directly into an HR portal to provide managers with immediate employee attrition risk scores.

**Key Performance Indicators (KPIs) & Targets:**
* **Latency Target**: p95 end-to-end prediction latency ≤ 200 ms.
* **Traffic Estimate**: Approximately 500 prediction requests per day.
* **Cost Envelope**: Target of ≈ $80/month, covering compute, monitoring, and storage.
* **Update Cadence**: Features will be refreshed daily, with a full model retrain scheduled monthly.
* **Fallback Plan**: If the real-time API is unavailable for more than 5 minutes, the system will fall back to serving nightly cached batch scores.
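
The fallback rule above could be sketched as a small wrapper around the scoring call. This is a minimal illustration under stated assumptions, not the planned implementation: `FallbackScorer`, `realtime_fn`, and `cached_scores` are hypothetical names, and re-raising errors during the first 5 minutes of an outage is one possible interpretation of the rule.

```python
import time
from typing import Callable, Dict, Optional

# Threshold taken from the Fallback Plan target: more than 5 minutes of unavailability.
OUTAGE_THRESHOLD_SECONDS = 5 * 60


class FallbackScorer:
    """Serve real-time scores; switch to cached batch scores during long outages."""

    def __init__(
        self,
        realtime_fn: Callable[[str], float],      # hypothetical call to the live API
        cached_scores: Dict[str, float],          # hypothetical nightly batch scores
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._realtime_fn = realtime_fn
        self._cached_scores = cached_scores
        self._clock = clock
        self._outage_started_at: Optional[float] = None

    def score(self, employee_id: str) -> Optional[float]:
        try:
            result = self._realtime_fn(employee_id)
        except Exception:
            now = self._clock()
            if self._outage_started_at is None:
                # First failure: remember when the outage began.
                self._outage_started_at = now
            if now - self._outage_started_at > OUTAGE_THRESHOLD_SECONDS:
                # Outage has lasted longer than the threshold: serve the cached score.
                return self._cached_scores.get(employee_id)
            raise  # Short outage: surface the error to the caller.
        # A successful call ends any outage in progress.
        self._outage_started_at = None
        return result
```

Injecting `clock` keeps the outage timer testable without waiting in real time.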

### 2. Technology Stack & Architecture (AWS SageMaker)

The architecture will be built entirely on the AWS SageMaker platform, leveraging its managed services to create a secure, scalable, and automated MLOps workflow.

| Component | AWS SageMaker Service | Purpose |
|---|---|---|
| **Model Hosting** | **SageMaker Real-Time Endpoint** | Deploys the trained model as a fully managed, auto-scaling HTTPS endpoint for real-time predictions. |
| **MLOps Pipeline** | **SageMaker Pipelines** | Orchestrates the entire end-to-end workflow: data prep, training, evaluation, model registration, and deployment. |
| **Model Registry** | **SageMaker Model Registry** | A central, secure repository to version, catalog, and manage the approval status of trained models before deployment. |
| **Feature Management** | **SageMaker Feature Store** | Provides a managed repository for features, ensuring consistency between training and real-time inference. |
| **Drift Detection** | **SageMaker Model Monitor** | Automatically monitors the live endpoint for data and model quality drift, triggering alerts when deviations are detected. |
| **Bias & Explainability** | **SageMaker Clarify** | Measures potential bias in the training data and in the trained model, and provides explanations for model predictions. |
| **Secret Management** | **AWS Secrets Manager** | Securely stores and manages access to secrets like database credentials and API keys for other services. |

#### **High-Level Architecture Diagram**
![SageMaker Deployment Diagram](../resources/images/sagemaker_deployment_architecture_v2.png)

**Figure 1.** High-Level Deployment Architecture for the HR Attrition Model.

**Inference Path:** Client requests are routed through Amazon API Gateway to a SageMaker Endpoint. The endpoint retrieves real-time data from the SageMaker Feature Store to generate a prediction.

**Training Pipeline:** A SageMaker Pipeline automates the MLOps lifecycle, using data from S3 to conduct training jobs. The resulting model artifacts are versioned in the Model Registry before being deployed to the endpoint.

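**Monitoring:** The SageMaker Endpoint publishes metrics and logs to Amazon CloudWatch. With data capture enabled on the endpoint, request and response payloads are stored in Amazon S3, where SageMaker Model Monitor analyzes them to detect data and model drift over time.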

### 3. API Endpoint Design

The model will be exposed via a managed **SageMaker Real-Time Endpoint**. SageMaker handles the underlying infrastructure, container hosting, and API creation, providing a secure and scalable HTTPS endpoint.

* **Endpoint:** A unique HTTPS URL provided by SageMaker.
* **Method:** `POST`
* **Authentication:** Handled by standard AWS IAM roles and policies, which can be integrated with an Amazon API Gateway for external access control.

#### **API Interaction**
Clients will interact with the SageMaker Runtime using the AWS SDK or by making a SigV4-signed HTTPS request to the endpoint URL. The request body is typically a CSV or JSON payload, formatted as the model's inference container expects.

**Example Request Body (JSON):**
```json
{
  "Age": 35,
  "JobRole": "Sales Executive",
  "MonthlyIncome": 5500,
  "YearsAtCompany": 5,
  "OverTime": "Yes"
}
```
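
A request like the one above could be sent through the SageMaker Runtime `invoke_endpoint` call. The sketch below is illustrative only: the function name `invoke_attrition_endpoint` and the endpoint name in the usage note are hypothetical, and it assumes the inference container accepts and returns JSON, which depends on how the model is packaged.

```python
import json
from typing import Any, Dict


def invoke_attrition_endpoint(
    runtime_client: Any, endpoint_name: str, features: Dict[str, Any]
) -> Any:
    """Send one JSON record to a SageMaker endpoint and return the decoded response.

    `runtime_client` is expected to behave like boto3.client("sagemaker-runtime").
    """
    response = runtime_client.invoke_endpoint(
        EndpointName=endpoint_name,
        ContentType="application/json",  # assumption: the container accepts JSON
        Accept="application/json",       # assumption: the container returns JSON
        Body=json.dumps(features).encode("utf-8"),
    )
    # With boto3, response["Body"] is a streaming object that exposes .read().
    return json.loads(response["Body"].read())
```

With boto3 installed and AWS credentials configured, usage might look like `invoke_attrition_endpoint(boto3.client("sagemaker-runtime"), "attrition-xgb-prod", payload)`, where the endpoint name is a placeholder. Passing the client in as a parameter also lets the function be exercised with a stub in unit tests.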

### 4. Monitoring, Maintenance & Retraining

Continuous monitoring and automated retraining are managed natively within the AWS SageMaker ecosystem.

* **Service Health**: Endpoint metrics like **Latency, Invocations, and Error Rates** are automatically published to **Amazon CloudWatch**, with alarms configured for performance degradation.
* **Model & Data Drift**: **SageMaker Model Monitor** runs on a schedule against data captured from the live endpoint. It compares the live traffic against the training baseline and publishes violation metrics to CloudWatch, where alarms can notify the team or trigger retraining if drift is detected.
* **Fairness & Bias**: **SageMaker Clarify** is used within the pipeline to generate pre-training bias reports and post-training fairness metrics. A new model version will be rejected if fairness metrics (e.g., recall difference across demographics) exceed a predefined threshold.
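
To make the fairness gate concrete, the recall-difference check could look roughly like the following. This is a hand-rolled sketch rather than the SageMaker Clarify API: `recall_by_group`, `passes_fairness_gate`, and the `max_recall_gap` threshold are hypothetical names, and the gap is taken as the difference between the highest and lowest per-group recall.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple

Record = Tuple[str, int, int]  # (demographic group, true label, predicted label)


def recall_by_group(records: Iterable[Record]) -> Dict[str, float]:
    """Compute recall (TP / (TP + FN)) separately for each demographic group."""
    true_pos: Dict[str, int] = defaultdict(int)
    false_neg: Dict[str, int] = defaultdict(int)
    for group, y_true, y_pred in records:
        if y_true == 1:  # recall only considers actual positives
            if y_pred == 1:
                true_pos[group] += 1
            else:
                false_neg[group] += 1
    groups = set(true_pos) | set(false_neg)
    return {g: true_pos[g] / (true_pos[g] + false_neg[g]) for g in groups}


def passes_fairness_gate(
    records: Iterable[Record], max_recall_gap: float
) -> Tuple[bool, float]:
    """Return (passed, gap), where gap is max recall minus min recall across groups."""
    recalls = recall_by_group(records)
    if not recalls:
        raise ValueError("no positive examples: recall is undefined")
    gap = max(recalls.values()) - min(recalls.values())
    return gap <= max_recall_gap, gap
```

In the pipeline, a failed gate would correspond to rejecting the candidate model version before registration.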

#### **Automated Retraining with SageMaker Pipelines**
The entire MLOps workflow is defined and automated as a **SageMaker Pipeline**, which is triggered on a schedule (e.g., monthly) or when significant model drift is detected.

1.  **Data Processing**: A processing step pulls the latest data and prepares features using the SageMaker Feature Store.
2.  **Train Model**: A training step launches a SageMaker Training Job to train the candidate model.
3.  **Evaluate Model**: An evaluation step compares the candidate model's performance (e.g., recall, precision) against a baseline on a hold-out dataset.
4.  **Register Model**: If the new model meets the performance criteria, it is versioned and registered in the **SageMaker Model Registry** with a pending-approval status (`PendingManualApproval`).
5.  **Deploy**: Upon manual approval (or an automated rule), a final step in the pipeline deploys the approved model version to the production SageMaker Endpoint, automatically updating it with zero downtime.
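
The evaluate-then-register logic in steps 3 and 4 can be sketched as a simple gate. This is an illustration under assumptions, not the pipeline definition itself: `registration_status`, the metric names, and the `tolerance` parameter are hypothetical, while `PendingManualApproval` is the Model Registry's status value for a version awaiting approval.

```python
from typing import Dict, Optional, Sequence


def registration_status(
    candidate: Dict[str, float],
    baseline: Dict[str, float],
    metrics: Sequence[str] = ("recall", "precision"),
    tolerance: float = 0.0,
) -> Optional[str]:
    """Decide whether a candidate model should be registered.

    Returns the approval status to register with, or None when the candidate
    falls below the baseline (minus `tolerance`) on any tracked metric.
    """
    for name in metrics:
        if candidate[name] < baseline[name] - tolerance:
            return None  # fails the performance criteria: do not register
    return "PendingManualApproval"  # registered, awaiting approval before deploy


# Usage with made-up hold-out metrics, for illustration only:
status = registration_status(
    candidate={"recall": 0.72, "precision": 0.60},
    baseline={"recall": 0.70, "precision": 0.60},
)
```

In an actual SageMaker Pipeline, this comparison would typically live in a condition step that reads the evaluation report, with the register step executed only on the passing branch.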

### 5. Risk, Governance & Compliance

Managing risk is critical for any HR-related AI system.

* **Security**: All traffic is encrypted with TLS. Secrets and credentials are managed natively and securely in **AWS Secrets Manager**. Role-based access control (RBAC) is enforced with IAM policies.
* **Privacy & Data Retention**: Input features are logged for audit but purged after 180 days to respect privacy. The system includes a flag to honor employee opt-out requests.
* **Ethical Use**: The model serves as a decision-support tool. It is not fully automated. A human manager is always in the loop and must record a reason for any action taken.
* **Regulatory Compliance**: The system and its data handling procedures are designed for compliance with GDPR, India's Digital Personal Data Protection (DPDP) Act, and EEOC AI guidance, with compliance checks performed quarterly.
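
The 180-day retention rule and the opt-out flag could be enforced by a periodic purge job along these lines. This is a minimal sketch with hypothetical names (`records_to_keep`, `logged_at`, `employee_id`, `opted_out_ids`), and it assumes that honoring an opt-out means dropping that employee's logged inputs, which the plan does not specify.

```python
from datetime import datetime, timedelta, timezone
from typing import Any, Dict, FrozenSet, Iterable, List

RETENTION = timedelta(days=180)  # from the Privacy & Data Retention policy


def records_to_keep(
    records: Iterable[Dict[str, Any]],
    now: datetime,
    opted_out_ids: FrozenSet[str] = frozenset(),
) -> List[Dict[str, Any]]:
    """Keep audit records newer than the retention cutoff for non-opted-out employees."""
    cutoff = now - RETENTION
    return [
        r
        for r in records
        if r["logged_at"] >= cutoff              # purge anything older than 180 days
        and r["employee_id"] not in opted_out_ids  # assumption: opt-out removes logs
    ]


# Usage with made-up audit records, for illustration only:
now = datetime(2025, 7, 1, tzinfo=timezone.utc)
audit_log = [
    {"employee_id": "e1", "logged_at": now - timedelta(days=10)},
    {"employee_id": "e2", "logged_at": now - timedelta(days=181)},
    {"employee_id": "e3", "logged_at": now - timedelta(days=5)},
]
kept = records_to_keep(audit_log, now, frozenset({"e3"}))
```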

### 6. Conclusion and Next Steps

The deployment design is built to meet the latency, cost, and governance targets set out above while leveraging a modern, fully managed MLOps stack on AWS. It provides a clear path to a production-ready system with reduced operational overhead.

The next milestone is to implement the **Infrastructure as Code (IaC)** using the **AWS CDK** or **Terraform** to define the SageMaker resources. Following this, a pilot program will be launched with the Sales and R&D teams in **Q3 2025**.