# A Predictive Model to Strengthen Retention in Government Agencies: Sentiment Factors Driving Employee Exits

**Authors:** _Sophia Jensen, Duy Nguyen_  
_Applied Data Science Master’s Program <br>
Shiley Marcos School of Engineering / University of San Diego_ <br>
**Date:** August 11, 2025

---


## Abstract

Employees who consider leaving pose a persistent challenge for organizations, resulting in lost productivity, increased hiring costs, and reduced team morale (De Winne et al., 2019). This project investigated the factors that lead federal employees to consider leaving by analyzing data from the Federal Employee Viewpoint Survey (FEVS) between 2020 and 2024. Using a combination of data preprocessing, feature engineering, and machine learning techniques, predictive models were developed to classify whether an employee would consider leaving their agency. Key steps included: (a) handling class imbalance through the Synthetic Minority Oversampling Technique (SMOTE), (b) encoding categorical features, and (c) transforming ordinal Likert-scale survey items into binary predictors. Three classification algorithms (logistic regression, decision tree, and XGBoost) were evaluated using accuracy, precision, recall, and F1-score. Although the XGBoost model achieved the highest overall accuracy (81%) and provided meaningful feature insights, its recall for the minority leave class was limited, which highlights the difficulty of detecting at-risk employees. Feature importance analysis revealed that employee recognition, role clarity, and supervisor alignment are critical predictors of turnover intent. Notably, gender emerged as a top predictive feature, suggesting the need for future fairness analysis. This study provides a data-driven framework for human resources (HR) teams to proactively identify and support employees who are considering leaving the organization.


### Background and Problem Statement
Employee turnover is a costly, noisy, and complex challenge for organizations. Research shows that replacing managers or technical professionals can cost up to 200% of their annual salary, and for other roles, up to 80% (Tatel & Wigert, 2024). 
Beyond the financial burden, the scale of the problem is significant: according to Tatel and Wigert (2024), over half of U.S. employees are actively seeking new job opportunities. This creates instability, disrupts team performance, and increases recruitment and training demands that could have been mitigated.

An employee’s decision to leave is shaped by a wide range of experiences: how supported they feel in their work, the quality of their team relationships, their trust in their direct and senior leadership, their day-to-day satisfaction, and whether they feel recognized and rewarded for their contributions.
Because employee experience is multidimensional, solving turnover requires more than addressing pay or workload. It demands a deeper, data-driven understanding of the interconnected factors that influence retention.
 

### Objectives
- Identify key factors influencing turnover intent.
- Build a predictive classification model for at-risk employee detection.
- Provide prescriptive insights to guide targeted retention strategies.

### Data & Methodology
**Data Source:** Federal Employee Viewpoint Survey (FEVS) from the U.S. Office of Personnel Management, 2020–2024 (~2.77M responses). <br>
**Key Steps:** <br>
**Data Cleaning:** Removed incomplete target values; standardized columns across years; filled missing demographic values with mode. <br>
**Feature Engineering:** Recoded Likert-scale responses (1–3 = unfavorable, 4–5 = favorable); one-hot encoded categorical variables. <br>
**Class Imbalance Handling:** Applied SMOTE to training data. <br>
**Modeling:** Compared Logistic Regression, Decision Tree, and XGBoost classifiers. <br>
**Evaluation:** Measured accuracy, precision, recall, and F1-score.
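
The feature engineering step above can be illustrated with a minimal sketch. It uses only the Python standard library so that the logic is easy to follow; it is not the project's actual code, and a real pipeline would more likely rely on pandas or scikit-learn. The function names, the toy rows, and the `supervisor` field are hypothetical. Only the 1–3 / 4–5 recoding rule comes from the description above.

```python
from typing import Dict, List, Optional


def recode_likert(value: Optional[int]) -> Optional[int]:
    """Map a 1-5 Likert response to 1 (favorable: 4-5) or 0 (unfavorable: 1-3)."""
    if value is None:
        return None  # leave missing responses for a separate imputation step
    if value not in (1, 2, 3, 4, 5):
        raise ValueError(f"unexpected Likert value: {value!r}")
    return 1 if value >= 4 else 0


def one_hot(values: List[str], prefix: str) -> List[Dict[str, int]]:
    """One-hot encode a categorical column into 0/1 indicator columns."""
    categories = sorted(set(values))
    return [{f"{prefix}_{c}": int(v == c) for c in categories} for v in values]


# Hypothetical toy rows: Q6 is a Likert item, "supervisor" a categorical field.
rows = [
    {"Q6": 5, "supervisor": "Yes"},
    {"Q6": 3, "supervisor": "No"},
    {"Q6": 1, "supervisor": "Yes"},
]

q6_binary = [recode_likert(r["Q6"]) for r in rows]
supervisor_dummies = one_hot([r["supervisor"] for r in rows], prefix="supervisor")

print(q6_binary)              # binary favorable/unfavorable flags
print(supervisor_dummies[0])  # indicator columns for the first row
```

Note that under this rule a neutral response (3) is grouped with the unfavorable responses, which is a modeling choice rather than a property of the survey.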

### Results
- **Best Model:** XGBoost (Accuracy: 81%, F1-score: 77% for majority class)
- **Top Predictors:** Employee recognition (Q6), job expectation clarity (Q36), and alignment with agency goals (Q34), plus gender
- **Gender Analysis:** Male employees had a slightly higher attrition rate (20.04%) than female employees (18.29%), with statistically significant differences (p < 0.001)
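
One way to check whether a difference between two rates like these is statistically significant is a two-proportion z-test. The sketch below assumes that test; the summary does not state which test the project used. The group sizes are hypothetical placeholders chosen only for illustration, and only the two rates come from the results above.

```python
import math
from typing import Tuple


def two_proportion_z_test(x1: int, n1: int, x2: int, n2: int) -> Tuple[float, float]:
    """Return (z, two-sided p-value) for H0: both groups share one rate."""
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)  # pooled rate under the null hypothesis
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / se
    p_value = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal tail area
    return z, p_value


# Hypothetical group sizes (NOT from the study); rates taken from the results.
n_male, n_female = 100_000, 100_000
left_male = round(0.2004 * n_male)
left_female = round(0.1829 * n_female)

z, p = two_proportion_z_test(left_male, n_male, left_female, n_female)
print(f"z = {z:.2f}, p = {p:.3g}")
```

With samples this large, even a gap of under two percentage points yields a very small p-value, so the practical size of the difference deserves attention alongside its significance.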

### Key Insights
- Recognition, clarity, and alignment are strong drivers of retention
- Gender differences in attrition rates may warrant targeted HR interventions
- The model excels at detecting employees likely to stay but needs improvement in minority-class recall
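
The last point is easier to see with per-class metrics. The sketch below computes precision, recall, and F1 for each class from a confusion matrix. The counts are hypothetical and are not the project's results; they only show how high overall accuracy can coexist with low recall on the minority leave class.

```python
from typing import Tuple


def class_metrics(tp: int, fp: int, fn: int) -> Tuple[float, float, float]:
    """Return (precision, recall, F1) for one class, guarding against 0/0."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


# Hypothetical confusion matrix (actual -> predicted), NOT project results.
stay_as_stay, stay_as_leave = 750, 50     # 800 actual "stay"
leave_as_stay, leave_as_leave = 150, 50   # 200 actual "leave"
total = stay_as_stay + stay_as_leave + leave_as_stay + leave_as_leave

accuracy = (stay_as_stay + leave_as_leave) / total
stay = class_metrics(tp=stay_as_stay, fp=leave_as_stay, fn=stay_as_leave)
leave = class_metrics(tp=leave_as_leave, fp=stay_as_leave, fn=leave_as_stay)

print(f"accuracy: {accuracy:.2f}")
print("stay  (precision, recall, F1):", [round(m, 3) for m in stay])
print("leave (precision, recall, F1):", [round(m, 3) for m in leave])
```

In this toy example the classifier is right 80% of the time overall, yet it identifies only one in four of the employees who actually leave.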

### Recommendations
1. Implement recognition programs that highlight employee contributions
2. Enhance role clarity through improved communication from leadership
3. Align individual roles with organizational goals to boost engagement
4. Conduct fairness analysis to ensure predictive models do not amplify demographic biases
5. Explore advanced balancing techniques to improve minority-class recall
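
As a starting point for the fairness analysis in recommendation 4, one simple check is to compare recall for the leave class across demographic groups. The sketch below is an illustration under assumptions: the labels, predictions, and group codes are hypothetical toy data, and comparing recall is only one of several possible fairness criteria.

```python
from collections import defaultdict
from typing import Dict, Hashable, Sequence


def recall_by_group(
    y_true: Sequence[int],
    y_pred: Sequence[int],
    groups: Sequence[Hashable],
    positive: int = 1,
) -> Dict[Hashable, float]:
    """Recall for the positive class, computed separately for each group."""
    hits: Dict[Hashable, int] = defaultdict(int)
    positives: Dict[Hashable, int] = defaultdict(int)
    for truth, pred, group in zip(y_true, y_pred, groups):
        if truth == positive:
            positives[group] += 1
            if pred == positive:
                hits[group] += 1
    # Groups with no actual positives are omitted (recall is undefined there).
    return {g: hits[g] / positives[g] for g in positives}


# Hypothetical toy data: 1 = considers leaving, 0 = intends to stay.
y_true = [1, 1, 1, 1, 0, 0, 1, 1, 0, 0]
y_pred = [1, 0, 1, 0, 0, 0, 1, 1, 0, 1]
groups = ["M", "M", "M", "M", "M", "F", "F", "F", "F", "F"]

per_group = recall_by_group(y_true, y_pred, groups)
recall_gap = max(per_group.values()) - min(per_group.values())

print(per_group)
print(f"recall gap: {recall_gap:.2f}")
```

A large gap would mean the model misses at-risk employees in one group more often than in another, which is worth investigating before the model informs any HR decision.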

### Limitations
- Limited hyperparameter tuning due to computational constraints
- Lower recall for the leave class indicates potential bias toward the majority class
- Sensitivity of HR data restricted some analyses

### Future Work
- Optimize XGBoost hyperparameters
- Incorporate additional demographic and engagement variables
- Deploy model in a live HR environment with continuous feedback loops

### Conclusion
A data-driven, predictive approach to employee retention enables proactive HR strategies, improving workforce stability and reducing costs. 
With refinement, such models can significantly enhance the federal workforce's operational resilience.