Classifiers to predict employee attrition with IBM HR dataset.

min-tee/HR-Analytics

HR-Analytics

Define

Attrition is one of the most common issues at any organization. The time, money and effort needed to train new employees can be a great loss for the company. The cost includes onboarding, advertising, hiring, training and lost productivity. Additionally, attrition causes doubt or distrust of management among current employees. According to Gallup, U.S. businesses lose a trillion dollars every year due to employee turnover. Gallup also argues that the problem is fixable with the right strategy and retention plan.

This project aims to provide an in-depth analysis of the factors that lead to employee turnover, create a predictive model and suggest a retention strategy using the Kaggle IBM HR dataset. Please follow this link, Understanding and Predicting IBM Employee Attrition, for details.

Discover

Tools Used : Google Colab

Packages : pandas, numpy, matplotlib, seaborn, sklearn

Data : 35 features with 1470 observations.

EDA

Feature Distributions

feature_dist

age

Attrition is high among employees in their late 20s and early 30s.

Attrition Counts

attrition

The Attrition classes are highly imbalanced.

Correlation

correlation

The correlation matrix shows instances of multicollinearity.
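One way to surface multicollinearity is to list the numeric feature pairs whose absolute Pearson correlation exceeds a cutoff. The sketch below is illustrative only: the helper `highly_correlated_pairs`, the 0.8 threshold and the synthetic data are assumptions, not taken from the project notebook.

```python
import numpy as np
import pandas as pd


def highly_correlated_pairs(df, threshold=0.8):
    """Return (feature_a, feature_b, corr) for numeric column pairs with |corr| >= threshold."""
    corr = df.select_dtypes(include="number").corr()
    cols = corr.columns
    pairs = []
    for i in range(len(cols)):
        for j in range(i + 1, len(cols)):  # upper triangle only, skips self-correlation
            r = corr.iloc[i, j]
            if abs(r) >= threshold:
                pairs.append((cols[i], cols[j], float(r)))
    return pairs


# Synthetic stand-in data (hypothetical values): income is built to track job level.
rng = np.random.default_rng(0)
job_level = rng.integers(1, 6, size=200)
demo = pd.DataFrame({
    "JobLevel": job_level,
    "MonthlyIncome": job_level * 4000 + rng.normal(0, 500, size=200),
    "DistanceFromHome": rng.integers(1, 30, size=200),
})

pairs = highly_correlated_pairs(demo)
print(pairs)
```

On the real dataset, the same helper could be applied to the loaded DataFrame to decide which of two strongly correlated features to drop during feature selection.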

Develop

Models : Logistic Regression, Random Forest, KNN, SVM

Feature Engineering : One-hot Encoding
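As a minimal sketch of one-hot encoding with pandas, the example below encodes two categorical columns of a tiny hand-made frame. The rows are made up for illustration, and `drop_first=True` is an assumed choice (it avoids a redundant dummy column), not something the project states.

```python
import pandas as pd

# Tiny illustrative frame; the values are invented for the example.
demo = pd.DataFrame({
    "OverTime": ["Yes", "No", "No"],
    "MaritalStatus": ["Single", "Married", "Divorced"],
    "Age": [29, 41, 35],
})

# Each listed column becomes one 0/1 column per category, minus the first category.
encoded = pd.get_dummies(
    demo,
    columns=["OverTime", "MaritalStatus"],
    drop_first=True,
    dtype=int,
)
print(encoded)
```

Numeric columns such as `Age` pass through unchanged, so the encoded frame can be fed directly to the scikit-learn models.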

Model Performance Techniques : Feature Selection, Feature Scaling, Imbalanced-Data Treatment (SMOTE, Up-sampling and Down-sampling), Hyperparameter Tuning
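The sketch below combines three of these techniques on synthetic data: up-sampling the minority class with `sklearn.utils.resample`, feature scaling, and hyperparameter tuning with a grid search. The 84/16 class split, the parameter grid and the choice of logistic regression are assumptions for illustration; SMOTE is not shown because it needs the separate imbalanced-learn package.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import f1_score
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.utils import resample

# Synthetic imbalanced data standing in for the encoded HR features (assumption).
X, y = make_classification(n_samples=600, n_features=10,
                           weights=[0.84, 0.16], random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0)

# Up-sample the minority class in the training split only, never the test split.
X_min, y_min = X_train[y_train == 1], y_train[y_train == 1]
X_maj, y_maj = X_train[y_train == 0], y_train[y_train == 0]
X_min_up, y_min_up = resample(X_min, y_min, replace=True,
                              n_samples=len(y_maj), random_state=0)
X_bal = np.vstack([X_maj, X_min_up])
y_bal = np.concatenate([y_maj, y_min_up])

# Scaling lives inside the pipeline so it is refit on every cross-validation fold.
pipe = Pipeline([
    ("scale", StandardScaler()),
    ("clf", LogisticRegression(max_iter=1000)),
])
search = GridSearchCV(pipe, {"clf__C": [0.01, 0.1, 1, 10]}, scoring="f1", cv=5)
search.fit(X_bal, y_bal)

test_f1 = f1_score(y_test, search.predict(X_test))
print(search.best_params_, round(test_f1, 3))
```

Because the duplicated minority rows can land in both the training and validation folds, the cross-validated score here is optimistic; a stricter setup would resample inside each fold.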

Metrics : F1-Score, ROC Graph
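For reference, both metrics can be computed with scikit-learn as sketched below. The labels and scores are a made-up toy example, and the 0.5 decision threshold is an assumption.

```python
from sklearn.metrics import f1_score, roc_auc_score, roc_curve

# Toy labels (1 = attrition) and predicted probabilities, invented for illustration.
y_true = [0, 0, 0, 0, 1, 1]
y_prob = [0.1, 0.2, 0.6, 0.3, 0.7, 0.4]
y_pred = [int(p >= 0.5) for p in y_prob]  # hard labels at an assumed 0.5 threshold

f1 = f1_score(y_true, y_pred)          # 1 TP, 1 FP, 1 FN -> precision = recall = 0.5
auc = roc_auc_score(y_true, y_prob)    # 7 of 8 positive/negative pairs ranked correctly
fpr, tpr, thresholds = roc_curve(y_true, y_prob)  # points for plotting the ROC graph

print(f1, auc)  # 0.5 0.875
```

F1 depends on the chosen threshold, while the ROC curve and its AUC summarize ranking quality across all thresholds, which is why both are useful on imbalanced data.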

First, baseline models were developed without any model improvement techniques.
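A baseline run of the four listed models might look like the sketch below, which fits each model with default settings and records F1 and AUC. The synthetic data, the split and the helper `positive_scores` are assumptions; the project's actual notebook may differ.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import f1_score, roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC

# Synthetic imbalanced data standing in for the encoded HR features (assumption).
X, y = make_classification(n_samples=600, n_features=10,
                           weights=[0.84, 0.16], random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0)


def positive_scores(model, X):
    """Continuous scores for the positive class, used to compute AUC."""
    if hasattr(model, "predict_proba"):
        return model.predict_proba(X)[:, 1]
    return model.decision_function(X)  # e.g. SVC without probability estimates


models = {
    "Logistic Regression": LogisticRegression(max_iter=1000),
    "Random Forest": RandomForestClassifier(random_state=0),
    "KNN": KNeighborsClassifier(),
    "SVM": SVC(),
}

baseline = {}
for name, model in models.items():
    model.fit(X_train, y_train)  # no scaling, resampling or tuning at this stage
    baseline[name] = {
        "f1": f1_score(y_test, model.predict(X_test)),
        "auc": roc_auc_score(y_test, positive_scores(model, X_test)),
    }

for name, scores in baseline.items():
    print(f"{name}: F1={scores['f1']:.3f}, AUC={scores['auc']:.3f}")
```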

Baseline AUC & F1-score

scores

Baseline ROC Graph

roc

Improved AUC & F1-score

improved

Improved ROC Graph

improved_roc

Random Forest Feature Importance

rf_feat
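A feature importance ranking like the one charted above can be read from a fitted random forest's `feature_importances_` attribute. The sketch below uses synthetic data and placeholder feature names, both assumptions for illustration.

```python
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Placeholder names and synthetic data; the real run would use the HR columns.
feature_names = [f"feature_{i}" for i in range(8)]
X, y = make_classification(n_samples=500, n_features=8, n_informative=3,
                           n_redundant=0, shuffle=False, random_state=0)

rf = RandomForestClassifier(n_estimators=200, random_state=0).fit(X, y)

# Impurity-based importances sum to 1; sort them for a ranked view.
importances = pd.Series(rf.feature_importances_, index=feature_names)
importances = importances.sort_values(ascending=False)
print(importances)

# importances.plot.barh() would draw a chart similar to the one above (needs matplotlib).
```

Impurity-based importances can favor high-cardinality features, so they are best treated as a ranking aid rather than a causal statement.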

Deploy

Retention Strategy

Based on the feature importance chart, overtime, job level, stock option level, time with current manager, marital status and income play a vital role in employee attrition. In contrast, department, job role and education tend not to contribute to turnover. The company can focus on the factors that contribute most to attrition. However, other factors that are not necessarily captured by the model, such as selection bias and type of employment (interns, contractors, part-time or full-time), may also need to be considered. Additionally, it is recommended that the models be retuned at a regular frequency to include recent data and drop features of lower importance. A Chi-Square test may be used to determine the dependence between attrition and other features.
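The suggested Chi-Square test of independence could be sketched as follows, using a contingency table of attrition against one categorical feature. The counts are invented for illustration, and SciPy is an assumed extra dependency beyond the packages listed earlier.

```python
import pandas as pd
from scipy.stats import chi2_contingency

# Made-up counts for 100 employees: 40 work overtime, 60 do not.
demo = pd.DataFrame({
    "OverTime": ["Yes"] * 40 + ["No"] * 60,
    "Attrition": ["Yes"] * 25 + ["No"] * 15 + ["Yes"] * 10 + ["No"] * 50,
})

# Rows: OverTime (No/Yes); columns: Attrition (No/Yes).
table = pd.crosstab(demo["OverTime"], demo["Attrition"])
chi2, p_value, dof, expected = chi2_contingency(table)

print(table)
print(f"chi2={chi2:.2f}, dof={dof}, p={p_value:.4g}")
```

A small p-value (for example, below 0.05) would suggest that attrition is not independent of the feature; repeating the test per categorical feature gives a simple screening step alongside the model-based importances.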
