HR-Analytics

Define

Attrition is one of the most common problems an organization faces. The time, money and effort needed to hire and train replacements can add up to a substantial loss for the company; the cost includes advertising, hiring, on-boarding, training and lost productivity. Attrition can also create doubt or distrust of management among the remaining employees. According to Gallup, U.S. businesses lose a trillion dollars every year due to employee turnover. Gallup also argues that the problem is fixable with the right strategy and retention plan.

This project aims to provide an in-depth analysis of the factors that lead to employee turnover, build a predictive model and suggest a retention strategy using the IBM HR dataset from Kaggle. Please follow this link, Understanding and Predicting IBM Employee Attrition, for details.

Discover

Tools Used : Google Colab

Packages : pandas, numpy, matplotlib, seaborn, sklearn

Data : 35 features with 1470 observations.

EDA

Feature Distributions

Attrition is high among employees in their late 20s and early 30s.

Attrition Counts

Data is highly imbalanced.

Correlation

The correlation matrix shows instances of multicollinearity among the features.
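As a rough illustration of how multicollinearity can be spotted, the sketch below computes Pearson correlations between numeric columns and lists the pairs above a cutoff. It is written in plain Python so the arithmetic is visible; the function names, the sample columns and the 0.8 threshold are assumptions for illustration, not taken from the notebook.

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation coefficient between two equal-length numeric lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (spread_x * spread_y)

def correlated_pairs(columns, threshold=0.8):
    """Return (name_a, name_b, r) for every column pair with |r| >= threshold.

    The 0.8 default is an arbitrary illustrative cutoff.
    """
    names = list(columns)
    pairs = []
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            r = pearson(columns[a], columns[b])
            if abs(r) >= threshold:
                pairs.append((a, b, r))
    return pairs

# Hypothetical toy columns, not values from the dataset.
toy = {"x": [1, 2, 3, 4], "y": [2, 4, 6, 8], "z": [1, -1, 1, -1]}
flagged = correlated_pairs(toy)
```

With pandas, `DataFrame.corr()` produces the full correlation matrix in one call, which is the more likely route for a heatmap.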

Develop

Models : Logistic Regression, Random Forest, KNN, SVM

Feature Engineering : One-hot Encoding
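One-hot encoding replaces a categorical column with one 0/1 indicator column per category. A minimal sketch in plain Python follows; the helper name and the sample rows are hypothetical, and in a pandas workflow `pandas.get_dummies` is one common way to get the same result.

```python
def one_hot_encode(rows, column):
    """Replace a categorical column with one 0/1 indicator column per category."""
    categories = sorted({row[column] for row in rows})
    encoded = []
    for row in rows:
        # Copy every field except the one being encoded.
        new_row = {key: value for key, value in row.items() if key != column}
        for category in categories:
            new_row[f"{column}_{category}"] = int(row[column] == category)
        encoded.append(new_row)
    return encoded

# Hypothetical rows for illustration only; not taken from the dataset.
employees = [
    {"Age": 29, "OverTime": "Yes"},
    {"Age": 41, "OverTime": "No"},
]
encoded = one_hot_encode(employees, "OverTime")
```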

Model Improvement Techniques : Feature Selection, Feature Scaling, Imbalanced Data Treatment (SMOTE, Upsampling & Downsampling), Hyperparameter Tuning
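Of the imbalance treatments listed, random upsampling is the simplest to show. The sketch below duplicates rows of the smaller classes, sampling with replacement, until every class matches the largest one. It is not SMOTE, which synthesizes new minority points by interpolating between neighbours; the function name and the seed argument are assumptions for illustration.

```python
import random

def upsample_minority(rows, label_key, seed=0):
    """Randomly duplicate rows of smaller classes until all classes match the largest."""
    rng = random.Random(seed)
    by_class = {}
    for row in rows:
        by_class.setdefault(row[label_key], []).append(row)
    target = max(len(group) for group in by_class.values())
    balanced = []
    for group in by_class.values():
        balanced.extend(group)
        # Sample with replacement to fill the gap up to the majority size.
        balanced.extend(rng.choices(group, k=target - len(group)))
    rng.shuffle(balanced)
    return balanced
```

Resampling should be applied to the training split only; resampling before the train/test split leaks duplicated rows into the test set and inflates the scores.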

Metrics : F1-Score, ROC Graph (AUC)
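To make the two metrics concrete, here is a plain-Python sketch of how each is computed. F1 is the harmonic mean of precision and recall on the positive class, and the area under the ROC curve equals the probability that a randomly chosen positive example receives a higher score than a randomly chosen negative one. The function names are hypothetical; in practice `sklearn.metrics.f1_score` and `sklearn.metrics.roc_auc_score` would normally be used.

```python
def f1(y_true, y_pred, positive=1):
    """F1 score for the positive class from true and predicted labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

def auc_pairwise(y_true, scores, positive=1):
    """AUC as the share of (positive, negative) pairs ranked correctly; ties count half."""
    pos = [s for t, s in zip(y_true, scores) if t == positive]
    neg = [s for t, s in zip(y_true, scores) if t != positive]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative example")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))
```

On imbalanced data such as this, accuracy alone can look high for a model that never predicts attrition, which is why F1 and the ROC curve are the more informative choices.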

First, baseline models were developed without any of the model improvement techniques.

Baseline AUC & F1-score

Baseline ROC Graph

Improved AUC & F1-score

Improved ROC Graph

Random Forest Feature Importance

Deploy

Retention Strategy

Based on the feature importance chart, overtime, job level, stock option level, time with current manager, marital status and income play a vital role in employee attrition. In contrast, department, job role and education tend not to contribute to turnover. The company can focus its retention efforts on the factors that contribute most to attrition. However, other factors that the model does not necessarily capture, such as selection bias and type of employment (interns, contractors, part time or full time), may also need to be considered. Additionally, it is recommended that the models be re-tuned at a regular interval to include recent data and drop features of lower importance. A Chi-Square test may be used to determine the dependence between attrition and other features.
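The Chi-Square test mentioned above can be sketched as follows: build a contingency table of a categorical feature against attrition, compute the expected count of each cell under independence, and sum the squared deviations. The function name and the counts in the example are hypothetical and are not taken from the dataset.

```python
def chi_square_statistic(table):
    """Pearson chi-square statistic and degrees of freedom for a contingency table.

    `table` is a list of rows of observed counts, e.g. feature level x attrition.
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count if the feature and attrition were independent.
            expected = row_totals[i] * col_totals[j] / total
            stat += (observed - expected) ** 2 / expected
    dof = (len(table) - 1) * (len(table[0]) - 1)
    return stat, dof

# Hypothetical counts: rows = overtime yes/no, columns = left/stayed.
stat, dof = chi_square_statistic([[10, 20], [20, 10]])
```

For one degree of freedom, a statistic above 3.841 rejects independence at the 5% level. `scipy.stats.chi2_contingency` performs the same test and also returns a p-value; note that it applies a continuity correction to 2x2 tables by default, so its statistic can differ slightly from this sketch.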
