# Can you help reduce employee turnover?

## Background
You work for the human capital department of a large corporation. The Board is worried about the relatively high turnover, and your team must look into ways to reduce the number of employees leaving the company.

The team needs a better understanding of the situation: which employees are more likely to leave, and why. Once it is clear which variables affect employee churn, you can present your findings along with your ideas on how to address the problem.

## Challenge

Create a report that covers the following:
1. Which department has the highest employee turnover? Which one has the lowest?
2. Investigate which variables seem to be better predictors of employee departure.
3. What recommendations would you make regarding ways to reduce employee turnover?

In [16]:
import pandas as pd
import numpy as np
import plotly.express as px
import matplotlib.pyplot as plt
import seaborn as sns

%matplotlib inline

In [2]:
df = pd.read_csv('employee_churn_data.csv')

In [3]:
df.head()

|  | department | promoted | review | projects | salary | tenure | satisfaction | bonus | avg_hrs_month | left |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | operations | 0 | 0.577569 | 3 | low | 5.0 | 0.626759 | 0 | 180.86607 | no |
| 1 | operations | 0 | 0.7519 | 3 | medium | 6.0 | 0.443679 | 0 | 182.708149 | no |
| 2 | support | 0 | 0.722548 | 3 | medium | 6.0 | 0.446823 | 0 | 184.416084 | no |
| 3 | logistics | 0 | 0.675158 | 4 | high | 8.0 | 0.440139 | 0 | 188.707545 | no |
| 4 | sales | 0 | 0.676203 | 3 | high | 5.0 | 0.577607 | 1 | 179.821083 | no |


In [4]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 9540 entries, 0 to 9539
Data columns (total 10 columns):
 #   Column         Non-Null Count  Dtype  
---  ------         --------------  -----  
 0   department     9540 non-null   object 
 1   promoted       9540 non-null   int64  
 2   review         9540 non-null   float64
 3   projects       9540 non-null   int64  
 4   salary         9540 non-null   object 
 5   tenure         9540 non-null   float64
 6   satisfaction   9540 non-null   float64
 7   bonus          9540 non-null   int64  
 8   avg_hrs_month  9540 non-null   float64
 9   left           9540 non-null   object 
dtypes: float64(4), int64(3), object(3)
memory usage: 745.4+ KB


### 1. Which department has the highest employee turnover? Which one has the lowest?

In [7]:
# Number of leavers per department (raw counts, not turnover rates)
leavers = df.loc[df['left'] == 'yes', ['department', 'left']]
df_dept = leavers.groupby('department').count().reset_index().sort_values(by='left', ascending=False)
df_dept

|  | department | left |
|---|---|---|
| 8 | sales | 537 |
| 7 | retail | 471 |
| 2 | engineering | 437 |
| 6 | operations | 436 |
| 5 | marketing | 243 |
| 9 | support | 212 |
| 1 | admin | 119 |
| 4 | logistics | 111 |
| 0 | IT | 110 |
| 3 | finance | 108 |
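The table above ranks departments by the raw number of leavers, which tends to favour large departments. A turnover *rate* (leavers divided by headcount) may answer the question more directly. The following is a minimal sketch, assuming `left` only takes the values `'yes'` and `'no'`; the helper name `turnover_rate_by_department` and the toy data are hypothetical and not part of the original notebook.

```python
import pandas as pd


def turnover_rate_by_department(df: pd.DataFrame) -> pd.DataFrame:
    """Return headcount, leavers, and turnover rate per department."""
    # Boolean flag: True for employees who left the company.
    left_flag = df["left"].eq("yes")
    summary = (
        df.assign(left_flag=left_flag)
        .groupby("department")["left_flag"]
        .agg(headcount="size", leavers="sum", turnover_rate="mean")
        .sort_values("turnover_rate", ascending=False)
    )
    summary["leavers"] = summary["leavers"].astype(int)
    return summary


# Hypothetical toy data, only to show the shape of the result.
toy = pd.DataFrame(
    {
        "department": ["sales", "sales", "sales", "sales", "IT", "IT"],
        "left": ["yes", "no", "no", "no", "yes", "no"],
    }
)
rates = turnover_rate_by_department(toy)
print(rates)
```

On the real data, `turnover_rate_by_department(df)` would list the department with the highest rate first and the lowest last.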


### 2. Investigate which variables seem to be better predictors of employee departure.

In [8]:
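# Correlate numeric columns only; recent pandas versions raise an error on string columns
df.select_dtypes('number').corr()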

|  | promoted | review | projects | tenure | satisfaction | bonus | avg_hrs_month |
|---|---|---|---|---|---|---|---|
| promoted | 1.0 | 0.001879 | 0.010107 | 0.00141 | -0.011704 | 0.001072 | -0.00219 |
| review | 0.001879 | 1.0 | 0.000219 | -0.184133 | -0.349778 | -0.003627 | -0.196096 |
| projects | 0.010107 | 0.000219 | 1.0 | 0.022596 | 0.002714 | 0.002654 | 0.021299 |
| tenure | 0.00141 | -0.184133 | 0.022596 | 1.0 | -0.146246 | -0.000392 | 0.978618 |
| satisfaction | -0.011704 | -0.349778 | 0.002714 | -0.146246 | 1.0 | 0.000704 | -0.143142 |
| bonus | 0.001072 | -0.003627 | 0.002654 | -0.000392 | 0.000704 | 1.0 | -0.00037 |
| avg_hrs_month | -0.00219 | -0.196096 | 0.021299 | 0.978618 | -0.143142 | -0.00037 | 1.0 |
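The matrix above does not include `left`, because that column is stored as strings, so it says nothing yet about which variables relate to departure. One possible next step is to encode `left` as 0/1 and correlate each numeric column with it. The sketch below assumes `left` only contains `'yes'` and `'no'`; the function name `correlation_with_left` and the toy data are hypothetical. Pearson correlation only captures linear association, so this is a first look rather than a predictive model.

```python
import pandas as pd


def correlation_with_left(df: pd.DataFrame) -> pd.Series:
    """Correlate every numeric column with a 0/1 encoding of `left`."""
    numeric = df.select_dtypes("number").copy()
    # Encode the target: 1 if the employee left, 0 otherwise.
    numeric["left_flag"] = df["left"].map({"no": 0, "yes": 1})
    corr = numeric.corr()["left_flag"].drop("left_flag")
    # Order by absolute strength so the strongest candidates come first.
    order = corr.abs().sort_values(ascending=False).index
    return corr.reindex(order)


# Hypothetical toy data, only to show how the helper is called.
toy = pd.DataFrame(
    {
        "satisfaction": [0.9, 0.8, 0.2, 0.1],
        "bonus": [0, 1, 0, 1],
        "left": ["no", "no", "yes", "yes"],
    }
)
result = correlation_with_left(toy)
print(result)
```

Because `tenure` and `avg_hrs_month` are almost perfectly correlated with each other in the matrix above (0.978618), their individual correlations with `left` would largely carry the same information.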


### 3. What recommendations would you make regarding ways to reduce employee turnover?

In [25]:
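# Monthly hours per salary band, split by whether the employee left.
# With y set and no histfunc, px.histogram sums y within each bar.
df_hours = df[['salary', 'avg_hrs_month', 'left']]
px.histogram(df_hours, x='salary', y='avg_hrs_month', color='left', barmode='group')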
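Since a summed histogram mostly reflects how many employees fall into each group, a table of *mean* hours per group may be easier to compare. This is a minimal sketch using plain pandas; the function name `mean_hours_by_salary_and_left` and the toy values are hypothetical.

```python
import pandas as pd


def mean_hours_by_salary_and_left(df: pd.DataFrame) -> pd.DataFrame:
    """Mean of avg_hrs_month per salary band, one column per `left` value."""
    return (
        df.groupby(["salary", "left"])["avg_hrs_month"]
        .mean()
        .unstack("left")
    )


# Hypothetical toy data, only to show the shape of the result.
toy = pd.DataFrame(
    {
        "salary": ["low", "low", "low", "high", "high"],
        "left": ["no", "no", "yes", "no", "yes"],
        "avg_hrs_month": [180.0, 184.0, 190.0, 182.0, 186.0],
    }
)
table = mean_hours_by_salary_and_left(toy)
print(table)
```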

In [20]:
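# Headcount per department and salary band: count() tallies every row, both 'yes' and 'no' in 'left'
df_dept = df[['department', 'salary', 'left']].groupby(['department', 'salary']).count().reset_index().sort_values(by='left', ascending=False)
df_dept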

|  | department | salary | left |
|---|---|---|---|
| 26 | sales | medium | 1284 |
| 8 | engineering | medium | 1077 |
| 23 | retail | medium | 1062 |
| 20 | operations | medium | 1061 |
| 17 | marketing | medium | 530 |
| 29 | support | medium | 504 |
| 24 | sales | high | 310 |
| 5 | admin | medium | 306 |
| 25 | sales | low | 289 |
| 11 | finance | medium | 274 |


In [24]:
px.sunburst(df_dept, path=['department', 'salary'], values='left', title='Employees by Department and Salary Band', height=700)
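The counts feeding the sunburst include every employee, not only those who left, so the chart shows headcount rather than churn. To see where churn is concentrated, one option is a department-by-salary table of churn rates. The sketch below assumes `left` only contains `'yes'` and `'no'`; the function name `churn_rate_by_department_and_salary` and the toy data are hypothetical.

```python
import pandas as pd


def churn_rate_by_department_and_salary(df: pd.DataFrame) -> pd.DataFrame:
    """Share of employees who left, per department (rows) and salary band (columns)."""
    # 1.0 for leavers, 0.0 otherwise, so the mean is the churn rate.
    left_flag = df["left"].eq("yes").astype(float)
    return df.assign(left_flag=left_flag).pivot_table(
        index="department",
        columns="salary",
        values="left_flag",
        aggfunc="mean",
    )


# Hypothetical toy data, only to show the shape of the result.
toy = pd.DataFrame(
    {
        "department": ["sales", "sales", "sales", "IT"],
        "salary": ["low", "low", "high", "low"],
        "left": ["yes", "no", "no", "yes"],
    }
)
pivot = churn_rate_by_department_and_salary(toy)
print(pivot)
```

Combinations with no employees appear as `NaN`. Cells with very small headcounts can produce extreme rates, so it may help to read this table alongside the headcount table.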