## Employee Attrition:
It is defined as **the natural process by which employees leave the workforce – for example, through resignation for personal reasons or retirement – and are not immediately replaced.**

In this Notebook, you will uncover the factors that lead to employee attrition and explore important questions such as ‘show me a breakdown of distance from home by job role and attrition’ or ‘compare average monthly income by education and attrition’. This is a fictional data set created by IBM data scientists.
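Each of the two example questions above can be answered with a pivot table. The sketch below is a minimal illustration. It assumes the dataset's column names (`JobRole`, `Education`, `Attrition`, `DistanceFromHome`, `MonthlyIncome`) and uses a few hypothetical toy rows in place of the real CSV.

```python
import pandas as pd

# Hypothetical toy rows that reuse the dataset's column names; not real data.
data = pd.DataFrame({
    "JobRole": ["Sales Executive", "Sales Executive", "Research Scientist", "Research Scientist"],
    "Education": [3, 3, 4, 4],
    "Attrition": ["Yes", "No", "No", "Yes"],
    "DistanceFromHome": [20, 5, 2, 10],
    "MonthlyIncome": [5000, 6000, 3000, 4000],
})

# Average distance from home for each job role, split by attrition.
distance_by_role = data.pivot_table(
    index="JobRole", columns="Attrition", values="DistanceFromHome", aggfunc="mean"
)

# Average monthly income for each education level, split by attrition.
income_by_education = data.pivot_table(
    index="Education", columns="Attrition", values="MonthlyIncome", aggfunc="mean"
)

print(distance_by_role)
print(income_by_education)
```

On the real data, the same calls should work on the loaded `data` frame in place of the toy one.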

Attrition is an inevitable part of any business. There will come a time when an employee wants to leave your company – for either personal or professional reasons.

[Dataset Link](https://www.kaggle.com/pavansubhasht/ibm-hr-analytics-attrition-dataset)

![Reason For Attrition](https://images.toolbox.demandshore.com/c2/bc/78e0829f4431b922c53463368584/the-factors-that-cause-employee-attrition.1.png)


In [None]:
import os

import numpy as np  # linear algebra
import pandas as pd  # data processing, CSV file I/O (e.g. pd.read_csv)
import seaborn as sns
import matplotlib.pyplot as plt
import plotly.express as px
import plotly.io as pio

pio.templates.default = "plotly_dark"
sns.set_style('darkgrid')

# List the input files available in the Kaggle environment
for dirname, _, filenames in os.walk('/kaggle/input'):
    for filename in filenames:
        print(os.path.join(dirname, filename))
print('Successful')

In [None]:
data = pd.read_csv('/kaggle/input/ibm-hr-analytics-attrition-dataset/WA_Fn-UseC_-HR-Employee-Attrition.csv')
print('Data shape:',data.shape)
data.head(3)


In [None]:
## Encode Attrition as binary: Yes -> 1, No -> 0
data['Attrition'] = data['Attrition'].map({'Yes': 1, 'No': 0})
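Encoding `Attrition` as 0/1 has a useful side effect: the column mean is the attrition rate. Below is a minimal sketch on a hypothetical toy column. It uses `Series.map`, so any label other than 'Yes' or 'No' becomes NaN and is caught by the check.

```python
import pandas as pd

# Hypothetical toy labels standing in for data['Attrition'].
attrition = pd.Series(["Yes", "No", "No", "No", "Yes"])

# map() turns any label missing from the dict into NaN, so typos surface immediately.
encoded = attrition.map({"Yes": 1, "No": 0})
if encoded.isna().any():
    raise ValueError("unexpected Attrition label")

# With a 0/1 encoding, the mean is the share of employees who left.
attrition_rate = encoded.mean()
print(f"Attrition rate: {attrition_rate:.0%}")  # prints "Attrition rate: 40%"
```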

# Data Analysis using Visualisation

In [None]:
BusT = data['BusinessTravel'].value_counts()
fig = px.pie(BusT, values =BusT.values, names = BusT.index, title='Distribution of Business Travel among Employees')
fig.show()

### Note:
- 71% of employees travel rarely
- Around 19% travel frequently
- 10% do not travel for business
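The percentages above can be computed directly instead of being read off the pie chart. Here is a sketch with `value_counts(normalize=True)`. The toy values are hypothetical and stand in for `data['BusinessTravel']`.

```python
import pandas as pd

# Hypothetical toy values; the real column is data['BusinessTravel'].
travel = pd.Series(["Travel_Rarely"] * 7 + ["Travel_Frequently"] * 2 + ["Non-Travel"])

# normalize=True returns proportions instead of raw counts; scale to percent.
shares = travel.value_counts(normalize=True).mul(100).round(1)
print(shares)
```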

In [None]:
# Histogram with a kernel density estimate overlay
sns.histplot(data['DailyRate'], kde=True)

In [None]:
Dep = data['Department'].value_counts()
fig = px.pie(Dep, values = Dep.values, names = Dep.index, title='Distribution of Department')
fig.show()

### Departments
There are 3 departments:
1. R&D - 961 employees
2. Sales - 446 employees
3. HR - 63 employees

R&D accounts for about 65% of the workforce (961 of 1,470 employees), so this fictional company is heavily weighted towards its **Research and Development Department**.


In [None]:
# Histogram with a kernel density estimate overlay
sns.histplot(data['DistanceFromHome'], kde=True)

<div class="alert alert-block alert-success">
Most employees live close to the office, which benefits both the company and its employees.
</div>
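"Most employees live nearby" can be made concrete with quartiles and a threshold share. The dataset does not state the unit of `DistanceFromHome`. Both the toy values and the 10-unit cut-off below are hypothetical.

```python
import pandas as pd

# Hypothetical toy distances; the real column is data['DistanceFromHome'].
distance = pd.Series([1, 2, 2, 3, 5, 7, 8, 10, 24, 29])

# Quartiles summarise where most of the distribution lies.
quartiles = distance.quantile([0.25, 0.5, 0.75])
print(quartiles)

# Share of employees within a hypothetical 10-unit threshold.
share_nearby = (distance <= 10).mean()
print(f"{share_nearby:.0%} live within 10 units")  # prints "80% live within 10 units"
```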


In [None]:
ED = data['Education'].value_counts()
px.pie(ED , values = ED.values, names = ED.index)

### Education
- 1 'Below College'
- 2 'College'
- 3 'Bachelor'
- 4 'Master'
- 5 'Doctor'

<div class="alert alert-block alert-success">
Bachelor's degree holders form the largest group, and very few employees hold a doctorate.
</div>
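`Education` is stored as integer codes, so the pie chart legend shows 1-5 rather than names. The sketch below maps the codes to the labels listed above before counting. The toy codes are hypothetical.

```python
import pandas as pd

# Code-to-label mapping from the Education legend above.
EDUCATION_LABELS = {1: "Below College", 2: "College", 3: "Bachelor", 4: "Master", 5: "Doctor"}

# Hypothetical toy codes; the real column is data['Education'].
education = pd.Series([3, 3, 4, 2, 1, 5, 3])

# Replace codes with readable labels, then count each level.
counts = education.map(EDUCATION_LABELS).value_counts()
print(counts)
```

The labelled counts can then be passed to `px.pie(values=counts.values, names=counts.index)`.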


In [None]:
Education = data['EducationField'].value_counts()
px.pie(Education, values = Education.values, names = Education.index)

<div class="alert alert-block alert-success">
<ul>
    <li> 606 employees studied Life Sciences </li>
    <li> 464 employees studied in the Medical field </li>
    <li> Only 132 employees have a Technical Degree </li>
</ul>
</div>

In [None]:
# EmployeeCount is 1 for every employee, so it carries no information
data.drop(['EmployeeCount'], axis = 1, inplace = True)

In [None]:
# EmployeeNumber is a unique employee ID, not a feature
data.drop(['EmployeeNumber'], axis = 1, inplace = True)
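Uninformative columns can be flagged with `nunique()` instead of being spotted by eye. A column with one distinct value is constant, and a column whose values are all distinct behaves like a row ID. Here is a sketch on a hypothetical toy frame.

```python
import pandas as pd

# Hypothetical toy frame; EmployeeCount and EmployeeNumber mimic the columns dropped above.
df = pd.DataFrame({
    "EmployeeCount": [1, 1, 1, 1],
    "EmployeeNumber": [1, 2, 4, 5],
    "Age": [41, 49, 41, 33],
    "Department": ["Sales", "Research & Development", "Research & Development", "Sales"],
})

n_unique = df.nunique()

# One distinct value: the column is constant and carries no information.
constant_cols = n_unique[n_unique == 1].index.tolist()

# As many distinct values as rows: the column behaves like an identifier.
id_like_cols = n_unique[n_unique == len(df)].index.tolist()

print("Constant:", constant_cols)  # prints "Constant: ['EmployeeCount']"
print("ID-like:", id_like_cols)    # prints "ID-like: ['EmployeeNumber']"
```

The all-distinct rule is only a heuristic. A continuous column can also have nearly all unique values, so review flagged columns before dropping them. On the full dataset, the constant check should also flag `StandardHours` and `Over18`.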

In [None]:
Satisfaction = data['EnvironmentSatisfaction'].value_counts()
px.pie(Satisfaction, values = Satisfaction.values, names = Satisfaction.index)

### EnvironmentSatisfaction
- 1 'Low'
- 2 'Medium'
- 3 'High'
- 4 'Very High'

<div class="alert alert-block alert-success">
Around 80% of employees rate their working environment Medium (2) or higher
</div>

<div class="alert alert-block alert-danger">
Around 20% of employees are not satisfied with their working environment and rated it '1' (Low)
</div>
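The 80/20 split above groups the four satisfaction levels into two bands. `pd.cut` can make that grouping explicit. The sketch below uses hypothetical toy ratings on the same 1-4 scale.

```python
import pandas as pd

# Hypothetical toy ratings; the real column is data['EnvironmentSatisfaction'].
ratings = pd.Series([1, 2, 3, 4, 3, 4, 1, 3, 2, 4])

# Bins (0, 1] and (1, 4]: rating 1 is "Low", ratings 2-4 are "Medium or better".
bands = pd.cut(ratings, bins=[0, 1, 4], labels=["Low", "Medium or better"])

# Proportion of employees in each band.
shares = bands.value_counts(normalize=True)
print(shares)
```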

In [None]:
# Encode Gender as binary: Female -> 0, Male -> 1
data['Gender'] = data['Gender'].map({'Female': 0, 'Male': 1})
Gender = data['Gender'].value_counts()
sns.countplot(x = 'Gender', data = data)
print(Gender)

<div class="alert alert-block alert-info">
<b>The company has:</b>
<ul>
    <li><b>882 male employees</b></li>
    <li><b>588 female employees</b></li>
</ul>
</div>

In [None]:
# Histogram with a kernel density estimate overlay
sns.histplot(data['HourlyRate'], kde=True)

<div class="alert alert-block alert-info">
<b>
HourlyRate ranges from 30 to 100 and is roughly uniformly distributed; the dataset does not specify a currency.
 </b>
</div>

In [None]:
job = data['JobInvolvement'].value_counts()
px.pie(job, values = job.values, names = job.index)

### Job Involvement
- 1 'Low'
- 2 'Medium'
- 3 'High'
- 4 'Very High'

<div class="alert alert-block alert-success">
Around 70% of employees report High or Very High job involvement.
</div>

<div class="alert alert-block alert-danger">
Around 30% of employees report Low or Medium job involvement.
</div>



In [None]:
data.drop(['JobLevel'], axis = 1, inplace = True)

In [None]:
# Count employees per job role as a two-column frame (JobRole, Count)
df1 = data['JobRole'].value_counts().rename_axis('JobRole').reset_index(name='Count')
px.bar(df1, x = 'JobRole', y = 'Count', text = 'Count')

In [None]:
satis = data['JobSatisfaction'].value_counts()
px.pie(satis, values = satis.values, names = satis.index)

### Job Satisfaction
- 1 'Low'
- 2 'Medium'
- 3 'High'
- 4 'Very High'

<div class="alert alert-block alert-success">
Over 60% of employees rate their job satisfaction High or Very High
</div>
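`Attrition` was encoded as 0/1 earlier, so its mean within each group gives that group's attrition rate. This ties satisfaction levels to the notebook's question of why employees leave. Here is a sketch with `groupby` on hypothetical toy rows.

```python
import pandas as pd

# Hypothetical toy rows; Attrition is encoded as 1 = left, 0 = stayed.
df = pd.DataFrame({
    "JobSatisfaction": [1, 1, 1, 1, 4, 4, 4, 4],
    "Attrition": [1, 1, 0, 0, 1, 0, 0, 0],
})

# Mean of a 0/1 column per group = attrition rate for that group.
rate_by_satisfaction = df.groupby("JobSatisfaction")["Attrition"].mean()
print(rate_by_satisfaction)
```

On the real data, `data.groupby('JobSatisfaction')['Attrition'].mean()` should produce the same kind of table.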

In [None]:
# Encode MaritalStatus: Single -> 0, Married -> 1, Divorced -> 2
data['MaritalStatus'] = data['MaritalStatus'].map({'Single': 0, 'Married': 1, 'Divorced': 2})

# Count employees per marital status as a two-column frame (MaritalStatus, Count)
df2 = data['MaritalStatus'].value_counts().rename_axis('MaritalStatus').reset_index(name='Count')
px.bar(df2, x = 'MaritalStatus', y = 'Count', text = 'Count')

- There are 470 single employees
- There are 673 married employees
- There are 327 divorced employees

In [None]:
# Histogram with a kernel density estimate overlay
sns.histplot(data['MonthlyIncome'], kde=True)

<div class="alert alert-block alert-info">
The average monthly income is around 6,500, with values ranging from 1,009 to 19,999
</div>
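The income histogram above has a long right tail, so the mean sits above the median. The median may be the better "typical" figure. `describe()` reports both, along with the range. The toy incomes below are hypothetical.

```python
import pandas as pd

# Hypothetical toy incomes; the real column is data['MonthlyIncome'].
income = pd.Series([1009, 2500, 4000, 6000, 19999])

# describe() returns count, mean, std, min, quartiles and max.
summary = income.describe()
print(summary[["mean", "50%", "min", "max"]])
```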

## This notebook was completed with help from this [Medium article](https://towardsdatascience.com/10-simple-hacks-to-speed-up-your-data-analysis-in-python-ec18c6396e6b).

<div class="alert alert-block alert-info">
<b>
I hope this EDA comes in handy. Do upvote and share your thoughts here! </b>
</div>