<h1><center>HR Analysis and Employee Segmentation</center></h1>

<center><img src="https://personalprojekte.com/wp-content/uploads/2020/07/ki-human-resources.jpeg"></center>

## What Is HR Analytics?
**Definition**: Human resource analytics (HR analytics) is the area of analytics that deals with people analysis. It applies analytical processes to the human capital within an organization in order to improve employee performance and employee retention.

HR analytics doesn’t collect data about how your employees are performing at work. Instead, its sole aim is to provide better insight into each human resource process by gathering related data and then using that data to make informed decisions on how to improve the process.

For example, using HR analytics you can answer the following questions about the organization’s HR system:

* How high is your employee turnover rate?
* Do you know which of your employees will leave your organization within a year?
* What percentage of employee turnover is regretted loss?

Most human resource professionals can easily answer the first question for their organization. Answering the other two is trickier, especially without detailed data.

To answer the other two questions, you need to combine data from different sources and analyze it thoroughly. HR departments tend to collect a good amount of data but are often unaware of how to use it. That data can be used to analyze human capital and make informed decisions. As soon as an organization starts to analyze its people problems using the collected data, it is engaged in active HR analytics.

* Source: https://www.questionpro.com/blog/hr-analytics-and-trends/
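
As a rough illustration of the first and third questions above, the sketch below computes a turnover rate and the share of regretted losses. It assumes the common convention of dividing separations by the average headcount over the period; the function names and all figures are made up for the example and do not come from the dataset used in this notebook.

```python
def turnover_rate(separations, headcount_start, headcount_end):
    """Separations as a percentage of average headcount over the period."""
    average_headcount = (headcount_start + headcount_end) / 2
    return 100 * separations / average_headcount


def regretted_share(regretted, separations):
    """Percentage of separations the organization would have preferred to avoid."""
    return 100 * regretted / separations


# Hypothetical figures for one year
rate = turnover_rate(separations=30, headcount_start=190, headcount_end=210)
share = regretted_share(regretted=12, separations=30)

print(f"Turnover rate: {rate:.1f}%")
print(f"Regretted loss share: {share:.1f}%")
```

Predicting which employees will leave within a year (the second question) needs per-employee history, which is why it is harder to answer than the other two.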

In [None]:
import pandas as pd
import numpy as np
import plotly.express as px
import missingno
import matplotlib.pyplot as plt
import seaborn as sns
import warnings 

In [None]:
plt.style.use('ggplot')
warnings.simplefilter('ignore')

In [None]:
from sklearn.cluster import KMeans
from sklearn.preprocessing import scale
from sklearn.decomposition import PCA
from sklearn.manifold import TSNE

In [None]:
plt.style.use('ggplot')

# Custom palette; every entry needs the leading '#' to be a valid hex color
orange_black = ['#fdc029', '#df861d', '#FF6347', '#aa3d01',
                '#a30e15', '#800000', '#171820']

# Global figure defaults: size, background colors, and dashed grid lines
plt.rcParams['figure.figsize'] = (10, 5)
plt.rcParams['figure.facecolor'] = '#FFFACD'
plt.rcParams['axes.facecolor'] = '#FFFFE0'
plt.rcParams['axes.grid'] = True
plt.rcParams['grid.color'] = orange_black[3]
plt.rcParams['grid.linestyle'] = '--'

In [None]:
df = pd.read_csv('/kaggle/input/employee-dataset-for-human-resources-analysis/HRdata.csv').iloc[:,1:]
df.head()

In [None]:
df.describe().T

In [None]:
df[df.Disability=='Yes']

In [None]:
df.groupby(['Gender','Statu','Region'])['Salary'].mean().reset_index().sort_values('Salary').style.bar(color='#d65f5f')

In [None]:
df.groupby(['Gender','Statu','Region'])['Age'].mean().reset_index().sort_values('Age').style.bar(color='#d65f5f')

In [None]:
df.groupby(['Education Level'])['Salary'].mean().reset_index().style.bar(color='#d65f5f')

In [None]:
df.groupby(['Education Level'])['Age'].mean().reset_index().style.bar(color='#d65f5f')

In [None]:
df.groupby(['Department'])['Salary'].mean().reset_index().style.bar(color='#d65f5f')

In [None]:
df.groupby(['Department'])['Age'].mean().reset_index().style.bar(color='#d65f5f')

In [None]:
ageCut = pd.cut(df.Age, [0,20,35,45,65],labels=['18-20','21-35','36-45','46-65']).value_counts().reset_index()
ageCut.columns = ['Age Range','Count']
ageCut.sort_values('Age Range').reset_index(drop=True).style.bar(color='#d65f5f')

In [None]:
plt.figure(figsize=(14,8))
plt.subplot(2,2,(1,2))
sns.scatterplot(data = df, x='Age', y='Salary',hue='Statu', edgecolor='k')
plt.subplot(2,2,3)
sns.violinplot(data=df, x='Statu', y='Age')
plt.subplot(2,2,4)
sns.violinplot(data=df, x='Statu', y='Salary')
plt.show()

In [None]:
plt.figure(figsize=(14,8))
plt.subplot(2,2,(1,2))
sns.scatterplot(data = df, x='Age', y='Salary',hue='Region', edgecolor='k')
plt.subplot(2,2,3)
sns.violinplot(data=df, x='Region', y='Age')
plt.subplot(2,2,4)
sns.violinplot(data=df, x='Region', y='Salary')
plt.show()

In [None]:
plt.figure(figsize=(14,8))
plt.subplot(2,2,(1,2))
sns.scatterplot(data = df, x='Age', y='Salary',hue='Gender', edgecolor='k')
plt.subplot(2,2,3)
sns.violinplot(data=df, x='Gender', y='Age')
plt.subplot(2,2,4)
sns.violinplot(data=df, x='Gender', y='Salary')
plt.show()

In [None]:
plt.figure(figsize=(14,8))
plt.subplot(2,2,(1,2))
sns.scatterplot(data = df, x='Age', y='Salary',hue='Martial', edgecolor='k')
plt.subplot(2,2,3)
sns.violinplot(data=df, x='Martial', y='Age')
plt.subplot(2,2,4)
sns.violinplot(data=df, x='Martial', y='Salary')
plt.show()

In [None]:
plt.figure(figsize=(15,4))
plt.subplot(1,3,1)
sns.kdeplot(df[df['Statu']=='Blue-Collar']['Age'], label='Blue-Collar')
sns.kdeplot(df[df['Statu']=='White-Collar']['Age'], label='White-Collar')
plt.title('Age')
plt.legend()
plt.subplot(1,3,2)
sns.kdeplot(df[df['Statu']=='Blue-Collar']['Senior'], label='Blue-Collar')
sns.kdeplot(df[df['Statu']=='White-Collar']['Senior'], label='White-Collar')
plt.title('Senior')
plt.legend()
plt.subplot(1,3,3)
sns.kdeplot(df[df['Statu']=='Blue-Collar']['Salary'], label='Blue-Collar')
sns.kdeplot(df[df['Statu']=='White-Collar']['Salary'], label='White-Collar')
plt.title('Salary')
plt.legend()
plt.show()

In [None]:
plt.figure(figsize=(15,4))
sns.countplot(data = df, x='Department', hue='Statu')
plt.xticks(rotation=90)
plt.ylabel('')
plt.show()

In [None]:
plt.figure(figsize=(15,4))
sns.countplot(data = df.sort_values('Education Level'), x='Education Level', hue='Statu')
plt.xticks(rotation=90)
plt.ylabel('')
plt.show()

In [None]:
# Categorical columns to one-hot encode
columns = ['Gender', 'Region', 'Department', 'Statu', 'Martial', 'Education Level', 'Disability']

# One indicator column per category value; dtype=float keeps the matrix purely numeric
dummies = [pd.get_dummies(df[col], dtype=float) for col in columns]

# Append the numeric features to the encoded categoricals
X = pd.concat(dummies + [df[['Age', 'Senior']]], axis=1)

# Standardize every feature to zero mean and unit variance before t-SNE and K-Means
X_scaled = scale(X)

In [None]:
tsne = TSNE(verbose=1, perplexity=100, random_state=42)
X_embedded = tsne.fit_transform(X_scaled)

In [None]:
plt.figure(figsize=(15,5))
# Recent seaborn versions require x and y to be passed as keywords
sns.scatterplot(x=X_embedded[:,0], y=X_embedded[:,1])
plt.title('t-SNE with no Labels' , size=18, fontweight='bold', fontfamily='monospace')
plt.show()

In [None]:
n_clusters = 10

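# Fixed seed and explicit n_init so cluster assignments are reproducible across runs
model = KMeans(n_clusters=n_clusters, n_init=10, random_state=42)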
model.fit(X_scaled)

df['Cluster'] = model.predict(X_scaled)
df.head()
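
The cluster count of 10 above is chosen by hand. One common way to sanity-check such a choice is to compare silhouette scores across several values of k. The sketch below does this on synthetic data from `make_blobs`, not on the HR dataset; `X_demo`, `scores`, and `best_k` are illustrative names, and the range of k values is an arbitrary assumption.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import silhouette_score
from sklearn.preprocessing import scale

# Synthetic stand-in for the scaled feature matrix (assumption: 4 blobs)
X_demo, _ = make_blobs(n_samples=300, centers=4, cluster_std=0.6, random_state=42)
X_demo_scaled = scale(X_demo)

# Fit K-Means for each candidate k and record the mean silhouette score
scores = {}
for k in range(2, 9):
    labels = KMeans(n_clusters=k, n_init=10, random_state=42).fit_predict(X_demo_scaled)
    scores[k] = silhouette_score(X_demo_scaled, labels)

# Higher silhouette means tighter, better separated clusters
best_k = max(scores, key=scores.get)

for k, score in scores.items():
    print(f"k={k}: silhouette={score:.3f}")
print("Best k by silhouette:", best_k)
```

The same loop could be run on `X_scaled` in place of the synthetic matrix; the score is only a heuristic, so the result is worth comparing with the t-SNE plot rather than taking on its own.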

In [None]:
# Select multiple columns with a list; tuple-style selection was removed in pandas 2.0
df.groupby('Cluster')[['Age','Salary','Senior']].mean().style.bar(color='#d65f5f')

In [None]:
palette = sns.hls_palette(10, l=.4, s=.9)
plt.figure(figsize=(15,5))
# Same embedding as before, now colored by the K-Means cluster assignment
sns.scatterplot(x=X_embedded[:,0], y=X_embedded[:,1], hue=df['Cluster'], palette=palette)
plt.title('t-SNE with K-Means Cluster Labels' , size=18, fontweight='bold', fontfamily='monospace')
plt.show()

In [None]:
n_components = 5
pca_df = pd.DataFrame()
pca = PCA(n_components=n_components)
pca.fit(X_scaled)
X_pca = pca.transform(X_scaled)
for i in range(n_components):
    pca_df["PCA{}".format(i+1)] = X_pca[:,i]
pca_df['Cluster'] = df['Cluster']
pca_df.head()
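
Before reading too much into the five components, it can help to check how much variance they retain. The sketch below uses the `explained_variance_ratio_` attribute of a fitted `PCA` on a small random matrix; the data and the names `X_demo`, `ratios`, and `cumulative` are assumptions for illustration only.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.preprocessing import scale

# Synthetic stand-in for the scaled feature matrix (assumption: 8 features)
rng = np.random.default_rng(42)
X_demo = rng.normal(size=(200, 8))
# Make two features strongly correlated so the first component dominates
X_demo[:, 1] = X_demo[:, 0] + 0.1 * rng.normal(size=200)

pca_demo = PCA(n_components=5).fit(scale(X_demo))

# Fraction of total variance captured by each component, in decreasing order
ratios = pca_demo.explained_variance_ratio_
cumulative = np.cumsum(ratios)

for i, (r, c) in enumerate(zip(ratios, cumulative), start=1):
    print(f"PCA{i}: {r:.3f} (cumulative {c:.3f})")
```

On the notebook's own data the equivalent check would be `pca.explained_variance_ratio_` after the `pca.fit(X_scaled)` call above.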

In [None]:
sns.pairplot(pca_df, hue='Cluster', palette=palette)
plt.show()