## ----------------------- Employee Satisfaction Index for Work Evaluation -----------------------

# Definition, Problems, Goals, Model Limitations

------------------------------------------------------------------------------------------------------------------------------------------------------

### Definition

This project builds a machine learning model that estimates an employee's job satisfaction index. Companies can use the index as one input when evaluating employee performance, and further as a supporting tool when selecting employees for promotion or attrition.

### Problems & Goals

2.1 Problems
- As of August 2020, Covid-19 cases in Indonesia were still increasing (national.kompas.com). One effect of the pandemic is a reduction in the number of employees working at a company: many companies lay off employees to stabilize their finances. Laying off high-quality employees, however, is very disadvantageous for the company. This model is therefore intended to support employee evaluation within a company.

- According to Maier (2000: 116), the factors that affect termination of employment are:
  1. Age
  2. Length of service
  3. Satisfaction
  4. Company culture
  Because satisfaction is one of these factors, it is important to know the employee job satisfaction index.

2.2 Goals
- Identify which variables affect the employee job satisfaction index.
- Build a model that estimates the employee job satisfaction index and can be used as input to job evaluation.

2.3 Limitations
- The model only estimates the employee satisfaction index as a percentage; it could be developed further to detect employee churn versus no churn.
- The model can be used by various companies as one consideration when terminating or promoting employees.

# Import Package

In [1]:
import pandas as pd
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt
import warnings
import statsmodels.api as sm
warnings.filterwarnings('ignore')
pd.options.display.max_columns=999

%matplotlib inline

# Import Data

------------------------------------------------------------------------------------------------------------------------------------------------------

In [2]:
df = pd.read_csv('satisfaction.csv', index_col=0)
df1 = df.copy()

In [3]:
df1.head()

Unnamed: 0,emp_id,age,Dept,location,education,recruitment_type,job_level,rating,onsite,awards,certifications,salary,satisfied
0,HR8270,28,HR,Suburb,PG,Referral,5,2,0,1,0,86750,1
1,TECH1860,50,Technology,Suburb,PG,Walk-in,3,5,1,2,1,42419,0
2,TECH6390,43,Technology,Suburb,UG,Referral,4,1,0,2,0,65715,0
3,SAL6191,44,Sales,City,PG,On-Campus,2,3,1,0,0,29805,1
4,HR6734,33,HR,City,UG,Recruitment Agency,2,1,0,5,0,29805,1


------------------------------------------------------------------------------------------------------------------------------------------------------

# Feature Engineering & Selection

#### The dataset has no imbalances and no outliers
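
The notebook does not show how this claim was checked. The following is a minimal sketch of one possible check, assuming a binary target column `satisfied` and a numeric column `salary` as in the table above. The toy values and the 1.5 × IQR rule are illustrative assumptions, not the author's method.

```python
import pandas as pd

# Toy frame (hypothetical values) standing in for df1.
toy = pd.DataFrame({
    "salary": [86750, 42419, 65715, 29805, 29805, 24076],
    "satisfied": [1, 0, 0, 1, 1, 0],
})

# Class balance: share of each target value.
class_share = toy["satisfied"].value_counts(normalize=True)

# Outliers flagged by the 1.5 * IQR rule on one numeric column.
q1, q3 = toy["salary"].quantile([0.25, 0.75])
iqr = q3 - q1
is_outlier = (toy["salary"] < q1 - 1.5 * iqr) | (toy["salary"] > q3 + 1.5 * iqr)
n_outliers = int(is_outlier.sum())

print(class_share)
print("outliers:", n_outliers)
```

On the real data, the same two checks would be run on `df1` and repeated for each numeric column of interest.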

#### 1. Feature Engineering for df_base

In [4]:
df1.head()

Unnamed: 0,emp_id,age,Dept,location,education,recruitment_type,job_level,rating,onsite,awards,certifications,salary,satisfied
0,HR8270,28,HR,Suburb,PG,Referral,5,2,0,1,0,86750,1
1,TECH1860,50,Technology,Suburb,PG,Walk-in,3,5,1,2,1,42419,0
2,TECH6390,43,Technology,Suburb,UG,Referral,4,1,0,2,0,65715,0
3,SAL6191,44,Sales,City,PG,On-Campus,2,3,1,0,0,29805,1
4,HR6734,33,HR,City,UG,Recruitment Agency,2,1,0,5,0,29805,1


In [5]:
df1['Dept_enc'] = df1.Dept.map({
    "HR":0,
    "Technology":1,
    "Sales":2,
    "Purchasing":3,
    "Marketing":4
})
df1.head()

Unnamed: 0,emp_id,age,Dept,location,education,recruitment_type,job_level,rating,onsite,awards,certifications,salary,satisfied,Dept_enc
0,HR8270,28,HR,Suburb,PG,Referral,5,2,0,1,0,86750,1,0
1,TECH1860,50,Technology,Suburb,PG,Walk-in,3,5,1,2,1,42419,0,1
2,TECH6390,43,Technology,Suburb,UG,Referral,4,1,0,2,0,65715,0,1
3,SAL6191,44,Sales,City,PG,On-Campus,2,3,1,0,0,29805,1,2
4,HR6734,33,HR,City,UG,Recruitment Agency,2,1,0,5,0,29805,1,0
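
`Series.map` with a dict returns `NaN` for any value that has no key in the dict, so a misspelled or unexpected category would silently become missing. A minimal sketch of a coverage check follows, using a toy series; the label "Finance" is hypothetical and only serves to trigger the unmapped case.

```python
import pandas as pd

dept_map = {"HR": 0, "Technology": 1, "Sales": 2, "Purchasing": 3, "Marketing": 4}

# Toy column; "Finance" is a hypothetical label that the mapping does not cover.
dept = pd.Series(["HR", "Technology", "Sales", "Finance"])

encoded = dept.map(dept_map)

# Labels with no key in the dict come back as NaN.
unmapped = sorted(dept[encoded.isna()].unique())
print(unmapped)
```

An empty list here would confirm that every category was encoded; the same check applies to the location, education, and recruitment mappings.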


In [6]:
df1.location.unique()

array(['Suburb', 'City'], dtype=object)

In [7]:
df1['loc_enc']=df1.location.map({
    "Suburb":0,
    "City":1,
})
df1.head()

Unnamed: 0,emp_id,age,Dept,location,education,recruitment_type,job_level,rating,onsite,awards,certifications,salary,satisfied,Dept_enc,loc_enc
0,HR8270,28,HR,Suburb,PG,Referral,5,2,0,1,0,86750,1,0,0
1,TECH1860,50,Technology,Suburb,PG,Walk-in,3,5,1,2,1,42419,0,1,0
2,TECH6390,43,Technology,Suburb,UG,Referral,4,1,0,2,0,65715,0,1,0
3,SAL6191,44,Sales,City,PG,On-Campus,2,3,1,0,0,29805,1,2,1
4,HR6734,33,HR,City,UG,Recruitment Agency,2,1,0,5,0,29805,1,0,1


In [8]:
df1['edu_enc']=df1.education.map({
    "UG":0,
    "PG":1
})
df1.head()

Unnamed: 0,emp_id,age,Dept,location,education,recruitment_type,job_level,rating,onsite,awards,certifications,salary,satisfied,Dept_enc,loc_enc,edu_enc
0,HR8270,28,HR,Suburb,PG,Referral,5,2,0,1,0,86750,1,0,0,1
1,TECH1860,50,Technology,Suburb,PG,Walk-in,3,5,1,2,1,42419,0,1,0,1
2,TECH6390,43,Technology,Suburb,UG,Referral,4,1,0,2,0,65715,0,1,0,0
3,SAL6191,44,Sales,City,PG,On-Campus,2,3,1,0,0,29805,1,2,1,1
4,HR6734,33,HR,City,UG,Recruitment Agency,2,1,0,5,0,29805,1,0,1,0


In [9]:
df1['recruitmen_enc'] = df1.recruitment_type.map({
    "Referral":0,
    "Walk-in":1,
    "On-Campus":2,
    "Recruitment Agency":3
})

In [10]:
df1.head()

Unnamed: 0,emp_id,age,Dept,location,education,recruitment_type,job_level,rating,onsite,awards,certifications,salary,satisfied,Dept_enc,loc_enc,edu_enc,recruitmen_enc
0,HR8270,28,HR,Suburb,PG,Referral,5,2,0,1,0,86750,1,0,0,1,0
1,TECH1860,50,Technology,Suburb,PG,Walk-in,3,5,1,2,1,42419,0,1,0,1,1
2,TECH6390,43,Technology,Suburb,UG,Referral,4,1,0,2,0,65715,0,1,0,0,0
3,SAL6191,44,Sales,City,PG,On-Campus,2,3,1,0,0,29805,1,2,1,1,2
4,HR6734,33,HR,City,UG,Recruitment Agency,2,1,0,5,0,29805,1,0,1,0,3


In [11]:
# Sequential employee number, one per row (the dataset has 500 rows)
df1['empl_no'] = np.arange(len(df1))

In [12]:
new_col = ['empl_no','age','Dept_enc','loc_enc', 'edu_enc', 'recruitmen_enc','job_level','rating','onsite','awards','certifications','salary','satisfied']
df_base = df1[new_col]
df_base.head()

Unnamed: 0,empl_no,age,Dept_enc,loc_enc,edu_enc,recruitmen_enc,job_level,rating,onsite,awards,certifications,salary,satisfied
0,0,28,0,0,1,0,5,2,0,1,0,86750,1
1,1,50,1,0,1,1,3,5,1,2,1,42419,0
2,2,43,1,0,0,0,4,1,0,2,0,65715,0
3,3,44,2,1,1,2,2,3,1,0,0,29805,1
4,4,33,0,1,0,3,2,1,0,5,0,29805,1


#### 2. Feature Engineering & Selection for df_tuning

In [13]:
df1.drop(index=[188,215], inplace=True)
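
Note that `df_base` was selected from `df1` in cell 12, before this drop. Selecting a list of columns returns a new DataFrame, so dropping rows from `df1` in place should leave `df_base` at its original length, while later copies of `df1` have two rows fewer. A small sketch on a toy frame shows the behaviour; the names `parent` and `subset` are hypothetical.

```python
import numpy as np
import pandas as pd

parent = pd.DataFrame({"age": np.arange(20, 26), "salary": np.arange(6) * 1000})

# Selecting with a list of columns returns a new DataFrame.
subset = parent[["age"]]

# Drop two rows by index label from the parent only.
parent.drop(index=[1, 4], inplace=True)

print(len(parent), len(subset))
```

If the two rows should also be absent from the exported baseline, the drop would need to happen before `df_base` is created; the source does not say which is intended.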

In [14]:
df1.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 498 entries, 0 to 499
Data columns (total 18 columns):
 #   Column            Non-Null Count  Dtype 
---  ------            --------------  ----- 
 0   emp_id            498 non-null    object
 1   age               498 non-null    int64 
 2   Dept              498 non-null    object
 3   location          498 non-null    object
 4   education         498 non-null    object
 5   recruitment_type  498 non-null    object
 6   job_level         498 non-null    int64 
 7   rating            498 non-null    int64 
 8   onsite            498 non-null    int64 
 9   awards            498 non-null    int64 
 10  certifications    498 non-null    int64 
 11  salary            498 non-null    int64 
 12  satisfied         498 non-null    int64 
 13  Dept_enc          498 non-null    int64 
 14  loc_enc           498 non-null    int64 
 15  edu_enc           498 non-null    int64 
 16  recruitmen_enc    498 non-null    int64 
 17  empl_no         

In [15]:
dft = df_base.copy()
dft2 = df1.copy()
dft.head()

Unnamed: 0,empl_no,age,Dept_enc,loc_enc,edu_enc,recruitmen_enc,job_level,rating,onsite,awards,certifications,salary,satisfied
0,0,28,0,0,1,0,5,2,0,1,0,86750,1
1,1,50,1,0,1,1,3,5,1,2,1,42419,0
2,2,43,1,0,0,0,4,1,0,2,0,65715,0
3,3,44,2,1,1,2,2,3,1,0,0,29805,1
4,4,33,0,1,0,3,2,1,0,5,0,29805,1


Binning Age:
- 0-20 = Very Young (0)
- 21-40 = Ideal (1)
- 41-54 = Quite Old (2)

In [16]:
# dft['Age_bin'] = pd.cut(dft['age'], bins=[0, 20, 40, dft['age'].max()], labels=[0, 1, 2])
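
The binning cell is commented out in the source. As a sketch of what it would do if enabled: `pd.cut` builds right-closed intervals by default, so `bins=[0, 20, 40, max]` gives (0, 20], (20, 40] and (40, max]. The ages below are toy values, and 54 stands in for `dft['age'].max()`, following the list above.

```python
import pandas as pd

# Toy ages chosen to sit on the bin edges.
ages = pd.Series([20, 21, 40, 41, 54])

# Right-closed bins: (0, 20] -> 0, (20, 40] -> 1, (40, 54] -> 2
age_bin = pd.cut(ages, bins=[0, 20, 40, 54], labels=[0, 1, 2])

print(age_bin.tolist())
```

Ages 20 and 40 fall into the lower of their two neighbouring bins, which matches the 0-20 / 21-40 / 41-54 ranges.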

In [17]:
dft2.head()

Unnamed: 0,emp_id,age,Dept,location,education,recruitment_type,job_level,rating,onsite,awards,certifications,salary,satisfied,Dept_enc,loc_enc,edu_enc,recruitmen_enc,empl_no
0,HR8270,28,HR,Suburb,PG,Referral,5,2,0,1,0,86750,1,0,0,1,0,0
1,TECH1860,50,Technology,Suburb,PG,Walk-in,3,5,1,2,1,42419,0,1,0,1,1,1
2,TECH6390,43,Technology,Suburb,UG,Referral,4,1,0,2,0,65715,0,1,0,0,0,2
3,SAL6191,44,Sales,City,PG,On-Campus,2,3,1,0,0,29805,1,2,1,1,2,3
4,HR6734,33,HR,City,UG,Recruitment Agency,2,1,0,5,0,29805,1,0,1,0,3,4


In [18]:
dft2 = pd.get_dummies(dft2, columns=['job_level'], prefix_sep='_')
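
A sketch of what this call produces, on a toy frame. One point worth flagging: since pandas 2.0, `get_dummies` returns boolean columns by default, so the sketch passes `dtype=int` to obtain 0/1 values like those in the outputs below. This argument is an addition for illustration and is not in the original cell.

```python
import pandas as pd

toy = pd.DataFrame({
    "job_level": [5, 3, 4, 2, 2],
    "salary": [86750, 42419, 65715, 29805, 29805],
})

# The encoded column is removed; one indicator column per level is appended.
dummies = pd.get_dummies(toy, columns=["job_level"], prefix_sep="_", dtype=int)

print(dummies.columns.tolist())
```

The indicator columns are appended after the remaining columns, which is why the notebook reorders the columns in a later cell.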

In [19]:
dft2.drop(columns=['emp_id','Dept','location','education','recruitment_type'], inplace=True)

In [20]:
new_col = ['empl_no','age','Dept_enc','loc_enc', 'edu_enc', 'recruitmen_enc','job_level_1','job_level_2','job_level_3','job_level_4','job_level_5','rating','onsite','awards','certifications','salary','satisfied']
dft2 = dft2[new_col]
dft2.set_index('empl_no', inplace=True)

In [21]:
dft2.head()

empl_no,age,Dept_enc,loc_enc,edu_enc,recruitmen_enc,job_level_1,job_level_2,job_level_3,job_level_4,job_level_5,rating,onsite,awards,certifications,salary,satisfied
0,28,0,0,1,0,0,0,0,0,1,2,0,1,0,86750,1
1,50,1,0,1,1,0,0,1,0,0,5,1,2,1,42419,0
2,43,1,0,0,0,0,0,0,1,0,1,0,2,0,65715,0
3,44,2,1,1,2,0,1,0,0,0,3,1,0,0,29805,1
4,33,0,1,0,3,0,1,0,0,0,1,0,5,0,29805,1


In [22]:
dft3 = dft2.copy()

In [23]:
dft3.head()

empl_no,age,Dept_enc,loc_enc,edu_enc,recruitmen_enc,job_level_1,job_level_2,job_level_3,job_level_4,job_level_5,rating,onsite,awards,certifications,salary,satisfied
0,28,0,0,1,0,0,0,0,0,1,2,0,1,0,86750,1
1,50,1,0,1,1,0,0,1,0,0,5,1,2,1,42419,0
2,43,1,0,0,0,0,0,0,1,0,1,0,2,0,65715,0
3,44,2,1,1,2,0,1,0,0,0,3,1,0,0,29805,1
4,33,0,1,0,3,0,1,0,0,0,1,0,5,0,29805,1


In [24]:
dft3 = pd.get_dummies(dft3, columns=['rating'], prefix_sep='_')

In [25]:
dft3.head()

empl_no,age,Dept_enc,loc_enc,edu_enc,recruitmen_enc,job_level_1,job_level_2,job_level_3,job_level_4,job_level_5,onsite,awards,certifications,salary,satisfied,rating_1,rating_2,rating_3,rating_4,rating_5
0,28,0,0,1,0,0,0,0,0,1,0,1,0,86750,1,0,1,0,0,0
1,50,1,0,1,1,0,0,1,0,0,1,2,1,42419,0,0,0,0,0,1
2,43,1,0,0,0,0,0,0,1,0,0,2,0,65715,0,1,0,0,0,0
3,44,2,1,1,2,0,1,0,0,0,1,0,0,29805,1,0,0,1,0,0
4,33,0,1,0,3,0,1,0,0,0,0,5,0,29805,1,1,0,0,0,0


In [26]:
new_col = ['age','Dept_enc','loc_enc', 'edu_enc', 'recruitmen_enc','job_level_1','job_level_2','job_level_3','job_level_4','job_level_5','rating_1','rating_2','rating_3','rating_4','rating_5','onsite','awards','certifications','salary','satisfied']
dft3 = dft3[new_col]
dft3.head()

empl_no,age,Dept_enc,loc_enc,edu_enc,recruitmen_enc,job_level_1,job_level_2,job_level_3,job_level_4,job_level_5,rating_1,rating_2,rating_3,rating_4,rating_5,onsite,awards,certifications,salary,satisfied
0,28,0,0,1,0,0,0,0,0,1,0,1,0,0,0,0,1,0,86750,1
1,50,1,0,1,1,0,0,1,0,0,0,0,0,0,1,1,2,1,42419,0
2,43,1,0,0,0,0,0,0,1,0,1,0,0,0,0,0,2,0,65715,0
3,44,2,1,1,2,0,1,0,0,0,0,0,1,0,0,1,0,0,29805,1
4,33,0,1,0,3,0,1,0,0,0,1,0,0,0,0,0,5,0,29805,1


------------------------------------------------------------------------------------------------------------------------------------------------------

# Export File

In [27]:
df_base.to_csv('dfbase.csv')

In [28]:
dft3.to_csv('dft3.csv')
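
`to_csv` writes the index as the first column by default, so whoever reads these files back needs `index_col=0` to avoid an extra `Unnamed: 0` column. A minimal round-trip sketch using an in-memory buffer instead of a file; the toy frame is hypothetical.

```python
import io
import pandas as pd

toy = pd.DataFrame({"age": [28, 50], "satisfied": [1, 0]}, index=[10, 11])

buffer = io.StringIO()
toy.to_csv(buffer)  # the unnamed index is written as the first column

buffer.seek(0)
naive = pd.read_csv(buffer)  # index comes back as a data column

buffer.seek(0)
restored = pd.read_csv(buffer, index_col=0)  # index is restored

print(naive.columns.tolist())
print(restored.columns.tolist())
```

For `dft3`, whose index is named `empl_no`, the first column in the file carries that name instead of being unnamed.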