# Visualization of Employee Satisfaction 

# Introduction

The Employee Satisfaction Survey dataset contains information about employees within a company, including employee identification numbers, self-reported satisfaction levels, performance evaluations, project involvement, monthly work hours, tenure with the company, work accidents, promotions received in the last 5 years, departmental affiliations, and salary levels.

**Dataset Available at Kaggle:**

https://www.kaggle.com/datasets/redpen12/employees-satisfaction-analysis

In [127]:
!pip install altair_viewer
!pip install pillow




In [1]:
# Libraries
import altair as alt
import pandas as pd
import numpy as np
from PIL import Image


In [2]:
# Import data
data = pd.read_csv("EmployeeAttrition.csv")
data.head()

| Unnamed: 0 | Emp ID | satisfaction_level | last_evaluation | number_project | average_montly_hours | time_spend_company | Work_accident | promotion_last_5years | dept | salary |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1.0 | 0.38 | 0.53 | 2.0 | 157.0 | 3.0 | 0.0 | 0.0 | sales | low |
| 1 | 2.0 | 0.8 | 0.86 | 5.0 | 262.0 | 6.0 | 0.0 | 0.0 | sales | medium |
| 2 | 3.0 | 0.11 | 0.88 | 7.0 | 272.0 | 4.0 | 0.0 | 0.0 | sales | medium |
| 3 | 4.0 | 0.72 | 0.87 | 5.0 | 223.0 | 5.0 | 0.0 | 0.0 | sales | low |
| 4 | 5.0 | 0.37 | 0.52 | 2.0 | 159.0 | 3.0 | 0.0 | 0.0 | sales | low |


- satisfaction_level: Employee's self-reported job satisfaction level
- last_evaluation: Employee's most recent performance evaluation score
- number_project: Number of projects the employee is currently working on
- average_montly_hours: Average number of hours worked per month by the employee
- time_spend_company: Number of years the employee has spent with the company
- Work_accident: Whether the employee has experienced a work accident (1 for yes, 0 for no)
- promotion_last_5years: Whether the employee has received a promotion in the last 5 years (1 for yes, 0 for no)
- dept: The department or division in which the employee works
- salary: Employee's salary level (e.g., low, medium)

In [3]:
data.shape[0]


15787

## Data Cleaning
Rows containing null values are dropped and the Emp ID column is removed. A random subset of 1,500 rows is also drawn, because the full dataset is well above the 5,000-row limit that Altair's default data transformer enforces (larger datasets raise a MaxRowsError).

In [4]:
data = data.dropna()

In [5]:
subset = data.sample(n=1500, random_state=1)
subset = subset.drop('Emp ID', axis = 1)
subset.shape[0]

1500
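The cleaning steps above can be wrapped in one reusable function. The sketch below is a hypothetical helper, `clean_and_sample`, demonstrated on a small synthetic frame rather than the real CSV; the 5,000-row constant reflects Altair's default `MaxRowsError` threshold. An alternative to sampling is `alt.data_transformers.disable_max_rows()`, at the cost of embedding every row in the notebook output.

```python
import numpy as np
import pandas as pd

# Altair's default data transformer raises MaxRowsError above this many rows
ALTAIR_MAX_ROWS = 5000


def clean_and_sample(df: pd.DataFrame, n: int, seed: int = 1) -> pd.DataFrame:
    """Hypothetical helper: drop incomplete rows, sample n rows, remove the ID column."""
    if n > ALTAIR_MAX_ROWS:
        raise ValueError(f"n={n} exceeds Altair's default limit of {ALTAIR_MAX_ROWS} rows")
    cleaned = df.dropna()
    # min() guards against requesting more rows than remain after dropna
    sampled = cleaned.sample(n=min(n, len(cleaned)), random_state=seed)
    return sampled.drop(columns="Emp ID")


# Synthetic stand-in for EmployeeAttrition.csv (values are made up)
rng = np.random.default_rng(0)
demo = pd.DataFrame({
    "Emp ID": np.arange(1, 101, dtype=float),
    "satisfaction_level": rng.uniform(0, 1, 100),
    "dept": rng.choice(["sales", "IT", "support"], 100),
})
demo.loc[[3, 7], "satisfaction_level"] = np.nan  # two incomplete rows

demo_subset = clean_and_sample(demo, n=50)
print(demo_subset.shape)  # (50, 2)
```

Passing a fixed `seed` keeps the subset reproducible, mirroring `random_state=1` in the cell above.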

# Visualizations

## Distribution of Satisfaction Level
A basic look at the distribution of the variable of interest, satisfaction level. 

In [6]:
satisfaction_hist = alt.Chart(subset).mark_bar(color="cornflowerblue").encode(
    x=alt.X('satisfaction_level', bin=alt.BinParams(maxbins=50)),
    y='count()'
)


satisfaction_hist.display()
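To sanity-check the histogram numerically, the same kind of scores can be binned in pandas. The sketch below uses 50 equal-width bins on [0, 1] and synthetic scores (not the survey data); Altair treats `maxbins` as an upper bound and picks "nice" bin edges itself, so its counts may not line up exactly with these.

```python
import numpy as np
import pandas as pd

# Synthetic satisfaction scores in [0, 1] (made-up data, not the survey)
rng = np.random.default_rng(42)
satisfaction = pd.Series(rng.uniform(0, 1, 1500), name="satisfaction_level")

# 50 equal-width bins over [0, 1]; include_lowest keeps an exact 0.0 in the first bin
edges = np.linspace(0, 1, 51)
binned = pd.cut(satisfaction, bins=edges, include_lowest=True)
counts = binned.value_counts(sort=False)  # keep bins in ascending order

print(len(counts))   # 50
print(counts.sum())  # 1500
```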

## Correlation Matrix of All Features

The heatmap below shows the pairwise (Pearson) correlation between the numeric features: blue indicates a negative correlation and orange a positive one.

In [7]:
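# Compute correlation matrix over numeric columns only
# (pandas 2.x raises an error if text columns such as dept and salary are included)
corr_matrix = subset.corr(numeric_only=True)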

# Convert correlation matrix to long-form
corr_data = pd.melt(corr_matrix.reset_index(), id_vars='index')
corr_data.columns = ['Variable1', 'Variable2', 'Correlation']

# Plot heatmap
heatmap = alt.Chart(corr_data).mark_rect().encode(
    alt.X('Variable1:N', title=None),
    alt.Y('Variable2:N', title=None),
    alt.Color('Correlation:Q', scale=alt.Scale(scheme='blueorange', domain=(-1, 1))),
    tooltip=['Variable1', 'Variable2', 'Correlation']
).properties(
    width=300,
    height=300,
    title='Correlation Heatmap'
).interactive()

heatmap
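The heatmap relies on two pandas steps: restricting the correlation to numeric columns and reshaping the square matrix into long form, one row per variable pair, which is the shape `mark_rect` encodes. A minimal sketch on a small synthetic frame (column values are invented):

```python
import numpy as np
import pandas as pd

# Small synthetic frame mixing numeric and text columns (made-up values)
rng = np.random.default_rng(7)
df = pd.DataFrame({
    "satisfaction_level": rng.uniform(0, 1, 200),
    "number_project": rng.integers(2, 8, 200),
    "average_montly_hours": rng.normal(200, 40, 200),
    "dept": rng.choice(["sales", "IT"], 200),
})

# numeric_only=True skips the text column 'dept'
corr_matrix = df.corr(numeric_only=True)

# Long form: one row per (Variable1, Variable2) pair
corr_data = corr_matrix.reset_index().melt(id_vars="index")
corr_data.columns = ["Variable1", "Variable2", "Correlation"]

print(corr_data.shape)  # (9, 3)
```

Three numeric columns give a 3 x 3 matrix, so the long form has 9 rows, and the diagonal pairs all have correlation 1.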

## Binary Variables
There are two binary variables in the dataset: Work_accident (1 for yes, 0 for no) and promotion_last_5years (1 for yes, 0 for no).

In [8]:
# Histogram with color coding for promotion
satisfaction_promotion_hist = alt.Chart(subset).mark_bar().encode(
    x=alt.X('satisfaction_level', bin=alt.BinParams(maxbins=50), title='Satisfaction Level'),
    y=alt.Y('count()', title='Counts'),
    color=alt.Color('promotion_last_5years:N', 
                    scale=alt.Scale(domain=[0, 1], range=['darkblue', 'gold']),
                    legend=alt.Legend(title="Promotion in Last 5 Years",
                        labelExpr="datum.value == 0 ? 'No Promotion' : 'Promotion'"))
).properties(
    title='Distribution of Satisfaction Level by Promotion Status',
    width=300,
    height=200
)


# Bar plot of mean satisfaction by promotion status
satisfaction_promotion_bar = alt.Chart(subset).mark_bar(size=100, color="gold").encode(
    x=alt.X("promotion_last_5years:N", title="Promotion in Last 5 Years", 
            axis=alt.Axis(values=[0, 1], tickCount=2)),  
    # Aggregate to the mean; an unaggregated bar stacks (sums) the individual values
    y=alt.Y("mean(satisfaction_level):Q", title="Mean Satisfaction Level"),
    color=alt.Color('promotion_last_5years:N', 
                    scale=alt.Scale(domain=[0, 1], range=['darkblue', 'gold']))
).properties(
    width=300,  
    height=200
)


satisfaction_promotion_hist | satisfaction_promotion_bar

In [9]:
# Histogram with color coding for workplace accident
satisfaction_accident_hist = alt.Chart(subset).mark_bar().encode(
    x=alt.X('satisfaction_level', bin=alt.BinParams(maxbins=50), title='Satisfaction Level'),
    y=alt.Y('count()', title='Counts'),
    color=alt.Color('Work_accident:N', 
                    scale=alt.Scale(domain=[0, 1], range=['cornflowerblue','red' ]),
                    legend=alt.Legend(title="Accident at Work",
                        labelExpr="datum.value == 0 ? 'No Accident' : 'Accident'"))
).properties(
    title='Distribution of Satisfaction Level by Work Accidents',
    width=300,
    height=200
)

# Bar plot of mean satisfaction by accident status
satisfaction_accident_bar = alt.Chart(subset).mark_bar(size=100, color="cornflowerblue").encode(
    x=alt.X("Work_accident:N", title="Accident at Work", 
            axis=alt.Axis(values=[0, 1], tickCount=2)),  
    # Aggregate to the mean; an unaggregated bar stacks (sums) the individual values
    y=alt.Y("mean(satisfaction_level):Q", title="Mean Satisfaction Level"),
    # Same color mapping as the histogram: 0 = cornflowerblue, 1 = red
    color=alt.Color('Work_accident:N', 
                    scale=alt.Scale(domain=[0, 1], range=['cornflowerblue', 'red']))
).properties(
    width=300,  
    height=200
)

satisfaction_accident_hist | satisfaction_accident_bar
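A bar mark on an unaggregated quantitative field stacks the individual values, so its height reflects group size as much as satisfaction; a table of per-group means and counts separates the two. The sketch below uses a hypothetical helper, `group_summary`, on synthetic data whose promotion and accident rates are invented for illustration.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in: invented rates of ~2% promoted and ~15% with an accident
rng = np.random.default_rng(3)
df = pd.DataFrame({
    "satisfaction_level": rng.uniform(0, 1, 1000),
    "promotion_last_5years": (rng.uniform(0, 1, 1000) < 0.02).astype(int),
    "Work_accident": (rng.uniform(0, 1, 1000) < 0.15).astype(int),
})


def group_summary(frame: pd.DataFrame, flag: str) -> pd.DataFrame:
    """Hypothetical helper: mean satisfaction and group size for each value of a 0/1 flag."""
    return frame.groupby(flag)["satisfaction_level"].agg(["mean", "count"])


promotion_summary = group_summary(df, "promotion_last_5years")
accident_summary = group_summary(df, "Work_accident")
print(promotion_summary)
print(accident_summary)
```

The `count` column makes it obvious when one group is small, which is worth keeping in mind before reading much into a difference in means.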

## Categorical Variables

In [10]:
# Mean barplot for salary
mean_order_salary = subset.groupby('salary')['satisfaction_level'].mean().reset_index()
mean_order_salary = mean_order_salary.sort_values('satisfaction_level', ascending=False)['salary'].tolist()

mean_satisfaction_salary = alt.Chart(subset).mark_bar().encode(
    x=alt.X('salary:N', sort=mean_order_salary, title='Salary'),
    y=alt.Y('mean(satisfaction_level):Q', title='Mean Satisfaction Level'),
    color='salary:N'
).properties(
    width=300,
    title='Mean Satisfaction Level by Salary'
)

# Boxplot for salary with adjusted y-axis range
boxplot_salary = alt.Chart(subset).mark_boxplot().encode(
    x=alt.X('salary:N', sort=mean_order_salary, title='Salary'),
    y=alt.Y('satisfaction_level:Q',
            scale=alt.Scale(domain=[0, 1.2]),  
            axis=alt.Axis(title='Satisfaction Level')),
    color='salary:N'
).properties(
    width=300,
    title='Satisfaction Level by Salary'
)

mean_satisfaction_salary | boxplot_salary

In [20]:
# Mean barplot for projects
mean_order_projects = subset.groupby('number_project')['satisfaction_level'].mean().reset_index()
mean_order_projects = mean_order_projects.sort_values('satisfaction_level', ascending=False)['number_project'].tolist()

# Bar plot for mean satisfaction level by number of projects
mean_satisfaction_projects = alt.Chart(subset).mark_bar().encode(
    x=alt.X('number_project:N', sort=mean_order_projects, title='Number of Projects'),
    y=alt.Y('mean(satisfaction_level):Q', title='Mean Satisfaction Level'),
    color='number_project:N'
).properties(
    width=300,
    title='Mean Satisfaction Level by Number of Projects'
)

# Boxplot for satisfaction level by number of projects
boxplot_projects = alt.Chart(subset).mark_boxplot().encode(
    x=alt.X('number_project:N', sort=mean_order_projects, title='Number of Projects'),
    y=alt.Y('satisfaction_level:Q',
            scale=alt.Scale(domain=[0.0, 1.2]),  
            axis=alt.Axis(title='Satisfaction Level')),
    color='number_project:N'
).properties(
    width=300,
    title='Satisfaction Level by Number of Projects'
)

mean_satisfaction_projects | boxplot_projects

In [18]:
# Mean barplot for dept
mean_order_dept = subset.groupby('dept')['satisfaction_level'].mean().reset_index()
mean_order_dept = mean_order_dept.sort_values('satisfaction_level', ascending=False)['dept'].tolist()

mean_satisfaction_dept = alt.Chart(subset).mark_bar().encode(
    x=alt.X('dept:N', sort=mean_order_dept, title='Department'),
    y=alt.Y('mean(satisfaction_level):Q', title='Mean Satisfaction Level'),
    color='dept:N'
).properties(
    width=300,
    title='Mean Satisfaction Level by Department'
)

# Boxplot for Department
boxplot_dept = alt.Chart(subset).mark_boxplot().encode(
    x=alt.X('dept:N', sort=mean_order_dept, title='Department'),
    y=alt.Y('satisfaction_level:Q',
            scale=alt.Scale(domain=[0.0, 1.2]),  
            axis=alt.Axis(title='Satisfaction Level')),
    color='dept:N'
).properties(
    width=300,
    title='Satisfaction Level by Department'
)

mean_satisfaction_dept | boxplot_dept

In [19]:
# Mean barplot for time
mean_order_time = subset.groupby('time_spend_company')['satisfaction_level'].mean().reset_index()
mean_order_time = mean_order_time.sort_values('satisfaction_level', ascending=False)['time_spend_company'].tolist()

mean_satisfaction_time = alt.Chart(subset).mark_bar().encode(
    x=alt.X('time_spend_company:N', sort=mean_order_time, title='Years Spent at Company'),
    y=alt.Y('mean(satisfaction_level):Q', title='Mean Satisfaction Level'),
    color='time_spend_company:N'
).properties(
    width=300,
    title='Mean Satisfaction Level by Years Spent at Company'
)

# Boxplot for Time
boxplot_time = alt.Chart(subset).mark_boxplot().encode(
    x=alt.X('time_spend_company:N', sort=mean_order_time, title='Years Spent at Company'),
    y=alt.Y('satisfaction_level:Q',
            scale=alt.Scale(domain=[0.0, 1.2]),  
            axis=alt.Axis(title='Satisfaction Level')),
    color='time_spend_company:N'
).properties(
    width=300,
    title='Satisfaction Level by Years at Company'
)

mean_satisfaction_time | boxplot_time
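The four cells above repeat the same groupby, sort, and tolist pattern to order the x-axis by mean satisfaction. A hypothetical helper, `mean_order`, could replace each of those two-line computations (e.g. `sort=mean_order(subset, 'dept')`); it is shown here on a tiny hand-made frame, not survey data.

```python
import pandas as pd


def mean_order(frame: pd.DataFrame, category: str, target: str = "satisfaction_level") -> list:
    """Hypothetical helper: category values sorted by descending mean of `target`."""
    means = frame.groupby(category)[target].mean()
    return means.sort_values(ascending=False).index.tolist()


# Tiny hand-made example (not survey data)
df = pd.DataFrame({
    "salary": ["low", "low", "medium", "medium", "high", "high"],
    "satisfaction_level": [0.2, 0.4, 0.7, 0.5, 0.6, 0.8],
})
print(mean_order(df, "salary"))  # ['high', 'medium', 'low']
```

Here the group means are 0.3 (low), 0.6 (medium), and 0.7 (high), so the descending order is high, medium, low.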

## Continuous Variables

In [11]:
# Scatter plot
dropdown = alt.binding_select (options=data["dept"].unique(), name="Select a Department:")

# Create a new selection that uses my dynamic query widget
selection = alt.selection(type="single", fields=["dept"], bind=dropdown)

scatter_plot = alt.Chart(subset).mark_circle().encode(
    x=alt.X('last_evaluation:Q'),
    y='satisfaction_level:Q',
    color=alt.Color('dept', scale=alt.Scale(scheme='observable10')),
    tooltip=['satisfaction_level', 'last_evaluation', 'dept'],
    opacity=alt.condition(selection,alt.value(1),alt.value(.1))
).properties(
    title='Satisfaction Level vs. Last Evaluation-Select Dept'
).add_selection(selection).interactive()

scatter_plot

In [12]:
# Scatter plot
dropdown = alt.binding_select (options=data["number_project"].unique(), name="Select # of Projects:")

# Create a new selection that uses my dynamic query widget
selection = alt.selection(type="single", fields=["number_project"], bind=dropdown)

scatter_plot = alt.Chart(subset).mark_circle().encode(
    x=alt.X('last_evaluation:Q'),
    y='satisfaction_level:Q',
    color=alt.Color('number_project', scale=alt.Scale(scheme='observable10')),
    tooltip=['satisfaction_level', 'last_evaluation', 'salary'],
    opacity=alt.condition(selection,alt.value(1),alt.value(.1))
).properties(
    title='Satisfaction Level vs. Last Evaluation-Select # of Projects'
).add_selection(selection).interactive()

scatter_plot

In [17]:
# Scatter plot
dropdown = alt.binding_select (options=data["time_spend_company"].unique(), name="Select # of Years at Company:")

# Create a new selection that uses my dynamic query widget
selection = alt.selection(type="single", fields=["time_spend_company"], bind=dropdown)

scatter_plot = alt.Chart(subset).mark_circle().encode(
    x=alt.X('last_evaluation:Q'),
    y='satisfaction_level:Q',
    color=alt.Color('time_spend_company', scale=alt.Scale(scheme= 'set1')),
    tooltip=['satisfaction_level', 'last_evaluation', 'salary'],
    opacity=alt.condition(selection,alt.value(1),alt.value(.1))
).properties(
    title='Satisfaction Level vs. Last Evaluation-Select Years at Company'
).add_selection(selection).interactive()

scatter_plot

In [14]:
# Scatter plot
dropdown = alt.binding_select (options=data["dept"].unique(), name="Select a Department:")

# Create a new selection that uses my dynamic query widget
selection = alt.selection(type="single", fields=["dept"], bind=dropdown)

scatter_plot = alt.Chart(subset).mark_circle().encode(
    x=alt.X('average_montly_hours:Q'),
    y='satisfaction_level:Q',
    color=alt.Color('dept', scale=alt.Scale(scheme='observable10')),
    tooltip=['satisfaction_level', 'average_montly_hours', 'dept'],
    opacity=alt.condition(selection,alt.value(1),alt.value(.1))
).properties(
    title='Satisfaction Level vs. Average Monthly Hours-Select Dept'
).add_selection(selection).interactive()

scatter_plot

In [15]:
# Scatter plot
dropdown = alt.binding_select (options=data["number_project"].unique(), name="Select # of Projects:")

# Create a new selection that uses my dynamic query widget
selection = alt.selection(type="single", fields=["number_project"], bind=dropdown)

scatter_plot = alt.Chart(subset).mark_circle().encode(
    x=alt.X('average_montly_hours:Q'),
    y='satisfaction_level:Q',
    color=alt.Color('number_project', scale=alt.Scale(scheme='observable10')),
    tooltip=['satisfaction_level', 'average_montly_hours', 'salary'],
    opacity=alt.condition(selection,alt.value(1),alt.value(.1))
).properties(
    title='Satisfaction Level vs. Average Monthly Hours-Select # of Projects'
).add_selection(selection).interactive()

scatter_plot

In [16]:
# Scatter plot
dropdown = alt.binding_select (options=data["time_spend_company"].unique(), name="Select # of Years at Company:")

# Create a new selection that uses my dynamic query widget
selection = alt.selection(type="single", fields=["time_spend_company"], bind=dropdown)

scatter_plot = alt.Chart(subset).mark_circle().encode(
    x=alt.X('average_montly_hours:Q'),
    y='satisfaction_level:Q',
    color=alt.Color('time_spend_company', scale=alt.Scale(scheme= 'set1')),
    tooltip=['satisfaction_level', 'average_montly_hours', 'salary'],
    opacity=alt.condition(selection,alt.value(1),alt.value(.1))
).properties(
    title='Satisfaction Level vs. Average Monthly Hours-Select Years at Company'
).add_selection(selection).interactive()

scatter_plot
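The dropdowns make group differences visible but do not quantify them. One way to put a number on how the hours/satisfaction relationship varies with project count is the Pearson correlation computed within each group. The sketch below uses synthetic data whose relationship (more projects, more hours, lower satisfaction) is invented for illustration only.

```python
import numpy as np
import pandas as pd

# Synthetic data with an invented relationship: hours rise with projects,
# satisfaction falls with hours
rng = np.random.default_rng(11)
n = 600
projects = rng.integers(2, 8, n)
hours = 120 + 20 * projects + rng.normal(0, 15, n)
df = pd.DataFrame({
    "number_project": projects,
    "average_montly_hours": hours,
    "satisfaction_level": np.clip(1.2 - hours / 400 + rng.normal(0, 0.1, n), 0, 1),
})

# Pearson correlation between hours and satisfaction within each project count
per_group_corr = (
    df.groupby("number_project")[["average_montly_hours", "satisfaction_level"]]
      .apply(lambda g: g["average_montly_hours"].corr(g["satisfaction_level"]))
)
print(per_group_corr.round(2))
```

Selecting the two value columns before `apply` keeps the grouping column out of each group frame, so the lambda only sees the columns it correlates.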

# Conclusion

The visualizations created here help users gain an understanding of the factors associated with employee satisfaction. The number of projects an employee is working on shows a prominent association with their satisfaction. Exploring the data in more depth with the interactive features suggests that a relationship may exist between the number of projects and two other features: average monthly hours and time spent at the company. Interestingly, promotion, work accidents, and salary did not appear to be important factors in employee satisfaction.
