## Load relevant packages and tools for analysis

In [116]:
### Load relevant packages
import pandas as pd
import sqlite3
import csv

In [112]:
!pip install ipython-sql



In [113]:
# Connecting to database 
conn = sqlite3.connect('hr_data_database2.db')

In [114]:
# Load the SQL extension to execute SQL commands in this notebook cell
%load_ext sql

The sql extension is already loaded. To reload it, use:
  %reload_ext sql


In [115]:
# Connect to the SQLite database named 'hr_data_database2.db'
%sql sqlite:///hr_data_database2.db

## Understanding the Profile of the Organization

#### Key KPIs of the Organization
Overall, the KPIs describe a workforce that appears engaged and satisfied, with an average engagement score of 4.11 and an average satisfaction score of 3.89. The attrition rate of 33.44% is high, however: roughly one in three employees in the dataset has a termination date. Retaining valuable talent will require further investigation into the factors that influence attrition, along with continuous monitoring and targeted interventions.

In [110]:
%%sql

SELECT 
    '$' || ROUND(SUM(salary) / 1000000.0, 0) || 'M' as total_cost_in_mil,  -- divide by a REAL to avoid integer truncation
    COUNT(*) AS total_employees,
    '$' || ROUND(AVG(salary)/1000, 1) || 'k' AS avg_salary,
    ROUND(AVG(engagementsurvey), 2) AS avg_engmt_scor,
    ROUND(AVG(empsatisfaction), 2) AS avg_empsatisfaction,
    ROUND(AVG(specialprojectscount), 2) AS avg_specialprojects,
    ROUND(AVG(perfscore_ID), 2) AS avg_perf_scor,
    ROUND(AVG(Tenure),2) AS avg_tenure,
    ROUND(SUM(CASE WHEN DateofTermination IS NOT NULL THEN 1 ELSE 0 END) * 100.0 / COUNT(*), 2) || '%' AS attrition_rate
FROM 
    HR_Dataset;

   sqlite:///hr_data_database.db
 * sqlite:///hr_data_database2.db
Done.


total_cost_in_mil,total_employees,avg_salary,avg_engmt_scor,avg_empsatisfaction,avg_specialprojects,avg_perf_scor,avg_tenure,attrition_rate
$21.0M,311,$69.0k,4.11,3.89,1.22,2.98,7.36,33.44%
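The KPI query multiplies by `100.0` before dividing when it computes `attrition_rate`. The sketch below illustrates why that matters in SQLite: dividing one integer by another truncates the result. The salary total is a made-up illustration value, and 104 is the termination count implied by the reported 33.44% of 311 employees, not a figure printed in the output above.

```python
import sqlite3

# In-memory database; no table is needed for plain arithmetic.
conn = sqlite3.connect(":memory:")

# Integer / integer truncates in SQLite (hypothetical salary total).
int_div = conn.execute("SELECT 21460000 / 1000000").fetchone()[0]

# Making either operand a REAL keeps the fractional part.
real_div = conn.execute("SELECT 21460000 / 1000000.0").fetchone()[0]

# The same effect on a rate: 104 of 311 employees as a percentage.
int_rate = conn.execute("SELECT 104 * 100 / 311").fetchone()[0]
real_rate = conn.execute("SELECT ROUND(104 * 100.0 / 311, 2)").fetchone()[0]

print(int_div, real_div)
print(int_rate, real_rate)
conn.close()
```

If `salary` is stored as an integer column, `SUM(salary) / 1000000` would be truncated in the same way before `ROUND` is applied.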


#### Gender Breakdown of Organization

The gender breakdown shows that female employees make up 56.6% of the workforce (176 of 311) and male employees make up 43.4% (135 of 311). Both genders are well represented, with women forming a modest majority.

In [108]:
%%sql

SELECT
    Sex,
    COUNT(*) AS Total_Employees,
    ROUND(COUNT(*) * 100.0 / (SELECT COUNT(*) FROM HR_Dataset), 1) || '%' AS Percentage
FROM
    HR_Dataset
GROUP BY
    Sex
ORDER BY
    Sex;

   sqlite:///hr_data_database.db
 * sqlite:///hr_data_database2.db
Done.


Sex,Total_Employees,Percentage
F,176,56.6%
M,135,43.4%
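As a possible alternative to the scalar subquery in the denominator, the share per group can be computed with a window function over the grouped counts. This is a sketch only: it rebuilds a one-column table from the counts reported above (with `Sex` stored as plain `'F'` and `'M'`, which may differ from the real column values) and assumes a SQLite build with window-function support (3.25 or later).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE HR_Dataset (Sex TEXT)")

# Rebuild rows from the reported counts: 176 F and 135 M.
conn.executemany(
    "INSERT INTO HR_Dataset VALUES (?)",
    [("F",)] * 176 + [("M",)] * 135,
)

query = """
SELECT
    Sex,
    COUNT(*) AS Total_Employees,
    -- SUM(COUNT(*)) OVER () is the grand total across all groups
    ROUND(COUNT(*) * 100.0 / SUM(COUNT(*)) OVER (), 1) AS Percentage
FROM HR_Dataset
GROUP BY Sex
ORDER BY Sex;
"""
shares = conn.execute(query).fetchall()
for row in shares:
    print(row)
conn.close()
```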


#### Race / Ethnicity Breakdown of Organization

The workforce includes employees from six racial and ethnic categories. White employees form the majority (60.1%), followed by Black or African American (25.7%) and Asian (9.3%) employees. Some groups have very low representation, notably American Indian or Alaska Native (1.0%, 3 employees) and Hispanic (0.3%, 1 employee). Identifying and addressing potential barriers to equitable representation for these groups will be important for fostering a more inclusive workplace.

In [107]:
%%sql

SELECT
    RaceDesc,
    COUNT(*) AS Total_Employees,
    ROUND(COUNT(*) * 100.0 / (SELECT COUNT(*) FROM HR_Dataset), 1) || '%' AS Percentage
FROM
    HR_Dataset
GROUP BY
    RaceDesc
ORDER BY
    RaceDesc;

   sqlite:///hr_data_database.db
 * sqlite:///hr_data_database2.db
Done.


RaceDesc,Total_Employees,Percentage
American Indian or Alaska Native,3,1.0%
Asian,29,9.3%
Black or African American,80,25.7%
Hispanic,1,0.3%
Two or more races,11,3.5%
White,187,60.1%


## Can we predict who is going to terminate and who isn't?

#### Terminated vs Non-Terminated Employees Metrics
Overall, "Not Terminated" employees have higher average salaries, engagement scores, performance scores, and tenure, have fewer absences, and have worked on more special projects than terminated employees. The gaps in engagement (4.12 vs. 4.09) and satisfaction (3.89 vs. 3.88) are very small, while tenure (9.46 vs. 3.08) and special projects (1.46 vs. 0.73) differ substantially. Because most differences are modest, no single metric separates the two groups on its own, and several factors likely contribute to termination.

In [21]:
%%sql

SELECT 
    'Terminated' AS employee_status,
    '$' || ROUND(AVG(CASE WHEN dateoftermination IS NOT NULL THEN salary ELSE NULL END) / 1000, 1) || 'K' as avg_salary_thousands,
    ROUND(AVG(CASE WHEN dateoftermination IS NOT NULL THEN engagementsurvey ELSE NULL END), 2) AS avg_engmt_scor,
    ROUND(AVG(CASE WHEN dateoftermination IS NOT NULL THEN empsatisfaction ELSE NULL END), 2) AS avg_empsatisfaction,
    ROUND(AVG(CASE WHEN dateoftermination IS NOT NULL THEN specialprojectscount ELSE NULL END), 2) AS avg_specialprojects,
    ROUND(AVG(CASE WHEN dateoftermination IS NOT NULL THEN perfscore_ID ELSE NULL END), 2) AS avg_perf_scor,
    ROUND(AVG(CASE WHEN dateoftermination IS NOT NULL THEN tenure ELSE NULL END), 2) AS avg_tenure,
    ROUND(AVG(CASE WHEN dateoftermination IS NOT NULL THEN absences ELSE NULL END), 2) AS avg_absences
    
FROM HR_Dataset

UNION

SELECT 
    'Not Terminated' AS employee_status,
    '$' || ROUND(AVG(CASE WHEN dateoftermination IS NULL THEN salary ELSE NULL END) / 1000, 1) || 'K' as avg_salary_thousands,
    ROUND(AVG(CASE WHEN dateoftermination IS NULL THEN engagementsurvey ELSE NULL END), 2) AS avg_engmt_scor,
    ROUND(AVG(CASE WHEN dateoftermination IS NULL THEN empsatisfaction ELSE NULL END), 2) AS avg_empsatisfaction,
    ROUND(AVG(CASE WHEN dateoftermination IS NULL THEN specialprojectscount ELSE NULL END), 2) AS avg_specialprojects,
    ROUND(AVG(CASE WHEN dateoftermination IS NULL THEN perfscore_ID ELSE NULL END), 2) AS avg_perf_scor,
    ROUND(AVG(CASE WHEN dateoftermination IS NULL THEN tenure ELSE NULL END), 2) AS avg_tenure,
    ROUND(AVG(CASE WHEN dateoftermination IS NULL THEN absences ELSE NULL END), 2) AS avg_absences
    
FROM HR_Dataset;

 * sqlite:///hr_data_database.db
Done.


employee_status,avg_salary_thousands,avg_engmt_scor,avg_empsatisfaction,avg_specialprojects,avg_perf_scor,avg_tenure,avg_absences
Not Terminated,$70.7K,4.12,3.89,1.46,3.02,9.46,9.83
Terminated,$65.7K,4.09,3.88,0.73,2.88,3.08,11.05
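The two-branch `UNION` above could also be written as a single grouped query, which scans the table once and avoids repeating each `CASE` expression. The sketch below shows the idea on a small in-memory table; the rows and the reduced column set are hypothetical and are not taken from `hr_data_database2.db`.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE HR_Dataset (salary REAL, absences INTEGER, dateoftermination TEXT)"
)

# Hypothetical rows: two terminated employees, three active employees.
rows = [
    (60000, 12, "2016-04-01"),
    (70000, 10, "2018-09-15"),
    (65000, 8, None),
    (75000, 6, None),
    (85000, 4, None),
]
conn.executemany("INSERT INTO HR_Dataset VALUES (?, ?, ?)", rows)

query = """
SELECT
    CASE WHEN dateoftermination IS NULL
         THEN 'Not Terminated' ELSE 'Terminated' END AS employee_status,
    COUNT(*) AS total_emp,
    ROUND(AVG(salary) / 1000, 1) AS avg_salary_thousands,
    ROUND(AVG(absences), 2) AS avg_absences
FROM HR_Dataset
GROUP BY employee_status
ORDER BY employee_status;
"""
result = conn.execute(query).fetchall()
for row in result:
    print(row)
conn.close()
```

Adding `COUNT(*)` per group also shows how many employees each average is based on.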


## Is there any relationship between who a person works for and their performance score?

#### Metric evaluation by Manager
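Overall, the query summarizes performance, engagement, satisfaction, and attrition for each manager's team. In the rows shown, managers whose teams report higher average engagement also tend to have higher average performance scores. Employee satisfaction does not follow the same pattern: the two highest-engagement rows (Board of Directors and Brandon R. LeBlanc) have some of the lowest satisfaction averages. Attrition varies widely, from 0.0 to 0.62, with no obvious link to performance. Several teams are very small (2 or 3 employees), so their averages should be read with caution. Managers with lower team performance scores might still benefit from strategies to improve engagement, which could enhance performance and reduce turnover.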


In [77]:
%%sql

SELECT
    ManagerName,
    COUNT(*) AS total_emp,
    ROUND(AVG(perfscore_ID),2) AS avg_perf_scor,
    ROUND(AVG(engagementsurvey),2) AS avg_engmt_scor,
    ROUND(AVG(empsatisfaction),2) AS avg_empsatisfaction,
    ROUND(SUM(CASE WHEN dateoftermination IS NOT NULL THEN 1 ELSE 0 END) * 1.0 / COUNT(*),2) AS attrition_rate

FROM
    HR_Dataset
GROUP BY
    ManagerName
ORDER BY 
    avg_perf_scor;

   sqlite:///hr_data_database.db
 * sqlite:///hr_data_database2.db
Done.


ManagerName,total_emp,avg_perf_scor,avg_engmt_scor,avg_empsatisfaction,attrition_rate
Debra Houlihan,3,2.67,3.84,4.33,0.33
John Smith,14,2.71,3.79,3.93,0.21
Lynn Daneault,13,2.85,3.8,4.08,0.08
Michael Albert,22,2.86,4.07,4.05,0.41
Peter Monroe,14,2.86,4.03,3.93,0.07
Amy Dunn,21,2.9,3.92,3.81,0.62
Brannon Miller,22,2.91,4.04,3.41,0.27
Kissy Sullivan,22,2.95,4.04,3.91,0.55
Board of Directors,2,3.0,4.92,3.0,0.0
Brandon R. LeBlanc,7,3.0,4.35,3.57,0.29
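One rough way to check the relationships between these manager-level metrics is a Pearson correlation over the rows shown above. The sketch below uses only those ten rows and weights every manager equally regardless of team size, so it is an illustration of the approach rather than a conclusion about the full dataset. The `pearson` helper is written here for the example.

```python
from math import sqrt

# (manager, avg_perf_scor, avg_engmt_scor, avg_empsatisfaction)
# copied from the output rows shown above.
rows = [
    ("Debra Houlihan", 2.67, 3.84, 4.33),
    ("John Smith", 2.71, 3.79, 3.93),
    ("Lynn Daneault", 2.85, 3.80, 4.08),
    ("Michael Albert", 2.86, 4.07, 4.05),
    ("Peter Monroe", 2.86, 4.03, 3.93),
    ("Amy Dunn", 2.90, 3.92, 3.81),
    ("Brannon Miller", 2.91, 4.04, 3.41),
    ("Kissy Sullivan", 2.95, 4.04, 3.91),
    ("Board of Directors", 3.00, 4.92, 3.00),
    ("Brandon R. LeBlanc", 3.00, 4.35, 3.57),
]


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)


perf = [r[1] for r in rows]
engagement = [r[2] for r in rows]
satisfaction = [r[3] for r in rows]

r_eng_perf = pearson(engagement, perf)
r_eng_sat = pearson(engagement, satisfaction)
print(f"engagement vs performance:  {r_eng_perf:+.2f}")
print(f"engagement vs satisfaction: {r_eng_sat:+.2f}")
```

On these ten rows the first coefficient is positive and the second is negative, driven largely by the two-person Board of Directors row.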


## Are there areas of the company where pay is not equitable?

### Average Salaries by Gender and Department 

The query compares average salaries for men and women within each department. Neither gender is consistently paid more: men have higher averages in Admin Offices ($73.5k vs. $70.9k), IT/IS ($99.0k vs. $94.6k), and Production ($60.5k vs. $59.6k), while women have higher averages in Sales ($72.0k vs. $66.3k) and Software Engineering ($96.9k vs. $92.7k). The Executive Office shows no male average because it has no male employees. These averages do not control for position, tenure, or performance, so the larger gaps in "IT/IS," "Sales," and "Software Engineering" are a starting point for a closer review of compensation practices, not evidence of inequity on their own.


In [66]:
%%sql

SELECT
    Department,
    '$' || ROUND(AVG(CASE WHEN TRIM(Sex) = 'M' THEN avg_salary END) / 1000, 1) || 'k' AS Male_Avg_Salary,  -- TRIM handles values stored with a trailing space
    '$' || ROUND(AVG(CASE WHEN Sex = 'F' THEN avg_salary END) / 1000, 1) || 'k' AS Female_Avg_Salary
FROM
    (SELECT
        Department,
        Sex,
        AVG(salary) AS avg_salary
    FROM
        HR_Dataset
    GROUP BY
        Department, Sex) AS department_avg_salary
GROUP BY
    Department
ORDER BY
    Department;

   sqlite:///hr_data_database.db
 * sqlite:///hr_data_database2.db
Done.


Department,Male_Avg_Salary,Female_Avg_Salary
Admin Offices,$73.5k,$70.9k
Executive Office,,$250.0k
IT/IS,$99.0k,$94.6k
Production,$60.5k,$59.6k
Sales,$66.3k,$72.0k
Software Engineering,$92.7k,$96.9k
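To make the department gaps easier to compare, they can be expressed as a percentage of the male average. The sketch below uses the rounded averages (in thousands of dollars) from the output above; the choice of the male average as the baseline is an assumption made for this illustration, and departments missing either average are skipped.

```python
# Department -> (male average, female average) in $k, from the output above.
dept_avg_k = {
    "Admin Offices": (73.5, 70.9),
    "Executive Office": (None, 250.0),  # no male average reported
    "IT/IS": (99.0, 94.6),
    "Production": (60.5, 59.6),
    "Sales": (66.3, 72.0),
    "Software Engineering": (92.7, 96.9),
}


def gap_pct(male, female):
    """Gap as a percentage of the male average; positive means men earn more."""
    if male is None or female is None:
        return None
    return round((male - female) / male * 100, 1)


gaps = {dept: gap_pct(m, f) for dept, (m, f) in dept_avg_k.items()}
for dept, gap in gaps.items():
    print(f"{dept}: {'n/a' if gap is None else f'{gap:+.1f}%'}")
```

Because the inputs are already rounded to one decimal place, the resulting percentages are approximate.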


### Average Salaries by Gender and Position 
Average salaries by gender also vary by position. Some positions show higher averages for men and others for women, which suggests that factors beyond gender affect pay. A blank cell means the position has no employees of that gender in the dataset, which itself points to a gender imbalance in those roles: for example, BI Director and Director of Operations have no female average, while CIO and Data Architect have no male average. Pay differences in positions such as "Network Engineer," "Production Technician I," "Sales Manager," and "Sr. DBA" call for a deeper look at compensation practices. This analysis is a starting point; a fuller review would account for additional variables before drawing conclusions about pay equity.

In [51]:
%%sql

SELECT
    Position,
    '$' || ROUND(AVG(CASE WHEN TRIM(Sex) = 'M' THEN avg_salary END) / 1000, 1) || 'k' AS Male_Avg_Salary,  -- TRIM handles values stored with a trailing space
    '$' || ROUND(AVG(CASE WHEN Sex = 'F' THEN avg_salary END)/1000,1) || 'K' AS Female_Avg_Salary
FROM
    (SELECT
        Position,
        Sex,
        AVG(salary) AS avg_salary
    FROM
        HR_Dataset
    GROUP BY
        Position, Sex) AS position_avg_salary
GROUP BY
    Position
ORDER BY
    Position;

 * sqlite:///hr_data_database.db
Done.


Position,Male_Avg_Salary,Female_Avg_Salary
Accountant I,$63.8k,$63.0K
Administrative Assistant,,$52.3K
Area Sales Manager,$65.8k,$63.8K
BI Developer,$95.3k,$95.9K
BI Director,$110.9k,
CIO,,$220.5K
Data Analyst,$89.1k,$90.9K
Data Architect,,$150.3K
Database Administrator,$114.0k,$107.1K
Director of Operations,$170.5k,
