## Exploratory Data Analysis

1. What is the gender breakdown of employees in the company?

In [3]:
SELECT gender, count(*) AS count
FROM 'human_resources_cleaned.csv'
WHERE age >= 18 AND termdate = '1000-01-01'
GROUP BY gender;

Unnamed: 0,gender,count
0,Male,8911
1,Non-Conforming,481
2,Female,8090
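
The `WHERE` clause above recurs in most of the queries that follow. It appears to treat `termdate = '1000-01-01'` as a placeholder for "no termination date", so only current adult employees are counted. The sketch below replays the same filter on a few made-up rows. The table name `hr`, the sample rows, and the use of Python's built-in `sqlite3` module in place of the notebook's own SQL engine are all assumptions made for illustration.

```python
import sqlite3

# Hypothetical sample rows (not taken from the dataset): (gender, age, termdate).
rows = [
    ("Male", 34, "1000-01-01"),    # placeholder termdate: still employed
    ("Female", 41, "1000-01-01"),  # still employed
    ("Female", 29, "2019-06-30"),  # terminated, so excluded by the filter
    ("Male", 17, "1000-01-01"),    # under 18, so excluded by the filter
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE hr (gender TEXT, age INTEGER, termdate TEXT)")
conn.executemany("INSERT INTO hr VALUES (?, ?, ?)", rows)

# Same filter as the notebook query, run against the toy table.
active_by_gender = conn.execute(
    """
    SELECT gender, count(*) AS count
    FROM hr
    WHERE age >= 18 AND termdate = '1000-01-01'
    GROUP BY gender
    ORDER BY gender
    """
).fetchall()
conn.close()

print(active_by_gender)
```

Only the two rows that pass both conditions contribute to the counts.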


2. What is the race/ethnicity breakdown of employees in the company?

In [4]:
SELECT race, count(*) AS count
FROM 'human_resources_cleaned.csv'
WHERE age >= 18 AND termdate = '1000-01-01'
GROUP BY race
ORDER BY count(*) DESC;

Unnamed: 0,race,count
0,White,4987
1,Two or More Races,2867
2,Black or African American,2840
3,Asian,2791
4,Hispanic or Latino,1994
5,American Indian or Alaska Native,1051
6,Native Hawaiian or Other Pacific Islander,952


3. What is the age distribution of employees in the company?

In [5]:
SELECT 
	min(age) AS youngest,
	max(age) AS oldest
FROM 'human_resources_cleaned.csv'
WHERE age >= 18 AND termdate = '1000-01-01';

Unnamed: 0,youngest,oldest
0,22,59


In [6]:
SELECT
    CASE
        WHEN age >= 18 AND age <= 24 THEN '18-24'
        WHEN age >= 25 AND age <= 34 THEN '25-34'
        WHEN age >= 35 AND age <= 44 THEN '35-44'
        WHEN age >= 45 AND age <= 54 THEN '45-54'
        WHEN age >= 55 AND age <= 64 THEN '55-64'
        ELSE '65+'
    END AS age_group,
    count(*) AS count
FROM 'human_resources_cleaned.csv'
WHERE age >= 18 AND termdate = '1000-01-01'
GROUP BY age_group
ORDER BY age_group;

Unnamed: 0,age_group,count
0,18-24,1299
1,25-34,4903
2,35-44,5037
3,45-54,4868
4,55-64,1375
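
The `CASE` expression above maps each age to a fixed-width bucket. The same logic can be sketched in plain Python. The function name `age_group` and the sample ages are hypothetical. As in the SQL, the final branch is a catch-all, so it relies on ages below 18 being filtered out beforehand.

```python
from collections import Counter


def age_group(age: int) -> str:
    """Mirror the CASE expression: return the bucket label for an age."""
    if 18 <= age <= 24:
        return "18-24"
    if 25 <= age <= 34:
        return "25-34"
    if 35 <= age <= 44:
        return "35-44"
    if 45 <= age <= 54:
        return "45-54"
    if 55 <= age <= 64:
        return "55-64"
    return "65+"  # catch-all, like the ELSE branch


# Hypothetical ages, already restricted to 18 and over.
ages = [22, 24, 25, 40, 59, 70]
counts = Counter(age_group(a) for a in ages)

# Sorting the labels as strings happens to give ascending age order here,
# which is also why ORDER BY age_group works in the query.
for label in sorted(counts):
    print(label, counts[label])
```

No '65+' row appears in the notebook output, which is consistent with the maximum age of 59 found in the previous cell.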


In [7]:
SELECT
    CASE
        WHEN age >= 18 AND age <= 24 THEN '18-24'
        WHEN age >= 25 AND age <= 34 THEN '25-34'
        WHEN age >= 35 AND age <= 44 THEN '35-44'
        WHEN age >= 45 AND age <= 54 THEN '45-54'
        WHEN age >= 55 AND age <= 64 THEN '55-64'
        ELSE '65+'
    END AS age_group,
    gender,
    count(*) AS count
FROM 'human_resources_cleaned.csv'
WHERE age >= 18 AND termdate = '1000-01-01'
GROUP BY age_group, gender
ORDER BY age_group, gender;

Unnamed: 0,age_group,gender,count
0,18-24,Female,598
1,18-24,Male,671
2,18-24,Non-Conforming,30
3,25-34,Female,2291
4,25-34,Male,2476
5,25-34,Non-Conforming,136
6,35-44,Female,2274
7,35-44,Male,2626
8,35-44,Non-Conforming,137
9,45-54,Female,2269


4. How many employees work at headquarters versus remote locations?

In [8]:
SELECT location, count(*) AS count
FROM 'human_resources_cleaned.csv'
WHERE age >= 18 AND termdate = '1000-01-01'
GROUP BY location;

Unnamed: 0,location,count
0,Headquarters,13107
1,Remote,4375


5. What is the average length of employment for employees who have been terminated?

In [10]:
SELECT 
	round(avg(datediff('day', hire_date, termdate))/365,0) AS ave_length_employed
FROM 'human_resources_cleaned.csv'
WHERE termdate <= current_date AND termdate <> '1000-01-01' AND age >= 18;

Unnamed: 0,ave_length_employed
0,8.0
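
The query above averages the number of days between `hire_date` and `termdate`, divides by 365, and rounds to whole years. A small Python sketch of the same arithmetic follows. The two date pairs are invented for illustration. Dividing by 365 ignores leap days, so the result is an approximation, which rounding to whole years largely hides.

```python
from datetime import date

# Hypothetical (hire_date, termdate) pairs for terminated employees.
records = [
    (date(2005, 3, 1), date(2012, 3, 1)),
    (date(2010, 1, 15), date(2019, 1, 15)),
]

# Average tenure in days, analogous to avg(datediff('day', hire_date, termdate)).
avg_days = sum((term - hire).days for hire, term in records) / len(records)

# Convert to years with the same 365-day approximation and round to 0 decimals.
avg_years = round(avg_days / 365, 0)

print(avg_days, avg_years)
```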


6. How does the gender distribution vary across departments and job titles?

In [15]:
SELECT department, gender, count(*) AS count
FROM 'human_resources_cleaned.csv'
WHERE age >= 18 AND termdate = '1000-01-01'
GROUP BY department, gender
ORDER BY count DESC;

Unnamed: 0,department,gender,count
0,Engineering,Male,2671
1,Engineering,Female,2442
2,Accounting,Male,1375
3,Accounting,Female,1175
4,Sales,Male,739
5,Human Resources,Male,721
6,Training,Male,688
7,Human Resources,Female,672
8,Business Development,Male,672
9,Services,Male,661


7. What is the distribution of job titles across the company?

In [13]:
SELECT jobtitle, count(*) AS count
FROM 'human_resources_cleaned.csv'
WHERE age >= 18 AND termdate = '1000-01-01'
GROUP BY jobtitle
ORDER BY count DESC;

Unnamed: 0,jobtitle,count
0,Research Assistant II,608
1,Business Analyst,552
2,Human Resources Analyst II,477
3,Research Assistant I,408
4,Account Executive,386
...,...,...
177,Associate Professor,1
178,Engineer IV,1
179,Office Assistant IV,1
180,VP of Training and Development,1


8. Which department has the highest turnover rate?

In [16]:
SELECT
    department,
    total_count,
    terminated_count,
    terminated_count / total_count AS termination_rate
FROM (
    SELECT
        department,
        count(*) AS total_count,
        SUM(CASE WHEN termdate <> '1000-01-01' AND termdate <= current_date THEN 1 ELSE 0 END) AS terminated_count
    FROM 'human_resources_cleaned.csv'
    WHERE age >= 18
    GROUP BY department
) AS subquery
ORDER BY termination_rate DESC;

Unnamed: 0,department,total_count,terminated_count,termination_rate
0,Auditing,50,9.0,0.18
1,Legal,299,45.0,0.150502
2,Training,1622,211.0,0.130086
3,Human Resources,1727,212.0,0.122756
4,Research and Development,1032,126.0,0.122093
5,Engineering,6387,775.0,0.12134
6,Sales,1745,209.0,0.119771
7,Accounting,3192,382.0,0.119674
8,Support,903,107.0,0.118494
9,Services,1618,187.0,0.115575
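
The termination rate above is a conditional count divided by a total count per department. The sketch below reproduces that pattern on invented rows with Python's built-in `sqlite3` module. The table name `hr`, the departments `A` and `B`, and the fixed `as_of` date (standing in for `current_date` so the result is reproducible) are assumptions. SQLite truncates integer division, so the sketch casts to `REAL` explicitly. The notebook's engine evidently returns fractional rates without a cast.

```python
import sqlite3

# Hypothetical rows: (department, age, termdate).
rows = [
    ("A", 30, "1000-01-01"),  # active
    ("A", 45, "1000-01-01"),  # active
    ("A", 52, "2015-05-01"),  # terminated in the past: counted
    ("A", 38, "2999-01-01"),  # termdate in the future: not counted yet
    ("B", 27, "1000-01-01"),  # active
    ("B", 33, "2020-09-15"),  # terminated in the past: counted
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE hr (department TEXT, age INTEGER, termdate TEXT)")
conn.executemany("INSERT INTO hr VALUES (?, ?, ?)", rows)

# ISO-formatted date strings compare correctly as text.
rates = conn.execute(
    """
    SELECT
        department,
        total_count,
        terminated_count,
        CAST(terminated_count AS REAL) / total_count AS termination_rate
    FROM (
        SELECT
            department,
            count(*) AS total_count,
            SUM(CASE WHEN termdate <> '1000-01-01' AND termdate <= :as_of
                     THEN 1 ELSE 0 END) AS terminated_count
        FROM hr
        WHERE age >= 18
        GROUP BY department
    ) AS subquery
    ORDER BY termination_rate DESC
    """,
    {"as_of": "2024-01-01"},
).fetchall()
conn.close()

print(rates)
```

With small departments, a handful of terminations moves the rate a lot. Auditing leads the notebook's table with only 50 employees, so that ranking is worth reading with the head counts in mind.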


9. What is the distribution of employees across locations by state?

In [17]:
SELECT location_state, count(*) AS count
FROM 'human_resources_cleaned.csv'
WHERE age >= 18 AND termdate = '1000-01-01'
GROUP BY location_state
ORDER BY count DESC;

Unnamed: 0,location_state,count
0,Ohio,14144
1,Pennsylvania,892
2,Illinois,698
3,Michigan,550
4,Indiana,545
5,Kentucky,347
6,Wisconsin,306


10. How has the company's employee count changed over time based on hire and term dates?

In [18]:
SELECT
    year,
    hires,
    terminations,
    hires - terminations AS net_change,
    round((hires - terminations) / hires * 100, 2) AS net_change_percent
FROM (
    SELECT
        YEAR(hire_date) AS year,
        count(*) AS hires,
        SUM(CASE WHEN termdate <> '1000-01-01' AND termdate <= current_date THEN 1 ELSE 0 END) AS terminations
    FROM 'human_resources_cleaned.csv'
    WHERE age >= 18
    GROUP BY YEAR(hire_date)
) AS subquery
ORDER BY year ASC;

Unnamed: 0,year,hires,terminations,net_change,net_change_percent
0,2000,211,26.0,185.0,87.68
1,2001,1082,197.0,885.0,81.79
2,2002,1012,162.0,850.0,83.99
3,2003,1088,198.0,890.0,81.8
4,2004,1087,201.0,886.0,81.51
5,2005,1038,189.0,849.0,81.79
6,2006,1069,184.0,885.0,82.79
7,2007,1058,153.0,905.0,85.54
8,2008,1061,145.0,916.0,86.33
9,2009,1094,154.0,940.0,85.92
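
The percentage column is `(hires - terminations) / hires * 100`, rounded to two decimals. Because the subquery groups by `YEAR(hire_date)`, `terminations` counts employees hired in that year who have since left, not terminations that occurred in that year. The short check below recomputes the first three rows of the output. The helper name `net_change_percent` is introduced here only for illustration.

```python
def net_change_percent(hires: int, terminations: int) -> float:
    """Share of a hire-year cohort that is still employed, as a percentage."""
    return round((hires - terminations) / hires * 100, 2)


# (year, hires, terminations) taken from the first three output rows.
cohorts = [(2000, 211, 26), (2001, 1082, 197), (2002, 1012, 162)]

results = {year: net_change_percent(h, t) for year, h, t in cohorts}
print(results)
```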


11. What is the tenure distribution for each department?

In [24]:
SELECT department, round(avg(datediff('day', hire_date, termdate) / 365), 0) AS avg_tenure
FROM 'human_resources_cleaned.csv'
WHERE termdate <= current_date AND termdate <> '1000-01-01' AND age >= 18
GROUP BY department
ORDER BY avg_tenure DESC;

Unnamed: 0,department,avg_tenure
0,Marketing,9.0
1,Sales,9.0
2,Human Resources,8.0
3,Engineering,8.0
4,Services,8.0
5,Support,8.0
6,Training,8.0
7,Business Development,8.0
8,Auditing,8.0
9,Accounting,8.0


12. What is the average age of employees in the company?

In [21]:
SELECT round(avg(age), 0) AS average_age
FROM 'human_resources_cleaned.csv'
WHERE age >= 18 AND termdate = '1000-01-01';

Unnamed: 0,average_age
0,40.0
