# Graded Assignment 3: 9 to 5

Time to show off your SQL skills! For each question, copy the SQL query you used and make note of the answer.

## The Dataset

For this assignment, you will be using the Bureau of Labor Statistics (BLS) Current Employment Survey (CES) results which can be found on [Kaggle](https://www.kaggle.com/datasets/bls/employment).

## Business Issue

You work for the Bureau of Labor Statistics in the United States government, and your supervisor has asked you to meet with Dolly Parton, whose nonprofit is looking to shed light on the state of employment in the United States. As part of the 9 to 5 project, their research focuses on production and nonsupervisory employees and how those employees fare compared to all employees in the United States. While the data the BLS collects from the CES is publicly available, Dolly Parton and her colleagues need your assistance navigating the thousands of rows in each table in LaborStatisticsDB.

## About the Dataset

This dataset comes directly from the Bureau of Labor Statistics’ Current Employment Survey (CES). Here are some things you need to know:

1. The industry table contains an NAICS code. This is different from the industry code. NAICS stands for North American Industry Classification System.
1. Series ID is composed of multiple codes: CES, which stands for Current Employment Survey (the name of the survey that collected the data), the industry code as specified by the BLS, and the data type code as specified in the datatype table.
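
As an illustration of that layout, a series ID such as `CES5552211010` can be split into its parts with string functions. This is only a sketch: it assumes fixed widths of 3 characters for the prefix, 8 for the industry code, and 2 for the data type code, which the dataset description above does not state. The variable name `@series_id` is made up for the example.

```sql
-- Sketch: split one series ID into its component codes.
-- The 3 / 8 / 2 character widths are an assumption, not part of the dataset notes.
DECLARE @series_id VARCHAR(20) = 'CES5552211010';

SELECT
    LEFT(@series_id, 3)         AS survey_prefix,   -- 'CES'
    SUBSTRING(@series_id, 4, 8) AS industry_code,   -- '55522110'
    RIGHT(@series_id, 2)        AS data_type_code;  -- '10'
```

If the widths hold, the last two characters can be checked against `data_type_code` in the datatype table.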

## Set Up

To connect to the database, use the same connection info used during the SQL lessons. 

For the assignment, we will be using `LaborStatisticsDB`.

## Database Exploration

To start with, let’s get to know the database further.

1. Use this space to make note of each table in the database, the columns within each table, each column’s data type, and how the tables are connected. You can write this down or draw a diagram. Whatever method helps you get an understanding of what is going on with `LaborStatisticsDB`.
   
   To add a photo, diagram, or document to your file, drop the file into the folder that holds this notebook. Use the link button to the right of the </> symbol in the gray part of this cell; the link is just the name of your file.

In [None]:

| Table | Columns |
| --- | --- |
| `dbo.annual_2016` | `id`, `series_id`, `year`, `period`, `value`, `footnote_codes`, `original_file` |
| `dbo.datatype` | `data_type_code`, `data_type_text` |
| `dbo.footnote` | `footnote_code`, `footnote_text` |
| `dbo.industry` | `id`, `industry_code`, `naics_code`, `publishing_status`, `industry_name`, `display_level`, `selectable`, `sort_sequence` |
| `dbo.january_2017` | `id`, `series_id`, `year`, `period`, `value`, `footnote_codes`, `original_file` |
| `dbo.period` | `period_code`, `month_abbr`, `month` |
| `dbo.seasonal` | `industry_codes`, `seasonal_text` |
| `dbo.series` | `series_id`, `data_type_code`, `industry_code`, `seasonal`, `supersector_code`, `series_title` |
| `dbo.supersector` | `supersector_code`, `supersector_name` |
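
The exploration step also asks for each column's data type, which the notes above do not record. One possible way to list them, sketched here on the assumption that the connection has permission to read the standard `INFORMATION_SCHEMA.COLUMNS` view in `LaborStatisticsDB`:

```sql
-- Sketch: list every column in the dbo schema with its data type.
-- Assumes read access to the database's INFORMATION_SCHEMA views.
SELECT TABLE_NAME, COLUMN_NAME, DATA_TYPE
FROM LaborStatisticsDB.INFORMATION_SCHEMA.COLUMNS
WHERE TABLE_SCHEMA = 'dbo'
ORDER BY TABLE_NAME, ORDINAL_POSITION;
```

Columns that share a name across tables, such as `series_id` or `data_type_code`, are the likely join keys.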

2. What is the datatype for women employees?

data_type_code = 10

In [None]:
SELECT data_type_code
FROM LaborStatisticsDB.dbo.datatype
WHERE data_type_text = 'women employees'

3. What is the series id for women employees in the commercial banking industry in the financial activities supersector?

series_id:
CES5552211010
CEU5552211010

In [None]:

SELECT series_id
FROM LaborStatisticsDB.dbo.series AS s  
INNER JOIN LaborStatisticsDB.dbo.datatype AS d  
ON s.data_type_code = d.data_type_code
INNER JOIN  LaborStatisticsDB.dbo.industry AS i  
ON i.industry_code = s.industry_code
INNER JOIN LaborStatisticsDB.dbo.supersector AS sup 
ON sup.supersector_code = s.supersector_code
WHERE d.data_type_text = 'women employees' AND industry_name = 'commercial banking' AND supersector_name = 'financial activities'

## Aggregate Your Friends and Code some SQL

Put together the following:

1. How many employees were reported in 2016 in all industries? Round to the nearest whole number.

2,340,612

In [None]:
SELECT round(sum(value),0)
FROM LaborStatisticsDB.dbo.annual_2016 AS data
INNER JOIN LaborStatisticsDB.dbo.series AS s  
ON s.series_id = data.series_id
INNER JOIN LaborStatisticsDB.dbo.datatype AS d  
ON s.data_type_code = d.data_type_code
WHERE d.data_type_code = 1

2. How many women employees were reported in 2016 in all industries? Round to the nearest whole number. 

1,125,490

In [None]:
SELECT round(sum(value),0)
FROM LaborStatisticsDB.dbo.annual_2016 AS data
INNER JOIN LaborStatisticsDB.dbo.series AS s  
ON s.series_id = data.series_id
INNER JOIN LaborStatisticsDB.dbo.datatype AS d  
ON s.data_type_code = d.data_type_code
WHERE d.data_type_code = 10

3. How many production/nonsupervisory employees were reported in 2016? Round to the nearest whole number. 

1,263,650

In [None]:
SELECT round(sum(value),0)
FROM LaborStatisticsDB.dbo.annual_2016 AS data
INNER JOIN LaborStatisticsDB.dbo.series AS s  
ON s.series_id = data.series_id
INNER JOIN LaborStatisticsDB.dbo.datatype AS d  
ON s.data_type_code = d.data_type_code
WHERE d.data_type_code = 6

4. In January 2017, what is the average weekly hours worked by production and nonsupervisory employees across all industries?

79,473

In [None]:

SELECT round(sum(value),0)
FROM LaborStatisticsDB.dbo.january_2017 AS data
WHERE data.series_id IN (
SELECT s.series_id
FROM LaborStatisticsDB.dbo.january_2017 AS data
INNER JOIN LaborStatisticsDB.dbo.series AS s  
ON s.series_id = data.series_id
INNER JOIN LaborStatisticsDB.dbo.datatype AS d  
ON s.data_type_code = d.data_type_code
WHERE d.data_type_code = 7
)
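
The question asks for an average, while the query above sums the values, which would explain a figure far larger than the 168 hours in a week. A minimal sketch of an averaging variant follows. It assumes the intended answer is the mean of the January 2017 values across all series with `data_type_code` 7; that reading of the question is an assumption, and no result is quoted because the query has not been run here.

```sql
-- Sketch: mean (not sum) of the average weekly hours values
-- across all series with data type 7 in January 2017.
SELECT ROUND(AVG(data.value), 2) AS avg_weekly_hours
FROM LaborStatisticsDB.dbo.january_2017 AS data
INNER JOIN LaborStatisticsDB.dbo.series AS s
    ON s.series_id = data.series_id
WHERE s.data_type_code = 7;
```

Joining directly to `series` does the same filtering as the `IN` subquery and avoids reading `january_2017` twice.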

5. What is the total weekly payroll for production and nonsupervisory employees across all industries in January 2017? Round to the nearest penny.

$1,838,753,220

In [None]:
SELECT round(sum(value),2)
FROM LaborStatisticsDB.dbo.january_2017 AS data
WHERE data.series_id IN (
SELECT s.series_id
FROM LaborStatisticsDB.dbo.january_2017 AS data
INNER JOIN LaborStatisticsDB.dbo.series AS s  
ON s.series_id = data.series_id
INNER JOIN LaborStatisticsDB.dbo.datatype AS d  
ON s.data_type_code = d.data_type_code
WHERE d.data_type_code = 82
)

6. In January 2017, for which industry was the average weekly hours worked by production and nonsupervisory employees the highest? Which industry was the lowest?

Highest: Motor vehicle power train components
Lowest: Fitness and recreational sports centers

In [None]:
SELECT TOP 1 i.industry_code, i.industry_name, sum(data.value) AS total_average_weekly_hours
FROM LaborStatisticsDB.dbo.industry AS i  
INNER JOIN LaborStatisticsDB.dbo.series AS s   
ON i.industry_code = s.industry_code
INNER JOIN LaborStatisticsDB.dbo.january_2017 AS data
ON data.series_id = s.series_id
INNER JOIN LaborStatisticsDB.dbo.datatype AS d  
ON s.data_type_code = d.data_type_code
WHERE d.data_type_code = 7 
GROUP BY i.industry_code, i.industry_name
ORDER BY total_average_weekly_hours DESC

SELECT TOP 1 i.industry_code, i.industry_name, sum(data.value) AS total_average_weekly_hours
FROM LaborStatisticsDB.dbo.industry AS i  
INNER JOIN LaborStatisticsDB.dbo.series AS s 
ON i.industry_code = s.industry_code
INNER JOIN LaborStatisticsDB.dbo.january_2017 AS data
ON data.series_id = s.series_id
INNER JOIN LaborStatisticsDB.dbo.datatype AS d  
ON s.data_type_code = d.data_type_code
WHERE d.data_type_code = 7 
GROUP BY i.industry_code, i.industry_name
ORDER BY total_average_weekly_hours ASC


7. In January 2017, for which industry was the total weekly payroll for production and nonsupervisory employees the highest? Which industry was the lowest?

Highest: Total private
Lowest: Coin-operated laundries and drycleaners

In [None]:
SELECT TOP 1 i.industry_code, i.industry_name, sum(data.value) AS total_weekly_payroll
FROM LaborStatisticsDB.dbo.industry AS i  
INNER JOIN LaborStatisticsDB.dbo.series AS s 
ON i.industry_code = s.industry_code
INNER JOIN LaborStatisticsDB.dbo.january_2017 AS data
ON data.series_id = s.series_id
INNER JOIN LaborStatisticsDB.dbo.datatype AS d  
ON s.data_type_code = d.data_type_code
WHERE d.data_type_code = 82 
GROUP BY i.industry_code, i.industry_name
ORDER BY total_weekly_payroll DESC

SELECT TOP 1 i.industry_code, i.industry_name, sum(data.value) AS total_weekly_payroll
FROM LaborStatisticsDB.dbo.industry AS i  
INNER JOIN LaborStatisticsDB.dbo.series AS s 
ON i.industry_code = s.industry_code
INNER JOIN LaborStatisticsDB.dbo.january_2017 AS data
ON data.series_id = s.series_id
INNER JOIN LaborStatisticsDB.dbo.datatype AS d  
ON s.data_type_code = d.data_type_code
WHERE d.data_type_code = 82 
GROUP BY i.industry_code, i.industry_name
ORDER BY total_weekly_payroll ASC

## Join in on the Fun

Time to start joining! You can choose the type of join you use; just make sure to make a note!

1. Join `annual_2016` with `series` on `series_id`. We only want the data in the `annual_2016` table to be included in the result.

In [None]:
-- Limiting rows returned from query, uncomment the line below to start on your query!
SELECT TOP 50 *
FROM LaborStatisticsDB.dbo.annual_2016 AS annual
LEFT JOIN LaborStatisticsDB.dbo.series as s     
ON annual.series_id = s.series_id


-- Uncomment the line below when you are ready to run the query, leaving it as your last!
SELECT TOP 50 *
FROM LaborStatisticsDB.dbo.annual_2016 AS annual
LEFT JOIN LaborStatisticsDB.dbo.series as s     
ON annual.series_id = s.series_id
ORDER BY annual.series_id
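
A `LEFT JOIN` keeps every `annual_2016` row even when no `series` row matches. One way to see whether that choice changes the result is to count the unmatched rows. This is a sketch using only the tables and columns listed in the exploration notes:

```sql
-- Sketch: count annual_2016 rows that have no matching row in series.
SELECT COUNT(*) AS unmatched_annual_rows
FROM LaborStatisticsDB.dbo.annual_2016 AS annual
LEFT JOIN LaborStatisticsDB.dbo.series AS s
    ON annual.series_id = s.series_id
WHERE s.series_id IS NULL;
```

If the count is 0, an `INNER JOIN` would return the same rows as the `LEFT JOIN` used here.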

2. Join `series` and `datatype` on `data_type_code`.

In [None]:
-- Limiting rows returned from query, uncomment the line below to start on your query!
SELECT TOP 50 *
FROM LaborStatisticsDB.dbo.series as s     
INNER JOIN LaborStatisticsDB.dbo.datatype as d  
ON s.data_type_code = d.data_type_code

-- Uncomment the line below when you are ready to run the query, leaving it as your last!

SELECT TOP 50 *
FROM LaborStatisticsDB.dbo.series as s     
INNER JOIN LaborStatisticsDB.dbo.datatype as d  
ON s.data_type_code = d.data_type_code
ORDER BY s.series_id

3. Join `series` and `industry` on `industry_code`.

In [None]:
-- Limiting rows returned from query, uncomment the line below to start on your query!
SELECT TOP 50 *
FROM LaborStatisticsDB.dbo.series as s  
INNER JOIN LaborStatisticsDB.dbo.industry as i  
ON s.industry_code = i.industry_code

-- Uncomment the line below when you are ready to run the query, leaving it as your last!
SELECT TOP 50 *
FROM LaborStatisticsDB.dbo.series as s  
INNER JOIN LaborStatisticsDB.dbo.industry as i  
ON s.industry_code = i.industry_code
ORDER BY s.series_id

## Subqueries, Unions, Derived Tables, Oh My!

1. Write a query that returns the `series_id`, `industry_code`, `industry_name`, and `value` from the `january_2017` table but only if that value is greater than the average value for `annual_2016` of `data_type_code` 82.

In [None]:
SELECT s.series_id, s.industry_code, i.industry_name, data.value  
FROM LaborStatisticsDB.dbo.series AS s
INNER JOIN LaborStatisticsDB.dbo.industry AS i 
ON i.industry_code = s.industry_code
INNER JOIN LaborStatisticsDB.dbo.january_2017 AS data 
ON s.series_id = data.series_id
WHERE data.series_id IN (
    SELECT annual.series_id
    FROM LaborStatisticsDB.dbo.annual_2016 AS annual 
    GROUP BY annual.series_id
    HAVING data.value > avg(annual.value) AND s.data_type_code = 82
)
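
The query above compares each January 2017 value with the 2016 average of its own series. The question can also be read as asking for a single threshold: the overall average of all `annual_2016` values with `data_type_code` 82. A sketch under that alternative reading, where the alias `s2` is introduced only for the example:

```sql
-- Sketch: compare January 2017 values against one overall 2016 average
-- for data type 82, computed in a scalar subquery.
SELECT s.series_id, s.industry_code, i.industry_name, data.value
FROM LaborStatisticsDB.dbo.january_2017 AS data
INNER JOIN LaborStatisticsDB.dbo.series AS s
    ON s.series_id = data.series_id
INNER JOIN LaborStatisticsDB.dbo.industry AS i
    ON i.industry_code = s.industry_code
WHERE s.data_type_code = 82  -- mirrors the filter in the query above
  AND data.value > (
        SELECT AVG(annual.value)
        FROM LaborStatisticsDB.dbo.annual_2016 AS annual
        INNER JOIN LaborStatisticsDB.dbo.series AS s2
            ON s2.series_id = annual.series_id
        WHERE s2.data_type_code = 82
  );
```

Which reading is intended is not settled by the question text, so both versions are worth running and comparing.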


**Optional Bonus Question:** Write the above query as a common table expression!

In [None]:
-- Optional CTE below
WITH avg_annual AS (
    SELECT series_id, avg(value) AS avg_value
    FROM LaborStatisticsDB.dbo.annual_2016
    GROUP BY series_id
)
SELECT s.series_id, s.industry_code, i.industry_name, data.value  
FROM LaborStatisticsDB.dbo.series AS s
INNER JOIN LaborStatisticsDB.dbo.industry AS i 
ON i.industry_code = s.industry_code
INNER JOIN LaborStatisticsDB.dbo.january_2017 AS data 
ON s.series_id = data.series_id
INNER JOIN avg_annual AS a   
ON s.series_id = a.series_id
WHERE s.data_type_code = 82
AND data.value > a.avg_value
ORDER BY data.value DESC;

2. Create a `Union` table comparing average weekly earnings of production and nonsupervisory employees between `annual_2016` and `january_2017` using the data type 30. Round to the nearest penny. You should have a column for the average earnings, a column for the year, and a column for the period.

In [None]:
-- Average weekly earnings (data type 30) for 2016, rounded to the nearest penny
SELECT ROUND(AVG(annual.value), 2) AS average_earnings, annual.year, annual.period
FROM LaborStatisticsDB.dbo.annual_2016 AS annual
INNER JOIN LaborStatisticsDB.dbo.series AS s  
ON annual.series_id = s.series_id
INNER JOIN LaborStatisticsDB.dbo.datatype AS d  
ON s.data_type_code = d.data_type_code
WHERE d.data_type_code = 30
GROUP BY annual.year, annual.period
UNION  
-- Same measure for January 2017
SELECT ROUND(AVG(jan.value), 2) AS average_earnings, jan.year, jan.period
FROM LaborStatisticsDB.dbo.january_2017 AS jan  
INNER JOIN LaborStatisticsDB.dbo.series AS s  
ON jan.series_id = s.series_id
INNER JOIN LaborStatisticsDB.dbo.datatype AS d  
ON s.data_type_code = d.data_type_code
WHERE d.data_type_code = 30
GROUP BY jan.year, jan.period


## Summarize Your Results

With what you know now about the Bureau of Labor Statistics (BLS) Current Employment Survey (CES) results and working with the Labor Statistics Database, answer the following questions. Note that while this is subjective, you should include relevant data to back up your opinion.

1. During which time period did production and nonsupervisory employees fare better?

It appears that January 2017 was a better time period for production and nonsupervisory employees. Their average weekly hours were much higher in January 2017 than in 2016, and their total weekly payroll was also higher in January 2017.
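
A comparison like this can be backed with one query that puts both periods side by side. The following is a sketch, assuming that averaging values across series is a fair basis for comparison and that data types 7 (average weekly hours) and 82 (total weekly payroll) are the relevant measures. The column aliases are made up for the example, and no results are quoted because the query has not been run here.

```sql
-- Sketch: average value per data type in each period, side by side.
SELECT 'annual_2016' AS source_table, s.data_type_code,
       ROUND(AVG(a.value), 2) AS avg_value
FROM LaborStatisticsDB.dbo.annual_2016 AS a
INNER JOIN LaborStatisticsDB.dbo.series AS s
    ON a.series_id = s.series_id
WHERE s.data_type_code IN (7, 82)
GROUP BY s.data_type_code
UNION ALL
SELECT 'january_2017' AS source_table, s.data_type_code,
       ROUND(AVG(j.value), 2) AS avg_value
FROM LaborStatisticsDB.dbo.january_2017 AS j
INNER JOIN LaborStatisticsDB.dbo.series AS s
    ON j.series_id = s.series_id
WHERE s.data_type_code IN (7, 82)
GROUP BY s.data_type_code
ORDER BY data_type_code, source_table;
```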

2. In which industries did production and nonsupervisory employees fare better?

In both time periods, the Motor vehicle power train components industry had the highest total average weekly hours. In terms of total weekly payroll, Total private, Private service-providing, and Professional and business services were the top three.

3. Now that you have explored the datasets, is there any data or information that you wish you had in this analysis?

It would be helpful to have the data broken down by location, so that results could be compared across regions nationwide or even globally.