# Graded Assignment 3: 9 to 5

Time to show off your SQL skills! For each question, copy the SQL query you used and make note of the answer.

## The Dataset

For this assignment, you will be using the Bureau of Labor Statistics (BLS) Current Employment Survey (CES) results which can be found on [Kaggle](https://www.kaggle.com/datasets/bls/employment).

## Business Issue

You are working for the Bureau of Labor Statistics with the United States government, and your supervisor has asked you to meet with Dolly Parton, whose nonprofit is looking to shed light on the state of employment in the United States. As part of the 9 to 5 project, their research focuses on production and nonsupervisory employees and how those employees fare compared to all employees in the United States. While the data the BLS collects through the CES is publicly available, Dolly Parton and her colleagues need your help navigating the thousands of rows in each table in `LaborStatisticsDB`.

## About the Dataset

This dataset comes directly from the Bureau of Labor Statistics’ Current Employment Survey (CES). Here are some things you need to know:

1. The `industry` table contains a NAICS code, which is different from the industry code. NAICS stands for North American Industry Classification System.
1. The series ID is composed of several codes. CES stands for Current Employment Survey, the name of the survey that collected the data. In a series ID, the `CE` prefix is followed by a seasonal code (`S` for seasonally adjusted, `U` for not seasonally adjusted), then the industry code as specified by the BLS, and finally the data type code as specified in the `datatype` table.
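Because the series ID is fixed-width, its parts can be pulled apart with `SUBSTRING`. This is a minimal T-SQL sketch that assumes every ID follows the 13-character pattern seen in this notebook's outputs (for example `CES5552211010` = `CE` + `S` + `55522110` + `10`). Comparing the extracted pieces against the `seasonal`, `industry_code`, and `data_type_code` columns is a quick way to check that assumption.

```sql
-- Sketch: split series_id into its component codes.
-- Assumed layout: CE (2) + seasonal code (1) + industry code (8) + data type code (2).
SELECT TOP 10
    s.series_id,
    SUBSTRING(s.series_id, 1, 2)  AS survey_prefix,      -- expected 'CE'
    SUBSTRING(s.series_id, 3, 1)  AS seasonal_code,      -- 'S' or 'U'
    SUBSTRING(s.series_id, 4, 8)  AS industry_code_part, -- 8-digit industry code
    SUBSTRING(s.series_id, 12, 2) AS data_type_part,     -- 2-digit data type code
    s.seasonal,                                          -- stored columns, for comparison
    s.industry_code,
    s.data_type_code
FROM dbo.series AS s
ORDER BY s.series_id;
```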

## Set Up

To connect to the database, use the same connection info used during the SQL lessons. 

For the assignment, we will be using `LaborStatisticsDB`.

## Database Exploration

To start with, let’s get to know the database further.

1. Use this space to note each table in the database, the columns within each table, each column’s data type, and how the tables are connected. You can write this down or draw a diagram; use whichever method best helps you understand what is going on in `LaborStatisticsDB`.
   
   To add a photo, diagram, or document to your file, drop the file into the folder that holds this notebook. Then use the link button to the right of the </> symbol in the gray part of this cell; the link is just the name of your file.

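There are tables for: Data combined, datatype, footnote, industry, period, seasonal, series, and supersector, along with the `annual_2016` and `january_2017` data tables used in the queries below.

Columns: data type code (INT), data type text (STRING), dates, footnote code (INT), footnote text (STRING), industry_code (INT), naics_code (STRING), publishing_status (STRING), industry_name (STRING), display_level (INT), selectable (BOOLEAN), sort_sequence (INT), series_id (STRING), supersector_code (INT), industry_code (INT), data_type_code (INT), seasonal (STRING), series_title (STRING), footnote_codes (STRING), begin_year (INT), begin_period (STRING), end_year (INT), end_period (STRING), supersector_name (STRING), seasonal_text (STRING).

Connections (based on the joins used in this notebook): `series` is the hub. Its `series_id` links to `series_id` in `annual_2016` and `january_2017`, its `industry_code` links to `industry`, its `supersector_code` links to `supersector`, and its `data_type_code` links to `datatype`.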
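Instead of browsing each table by hand, the column inventory can be pulled from SQL Server's `INFORMATION_SCHEMA.COLUMNS` view. A sketch to run while connected to `LaborStatisticsDB`; note that `DATA_TYPE` reports SQL Server type names (such as `int` or `varchar`) rather than generic labels like STRING or BOOLEAN.

```sql
-- List every column with its table, position, SQL Server data type, and nullability.
SELECT
    c.TABLE_SCHEMA,
    c.TABLE_NAME,
    c.ORDINAL_POSITION,
    c.COLUMN_NAME,
    c.DATA_TYPE,
    c.IS_NULLABLE
FROM INFORMATION_SCHEMA.COLUMNS AS c
ORDER BY c.TABLE_NAME, c.ORDINAL_POSITION;
```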


2. What is the datatype for women employees?

The data type code for women employees is 10. I found this by looking through the `series` table (for example, series CES0000000010 is titled "Women employees").
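The code can also be looked up directly in the `datatype` table. A minimal sketch; `data_type_text` is an assumed column name based on the "data type text" column noted above, and the join sample later in this notebook shows the description stored as `WOMEN EMPLOYEES`.

```sql
-- Find the data type code(s) whose description mentions women.
-- data_type_text is an assumed column name.
SELECT
    d.data_type_code,
    d.data_type_text
FROM dbo.datatype AS d
WHERE d.data_type_text LIKE '%WOMEN%';
```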

3. What is the series ID for women employees in the commercial banking industry in the financial activities supersector?

```sql
-- Women employees (data type 10) in Commercial banking within Financial activities.
-- Literals match the capitalization stored in the tables.
SELECT
    s.series_id,
    i.industry_name,
    ss.supersector_name,
    s.data_type_code
FROM dbo.series AS s
INNER JOIN dbo.industry AS i
    ON s.industry_code = i.industry_code
INNER JOIN dbo.supersector AS ss
    ON s.supersector_code = ss.supersector_code
WHERE s.data_type_code = 10
  AND i.industry_name = 'Commercial banking'
  AND ss.supersector_name = 'Financial activities';
```


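Output:
CES5552211010	Commercial banking	Financial activities	10
CEU5552211010	Commercial banking	Financial activities	10

The two series IDs differ only in the seasonal code: `CES5552211010` is seasonally adjusted and `CEU5552211010` is not seasonally adjusted.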
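Equality filters on names only match when the literal agrees with the stored text (including capitalization on a case-sensitive collation). A quick way to confirm the stored spelling before filtering is a `LIKE` search; a sketch:

```sql
-- Check how the industry and supersector names are actually stored.
SELECT i.industry_code, i.naics_code, i.industry_name
FROM dbo.industry AS i
WHERE i.industry_name LIKE '%banking%';

SELECT ss.supersector_code, ss.supersector_name
FROM dbo.supersector AS ss
WHERE ss.supersector_name LIKE '%financial%';
```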

## Aggregate Your Friends and Code some SQL

Put together the following:

1. How many employees were reported in 2016 in all industries? Round to the nearest whole number.

```sql
-- Sum of all 2016 series with data type 1 (all employees)
SELECT
    ROUND(SUM(a.value), 0) AS employees_in_2016
FROM dbo.annual_2016 AS a
INNER JOIN dbo.series AS s
    ON a.series_id = s.series_id
WHERE s.data_type_code = 1;
```


Output: 2340612 employees

2. How many women employees were reported in 2016 in all industries? Round to the nearest whole number. 

```sql
-- Sum of all 2016 series with data type 10 (women employees)
SELECT
    ROUND(SUM(a.value), 0) AS total_women_employees_2016
FROM dbo.annual_2016 AS a
INNER JOIN dbo.series AS s
    ON a.series_id = s.series_id
WHERE s.data_type_code = 10;
```


Output: 1125490

3. How many production/nonsupervisory employees were reported in 2016? Round to the nearest whole number. 

```sql
-- Sum of all 2016 series with data type 6 (production and nonsupervisory employees)
SELECT
    ROUND(SUM(a.value), 0) AS Nonsupervisory
FROM dbo.annual_2016 AS a
INNER JOIN dbo.series AS s
    ON a.series_id = s.series_id
WHERE s.data_type_code = 6;
```


Output: 1263650
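The three 2016 questions above differ only in the `data_type_code` filter, so they can also be answered in one pass with conditional aggregation. In this sketch each `CASE` returns `NULL` for rows of other data types and `SUM` skips `NULL`s, so the three columns should match the three separate outputs above.

```sql
-- All three 2016 totals in a single scan of annual_2016.
SELECT
    ROUND(SUM(CASE WHEN s.data_type_code = 1  THEN a.value END), 0) AS all_employees_2016,
    ROUND(SUM(CASE WHEN s.data_type_code = 10 THEN a.value END), 0) AS women_employees_2016,
    ROUND(SUM(CASE WHEN s.data_type_code = 6  THEN a.value END), 0) AS production_nonsupervisory_2016
FROM dbo.annual_2016 AS a
INNER JOIN dbo.series AS s
    ON a.series_id = s.series_id
WHERE s.data_type_code IN (1, 6, 10);  -- only scan the rows any column needs
```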

4. In January 2017, what is the average weekly hours worked by production and nonsupervisory employees across all industries?

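```sql
-- Average across all January 2017 series with data type 7
-- (average weekly hours of production and nonsupervisory employees)
SELECT
    ROUND(AVG(j.value), 2) AS avg_weekly_hours_jan2017
FROM dbo.january_2017 AS j
INNER JOIN dbo.series AS s
    ON j.series_id = s.series_id
WHERE s.data_type_code = 7;
```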


Output: 36.06

5. What is the total weekly payroll for production and nonsupervisory employees across all industries in January 2017? Round to the nearest penny.

```sql
-- Sum of all January 2017 series with data type 31
SELECT
    ROUND(SUM(j.value), 2) AS total_weekly_payroll_jan2017
FROM dbo.january_2017 AS j
INNER JOIN dbo.series AS s
    ON j.series_id = s.series_id
WHERE s.data_type_code = 31;
```

Output: 1124969.28
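This question asks for a total weekly payroll. As far as I know from BLS's published CES data type list, code 31 is an average weekly earnings series (production and nonsupervisory employees, in constant 1982-84 dollars), while code 82 is the aggregate weekly payrolls of production and nonsupervisory employees. That mapping is from memory rather than from this database, so it is worth checking the `datatype` table before relying on either code here or in question 7. A sketch (`data_type_text` is again an assumed column name):

```sql
-- List data types whose descriptions mention payrolls or earnings,
-- to confirm which code holds total (aggregate) weekly payroll.
SELECT
    d.data_type_code,
    d.data_type_text
FROM dbo.datatype AS d
WHERE d.data_type_text LIKE '%PAYROLL%'
   OR d.data_type_text LIKE '%EARNINGS%'
ORDER BY d.data_type_code;
```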

6. In January 2017, for which industry was the average weekly hours worked by production and nonsupervisory employees the highest? Which industry was the lowest?

```sql
-- Industry with the highest average weekly hours (data type 7) in January 2017
SELECT TOP 1
    i.industry_name,
    ROUND(AVG(j.value), 2) AS avg_weekly_hours
FROM dbo.january_2017 AS j
INNER JOIN dbo.series AS s
    ON j.series_id = s.series_id
INNER JOIN dbo.industry AS i
    ON s.industry_code = i.industry_code
WHERE s.data_type_code = 7
GROUP BY i.industry_name
ORDER BY avg_weekly_hours DESC;  -- change DESC to ASC to get the lowest
```

Highest: Motor vehicle power train components (49.6 hours)

Lowest (same query with `ORDER BY avg_weekly_hours ASC`): Fitness and recreational sports centers (16.85 hours)
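Instead of running the query twice with the sort order flipped, both ends can be returned at once. A sketch using CTEs and `RANK()` window functions; unlike `TOP 1`, `RANK()` also keeps industries that tie for first or last place.

```sql
-- Average weekly hours (data type 7) per industry for January 2017,
-- then keep only the highest- and lowest-ranked industries.
WITH industry_hours AS (
    SELECT
        i.industry_name,
        ROUND(AVG(j.value), 2) AS avg_weekly_hours
    FROM dbo.january_2017 AS j
    INNER JOIN dbo.series AS s
        ON j.series_id = s.series_id
    INNER JOIN dbo.industry AS i
        ON s.industry_code = i.industry_code
    WHERE s.data_type_code = 7
    GROUP BY i.industry_name
),
ranked AS (
    SELECT
        industry_name,
        avg_weekly_hours,
        RANK() OVER (ORDER BY avg_weekly_hours DESC) AS rank_high,
        RANK() OVER (ORDER BY avg_weekly_hours ASC)  AS rank_low
    FROM industry_hours
)
SELECT
    CASE WHEN rank_high = 1 THEN 'Highest' ELSE 'Lowest' END AS position,
    industry_name,
    avg_weekly_hours
FROM ranked
WHERE rank_high = 1 OR rank_low = 1
ORDER BY avg_weekly_hours DESC;
```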

7. In January 2017, for which industry was the total weekly payroll for production and nonsupervisory employees the highest? Which industry was the lowest?

```sql
-- Industry with the highest January 2017 total for data type 31
SELECT TOP 1
    i.industry_name,
    ROUND(SUM(j.value), 2) AS total_weekly_payroll
FROM dbo.january_2017 AS j
INNER JOIN dbo.series AS s
    ON j.series_id = s.series_id
INNER JOIN dbo.industry AS i
    ON s.industry_code = i.industry_code
WHERE s.data_type_code = 31
  AND i.industry_name IS NOT NULL  -- exclude the unnamed (NULL) industry group
GROUP BY i.industry_name
ORDER BY total_weekly_payroll DESC;  -- change DESC to ASC to get the lowest
```

Highest: without the `IS NOT NULL` filter, the top row has a NULL industry name with a total of 75224.07. With the filter, the highest is General freight trucking (11272.02).

Lowest: Bowling centers (592.5)
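The highest-payroll result above includes a group whose `industry_name` is NULL. Before excluding it, it can help to see which industry rows lack a name and which series feed them. A sketch that keeps data type 31 to match the query above:

```sql
-- Industry rows with no name, and the data type 31 series that map to them.
SELECT
    i.industry_code,
    i.naics_code,
    s.series_id,
    s.series_title
FROM dbo.industry AS i
INNER JOIN dbo.series AS s
    ON s.industry_code = i.industry_code
WHERE i.industry_name IS NULL
  AND s.data_type_code = 31;
```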

## Join in on the Fun

Time to start joining! You can choose the type of join you use; just make sure to make a note of it!

1. Join `annual_2016` with `series` on `series_id`. We only want the data in the `annual_2016` table to be included in the result.

```sql
-- Keep every annual_2016 row; series columns are NULL where no match exists
SELECT TOP 50 *
FROM dbo.annual_2016 AS a
LEFT JOIN dbo.series AS s
    ON a.series_id = s.series_id
ORDER BY a.id ASC;
```

Join type: `LEFT JOIN`, so every row from `annual_2016` is kept even when no matching `series` row exists.

Sample Output:
0	CEU5500000007	2016	M13	36.9	NULL	ce.data.55c.FinancialActivities.ProductionEmployeeHoursAndEarnings.csv	CEU5500000007	55	55000000	7	U	Average weekly hours of production and nonsupervisory employees
1	CEU5500000008	2016	M13	26.11	NULL	ce.data.55c.FinancialActivities.ProductionEmployeeHoursAndEarnings.csv	CEU5500000008	55	55000000	8	U	Average hourly earnings of production and nonsupervisory employees
2	CEU5500000030	2016	M13	962.73	NULL	ce.data.55c.FinancialActivities.ProductionEmployeeHoursAndEarnings.csv	CEU5500000030	55	55000000	30	U	Average weekly earnings of production and nonsupervisory employees
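To see what the `LEFT JOIN` actually preserves, an anti-join lists the `annual_2016` rows that have no matching `series` row; an `INNER JOIN` would drop exactly these rows. A sketch; an empty result means the two join types return the same rows for this data.

```sql
-- annual_2016 rows with no matching series row (kept by LEFT JOIN, dropped by INNER JOIN).
SELECT
    a.id,
    a.series_id,
    a.value
FROM dbo.annual_2016 AS a
LEFT JOIN dbo.series AS s
    ON a.series_id = s.series_id
WHERE s.series_id IS NULL;
```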

2. Join `series` and `datatype` on `data_type_code`.

```sql
-- Attach each series' data type description
SELECT TOP 50 *
FROM dbo.series AS s
INNER JOIN dbo.datatype AS d
    ON s.data_type_code = d.data_type_code
ORDER BY s.series_id ASC;
```

Join type: `INNER JOIN`, so only series with a matching data type are returned.

Sample Output:
CES0000000001	0	00000000	1	S	All employees	1	ALL EMPLOYEES
CES0000000010	0	00000000	10	S	Women employees	10	WOMEN EMPLOYEES
CES0000000025	0	00000000	25	S	All employees	25	ALL EMPLOYEES


3. Join `series` and `industry` on `industry_code`.

```sql
-- Attach each series' industry details
SELECT TOP 50 *
FROM dbo.series AS s
INNER JOIN dbo.industry AS i
    ON s.industry_code = i.industry_code
ORDER BY s.series_id ASC;
```

Join type: `INNER JOIN`, so only series with a matching industry are returned.

Sample Output:
CES0000000001	0	00000000	1	S	All employees	0	0	-	B	Total nonfarm	0	T	1
CES0000000010	0	00000000	10	S	Women employees	0	0	-	B	Total nonfarm	0	T	1
CES0000000025	0	00000000	25	S	All employees	0	0	-	B	Total nonfarm	0	T	1


## Subqueries, Unions, Derived Tables, Oh My!

1. Write a query that returns the `series_id`, `industry_code`, `industry_name`, and `value` from the `january_2017` table but only if that value is greater than the average value for `annual_2016` of `data_type_code` 82.

```sql
-- January 2017 rows whose value exceeds the 2016 average for data type 82
SELECT
    j.series_id,
    s.industry_code,
    i.industry_name,
    j.value
FROM dbo.january_2017 AS j
INNER JOIN dbo.series AS s
    ON j.series_id = s.series_id
INNER JOIN dbo.industry AS i
    ON s.industry_code = i.industry_code
WHERE j.value > (
    -- Scalar subquery: one average over all 2016 data type 82 rows
    SELECT AVG(a.value)
    FROM dbo.annual_2016 AS a
    INNER JOIN dbo.series AS s2
        ON a.series_id = s2.series_id
    WHERE s2.data_type_code = 82
);
```

Sample Output:
CES0500000056	05000000	Total private	4239112
CES0500000056	05000000	Total private	4239112
CES0500000057	05000000	Total private	110301694

**Optional Bonus Question:** Write the above query as a common table expression!

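A possible answer to the bonus question: the subquery moves into a CTE (named `avg_2016` here purely for illustration), and its single-row result is attached to every January 2017 row with a `CROSS JOIN`. It should return the same rows as the subquery version above.

```sql
-- Compute the 2016 average for data type 82 once, in a CTE.
WITH avg_2016 AS (
    SELECT AVG(a.value) AS avg_value
    FROM dbo.annual_2016 AS a
    INNER JOIN dbo.series AS s2
        ON a.series_id = s2.series_id
    WHERE s2.data_type_code = 82
)
SELECT
    j.series_id,
    s.industry_code,
    i.industry_name,
    j.value
FROM dbo.january_2017 AS j
INNER JOIN dbo.series AS s
    ON j.series_id = s.series_id
INNER JOIN dbo.industry AS i
    ON s.industry_code = i.industry_code
CROSS JOIN avg_2016                      -- single row, so no duplication
WHERE j.value > avg_2016.avg_value;
```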

2. Create a `Union` table comparing average weekly earnings of production and nonsupervisory employees between `annual_2016` and `january_2017` using data type 30. Round to the nearest penny. You should have a column for the average earnings, a column for the year, and a column for the period.

```sql
-- 2016 annual average weekly earnings (data type 30)
SELECT
    ROUND(AVG(a.value), 2) AS average,
    '2016' AS year,
    'Annual' AS period
FROM dbo.annual_2016 AS a
INNER JOIN dbo.series AS s
    ON a.series_id = s.series_id
WHERE s.data_type_code = 30
UNION ALL
-- January 2017 average weekly earnings (data type 30)
SELECT
    ROUND(AVG(j.value), 2) AS average,
    '2017' AS year,
    'January' AS period
FROM dbo.january_2017 AS j
INNER JOIN dbo.series AS s
    ON j.series_id = s.series_id
WHERE s.data_type_code = 30;
```

Output: 
797.2	2016	Annual
808.53	2017	January
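To turn the two averages into a single difference, the `UNION ALL` can be treated as a derived table. A sketch; given the two outputs above (808.53 and 797.2), it should come out to 11.33.

```sql
-- Derived table u holds one row per period; pivot with CASE and subtract.
SELECT
    ROUND(
        MAX(CASE WHEN u.year = '2017' THEN u.average END)
      - MAX(CASE WHEN u.year = '2016' THEN u.average END), 2) AS change_in_avg_weekly_earnings
FROM (
    SELECT ROUND(AVG(a.value), 2) AS average, '2016' AS year
    FROM dbo.annual_2016 AS a
    INNER JOIN dbo.series AS s
        ON a.series_id = s.series_id
    WHERE s.data_type_code = 30
    UNION ALL
    SELECT ROUND(AVG(j.value), 2) AS average, '2017' AS year
    FROM dbo.january_2017 AS j
    INNER JOIN dbo.series AS s
        ON j.series_id = s.series_id
    WHERE s.data_type_code = 30
) AS u;
```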


## Summarize Your Results

With what you now know about the Bureau of Labor Statistics (BLS) Current Employment Survey (CES) results and working with the Labor Statistics Database, answer the following questions. Note that while this is subjective, you should include relevant data to back up your opinion.

1. During which time period did production and nonsupervisory employees fare better?

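According to my union query, January 2017 average weekly earnings were $11.33 higher than the 2016 annual average (808.53 vs. 797.20). By that measure, production and nonsupervisory employees fared better in January 2017, with the caveat that a single month is being compared to a full-year average.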

2. In which industries did production and nonsupervisory employees fare better?

I built another table to find the industries with the highest rate of change between the two time periods. The top five industries by rate of change were:

1. Other heavy construction
2. Heavy machinery rental and leasing
3. Computer and software
4. Tile and terrazzo contractors
5. Coal mining
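The query behind this ranking is not shown. One possible way to build it, assuming the measure is average weekly earnings (data type 30, as in the union query) and that "rate of change" means percent change from the 2016 annual average to January 2017; under different assumptions the ranking may not match the list above. The CTE names are hypothetical.

```sql
-- Hypothetical reconstruction: percent change in average weekly earnings per industry.
WITH earnings_2016 AS (
    SELECT s.industry_code, AVG(a.value) AS avg_2016
    FROM dbo.annual_2016 AS a
    INNER JOIN dbo.series AS s
        ON a.series_id = s.series_id
    WHERE s.data_type_code = 30
    GROUP BY s.industry_code
),
earnings_2017 AS (
    SELECT s.industry_code, AVG(j.value) AS avg_jan_2017
    FROM dbo.january_2017 AS j
    INNER JOIN dbo.series AS s
        ON j.series_id = s.series_id
    WHERE s.data_type_code = 30
    GROUP BY s.industry_code
)
SELECT TOP 5
    i.industry_name,
    ROUND(e16.avg_2016, 2)     AS avg_weekly_earnings_2016,
    ROUND(e17.avg_jan_2017, 2) AS avg_weekly_earnings_jan_2017,
    -- NULLIF avoids a divide-by-zero error; such rows get a NULL pct_change
    ROUND(100.0 * (e17.avg_jan_2017 - e16.avg_2016) / NULLIF(e16.avg_2016, 0), 2) AS pct_change
FROM earnings_2016 AS e16
INNER JOIN earnings_2017 AS e17
    ON e16.industry_code = e17.industry_code
INNER JOIN dbo.industry AS i
    ON e16.industry_code = i.industry_code
ORDER BY pct_change DESC;  -- NULLs sort last in descending order
```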

3. Now that you have explored the datasets, is there any data or information that you wish you had in this analysis?

I would like to see benefits data incorporated into this analysis, since benefits also bear on job placement and on how employees fare financially. Seeing how much in benefits goes to different groups could help the government identify industries where benefits are lacking or being abused.