# Data Storytelling - JOLTS
#### _A look at US Job Openings data over time_
***
_Michael Garber_

# Table of Contents
> 1. Introduce dataset
2. Pick Audience
3. EDA
4. Create Narrative
5. Presentation

## Introduction

- I have chosen to analyze the ***JOLTS*** dataset

- "The Job Openings and Labor Turnover Survey (JOLTS) program of the Bureau of Labor Statistics (BLS) produces monthly and annual estimates of job openings, hires, and separations for the nation."

- Goal: share the story told by recent trends in the data. I will focus specifically on job openings to understand labor demand.

- Data Set
> Info URL - https://www.bls.gov/jlt/jltover.htm \
Data URL - https://download.bls.gov/pub/time.series/jt/jt.data.2.JobOpenings \
Data Definitions - https://download.bls.gov/pub/time.series/jt/jt.txt

## Pick Audience

- I have chosen to target a ***non-technical audience***.
    - _I will take care to make the findings accessible and interesting._

## EDA
> What can we learn about the data?

- Can I count something interesting?
- Can I find trends (e.g. high, low, increasing, decreasing, anomalies)?
- Can I make a bar plot or a histogram?
- Can I compare two related quantities?
- Can I make a scatterplot?
- Can I make a time-series plot?
- Looking at the plots, what are some insights I can make?
- Can I see any correlations?
- Is there a hypothesis I can - and should - investigate further?
- What other questions are the insights leading me to ask?

In [8]:
# Import Libraries
from pathlib import Path
import pandas as pd
from matplotlib import pyplot as plt
import seaborn as sns

In [20]:
# set input file col widths
colWidths = [31, 5, 12, 4, 1]

# Import JOLTS Data set
joltsData = pd.read_fwf(filepath_or_buffer="Data/jt.data.2.JobOpenings", widths=colWidths)
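The stray `\t` characters that show up in the parsed `period` values suggest the file may be tab-delimited rather than fixed-width. A minimal sketch of an alternative read, assuming a tab-separated layout with a header row; the `read_jolts` helper, the `sample` text, and the column names `value` and `footnote_codes` are assumptions for illustration, not confirmed against the real file:

```python
import io

import pandas as pd

# Hypothetical sample imitating the ASSUMED tab-delimited layout of the file.
sample = (
    "series_id                     \tyear\tperiod\t       value\tfootnote_codes\n"
    "JTS000000000000000JOL         \t2000\tM12\t        5088\t\n"
    "JTS000000000000000JOL         \t2001\tM01\t        5234\t\n"
    "JTS000000000000000JOL         \t2021\tM06\t       10317\t\n"
)


def read_jolts(source):
    """Read a tab-delimited JOLTS-style file (path or buffer) into a DataFrame."""
    # skipinitialspace drops the padding that precedes right-aligned values
    df = pd.read_csv(source, sep="\t", skipinitialspace=True)
    # Header names and text fields may carry trailing padding
    df.columns = df.columns.str.strip()
    for col in ("series_id", "period"):
        df[col] = df[col].str.strip()
    return df


jolts_tab = read_jolts(io.StringIO(sample))
print(jolts_tab.dtypes)
```

If the assumption holds, the same helper could be pointed at `"Data/jt.data.2.JobOpenings"` instead of the in-memory sample.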

In [37]:
# check data head
joltsData.head()

|   | series_id | year | period | va | l |
|---|---|---|---|---|---|
| 0 | JTS000000000000000JOL | 2000 | M12 | 5088.0 | NaN |
| 1 | JTS000000000000000JOL | 2001 | M01 | 5234.0 | NaN |
| 2 | JTS000000000000000JOL | 2001 | M02 | 5097.0 | NaN |
| 3 | JTS000000000000000JOL | 2001 | M03 | 4762.0 | NaN |
| 4 | JTS000000000000000JOL | 2001 | M04 | 4615.0 | NaN |


In [45]:
joltsData.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 105634 entries, 0 to 105633
Data columns (total 5 columns):
 #   Column     Non-Null Count   Dtype  
---  ------     --------------   -----  
 0   series_id  105634 non-null  object 
 1   year       105634 non-null  int64  
 2   period     105634 non-null  object 
 3   va         105634 non-null  float64
 4   l          0 non-null       float64
dtypes: float64(2), int64(1), object(2)
memory usage: 4.0+ MB


In [43]:
# clean data
# Cleaning plan:
#   - drop cols: series_id, l
#   - rename col va to values
#   - drop rows where period contains a tab (\t)
joltsData['period'].value_counts()

period
M02              8540
M03              8540
M05              8540
M12              8539
M04              8539
M06              8539
M09              8539
M01              8538
M08              8538
M07              8536
M11              8183
M10              8181
M13              3816
M07\t       1       8
M10\t       1       7
M08\t       1       6
M01\t       1       6
M09\t       1       5
M11\t       1       5
M12\t       1       5
M04\t       1       5
M06\t       1       5
M02\t       1       4
M03\t       1       4
M05\t       1       4
M13\t       1       2
Name: count, dtype: int64
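The cleaning plan in the cell above can be sketched as follows. This is a minimal illustration, not the final cleaning step: `jolts_demo` is a hypothetical miniature of the parsed frame (column names taken from the `info()` output, values illustrative only) and `clean_jolts` is a helper name introduced here.

```python
import pandas as pd

# Hypothetical miniature of the parsed frame; values are illustrative only.
jolts_demo = pd.DataFrame({
    "series_id": ["JTS000000000000000JOL"] * 3,
    "year": [2000, 2001, 2021],
    "period": ["M12", "M01", "M06\t       1"],
    "va": [5088.0, 5234.0, 317.0],
    "l": [float("nan")] * 3,
})


def clean_jolts(df):
    """Apply the planned cleaning: filter tab rows, drop columns, rename va."""
    # Rows whose period picked up a tab were mis-parsed, so exclude them
    has_tab = df["period"].str.contains("\t", regex=False)
    return (
        df.loc[~has_tab]
        .drop(columns=["series_id", "l"])
        .rename(columns={"va": "values"})
        .reset_index(drop=True)
    )


jolts_clean = clean_jolts(jolts_demo)
print(jolts_clean)
```

Because `values` is also a built-in DataFrame attribute, the renamed column has to be accessed as `jolts_clean["values"]` rather than `jolts_clean.values`. The filter also discards the affected rows instead of repairing them.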

In [75]:
joltsData['year'].value_counts()

year
2012    4438
2001    4438
2021    4438
2020    4438
2019    4438
2018    4438
2017    4438
2016    4438
2015    4438
2014    4438
2013    4438
2011    4438
2023    4438
2010    4438
2009    4438
2008    4438
2007    4438
2006    4438
2005    4438
2004    4438
2003    4438
2002    4438
2022    4438
2024    3204
2000     356
Name: count, dtype: int64

In [73]:
joltsData[joltsData['period'].str.contains('\t')]

|   | series_id | year | period | va | l |
|---|---|---|---|---|---|
| 246 | JTS000000000000000JOL | 2021 | M06\t 1 | 317.0 | NaN |
| 247 | JTS000000000000000JOL | 2021 | M07\t 1 | 991.0 | NaN |
| 248 | JTS000000000000000JOL | 2021 | M08\t 1 | 884.0 | NaN |
| 249 | JTS000000000000000JOL | 2021 | M09\t 1 | 875.0 | NaN |
| 250 | JTS000000000000000JOL | 2021 | M10\t 1 | 1365.0 | NaN |
| ... | ... | ... | ... | ... | ... |
| 85793 | JTU100000000000000JOL | 2022 | M04\t 1 | 1453.0 | NaN |
| 85794 | JTU100000000000000JOL | 2022 | M05\t 1 | 60.0 | NaN |
| 85796 | JTU100000000000000JOL | 2022 | M07\t 1 | 1248.0 | NaN |
| 85799 | JTU100000000000000JOL | 2022 | M10\t 1 | 212.0 | NaN |
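One possible explanation for these rows is that a five-digit value no longer fits the four-character slot given to `va`, so its leading digit spills into `period`. A minimal sketch of that effect using plain string slicing; the raw `line` is hypothetical and assumes tab-separated fields with a 12-character, right-aligned value field, and `slice_fixed_width` is a helper introduced here for illustration:

```python
# Hypothetical raw line: tab-separated fields, value right-aligned in an
# ASSUMED 12-character field (inferred from the output, not confirmed).
line = (
    "JTS000000000000000JOL".ljust(30)
    + "\t2021\tM06\t"
    + "10317".rjust(12)
    + "\t"
)

col_widths = [31, 5, 12, 4, 1]  # same widths passed to read_fwf


def slice_fixed_width(text, widths):
    """Cut text into consecutive fields of the given widths."""
    fields, start = [], 0
    for width in widths:
        fields.append(text[start:start + width])
        start += width
    return fields


fields = slice_fixed_width(line, col_widths)
for name, field in zip(["series_id", "year", "period", "va", "l"], fields):
    print(f"{name:>9}: {field!r}")
```

Under these assumptions the third slice ends with the value's leading `1` and the fourth slice holds only the remaining four digits, which mirrors the pattern in the rows above.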


## Create Narrative
Narrative here...

## Presentation
presentation images and text in this section...

> The questions you asked? \
The trends you investigated? \
The resulting visualizations and conclusions?

## TODO
- clean data before EDA



------
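- read file command (read_fwf) not working: parsed `period` values contain stray `\t` characters, so the file is likely tab-delimited rather than fixed-width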