# US Labor Market for College and High School Graduates

The goal of this notebook was to try out the various plotting functions Plotly has to offer.

The data come directly from the Federal Reserve Bank of New York's website: https://www.newyorkfed.org/

The notebook first concentrates on degree majors and their median annual income at the start of their careers and at mid-career.

Next, it identifies which majors earn the most and least money at those two career stages, along with the unemployment rate for each field of study.

After looking at the earnings potential of each field, the notebook explores unemployment and underemployment in more detail for all degree majors.

The last part compares the median income of college graduates and high school graduates over the last few decades, and looks at how the dot-com bust, the 2008 financial crash, and the more recent Covid-19 pandemic relate to wage growth, underemployment, and unemployment.


In [None]:
# This Python 3 environment comes with many helpful analytics libraries installed
# It is defined by the kaggle/python Docker image: https://github.com/kaggle/docker-python

import os

import numpy as np  # linear algebra
import pandas as pd  # data processing, CSV file I/O (e.g. pd.read_csv)
import plotly.express as px
import plotly.graph_objects as go
from IPython.display import display  # renders DataFrames inside a cell

# Input data files are available in the read-only "../input/" directory;
# list every file under it
for dirname, _, filenames in os.walk('/kaggle/input'):
    for filename in filenames:
        print(os.path.join(dirname, filename))

# You can write up to 5GB to the current directory (/kaggle/working/) that gets preserved as output when you create a version using "Save & Run All"
# You can also write temporary files to /kaggle/temp/, but they won't be saved outside of the current session

# 1. Data import and exploration

First we'll import the data into pandas DataFrames. We'll drop the 'Overall' row, since we are interested in each degree major individually.

In [None]:
df_subject = pd.read_csv('../input/us-college-graduates-wages/labor_market_college_grads.csv')

In [None]:
df_subject = df_subject[df_subject.Major != 'Overall']
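The filter above keeps every row whose `Major` is not `'Overall'`. A minimal sketch on a made-up three-row frame (hypothetical values, not the NY Fed data) shows the same boolean mask next to an equivalent `DataFrame.query` call:

```python
import pandas as pd

# Hypothetical toy frame (made-up values); the notebook reads labor_market_college_grads.csv
toy = pd.DataFrame({
    "Major": ["Overall", "Fine Arts", "Mathematics"],
    "Unemployment Rate": [4.0, 5.0, 3.0],
})

# Boolean mask: keep every row whose Major is not "Overall"
by_mask = toy[toy.Major != "Overall"]

# The same filter written as a DataFrame.query expression
by_query = toy.query("Major != 'Overall'")

print(by_mask.Major.tolist())  # ['Fine Arts', 'Mathematics']
```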

Let's start by listing all the degree majors in the dataset.

In [None]:
df_subject.Major.unique()

Let's look at the first 10 majors along with their unemployment and underemployment rates, median wage early in the career, median wage at mid-career, and the share of graduates in the field who go on to earn a master's degree as well.

In [None]:
df_subject.head(10)

If we want to look at a specific major, we can. My wife studied Fine Arts as an undergraduate and I studied Mathematics, so those are the two I chose:

In [None]:
display(df_subject[df_subject['Major'] == 'Fine Arts'])
display(df_subject[df_subject['Major'] == 'Mathematics'])
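Both majors can also be shown in a single table with `Series.isin`. A minimal sketch, using a made-up frame (obviously fake round numbers) in place of `df_subject`:

```python
import pandas as pd

# Made-up values standing in for df_subject
toy = pd.DataFrame({
    "Major": ["Fine Arts", "Mathematics", "Nursing"],
    "Median Wage Early Career": [10000, 20000, 30000],
})

# Keep only the rows whose Major appears in the list, as one table
chosen = toy[toy["Major"].isin(["Fine Arts", "Mathematics"])]
print(chosen)
```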

# 2. Visualizing the Degree Majors

First, let's look at all the majors and compare their median annual income early in the career with the mid-career stage.
The graph is interactive, so feel free to zoom in or hover over specific degree majors.

Some degree majors of interest are Pharmacy, Chemical Engineering, Mechanical Engineering, and Computer Engineering, as they all have very high earning potential ($100k+). Pharmacy is particularly interesting because of its wide gap between early-career and mid-career earnings.

In [None]:
x = df_subject.Major
fig = go.Figure(data=[
    go.Bar(name='Early Career', x=x, y=df_subject['Median Wage Early Career']),
    go.Bar(name='Mid Career', x=x, y=df_subject['Median Wage Mid-Career'])
])
fig.update_layout(barmode='group')
fig.update_layout(
    font_family="Arial",
    font_color="black",
    font_size = 8,
    title={
        'text': 'Wages by Degree Major',
        'font_size' : 20,
        'y':0.8,
        'x':0.5,
        'xanchor': 'center',
        'yanchor': 'top'},
    xaxis_title={
        'text' : "Degree Major",
        'font_size' : 16
    },
    yaxis_title={
        'text' : "Annual Wage",
        'font_size' : 16
    }
)
fig.show()

Next we sort the majors by early-career median wage and color each bar by its value. We then do the same for mid-career wages.

In [None]:
xs = df_subject['Major']
ys = df_subject["Median Wage Early Career"]

data = [go.Bar(
    x=df_subject['Major'],
    y=df_subject["Median Wage Early Career"],
    marker={
        'color': ys,
        'colorscale': 'Viridis'
    }
)]
layout = {
    'xaxis': {
        'categoryorder': 'array',
        'categoryarray': [x for _, x in sorted(zip(ys, xs))]
    }
}
fig = go.FigureWidget(data=data, layout=layout)
fig.update_layout(
    font_family="Arial",
    font_color="black",
    font_size = 8,
    title={
        'text': 'Early Career Median Wage by Major',
        'y':0.8,
        'x':0.5,
        'font_size' : 20,
        'xanchor': 'center',
        'yanchor': 'top'},
    xaxis_title={
        'text' : "Degree Major",
        'font_size' : 16
    },
    yaxis_title={
        'text' : "Annual Wage",
        'font_size' : 16
    }
)
fig.show()

In [None]:
xs = df_subject['Major']
ys = df_subject["Median Wage Mid-Career"]

data = [go.Bar(
    x=df_subject['Major'],
    y=df_subject["Median Wage Mid-Career"],
    marker={
        'color': ys,
        'colorscale': 'Viridis'
    }
)]
layout = {
    'xaxis': {
        'categoryorder': 'array',
        'categoryarray': [x for _, x in sorted(zip(ys, xs))]
    }
}
fig = go.FigureWidget(data=data, layout=layout)
fig.update_layout(
    font_family="Arial",
    font_color="black",
    font_size = 8,
    title={
        'text': 'Mid-Career Median Wage by Major',
        'y':0.8,
        'x':0.5,
        'xanchor': 'center',
        'yanchor': 'top',
        'font_size' : 20
    },
    xaxis_title={
        'text' : "Degree Major",
        'font_size' : 16
    },
    yaxis_title={
        'text' : "Annual Wage",
        'font_size' : 16
    }
)
fig.show()

Now that we have looked at the data overall, let's dive into the highest and lowest earners. First we'll create new DataFrames to store just the top 10 for the early and mid stages. Next we can do the same for the bottom earners.

In [None]:
dftop10_early = df_subject.sort_values(by=['Median Wage Early Career'], ascending=False).head(10)
dftop10_mid = df_subject.sort_values(by=['Median Wage Mid-Career'], ascending=False).head(10)
dftop10_early
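`DataFrame.nlargest` and `DataFrame.nsmallest` are a shorter way to express "sort, then take the first n rows". A minimal sketch on made-up values (distinct wages, so there are no ties to break) checks that they return the same rows as the `sort_values(...).head(n)` approach:

```python
import pandas as pd

# Hypothetical toy data (made-up numbers) standing in for df_subject
toy = pd.DataFrame({
    "Major": ["Major A", "Major B", "Major C", "Major D"],
    "Median Wage Early Career": [40000, 55000, 35000, 62000],
})
col = "Median Wage Early Career"

# The n rows with the largest / smallest values in col
top2 = toy.nlargest(2, col)
bottom2 = toy.nsmallest(2, col)

# Same rows, in the same order, as sorting and taking the head
assert top2.equals(toy.sort_values(by=[col], ascending=False).head(2))
assert bottom2.equals(toy.sort_values(by=[col], ascending=True).head(2))
print(top2["Major"].tolist(), bottom2["Major"].tolist())  # ['Major D', 'Major B'] ['Major C', 'Major A']
```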

If you're choosing a college major and hope to maximize your income immediately after graduation, this graph might help. Each bar is labeled with the unemployment rate for graduates of that major. Unsurprisingly, majors that pay well right out of university tend to have rather low unemployment rates.

In [None]:

fig = px.bar(dftop10_early, x="Major", y=["Median Wage Early Career"], color="Major", text =dftop10_early['Unemployment Rate'])
fig.update_layout(
    font_family="Arial",
    font_color="black",
    font_size = 16,
    title={
        'text': 'Top 10 Early Career Wages by Degree Major with Unemployment Rate',
        'y':1,
        'x':0.5,
        'xanchor': 'center',
        'yanchor': 'top'},
    xaxis_title="Degree Major",
    yaxis_title="Annual Wage",
)
fig.update_traces(texttemplate='%{text:.2f}', textposition='inside')
fig.update_layout(uniformtext_minsize=8, uniformtext_mode='hide')
fig.show()

If instead you're willing to play the long game, these degree majors might interest you, as they tend to be the highest earners by the mid-stage of their careers. Most of them have very low unemployment rates, although Physics stands out with a higher rate, so it might be a somewhat riskier choice than the others!

In [None]:
fig = px.bar(dftop10_mid, x="Major", y=["Median Wage Mid-Career"], color="Major", text =dftop10_mid['Unemployment Rate'])
fig.update_layout(
    font_family="Arial",
    font_color="black",
    font_size = 16,
    title={
        'text': 'Top 10 Mid Career Wages by Degree Major with Unemployment Rate',
        'y':1,
        'x':0.5,
        'xanchor': 'center',
        'yanchor': 'top'},
    xaxis_title="Degree Major",
    yaxis_title="Annual Wage",
)
fig.update_traces(texttemplate='%{text:.2f}', textposition='inside')
fig.update_layout(uniformtext_minsize=8, uniformtext_mode='hide')
fig.show()

**We repeat the same process, this time for the lowest earners.**

In [None]:
dfbottom10_early = df_subject.sort_values(by=['Median Wage Early Career'], ascending=True).head(10)
dfbottom10_mid = df_subject.sort_values(by=['Median Wage Mid-Career'], ascending=True).head(10)
dfbottom10_early

In [None]:
fig = px.bar(dfbottom10_early, x="Major", y=["Median Wage Early Career"], color="Major", text =dfbottom10_early['Unemployment Rate'])
fig.update_layout(
    font_family="Arial",
    font_color="black",
    font_size = 16,
    title={
        'text': 'Bottom 10 Early Career Wages by Degree Major with Unemployment Rate',
        'y':1,
        'x':0.5,
        'xanchor': 'center',
        'yanchor': 'top'},
    xaxis_title="Degree Major",
    yaxis_title="Annual Wage",
)
fig.update_traces(texttemplate='%{text:.2f}', textposition='inside')
fig.update_layout(uniformtext_minsize=8, uniformtext_mode='hide')
fig.show()

Education majors might not earn the most money, but they do seem to have very low unemployment rates. Education might be a less risky career choice for those who want stability.

In [None]:
fig = px.bar(dfbottom10_mid, x="Major", y=["Median Wage Mid-Career"], color="Major", text =dfbottom10_mid['Unemployment Rate'])
fig.update_layout(
    font_family="Arial",
    font_color="black",
    font_size = 16,
    title={
        'text': 'Bottom 10 Mid Career Wages by Degree Major with Unemployment Rate',
        'y':1,
        'x':0.5,
        'xanchor': 'center',
        'yanchor': 'top'},
    xaxis_title="Degree Major",
    yaxis_title="Annual Wage",
)
fig.update_traces(texttemplate='%{text:.2f}', textposition='inside')
fig.update_layout(uniformtext_minsize=8, uniformtext_mode='hide')
fig.show()

# 3. Unemployment and Underemployment

Now let's take a closer look at unemployment and underemployment.

https://www.investopedia.com defines underemployment as:

"Underemployment is a measure of employment and labor utilization in the economy that looks at how well the labor force is being utilized in terms of skills, experience, and availability to work. People who are classified as underemployed include those workers who are highly skilled but working in low paying or low skill jobs, and part-time workers who would prefer to be full-time."

I take that to mean that underemployed graduates tend to have a hard time finding a job within their field of study, so they are not able to apply the skills they learned in university to their daily work.

In [None]:
df_ue = df_subject[['Major','Unemployment Rate', 'Underemployment Rate']]
df_ue.head()

Once again we see the high unemployment rate for Physics majors, along with Mass Media, Misc. Technologies, and Anthropology. Unemployment is very low for those who majored in Education, Medical Technicians, Civil Engineering, and Theology.

In [None]:
df1 = df_ue.sort_values(by=['Unemployment Rate'])
fig = px.bar(df1, x="Major", y=["Unemployment Rate"], color="Major")
fig.update_layout(
    font_family="Arial",
    font_color="black",
    font_size = 8,
    title={
        'text': "Unemployment Rate",
        'y':0.8,
        'x':0.5,
        'xanchor': 'center',
        'yanchor': 'top',
        'font_size' : 20},
    xaxis_title={
        'text' : 'Degree Major',
        'font_size' : 16
    },
    yaxis_title={
        'text' : 'Percentage',
        'font_size' : 16
    }
)
fig.show()

For underemployment, Criminal Justice, Performing Arts, Leisure and Hospitality, and Liberal Arts tend to have the highest rates. On the low end are education fields, engineering fields, and Nursing.

In [None]:
df2 = df_ue.sort_values(by=['Underemployment Rate'])
fig = px.bar(df2, x="Major", y=['Underemployment Rate'], color="Major", title="Underemployment Rate")
fig.update_layout(
    font_family="Times New Roman",
    font_color="black",
    font_size = 8,
    title={
        'text': 'Underemployment Rate',
        'y':0.8,
        'x':0.5,
        'font_size' : 20,
        'xanchor': 'center',
        'yanchor': 'top'},
    xaxis_title={
        'text' : 'Degree Major',
        'font_size' : 16
    },
    yaxis_title={
        'text' : 'Percentage',
        'font_size' : 16
    }
)
fig.show()

# 4. Wages over time

Now let's compare the wages of college graduates and high school graduates over the last few decades. The plots mark important economic events such as the dot-com boom and bust, the 2008 financial crash, and the 2020 Covid-19 pandemic (where the data are recent enough).

In [None]:
df_wages = pd.read_csv('../input/us-college-graduates-wages/wages.csv')
df_wages.head()

**Format the data for time-series analysis:**

In [None]:
df_wages['Date'] = pd.to_datetime(df_wages['Date'], format='%m/%d/%Y')
# set_index moves the 'Date' column into the index, so no separate drop is needed
df_wages = df_wages.set_index('Date')
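The same three steps (parse `Date`, index by it, drop the column) are repeated for each CSV in this section. A hypothetical helper, `load_date_indexed` (my name, not a pandas function), could do them in one call. This sketch assumes every file stores dates as MM/DD/YYYY, as the format string in the cell above does:

```python
import io

import pandas as pd


def load_date_indexed(path_or_buffer):
    """Hypothetical helper: read a CSV whose 'Date' column is MM/DD/YYYY
    and return it indexed by that date."""
    df = pd.read_csv(path_or_buffer)
    df["Date"] = pd.to_datetime(df["Date"], format="%m/%d/%Y")
    # set_index moves the column into the index, so no separate drop is needed
    return df.set_index("Date")


# Usage with a small in-memory CSV (made-up numbers); in the notebook you would pass
# a path such as '../input/us-college-graduates-wages/wages.csv'
sample = io.StringIO("Date,Recent graduates\n01/01/1994,40.0\n01/01/1995,41.0\n")
df = load_date_indexed(sample)
print(df)
```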

In [None]:
df_wages.head()

**We'll only look at data from 1995 onward.**

In [None]:
df_wages = df_wages[df_wages.index >= '1995-01-01']
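The cell above keeps rows dated on or after 1 January 1995 with a boolean comparison. On a sorted `DatetimeIndex`, pandas partial-string slicing does the same thing. A minimal sketch on made-up values (slicing this way assumes the index is sorted):

```python
import pandas as pd

# Hypothetical yearly data indexed by date (values made up)
idx = pd.to_datetime(["1994-01-01", "1995-01-01", "1996-01-01"])
s = pd.DataFrame({"value": [1.0, 2.0, 3.0]}, index=idx)

# Boolean comparison, as in the cell above
by_mask = s[s.index >= "1995-01-01"]

# Partial-string slicing: every row from the start of 1995 onward
by_slice = s.loc["1995":]

assert by_mask.equals(by_slice)
print(by_slice)
```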

The dot-com era seems to have had a slight negative effect on wages, but overall there has been little growth over the last 15 years, although the 75th percentile of degree holders seems to have gained the most during that period.

In [None]:
fig = go.FigureWidget(data=[
    go.Scatter(x=df_wages.index, y=df_wages["Bachelor's degree: 25th percentile"], mode='lines', line={'dash': 'dash', 'color': 'red'}, name = "25th perc - Degree"),
    go.Scatter(x=df_wages.index, y=df_wages["Bachelor's degree: median"], mode='lines', line={'dash': 'solid', 'color': 'purple'}, name = "Median - Degree"),
    go.Scatter(x=df_wages.index, y=df_wages["Bachelor's degree: 75th percentile"], mode='lines', line={'dash': 'dash', 'color': 'blue'}, name  = "75th perc - Degree"),
    go.Scatter(x=df_wages.index, y=df_wages["High school diploma: median"], mode='lines', line={'dash': 'solid', 'color': 'green'}, name = "Median - H.S. Diploma")
])
fig.update_layout(
    font_family="Times New Roman",
    font_color="black",
    font_size = 16,
    title={
        'text': 'Annual Wages of College Graduates and High School Graduates',
        'y':0.95,
        'x':0.5,
        'xanchor': 'center',
        'yanchor': 'top'},
    xaxis_title="Year",
    yaxis_title="Annual Wages ($)",
    shapes=[
        # Dot com bubble shading
        dict(
            type="rect",
            x0='2000-03-01',
            x1='2002-10-01',
            y0=24000,
            y1=70000,
            fillcolor="Red",
            opacity=0.4,
            layer="below",
            line_width=0
        ),
        # Great Recession shading
        dict(
            type="rect",
            x0="2007-12-01",
            x1="2009-06-01",
            y0=24000,
            y1=70000,
            fillcolor="Red",
            opacity=0.4,
            layer="below",
            line_width=0
        ),
        # EESA (Emergency Economic Stabilization Act) marker
        dict(
            type='line',
            x0='2008-10-01',
            x1='2008-10-01',
            y0=24000,
            y1=70000,
            line=dict(color='Black', dash='dashdot')
        )
    ],
    annotations=[
        dict(text="The Great Recession", x='2007-12-01', y=70000),
        dict(text="Dot Com Bubble", x='2000-03-01', y=70000, hovertext='Market peak'),
        dict(text="EESA Passed", x='2008-10-01', y=70000, showarrow=True, arrowhead=1, ax=80, ay=-50, hovertext='Emergency Economic Stabilization Act')
    ]
)
fig.show()
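The plot compares the two groups visually. The gap can also be summarized as a single number: the ratio of the bachelor's-degree median to the high-school median, often called the college wage premium. A minimal sketch on made-up values that reuse the column names from `wages.csv`:

```python
import pandas as pd

# Made-up values with the same column names as wages.csv
toy = pd.DataFrame(
    {
        "Bachelor's degree: median": [60000.0, 66000.0],
        "High school diploma: median": [40000.0, 40000.0],
    },
    index=pd.to_datetime(["1995-01-01", "2020-01-01"]),
)

# Ratio of the two medians: 1.5 means degree holders earn 50% more
premium = toy["Bachelor's degree: median"] / toy["High school diploma: median"]
print(premium.round(2).tolist())  # [1.5, 1.65]
```

Applied to `df_wages`, the same expression yields a date-indexed series that could be plotted as another `go.Scatter` trace.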

Now let's look at underemployment and unemployment over this same time frame.

In [None]:
df2 = pd.read_csv('../input/us-college-graduates-wages/under_employment_college_grads.csv')

In [None]:
df2['Date'] = pd.to_datetime(df2['Date'], format='%m/%d/%Y')
# set_index moves the 'Date' column into the index, so no separate drop is needed
df2 = df2.set_index('Date')

In [None]:
df2.head()

In [None]:
df2 = df2[df2.index >= '1995-01-01']

Underemployment rates seem to have risen after both the dot-com bust and the 2008 financial crisis, but recent graduates appear to have been affected more.

In [None]:
fig = go.FigureWidget(data=[
    go.Scatter(x=df2.index, y=df2["Recent graduates"], mode='lines', line={'dash': 'solid', 'color': 'purple'}, name = "Recent grads"),
    go.Scatter(x=df2.index, y=df2["College graduates"], mode='lines', line={'dash': 'solid', 'color': 'blue'}, name = "College grads")
])
fig.update_layout(
    font_family="Times New Roman",
    font_color="black",
    font_size = 16,
    title={
        'text': 'Underemployment Rate for College Graduates',
        'y':0.95,
        'x':0.45,
        'xanchor': 'center',
        'yanchor': 'top'},
    xaxis_title="Year",
    yaxis_title="Percentage",
    shapes=[
        # Dot com bubble shading
        dict(
            type="rect",
            x0='2000-03-01',
            x1='2002-10-01',
            y0=30,
            y1=49,
            fillcolor="Red",
            opacity=0.4,
            layer="below",
            line_width=0
        ),
        # Great Recession shading
        dict(
            type="rect",
            x0="2007-12-01",
            x1="2009-06-01",
            y0=30,
            y1=49,
            fillcolor="Red",
            opacity=0.4,
            layer="below",
            line_width=0
        ),
        # Covid-19 marker (WHO pandemic declaration)
        dict(
            type="line",
            x0='2020-03-11',
            x1='2020-03-11',
            y0=30,
            y1=49,
            line=dict(color="Red", dash="dash")
        ),
        # EESA (Emergency Economic Stabilization Act) marker
        dict(
            type='line',
            x0='2008-10-01',
            x1='2008-10-01',
            y0=30,
            y1=49,
            line=dict(color='Black', dash='dashdot')
        )
    ],
    annotations=[
        dict(text="The Great Recession", x='2007-12-01', y=49, showarrow=True, arrowhead=1),
        dict(text="Dot Com Bubble", x='2000-03-01', y=49, hovertext='Market peak', showarrow=True, arrowhead=1),
        dict(text="Covid-19", x='2020-03-11', y=49, hovertext='WHO declares Covid-19 a pandemic', showarrow=True, arrowhead=1),
        dict(text="EESA Passed", x='2008-10-01', y=49, showarrow=True, arrowhead=1, ax=80, ay=-50, hovertext='Emergency Economic Stabilization Act')
    ]
)
fig.show()
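One way to put a number on "recent graduates were affected more" is to compare each series just before a recession with its value some time later. A sketch with a hypothetical helper (`change_over` is my name) and made-up numbers; the end of the window is arbitrary and would need to be chosen on the real data, and the index is assumed to be sorted:

```python
import pandas as pd


def change_over(df, start, end):
    """Hypothetical helper: change in each column between the last observation
    on or before `start` and the last observation on or before `end`."""
    before = df.loc[:start].iloc[-1]
    after = df.loc[:end].iloc[-1]
    return after - before


# Made-up rates with the same column names as under_employment_college_grads.csv
idx = pd.to_datetime(["2007-10-01", "2008-01-01", "2010-01-01", "2011-01-01"])
toy = pd.DataFrame(
    {"Recent graduates": [38.0, 39.0, 44.0, 45.0], "College graduates": [33.0, 33.5, 34.5, 35.0]},
    index=idx,
)

res = change_over(toy, "2007-12-01", "2011-01-01")
print(res)
```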

In [None]:
df3 = pd.read_csv('../input/us-college-graduates-wages/Unemployment_rate.csv')

In [None]:
df3['Date'] = pd.to_datetime(df3['Date'], format='%m/%d/%Y')
# set_index moves the 'Date' column into the index, so no separate drop is needed
df3 = df3.set_index('Date')

In [None]:
df3 = df3[df3.index >= '1995-01-01']

Now we can graph the unemployment rate for young workers, recent graduates, college graduates, and all workers, and see whether the financial crises and the Covid-19 pandemic had any effect on unemployment. The two biggest drivers of unemployment over the last 15 years appear to be the 2008 financial crisis and the Covid-19 pandemic. Both affected every group, while the dot-com bubble didn't seem to affect unemployment much, if at all.

In [None]:
fig = go.FigureWidget(data=[
    go.Scatter(x=df3.index, y=df3["Young workers"], mode='lines', line={'dash': 'solid', 'color': 'orange'}, name = "Young workers"),
    go.Scatter(x=df3.index, y=df3["All workers"], mode='lines', line={'dash': 'solid', 'color': 'green'}, name = "All workers"),
    go.Scatter(x=df3.index, y=df3["Recent graduates"], mode='lines', line={'dash': 'solid', 'color': 'purple'}, name = "Recent graduates"),
    go.Scatter(x=df3.index, y=df3["College graduates"], mode='lines', line={'dash': 'solid', 'color': 'blue'}, name = "College graduates")
])
fig.update_layout(
    font_family="Times New Roman",
    font_color="black",
    font_size = 16,
    title={
        'text': 'Unemployment Rate',
        'y':0.95,
        'x':0.5,
        'xanchor': 'center',
        'yanchor': 'top'},
    xaxis_title="Year",
    yaxis_title="Percentage",
    shapes=[
        # Dot com bubble shading
        dict(
            type="rect",
            x0='2000-03-01',
            x1='2002-10-01',
            y0=0,
            y1=22,
            fillcolor="Red",
            opacity=0.4,
            layer="below",
            line_width=0
        ),
        # Great Recession shading
        dict(
            type="rect",
            x0="2007-12-01",
            x1="2009-06-01",
            y0=0,
            y1=22,
            fillcolor="Red",
            opacity=0.4,
            layer="below",
            line_width=0
        ),
        # Covid-19 marker (WHO pandemic declaration)
        dict(
            type="line",
            x0='2020-03-11',
            x1='2020-03-11',
            y0=0,
            y1=22,
            line=dict(color="Red", dash="dash")
        ),
        # EESA (Emergency Economic Stabilization Act) marker
        dict(
            type='line',
            x0='2008-10-01',
            x1='2008-10-01',
            y0=0,
            y1=22,
            line=dict(color='Black', dash='dashdot')
        )
    ],
    annotations=[
        dict(text="The Great Recession", x='2007-12-01', y=22, showarrow=True, arrowhead=1),
        dict(text="Dot Com Bubble", x='2000-03-01', y=22, hovertext='Market peak', showarrow=True, arrowhead=1),
        dict(text="Covid-19", x='2020-03-11', y=22, hovertext='WHO declares Covid-19 a pandemic', showarrow=True, arrowhead=1),
        dict(text="EESA Passed", x='2008-10-01', y=22, showarrow=True, arrowhead=1, ax=80, ay=-50, hovertext='Emergency Economic Stabilization Act')
    ]
)
fig.show()

I hope you enjoyed and possibly learned something from this notebook!

Please like and/or leave a comment if you can.

Thanks for reading!