# Job Growth Over Time: A Visualization of Non-Farm Employees in Each U.S. State
###### by Pranav Pamidighantam

When analyzing the economic health of a country or state, jobs numbers always surface as an indicator. A state with job growth and low unemployment is seen as much healthier than one without. Looking at the trend of jobs numbers over time therefore gives a sense of how different states' economies developed. This analysis focuses on non-farm employees, a series the U.S. Bureau of Labor Statistics describes as follows: "All Employees: Total Nonfarm, commonly known as Total Nonfarm Payroll, is a measure of the number of U.S. workers in the economy that excludes proprietors, private household employees, unpaid volunteers, farm employees, and the unincorporated self-employed. This measure accounts for approximately 80 percent of the workers who contribute to Gross Domestic Product (GDP)." \[U.S. Bureau of Labor Statistics, All Employees, Total Nonfarm [PAYEMS], retrieved from FRED, Federal Reserve Bank of St. Louis; https://fred.stlouisfed.org/series/PAYEMS, December 7, 2020.\] Using non-farm employees as the basis for this analysis gives us less noise in the data, since farm employment varies by season. In addition, farm employment, shown in the figure below from [USDA](https://www.ers.usda.gov/topics/farm-economy/farm-labor/#size), has its own pattern that is worth analyzing separately from the broader jobs numbers [\[Source\]](https://www.ers.usda.gov/topics/farm-economy/farm-labor/#size). Specifically, farm employment decreased significantly over time (most likely due to increases in the efficiency of farming technology, food storage, etc.) and plateaued around 1990, whereas we will see that non-farm jobs tend to increase over time.

![](projectimages/farm_emp_over_time.png)



In addition to looking at trends over time, comparing states by geography is another angle worth exploring. We can see regional differences first in the farm employment data, which is separate from the main dataset in this article. The figure below clearly shows the modern dominance of the Pacific region (CA, WA, OR, AK, HI) in farm employment [\[Source\]](https://www.ers.usda.gov/topics/farm-economy/farm-labor/#geography). This difference matters because it shows how the number of non-farm employees differs from the number of total employees, and how different regions have different industries, which can affect jobs numbers.

![](projectimages/farmgeo.png)

The visualization below is an interactive map that uses data from the U.S. Bureau of Labor Statistics. The data contains the number of non-farm employees, in thousands of persons, in each state for each month since January 1939. Changing the date recolors the map on a yellow-to-red scale, with states that have larger numbers appearing more red. Hovering over a state displays the number for that state on the selected date. Clicking a state selects it, and a line graph appears showing the selected state's numbers over time. You can select several states to compare them over time, and click a selected state again to remove it from the line chart. All of the underlying data is available through [GeoFred: https://geof.red/m/oiC](https://geof.red/m/oiC). To download the data, under "Choose Data" the region type should be "State", the data type should be "All Employees: Total Nonfarm" with "Not Seasonally Adjusted, Monthly, Thousands of Persons", the frequency should be "Monthly", and the units should be "Thousands of Persons"; under "Download" the start date should be "1939 January" and the end date should be "2020 September".
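
As a quick sanity check on the download, the monthly column labels can be enumerated ahead of time. The sketch below assumes the columns are labeled in the "1939 January" style used in the download dialog; the helper name `month_labels` is hypothetical and not part of the notebook.

```python
# Sketch: list the monthly labels expected between the download's start and end dates.
# Assumes the "YYYY Month" label style; month_labels is a hypothetical helper.
MONTHS = ("January", "February", "March", "April", "May", "June",
          "July", "August", "September", "October", "November", "December")

def month_labels(start_year, start_month, end_year, end_month):
    """Return 'YYYY Month' labels for every month in the inclusive range."""
    labels = []
    year, month = start_year, start_month
    while (year, month) <= (end_year, end_month):
        labels.append(f"{year} {MONTHS[month - 1]}")
        month += 1
        if month > 12:
            year, month = year + 1, 1
    return labels

labels = month_labels(1939, 1, 2020, 9)
print(len(labels), labels[0], labels[-1])  # 981 1939 January 2020 September
```

If the merged table has a different number of date columns than this list, a sheet was probably dropped or duplicated during the merge.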


# Total Non-Farm Employees by State

In [1]:
import pandas as pd
import matplotlib.pyplot as plt
import geopandas 
import ipyleaflet
import numpy as np
import ipywidgets
import fiona
import json
import IPython
from ipyleaflet import Map, GeoData, GeoJSON, Choropleth
from branca.colormap import linear

dict_of_sheets = pd.read_excel('GeoFRED_All_Employees__Total_Nonfarm_by_State_Thousands_of_Persons.xls',
                               sheet_name = [0,1,2,3], skiprows = [0])
first_merge = pd.merge(dict_of_sheets[0],dict_of_sheets[1], how = 'left', on = ['Series ID', 'Region Name', 'Region Code']) 
#https://stackoverflow.com/questions/41815079/pandas-merge-join-two-data-frames-on-multiple-columns
second_merge = pd.merge(first_merge,dict_of_sheets[2], how = 'left', on = ['Series ID', 'Region Name', 'Region Code'])
total_nonfarm_by_state = pd.merge(second_merge,dict_of_sheets[3], how = 'left', on = ['Series ID', 'Region Name', 'Region Code'])

#total_nonfarm_by_state['State_abbr'] = total_nonfarm_by_state['Series ID'][:2][:]
total_nonfarm_by_state['State_abbr'] = total_nonfarm_by_state['Series ID'].str[:2] #https://stackoverflow.com/questions/48773767/how-to-slice-column-values-in-python-pandas-dataframe
total_nonfarm_by_state = total_nonfarm_by_state.fillna(0)

total_nonfarm_by_state['STUSPS'] = total_nonfarm_by_state['State_abbr']

gdf_new_small = geopandas.read_file('https://www2.census.gov/geo/tiger/GENZ2019/shp/cb_2019_us_state_20m.zip')

gdf_new_small = gdf_new_small.drop([1])
gdf_new_small_merged = gdf_new_small.merge(total_nonfarm_by_state, on='STUSPS', how='left')
final_gdf_new_small_merged = geopandas.GeoDataFrame(gdf_new_small_merged)
final_new_gdf_small = final_gdf_new_small_merged[['geometry','STUSPS']]

# One-time step: uncomment to write the GeoJSON file that is read back below.
# final_new_gdf_small.to_file("final_new_gdf_small.geojson", driver='GeoJSON')
with open ('final_new_gdf_small.geojson', 'r') as f:
    data_small_new = json.load(f)
for item in data_small_new['features']:
    item['id'] = item['properties']['STUSPS']
    
# Selection flag per state, all initially unselected (avoids hard-coding the row count).
curr_states = dict.fromkeys(final_gdf_new_small_merged.STUSPS, False)
@ipywidgets.interact(Date = ipywidgets.Dropdown(options = final_gdf_new_small_merged.columns.values[13:-1]))
def plot(Date):
    m = Map(center=(40,-100), zoom = 3, title = "Total non-farm employees by state")
#     out = ipywidgets.Output(layout={'border': '1px solid black'})
    def click(event = None, feature = None, id = None, properties = None):
        IPython.display.clear_output()
        display(m)
        curr_states[properties['STUSPS']] = not curr_states[properties['STUSPS']]
#         with out:
        st_str = ''
#         state_selectedlist = []
        fig, ax = plt.subplots(figsize = (20,10))
        for k,v in curr_states.items():
            if v:
                st_str += k + ','
#                 with out:
#                 print('A')
#                 state_selectedlist.append(k)
                ax.plot(final_gdf_new_small_merged.columns.values[13:-1].flatten(),final_gdf_new_small_merged.loc[final_gdf_new_small_merged['STUSPS']==k,:].iloc[:,13:-1].values.flatten(), label = k)
        if st_str:
            ax.axvline(x=Date, c = 'black', ls = '--')
            plt.legend()
            plt.xticks(final_gdf_new_small_merged.columns.values[13:-1:12].flatten(),rotation = 'vertical')
        plt.xlabel('Date')
        plt.ylabel('Thousands of Non-farm employees')
        plt.title('Total non-farm employees by state')
        plt.grid(color='grey', linestyle=':', linewidth=1)
        st_str = st_str[:-1]
#         States_Selected = tuple(state_selectedlist)
        wid2.value = 'Currently Selected States:' + st_str
        
#         print('b')
#         print(ax)
        return 
   
    def hover(event = None, feature = None, id = None, properties = None):
        date = Date
        wid1.value = properties['STUSPS'] + ': ' + date + " -- " + str(final_gdf_new_small_merged.loc[final_gdf_new_small_merged['STUSPS'] == properties['STUSPS'],date].values[0]) + ' thousand non-farm employees'
        return
        
    chorodata = dict(zip(final_gdf_new_small_merged.STUSPS, final_gdf_new_small_merged[Date]))
    geo_data = Choropleth(geo_data = data_small_new,choro_data = chorodata, key_on = 'id', 
                              hover_style={'fillColor': 'blue' , 'fillOpacity': 0.2},
                             colormap=linear.YlOrRd_04,
                             style={'fillOpacity': 0.8})
    
    wid1 = ipywidgets.Label(value='')
    st_str = ''
    fig, ax = plt.subplots(figsize = (20,10))
    for k,v in curr_states.items():
        if v:
            st_str += k + ','
            ax.plot(final_gdf_new_small_merged.columns.values[13:-1].flatten(),final_gdf_new_small_merged.loc[final_gdf_new_small_merged['STUSPS']==k,:].iloc[:,13:-1].values.flatten(), label = k)
    if st_str:
        ax.axvline(x=Date, c = 'black', ls = '--')
        plt.legend()
        plt.xticks(final_gdf_new_small_merged.columns.values[13:-1:12].flatten(),rotation = 'vertical')
    plt.grid(color='grey', linestyle=':', linewidth=1)
    plt.xlabel('Date')
    plt.ylabel('Thousands of Non-farm employees')
    plt.title('Total non-farm employees over time by state')
    st_str = st_str[:-1]
    wid2 = ipywidgets.Label(value='Currently Selected States:' + st_str)
    summ_stats = Date + '  Stats <p>'
    date_stats = final_gdf_new_small_merged[Date].describe()  # computed once; entry 0 ('count') is skipped
    for i in range(1,8):
        summ_stats += date_stats.index.values[i] + ': ' + str(date_stats.values[i].round(2)) + '<p>'
#     print(summ_stats)
    wid3 = ipywidgets.HTML(value=summ_stats)
    geo_data.on_hover(hover)
    geo_data.on_click(click)
    m.add_control(ipyleaflet.WidgetControl(widget = wid1, position = 'topright'))
    m.add_control(ipyleaflet.WidgetControl(widget = wid2, position = 'bottomleft'))
    m.add_control(ipyleaflet.WidgetControl(widget = wid3, position = 'bottomright'))
    m.add_layer(geo_data)
    display(m)
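
The selection bookkeeping inside the click handler (flip one state's flag, then list the selected states) can be separated from the widgets and tested on its own. This is a minimal sketch with hypothetical helper names (`toggle`, `selected_label`); it mirrors the dictionary of booleans used above but is not the notebook's own code.

```python
# Sketch of the click handler's selection logic, independent of ipyleaflet.
# toggle and selected_label are hypothetical names introduced for illustration.

def toggle(selected, state):
    """Flip the selection flag for one state, in place."""
    selected[state] = not selected[state]

def selected_label(selected):
    """Build the text shown in the 'Currently Selected States' label."""
    chosen = [abbr for abbr, is_on in selected.items() if is_on]
    return 'Currently Selected States:' + ','.join(chosen)

curr = dict.fromkeys(['CA', 'NY', 'TX'], False)
toggle(curr, 'NY')
toggle(curr, 'CA')
toggle(curr, 'NY')  # clicking NY a second time deselects it
print(selected_label(curr))  # Currently Selected States:CA
```

Using `','.join(...)` removes the need to strip a trailing comma afterwards, which the handler above does with `st_str[:-1]`.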


In playing around with this visualization, you may notice that certain states (e.g. California, New York, Texas) seem to dominate regardless of the month and year. This is one of the failings of using jobs data that isn't contextualized by the population of the state. The figures below show the population growth in a few large states for reference. These charts (and their underlying data) can all be accessed through the following link by clicking on the state of interest: [https://fred.stlouisfed.org/release/tables?rid=118&eid=259194](https://fred.stlouisfed.org/release/tables?rid=118&eid=259194). We see here that at present California has about twice as many people as New York, but not twice as many non-farm employees. Thus, as a percentage of population, New York has more non-farm employment. Recalling the discussion above, this may indicate that New York has a larger share of non-farm employees relative to farm employees, which would be consistent with the USDA data on regional farm employment. This tells us something about the makeup of each state's workforce.


![](projectimages/fredgraph_cal.png)

![](projectimages/fredgraph_ny.png)

![](projectimages/fredgraph_tx.png)
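
The population comparison above amounts to dividing employees by residents. The sketch below shows that arithmetic with illustrative round numbers chosen only to match the "about twice as many people, but not twice as many employees" pattern; they are not actual FRED values, and the function name is hypothetical.

```python
# Sketch: normalize non-farm employment by population.
# The figures below are ILLUSTRATIVE round numbers, not actual FRED data.

def employees_per_100_residents(employees_thousands, population_thousands):
    """Non-farm employees per 100 residents; both inputs are in thousands."""
    return 100 * employees_thousands / population_thousands

states = {
    "CA": {"employees": 17_000, "population": 40_000},  # hypothetical values
    "NY": {"employees": 9_500, "population": 20_000},   # hypothetical values
}

rates = {
    abbr: employees_per_100_residents(v["employees"], v["population"])
    for abbr, v in states.items()
}
for abbr, rate in rates.items():
    print(f"{abbr}: {rate:.1f} non-farm employees per 100 residents")
```

With these placeholder inputs the larger state has more employees in total but fewer per resident, which is the pattern described for California and New York.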


However, we can still gain insights from this data. Namely, a large portion of the U.S. economy at large is driven by the states with the most people and most employees, and one can also notice regional trends. For example, both Texas and California surpass New York by a large margin today, although they started out with lower jobs numbers. In addition, over time more and more states reach larger numbers of employees. Some regions are relatively consistent. For example, Montana, Wyoming and the Dakotas tend to be on the lower end of non-farm employment throughout the period, and the states surrounding New York tend to be behind New York but not by too much.

Overall, this visualization gives us an understanding of where the jobs tend to be across the states, and suggests that the distribution is becoming more equitable over time as more and more states contribute to the picture of non-farm jobs in the U.S.
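
One way to put a number on "more equitable" is the share of all non-farm employees held by the largest few states in a given month. The analysis above does not compute this; the sketch below is an assumption about how it could be done, with a hypothetical function name and made-up inputs.

```python
# Sketch: share of total employment held by the k largest states.
# top_k_share is a hypothetical helper; the inputs are made-up, not real data.

def top_k_share(values, k=3):
    """Return the fraction of the total contributed by the k largest values."""
    values = list(values)
    total = sum(values)
    if total == 0:
        raise ValueError("total employment is zero; share is undefined")
    return sum(sorted(values, reverse=True)[:k]) / total

early = {"A": 500, "B": 300, "C": 100, "D": 50, "E": 50}    # hypothetical early month
late = {"A": 900, "B": 800, "C": 700, "D": 600, "E": 500}   # hypothetical later month

print(round(top_k_share(early.values(), k=2), 3))  # 0.8
print(round(top_k_share(late.values(), k=2), 3))   # 0.486
```

A falling share between two dates would support the reading that employment is spreading across more states; applied to the real table, `values` would be one date column of the merged data.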