<a href="https://colab.research.google.com/github/npr99/PlanningMethods/blob/master/Location_Quotient_Using_BLS_QCEW.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Location Quotient Example Using BLS QCEW

The Bureau of Labor Statistics (BLS) provides employment and wage data by industry for each year, for counties, states, and the nation.

An overview of the Quarterly Census of Employment and Wages (QCEW) data is available on the BLS website:

https://www.bls.gov/cew/

The BLS provides the data as "data slices" for individual states and counties:

https://data.bls.gov/cew/doc/access/csv_data_slices.htm

The BLS documents the complete layout of these datasets. For location quotients, the Annual Averages data is a good choice:

https://data.bls.gov/cew/doc/access/csv_data_slices.htm#ANNUAL_LAYOUT
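A location quotient (LQ) compares an industry's share of local employment with the same industry's share of national employment; the lq_ columns in the annual layout report this ratio relative to the U.S. The snippet below is a minimal sketch of that standard ratio. The function name and the example figures are hypothetical and do not come from a QCEW file.

```python
def location_quotient(local_industry: float, local_total: float,
                      national_industry: float, national_total: float) -> float:
    """Hypothetical helper: (local_industry / local_total) / (national_industry / national_total).

    A value above 1 means the industry is more concentrated in the area than
    in the nation; a value below 1 means it is less concentrated.
    """
    if local_total <= 0 or national_industry <= 0 or national_total <= 0:
        raise ValueError("totals and the national industry count must be positive")
    # Cross-multiplied form of the ratio of the two employment shares.
    return (local_industry * national_total) / (local_total * national_industry)


# Hypothetical example: 2% of local jobs vs. 1% of national jobs.
print(location_quotient(3_000, 150_000, 1_000_000, 100_000_000))  # 2.0
```

The same formula applies to wages (lq_total_annual_wages) by substituting wage totals for employment counts.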

The BLS provides a complete list of areas (states and counties). The area codes are based on state and county FIPS codes:

https://data.bls.gov/cew/doc/titles/area/area_titles.htm
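The example slices in this notebook suggest the pattern: a 2-digit state FIPS code followed by a 3-digit county FIPS code, with 000 in place of the county for a statewide file (13000 for Georgia, 13051 for Chatham County). The helper below is a hypothetical sketch built on that assumption. National and multi-area codes such as US000 or USMSA do not follow it and should be looked up in the area titles list.

```python
from typing import Optional


def qcew_area_code(state_fips: str, county_fips: Optional[str] = None) -> str:
    """Hypothetical helper: build a 5-character QCEW area code.

    Assumes 2-digit state FIPS + 3-digit county FIPS, and "000" in place
    of the county for a statewide file.
    """
    state = state_fips.zfill(2)  # "1" -> "01"
    county = "000" if county_fips is None else county_fips.zfill(3)  # "51" -> "051"
    if len(state) != 2 or len(county) != 3 or not (state + county).isdigit():
        raise ValueError("expected a 2-digit state and a 3-digit county FIPS code")
    return state + county


print(qcew_area_code("13"))        # 13000  (Georgia, statewide)
print(qcew_area_code("13", "51"))  # 13051  (Chatham County, GA)
print(qcew_area_code("1"))         # 01000  (Alabama -- Statewide)
```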

### Example Area Slice Files

2016 QCEW Annual averages for Georgia
http://data.bls.gov/cew/data/api/2016/a/area/13000.csv

2016 QCEW Annual averages for Chatham County, GA
http://data.bls.gov/cew/data/api/2016/a/area/13051.csv
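Both links share one template that varies only by year and area code, so an f-string makes it easy to request other areas or years. This is a sketch that assumes the same template holds for other years; the helper name is hypothetical.

```python
def qcew_annual_area_url(year, area_fips: str) -> str:
    """Hypothetical helper: URL of an annual-averages area slice.

    Assumes the template used by the two example links above.
    """
    return f"http://data.bls.gov/cew/data/api/{year}/a/area/{area_fips}.csv"


print(qcew_annual_area_url(2016, "13000"))    # Georgia, statewide
print(qcew_annual_area_url("2016", "13051"))  # Chatham County, GA
```

The result can be passed straight to pd.read_csv.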

### File Layout
https://www.bls.gov/cew/about-data/downloadable-file-layouts/annual/naics-based-annual-layout.htm

#### Details on codes for different ownership types:
https://www.bls.gov/cew/classifications/ownerships/ownership-titles.htm


Note that total employment is the sum of the annual average of monthly employment levels for a given year (annual_avg_emplvl) across all ownership codes.
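The check below uses the 2016 Chatham County, GA values for Total, all industries (industry_code 10), typed in by hand from this notebook's output. The government and private ownership rows add up to the own_code 0 (total covered) row, which is why that row has to be dropped before summing across ownership codes.

```python
# annual_avg_emplvl for industry_code "10", Chatham County, GA, 2016, keyed by own_code
total_by_own_code = {
    0: 149_090,  # Total covered
    1: 2_603,    # Federal Government
    2: 4_774,    # State Government
    3: 10_917,   # Local Government
    5: 130_796,  # Private
}

# Sum every ownership code except the total-covered row
parts = sum(v for k, v in total_by_own_code.items() if k != 0)
print(parts, parts == total_by_own_code[0])  # 149090 True
```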

### BLS data as time series
Chatham County QCEW Time Series Data
https://data.bls.gov/timeseries/ENU1305110010


## Step 1: Obtain BLS QCEW Data File
The pandas read_csv function can read a .csv file directly from a URL, which is a quick way to load the BLS data files into the notebook session.
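Codes such as area_fips and industry_code are identifiers, so it can help to read them as strings. When every value in a column looks numeric, read_csv converts the column to integers and drops leading zeros (01000 for Alabama becomes 1000). A minimal sketch with an in-memory copy of two rows from area_titles.csv; in practice the first argument would be the BLS URL.

```python
import io

import pandas as pd

# Two rows copied from the area titles output, in the same CSV layout
sample_csv = (
    "area_fips,area_title\n"
    "01000,Alabama -- Statewide\n"
    '13051,"Chatham County, Georgia"\n'
)

default = pd.read_csv(io.StringIO(sample_csv))                            # numeric inference
as_text = pd.read_csv(io.StringIO(sample_csv), dtype={"area_fips": str})  # keep codes as text

print(default["area_fips"].tolist())  # [1000, 13051]
print(as_text["area_fips"].tolist())  # ['01000', '13051']
```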

In [2]:
import pandas as pd # For reading, writing and wrangling data

In [3]:
blsqcew_areatitles = pd.read_csv('https://data.bls.gov/cew/doc/titles/area/area_titles.csv')
blsqcew_areatitles.head()

|   | area_fips | area_title |
|---|---|---|
| 0 | US000 | U.S. TOTAL |
| 1 | USCMS | U.S. Combined Statistical Areas (combined) |
| 2 | USMSA | U.S. Metropolitan Statistical Areas (combined) |
| 3 | USNMS | U.S. Nonmetropolitan Area Counties (combined) |
| 4 | 01000 | Alabama -- Statewide |


In [14]:
# Find Area FIPS code based on county name
blsqcew_areatitles.loc[blsqcew_areatitles['area_title'] == 'Chatham County, Georgia']


|   | area_fips | area_title |
|---|---|---|
| 486 | 13051 | Chatham County, Georgia |


In [15]:
# Save Area Title for later use
area_title_df = blsqcew_areatitles.loc[blsqcew_areatitles['area_fips'] == '13051']
area_title = area_title_df['area_title'].values[0]
area_title

'Chatham County, Georgia'

In [5]:
blsqcew = pd.read_csv('http://data.bls.gov/cew/data/api/2016/a/area/13051.csv')
blsqcew.head()

|   | area_fips | own_code | industry_code | agglvl_code | size_code | year | qtr | disclosure_code | annual_avg_estabs | annual_avg_emplvl | total_annual_wages | taxable_annual_wages | annual_contributions | annual_avg_wkly_wage | avg_annual_pay | lq_disclosure_code | lq_annual_avg_estabs | lq_annual_avg_emplvl | lq_total_annual_wages | lq_taxable_annual_wages | lq_annual_contributions | lq_annual_avg_wkly_wage | lq_avg_annual_pay | oty_disclosure_code | oty_annual_avg_estabs_chg | oty_annual_avg_estabs_pct_chg | oty_annual_avg_emplvl_chg | oty_annual_avg_emplvl_pct_chg | oty_total_annual_wages_chg | oty_total_annual_wages_pct_chg | oty_taxable_annual_wages_chg | oty_taxable_annual_wages_pct_chg | oty_annual_contributions_chg | oty_annual_contributions_pct_chg | oty_annual_avg_wkly_wage_chg | oty_annual_avg_wkly_wage_pct_chg | oty_avg_annual_pay_chg | oty_avg_annual_pay_pct_chg |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 13051 | 0 | 10 | 70 | 0 | 2016 | A |  | 8654 | 149090 | 6613717155 | 1354478311 | 22797107 | 853 | 44361 |  | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 |  | 277 | 3.3 | 3627 | 2.5 | 155753462 | 2.4 | 40394517 | 3.1 | -1835966 | -7.5 | -1 | -0.1 | -35 | -0.1 |
| 1 | 13051 | 1 | 10 | 71 | 0 | 2016 | A |  | 64 | 2603 | 184370024 | 0 | 0 | 1362 | 70837 |  | 1.19 | 0.89 | 0.97 | 0.0 | 0.0 | 1.09 | 1.09 |  | 0 | 0.0 | 32 | 1.2 | 2698690 | 1.5 | 0 | 0.0 | 0 | 0.0 | 3 | 0.2 | 180 | 0.3 |
| 2 | 13051 | 1 | 102 | 72 | 0 | 2016 | A |  | 64 | 2603 | 184370024 | 0 | 0 | 1362 | 70837 |  | 1.19 | 0.9 | 0.99 | 0.0 | 0.0 | 1.09 | 1.09 |  | 0 | 0.0 | 32 | 1.2 | 2698690 | 1.5 | 0 | 0.0 | 0 | 0.0 | 3 | 0.2 | 180 | 0.3 |
| 3 | 13051 | 1 | 1021 | 73 | 0 | 2016 | A |  | 15 | 498 | 30001695 | 0 | 0 | 1159 | 60275 |  | 0.56 | 0.7 | 0.85 | 0.0 | 0.0 | 1.22 | 1.22 |  | 0 | 0.0 | 8 | 1.6 | -509113 | -1.7 | 0 | 0.0 | 0 | 0.0 | -38 | -3.2 | -1971 | -3.2 |
| 4 | 13051 | 1 | 1023 | 73 | 0 | 2016 | A |  | 2 | 17 | 1615844 | 0 | 0 | 1828 | 95050 |  | 2.81 | 1.23 | 1.26 | 0.0 | 0.0 | 1.03 | 1.03 |  | 0 | 0.0 | -1 | -5.6 | -58322 | -3.5 | 0 | 0.0 | 0 | 0.0 | 48 | 2.7 | 2469 | 2.7 |


## Step 2: Explore Data
Look at descriptive statistics for key variables.

In [23]:
# Explore the Location quotient of annual average employment relative to the U.S. (Rounded to the hundredths place)
blsqcew[['annual_avg_estabs','annual_avg_emplvl','lq_annual_avg_emplvl','lq_total_annual_wages',]].describe()

|   | annual_avg_estabs | annual_avg_emplvl | lq_annual_avg_emplvl | lq_total_annual_wages |
|---|---|---|---|---|
| count | 1703.0 | 1703.0 | 1703.0 | 1703.0 |
| mean | 45.806812 | 702.179683 | 0.766753 | 0.771321 |
| std | 365.10619 | 5845.989363 | 2.602384 | 2.100285 |
| min | 0.0 | 0.0 | 0.0 | 0.0 |
| 25% | 2.0 | 0.0 | 0.0 | 0.0 |
| 50% | 5.0 | 13.0 | 0.16 | 0.14 |
| 75% | 15.0 | 197.0 | 0.94 | 0.95 |
| max | 8654.0 | 149090.0 | 56.37 | 35.96 |
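At least a quarter of the rows report zero employment. One likely cause is that BLS withholds values that could identify individual employers. The sketch below assumes such rows are flagged with disclosure_code equal to "N" (an assumption to confirm against the file layout page) and drops them before summarizing. The rows and values are hypothetical.

```python
import pandas as pd

# Hypothetical rows; disclosed rows have an empty (missing) disclosure_code
df = pd.DataFrame({
    "industry_code": ["10", "1021", "1023", "21"],
    "disclosure_code": [None, None, "N", "N"],
    "annual_avg_emplvl": [1000, 250, 0, 0],
})

# Assumption: "N" marks a suppressed (not disclosed) row
disclosed = df[df["disclosure_code"].ne("N")]
print(disclosed["annual_avg_emplvl"].tolist())  # [1000, 250]
```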


# Look at top Location Quotients by industry

## Full list of industry codes
For a full list of industry codes, use the link below:

https://www.bls.gov/cew/classifications/industry/industry-titles.htm


## More details on NAICS
North American Industry Classification System (NAICS) 

https://www.naics.com/search/

## NAICS FAQ
https://www.naics.com/frequently-asked-questions/#NAICSfaq

## Add Industry Titles

In [25]:
industry_titles_df = pd.read_csv('https://www.bls.gov/cew/classifications/industry/industry-titles-csv.csv')
industry_titles_df.head()

|   | industry_code | industry_title |
|---|---|---|
| 0 | 10 | 10 Total, all industries |
| 1 | 101 | 101 Goods-producing |
| 2 | 1011 | 1011 Natural resources and mining |
| 3 | 1012 | 1012 Construction |
| 4 | 1013 | 1013 Manufacturing |


In [32]:
industry_titles_df.industry_code.describe()

count      2497
unique     2497
top       42449
freq          1
Name: industry_code, dtype: object

In [34]:
blsqcew_titles = pd.merge(left = industry_titles_df,
                          right = blsqcew,
                          left_on = 'industry_code',
                          right_on = 'industry_code',
                          how = 'right')
blsqcew_titles.head()

|   | industry_code | industry_title | area_fips | own_code | agglvl_code | size_code | year | qtr | disclosure_code | annual_avg_estabs | annual_avg_emplvl | total_annual_wages | taxable_annual_wages | annual_contributions | annual_avg_wkly_wage | avg_annual_pay | lq_disclosure_code | lq_annual_avg_estabs | lq_annual_avg_emplvl | lq_total_annual_wages | lq_taxable_annual_wages | lq_annual_contributions | lq_annual_avg_wkly_wage | lq_avg_annual_pay | oty_disclosure_code | oty_annual_avg_estabs_chg | oty_annual_avg_estabs_pct_chg | oty_annual_avg_emplvl_chg | oty_annual_avg_emplvl_pct_chg | oty_total_annual_wages_chg | oty_total_annual_wages_pct_chg | oty_taxable_annual_wages_chg | oty_taxable_annual_wages_pct_chg | oty_annual_contributions_chg | oty_annual_contributions_pct_chg | oty_annual_avg_wkly_wage_chg | oty_annual_avg_wkly_wage_pct_chg | oty_avg_annual_pay_chg | oty_avg_annual_pay_pct_chg |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 10 | 10 Total, all industries | 13051 | 0 | 70 | 0 | 2016 | A |  | 8654 | 149090 | 6613717155 | 1354478311 | 22797107 | 853 | 44361 |  | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 |  | 277 | 3.3 | 3627 | 2.5 | 155753462 | 2.4 | 40394517 | 3.1 | -1835966 | -7.5 | -1 | -0.1 | -35 | -0.1 |
| 1 | 10 | 10 Total, all industries | 13051 | 1 | 71 | 0 | 2016 | A |  | 64 | 2603 | 184370024 | 0 | 0 | 1362 | 70837 |  | 1.19 | 0.89 | 0.97 | 0.0 | 0.0 | 1.09 | 1.09 |  | 0 | 0.0 | 32 | 1.2 | 2698690 | 1.5 | 0 | 0.0 | 0 | 0.0 | 3 | 0.2 | 180 | 0.3 |
| 2 | 10 | 10 Total, all industries | 13051 | 2 | 71 | 0 | 2016 | A |  | 49 | 4774 | 256817804 | 718677 | 13493 | 1035 | 53796 |  | 0.79 | 0.99 | 1.13 | 0.09 | 0.77 | 1.14 | 1.14 |  | 0 | 0.0 | -89 | -1.8 | 9650905 | 3.9 | 320958 | 80.7 | 8430 | 166.5 | 57 | 5.8 | 2965 | 5.8 |
| 3 | 10 | 10 Total, all industries | 13051 | 3 | 71 | 0 | 2016 | A |  | 16 | 10917 | 482220785 | 2746020 | 63775 | 849 | 44173 |  | 0.11 | 0.74 | 0.82 | 0.15 | 0.49 | 1.1 | 1.1 |  | 0 | 0.0 | 797 | 7.9 | 41172025 | 9.3 | 104473 | 4.0 | -6593 | -9.4 | 11 | 1.3 | 589 | 1.4 |
| 4 | 10 | 10 Total, all industries | 13051 | 5 | 71 | 0 | 2016 | A |  | 8527 | 130796 | 5690308542 | 1351013614 | 22719839 | 837 | 43505 |  | 1.02 | 1.03 | 1.01 | 1.02 | 1.0 | 0.98 | 0.98 |  | 279 | 3.4 | 2887 | 2.3 | 102231842 | 1.8 | 39969086 | 3.0 | -1837803 | -7.5 | -3 | -0.4 | -183 | -0.4 |
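With how = 'right', every QCEW row is kept even when its industry_code has no matching title, so a silent mismatch (for example, one side read as integers) would only show up as missing titles. The validate and indicator parameters of pd.merge make such problems visible. A sketch on toy frames: the titles come from the industry titles output, while the code 9999 is invented to force a mismatch.

```python
import pandas as pd

titles = pd.DataFrame({
    "industry_code": ["10", "101", "1011"],
    "industry_title": ["10 Total, all industries", "101 Goods-producing",
                       "1011 Natural resources and mining"],
})
data = pd.DataFrame({
    "industry_code": ["10", "10", "101", "9999"],  # "9999" is a made-up unmatched code
    "own_code": [0, 1, 0, 0],
})

# validate="one_to_many" raises if a title code is duplicated;
# indicator=True adds a _merge column showing where each row came from
merged = pd.merge(titles, data, on="industry_code", how="right",
                  validate="one_to_many", indicator=True)

unmatched = merged.loc[merged["_merge"] == "right_only", "industry_code"].tolist()
print(unmatched)  # ['9999']
```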


In [35]:
blsqcew_titles.industry_code.describe()

count     1703
unique    1549
top         10
freq         5
Name: industry_code, dtype: object

In [36]:
blsqcew_titles[['industry_title','annual_avg_estabs','annual_avg_emplvl','lq_annual_avg_emplvl','lq_total_annual_wages']].sort_values(by='lq_annual_avg_emplvl', ascending=False).head(10)

|   | industry_title | annual_avg_estabs | annual_avg_emplvl | lq_annual_avg_emplvl | lq_total_annual_wages |
|---|---|---|---|---|---|
| 1090 | NAICS 488320 Marine cargo handling | 13 | 3581 | 56.37 | 35.96 |
| 1089 | NAICS 48832 Marine cargo handling | 13 | 3581 | 56.37 | 35.96 |
| 1086 | NAICS 4883 Support activities for water transp... | 28 | 3823 | 39.8 | 26.33 |
| 104 | NAICS 621493 Freestanding emergency medical ce... | 1 | 124 | 23.32 | 26.21 |
| 1071 | NAICS 4871 Scenic and sightseeing transportati... | 10 | 243 | 16.52 | 14.2 |
| 1072 | NAICS 48711 Scenic and sightseeing transportat... | 10 | 243 | 16.52 | 14.2 |
| 1073 | NAICS 487110 Scenic and sightseeing transporta... | 10 | 243 | 16.52 | 14.2 |
| 235 | NAICS 62142 Outpatient mental health centers | 13 | 196 | 13.71 | 14.1 |
| 237 | NAICS 621420 Outpatient mental health centers | 13 | 196 | 13.71 | 14.1 |
| 1091 | NAICS 48833 Navigational services to shipping | 7 | 181 | 10.13 | 10.45 |
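The same establishments appear at several NAICS levels (48832 and 488320 Marine cargo handling have identical values), because each level of the hierarchy is its own row. Ranking a single level, such as 6-digit industries, avoids the repeats. A stdlib sketch over the rows in the ranking above, with titles kept as truncated in that output; in the notebook a comparable filter would be blsqcew_titles['industry_code'].str.len() == 6.

```python
# (industry_title, lq_annual_avg_emplvl) pairs copied from the ranking above
top_lq = [
    ("NAICS 488320 Marine cargo handling", 56.37),
    ("NAICS 48832 Marine cargo handling", 56.37),
    ("NAICS 4883 Support activities for water transp...", 39.8),
    ("NAICS 621493 Freestanding emergency medical ce...", 23.32),
    ("NAICS 4871 Scenic and sightseeing transportati...", 16.52),
    ("NAICS 48711 Scenic and sightseeing transportat...", 16.52),
    ("NAICS 487110 Scenic and sightseeing transporta...", 16.52),
    ("NAICS 62142 Outpatient mental health centers", 13.71),
    ("NAICS 621420 Outpatient mental health centers", 13.71),
    ("NAICS 48833 Navigational services to shipping", 10.13),
]


def naics_code(title: str) -> str:
    """Return the code that follows the 'NAICS ' prefix, e.g. '488320'."""
    return title.split()[1]


# Keep only 6-digit industries so each establishment group is listed once
six_digit = [(t, lq) for t, lq in top_lq if len(naics_code(t)) == 6]
for title, lq in six_digit:
    print(naics_code(title), lq)
```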


Look at top industries by employment.

In [37]:
blsqcew_titles[['industry_title','annual_avg_estabs','annual_avg_emplvl','lq_annual_avg_emplvl','lq_total_annual_wages']].sort_values(by='annual_avg_emplvl', ascending=False).head(10)

|   | industry_title | annual_avg_estabs | annual_avg_emplvl | lq_annual_avg_emplvl | lq_total_annual_wages |
|---|---|---|---|---|---|
| 0 | 10 Total, all industries | 8654 | 149090 | 1.0 | 1.0 |
| 4 | 10 Total, all industries | 8527 | 130796 | 1.03 | 1.01 |
| 8 | 102 Service-providing | 7677 | 110314 | 1.05 | 0.92 |
| 11 | 1021 Trade, transportation, and utilities | 2126 | 35113 | 1.24 | 1.26 |
| 21 | 1026 Leisure and hospitality | 1103 | 23591 | 1.44 | 1.48 |
| 19 | 1025 Education and health services | 901 | 23336 | 1.03 | 1.29 |
| 1577 | NAICS 72 Accommodation and food services | 975 | 21645 | 1.55 | 1.72 |
| 303 | 101 Goods-producing | 850 | 20483 | 0.93 | 1.38 |
| 95 | NAICS 62 Health care and social assistance | 822 | 19895 | 1.0 | 1.26 |
| 26 | NAICS 44-45 Retail trade | 1295 | 18771 | 1.13 | 1.24 |


Look at summary data by 2-digit NAICS code and Ownership Code.

In [39]:
blsqcew_2digit = blsqcew_titles[['industry_code','industry_title','own_code','annual_avg_estabs','annual_avg_emplvl']].\
loc[(blsqcew_titles['industry_code'].str.len()==2) |
    (blsqcew_titles['industry_code'].str.contains("-"))].sort_values(by=['industry_code','own_code'])
blsqcew_2digit.head()

|   | industry_code | industry_title | own_code | annual_avg_estabs | annual_avg_emplvl |
|---|---|---|---|---|---|
| 0 | 10 | 10 Total, all industries | 0 | 8654 | 149090 |
| 1 | 10 | 10 Total, all industries | 1 | 64 | 2603 |
| 2 | 10 | 10 Total, all industries | 2 | 49 | 4774 |
| 3 | 10 | 10 Total, all industries | 3 | 16 | 10917 |
| 4 | 10 | 10 Total, all industries | 5 | 8527 | 130796 |
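The filter keeps codes that are two characters long or contain a hyphen, which captures NAICS sectors such as 11 and ranges such as 31-33. It also keeps 10, the code for Total, all industries, which is why that row appears in the 2-digit table. A sketch of the same rule in plain Python, using codes that appear in this notebook:

```python
def is_sector_level(code: str) -> bool:
    """Same rule as the notebook: 2 characters, or a hyphenated range like '31-33'."""
    return len(code) == 2 or "-" in code


codes = ["10", "101", "1011", "11", "31-33", "44-45", "488320", "92"]
sector_codes = [c for c in codes if is_sector_level(c)]
print(sector_codes)  # ['10', '11', '31-33', '44-45', '92']
```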


In [40]:
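# Drop total covered rows (own_code == 0); keeping them would double count.
# Build the mask from blsqcew_2digit itself so it lines up with the merged rows,
# and .copy() so the .loc assignments below modify an independent DataFrame.
blsqcew_2digit = blsqcew_2digit.loc[blsqcew_2digit['own_code'] != 0].copy()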
blsqcew_2digit.head()

|   | industry_code | industry_title | own_code | annual_avg_estabs | annual_avg_emplvl |
|---|---|---|---|---|---|
| 1 | 10 | 10 Total, all industries | 1 | 64 | 2603 |
| 2 | 10 | 10 Total, all industries | 2 | 49 | 4774 |
| 3 | 10 | 10 Total, all industries | 3 | 16 | 10917 |
| 4 | 10 | 10 Total, all industries | 5 | 8527 | 130796 |
| 308 | 11 | NAICS 11 Agriculture, forestry, fishing and hu... | 5 | 15 | 0 |
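When a boolean mask computed from one DataFrame is used to filter another, pandas aligns the two by index label, not by row content. A merge returns a fresh 0-based index whose rows can be ordered differently from the source file, so a mask built from the original frame can select the wrong rows without raising an error. A minimal demonstration on two toy frames with the same labels but a different row order:

```python
import pandas as pd

# Same index labels (0..3), rows in a different order, as can happen after a merge
raw = pd.DataFrame({"own_code": [0, 1, 1, 2]})
merged = pd.DataFrame({"own_code": [1, 0, 2, 1]})

# Mask from `raw` is matched to `merged` by label, so it keeps the own_code 0 row
wrong = merged.loc[raw["own_code"] != 0]
# Mask from the frame being filtered selects what was intended
right = merged.loc[merged["own_code"] != 0]

print(wrong["own_code"].tolist())  # [0, 2, 1]
print(right["own_code"].tolist())  # [1, 2, 1]
```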


In [42]:
# Show the NAICS 92 (Public administration) rows; own_code separates the levels of government
blsqcew_2digit.loc[(blsqcew_2digit['industry_code'] =='92')]

|   | industry_code | industry_title | own_code | annual_avg_estabs | annual_avg_emplvl |
|---|---|---|---|---|---|
| 125 | 92 | NAICS 92 Public administration | 1 | 39 | 1403 |
| 126 | 92 | NAICS 92 Public administration | 2 | 27 | 1912 |
| 127 | 92 | NAICS 92 Public administration | 3 | 13 | 5346 |


In [43]:
blsqcew_2digit.loc[(blsqcew_2digit['industry_code'] =='92') &
                   (blsqcew_2digit['own_code'] ==1),'industry_code'] = '92 1 Federal Government'
blsqcew_2digit.loc[(blsqcew_2digit['industry_code'] =='92') &
                   (blsqcew_2digit['own_code'] ==2),'industry_code'] = '92 2 State Government'
blsqcew_2digit.loc[(blsqcew_2digit['industry_code'] =='92') &
                   (blsqcew_2digit['own_code'] ==3),'industry_code'] = '92 3 Local Government'
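The three assignments above follow one pattern, so they can also be written as a single lookup from own_code to a label. A sketch on a toy frame; gov_names is a hypothetical helper dictionary whose labels mirror the ones assigned in that cell.

```python
import pandas as pd

# Hypothetical lookup: own_code -> level of government
gov_names = {1: "Federal Government", 2: "State Government", 3: "Local Government"}

df = pd.DataFrame({
    "industry_code": ["92", "92", "92", "11"],
    "own_code": [1, 2, 3, 5],
})

# Rows to relabel: NAICS 92 with a government ownership code
is_gov = (df["industry_code"] == "92") & df["own_code"].isin(list(gov_names))
df.loc[is_gov, "industry_code"] = (
    "92 " + df.loc[is_gov, "own_code"].astype(str) + " "
    + df.loc[is_gov, "own_code"].map(gov_names)
)
print(df["industry_code"].tolist())
# ['92 1 Federal Government', '92 2 State Government', '92 3 Local Government', '11']
```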

In [50]:
table1 = blsqcew_2digit.groupby(by=["industry_code","industry_title"]).sum()
table1.reset_index(inplace = True)
table_title = "Two-digit NAICS, "+area_title
varformat = {'annual_avg_emplvl': "{:,.0f}", 'annual_avg_estabs': "{:,.0f}"}
table1.style\
     .set_caption(table_title)\
     .format(varformat)

|   | industry_code | industry_title | own_code | annual_avg_estabs | annual_avg_emplvl |
|---|---|---|---|---|---|
| 0 | 10 | 10 Total, all industries | 11 | 8656 | 149090 |
| 1 | 11 | NAICS 11 Agriculture, forestry, fishing and hunting | 5 | 15 | 0 |
| 2 | 21 | NAICS 21 Mining, quarrying, and oil and gas extraction | 5 | 2 | 0 |
| 3 | 22 | NAICS 22 Utilities | 5 | 14 | 336 |
| 4 | 23 | NAICS 23 Construction | 5 | 603 | 5466 |
| 5 | 31-33 | NAICS 31-33 Manufacturing | 5 | 231 | 14913 |
| 6 | 42 | NAICS 42 Wholesale trade | 5 | 391 | 5442 |
| 7 | 44-45 | NAICS 44-45 Retail trade | 6 | 1296 | 18816 |
| 8 | 48-49 | NAICS 48-49 Transportation and warehousing | 8 | 442 | 11019 |
| 9 | 51 | NAICS 51 Information | 8 | 94 | 1557 |
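Because .sum() adds up every numeric column in each group, the own_code column in this table is a sum of code numbers (11 = 1 + 2 + 3 + 5 for the total row) rather than a meaningful value. A sketch that sums only the count columns, using the Chatham County Total, all industries rows by ownership code from the earlier output:

```python
import pandas as pd

# industry_code "10" rows by ownership code, Chatham County, GA, 2016
rows = pd.DataFrame({
    "industry_code": ["10"] * 4,
    "industry_title": ["10 Total, all industries"] * 4,
    "own_code": [1, 2, 3, 5],
    "annual_avg_estabs": [64, 49, 16, 8527],
    "annual_avg_emplvl": [2603, 4774, 10917, 130796],
})

# Sum only the count columns; the group keys stay as regular columns
cols = ["annual_avg_estabs", "annual_avg_emplvl"]
summary = rows.groupby(["industry_code", "industry_title"], as_index=False)[cols].sum()
print(int(summary.loc[0, "annual_avg_estabs"]), int(summary.loc[0, "annual_avg_emplvl"]))  # 8656 149090
```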


# Create a function that obtains and cleans BLS data

In [59]:
import pandas as pd # For reading, writing and wrangling data

def obtain_clean_bls(area_fips: str = "US000", year: str = "2016"):
  """
  Summarize one QCEW annual-averages area file by 2-digit NAICS code.

  area_fips: 5-character BLS area code, e.g. "13051" or "US000"
  year: 4-digit year as a string, e.g. "2016"

  Saves the summary as BLSQCEW_<year>_<area_fips>.csv in the working
  directory and returns it as a styled table.
  """
  blsqcew_areatitles = pd.read_csv('https://data.bls.gov/cew/doc/titles/area/area_titles.csv')
  blsqcew = pd.read_csv('http://data.bls.gov/cew/data/api/'+year+'/a/area/'+area_fips+'.csv')

  # Save Area Title for later use
  area_title_df = blsqcew_areatitles.loc[blsqcew_areatitles['area_fips'] == area_fips]
  area_title = area_title_df['area_title'].values[0]

  # Add industry titles
  industry_titles_df = pd.read_csv('https://www.bls.gov/cew/classifications/industry/industry-titles-csv.csv')
  blsqcew_titles = pd.merge(left = industry_titles_df,
                            right = blsqcew,
                            left_on = 'industry_code',
                            right_on = 'industry_code',
                            how = 'right')

  # Look at summary data by 2-digit NAICS code and Ownership Code
  blsqcew_2digit = blsqcew_titles[['industry_code','industry_title','own_code','annual_avg_estabs','annual_avg_emplvl']].\
  loc[(blsqcew_titles['industry_code'].str.len()==2) |
      (blsqcew_titles['industry_code'].str.contains("-"))].sort_values(by=['industry_code','own_code'])

  # Drop total covered rows (own_code == 0) to avoid double counting.
  # The mask comes from blsqcew_2digit itself so it lines up with the merged rows;
  # .copy() lets the assignments below modify an independent DataFrame.
  blsqcew_2digit = blsqcew_2digit.loc[blsqcew_2digit['own_code'] != 0].copy()

  # Replace Industry code for NAICS 92 to get levels of government
  blsqcew_2digit.loc[(blsqcew_2digit['industry_code'] =='92') &
                     (blsqcew_2digit['own_code'] ==1),'industry_code'] = '92 1 Federal Government'
  blsqcew_2digit.loc[(blsqcew_2digit['industry_code'] =='92') &
                     (blsqcew_2digit['own_code'] ==2),'industry_code'] = '92 2 State Government'
  blsqcew_2digit.loc[(blsqcew_2digit['industry_code'] =='92') &
                     (blsqcew_2digit['own_code'] ==3),'industry_code'] = '92 3 Local Government'

  table1 = blsqcew_2digit.groupby(by=["industry_code","industry_title"]).sum()
  table1.reset_index(inplace = True)
  table_title = "Two-digit NAICS, "+area_title
  varformat = {'annual_avg_emplvl': "{:,.0f}", 'annual_avg_estabs': "{:,.0f}"}
  table1_fmt = table1.style\
      .set_caption(table_title)\
      .format(varformat)

  # TODO: rename columns

  # Save results as csv
  savefile = 'BLSQCEW_'+year+"_"+area_fips+'.csv'
  print(savefile)
  table1.to_csv(savefile, index=False)

  return table1_fmt

obtain_clean_bls()

BLSQCEW_2016_US000.csv


|   | industry_code | industry_title | own_code | annual_avg_estabs | annual_avg_emplvl |
|---|---|---|---|---|---|
| 0 | 10 | 10 Total, all industries | 28 | 19671802 | 302312591 |
| 1 | 11 | NAICS 11 Agriculture, forestry, fishing and hunting | 11 | 103712 | 1265973 |
| 2 | 21 | NAICS 21 Mining, quarrying, and oil and gas extraction | 8 | 34180 | 613632 |
| 3 | 22 | NAICS 22 Utilities | 11 | 26410 | 809818 |
| 4 | 23 | NAICS 23 Construction | 11 | 780970 | 6865799 |
| 5 | 31-33 | NAICS 31-33 Manufacturing | 11 | 343795 | 12345962 |
| 6 | 42 | NAICS 42 Wholesale trade | 11 | 615605 | 5859976 |
| 7 | 44-45 | NAICS 44-45 Retail trade | 11 | 1044935 | 15886000 |
| 8 | 48-49 | NAICS 48-49 Transportation and warehousing | 11 | 272822 | 5723644 |
| 9 | 51 | NAICS 51 Information | 11 | 162474 | 2940560 |
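In the national table the Total, all industries row has an own_code sum of 28, while most sectors show 11 (1 + 2 + 3 + 5). That suggests the U.S. file also contains aggregate ownership rows for the total, which would be counted on top of their components. One way to guard against this is to sum an explicit allow-list of non-overlapping ownership codes. A sketch with hypothetical numbers; the allow-list {1, 2, 3, 5} is an assumption to check against the ownership titles page, and in the notebook the equivalent filter would be blsqcew_2digit['own_code'].isin([1, 2, 3, 5]).

```python
# Hypothetical rows for one industry: own_code -> annual_avg_emplvl.
# own_code 8 stands in for an aggregate row that already contains codes 1 + 2 + 3.
rows = {1: 100, 2: 200, 3: 300, 5: 1_000, 8: 600}

INDIVIDUAL_OWN_CODES = {1, 2, 3, 5}  # assumption: non-overlapping ownership codes

naive_total = sum(rows.values())  # counts the aggregate row on top of its parts
safe_total = sum(v for k, v in rows.items() if k in INDIVIDUAL_OWN_CODES)
print(naive_total, safe_total)  # 2200 1600
```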
