# Master's Thesis - James Quacinella


# Abstract

**Objectives:** This study will extend an established model for estimating the current living wage in 2015 to the past decade for the purpose of:

* an exploratory analysis of trends in the gap between the estimated living wage and the minimum wage
* evaluating any correlation between the living wage gap and other economic metrics, including public funds spent on social services

**Methods:** The original data set for this model is for 2015. This study will extend the data sources of this model into the past to enable trend analysis. Data for economic metrics from public data sources will supplement this data for correlation analysis.


# Methods

## Model

The original model estimates the living wage in terms of 9 variables:

** *basic_needs_budget* ** = *food_cost* + *child_care_cost* + ( *insurance_premiums* + *health_care_costs* ) + *housing_cost* + *transportation_cost* + *other_necessities_cost*

** *living_wage* ** = *basic_needs_budget* + ( *basic_needs_budget* \* *tax_rate* )
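
The two formulas above translate directly into code. The following is a minimal sketch, not the original model's implementation: the function names and the dollar amounts in the example are hypothetical and are used only to show how the terms combine.

```python
def basic_needs_budget(food_cost, child_care_cost, insurance_premiums,
                       health_care_costs, housing_cost, transportation_cost,
                       other_necessities_cost):
    """Sum the cost components exactly as in the basic_needs_budget formula."""
    return (food_cost + child_care_cost
            + (insurance_premiums + health_care_costs)
            + housing_cost + transportation_cost + other_necessities_cost)


def living_wage(budget, tax_rate):
    """Gross up the basic needs budget by the tax rate."""
    return budget + (budget * tax_rate)


# Hypothetical annual costs in dollars, for illustration only
example_budget = basic_needs_budget(
    food_cost=3000.0, child_care_cost=0.0, insurance_premiums=1500.0,
    health_care_costs=500.0, housing_cost=9000.0,
    transportation_cost=4000.0, other_necessities_cost=2000.0)
example_wage = living_wage(example_budget, tax_rate=0.15)
```

Keeping the tax step separate from the budget step mirrors the two-line structure of the model and makes it easy to inspect the pre-tax budget on its own.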

## Data Sources

The following data sources are used to find estimates of the model variables:

* The food cost is estimated from data from the USDA’s low-cost food plan national average in June 2014. 2
* Child care is based on state-level estimates published by the National Association of Child Care Resource and Referral Agencies. 3
* Insurance costs are based on the insurance component of the 2013 Medical Expenditure Panel Survey. 4
* Housing costs are estimated from the HUD Fair Market Rents (FMR) estimates.
* Other variables are pulled from the 2014 Bureau of Labor Statistics Consumer Expenditure Survey. 5

These data sets extend into the past, allowing the model to be calculated for earlier years. The data will also have to be adjusted for inflation 6.

## Analytic Approach

First, data will be gathered from the data sources of the original model but will be extended into the past. The methodology followed by the model will be replicated to come up with a data set representing estimates of the living wage across time. After the data set is prepared, the trend of the living wage as compared to minimum wage can be examined. Has the gap increased or decreased over time, and at what rate? Have certain areas seen larger than average increases or decreases in this gap? 
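
One simple way to answer "at what rate?" is to fit a straight line to the gap over time. The sketch below assumes the gap is defined as living wage minus minimum wage and uses an ordinary least-squares slope from `np.polyfit`. The function name and all wage figures are hypothetical placeholders, not results of this study.

```python
import numpy as np


def wage_gap_trend(years, living_wage, minimum_wage):
    """Return the yearly gap and its least-squares slope (dollars per year)."""
    years = np.asarray(years, dtype=float)
    gap = np.asarray(living_wage, dtype=float) - np.asarray(minimum_wage, dtype=float)
    # Center the years so the linear fit is numerically well conditioned
    slope, _intercept = np.polyfit(years - years.mean(), gap, 1)
    return gap, slope


# Made-up hourly wages, for illustration only
example_years = [2010, 2011, 2012, 2013]
example_living = [10.0, 10.5, 11.0, 11.5]
example_minimum = [7.25, 7.25, 7.25, 7.25]
example_gap, example_slope = wage_gap_trend(example_years, example_living, example_minimum)
```

A positive slope means the gap is widening. Running the same function per region or county would show which areas diverge from the average.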

Once preliminary trend analysis is done, this data set will be analyzed in comparison to other economic trends to see if any interesting correlations can be found. Correlations to GDP growth rate and the national rate of unemployment can be made, but the primary investigation will be to see if the living wage gap correlates to national spending on SNAP (Food stamps). In other words, we will see if there is any (potentially time lagged) relationship between the living wage gap and how much the United States needs to spend to support those who cannot make ends meet. A relationship here can potentially indicate that shrinking this gap could lower public expenditures.
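
A time-lagged relationship can be checked by shifting one series against the other before computing a correlation. The sketch below assumes a plain Pearson correlation and a non-negative lag measured in years. The helper `lagged_correlation` and both example series are hypothetical and only illustrate the mechanics.

```python
import numpy as np


def lagged_correlation(x, y, lag):
    """Pearson correlation between x[t] and y[t + lag], for lag >= 0."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    if len(x) != len(y):
        raise ValueError("series must have the same length")
    if lag < 0 or len(x) - lag < 2:
        raise ValueError("lag must leave at least two overlapping points")
    if lag > 0:
        # Drop the last `lag` values of x and the first `lag` values of y
        x, y = x[:-lag], y[lag:]
    return np.corrcoef(x, y)[0, 1]


# Made-up series: spending follows the gap one year later
example_gap = [1.0, 3.0, 2.0, 5.0, 4.0, 6.0]
example_spending = [0.0, 10.0, 30.0, 20.0, 50.0, 40.0]
example_corr = lagged_correlation(example_gap, example_spending, lag=1)
```

Scanning a small range of lags and comparing the coefficients gives a first indication of whether spending responds to the gap with a delay. With only about a decade of yearly data, such coefficients should be read cautiously.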


## Presentation Of Results

Results will be presented for both parts of the data analysis. For studying the living wage gap trends, this report will present graphs of time series, aggregated in different ways, of the living wage as well as the living wage gap. Some of these time series will be presented alongside data on public expenditures on SNAP to visually inspect for correlations.

## Background / Sources

- Glasmeier AK, Nadeau CA, Schultheis E: LIVING WAGE CALCULATOR User’s Guide / Technical Notes 2014 Update
- USDA low-cost food plan, June, 2014
- Child Care in America 2014 State fact sheets
- 2013 Medical Expenditure Panel Survey
- Consumer Expenditure Survey
- Inflation Calculator

---------------------------

# Pre-Data Collection

Let's do all of our imports now:

In [118]:
import numpy as np
from prettytable import PrettyTable
from IPython.core.display import HTML
from collections import OrderedDict, defaultdict

Let's set up some inflation multipliers:

In [71]:
# Multiply a dollar value to get the equivalent 2014 dollars
inflation_multipliers = {
    2010: 1.092609, 
    2011: 1.059176,
    2012: 1.037701,
    2013: 1.022721
}
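
As a sketch of how these multipliers might be applied, the helper below converts a dollar amount from a given year into 2014 dollars. The function name `to_2014_dollars` is hypothetical, and treating 2014 itself as a multiplier of 1.0 is an assumption, since the table above lists only 2010 through 2013. The multiplier values are copied from the cell above.

```python
# Values copied from the notebook's inflation_multipliers table
example_multipliers = {
    2010: 1.092609,
    2011: 1.059176,
    2012: 1.037701,
    2013: 1.022721,
}


def to_2014_dollars(amount, year, multipliers=example_multipliers):
    """Convert `amount` in `year` dollars to 2014 dollars."""
    if year == 2014:
        # Assumption: 2014 dollars need no adjustment
        return amount
    # Raises KeyError for years without a multiplier rather than guessing one
    return amount * multipliers[year]


example_adjusted = to_2014_dollars(100.0, 2010)
```

Failing loudly on a missing year is deliberate: extending the model back to 2001 requires adding multipliers for those years first.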

Let's set up regional differences for the food data:

In [70]:
# Regional adjustment as a fraction of the national food cost (e.g. 0.08 means 8% above the national average)
food_regional_multipliers = {
    'East': 0.08,
    'West': 0.11,
    'South': -0.07,
    'Midwest': -0.05,
}

------

#  Data Collection

The following sections outline how I gathered the data for the various model parameters, as well as other data needed to calculate their values. The original model was made for 2014 data, so when extending it into the past we need to note any changes in the underlying methodology of these data sources.

## Data Sources

### Consumer Expenditure Report

In [None]:
# Get CEX for 2013 and 2014 (XLSX format)
for i in `seq 2013 2014`; do wget http://www.bls.gov/cex/$i/aggregate/cusize.xlsx -O ${i}_cex.xlsx; done

# Get CEX for 2004 - 2012 (XLS format)
for i in `seq 2004 2012`; do wget http://www.bls.gov/cex/$i/aggregate/cusize.xls -O ${i}_cex.xls; done

# Get CEX for 2001 to 2003 (TXT format)
for i in `seq 2001 2003`; do wget http://www.bls.gov/cex/aggregate/$i/cusize.xls -O ${i}_cex.txt; done


# Get regional CEX tables for 2013 and 2014 (XLSX format)
for i in `seq 2013 2014`; do wget http://www.bls.gov/cex/$i/aggregate/region.xlsx -O ${i}_region_cex.xlsx; done

# Get regional CEX tables for 2004 - 2012 (XLS format)
for i in `seq 2004 2012`; do wget http://www.bls.gov/cex/$i/aggregate/region.xls -O ${i}_region_cex.xls; done

# Get regional CEX tables for 2001 to 2003 (TXT format)
for i in `seq 2001 2003`; do wget http://www.bls.gov/cex/aggregate/$i/region.xls -O ${i}_region_cex.txt; done
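
The same download plan can be expressed in Python, which makes the year-to-format mapping explicit before anything is fetched. This is a sketch only: the helper `cex_download_plan` is hypothetical, the URL patterns and output file names are copied from the shell loops above, and whether those URLs still resolve has not been verified. The function only builds the list and performs no network access.

```python
def cex_download_plan(table):
    """Return (url, filename) pairs for a CEX table name such as 'cusize' or 'region'."""
    plan = []
    for year in range(2001, 2015):
        if year >= 2013:
            url = "http://www.bls.gov/cex/%d/aggregate/%s.xlsx" % (year, table)
            ext = "xlsx"
        elif year >= 2004:
            url = "http://www.bls.gov/cex/%d/aggregate/%s.xls" % (year, table)
            ext = "xls"
        else:
            # 2001 to 2003 use a different path layout and are saved as .txt
            url = "http://www.bls.gov/cex/aggregate/%d/%s.xls" % (year, table)
            ext = "txt"
        # The shell loops save 'cusize' as <year>_cex.* and 'region' as <year>_region_cex.*
        stem = "%d_cex" % year if table == "cusize" else "%d_%s_cex" % (year, table)
        plan.append((url, "%s.%s" % (stem, ext)))
    return plan


example_plan = cex_download_plan("cusize")
```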

### USDA Food Plans

**TODO** Fill in with wget command used to download

## County Data

*TODO*

In [23]:
# Counties dict will map county ID to useful information, mostly region
counties = { }

## Model Variable: Food

### Change of Methodology?

In 2006, the USDA changed the age ranges used in its healthy meal cost calculations. The differences in the ranges are minimal and should not affect the overall estimates.

### Load Data

Data for the food calculations were downloaded in PDF form. The calculation, as described in the PDF, is:

>Adult food consumption costs are estimated by averaging the low-cost plan food costs for males and females between 19 and 50

In [124]:
# The base food cost (not regionally weighted) for the nation (data pulled manually from PDFs)
national_monthly_food_cost_per_year = {
    2015: {"base": np.average([240.90, 208.80])},
    2014: {"base": np.average([241.50, 209.80])},
    2013: {"base": np.average([234.60, 203.70])},
    2012: {"base": np.average([234.00, 203.00])},
    2011: {"base": np.average([226.80, 196.90])},
    2010: {"base": np.average([216.30, 187.70])},
    2009: {"base": np.average([216.50, 187.90])},
    2008: {"base": np.average([216.90, 189.60])},
    2007: {"base": np.average([200.20, 174.10])},
    2006: {"base": np.average([189.70, 164.80])},
    2005: {"base": np.average([186.20, 162.10])},
    2004: {"base": np.average([183.10, 159.50])},
    2003: {"base": np.average([174.20, 151.70])},
    2002: {"base": np.average([170.30, 148.60])},
    2001: {"base": np.average([166.80, 145.60])},
}

# Create ordered dict to make sure we process things in order
national_monthly_food_cost_per_year = OrderedDict(sorted(national_monthly_food_cost_per_year.items(), 
                                                        key=lambda t: t[0]))

# Regionally adjusted
for year in national_monthly_food_cost_per_year:
    national_monthly_food_cost_per_year[year]["regional"] = { }
    for region in food_regional_multipliers:
        national_monthly_food_cost_per_year[year]["regional"][region] = \
            national_monthly_food_cost_per_year[year]["base"] + (food_regional_multipliers[region] * national_monthly_food_cost_per_year[year]["base"])

# national_monthly_food_cost_per_year

# TODO: inflation adjusted

# # Print it nicely
# pt = PrettyTable()
# pt.add_column("Year", national_monthly_food_cost_per_year.keys())
# pt.add_column("Food Cost (per month)", [x["base"] for x in national_monthly_food_cost_per_year.values()])
# for region in food_regional_multipliers:
#     pt.add_column("Food Cost (%s)" % region, [x["regional"][region] for x in national_monthly_food_cost_per_year.values()])

# # Print as HTML
# HTML(pt.get_html_string())

In yearly form:

In [75]:
# Print it nicely in yearly costs
pt = PrettyTable()
pt.add_column("Year", national_monthly_food_cost_per_year.keys())
pt.add_column("Food Cost (per year)", [x["base"] * 12 for x in national_monthly_food_cost_per_year.values()])
for region in food_regional_multipliers:
    pt.add_column("Food Cost (%s)" % region, [x["regional"][region] * 12 for x in national_monthly_food_cost_per_year.values()])

# Print as HTML
HTML(pt.get_html_string())

Year,Food Cost (per year),Food Cost (West),Food Cost (East),Food Cost (Midwest),Food Cost (South)
2001,1874.4,2080.584,2024.352,1780.68,1743.192
2002,1913.4,2123.874,2066.472,1817.73,1779.462
2003,1955.4,2170.494,2111.832,1857.63,1818.522
2004,2055.6,2281.716,2220.048,1952.82,1911.708
2005,2089.8,2319.678,2256.984,1985.31,1943.514
2006,2127.0,2360.97,2297.16,2020.65,1978.11
2007,2245.8,2492.838,2425.464,2133.51,2088.594
2008,2439.0,2707.29,2634.12,2317.05,2268.27
2009,2426.4,2693.304,2620.512,2305.08,2256.552
2010,2424.0,2690.64,2617.92,2302.8,2254.32


## Model Variable: Transportation Cost

Looking at the (1) cars and trucks (used), (2) gasoline and motor oil, (3) other vehicle expenses, and (4) public transportation fields under "Transportation" in the 2014 Consumer Expenditure Report, we can pull out information from each to replicate the calculation done in the original model. For each sub-variable, we get the aggregate amount of money spent (in millions of dollars) and the percentage of that amount spent by single adults. Multiplying those numbers (accounting for units) and dividing by the total number of single adults in the survey gives us a mean total cost per adult.

The original model takes into account regional drift by scaling based on each regions

Since this data reflects conditions in 2013, we account for inflation to get the 2014 estimate that is produced in the original model.
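
The base calculation described above can be written as a small function, which makes the unit handling explicit. This is a sketch under stated assumptions: the function name and argument names are hypothetical, aggregates are taken to be in millions of dollars, and the single-adult count is taken to be in thousands, as implied by the unit conversions in the notebook cell that follows. The input figures are the 2013 values used in that cell.

```python
def mean_transport_cost(aggregates_millions, single_adult_shares,
                        single_adults_thousands, inflation_multiplier=1.0):
    """Mean transportation cost per single adult, in inflation-adjusted dollars."""
    # Dollars (in millions) spent by single adults across all categories
    total_millions = sum(aggregates_millions[key] * single_adult_shares[key]
                         for key in aggregates_millions)
    total_dollars = total_millions * 1000000.0
    single_adults = single_adults_thousands * 1000.0
    return total_dollars / single_adults * inflation_multiplier


# 2013 figures as used in the notebook
example_aggregates = {"used_car": 214524.0, "gasoline": 313481.0,
                      "other_vehicle": 345454.0, "public": 73842.0}
example_shares = {"used_car": 0.146, "gasoline": 0.157,
                  "other_vehicle": 0.163, "public": 0.172}
example_cost = mean_transport_cost(example_aggregates, example_shares,
                                   single_adults_thousands=37884.0,
                                   inflation_multiplier=1.022721)
```

With these inputs the result is roughly 4037 dollars per year, in line with the base figure printed by the notebook. Passing a different year's aggregates and multiplier would extend the same calculation to earlier surveys.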

#TODO:

* Download regional difference data from 2014 Consumer Expenditure Survey, Table 1800
* Load in both into pandas data frames
* Download data from 2001 to 2013
* Load in all data into previous pandas data frames

In [122]:
# Transportation data from 2014 survey is for year 2013
cex = {
    2013: {
        "single_adults": 37884.0,
        "transport": {
            "used_car": 214524.0,
            "gasoline": 313481.0,
            "other_vehicle": 345454.0,
            "public": 73842.0,
            "used_car_percent": 0.146,
            "gasoline_percent": 0.157,
            "other_vehicle_percent": 0.163,
            "public_percent": 0.172,
            "regional": {
                "east": 15.7 / 17.0,     # 0.923
                "midwest": 16.9 / 17.0,  # 0.994
                "south": 18.3 / 17.0,    # 1.076
                "west": 16.1 / 17.0,     # 0.947
            }
        }
    },
}

# Base price for transport
transportation_costs = defaultdict(dict)
transportation_costs[2013]["base"] = \
    (1000000 * ((cex[2013]["transport"]["used_car"] * cex[2013]["transport"]["used_car_percent"]) + \
                (cex[2013]["transport"]["gasoline"] * cex[2013]["transport"]["gasoline_percent"]) + \
                (cex[2013]["transport"]["other_vehicle"] * cex[2013]["transport"]["other_vehicle_percent"] ) + \
                (cex[2013]["transport"]["public"] * cex[2013]["transport"]["public_percent"] )) /  float(cex[2013]["single_adults"] * 1000) ) * inflation_multipliers[2013]

# Account for regional drift
for region in cex[2013]["transport"]["regional"]:
    transportation_costs[2013][region] = transportation_costs[2013]["base"] * cex[2013]["transport"]["regional"][region]

# Print it nicely
pt = PrettyTable()
pt.add_column("Year", transportation_costs.keys())
for region in sorted(transportation_costs[2013].keys()):
    pt.add_column("Trans Cost (%s)" % region, [ transportation_costs[year][region] for year in transportation_costs  ])

# Print as HTML
HTML(pt.get_html_string())

Year,Trans Cost (base),Trans Cost (east),Trans Cost (midwest),Trans Cost (south),Trans Cost (west)
2013,4037.18458744,3728.45870723,4013.43644281,4345.91046766,3823.45128575


In [111]:
used_car_rations = (2.5 / 3.2, 3.5 / 3.2, 3.5 / 3.2, 2.9 / 3.2)
gas_rations = (3.8 / 4.6, 4.7 / 4.6, 5.2 / 4.6, 4.5 / 4.6)
other_rations = (5.2 / 5.1, 5.0  / 5.1, 5.1 / 5.1,  5.1 / 5.1)
public_rations = (1.6/1.1,  0.9/1.1,  0.8/1.1, 1.2/1.1)

In [123]:
for region in range(4):
    # The inflation multiplier is applied inside print() so this also runs under Python 3
    print(1000000 * ((cex[2013]["transport"]["used_car"] * cex[2013]["transport"]["used_car_percent"] * used_car_rations[region]) + \
                (cex[2013]["transport"]["gasoline"] * cex[2013]["transport"]["gasoline_percent"] * gas_rations[region]) + \
                (cex[2013]["transport"]["other_vehicle"] * cex[2013]["transport"]["other_vehicle_percent"] * other_rations[region]) + \
                (cex[2013]["transport"]["public"] * cex[2013]["transport"]["public_percent"] * public_rations[region])) /  float(cex[2013]["single_adults"] * 1000) * inflation_multipliers[2013])


3806.81172532
4053.19012053
4196.24523506
3960.20242085


0.008732546062667405

## Model Variable: Child Care Cost

## Insurance Premiums & Health Care Costs

## Model Variable: Housing Cost

## Other Necessities Cost