## IS602 | Adv. Programming Techniques | Fall 2015
### Final Project
#### James Hamski | james.hamski@spsmail.cuny.edu

If you are under 40 years old, you expect your income to grow more in the coming year than people in older age groups do. The median point prediction has ranged from a low of 2.78% in June 2013 to a high of 4.64% in June 2015. I find this survey fascinating. What dollar amount does 4.64% equate to? Who doesn’t think they’re going to get a raise in the next year? What economic and demographic factors are important to this survey?

Using monthly data from the Federal Reserve Bank of New York’s Survey of Consumer Expectations – November 2015, I will investigate the above questions and whether expectations of income growth have a statistically significant relationship with other economic indicators such as unemployment, job openings, and inflation expectations.

### 1. Configuring Analysis Environment

*Module Imports*

In [None]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
%matplotlib inline

### 2. Data Import


Data for this project comes from the Federal Reserve Bank of New York’s (FRBNY) Survey of Consumer Expectations. 

These data are available in Excel format from the FRBNY's website. Pandas can read Excel files directly from a URL using read_excel(). However, since these files are fairly large, I downloaded them to the AWS instance to speed up future imports. This means running a Linux shell command from the Notebook by preceding it with !.
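
The download-once idea described above can also be sketched in plain Python. This is a minimal sketch, not the approach used in this notebook: `fetch_once` and its `downloader` parameter are hypothetical names introduced for illustration, and the function simply skips the download when the local file already exists.

```python
import os
from urllib.request import urlretrieve


def fetch_once(url, local_path, downloader=urlretrieve):
    """Download `url` to `local_path` only if the file is not already there.

    `fetch_once` and `downloader` are hypothetical names for this sketch;
    they are not part of pandas or of the original notebook.
    """
    if not os.path.exists(local_path):
        downloader(url, local_path)
    return local_path


# Usage (not run here, since it needs network access):
# path = fetch_once(
#     "https://www.newyorkfed.org/medialibrary/Interactives/sce/sce/downloads/data/FRBNY-SCE-Public-Microdata-Complete.xlsx",
#     "FRBNY-SCE-Public-Microdata-Complete.xlsx",
# )
```

Because the function returns the local path, the result can be passed straight to read_excel() on later runs without hitting the network again.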

While I didn't precisely time it, the 60MB 'microdata' file download seems to be significantly faster on the AWS instance compared to downloading via 

In [None]:
!rm 'FRBNY-SCE-Data.xls?version=2.1.3.9'
!rm FRBNY-SCE-Public-Microdata-Complete.xlsx

In [None]:
!ls -lh

In [None]:
!wget 'https://www.newyorkfed.org/medialibrary/Interactives/sce/sce/downloads/data/FRBNY-SCE-Public-Microdata-Complete.xlsx'
!wget 'https://www.newyorkfed.org/medialibrary/interactives/sce/sce/downloads/data/FRBNY-SCE-Data.xls?version=2.1.3.9'

In [2]:
#confirm the files appear in the active directory
!ls -lh

total 26M
-rw-rw-r-- 1 ubuntu ubuntu 263K Dec  2 17:38 FRBNY-SCE-Data.xls?version=2.1.3.9
-rw-rw-r-- 1 ubuntu ubuntu  25M Dec 11 21:27 FRBNY-SCE-Public-Microdata-Complete.xlsx
-rw-rw-r-- 1 ubuntu ubuntu  13K Dec 18 21:21 IS602_FinalProject_JHamski.ipynb
-rw-rw-r-- 1 ubuntu ubuntu  118 Nov 28 20:11 README.md
-rw-rw-r-- 1 ubuntu ubuntu  18K Dec  6 16:23 test.png


In [None]:
data_excel_1 = 'FRBNY-SCE-Data.xls?version=2.1.3.9'
#headers1 = ['month', 'median_exp_growth', '25th_exp_growth', '75th_exp_growth', 'median_point_prediction']
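# header=3 takes the fourth spreadsheet row as column names and skips the rows above it.
# The skip_rows argument was dropped: it is not a read_excel parameter (the keyword is skiprows).
data_results = pd.read_excel(data_excel_1, 'Earnings growth', header=3, parse_dates=True, index_col=None)
data_results_demo = pd.read_excel(data_excel_1, 'Earnings growth Demo', header=3, index_col=None)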

In [None]:
data_excel_2 = 'FRBNY-SCE-Public-Microdata-Complete.xlsx'
# header=1 takes the second spreadsheet row as column names; the invalid skip_rows argument was dropped
microdata = pd.read_excel(data_excel_2, 'Data', header=1)

### 3. Data Cleaning and Formatting

First, I reconfigure the data_results dataframe object, which holds summary statistics for the expected income growth survey results (median, 25th and 75th percentiles, median point prediction).

In [None]:
data_results.reset_index(level=0, inplace=True)
data_results.rename(columns = {'index':'Month'}, inplace = True)
data_results['Month'] = pd.to_datetime(data_results['Month'], errors='coerce', format='%Y%m')
data_results.head()

In [None]:
data_results.dtypes

In [None]:
data_results_demo.reset_index(level=0, inplace=True)
data_results_demo.rename(columns = {'index':'Month'}, inplace = True)
data_results_demo['Month'] = pd.to_datetime(data_results_demo['Month'], errors='coerce', format='%Y%m')
data_results_demo.head()

In [None]:
microdata['date'] = pd.to_datetime(microdata['date'], errors='coerce', format='%Y%m')
microdata.rename(columns = {'date':'Month'}, inplace = True)
microdata.head()
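
To illustrate what the `format='%Y%m'` and `errors='coerce'` arguments used in the cells above do, here is a small sketch on made-up values. The sample series is an assumption for illustration only; it is not taken from the survey files.

```python
import pandas as pd

# Made-up sample values in YYYYMM style, plus one unparseable entry.
raw = pd.Series([201306, 201506, "n/a"])

# Convert to strings first so every value is matched against the format.
# errors='coerce' turns anything that does not match into NaT instead of raising.
months = pd.to_datetime(raw.astype(str), format="%Y%m", errors="coerce")
print(months)
```

Each valid value becomes the first day of its month, and the bad entry becomes NaT, which makes it easy to check for parsing problems afterwards with `months.isna().sum()`.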

### 4. Exploratory Data Analysis

Median expected earnings growth  
Respondents who report working full time, part time, being temporarily laid off or on sick leave are
asked for the percent chance that 12 months from now their earnings, before taxes and deductions,
will have increased (decreased) by 12% or more; by 8% to 12%; by 4% to 8%; by 2% to 4%; by 0% to
2% (assuming that 12 months from now they are working in the exact same job at the same place
they currently work, and working the exact same number of hours). A generalized beta distribution is
fitted to the responses of each survey participant and the mean of this distribution is calculated. We
call this mean the respondent’s “expected earnings growth”. Variable 1 is the median across all
respondents of their expected earnings growth rates. 

Median point prediction  
Respondents who report working full time, part time, being temporarily laid off or on sick leave are
asked by how much they expect their earnings to have increased/decreased 12 months from now
(assuming that 12 months from now they are working in the exact same job at the same place they
currently work, and working the exact same number of hours). This is a point prediction (a single-value
forecast). Variable 3 is the median across all respondents of their point predictions. Given that
almost all respondents, while asked about continuous variables, provide integer responses,
throughout in computing medians based on point predictions we treat the responses as rounded
grouped data and compute linearly interpolated medians.
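
The survey documentation quoted above does not give the exact formula, so the following is only a sketch of one common textbook way to compute a linearly interpolated median for rounded data. It assumes each integer response v stands for the interval [v - 0.5, v + 0.5) and interpolates within the interval that contains the 50th percentile. The function name `interpolated_median` and the `width` parameter are hypothetical, and FRBNY's actual procedure may differ.

```python
import numpy as np


def interpolated_median(responses, width=1.0):
    """Grouped-data median: lower + (n/2 - below) / count * width.

    Each distinct value v is treated as the interval
    [v - width/2, v + width/2). Hypothetical helper for illustration.
    """
    values = np.asarray(responses, dtype=float)
    n = values.size
    if n == 0:
        return float("nan")
    uniq, counts = np.unique(values, return_counts=True)
    cum = np.cumsum(counts)
    # First group whose cumulative count reaches half the sample.
    k = int(np.searchsorted(cum, n / 2.0, side="left"))
    below = cum[k] - counts[k]        # responses strictly below this group
    lower = uniq[k] - width / 2.0     # lower edge of the median interval
    return float(lower + (n / 2.0 - below) / counts[k] * width)


# Made-up responses: the ordinary median is 3, the interpolated one is 3.25.
print(interpolated_median([2, 3, 3, 4, 4]))
```

The interpolated value moves smoothly as responses shift between adjacent integers, which is why it is more informative than the plain median when almost everyone answers in whole percentages.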

In [None]:
data_results['75th Percentile expected earnings growth'].plot()

In [None]:
n_bins = 10

# One histogram per summary statistic, arranged in a 2x2 grid
fig, axes = plt.subplots(nrows=2, ncols=2)
ax0, ax1, ax2, ax3 = axes.flat

# density=True replaces the removed normed=1 argument (same normalization)
ax0.hist(data_results['Median expected earnings growth'], n_bins, density=True, histtype='bar')
ax0.set_title('Median expected earnings growth')

ax1.hist(data_results['25th Percentile expected earnings growth'], n_bins, density=True, histtype='bar')
ax1.set_title('25th Percentile expected earnings growth')

ax2.hist(data_results['75th Percentile expected earnings growth'], n_bins, density=True, histtype='bar')
ax2.set_title('75th Percentile expected earnings growth')

ax3.hist(data_results['Median point prediction'], n_bins, density=True, histtype='bar')
ax3.set_title('Median point prediction')

plt.tight_layout()
plt.show()

### 5. Do Salary Expectations Vary With Macroeconomic Indicators?

### Postscript

I also used GitHub for source control. This forced me to learn the Git command-line functions instead of the desktop GUI.

### GitHub

In [4]:
!git add IS602_FinalProject_JHamski.ipynb
#!git commit -m "More data import and configuration, basic EDA started"
#!git push origin master

In [3]:
!git status

On branch master
Changes not staged for commit:
  (use "git add <file>..." to update what will be committed)
  (use "git checkout -- <file>..." to discard changes in working directory)

	modified:   IS602_FinalProject_JHamski.ipynb

Untracked files:
  (use "git add <file>..." to include in what will be committed)

	.ipynb_checkpoints/
	FRBNY-SCE-Data.xls?version=2.1.3.9
	FRBNY-SCE-Public-Microdata-Complete.xlsx
	test.png

no changes added to commit (use "git add" and/or "git commit -a")
