## IS602 | Adv. Programming Techniques | Fall 2015
### Final Project
#### James Hamski | james.hamski@spsmail.cuny.edu

If you are under 40 years old, you expect your income to grow more in the coming year than people in older age groups do. The median point prediction has ranged from a low of 2.78% in June 2013 to a high of 4.64% in June 2015. I find this survey fascinating. Who doesn’t think they’re going to get a raise in the next year? What income are they at? More specifically, what dollar amount does 4.64% equate to?

Using monthly data from the Federal Reserve Bank of New York’s Survey of Consumer Expectations – November 2015, I will investigate the above questions and test whether expectations of income growth have a statistically significant relationship with other economic indicators such as unemployment, job openings, and inflation expectations.
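To make the dollar-amount question concrete, here is a minimal sketch that converts an expected growth rate into dollars. The 4.64% figure comes from the text above. The two income levels are illustrative assumptions set at the survey's bracket boundaries (50k and 100k), not respondent data, and `expected_raise` is a hypothetical helper name.

```python
def expected_raise(income, growth_pct):
    """Return the dollar raise implied by a percentage growth expectation."""
    return round(income * growth_pct / 100.0, 2)


# Illustrative incomes only (assumed), chosen at the survey's bracket boundaries.
for income in (50000, 100000):
    print(income, expected_raise(income, 4.64))
```

The same helper can be applied to any growth rate in the survey tables once a representative income for each group is chosen.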

### 1. Configuring Analysis Environment

*Module Imports*

In [3]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

ImportError: libSM.so.6: cannot open shared object file: No such file or directory

### 2. Data Import


Data for this project comes from the Federal Reserve Bank of New York’s (FRBNY) Survey of Consumer Expectations. 

These data are available in Excel format from the FRBNY's website. Pandas can read Excel files directly from a URL using read_excel(). However, since these files are fairly large, I downloaded them to the AWS instance to speed up future imports. This means running a Linux shell command from the Notebook by preceding it with !.

While I didn't precisely time it, the 60MB 'microdata' file download seems to be significantly faster on the AWS instance compared to downloading via 

In [None]:
!wget https://www.newyorkfed.org/medialibrary/Interactives/sce/sce/downloads/data/FRBNY-SCE-Public-Microdata-Complete.xlsx
!wget https://www.newyorkfed.org/medialibrary/interactives/sce/sce/downloads/data/FRBNY-SCE-Data.xls?version=2.1.3.9

In [4]:
#confirm the files appear in the active directory
!ls -l

total 23108
-rw-rw-r-- 1 ubuntu ubuntu   259072 Nov 18 15:40 FRBNY-SCE-Data.xls?version=2.1.3.9
-rw-rw-r-- 1 ubuntu ubuntu 23367769 Nov 18 15:41 FRBNY-SCE-Public-Microdata-Complete.xlsx
-rw-rw-r-- 1 ubuntu ubuntu    23567 Nov 28 22:40 IS602_FinalProject_JHamski.ipynb
-rw-rw-r-- 1 ubuntu ubuntu      118 Nov 28 20:11 README.md
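The listing above shows that wget kept the `?version=2.1.3.9` query string in the saved file name. As a possible alternative to the shell commands, the sketch below downloads a file only when it is not already present and names it after the URL path alone. It uses only the Python standard library; `local_name` and `fetch_once` are hypothetical helper names, not part of the original notebook.

```python
import os
import posixpath
from urllib.parse import urlsplit
from urllib.request import urlretrieve


def local_name(url):
    """File name taken from the URL path, ignoring any ?query suffix."""
    return posixpath.basename(urlsplit(url).path)


def fetch_once(url, directory="."):
    """Download url into directory unless the file is already there."""
    dest = os.path.join(directory, local_name(url))
    if not os.path.exists(dest):
        urlretrieve(url, dest)
    return dest
```

With this approach the first file would be saved as `FRBNY-SCE-Data.xls`, so the path returned by `fetch_once` (rather than the name shown in the listing) would be the one to pass to `read_excel()`.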


In [17]:
data_excel_1 = 'FRBNY-SCE-Data.xls?version=2.1.3.9'
# header=3: the fourth row of each sheet holds the column names
data_results = pd.read_excel(data_excel_1, 'Earnings growth', header=3)
data_results_demo = pd.read_excel(data_excel_1, 'Earnings growth Demo', header=3, index_col=0)

In [13]:
data_excel_2 = 'FRBNY-SCE-Public-Microdata-Complete.xlsx'
# header=1: the second row of the sheet holds the column names; userid becomes the index
microdata = pd.read_excel(data_excel_2, 'Data', header=1, index_col=0)


### 3. Data Cleaning and Formatting

In [18]:
data_results.head()

Unnamed: 0,Median expected earnings growth,25th Percentile expected earnings growth,75th Percentile expected earnings growth,Median point prediction
201306,2.0,0.95,3.53,2.28
201307,2.0,1.0,3.02,2.38
201308,2.07,1.0,3.55,2.39
201309,2.0,1.0,3.52,2.2
201310,1.9,1.0,3.61,2.15


In [19]:
data_results.dtypes

Median expected earnings growth             float64
25th Percentile expected earnings growth    float64
75th Percentile expected earnings growth    float64
Median point prediction                     float64
dtype: object
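The period labels (201306, 201307, ...) look like integers in YYYYMM form, which is awkward for time-series plotting. One possible cleaning step, sketched here on the first three rows copied from the output above, is to parse them into timestamps. This assumes the labels are stored as integers, which the notebook does not confirm.

```python
import pandas as pd

# First rows of the 'Median point prediction' column, copied from the output above.
sample = pd.DataFrame(
    {"Median point prediction": [2.28, 2.38, 2.39]},
    index=[201306, 201307, 201308],
)

# Parse YYYYMM integers into month-start timestamps.
sample.index = pd.to_datetime(sample.index.astype(str), format="%Y%m")
print(sample)
```

The same conversion could be applied to the index of `data_results` and `data_results_demo` if their labels follow this pattern.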

In [9]:
data_results_demo.head()

Unnamed: 0,Age Under 40,Age 40-60,Age Over 60,Education High School or Less,Education Some College,Education BA or Higher,Income under 50k,Income 50-100k,Income Over 100k,Numeracy Low,Numeracy High,Region West,Region Midwest,Region South,Region Northeast
201306,3.0,1.46,1.29,1.54,1.71,2.31,1.28,2.11,2.28,2.0,2.0,1.55,2.5,1.54,1.95
201307,2.62,1.45,1.29,2.0,1.27,2.42,1.55,1.63,2.0,1.0,2.19,1.29,2.45,1.4,2.0
201308,2.91,1.94,1.29,2.0,2.15,2.38,2.03,2.0,2.45,2.0,2.19,2.03,2.37,2.15,2.0
201309,2.5,2.0,1.46,2.0,2.0,2.45,1.61,2.0,2.59,1.61,2.0,2.0,1.69,2.18,1.39
201310,2.45,1.6,1.6,1.55,1.46,2.42,1.46,1.58,2.5,1.55,2.0,2.25,1.87,1.75,1.75


In [12]:
microdata.head()

Unnamed: 0_level_0,date,tenure,weight,Q1,Q2,Q3,Q4new,Q5new,Q6new,Q8v2,...,Q47,D1,D3,DSAME,_AGE_CAT,_NUM_CAT,_REGION_CAT,_EDU_CAT,_HH_INC_CAT,_HH_INC_CAT.1
userid,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1,Unnamed: 11_level_1,Unnamed: 12_level_1,Unnamed: 13_level_1,Unnamed: 14_level_1,Unnamed: 15_level_1,Unnamed: 16_level_1,Unnamed: 17_level_1,Unnamed: 18_level_1,Unnamed: 19_level_1,Unnamed: 20_level_1,Unnamed: 21_level_1
70000220,201306,6,16.327999,3,3,50,20,10,15,1,...,3,1,2,,Under 40,High,West,Some College,Under 50k,Under 50k
70000224,201306,7,0.228,4,4,25,10,25,75,1,...,11,1,2,,Over 60,High,Midwest,College,Over 100k,Over 100k
70000234,201306,6,4.066,4,3,3,9,20,20,1,...,9,1,2,,40 to 60,High,West,Some College,Over 100k,Over 100k
70000238,201306,6,3.035,3,3,0,10,5,70,1,...,4,1,2,,Over 60,Low,West,Some College,Under 50k,Under 50k
70000238,201307,7,1.867,3,3,50,90,0,60,1,...,5,1,2,,Over 60,Low,West,Some College,Under 50k,Under 50k
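The rows above show that the same `userid` (70000238) can appear in more than one month, so a simple row count would count some respondents twice. As a rough sketch of how the "what income are they at?" question might be approached, the code below rebuilds those five rows by hand, keeps each respondent's first appearance, and cross-tabulates age group against household income group. It ignores the `weight` column, so the counts are unweighted.

```python
import pandas as pd

# The five rows shown above, restricted to the columns used here.
rows = pd.DataFrame(
    {
        "date": [201306, 201306, 201306, 201306, 201307],
        "_AGE_CAT": ["Under 40", "Over 60", "40 to 60", "Over 60", "Over 60"],
        "_HH_INC_CAT": ["Under 50k", "Over 100k", "Over 100k", "Under 50k", "Under 50k"],
    },
    index=pd.Index([70000220, 70000224, 70000234, 70000238, 70000238], name="userid"),
)

# Count each respondent once, even if they answered in several months.
respondents = rows[~rows.index.duplicated(keep="first")]
counts = pd.crosstab(respondents["_AGE_CAT"], respondents["_HH_INC_CAT"])
print(counts)
```

On the full `microdata` frame the same two lines would apply unchanged, assuming the column names match those displayed by `microdata.head()`.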


### 4. Exploratory Data Analysis

In [None]:
1+1

### 5. Do Salary Expectations Vary With Macroeconomic Indicators?

### Postscript

I also used GitHub for source control. This forced me to learn the Git command-line interface instead of relying on the desktop GUI.