In [2]:
import sys
import json
from pathlib import Path
from dateutil import parser
from math import pi

import requests
from shapely.geometry import shape, Point
import geopandas as gpd
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from sklearn import datasets, linear_model
from sklearn.metrics import mean_squared_error, r2_score


DIR = Path('..')
sys.path.append(str(DIR))

DATA_DIR = DIR/'data/'
OUT_DIR = DIR/'output/'

%load_ext autoreload
%autoreload 2

# Chapter 1

## Structure of Economic Data

Cross-Sectional Data
A cross-sectional data set consists of a sample of individuals, households, firms, cities, states, countries, or a variety of other units, taken at a given point in time.
An important feature of cross-sectional data is that we can often assume that they have been obtained by random sampling from the underlying population.

A time series data set consists of observations on a variable or several variables over time. A feature of time series data that can require special attention is the frequency at which the data are collected. In economics, the most common frequencies are daily, weekly, monthly, quarterly, and annually.
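
To make the idea of data frequency concrete, here is a minimal sketch that builds a made-up daily series with pandas and aggregates it to weekly frequency. The dates and values are placeholders chosen for illustration, not real economic data.

```python
import numpy as np
import pandas as pd

# Hypothetical daily series: 14 consecutive days starting Wednesday 2020-01-01.
days = pd.date_range("2020-01-01", periods=14, freq="D")
daily = pd.Series(np.arange(14, dtype=float), index=days, name="value")

# Aggregate to weekly frequency (weeks ending on Sunday) by averaging.
weekly = daily.resample("W").mean()
print(weekly)
```

Averaging is only one possible aggregation; for a flow variable such as sales, a sum would usually be the more natural choice.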

Some data sets have both cross-sectional and time series features. For example, if the same random survey is conducted in two different years, we can increase the sample size by combining the two years into a pooled cross section. Pooling cross sections from different years is often an effective way of analyzing the effects of a new government policy.

A panel data (or longitudinal data) set consists of a time series for each cross-sectional member in the data set. The key feature of panel data that distinguishes them from a pooled cross section is that the same cross-sectional units (individuals, firms, or counties in the preceding examples) are followed over a given time period.
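
The difference between a panel and a pooled cross section can be sketched with two tiny pandas data frames. The firm labels and sales figures below are invented for illustration; the only point is that units repeat across years in the panel and do not in the pooled sample.

```python
import pandas as pd

# Hypothetical panel: the same two firms observed in both years.
panel = pd.DataFrame({
    "firm": ["A", "B", "A", "B"],
    "year": [2019, 2019, 2020, 2020],
    "sales": [10.0, 20.0, 12.0, 21.0],
}).set_index(["firm", "year"]).sort_index()

# Hypothetical pooled cross section: a fresh random sample each year,
# so the units drawn in 2020 differ from those drawn in 2019.
sample_2019 = pd.DataFrame({"firm": ["A", "B"], "year": 2019, "sales": [10.0, 20.0]})
sample_2020 = pd.DataFrame({"firm": ["C", "D"], "year": 2020, "sales": [11.0, 19.0]})
pooled = pd.concat([sample_2019, sample_2020], ignore_index=True)

# In the panel every firm appears once per year; in the pooled data no firm repeats.
obs_per_firm_panel = panel.groupby(level="firm").size()
obs_per_firm_pooled = pooled.groupby("firm").size()

# Because units repeat, a panel supports within-unit changes over time.
sales_change = panel["sales"].groupby(level="firm").diff().dropna()
print(sales_change)
```

The within-firm difference in the last step has no counterpart in the pooled data, which is what makes panel data useful for controlling for fixed unit-level characteristics.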

## Causality and the Notion of Ceteris Paribus in Econometric Analysis

In most tests of economic theory, and certainly for evaluating public policy, the economist’s goal is to infer that one variable (such as education) has a causal effect on another variable (such as worker productivity). Simply finding an association between two or more variables might be suggestive, but unless causality can be established, it is rarely compelling.
The notion of ceteris paribus, which means “other (relevant) factors being equal,” plays an important role in causal analysis.
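
A small simulation can illustrate why association alone is not compelling. The sketch below assumes a made-up data-generating process in which an unobserved factor (labelled `ability` here, purely as an illustration) raises both education and productivity. All coefficients are assumptions, with the true ceteris paribus effect of education set to 1.0.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 5000

# Assumed data-generating process (illustrative only):
# ability raises both education and productivity.
ability = rng.normal(size=n)
educ = 12 + 2 * ability + rng.normal(size=n)
productivity = 5 + 1.0 * educ + 3 * ability + rng.normal(size=n)


def ols(y, *regressors):
    """Least-squares coefficients, with the intercept in position 0."""
    X = np.column_stack([np.ones(len(y)), *regressors])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef


# Association only: ability is left in the error term.
slope_naive = ols(productivity, educ)[1]

# Ceteris paribus: hold ability fixed by including it as a regressor.
slope_controlled = ols(productivity, educ, ability)[1]

print(f"naive slope:      {slope_naive:.2f}")
print(f"controlled slope: {slope_controlled:.2f}")
```

Under these assumptions the naive slope overstates the effect of education, because it also picks up the part of ability that moves with education. Holding ability fixed recovers a slope close to the assumed value of 1.0.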

# Chapter 2

## Simple Regression Model

The first equation below defines the simple linear regression model. The variable *u*, called the error term or disturbance in the relationship, represents factors other than *x* that affect *y*. A simple regression analysis effectively treats all factors affecting *y* other than *x* as being unobserved. The second equation, the population regression function, follows from the zero conditional mean assumption stated in the third.

\begin{equation*}
\begin{aligned}
y &= \beta_0 + \beta_1 x + u \\
E(y \mid x) &= \beta_0 + \beta_1 x \\
E(u) &= E(u \mid x) = 0
\end{aligned}
\end{equation*}
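
Applying the sample analogue of the zero conditional mean assumption gives the usual closed-form OLS estimates: the slope is the sample covariance of *x* and *y* divided by the sample variance of *x*, and the intercept is the mean of *y* minus the slope times the mean of *x*. The sketch below applies these formulas to simulated data; the true parameter values (2 and 0.5) and the distributions are assumptions chosen for illustration.

```python
import numpy as np

rng = np.random.default_rng(42)
n = 1000

# Simulated data with assumed true parameters beta0 = 2 and beta1 = 0.5.
x = rng.uniform(0, 10, size=n)
u = rng.normal(0, 1, size=n)  # error term, drawn independently of x
y = 2.0 + 0.5 * x + u

# Closed-form OLS estimates for the simple regression model.
x_dev = x - x.mean()
beta1_hat = np.sum(x_dev * (y - y.mean())) / np.sum(x_dev ** 2)
beta0_hat = y.mean() - beta1_hat * x.mean()

# By construction, OLS residuals have zero sample mean
# and zero sample covariance with x.
resid = y - (beta0_hat + beta1_hat * x)
print(beta0_hat, beta1_hat)
```

The two residual properties noted in the final comment are the sample counterparts of E(u) = 0 and E(u|x) = 0, and they hold for any data set, not only this simulated one.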

In [9]:
import numpy as np
import pandas as pd
from sklearn import datasets  # scikit-learn's bundled data sets

# Load the Boston housing data.
# Note: load_boston was removed in scikit-learn 1.2, so this cell needs an older release.
data = datasets.load_boston()

# Predictors, with the data set's feature names as column labels
df = pd.DataFrame(data.data, columns=data.feature_names)

# Put the target (median home value, MEDV) in a separate DataFrame
target = pd.DataFrame(data.target, columns=["MEDV"])


In [10]:
import statsmodels.api as sm

X = df["RM"]        # average number of rooms per dwelling
y = target["MEDV"]

# Note the argument order: statsmodels takes (y, X), the reverse of scikit-learn's fit(X, y).
# sm.OLS does not add an intercept; X has no constant column here, so this fits
# a line through the origin. Use sm.add_constant(X) to estimate beta_0 as well.
model = sm.OLS(y, X).fit()
predictions = model.predict(X)  # fitted values from the model

# Print out the statistics
model.summary()

| Statistic | Value | Statistic | Value |
|---|---|---|---|
| Dep. Variable: | MEDV | R-squared: | 0.901 |
| Model: | OLS | Adj. R-squared: | 0.901 |
| Method: | Least Squares | F-statistic: | 4615 |
| Date: | Wed, 07 Mar 2018 | Prob (F-statistic): | 3.74e-256 |
| Time: | 17:13:11 | Log-Likelihood: | -1747.1 |
| No. Observations: | 506 | AIC: | 3496 |
| Df Residuals: | 505 | BIC: | 3500 |
| Df Model: | 1 | | |
| Covariance Type: | nonrobust | | |

| | coef | std err | t | P>\|t\| | [0.025 | 0.975] |
|---|---|---|---|---|---|---|
| RM | 3.6534 | 0.054 | 67.930 | 0.000 | 3.548 | 3.759 |

| Statistic | Value | Statistic | Value |
|---|---|---|---|
| Omnibus: | 83.295 | Durbin-Watson: | 0.493 |
| Prob(Omnibus): | 0.0 | Jarque-Bera (JB): | 152.507 |
| Skew: | 0.955 | Prob(JB): | 7.65e-34 |
| Kurtosis: | 4.894 | Cond. No. | 1.0 |
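
The fit above passes `X` without a constant column, so it estimates a line through the origin. In that case statsmodels computes R-squared against the uncentered total sum of squares, which is not comparable to the usual centered R-squared of a model with an intercept. The NumPy sketch below shows the gap on simulated data. The variables are hypothetical stand-ins, not the Boston data, and every parameter value is an assumption.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 500

# Hypothetical stand-ins for a regressor and an outcome (not the Boston data).
x = rng.normal(6.0, 0.7, size=n)
y = -30.0 + 9.0 * x + rng.normal(0, 6.0, size=n)

# Regression through the origin: y = b * x.
b_origin = np.sum(x * y) / np.sum(x ** 2)
ssr_origin = np.sum((y - b_origin * x) ** 2)
r2_uncentered = 1 - ssr_origin / np.sum(y ** 2)

# Regression with an intercept: y = b0 + b1 * x.
X = np.column_stack([np.ones(n), x])
(b0, b1), *_ = np.linalg.lstsq(X, y, rcond=None)
ssr_const = np.sum((y - (b0 + b1 * x)) ** 2)
r2_centered = 1 - ssr_const / np.sum((y - y.mean()) ** 2)

print(f"through origin: slope={b_origin:.2f}, uncentered R^2={r2_uncentered:.3f}")
print(f"with intercept: slope={b1:.2f}, centered R^2={r2_centered:.3f}")
```

When the outcome has a mean far from zero, the uncentered R-squared can be high even if the regressor explains only a modest share of the variation around that mean. A large R-squared from a no-intercept fit should therefore be read with care.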
