## Hard-Coding the OLS estimator with NumPy and SciPy

Now that we have some familiarity with NumPy and SciPy, let's look at some examples of using them in practice.

The first is probably my 'favourite' example: in my fourth-year undergraduate econometrics class we computed OLS by hand in MATLAB using the dataset from Mankiw, Romer and Weil (1992). We will do the same in Python.

In [1]:
import numpy as np
import pandas as pd
from scipy import linalg as la

## Import the data

The data are in Stata's native format (.dta), so we import them with pandas (we discuss data import in more detail later).

In [2]:
data = pd.read_stata('../data/mrw1992.dta')
data.head()

| | c_index | c_name | c_code | cont | nonoil | inter | oecd | gdp60 | gdp85 | popgrowth | igdp | school |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | Algeria | DZA | Africa | Non-oil | Intermediate Sample | Non-OECD | 2485.0 | 4371.0 | 2.6 | 24.1 | 4.5 |
| 1 | 2 | Angola | AGO | Africa | Non-oil | Not intermediate sample | Non-OECD | 1588.0 | 1171.0 | 2.1 | 5.8 | 1.8 |
| 2 | 3 | Benin | BEN | Africa | Non-oil | Not intermediate sample | Non-OECD | 1116.0 | 1071.0 | 2.4 | 10.8 | 1.8 |
| 3 | 4 | Botswana | BWA | Africa | Non-oil | Intermediate Sample | Non-OECD | 959.0 | 3671.0 | 3.2 | 28.299999 | 2.9 |
| 4 | 5 | Burkina Faso | BFA | Africa | Non-oil | Not intermediate sample | Non-OECD | 529.0 | 857.0 | 0.9 | 12.7 | 0.4 |


Question: Why couldn't we import that as a NumPy array?
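
As a hint towards the answer, here is a minimal sketch of what happens when mixed-type rows like the ones above are forced into a single NumPy array. The two rows are re-typed by hand from the table (only four of the columns are kept), and the names `rows`, `arr` and `df` are just for illustration.

```python
import numpy as np
import pandas as pd

# Two rows copied by hand from the table above: an integer, two strings and a float.
rows = [[1, 'Algeria', 'Africa', 2485.0],
        [2, 'Angola', 'Africa', 1588.0]]

# A NumPy array holds one dtype for the whole array,
# so the numbers get converted to text alongside the strings.
arr = np.array(rows)
print(arr.dtype)

# A DataFrame keeps a separate dtype per column,
# so c_index and gdp60 stay numeric.
df = pd.DataFrame(rows, columns=['c_index', 'c_name', 'cont', 'gdp60'])
print(df.dtypes)
```

So a plain NumPy array would leave us with text where we want numbers, which is awkward for the linear algebra we are about to do.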

In [3]:
# subset OECD data
oecd = data[data['oecd'] == 'OECD']
oecd.head()

| | c_index | c_name | c_code | cont | nonoil | inter | oecd | gdp60 | gdp85 | popgrowth | igdp | school |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 52 | 53 | Japan | JPN | Asia | Non-oil | Intermediate Sample | OECD | 3493.0 | 13893.0 | 1.2 | 36.0 | 10.9 |
| 69 | 70 | Austria | AUT | Europe | Non-oil | Intermediate Sample | OECD | 5939.0 | 13327.0 | 0.4 | 23.4 | 8.0 |
| 70 | 71 | Belgium | BEL | Europe | Non-oil | Intermediate Sample | OECD | 6789.0 | 14290.0 | 0.5 | 23.4 | 9.3 |
| 72 | 73 | Denmark | DNK | Europe | Non-oil | Intermediate Sample | OECD | 8551.0 | 16491.0 | 0.6 | 26.6 | 10.7 |
| 73 | 74 | Finland | FIN | Europe | Non-oil | Intermediate Sample | OECD | 6527.0 | 13779.0 | 0.7 | 36.900002 | 11.5 |


Now we want to generate some dependent and independent variables...

* MRW use 'growth' as the dependent variable
* Independent variables are 'popgrowth', 'igdp', 'school' and 'lngdp60'
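
A rough sketch of how these variables might be built, and how the OLS estimator could then be coded by hand. It makes a few assumptions that are not stated above: 'growth' is taken to be the log difference of GDP between 1960 and 1985, 'lngdp60' the log of 'gdp60', and the estimator is computed by solving the normal equations $(X'X)\hat{\beta} = X'y$ with `la.solve`. To keep it runnable without the .dta file, it uses only the five OECD rows printed above (with 'igdp' for Finland rounded to 36.9), and regresses on a constant and 'lngdp60' alone, since five observations cannot pin down five coefficients. The names `sample`, `X`, `y` and `beta` are illustrative.

```python
import numpy as np
import pandas as pd
from scipy import linalg as la

# The five OECD rows printed by oecd.head(), re-typed by hand.
sample = pd.DataFrame({
    'c_name': ['Japan', 'Austria', 'Belgium', 'Denmark', 'Finland'],
    'gdp60': [3493.0, 5939.0, 6789.0, 8551.0, 6527.0],
    'gdp85': [13893.0, 13327.0, 14290.0, 16491.0, 13779.0],
    'popgrowth': [1.2, 0.4, 0.5, 0.6, 0.7],
    'igdp': [36.0, 23.4, 23.4, 26.6, 36.9],
    'school': [10.9, 8.0, 9.3, 10.7, 11.5],
})

# ASSUMPTION: lngdp60 = ln(gdp60) and growth = ln(gdp85) - ln(gdp60).
sample['lngdp60'] = np.log(sample['gdp60'])
sample['growth'] = np.log(sample['gdp85']) - sample['lngdp60']

# Dependent variable as a 1-D array.
y = sample['growth'].to_numpy()

# Design matrix: a column of ones (the constant) plus lngdp60.
# With the full dataset the other regressors would be stacked in the same way.
X = np.column_stack([np.ones(len(sample)), sample['lngdp60'].to_numpy()])

# OLS by hand: solve the normal equations (X'X) beta = X'y.
beta = la.solve(X.T @ X, X.T @ y)
print(beta)
```

Solving the linear system is generally preferable to forming the inverse of X'X explicitly, although both give the textbook estimator.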