In [1]:
import numpy as np

# Project 9 - Working With OLS

Having built statistics functions, we are now ready to build a function for regression analysis. We will start by building a regression. We will use linear algebra to estimate parameters that minimize the sum of the squared errors. This is an ordinary least squares (OLS) regression.

An OLS regression with one exogenous variable takes the form:

$y = \alpha + \beta_1x_1 + \mu $

$\beta_0 = \alpha + \mu$

We merge the error term, which represents bias in the data, with alpha to yield the constant, $\beta_0$. This is necessary since OLS assumes an unbiased estimator where:

$\sum_{i=0}^{n-1} e_{i}=0$

Each estimate of a point created from a particular observation takes the form:

$y_i = \beta_0 + \beta_1x_{1,i} + e_i$

This can be generalized to include k exogenous variables:

$y_i = \beta_0 + (\sum_{j=1}^{k} \beta_jx_{j,i}) + e_i$

Ideally, we want to form a prediction where, on average, the right-hand side of the equation yields the correct value on the left-hand side. When we perform an OLS regression, we form a predictor that minimizes the sum of the squared distances between each predicted value and the observed value drawn from the data. For example, if the prediction for a particular value of y is 8 and the actual value is 10, the error of the prediction is 8 - 10 = -2 and the squared error is 4.
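
As a minimal sketch of this idea, the cell below fits a line to a handful of made-up points and compares its sum of squared errors with two slightly different lines. The arrays `x` and `y` and the helper `sum_squared_errors` are hypothetical, introduced only for illustration, and `np.polyfit` stands in here for the matrix approach developed later.

```python
import numpy as np

# Hypothetical data, invented purely for illustration
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 7.8, 10.1])

def sum_squared_errors(b0, b1):
    """Sum of squared errors for the line y_hat = b0 + b1 * x."""
    y_hat = b0 + b1 * x
    errors = y_hat - y
    return np.sum(errors ** 2)

# np.polyfit with degree 1 returns the least-squares slope, then intercept
b1_ols, b0_ols = np.polyfit(x, y, 1)
sse_ols = sum_squared_errors(b0_ols, b1_ols)

# Moving the intercept or the slope away from the OLS values raises the SSE
sse_shifted = sum_squared_errors(b0_ols + 0.5, b1_ols)
sse_tilted = sum_squared_errors(b0_ols, b1_ols + 0.1)
print(sse_ols, sse_shifted, sse_tilted)

# The single-observation example from the text: prediction 8, actual 10
single_error = 8 - 10
print(single_error, single_error ** 2)
```

Any other choice of intercept and slope should likewise give a sum of squared errors at least as large as the OLS line.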

To find the function that minimizes the sum of squared errors, we will use matrix algebra, also known as linear algebra. For those unfamiliar, the next section uses the numpy library to perform matrix operations. For clarity, we will review the linear algebra functions that we will use with simple examples.

## Linear Algebra for OLS

We solve the following equation for a vector of beta values ($\beta$), constants whose values represent estimates of the effect of variables in the set **_X_** on the selected endogenously generated variable $y$. The matrix **_X_** also includes a vector of ones used to estimate the constant $\beta_0$.

$\beta = (X'X)^{-1}X'Y$

$Y =$ Observations for Endogenous Variable

$X =$ Observations for Exogenous Variables

$X' =$ $X$-transpose

$(X'X)^{-1} =$ Inverse of $X'X$
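
To make the formula concrete, here is a small sketch that applies it to invented data. The vectors `x1` and `x2` and the coefficients 2, 3 and -1 are assumptions chosen for the example; because `Y` is built without noise, the estimate should recover those coefficients up to floating-point rounding.

```python
import numpy as np

# Hypothetical observations: Y = 2 + 3*x1 - 1*x2 exactly, with no error term
x1 = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
x2 = np.array([1.0, 0.0, 2.0, 1.0, 3.0])
Y = 2 + 3 * x1 - 1 * x2

# A column of ones comes first, so the first estimate is the constant beta_0
X = np.column_stack((np.ones(len(x1)), x1, x2))

# beta = (X'X)^{-1} X'Y
beta = np.linalg.inv(X.T @ X) @ X.T @ Y
print(np.round(beta, 6))
```

With real data, where an error term is present, the same expression returns the estimates that minimize the sum of squared errors rather than the exact coefficients.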

### Inverting a Matrix

In reviewing the linear equation for estimating $\beta$, we confront two unique operations worth understanding. Included in these are some key concepts in linear algebra, including the identity matrix $I$ and linear independence. The best way to understand these concepts is by working with some sample vectors. Consider the matrix $X$ consisting of vectors $x_0$,$x_1$,…,$x_{n-1}$,$x_n$. We must check that these vectors are linearly independent. We do this by joining $X$ with an identity matrix and thus create:

$A = [XI]$

Multiplying $A$ on the left by $X^{-1}$ turns the $X$ block into an identity matrix, $I$, and the $I$ block into $X^{-1}$:

$X^{-1}A = X^{-1}[XI]$

$X^{-1}A = [IX^{-1}]$

Let us solve for $X^{-1}A$ using the following vectors for $X$. 

$\begin{equation*}
X = \begin{bmatrix}
1 & 2 & 1 \\
4 & 1 & 5 \\
6 & 8 & 6
\end{bmatrix}
\end{equation*}$

Concatenate a $3 \times 3$ identity matrix on the right of $X$:

$\begin{equation*}
I = \begin{bmatrix}
1 & 0 & 0 \\
0 & 1 & 0 \\
0 & 0 & 1
\end{bmatrix}
\end{equation*}$

$\begin{equation*}
[XI] = \begin{bmatrix}
1 & 2 & 1 & 1 & 0 & 0 \\
4 & 1 & 5 & 0 & 1 & 0 \\
6 & 8 & 6 & 0 & 0 & 1
\end{bmatrix}
\end{equation*}$

If we perform row operations on $A$ to transform $X$ in $[XI]$ into $I$, then $I$ will be transformed into $X^{-1}$:

$\begin{equation*}
[XI] = \begin{bmatrix}
1 & 2 & 1 & 1 & 0 & 0 \\
4 & 1 & 5 & 0 & 1 & 0 \\
6 & 8 & 6 & 0 & 0 & 1
\end{bmatrix}
\end{equation*}$




$\begin{equation*}
r_2 - 4r_1:\begin{bmatrix}
1 & 2 & 1 & 1 & 0 & 0 \\
0 & -7 & 1 & -4 & 1 & 0 \\
6 & 8 & 6 & 0 & 0 & 1
\end{bmatrix}
\end{equation*}$


$\begin{equation*}
r_3 - 6r_1:\begin{bmatrix}
1 & 2 & 1 & 1 & 0 & 0 \\
0 & -7 & 1 & -4 & 1 & 0 \\
0 & -4 & 0 & -6 & 0 & 1
\end{bmatrix}
\end{equation*}$


$\begin{equation*}
r_2 \leftrightarrow r_3:\begin{bmatrix}
1 & 2 & 1 & 1 & 0 & 0 \\
0 & -4 & 0 & -6 & 0 & 1\\
0 & -7 & 1 & -4 & 1 & 0 
\end{bmatrix}
\end{equation*}$

$\begin{equation*}
r_2/{-4}:\begin{bmatrix}
1 & 2 & 1 & 1 & 0 & 0 \\
0 & 1 & 0 & 3/2 & 0 & -1/4\\
0 & -7 & 1 & -4 & 1 & 0 
\end{bmatrix}
\end{equation*}$

$\begin{equation*}
r_3 + 7r_2:\begin{bmatrix}
1 & 2 & 1 & 1 & 0 & 0 \\
0 & 1 & 0 & 3/2 & 0 & -1/4\\
0 & 0 & 1 & 13/2 & 1 & -7/4 
\end{bmatrix}
\end{equation*}$

$\begin{equation*}
r_1 - 2r_2 - r_3:\begin{bmatrix}
1 & 0 & 0 & -17/2 & -1 & 9/4 \\
0 & 1 & 0 & 3/2 & 0 & -1/4\\
0 & 0 & 1 & 13/2 & 1 & -7/4 
\end{bmatrix}
\end{equation*}$

$\begin{equation*}
[IX^{-1}]=\begin{bmatrix}
1 & 0 & 0 & -8.5 & -1 & 2.25 \\
0 & 1 & 0 & 1.5 & 0 & -0.25\\
0 & 0 & 1 & 6.5 & 1 & -1.75 
\end{bmatrix}
\end{equation*}$

$\begin{equation*}
X^{-1}=\begin{bmatrix}
-8.5 & -1 & 2.25 \\
1.5 & 0 & -0.25\\
6.5 & 1 & -1.75 
\end{bmatrix}
\end{equation*}$

By transforming $X$ in matrix $[XI]$ into an identity matrix, we transform the $I$ matrix into $X^{-1}$. This also confirms that the vectors comprising $X$ are independent, meaning that no vector in the set comprising $X$ can be formed from a combination and/or transformation of the others. A fundamental assumption of regression analysis is that data generated from factors believed to determine the y-values are independent of one another.
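
The row operations can also be sketched in code. The function below, `gauss_jordan_inverse`, is a hypothetical helper written for this illustration and is not part of numpy. It always pivots on the largest available entry in each column, so its intermediate steps differ from the hand calculation, but the right-hand block should end up as the same $X^{-1}$.

```python
import numpy as np

def gauss_jordan_inverse(X):
    """Invert a square matrix by row-reducing the augmented matrix [X I]."""
    n = X.shape[0]
    A = np.hstack((X.astype(float), np.eye(n)))
    for col in range(n):
        # Use the row with the largest entry in this column as the pivot row
        pivot = col + np.argmax(np.abs(A[col:, col]))
        if np.isclose(A[pivot, col], 0.0):
            raise ValueError("Matrix is singular: its vectors are linearly dependent")
        # Swap the pivot row into place
        A[[col, pivot]] = A[[pivot, col]]
        # Scale the pivot row so that the pivot entry equals 1
        A[col] = A[col] / A[col, col]
        # Eliminate this column from every other row
        for row in range(n):
            if row != col:
                A[row] = A[row] - A[row, col] * A[col]
    # The left block is now I and the right block is the inverse
    return A[:, n:]

X = np.array([[1, 2, 1],
              [4, 1, 5],
              [6, 8, 6]])
X_inv = gauss_jordan_inverse(X)
print(np.round(X_inv, 2))
```

If the vectors were linearly dependent, a pivot of zero would appear and the sketch would raise an error instead of returning an inverse.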

In [2]:
x1 = np.array([1,2,1])
x2 = np.array([4,1,5])
x3 = np.array([6,8,6])
print(x1,x2,x3, sep="\n")

[1 2 1]
[4 1 5]
[6 8 6]


In [3]:
x1 = np.matrix(x1)
x2 = np.matrix(x2)
x3 = np.matrix(x3)
print(x1,x2,x3, sep="\n")

[[1 2 1]]
[[4 1 5]]
[[6 8 6]]


In [4]:
X = np.concatenate((x1, x2, x3), axis = 0)
X

matrix([[1, 2, 1],
        [4, 1, 5],
        [6, 8, 6]])

In [5]:
X_inverse = X.getI()
X_inverse

matrix([[-8.50000000e+00, -1.00000000e+00,  2.25000000e+00],
        [ 1.50000000e+00,  5.12410627e-17, -2.50000000e-01],
        [ 6.50000000e+00,  1.00000000e+00, -1.75000000e+00]])

In [6]:
X_inverse = np.round(X.getI(), 2)
X_inverse

array([[-8.5 , -1.  ,  2.25],
       [ 1.5 ,  0.  , -0.25],
       [ 6.5 ,  1.  , -1.75]])

In [7]:
X_transpose = X.getT()
X_transpose

matrix([[1, 4, 6],
        [2, 1, 8],
        [1, 5, 6]])
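
The cells above rely on `np.matrix`, a class that the NumPy documentation no longer recommends for new code. As an alternative sketch, the same checks can be done with plain arrays, `np.linalg.inv` and the `@` operator. The matrix `X_dependent` is a hypothetical counterexample whose third row is the sum of the first two, included to show what linear dependence looks like numerically.

```python
import numpy as np

X = np.array([[1, 2, 1],
              [4, 1, 5],
              [6, 8, 6]])

# Inverse and transpose using ordinary arrays
X_inverse = np.linalg.inv(X)
X_transpose = X.T

# X times its inverse should be the identity matrix, up to rounding error
is_identity = np.allclose(X @ X_inverse, np.eye(3))

# Full rank (3) means the three vectors are linearly independent
rank_X = np.linalg.matrix_rank(X)

# Hypothetical dependent matrix: row 3 = row 1 + row 2
X_dependent = np.array([[1, 2, 1],
                        [4, 1, 5],
                        [5, 3, 6]])
rank_dependent = np.linalg.matrix_rank(X_dependent)
det_dependent = np.linalg.det(X_dependent)

print(is_identity, rank_X, rank_dependent)
```

A rank below the number of columns, or a determinant of approximately zero, signals that the matrix has no inverse and that the $\beta$ equation cannot be solved as written.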

## Regression Function

Now that we have learned the necessary operations, we can understand the operations of the regression function. If you would like to build your own regression module, reconstruct the scripts from Chapter 7. In this lesson, we will use the statsmodels OLS method to reconstruct and compare statistics from an OLS regression. 

Recall that we estimate the vector of beta parameters for each variable with the equation:

$\beta = (X'X)^{-1}X'Y$

Each estimated $\beta$ value is multiplied by each observation of the relevant exogenous variable to estimate the effect of that value on the endogenous variable, $Y$.

To run a regression and estimate the parameters, we will need to import data, define the dependent variable and independent variables, and transform these into matrix objects. 

Let's use the data from chapter 6 with the addition of real GDP per capita. This combined set of data is saved in the repository as a file created in chapter 8.

In [8]:
import pandas as pd
mgdp = pd.read_excel("https://www.rug.nl/ggdc/historicaldevelopment/maddison/data/mpd2020.xlsx",
                     index_col = [0,2],
                     parse_dates = True,
                     sheet_name = "Full data")
mgdp

countrycode,year,country,gdppc,pop
AFG,1820,Afghanistan,,3280.00000
AFG,1870,Afghanistan,,4207.00000
AFG,1913,Afghanistan,,5730.00000
AFG,1950,Afghanistan,1156.0000,8150.00000
AFG,1951,Afghanistan,1170.0000,8284.00000
...,...,...,...,...
ZWE,2014,Zimbabwe,1594.0000,13313.99205
ZWE,2015,Zimbabwe,1560.0000,13479.13812
ZWE,2016,Zimbabwe,1534.0000,13664.79457
ZWE,2017,Zimbabwe,1582.3662,13870.26413


In [9]:
filename = "efotw-2022-master-index-data-for-researchers-iso.xlsx"
data = pd.read_excel(filename, 
                     index_col = [2,0], 
                     header = [0],
                     sheet_name = "EFW Panel Data 2022 Report")
data

ISO_Code_3,Year,ISO_Code_2,World Bank Region,"World Bank Current Income Classification, 1990-present (L=Low income, LM=Lower middle income, UM=Upper middle income, H=High income)",Countries,Panel Data Summary Index,Area 1,Area 2,Area 3,Area 4,Area 5,Standard Deviation of the 5 EFW Areas
ALB,2020,AL,Europe & Central Asia,UM,Albania,7.640000,7.817077,5.260351,9.788269,8.222499,7.112958,1.652742
DZA,2020,DZ,Middle East & North Africa,LM,Algeria,5.120000,4.409943,4.131760,7.630287,3.639507,5.778953,1.613103
AGO,2020,AO,Sub-Saharan Africa,LM,Angola,5.910000,8.133385,3.705161,6.087996,5.373190,6.227545,1.598854
ARG,2020,AR,Latin America & the Caribbean,UM,Argentina,4.870000,6.483768,4.796454,4.516018,3.086907,5.490538,1.254924
ARM,2020,AM,Europe & Central Asia,UM,Armenia,7.840000,7.975292,6.236215,9.553009,7.692708,7.756333,1.178292
...,...,...,...,...,...,...,...,...,...,...,...,...
VEN,1970,VE,Latin America & the Caribbean,,"Venezuela, RB",7.242943,8.349529,5.003088,9.621851,7.895993,5.209592,2.028426
VNM,1970,VN,East Asia & Pacific,,Vietnam,,,,,,,
YEM,1970,YE,Middle East & North Africa,,"Yemen, Rep.",,,,,,,
ZMB,1970,ZM,Sub-Saharan Africa,,Zambia,4.498763,5.374545,4.472812,5.137395,,5.307952,0.412514


In [10]:
rename = {"Panel Data Summary Index": "Summary",
         "Area 1":"Size of Government",
         "Area 2":"Legal System and Property Rights",
         "Area 3":"Sound Money",
         "Area 4":"Freedom to Trade Internationally",
         "Area 5":"Regulation"}
data = data.dropna(how="all", axis = 1).rename(columns = rename)
data

ISO_Code_3,Year,ISO_Code_2,World Bank Region,"World Bank Current Income Classification, 1990-present (L=Low income, LM=Lower middle income, UM=Upper middle income, H=High income)",Countries,Summary,Size of Government,Legal System and Property Rights,Sound Money,Freedom to Trade Internationally,Regulation,Standard Deviation of the 5 EFW Areas
ALB,2020,AL,Europe & Central Asia,UM,Albania,7.640000,7.817077,5.260351,9.788269,8.222499,7.112958,1.652742
DZA,2020,DZ,Middle East & North Africa,LM,Algeria,5.120000,4.409943,4.131760,7.630287,3.639507,5.778953,1.613103
AGO,2020,AO,Sub-Saharan Africa,LM,Angola,5.910000,8.133385,3.705161,6.087996,5.373190,6.227545,1.598854
ARG,2020,AR,Latin America & the Caribbean,UM,Argentina,4.870000,6.483768,4.796454,4.516018,3.086907,5.490538,1.254924
ARM,2020,AM,Europe & Central Asia,UM,Armenia,7.840000,7.975292,6.236215,9.553009,7.692708,7.756333,1.178292
...,...,...,...,...,...,...,...,...,...,...,...,...
VEN,1970,VE,Latin America & the Caribbean,,"Venezuela, RB",7.242943,8.349529,5.003088,9.621851,7.895993,5.209592,2.028426
VNM,1970,VN,East Asia & Pacific,,Vietnam,,,,,,,
YEM,1970,YE,Middle East & North Africa,,"Yemen, Rep.",,,,,,,
ZMB,1970,ZM,Sub-Saharan Africa,,Zambia,4.498763,5.374545,4.472812,5.137395,,5.307952,0.412514


In [11]:
data["RGDP Per Capita"] = mgdp["gdppc"]
data

ISO_Code_3,Year,ISO_Code_2,World Bank Region,"World Bank Current Income Classification, 1990-present (L=Low income, LM=Lower middle income, UM=Upper middle income, H=High income)",Countries,Summary,Size of Government,Legal System and Property Rights,Sound Money,Freedom to Trade Internationally,Regulation,Standard Deviation of the 5 EFW Areas,RGDP Per Capita
ALB,2020,AL,Europe & Central Asia,UM,Albania,7.640000,7.817077,5.260351,9.788269,8.222499,7.112958,1.652742,
DZA,2020,DZ,Middle East & North Africa,LM,Algeria,5.120000,4.409943,4.131760,7.630287,3.639507,5.778953,1.613103,
AGO,2020,AO,Sub-Saharan Africa,LM,Angola,5.910000,8.133385,3.705161,6.087996,5.373190,6.227545,1.598854,
ARG,2020,AR,Latin America & the Caribbean,UM,Argentina,4.870000,6.483768,4.796454,4.516018,3.086907,5.490538,1.254924,
ARM,2020,AM,Europe & Central Asia,UM,Armenia,7.840000,7.975292,6.236215,9.553009,7.692708,7.756333,1.178292,
...,...,...,...,...,...,...,...,...,...,...,...,...,...
VEN,1970,VE,Latin America & the Caribbean,,"Venezuela, RB",7.242943,8.349529,5.003088,9.621851,7.895993,5.209592,2.028426,15289.0
VNM,1970,VN,East Asia & Pacific,,Vietnam,,,,,,,,1172.0
YEM,1970,YE,Middle East & North Africa,,"Yemen, Rep.",,,,,,,,1961.0
ZMB,1970,ZM,Sub-Saharan Africa,,Zambia,4.498763,5.374545,4.472812,5.137395,,5.307952,0.412514,1710.0


In [12]:
del data['Standard Deviation of the 5 EFW Areas']
data

ISO_Code_3,Year,ISO_Code_2,World Bank Region,"World Bank Current Income Classification, 1990-present (L=Low income, LM=Lower middle income, UM=Upper middle income, H=High income)",Countries,Summary,Size of Government,Legal System and Property Rights,Sound Money,Freedom to Trade Internationally,Regulation,RGDP Per Capita
ALB,2020,AL,Europe & Central Asia,UM,Albania,7.640000,7.817077,5.260351,9.788269,8.222499,7.112958,
DZA,2020,DZ,Middle East & North Africa,LM,Algeria,5.120000,4.409943,4.131760,7.630287,3.639507,5.778953,
AGO,2020,AO,Sub-Saharan Africa,LM,Angola,5.910000,8.133385,3.705161,6.087996,5.373190,6.227545,
ARG,2020,AR,Latin America & the Caribbean,UM,Argentina,4.870000,6.483768,4.796454,4.516018,3.086907,5.490538,
ARM,2020,AM,Europe & Central Asia,UM,Armenia,7.840000,7.975292,6.236215,9.553009,7.692708,7.756333,
...,...,...,...,...,...,...,...,...,...,...,...,...
VEN,1970,VE,Latin America & the Caribbean,,"Venezuela, RB",7.242943,8.349529,5.003088,9.621851,7.895993,5.209592,15289.0
VNM,1970,VN,East Asia & Pacific,,Vietnam,,,,,,,1172.0
YEM,1970,YE,Middle East & North Africa,,"Yemen, Rep.",,,,,,,1961.0
ZMB,1970,ZM,Sub-Saharan Africa,,Zambia,4.498763,5.374545,4.472812,5.137395,,5.307952,1710.0


In [13]:
!pip install xlwt
# save to file. We will need to reimport for the homework question
data.to_excel("EFWAndRGDP.xls")





In [14]:
data.keys()

Index(['ISO_Code_2', 'World Bank Region',
       'World Bank Current Income Classification, 1990-present (L=Low income, LM=Lower middle income, UM=Upper middle income, H=High income)',
       'Countries', 'Summary', 'Size of Government',
       'Legal System and Property Rights', 'Sound Money',
       'Freedom to Trade Internationally', 'Regulation', 'RGDP Per Capita'],
      dtype='object')

In [15]:
data = data[data.keys()[3:]]
data

ISO_Code_3,Year,Countries,Summary,Size of Government,Legal System and Property Rights,Sound Money,Freedom to Trade Internationally,Regulation,RGDP Per Capita
ALB,2020,Albania,7.640000,7.817077,5.260351,9.788269,8.222499,7.112958,
DZA,2020,Algeria,5.120000,4.409943,4.131760,7.630287,3.639507,5.778953,
AGO,2020,Angola,5.910000,8.133385,3.705161,6.087996,5.373190,6.227545,
ARG,2020,Argentina,4.870000,6.483768,4.796454,4.516018,3.086907,5.490538,
ARM,2020,Armenia,7.840000,7.975292,6.236215,9.553009,7.692708,7.756333,
...,...,...,...,...,...,...,...,...,...
VEN,1970,"Venezuela, RB",7.242943,8.349529,5.003088,9.621851,7.895993,5.209592,15289.0
VNM,1970,Vietnam,,,,,,,1172.0
YEM,1970,"Yemen, Rep.",,,,,,,1961.0
ZMB,1970,Zambia,4.498763,5.374545,4.472812,5.137395,,5.307952,1710.0


In [16]:
# Reassign instead of sorting in place, since data was created as a slice
data = data.sort_index()
data


ISO_Code_3,Year,Countries,Summary,Size of Government,Legal System and Property Rights,Sound Money,Freedom to Trade Internationally,Regulation,RGDP Per Capita
AGO,1970,Angola,,,,,,,2818.0000
AGO,1975,Angola,,,,,,,1710.0000
AGO,1980,Angola,,,,,,,1532.0000
AGO,1985,Angola,,,,,,,1242.0000
AGO,1990,Angola,,,,,,,1384.0000
...,...,...,...,...,...,...,...,...,...
ZWE,2016,Zimbabwe,6.121996,5.332597,4.056407,8.086016,6.404937,6.520805,1534.0000
ZWE,2017,Zimbabwe,5.599886,4.699843,4.071445,7.983888,4.503965,6.399757,1582.3662
ZWE,2018,Zimbabwe,5.876298,5.170946,4.041897,7.312324,6.396649,6.303135,1611.4052
ZWE,2019,Zimbabwe,4.719465,5.628359,4.026568,1.413372,6.397045,6.132583,


In [17]:
reg_vars = list(data.keys())
reg_vars

['Countries',
 'Summary',
 'Size of Government',
 'Legal System and Property Rights',
 'Sound Money',
 'Freedom to Trade Internationally',
 'Regulation',
 'RGDP Per Capita']

In [20]:
y_var = [reg_vars[-1]]
x_vars = reg_vars[2:-1]
y_var, x_vars

(['RGDP Per Capita'],
 ['Size of Government',
  'Legal System and Property Rights',
  'Sound Money',
  'Freedom to Trade Internationally',
  'Regulation'])

In [24]:
reg_data = data[reg_vars].dropna()

In [21]:
import statsmodels.api as sm

In [25]:
y = reg_data[y_var]
# Copy the slice so that adding a column does not modify a view of reg_data
x = reg_data[x_vars].copy()
x["Constant"] = 1
x


ISO_Code_3,Year,Size of Government,Legal System and Property Rights,Sound Money,Freedom to Trade Internationally,Regulation,Constant
AGO,2005,6.886311,3.129619,1.270081,5.356979,4.511067,1
AGO,2006,5.162277,3.238314,3.807267,5.302944,5.118114,1
AGO,2007,4.963676,3.224507,4.015297,5.139768,5.348260,1
AGO,2008,4.715589,3.382642,4.653201,5.181950,5.185843,1
AGO,2009,7.455501,3.394515,4.901540,5.503538,5.007256,1
...,...,...,...,...,...,...,...
ZWE,2014,6.771807,3.930143,7.664303,6.398692,5.039824,1
ZWE,2015,6.964753,4.108142,7.859669,6.509231,6.555970,1
ZWE,2016,5.332597,4.056407,8.086016,6.404937,6.520805,1
ZWE,2017,4.699843,4.071445,7.983888,4.503965,6.399757,1


In [26]:
results = sm.OLS(y, x).fit()

In [27]:
results.summary()

Dep. Variable:,RGDP Per Capita,R-squared:,0.486
Model:,OLS,Adj. R-squared:,0.485
Method:,Least Squares,F-statistic:,593.5
Date:,"Tue, 11 Apr 2023",Prob (F-statistic):,0.0
Time:,12:25:03,Log-Likelihood:,-34081.0
No. Observations:,3145,AIC:,68170.0
Df Residuals:,3139,BIC:,68210.0
Df Model:,5,,
Covariance Type:,nonrobust,,

,coef,std err,t,P>|t|,[0.025,0.975]
Size of Government,-2752.2138,202.274,-13.606,0.000,-3148.817,-2355.611
Legal System and Property Rights,3966.0733,196.152,20.219,0.000,3581.474,4350.672
Sound Money,902.3584,177.099,5.095,0.000,555.117,1249.599
Freedom to Trade Internationally,1279.8725,211.796,6.043,0.000,864.601,1695.144
Regulation,2141.0305,281.044,7.618,0.000,1589.982,2692.079
Constant,-1.66e+04,1627.397,-10.197,0.000,-1.98e+04,-1.34e+04

Omnibus:,2952.722,Durbin-Watson:,0.174
Prob(Omnibus):,0.0,Jarque-Bera (JB):,189244.77
Skew:,4.324,Prob(JB):,0.0
Kurtosis:,40.005,Cond. No.,113.0


In [28]:
predictor = results.predict()
reg_data[y_var[0] + "Predictor"] = predictor
reg_data

ISO_Code_3,Year,Countries,Summary,Size of Government,Legal System and Property Rights,Sound Money,Freedom to Trade Internationally,Regulation,RGDP Per Capita,RGDP Per CapitaPredictor
AGO,2005,Angola,4.214590,6.886311,3.129619,1.270081,5.356979,4.511067,3708.7706,-5474.902171
AGO,2006,Angola,4.531179,5.162277,3.238314,3.807267,5.302944,5.118114,4592.3373,3221.099672
AGO,2007,Angola,4.550966,4.963676,3.224507,4.015297,5.139768,5.348260,5773.5483,4184.555105
AGO,2008,Angola,4.643633,4.715589,3.382642,4.653201,5.181950,5.185843,6743.7482,5776.385317
AGO,2009,Angola,5.251115,7.455501,3.394515,4.901540,5.503538,5.007256,7087.6041,-1464.025089
...,...,...,...,...,...,...,...,...,...,...
ZWE,2014,Zimbabwe,5.999147,6.771807,3.930143,7.664303,6.398692,5.039824,1594.0000,6250.400915
ZWE,2015,Zimbabwe,6.449595,6.964753,4.108142,7.859669,6.509231,6.555970,1560.0000,9989.206335
ZWE,2016,Zimbabwe,6.121996,5.332597,4.056407,8.086016,6.404937,6.520805,1534.0000,14271.539452
ZWE,2017,Zimbabwe,5.599886,4.699843,4.071445,7.983888,4.503965,6.399757,1582.3662,13288.328954


In [31]:
y_hat = reg_data[y_var[0] + "Predictor"]
y_mean = reg_data[y_var[0]].mean()
y = reg_data[y_var[0]]

In [32]:
reg_data["Residuals"] = (y.sub(y_hat))
reg_data["Squared Explained"] = y_hat.sub(y_mean) ** 2
reg_data["Squared Residuals"] = (y.sub(y_hat)) ** 2
reg_data["Squared Totals"] = (y.sub(y_mean)) ** 2
reg_data

ISO_Code_3,Year,Countries,Summary,Size of Government,Legal System and Property Rights,Sound Money,Freedom to Trade Internationally,Regulation,RGDP Per Capita,RGDP Per CapitaPredictor,Residuals,Squared Explained,Squared Residuals,Squared Totals
AGO,2005,Angola,4.214590,6.886311,3.129619,1.270081,5.356979,4.511067,3708.7706,-5474.902171,9183.672771,4.693919e+08,8.433985e+07,1.557949e+08
AGO,2006,Angola,4.531179,5.162277,3.238314,3.807267,5.302944,5.118114,4592.3373,3221.099672,1371.237628,1.682067e+08,1.880293e+06,1.345186e+08
AGO,2007,Angola,4.550966,4.963676,3.224507,4.015297,5.139768,5.348260,5773.5483,4184.555105,1588.993195,1.441440e+08,2.524899e+06,1.085140e+08
AGO,2008,Angola,4.643633,4.715589,3.382642,4.653201,5.181950,5.185843,6743.7482,5776.385317,967.362883,1.084549e+08,9.357909e+05,8.924211e+07
AGO,2009,Angola,5.251115,7.455501,3.394515,4.901540,5.503538,5.007256,7087.6041,-1464.025089,8551.629189,3.116841e+08,7.313036e+07,8.286367e+07
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
ZWE,2014,Zimbabwe,5.999147,6.771807,3.930143,7.664303,6.398692,5.039824,1594.0000,6250.400915,-4656.400915,9.880662e+07,2.168207e+07,2.130593e+08
ZWE,2015,Zimbabwe,6.449595,6.964753,4.108142,7.859669,6.509231,6.555970,1560.0000,9989.206335,-8429.206335,3.845670e+07,7.105152e+07,2.140531e+08
ZWE,2016,Zimbabwe,6.121996,5.332597,4.056407,8.086016,6.404937,6.520805,1534.0000,14271.539452,-12737.539452,3.682612e+06,1.622449e+08,2.148145e+08
ZWE,2017,Zimbabwe,5.599886,4.699843,4.071445,7.983888,4.503965,6.399757,1582.3662,13288.328954,-11705.962754,8.422902e+06,1.370296e+08,2.133991e+08


In [33]:
SSR = reg_data["Squared Explained"].sum()
SSE = reg_data["Squared Residuals"].sum()
SST = reg_data["Squared Totals"].sum()
SSR, SSE, SST

(450042843462.08374, 476075689815.21045, 926118533277.295)
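
These three sums are enough to rebuild several of the headline statistics in the statsmodels summary. The sketch below copies the sums from the output above and takes `n` and `k` from the "No. Observations" and "Df Model" rows of the summary table. The formulas are the standard textbook definitions, and the variable names are chosen here for illustration.

```python
# Sums of squares copied from the output above
SSR = 450042843462.08374  # explained (regression) sum of squares
SSE = 476075689815.21045  # sum of squared residuals
SST = 926118533277.295    # total sum of squares

n = 3145  # "No. Observations" in the summary table
k = 5     # "Df Model" in the summary table (excludes the constant)

# Share of the variation in y that the model explains
r_squared = SSR / SST

# Adjusted R-squared penalizes additional regressors
adj_r_squared = 1 - (SSE / (n - k - 1)) / (SST / (n - 1))

# F-statistic: explained variance per regressor over residual variance
f_statistic = (SSR / k) / (SSE / (n - k - 1))

print(round(r_squared, 3), round(adj_r_squared, 3), round(f_statistic, 1))
```

The results should agree with the R-squared (0.486), Adj. R-squared (0.485) and F-statistic (593.5) reported by `results.summary()`, and SSR plus SSE should equal SST up to rounding.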