# Part 4 - Regressions

Serena Weimer

Data Analytics

December 2, 2020

In [1]:
library(dplyr)
library(tidyverse)
library(WDI)
library(plm)
library(estimatr)
library(lmtest)
library(corrplot)
library(stargazer)
library(RColorBrewer)
library(Hmisc)
#install.packages("car")
library(sandwich)
library(car)

"package 'dplyr' was built under R version 3.6.3"
Attaching package: 'dplyr'

The following objects are masked from 'package:stats':

    filter, lag

The following objects are masked from 'package:base':

    intersect, setdiff, setequal, union

"package 'tidyverse' was built under R version 3.6.3"-- Attaching packages --------------------------------------- tidyverse 1.3.0 --
v ggplot2 3.3.2     v purrr   0.3.2
v tibble  3.0.4     v stringr 1.4.0
v tidyr   1.1.2     v forcats 0.4.0
v readr   1.3.1     
"package 'tidyr' was built under R version 3.6.3"-- Conflicts ------------------------------------------ tidyverse_conflicts() --
x dplyr::filter() masks stats::filter()
x dplyr::lag()    masks stats::lag()
"package 'plm' was built under R version 3.6.3"
Attaching package: 'plm'

The following objects are masked from 'package:dplyr':

    between, lag, lead

"package 'lmtest' was built under R version 3.6.3"Loading required package: zoo

Attaching package: 'zoo'

The following objects 

In [2]:
#connecting to my path
setwd('C:/Users/seren/OneDrive/Desktop/SustainDevel_project/')
#calling my csv file and naming it data, and printing it
data1 <- read.csv('alldata_final.csv')
#droping columns
data = subset(data1, select = -c(X,GDP_Growth) )
data

Country,Code,Year,GHG,GDP_PerCap,AvgTotYrs_School,Unemploy,MedHighTech,FemLF_Partici,NatDis_Death,Gini,Life_Expect,Renewables
Afghanistan,AFG,2003,16.78,190.6838,2.4,,13.590844,15.323,137.0000,,57.271,0.378177967
Afghanistan,AFG,2004,16.35,211.3821,2.5,,14.275717,15.570,18.0000,,57.772,0.365563605
Afghanistan,AFG,2005,17.09,242.0313,2.6,,2.717657,15.801,561.0001,,58.290,0.298041613
Afghanistan,AFG,2006,20.74,263.7337,2.7,,2.701550,15.530,382.0000,,58.826,0.294850346
Afghanistan,AFG,2007,24.78,359.6932,2.9,,3.063960,15.268,296.0000,,59.375,0.276491618
Afghanistan,AFG,2008,31.39,364.6607,3.0,,9.559297,15.057,0.0000,,59.930,0.135097662
Afghanistan,AFG,2009,37.72,438.0760,3.1,,10.006840,14.938,101.0000,,60.484,0.120773378
Afghanistan,AFG,2010,44.75,543.3030,3.2,,9.483964,14.935,350.0000,,61.028,0.107082036
Afghanistan,AFG,2011,58.62,591.1628,3.3,,9.285154,15.339,62.0000,,61.553,0.083497175
Afghanistan,AFG,2012,67.05,641.8715,3.4,,8.927933,15.850,333.0000,,62.054,0.118564723


In [3]:
#running basic ols regression
base <- plm(log(GDP_PerCap) ~ GHG + AvgTotYrs_School + MedHighTech + Unemploy + FemLF_Partici + Gini + NatDis_Death + Renewables + Life_Expect, data = data, index = c("Code", "Year"), model = "within")
#running a regression with oneway fixed effects on my panel data 
base1 <- lm(log(GDP_PerCap) ~ GHG + AvgTotYrs_School + MedHighTech + Unemploy + FemLF_Partici + Gini + NatDis_Death + Renewables + Life_Expect, data = data)
#running a regression with twoway fixed effects on my panel data
regress2 <- plm(log(GDP_PerCap) ~ GHG + AvgTotYrs_School + MedHighTech + Unemploy + FemLF_Partici + NatDis_Death + Gini + Renewables + Life_Expect, data = data, index = c("Code", "Year"), effect= "twoways", model = "within")

#regression with clustered standard errors by years
regress3 <- plm(log(GDP_PerCap) ~ GHG + AvgTotYrs_School + MedHighTech + Unemploy + FemLF_Partici + NatDis_Death + Gini + Renewables + Life_Expect, data = data, index = c("Code", "Year"), effect= "twoways", model = "within")
clust <- vcovHC(base1, type = "HC1", cluster = "time")
clust1 <- sqrt(diag(clust))
#https://cran.r-project.org/web/packages/stargazer/vignettes/stargazer.pdf
#clustered standard error grouped by years

In [5]:
#putting regression into professional table
stargazer(base1, base, regress2, regress3, type = "text",  se = list(NULL, NULL, NULL, clust1), single.row = FALSE, align=TRUE,
    dep.var.labels=c("GDP Per Capita"), covariate.labels=c("Greenhouse Gas Emissions","Average Total Years of School",
    "Meduim/High Tech Industry","Umemployed","Female Labor Force Participation","Gini Index",
    "Natural Disaster Deaths", "Renewable Energy","Life Expectancy Rate"), no.space=TRUE, 
    title = "Table 2: Results", notes = "()std.error", out = "C:/Users/seren/OneDrive/Desktop/SustainDevel_project/table2.html")

#https://www.jakeruss.com/cheatsheets/stargazer/#make-an-addition-to-the-existing-note-section


Table 2: Results
                                                                        Dependent variable:                                       
                                 -------------------------------------------------------------------------------------------------
                                                                          GDP Per Capita                                          
                                           OLS                                             panel                                  
                                                                                           linear                                 
                                           (1)                      (2)                      (3)                     (4)          
----------------------------------------------------------------------------------------------------------------------------------
Greenhouse Gas Emissions                -0.00004*                

# Exploratory Analysis

The most basic regression preformed in this study uses the Ordinary Least Squares (OLS) method with panel data that had 887 observations. Looking at the r-squared, it can be seen that these variables account for approximately 78% of growth. Further, six variables were found to be significant. We can be 99% confident that Average total years of school, the Medium High Tech Industry Value Added, the Female Labor Force Participation Rate, and Life expectancy rate are in the 1% significance level. Where as, Greenhouse Gas Emissions fall within the 10% significance level with a 90% confidence interval.

The second regression ran is the OLS method with a one way country fixed effects. It was quite interesting to find that all of my variables were significant in the 95% and 99% confidence intervals which account for approximately 59% of GDP. The third regression run uses the OLS two way fixed effects model. The two-way fixed effects control for differneces in predictors for both time and country. My hypothesis was that this would have been the best working model, however these results held variables that did not have all significant p-values. With this model, we can be 99% confident that with Greenhouse Gas Emissions, Unemployment, the Gini Index, and renewable energy we can reject the null hypothesis. Furhter, the final regression ran invloved the OLS method with two-way fixed effects and a clustered standard error. I ran this regression to see if how my results changed from regression 3 when considering the correlation between observations. However, my results did not change significantly. The biggest change that occued was the Life expectancy variable originally had a 5% significance became insignificant.  

After considering all four regression my preffered model is regression 2, due to the number of significant findings. The coefficents can be interpreted to see the effect these variables individually have on the growth of GDP Per capita, ceteris parabus. Specifically, an increase of one unit of Greenhouse Gas Emissions will increase GDP by .02%. For every year addded to the average total years of schooling, GDP will increase by 21%. Further for every perecent increase in the medium and high tech industry, gdp growth will incrase by .6% For every year added on to the expected life span, GDP will increase by 7.7%. The variables that are negatively correlated with GDP are Unemployment, the Gini Index, natural disaster deaths and renewable energy. A 1% increase in unemployment amounts to a 2.1% decrease in GDP. A 1 unit increase in the GINI Index leads to a 2.8% decrease in Growth of GDP Per Capita. I think that the most interesting results from this study is the GINI Indexes effect on GDP Growth. From these results, my understanding is that inequality is bad for growth and a country with greater inequality will have slower GDP Growth.


Looking toward future research, one thing I want to improve to better understand my research, is to add more indpendent variables as proxies for sustainable development. The biggest difficulty with completeing this is the lack of data there is available for enviormental and social ewuity variables, like deforestation, land use, and malnourishment. These were all variables I originally planned on incoorportating my study, however the number of observations would have been compromised. I felt that inorder to get the best understanding of how sustainable development effected economic growth; it was more important to get a larger pool of observations to study.   