# Ray Fair — Econometrics and Presidential Elections: Assessing & Improving the Model

**The model**

$$V^{p} = \alpha_{0} + \alpha_{1}\, G \cdot I + \alpha_{2}\, P \cdot I + \alpha_{3}\, Z \cdot I + \alpha_{4} I + \alpha_{5} \mathrm{DUR} + \alpha_{6} \mathrm{DPER} + \alpha_{7} \mathrm{WAR} + \varepsilon$$

**Variables**

| Variable | Definition |
|---|---|
| `V^p` | Democratic share of the two-party presidential vote. |
| `G` | Growth rate of real per-capita GDP in the first 3 quarters of the on-term election year (annual rate). |
| `P` | Absolute value of the growth rate of the GDP deflator in the first 15 quarters of the administration (annual rate), except 1920, 1944, 1948 where values are zero. |
| `Z` | Number of quarters in the first 15 quarters where real per-capita GDP growth is > 3.2% (annual rate), except 1920, 1944, 1948 where zero. |
| `I` | 1 if Democratic presidential incumbent at election; −1 if Republican. |
| `DUR` | 0 if either party has been in the White House for one term; 1 [−1] for two consecutive Democratic [Republican] terms; 1.25 [−1.25] for three terms; 1.50 [−1.50] for four terms; etc. |
| `DPER` | 1 if a Democratic presidential incumbent runs again; −1 if a Republican incumbent runs again; 0 otherwise. |
| `WAR` | 1 for the elections of 1918, 1920, 1942, 1944, 1946, 1948; 0 otherwise. |
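
The `DUR` coding is the least obvious entry in the table, so here is a minimal sketch of it. The function name `dur` and its arguments are hypothetical and not part of Fair's data files; the code only encodes the rule stated in the table.

```python
def dur(consecutive_terms: int, party: str) -> float:
    """Duration variable for the party currently holding the White House.

    consecutive_terms: consecutive terms the party has held (>= 1).
    party: "D" for Democratic, "R" for Republican.
    """
    if consecutive_terms < 1:
        raise ValueError("consecutive_terms must be at least 1")
    if party not in ("D", "R"):
        raise ValueError("party must be 'D' or 'R'")
    if consecutive_terms == 1:
        return 0.0
    sign = 1.0 if party == "D" else -1.0
    # 1 for two terms, then 0.25 more for each additional term
    return sign * (1.0 + 0.25 * (consecutive_terms - 2))


print(dur(2, "R"))  # Republicans after two consecutive terms (as in 1988)
print(dur(3, "R"))  # Republicans after three consecutive terms (as in 1992)
```

These two calls reproduce the `DUR` values of −1.0 and −1.25 that the data lists for 1988 and 1992.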






First, import Professor Fair's data:

In [135]:
import pandas as pd

df = pd.read_fwf("data_text.txt", names=['t', 'VP', 'VC', 'I', 'DPER', 'DUR', 'WAR', 'G', 'P', 'Z'])   
df.tail(10)




Unnamed: 0,t,VP,VC,I,DPER,DUR,WAR,G,P,Z
28,1984,40.877,52.778,-1,-1,0.0,0,5.437,5.277,8
29,1988,46.168,54.011,-1,0,-1.0,0,2.343,2.817,4
30,1992,53.621,52.744,-1,-1,-1.25,0,3.053,3.21,3
31,1996,54.737,50.158,1,1,0.0,0,3.3,2.04,4
32,2000,50.262,49.819,1,0,1.0,0,2.013,1.644,7
33,2004,48.767,48.632,-1,-1,0.0,0,2.187,2.115,2
34,2008,53.689,55.535,-1,0,-1.0,0,-1.387,2.717,2
35,2012,52.01,50.681,1,1,0.0,0,1.181,1.421,2
36,2016,51.163,50.546,1,0,1.0,0,1.245,1.349,2
37,2020,52.249,51.556,-1,-1,0.0,0,-3.508,1.828,2


Notes on the 2024 presidential election:

Political:
- Does it make sense to include a gender-bias variable?
- More information on the candidates, such as age, origin, etc.
- There has been an economic news variable; does it make sense to include something like that?


Economic:

We already have GDP per capita and inflation.
- What about unemployment?
- Young voters seem to play a more important role in elections. What drives young men to become more conservative?
- Globalization could play a role. Could look at the dollar's strength.
- What about economic indicators that measure opportunity, e.g. mean years of schooling?
- What about crime?
- What about inequality? It affects politics and is indicative of political polarization.
- What about the government deficit?
- What about the average tax rate, etc.?
- Which economic indicators do people look at when voting?
- GDP growth compared to other countries. IMPORTANT ONE.
- What about financial markets?


To do: find papers that look at how these economic indicators correlate with political opinion.






Let's start looking at relative US GDP per capita as a better indicator for the government's success in creating prosperity. 

But first, let's specify Prof. Fair's classic model:

In [136]:
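import numpy as np
import pandas as pd
import statsmodels.api as sm  # needed for sm.add_constant and sm.OLS below

# 1) Clean strings and adjust year variable
df = df.replace(["na", "NA", ""], np.nan)
df["t"] = pd.to_numeric(df["t"], errors="coerce").astype("Int64")
# Keep elections from 1916 onward; .copy() avoids SettingWithCopyWarning later
df = df[df["t"] >= 1916].copy()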


# 2) Force numeric for the variables we need
cols = ['VP','I','DPER','DUR','WAR','G','P','Z']
df[cols] = df[cols].apply(pd.to_numeric, errors='coerce')

# 3) Build X, y with proper dtypes and add constant
df["G_I"] = df["G"] * df["I"]
df["P_I"] = df["P"] * df["I"]
df["Z_I"] = df["Z"] * df["I"]

X = df[["G_I", "P_I", "Z_I", "I", "DPER", "DUR", "WAR"]]
X = sm.add_constant(X)
y = df["VP"]

# 4) Fit
res = sm.OLS(y, X).fit()
print(res.summary())


                            OLS Regression Results                            
Dep. Variable:                     VP   R-squared:                       0.849
Model:                            OLS   Adj. R-squared:                  0.794
Method:                 Least Squares   F-statistic:                     15.30
Date:                Wed, 29 Oct 2025   Prob (F-statistic):           1.37e-06
Time:                        17:08:16   Log-Likelihood:                -63.324
No. Observations:                  27   AIC:                             142.6
Df Residuals:                      19   BIC:                             153.0
Df Model:                           7                                         
Covariance Type:            nonrobust                                         
                 coef    std err          t      P>|t|      [0.025      0.975]
------------------------------------------------------------------------------
const         48.2245      0.659     73.222      0.0
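
The interaction terms built in the cell above (`G_I`, `P_I`, `Z_I`) matter because `VP` is the Democratic vote share: strong growth should raise it only when Democrats are the incumbent party. Multiplying by `I` flips the sign under Republican incumbency. A small sketch using two rows from the data shown earlier (the helper name `orient` is hypothetical):

```python
def orient(value: float, incumbent: int) -> float:
    """Sign an economic variable toward the incumbent party (I = 1 or -1)."""
    if incumbent not in (1, -1):
        raise ValueError("incumbent must be 1 (Democratic) or -1 (Republican)")
    return value * incumbent


# 1984: Republican incumbent (I = -1), G = 5.437
g_i_1984 = orient(5.437, -1)
# 1996: Democratic incumbent (I = 1), G = 3.3
g_i_1996 = orient(3.3, 1)
print(g_i_1984, g_i_1996)
```

With this coding, a positive coefficient on `G_I` means that growth helps whichever party holds the White House.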

Okay, now let's find some data on US GDP per capita growth relative to world GDP per capita growth. First, a quick check for multicollinearity among the regressors:

In [137]:
X.corr()

Unnamed: 0,const,G_I,P_I,Z_I,I,DPER,DUR,WAR
const,,,,,,,,
G_I,,1.0,0.08724,0.364123,0.226406,0.213755,0.056471,-0.15584
P_I,,0.08724,1.0,0.760571,0.802434,0.692367,0.493538,0.070719
Z_I,,0.364123,0.760571,1.0,0.850268,0.674076,0.542549,-0.004848
I,,0.226406,0.802434,0.850268,1.0,0.793171,0.731591,0.340693
DPER,,0.213755,0.692367,0.674076,0.793171,1.0,0.336551,0.280848
DUR,,0.056471,0.493538,0.542549,0.731591,0.336551,1.0,0.48486
WAR,,-0.15584,0.070719,-0.004848,0.340693,0.280848,0.48486,1.0


In [138]:
from statsmodels.stats.outliers_influence import variance_inflation_factor
import pandas as pd

vif = pd.DataFrame()
vif["variable"] = X.columns
vif["VIF"] = [variance_inflation_factor(X.values, i) for i in range(X.shape[1])]
print(vif)


  variable        VIF
0    const   1.292287
1      G_I   1.337017
2      P_I   3.711272
3      Z_I   6.012925
4        I  15.303838
5     DPER   4.652973
6      DUR   4.214724
7      WAR   2.264246
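
To read these numbers, recall that the VIF of regressor j equals 1 / (1 − R²_j), where R²_j comes from regressing that column on all the other columns. The sketch below inverts the formula for the values printed above; `implied_r_squared` is a hypothetical helper name, not a statsmodels function.

```python
def implied_r_squared(vif: float) -> float:
    """R^2 of a regressor on the other regressors, implied by its VIF."""
    if vif < 1.0:
        raise ValueError("a VIF cannot be smaller than 1")
    return 1.0 - 1.0 / vif


# VIF values copied from the output above
vifs = {
    "G_I": 1.337017, "P_I": 3.711272, "Z_I": 6.012925, "I": 15.303838,
    "DPER": 4.652973, "DUR": 4.214724, "WAR": 2.264246,
}
for name, vif in vifs.items():
    print(f"{name:5s} VIF={vif:7.3f}  implied R^2={implied_r_squared(vif):.3f}")
```

For `I` the implied R² is roughly 0.93, so most of its variation is already explained by the other regressors. A common rule of thumb treats a VIF above 10 as a sign of problematic collinearity.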


## US GDP per capita relative to OECD GDP per capita

In [139]:
import pandas as pd

GDP = pd.read_csv("GDP.csv")
GDP = GDP.iloc[:, [2, 4]]

GDP["t"] = pd.to_numeric(
    GDP["Time"].astype(str).str.replace(",", ""), 
    errors="coerce"
).astype("Int64")

GDP = GDP.drop(columns=["Time"])

GDP.head(10)

Unnamed: 0,GDP per capita (current US$) [NY.GDP.PCAP.CD],t
0,1311.37045,1960
1,1379.679072,1961
2,1471.839643,1962
3,1562.980986,1963
4,1687.383687,1964
5,1809.565133,1965
6,1956.071051,1966
7,2065.885535,1967
8,2215.304542,1968
9,2412.852797,1969


Now we have to calculate the annual growth rate of GDP per capita. Note that this series is in current US$, so the growth rate is nominal:

In [140]:
GDP['G_OECD'] = GDP['GDP per capita (current US$) [NY.GDP.PCAP.CD]'].pct_change() * 100
GDP.head(10)


Unnamed: 0,GDP per capita (current US$) [NY.GDP.PCAP.CD],t,G_OECD
0,1311.37045,1960,
1,1379.679072,1961,5.208949
2,1471.839643,1962,6.679856
3,1562.980986,1963,6.192342
4,1687.383687,1964,7.959323
5,1809.565133,1965,7.240881
6,1956.071051,1966,8.096195
7,2065.885535,1967,5.614034
8,2215.304542,1968,7.232686
9,2412.852797,1969,8.917431
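
The growth-rate step can be checked by hand. The sketch below recomputes the first two growth rates from the levels in the table above using plain Python; `pct_growth` is a hypothetical helper that mirrors what `pct_change() * 100` does on a series without gaps.

```python
def pct_growth(levels):
    """Period-over-period growth in percent; the first entry has no prior value."""
    if not levels:
        return []
    out = [None]
    for prev, cur in zip(levels, levels[1:]):
        out.append((cur / prev - 1.0) * 100.0)
    return out


levels = [1311.37045, 1379.679072, 1471.839643]  # 1960-1962 from the table above
growth = pct_growth(levels)
print(growth)
```

The results agree with the `G_OECD` column for 1961 and 1962 up to rounding of the inputs.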


In [141]:
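# Inner join on election year: only years present in both tables are kept
merged = pd.merge(df, GDP, on="t")
# US real growth as a percentage of OECD nominal growth in the election year
merged['G_relative'] = merged['G'] / merged['G_OECD'] * 100
merged = merged[merged['t'] >= 1961].copy()
merged.head(10)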



Unnamed: 0,t,VP,VC,I,DPER,DUR,WAR,G,P,Z,G_I,P_I,Z_I,GDP per capita (current US$) [NY.GDP.PCAP.CD],G_OECD,G_relative
1,1964,61.203,57.324,1,1,0.0,0,5.098,1.234,9,5.098,1.234,9,1687.383687,7.959323,64.050675
2,1968,49.425,50.921,1,0,1.0,0,5.109,3.086,7,5.109,3.086,7,2215.304542,7.232686,70.637661
3,1972,38.209,52.66,-1,-1,0.0,0,5.863,4.812,4,-5.863,-4.812,-4,3280.676647,14.828081,39.539842
4,1976,51.049,56.85,-1,0,-1.0,0,3.827,7.476,5,-3.827,-7.476,-5,5187.973679,7.337678,52.155464
5,1980,44.842,51.383,1,1,0.0,0,-3.596,7.827,5,-3.596,7.827,5,8696.385082,10.034068,-35.837908
6,1984,40.877,52.778,-1,-1,0.0,0,5.437,5.277,8,-5.437,-5.277,-8,9106.131086,3.696107,147.100733
7,1988,46.168,54.011,-1,0,-1.0,0,2.343,2.817,4,-2.343,-2.817,-4,14693.571073,11.833991,19.798899
8,1992,53.621,52.744,-1,-1,-1.25,0,3.053,3.21,3,-3.053,-3.21,-3,19011.452866,6.569195,46.474494
9,1996,54.737,50.158,1,1,0.0,0,3.3,2.04,4,3.3,2.04,4,22192.551283,0.062069,5316.649564
10,2000,50.262,49.819,1,0,1.0,0,2.013,1.644,7,2.013,1.644,7,23025.787524,1.490352,135.068721
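
The 1996 row shows a weakness of defining relative growth as a ratio: when the denominator is close to zero, the ratio explodes. The sketch below contrasts the ratio used here with a simple growth gap in percentage points. The gap is an alternative introduced only for illustration (an assumption, not what this notebook estimates), and both function names are hypothetical.

```python
def relative_ratio(g_us: float, g_ref: float) -> float:
    """US growth as a percentage of reference growth (the definition used here)."""
    if g_ref == 0:
        raise ZeroDivisionError("reference growth is zero; ratio undefined")
    return g_us / g_ref * 100.0


def relative_gap(g_us: float, g_ref: float) -> float:
    """Alternative (assumption): growth gap in percentage points."""
    return g_us - g_ref


# (G, G_OECD) pairs copied from the table above
rows = {1992: (3.053, 6.569195), 1996: (3.3, 0.062069), 2000: (2.013, 1.490352)}
for year, (g_us, g_ref) in rows.items():
    ratio = relative_ratio(g_us, g_ref)
    gap = relative_gap(g_us, g_ref)
    print(year, round(ratio, 1), round(gap, 3))
```

The gap stays on the same scale as `G` in all three years, while the ratio for 1996 is two orders of magnitude larger than its neighbours and would dominate a regression on 15 observations.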


In [142]:
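import numpy as np
import pandas as pd
import statsmodels.api as sm

# 3) Build X, y with proper dtypes and add constant
merged["G_relative_I"] = merged["G_relative"] * merged["I"]
merged["P_I"] = merged["P"] * merged["I"]
merged["Z_I"] = merged["Z"] * merged["I"]

# WAR is 0 for every election after 1948, so it carries no information in this
# subsample; the summary reports Df Model = 7 for the 8 regressors below.
X_oecd = merged[["G_I", "G_relative_I", "P_I", "Z_I", "I", "DPER", "DUR", "WAR"]]
X_oecd = sm.add_constant(X_oecd)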
y_oecd = merged["VP"]

# 4) Fit
res_oecd = sm.OLS(y_oecd, X_oecd).fit()
print(res_oecd.summary())


                            OLS Regression Results                            
Dep. Variable:                     VP   R-squared:                       0.858
Model:                            OLS   Adj. R-squared:                  0.717
Method:                 Least Squares   F-statistic:                     6.056
Date:                Wed, 29 Oct 2025   Prob (F-statistic):             0.0149
Time:                        17:08:16   Log-Likelihood:                -32.164
No. Observations:                  15   AIC:                             80.33
Df Residuals:                       7   BIC:                             85.99
Df Model:                           7                                         
Covariance Type:            nonrobust                                         
                   coef    std err          t      P>|t|      [0.025      0.975]
--------------------------------------------------------------------------------
const           49.1891      0.992     49.594   



## US GDP per capita relative to German GDP per capita

In [143]:
import pandas as pd

Germany = pd.read_csv("Germany.csv")
Germany = Germany.iloc[:, [2, 4]]

Germany["t"] = pd.to_numeric(
    Germany["Time"].astype(str).str.replace(",", ""), 
    errors="coerce"
).astype("Int64")

Germany = Germany.drop(columns=["Time"])

Germany['G_Germany'] = Germany['Germany [DEU]'].pct_change() * 100
Germany.head(10)


merged_germany = pd.merge(df, Germany, on="t")
merged_germany['G_relative'] = merged_germany['G'] / merged_germany['G_Germany'] * 100
merged_germany = merged_germany[merged_germany['t'] >= 1961].copy()
merged_germany.head(10)

import statsmodels.api as sm

# 3) Build X, y with proper dtypes and add constant
merged_germany["G_relative_I"] = merged_germany["G_relative"] * merged_germany["I"]
merged_germany["P_I"] = merged_germany["P"] * merged_germany["I"]
merged_germany["Z_I"] = merged_germany["Z"] * merged_germany["I"]

X_germany = merged_germany[["G_relative_I", "P_I", "Z_I", "I", "DPER", "DUR", "WAR"]]
X_germany = sm.add_constant(X_germany)
y_germany = merged_germany["VP"]

# 4) Fit
res_germany = sm.OLS(y_germany, X_germany).fit()
print(res_germany.summary())


                            OLS Regression Results                            
Dep. Variable:                     VP   R-squared:                       0.698
Model:                            OLS   Adj. R-squared:                  0.471
Method:                 Least Squares   F-statistic:                     3.077
Date:                Wed, 29 Oct 2025   Prob (F-statistic):             0.0723
Time:                        17:08:16   Log-Likelihood:                -37.846
No. Observations:                  15   AIC:                             89.69
Df Residuals:                       8   BIC:                             94.65
Df Model:                           6                                         
Covariance Type:            nonrobust                                         
                   coef    std err          t      P>|t|      [0.025      0.975]
--------------------------------------------------------------------------------
const           47.6055      1.291     36.868   



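Neither relative GDP per capita indicator is statistically significant, whether it is added alongside the US growth term (OECD) or replaces it (Germany). This suggests that American voters form their opinion on whom to vote for independently of how their government performs relative to other countries. The evidence is weak, though: each regression has only 15 observations, the OECD series is in current US$ while `G` is real growth, and the ratio `G_relative` explodes when the denominator is close to zero (5316.6 for 1996).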
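
With only 15 elections after 1961, every extra regressor is expensive in degrees of freedom. Adjusted R² makes that cost explicit. The sketch below recomputes it from the R², number of observations, and residual degrees of freedom printed in the summaries above; `adjusted_r2` is a hypothetical helper, and because the printed R² values are rounded, the results can differ from the reported ones in the third decimal.

```python
def adjusted_r2(r2: float, n_obs: int, df_resid: int) -> float:
    """Adjusted R^2 = 1 - (1 - R^2) * (n - 1) / df_resid."""
    if df_resid <= 0:
        raise ValueError("df_resid must be positive")
    return 1.0 - (1.0 - r2) * (n_obs - 1) / df_resid


# (R^2, observations, residual df) copied from the regression summaries above
fits = {
    "classic (1916-2020)": (0.849, 27, 19),
    "OECD-relative": (0.858, 15, 7),
    "Germany-relative": (0.698, 15, 8),
}
for name, (r2, n, dfr) in fits.items():
    print(f"{name:22s} R2={r2:.3f}  adj R2={adjusted_r2(r2, n, dfr):.3f}")
```

The samples differ (27 versus 15 elections), so these figures are not a like-for-like model comparison. They mainly show how quickly the penalty grows when the sample shrinks.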

## Gini Coefficient

Next, I include the Gini coefficient for the US, as I believe economic inequality has a significant effect on political polarization and perceived justice.

In [159]:
import pandas as pd


Gini = pd.read_csv("Gini.csv")
Gini = Gini.iloc[:, [2, 4]]

Gini["t"] = pd.to_numeric(
    Gini["Time"].astype(str).str.replace(",", ""), 
    errors="coerce"
).astype("Int64")
Gini = Gini.drop(columns=["Time"])
Gini['Gini'] = Gini['United States [USA]']

merged_gini = pd.merge(df, Gini, on="t")
merged_gini = merged_gini[merged_gini['t'] >= 1963].copy()
merged_gini.head(10)

import statsmodels.api as sm

cols = ["VP","I","DPER","DUR","WAR","G_I","P_I","Z_I","Gini"]
for c in cols:
    merged_gini[c] = pd.to_numeric(merged_gini[c], errors="coerce")

reg = merged_gini.dropna(subset=cols).copy()

X_gini = reg[["G_I", "P_I", "Z_I", "I", "DPER", "DUR", "WAR", "Gini"]].astype(float)
X_gini = sm.add_constant(X_gini)
y_gini = reg["VP"]


res_gini = sm.OLS(y_gini, X_gini).fit()
print(res_gini.summary())


                            OLS Regression Results                            
Dep. Variable:                     VP   R-squared:                       0.881
Model:                            OLS   Adj. R-squared:                  0.762
Method:                 Least Squares   F-statistic:                     7.403
Date:                Wed, 29 Oct 2025   Prob (F-statistic):            0.00850
Time:                        17:23:53   Log-Likelihood:                -30.854
No. Observations:                  15   AIC:                             77.71
Df Residuals:                       7   BIC:                             83.37
Df Model:                           7                                         
Covariance Type:            nonrobust                                         
                 coef    std err          t      P>|t|      [0.025      0.975]
------------------------------------------------------------------------------
const         29.1816     16.876      1.729      0.1



The Gini coefficient doesn't add explanatory power to the model either. 

## Financial Markets as an indicator for economic sentiment

The Fed publishes data on the NASDAQ Composite Index. The idea is to use the average return over the year before each election.
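
One way this could be computed is sketched below, assuming month-end index levels are available for the 12 months before an election. The function name `average_return` and the sample levels are hypothetical; the numbers are made up for illustration and are not actual NASDAQ data.

```python
def average_return(prices):
    """Mean of period-over-period percentage returns for a price series."""
    if len(prices) < 2:
        raise ValueError("need at least two prices")
    returns = [(cur / prev - 1.0) * 100.0 for prev, cur in zip(prices, prices[1:])]
    return sum(returns) / len(returns)


# Hypothetical month-end index levels: 13 levels give 12 monthly returns
levels = [100.0, 102.0, 101.0, 104.0, 106.0, 105.0, 108.0,
          110.0, 109.0, 112.0, 115.0, 114.0, 117.0]
print(round(average_return(levels), 3))
```

Whether to average monthly returns or use the single 12-month return is a design choice that still needs to be made. The resulting variable would then be multiplied by `I`, like the other economic regressors.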