# Ray Fair — Econometrics and Presidential Elections: Application to NYC mayoral elections
**The model**

$$V^{p} = \alpha_{0} + \alpha_{1} G + \alpha_{2} P + \alpha_{3} Z + \alpha_{4} I + \alpha_{5} \mathrm{DUR} + \alpha_{6} \mathrm{DPER} + \alpha_{7} \mathrm{WAR} + \varepsilon$$

**Variables**

| Variable | Definition |
|---|---|
| `V^p` | Democratic share of the two-party presidential vote. |
| `G` | Growth rate of real per-capita GDP in the first 3 quarters of the on-term election year (annual rate). |
| `P` | Absolute value of the growth rate of the GDP deflator in the first 15 quarters of the administration (annual rate), except 1920, 1944, 1948 where values are zero. |
| `Z` | Number of quarters in the first 15 quarters where real per-capita GDP growth is &gt; 3.2% (annual rate), except 1920, 1944, 1948 where zero. |
| `I` | 1 if Democratic presidential incumbent at election; −1 if Republican. |
| `DUR` | 0 if either party has been in the White House for one term; 1 [−1] for two consecutive Democratic [Republican] terms; 1.25 [−1.25] for three terms; 1.50 [−1.50] for four terms; etc. |
| `DPER` | 1 if a Democratic presidential incumbent runs again; −1 if a Republican incumbent runs again; 0 otherwise. |
| `WAR` | 1 for the elections of 1918, 1920, 1942, 1944, 1946, 1948; 0 otherwise. |






The New York City Mayor Elections are coming up and I'd like use Prof. Fair's model to explain election outcomes. It will help me to understand whether economic indicator influence local voter behaviour. The main problem is the limitation of data and number of time periods. Since mayoral elections are hold only every four years, it is difficult to obtain a large sample of data. Moreover, detailed data sets on local economic data for NYC are hard to obtain. I decided to use this project as an experimental study and a starting point for further models. I am inspired by Prof. Ray's work and want to use this application to better understand his work and built upon it. The NYC mayor elections is a very specific application for which likely Prof. Ray's model will proof to be insuitable since it was trained on presidential election data. However, as said, this might be a starting point for either thinking about economic models for predicting local elections, a project on causally explaining voting behaviour (to what extend are voters influenced by federal policy in local elections) or the application of Prof. Fair's model to other countries, like my own, Germany. 



First, I import data on NYC mayoral election outcomes, sourced from wikipedia. This data gives me a table with all the mayoral candidates, the voting year and their results. Inspired by Prof. Fair, I'll use the percentage for the democratic candidate as my dependent variable. One issue I encountered and should put more fought into is the problem of independent candidates that are in between conservative and progressive. When either a republican or a democrat wins the elections, this problem is neligable, but in 1913 and 2009 an independent candidate won the election which the model doesn't consider. 

In [1]:
import pandas as pd

NYC = pd.read_csv("nyc_mayoral_elections_1897_2025.csv")
NYC.head(100)


Unnamed: 0,year,total_votes,democratic_candidate,democratic_votes,democratic_pct,major_third_party_candidate,third_party_votes,third_party_pct,republican_candidate,republican_votes,republican_pct,other_major_candidates,other_votes,other_pct
0,1897,523560.0,Robert A. Van Wyck,233997.0,44.7,"Seth Low, Citizens Union",151540.0,28.9,Benjamin F. Tracy,101863.0,19.5,"Henry George, Jeff",21693.0,4.1
1,1901,579301.0,Edward M. Shepard,265177.0,45.8,"Seth Low, R-Citizens Union (Fusion)",296813.0,51.2,,,,,,
2,1903,589898.0,George B. McClellan Jr.,314782.0,53.4,"Seth Low, R-Citizens Union (Fusion)",252086.0,42.7,,,,,,
3,1905,604673.0,George B. McClellan Jr.,228407.0,37.8,"William R. Hearst, Municipal Ownership League",224989.0,37.2,William M. Ivins,137184.0,22.7,,,
4,1909,594902.0,William J. Gaynor,250378.0,42.1,"William R. Hearst, Civic Alliance",154187.0,25.9,"Otto Bannard, R-Fusion",177313.0,29.8,,,
5,1913,627017.0,Edward McCall,233919.0,37.3,"John P. Mitchel, Fusion",358181.0,57.1,,,,"Charles E. Russell, Soc",32057.0,5.1
6,1917,673300.0,John Francis Hylan,314010.0,46.6,"John P. Mitchel, Fusion (inc.)",155497.0,23.1,William M. Bennett,56438.0,8.4,"Morris Hillquit, Soc",145332.0,21.6
7,1921,1168767.0,John Francis Hylan (inc.),750247.0,64.2,,,,"Henry M. Curran, R-Coalition",332846.0,28.5,"Jacob Panken, Soc",82607.0,7.1
8,1925,1137966.0,Jimmy Walker,748687.0,65.8,,,,Frank Waterman,346564.0,30.5,"Norman Thomas, Soc",39574.0,3.5
9,1929,1429385.0,Jimmy Walker (inc.),867522.0,60.7,,,,Fiorello La Guardia,367675.0,25.7,"Norman Thomas, Soc",175697.0,12.3


I said that I wanted to look at the extend that federal economic data affects local election results. However, initially I was planning to look at local economic data and it's effect on local election results. I tried to find data on the unemployment rate, income, etc. in NYC. Unfortunately, there is no good historical data that I could use. The only usable data I found was on CPI and the Rent price index, which dates back to 1914. However, because of the NA's the CPI data is only usable after 1945 and the rent data even later. This causes a big problem for the robustness of the model. In the current election campaigning, rent prices and inflation play a big role, and I though it'd be an interesting variable to look at. It's definetly worth looking into further. 

In [2]:
import pandas as pd

CPI = pd.read_csv("CUURA101SA0.csv")
CPI.head(100)

Rent = pd.read_csv("CUURA101SEHA.csv")
Rent.head(100)

Unnamed: 0,observation_date,CUURA101SEHA
0,1914-12-01,18.9
1,1915-01-01,
2,1915-02-01,
3,1915-03-01,
4,1915-04-01,
...,...,...
95,1922-11-01,
96,1922-12-01,29.7
97,1923-01-01,
98,1923-02-01,


Here, I import the quarterly GDP data from Prof. Fair's website for the whole US, as I couldn't find historic local economic data for NYC. 

In [3]:
import pandas as pd

# Example colspecs: adjust (start, end) to your file’s layout
colspecs = [(0, 7), (9, 18), (19, 28), (30, 38)]
GDP_quarterly = pd.read_fwf(
    "Quarterly GDP Data.txt",
    colspecs=colspecs,
    names=["Quarter", "(Y)", "(X)", "(P)"],
    header=None
)
GDP_quarterly = GDP_quarterly.iloc[3:]
GDP_quarterly.tail(100)


Unnamed: 0,Quarter,(Y),(X),(P)
484,1997.4,8765.90,11722.70,274.246
485,1998.1,8866.50,11839.90,274.950
486,1998.2,8969.70,11949.50,275.703
487,1998.3,9121.10,12099.20,276.564
488,1998.4,9294.00,12294.70,277.400
...,...,...,...,...
579,2021.3,23550.40,19672.60,332.297
580,2021.4,24349.10,20006.20,332.584
581,2022.1,24740.50,19924.10,332.749
582,2022.2,25248.50,19895.30,332.940


NYC Mayoral Elections are always held in November, so I will always take the first 3 quarters of the election year to calculate G. In Prof. Fair's model, G is the growth rate of Real GDP per capita in the first 3 quarters before the election. For the NYC election that's Q1-Q3. The following code, creates variable G: 



In [4]:
# Create Real GDP per Capita variable
GDP_quarterly['Real GDP per Capita'] = (GDP_quarterly['(X)'].astype(float)*100)/ GDP_quarterly['(P)'].astype(float)

# Get rid of years before 1914
GDP_quarterly = GDP_quarterly[GDP_quarterly['Quarter'].str.slice(0,4).astype(int) >= 1914]

# Get rid of all years that are not mayoral election years: 1917 1921 1925 1929 1932 1933 1937 1941 1945 1949 1950 1953 1957 1961 1965 1969 1973 1977 1981 1985 1989 1993 1997 2001 2005 2009 2013 2017 2021
GDP_quarterly = GDP_quarterly[GDP_quarterly['Quarter'].str.slice(0,4).astype(int).isin([1917, 1921, 1925, 1929, 1932, 1933, 1937, 1941, 1945, 1949, 1950, 1953, 1957, 1961, 1965, 1969, 1973, 1977, 1981, 1985, 1989, 1993, 1997, 2001, 2005, 2009, 2013, 2017, 2021])]

# Calculate annual percentage change in Real GDP per Capita for first three quarters of each mayoral election year, i.e. taking the percentage change from Q1 to Q3 of the election year and extrapolating to an annual rate by taking it to the power of 4/3

# Extract year + quarter number
GDP_quarterly["Year"] = GDP_quarterly["Quarter"].astype(str).str.split(".").str[0].astype(int)
GDP_quarterly["Q"] = GDP_quarterly["Quarter"].astype(str).str.split(".").str[1].astype(int)

# Pivot to wide format: each year gets Q1, Q3 columns
wide = GDP_quarterly.pivot(index="Year", columns="Q", values="Real GDP per Capita")

# Ensure needed quarters exist
wide = wide.rename(columns={1: "Q1", 3: "Q3"})

# Compute G
wide["G"] = ((wide["Q3"] / wide["Q1"])**(4/3) - 1) * 100

# Merge back if needed
GDP_quarterly = GDP_quarterly.merge(
    wide["G"],
    on="Year",
    how="left"
)
GDP_quarterly.head(100)


G = GDP_quarterly[~GDP_quarterly["Q"].isin([1,2,3])].reset_index(drop=True)

GDP_quarterly.head(50)

G = G.iloc[:, 1:]
G.head(15)

Unnamed: 0,(Y),(X),(P),Real GDP per Capita,Year,Q,G
0,64.89,772.07,103.445,746.357968,1917,4,5.128574
1,70.37,734.54,109.267,672.243221,1921,4,6.290097
2,96.44,1000.49,116.522,858.627555,1925,4,4.125995
3,102.07,1083.95,122.377,885.746505,1929,4,5.67194
4,55.67,798.63,125.231,637.725483,1932,4,-11.149834
5,59.53,817.01,125.974,648.554464,1933,4,21.030681
6,87.28,1111.06,129.304,859.261894,1937,4,2.432873
7,143.23,1667.4,133.904,1245.220456,1941,4,14.373973
8,213.7,2116.91,140.497,1506.729681,1945,4,-11.378254
9,270.6,2103.7,150.167,1400.90699,1949,4,-0.187928


Next, I attempt to create P. I'll use the CPI data which is standardized at the year 1984. The historic CPI data that is available to me is monthly data which I aggregate on the quarter level to then calculate the average annual inflation for the first 15 quarters in the legislation. Even though local policy is far from causally related with local inflation, it is interesting to look at it's influence on local voter behaviour. I do the same for the rent price index by creating a variable R. 

In [5]:
import pandas as pd
import numpy as np
import statsmodels.api as sm  # if you run regressions later

# ---------- Helpers ----------
def quarter_start_series(df, date_col, value_col):
    """Return a Series of values at quarter starts (Jan 1, Apr 1, Jul 1, Oct 1)."""
    s = df.copy()
    s[date_col] = pd.to_datetime(s[date_col])
    s = s.sort_values(date_col).set_index(date_col)
    # keep only 01-01, 04-01, 07-01, 10-01
    s = s.loc[s.index.is_month_start & s.index.month.isin([1, 4, 7, 10]), value_col]
    return s.sort_index()

def fair_P_from_level(qstart_level, election_years, anchor="10-01"):
    """
    Compute Fair's P using level series at quarter starts.
    Anchor = start of election quarter (Q4 -> '10-01').
    Requires 16 quarters before the anchor (i.e., idx >= 16).
    """
    qc = qstart_level.to_frame("LEVEL").copy()
    qc["year"] = qc.index.year

    out = []
    for yr in sorted(pd.Series(election_years).astype(int).unique()):
        election_q_start = f"{yr}-{anchor}"
        try:
            idx_e = qc.index.get_loc(pd.Timestamp(election_q_start))
        except KeyError:
            # if exact Oct 1 missing, skip
            continue

        if idx_e < 16:
            continue  # not enough history

        # 1st and 15th quarters of the administration
        LEVEL_1  = qc.iloc[idx_e - 15]["LEVEL"]
        LEVEL_15 = qc.iloc[idx_e - 1]["LEVEL"]

        P = (((LEVEL_15 / LEVEL_1) ** (4 / 15)) - 1) * 100
        out.append({"year": int(yr), "P": float(abs(P))})

    return pd.DataFrame(out)

# ---------- CPI → P ----------
# CPI: columns ['observation_date','CUURA101SA0']
CPI_qstart = quarter_start_series(CPI, "observation_date", "CUURA101SA0")
P_df = fair_P_from_level(CPI_qstart, NYC["year"])

# ---------- CPI Rent → R (same construction as P, separate column) ----------
# Rent: columns ['observation_date','CUURA101SEHA']
Rent_qstart = quarter_start_series(Rent, "observation_date", "CUURA101SEHA")

R_rows = []
qc_r = Rent_qstart.to_frame("LEVEL").copy()

for yr in sorted(NYC["year"].astype(int).unique()):
    election_q_start = pd.Timestamp(f"{yr}-10-01")
    try:
        idx_e = qc_r.index.get_loc(election_q_start)
    except KeyError:
        continue
    if idx_e < 16:
        continue

    L1  = qc_r.iloc[idx_e - 15]["LEVEL"]
    L15 = qc_r.iloc[idx_e - 1]["LEVEL"]
    R = (((L15 / L1) ** (4 / 15)) - 1) * 100
    R_rows.append({"year": int(yr), "R": float(abs(R))})

R_df = pd.DataFrame(R_rows)

# ---------- G was already computed into a year-level table `G` ----------
# Make sure G has columns ['year','G']
if "year" not in G.columns:
    G = G.rename(columns={"Year": "year"})
G_use = G[["year", "G"]].drop_duplicates()

# ---------- Merge everything ONTO NYC (left) ----------
NYC = (NYC
       .merge(G_use, on="year", how="left")
       .merge(P_df,  on="year", how="left")
       .merge(R_df,  on="year", how="left"))

# Optional: sanity checks
# print(NYC[["year","G","P","R"]].sort_values("year"))


Next, I manually create I, DUR and DPER. 

In [6]:
# Define year → I mapping
I_map = {
    1949: 1,
    1957: 1,
    1961: 1,
    1981: 1,
    1985: 1,
    1993: 1,
    2017: 1,

    1969: -1,
    1997: -1,
    2005: -1,
    2009: -1
}


NYC["I"] = NYC["year"].map(I_map).fillna(0).astype(int)

# Create DUR variable

DUR_map = {
    1949: 1,
    1957: 1,
    1961: 1.25,

    1977: 1,
    1981: 1.25,
    1985: 1.50,
    1989: 1.75,

    1997: -1,
    2001: -1.25,
    2004: -1.50,
    2009: -1.75,

    2017: 1,
    2021: 1.25
}

NYC["DUR"] = NYC["year"].map(DUR_map).fillna(0)

# Create DPER variable

DPER_map = {
    1949: 1,
    1957: 1,
    1961: 1,

    1969: -1,

    1981: 1,
    1985: 1,

    1997: -1,

    2005: -1,
    2009: -1,

    2017: 1
}

NYC["DPER"] = NYC["year"].map(DPER_map).fillna(0)

NYC.columns

Index(['year', 'total_votes', 'democratic_candidate', 'democratic_votes',
       'democratic_pct', 'major_third_party_candidate', 'third_party_votes',
       'third_party_pct', 'republican_candidate', 'republican_votes',
       'republican_pct', 'other_major_candidates', 'other_votes', 'other_pct',
       'G', 'P', 'R', 'I', 'DUR', 'DPER'],
      dtype='object')

Next, I am fitting the model. The first thing that concerns me is the number of observation which is only 21. Moreover, there is no stastically relevant explanatory variable, except for the constant, which shows that Prof. Fair's model is insuitable for predicting NYC mayoral elections. Even local CPI data doesn't explain voting behaviour. This could be because that local voter's base their decision more on political and local variables. 

In [11]:
import pandas as pd
import numpy as np
import statsmodels.api as sm

# 1) Columns required for this regression
required = ['year','democratic_pct','I','DPER','DUR','G','P']
missing = [c for c in required if c not in NYC.columns]
if missing:
    raise ValueError(f"Missing columns in NYC: {missing}")

# 2) Work on a copy; coerce to numeric
dfm = NYC.copy()
for c in required:
    dfm[c] = pd.to_numeric(dfm[c], errors='coerce')

# 3) Interactions
dfm['G_I'] = dfm['G'] * dfm['I']
dfm['P_I'] = dfm['P'] * dfm['I']
dfm['R_I'] = dfm['R'] * dfm['I']

# 4) Build X, y
X = dfm[['G_I','P_I','I','DPER','DUR']].astype(float)
y = dfm['democratic_pct'].astype(float)

# 5) Replace inf with NaN, then drop rows with any NaN in X or y
X = X.replace([np.inf, -np.inf], np.nan)
y = y.replace([np.inf, -np.inf], np.nan)

keep_mask = X.notna().all(axis=1) & y.notna()
dropped_years = dfm.loc[~keep_mask, 'year'].tolist()

X_clean = X.loc[keep_mask].copy()
y_clean = y.loc[keep_mask].copy()

# Optional: see what got dropped
print("Dropped years due to missing/inf in regressors or y:", dropped_years)

# 6) Add constant and fit
X_clean = sm.add_constant(X_clean, has_constant='add')
nyc_reg = sm.OLS(y_clean, X_clean).fit()
print(nyc_reg.summary())


Dropped years due to missing/inf in regressors or y: [1897, 1901, 1903, 1905, 1909, 1913, 1917, 1921, 1925, 1929, 1932, 1933, 1937, 1941, 2025]
                            OLS Regression Results                            
Dep. Variable:         democratic_pct   R-squared:                       0.437
Model:                            OLS   Adj. R-squared:                  0.249
Method:                 Least Squares   F-statistic:                     2.325
Date:                Fri, 31 Oct 2025   Prob (F-statistic):             0.0943
Time:                        18:55:24   Log-Likelihood:                -77.181
No. Observations:                  21   AIC:                             166.4
Df Residuals:                      15   BIC:                             172.6
Df Model:                           5                                         
Covariance Type:            nonrobust                                         
                 coef    std err          t      P>|t|      [0.025

This gives the correlation matrix of the relevant variables of the model. 

In [None]:
NYC[cols].corr()