# GDP

This section tests whether income levels and macroeconomic dynamics are associated with fertility across countries and over time. The level measures are GDP per capita in current (nominal) USD (`current_usd`) and its natural log (`log_gdppc`); the dynamic measures are annual GDP per-capita growth (`annual_rate`) and GDP deflator growth (stored in the column `deflation`). We assess pooled and within-country correlations, add lags (1–3 years), control for marriage prevalence with partial correlations to gauge independence, and check first differences for short-run dynamics.

* **H0 (log GDPpc):** There is **no statistical association** between fertility and **log GDP per capita**.
* **H0 (current USD):** There is **no statistical association** between fertility and **GDP per capita (current USD)**.
* **H0 (annual growth):** There is **no statistical association** between fertility and **GDP per-capita annual growth**.
* **H0 (deflator growth):** There is **no statistical association** between fertility and **GDP deflator growth**.


In [13]:
# Set up & Data Prep 

import numpy as np
import pandas as pd
from sqlalchemy import create_engine
from scipy.stats import pearsonr, spearmanr

# Load unified panel from your SQLite DB
engine = create_engine("sqlite:///analytics_panel.sqlite")
panel = pd.read_sql("SELECT * FROM v_panel_all;", con=engine)

# Construct log GDP per capita (for scale-stable analyses)
panel["log_gdppc"] = np.log(panel["current_usd"]).replace([np.inf, -np.inf], np.nan)

# Build GDP slice (keep only columns we need); no year cap unless you prefer one
gdp = (
    panel[[
        "iso_code","year","fertility_rate",
        "current_usd","log_gdppc",
        "annual_rate","deflation"
    ]]
    .dropna(subset=["fertility_rate"])
    .copy()
)

# Optional plausibility checks (comment out if you want raw)
# - fertility in a wide but realistic macro range
gdp = gdp[(gdp["fertility_rate"] >= 0) & (gdp["fertility_rate"] <= 10)]
# - percentage-like rates: keep generous bounds to avoid over-trimming
for col in ["annual_rate","deflation"]:
    if col in gdp:
        gdp[col] = pd.to_numeric(gdp[col], errors="coerce")
        gdp = gdp[(gdp[col].isna()) | (gdp[col].between(-50, 100))]

# Quick summary for the write-up
summary = {
    "rows_in_panel": len(panel),
    "rows_in_gdp_slice": len(gdp),
    "date_range_panel": (int(panel["year"].min()), int(panel["year"].max())),
    "date_range_gdp": (int(gdp["year"].min()), int(gdp["year"].max())),
    "non_null_fraction_current_usd": round(panel["current_usd"].notna().mean()*100, 1),
    "non_null_fraction_log_gdppc": round(panel["log_gdppc"].notna().mean()*100, 1),
    "non_null_fraction_annual_rate": round(panel["annual_rate"].notna().mean()*100, 1),
    "non_null_fraction_deflation": round(panel["deflation"].notna().mean()*100, 1),
}
summary


{'rows_in_panel': 17927,
 'rows_in_gdp_slice': 16673,
 'date_range_panel': (1960, 2024),
 'date_range_gdp': (1960, 2023),
 'non_null_fraction_current_usd': 81.1,
 'non_null_fraction_log_gdppc': 81.1,
 'non_null_fraction_annual_rate': 78.7,
 'non_null_fraction_deflation': 78.2}

In [14]:
#  GDP coverage by country 

# Unique countries (panel vs GDP slice)
n_countries_panel = panel["iso_code"].nunique(dropna=True)
n_countries_gdp   = gdp["iso_code"].nunique(dropna=True)

print(f"Unique countries in PANEL:    {n_countries_panel}")
print(f"Unique countries in GDP slice:{n_countries_gdp}")

# Per-country non-null counts for each GDP variable
coverage = (
    gdp.groupby("iso_code", dropna=True)
       .agg(
           rows=("iso_code", "size"),
           n_current_usd=("current_usd", lambda s: s.notna().sum()),
           n_log_gdppc=("log_gdppc", lambda s: s.notna().sum()),
           n_annual_rate=("annual_rate", lambda s: s.notna().sum()),
           n_deflation=("deflation", lambda s: s.notna().sum()),
       )
       .assign(
           has_current_usd=lambda d: (d["n_current_usd"] > 0).astype(int),
           has_log_gdppc=lambda d: (d["n_log_gdppc"] > 0).astype(int),
           has_annual_rate=lambda d: (d["n_annual_rate"] > 0).astype(int),
           has_deflation=lambda d: (d["n_deflation"] > 0).astype(int),
       )
)

# How many countries have ANY data for each GDP variable
print("\nCountries with ANY data by variable:")
print(" current_usd :", int(coverage["has_current_usd"].sum()))
print(" log_gdppc   :", int(coverage["has_log_gdppc"].sum()))
print(" annual_rate :", int(coverage["has_annual_rate"].sum()))
print(" deflation   :", int(coverage["has_deflation"].sum()))

# Countries missing each GDP variable entirely (useful for debugging)
missing_current_usd = coverage.index[coverage["has_current_usd"] == 0].tolist()
missing_log_gdppc   = coverage.index[coverage["has_log_gdppc"] == 0].tolist()
missing_annual_rate = coverage.index[coverage["has_annual_rate"] == 0].tolist()
missing_deflation   = coverage.index[coverage["has_deflation"] == 0].tolist()

print("\nMissing entirely (country lists):")
print(" current_usd :", missing_current_usd[:20], "..." if len(missing_current_usd) > 20 else "")
print(" log_gdppc   :", missing_log_gdppc[:20], "..." if len(missing_log_gdppc) > 20 else "")
print(" annual_rate :", missing_annual_rate[:20], "..." if len(missing_annual_rate) > 20 else "")
print(" deflation   :", missing_deflation[:20], "..." if len(missing_deflation) > 20 else "")

# Optional: a compact table sorted by weakest coverage
coverage_sorted = coverage.sort_values(
    by=["has_current_usd","has_log_gdppc","has_annual_rate","has_deflation","rows"],
    ascending=[True, True, True, True, True]
)
#coverage_sorted.head(15)


Unique countries in PANEL:    278
Unique countries in GDP slice:265

Countries with ANY data by variable:
 current_usd : 262
 log_gdppc   : 262
 annual_rate : 262
 deflation   : 261

Missing entirely (country lists):
 current_usd : ['GIB', 'PRK', 'VGB'] 
 log_gdppc   : ['GIB', 'PRK', 'VGB'] 
 annual_rate : ['GIB', 'PRK', 'VGB'] 
 deflation   : ['GIB', 'MAF', 'PRK', 'VGB'] 


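Coverage is good: 262 of the 265 countries in the GDP slice have at least some GDP data (261 for the deflator), though this is not as complete as the marriage-percentage data.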

In [15]:
# Pooled correlations

def corr_tests(df, x, y="fertility_rate", min_n=10):
    sub = df[[y, x]].dropna()
    n = len(sub)
    if n < min_n:
        return {"n": n, "pearson_r": np.nan, "pearson_p": np.nan,
                "spearman_r": np.nan, "spearman_p": np.nan, "var": x}
    r_p, p_p = pearsonr(sub[y].to_numpy(), sub[x].to_numpy())
    r_s, p_s = spearmanr(sub[y].to_numpy(), sub[x].to_numpy())
    return {"n": n, "pearson_r": float(r_p), "pearson_p": float(p_p),
            "spearman_r": float(r_s), "spearman_p": float(p_s), "var": x}

vars_pooled = ["log_gdppc", "current_usd", "annual_rate", "deflation"]
pooled_rows = [corr_tests(gdp, v) for v in vars_pooled]
gdp_pooled_sig = pd.DataFrame(pooled_rows)[["var","n","pearson_r","pearson_p","spearman_r","spearman_p"]]
gdp_pooled_sig


| | var | n | pearson_r | pearson_p | spearman_r | spearman_p |
|---|---|---|---|---|---|---|
| 0 | log_gdppc | 14071 | -0.783999 | 0.0 | -0.810876 | 0.0 |
| 1 | current_usd | 14071 | -0.444698 | 0.0 | -0.810876 | 0.0 |
| 2 | annual_rate | 13625 | -0.081293 | 2.022921e-21 | -0.131161 | 2.410824e-53 |
| 3 | deflation | 13529 | 0.109343 | 2.916317e-37 | 0.177656 | 2.419586e-96 |


log_gdppc: Reject H0
Interpretation: Very strong negative association—higher income levels relate to lower fertility. (Log scale linearizes the relationship.)

current_usd: Reject H0
Note: Spearman equals log_gdppc’s because log is a monotonic transform (ranks unchanged). Prefer log_gdppc for modeling.

annual_rate (GDP pc growth): Reject H0
Interpretation: Statistically significant but small negative association; growth alone explains little of fertility variation.

deflation (GDP deflator growth): Reject H0
Interpretation: Small positive association; price-level changes correlate weakly with fertility in pooled data.
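The note on `current_usd` above (identical Spearman coefficients for the raw and log series) follows from rank invariance under monotonic transforms. A minimal sketch on synthetic data illustrates it; the variable names and the data-generating parameters are made up for illustration and are not drawn from the panel:

```python
import numpy as np
from scipy.stats import pearsonr, spearmanr

rng = np.random.default_rng(0)

# Synthetic, right-skewed "income" and an outcome that falls with log income.
# All parameters here are illustrative assumptions, not estimates from the panel.
income = np.exp(rng.normal(loc=8.0, scale=1.5, size=2_000))
fertility = 9.0 - 0.7 * np.log(income) + rng.normal(scale=0.8, size=income.size)

# Spearman works on ranks, and log() preserves the ordering of positive values.
rho_raw, _ = spearmanr(fertility, income)
rho_log, _ = spearmanr(fertility, np.log(income))

# Pearson measures linear association, so it does change with the transform.
r_raw, _ = pearsonr(fertility, income)
r_log, _ = pearsonr(fertility, np.log(income))

print(f"Spearman: raw={rho_raw:.4f}  log={rho_log:.4f}")
print(f"Pearson : raw={r_raw:.4f}  log={r_log:.4f}")
```

The two Spearman values agree to floating-point precision, while Pearson is noticeably weaker on the raw scale, which is the same pattern as in the pooled table.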

In [16]:
# Within-country (demeaned)
def demean_by_country(df, cols):
    out = df.copy()
    for c in cols:
        out[c] = out[c] - out.groupby("iso_code")[c].transform("mean")
    return out

gdp_w = demean_by_country(gdp, ["fertility_rate","log_gdppc","current_usd","annual_rate","deflation"])

wc_rows = [corr_tests(gdp_w, v) for v in vars_pooled]
gdp_within_sig = pd.DataFrame(wc_rows)[["var","n","pearson_r","pearson_p","spearman_r","spearman_p"]]
gdp_within_sig


| | var | n | pearson_r | pearson_p | spearman_r | spearman_p |
|---|---|---|---|---|---|---|
| 0 | log_gdppc | 14071 | -0.724846 | 0.0 | -0.726488 | 0.0 |
| 1 | current_usd | 14071 | -0.235246 | 3.384932e-176 | -0.544663 | 0.0 |
| 2 | annual_rate | 13625 | 0.026545 | 0.001943353 | -0.009119 | 0.2871804 |
| 3 | deflation | 13529 | 0.070454 | 2.319482e-16 | 0.13988 | 4.428126e-60 |


log_gdppc: Reject H0.
Strong within-country negative association: as a country becomes richer (per capita), its fertility tends to decline.

current_usd: Reject H0.
Same direction but weaker linearly; prefer log_gdppc for scale/linearity.

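annual_rate: Mixed/weak.
Pearson is statistically detectable owing to the large n (r ≈ 0.03, p ≈ 0.002), but Spearman is not (ρ ≈ -0.01, p ≈ 0.29) and the two disagree in sign. The association is negligible and not robust across metrics, so we treat it as no practical effect.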

deflation (deflator growth): Reject H0, but small.
Price changes show a tiny positive association with fertility within countries.

Takeaway: Within countries over time, income level (log GDPpc) shows a strong, consistent negative relationship with fertility; growth and deflator effects are small to negligible in practice.
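One detail of the demeaning step is worth sketching. `demean_by_country` subtracts each column's country mean computed over all rows where that column is present, while the correlation is then taken over rows where both columns are present. A variant that demeans on the complete pairs only is shown below on synthetic data. The helper name `within_corr` and the toy panel are hypothetical, and whether this changes anything on the real panel has not been checked here:

```python
import numpy as np
import pandas as pd
from scipy.stats import pearsonr

def within_corr(df, x, y, group="iso_code"):
    """Pearson r of x and y after removing group means computed on complete pairs only."""
    sub = df[[group, x, y]].dropna()
    centered = sub[[x, y]] - sub.groupby(group)[[x, y]].transform("mean")
    r, p = pearsonr(centered[x].to_numpy(), centered[y].to_numpy())
    # Note: this p-value ignores the degrees of freedom used up by the group means.
    return {"n": len(sub), "r": float(r), "p": float(p)}

# Toy panel (illustrative only): richer countries have lower fertility on average,
# but inside each country the two series move together.
rng = np.random.default_rng(1)
frames = []
for i, base in enumerate([6.0, 8.0, 10.0]):
    x = base + rng.normal(scale=0.3, size=40)
    y = (12.0 - base) + 0.5 * (x - base) + rng.normal(scale=0.1, size=40)
    frames.append(pd.DataFrame({"iso_code": f"C{i}", "log_gdppc": x, "fertility_rate": y}))
toy = pd.concat(frames, ignore_index=True)

pooled_r, _ = pearsonr(toy["log_gdppc"].to_numpy(), toy["fertility_rate"].to_numpy())
within = within_corr(toy, "log_gdppc", "fertility_rate")

print(f"pooled r = {pooled_r:.3f}, within-country r = {within['r']:.3f}")
```

In the toy data the pooled and within-country correlations have opposite signs, which is why the notebook reports both views rather than relying on the pooled figure alone.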

In [17]:
# Lags
gdp_l = gdp.sort_values(["iso_code","year"]).copy()
lag_vars = ["log_gdppc", "annual_rate", "deflation"]  # (you can add current_usd if desired)

for v in lag_vars:
    for k in (1,2,3):
        gdp_l[f"{v}_lag{k}"] = gdp_l.groupby("iso_code")[v].shift(k)

lag_rows = []
for v in lag_vars:
    for k in (1,2,3):
        x = f"{v}_lag{k}"
        lag_rows.append(corr_tests(gdp_l, x))
gdp_lags_sig = (pd.DataFrame(lag_rows)
                [["var","n","pearson_r","pearson_p","spearman_r","spearman_p"]]
                .sort_values(["var"]))
gdp_lags_sig


| | var | n | pearson_r | pearson_p | spearman_r | spearman_p |
|---|---|---|---|---|---|---|
| 3 | annual_rate_lag1 | 13376 | -0.080687 | 9.084873e-21 | -0.130762 | 4.309521e-52 |
| 4 | annual_rate_lag2 | 13120 | -0.07707 | 9.563409e-19 | -0.124017 | 3.944267e-46 |
| 5 | annual_rate_lag3 | 12863 | -0.065703 | 8.712835e-14 | -0.109451 | 1.404667e-35 |
| 6 | deflation_lag1 | 13280 | 0.103527 | 5.659951e-33 | 0.174813 | 1.2891e-91 |
| 7 | deflation_lag2 | 13024 | 0.104253 | 8.361419e-33 | 0.179641 | 6.80194e-95 |
| 8 | deflation_lag3 | 12768 | 0.101901 | 7.976872e-31 | 0.177298 | 1.150188e-90 |
| 0 | log_gdppc_lag1 | 13822 | -0.780965 | 0.0 | -0.808231 | 0.0 |
| 1 | log_gdppc_lag2 | 13565 | -0.778069 | 0.0 | -0.80587 | 0.0 |
| 2 | log_gdppc_lag3 | 13306 | -0.77473 | 0.0 | -0.803348 | 0.0 |


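Results (pooled over countries/years):

log_gdppc: Reject H0: very strong, stable negative association across lags (r ≈ -0.77 to -0.78).

annual_rate: Reject H0, but associations are small and negative (r ≈ -0.07 to -0.08).

deflation: Reject H0, but associations are small and positive (r ≈ 0.10).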

Interpretation:

The income level signal (log GDPpc) is consistently large and negative even when lagged—countries with higher (and persistently higher) income tend to have lower fertility.

Growth and price change measures show tiny effects; statistically significant but not practically large.

Caveat: These are pooled correlations. The strong lagged links for log GDPpc largely reflect between-country differences that persist over time, not causal timing. Use within-country tests and partials to assess independent, time-varying relationships.
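A further point on the lag construction: `groupby(...).shift(k)` moves values by k rows, which equals k years only when every country's series has no missing years. The sketch below shows a calendar-based alternative; the helper `add_year_lag` is hypothetical, it assumes one row per (`iso_code`, `year`), and whether the real panel has gaps is not established here:

```python
import pandas as pd

def add_year_lag(df, col, k, group="iso_code", time="year"):
    """Attach col's value from exactly k years earlier; NaN when that year is absent."""
    lagged = df[[group, time, col]].copy()
    lagged[time] = lagged[time] + k  # a value observed in year t is the lag for year t + k
    lagged = lagged.rename(columns={col: f"{col}_lag{k}"})
    return df.merge(lagged, on=[group, time], how="left")

# Toy series with a missing year (2002), for illustration only.
toy = pd.DataFrame({
    "iso_code": ["AAA"] * 4,
    "year": [2000, 2001, 2003, 2004],
    "log_gdppc": [7.0, 7.1, 7.3, 7.4],
})

by_shift = toy.groupby("iso_code")["log_gdppc"].shift(1)      # row-based lag
by_year = add_year_lag(toy, "log_gdppc", 1)["log_gdppc_lag1"]  # calendar-based lag

print(pd.DataFrame({"year": toy["year"], "shift": by_shift, "year_lag": by_year}))
```

For 2003, the row-based lag silently uses the 2001 value (a two-year lag), while the calendar-based lag leaves it missing.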

In [18]:
# Partial correlations controlling for marriage 
# Ensure we have married_percentage by merging from the full panel
if "married_percentage" not in gdp.columns:
    gdp_m = gdp.merge(
        panel[["iso_code","year","married_percentage"]],
        on=["iso_code","year"], how="left"
    )
else:
    gdp_m = gdp.copy()

# Generic partial-correlation helper: partial r(X, Y | Z) with p-value
def partial_corr(df, x, y, z, min_n=10):
    sub = df[[x, y, z]].dropna()
    n = len(sub)
    if n < min_n:
        return {"x": x, "y": y, "z": z, "n": n, "partial_r": np.nan, "partial_p": np.nan}
    # residualize y ~ z
    b1y, b0y = np.polyfit(sub[z], sub[y], 1); y_res = sub[y] - (b1y*sub[z] + b0y)
    # residualize x ~ z
    b1x, b0x = np.polyfit(sub[z], sub[x], 1); x_res = sub[x] - (b1x*sub[z] + b0x)
    r, p = pearsonr(y_res.to_numpy(), x_res.to_numpy())
    return {"x": x, "y": y, "z": z, "n": n, "partial_r": float(r), "partial_p": float(p)}

# GDP net of marriage (does GDP add signal beyond marriage?)
rows_gdp_net_mar = []
for v in ["log_gdppc","current_usd","annual_rate","deflation"]:
    rows_gdp_net_mar.append(partial_corr(gdp_m, x=v, y="fertility_rate", z="married_percentage"))

gdp_partial_marriage = pd.DataFrame(rows_gdp_net_mar)[["x","n","partial_r","partial_p"]]
gdp_partial_marriage.rename(columns={"x":"var"}, inplace=True)
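# NOTE: a bare expression in the middle of a cell is not rendered; only the cell's
# last expression is. Wrap this in display(...) to show the table as well.
gdp_partial_marriage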

# Marriage net of GDP level (does marriage add signal beyond income?)
rows_mar_net_gdp = []
for v in ["log_gdppc","current_usd"]:
    # partial r(fertility, married% | GDP measure)
    rows_mar_net_gdp.append(partial_corr(gdp_m, x="married_percentage", y="fertility_rate", z=v))

marriage_partial_gdp = pd.DataFrame(rows_mar_net_gdp)[["z","n","partial_r","partial_p"]]
marriage_partial_gdp.rename(columns={"z":"controlled_for"}, inplace=True)
marriage_partial_gdp


| | controlled_for | n | partial_r | partial_p |
|---|---|---|---|---|
| 0 | log_gdppc | 9714 | 0.18952 | 3.073348e-79 |
| 1 | current_usd | 9714 | 0.403779 | 0.0 |


Marriage retains a positive, independent association with fertility even after holding income constant—small-to-moderate when controlling for log GDPpc (r≈0.19; ~3.6% variance), and moderate when controlling for raw USD (r≈0.40; ~16% variance).

The stronger partial using current USD reflects scale/nonlinearity; log GDPpc is the appropriate control. With that proper control, the marriage effect is still statistically robust and substantively meaningful at macro scale.
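The residual-based `partial_corr` helper can be cross-checked against the closed-form first-order partial correlation, r_xy·z = (r_xy − r_xz·r_yz) / sqrt((1 − r_xz²)(1 − r_yz²)). The sketch below does this on synthetic data. The function name `partial_corr_formula` is hypothetical, and the p-value uses n − 3 degrees of freedom, which is a slightly more conservative choice than applying `pearsonr` to residuals (that treats them as raw data with n − 2):

```python
import numpy as np
from scipy import stats

def partial_corr_formula(x, y, z):
    """First-order partial correlation r(x, y | z) with a t-test on n - 3 degrees of freedom."""
    x, y, z = (np.asarray(a, dtype=float) for a in (x, y, z))
    r_xy = np.corrcoef(x, y)[0, 1]
    r_xz = np.corrcoef(x, z)[0, 1]
    r_yz = np.corrcoef(y, z)[0, 1]
    r = (r_xy - r_xz * r_yz) / np.sqrt((1 - r_xz**2) * (1 - r_yz**2))
    dof = len(x) - 3  # one extra degree of freedom is spent on the control variable
    t = r * np.sqrt(dof / (1 - r**2))
    p = 2 * stats.t.sf(abs(t), dof)
    return float(r), float(p)

# Synthetic data (illustrative only): z drives both x and y, and x also affects y.
rng = np.random.default_rng(2)
z = rng.normal(size=500)
x = 0.8 * z + rng.normal(size=500)
y = -0.5 * z + 0.3 * x + rng.normal(size=500)

r_formula, p_formula = partial_corr_formula(x, y, z)

# Residual approach, mirroring the notebook's helper.
x_res = x - np.polyval(np.polyfit(z, x, 1), z)
y_res = y - np.polyval(np.polyfit(z, y, 1), z)
r_resid, p_resid = stats.pearsonr(x_res, y_res)

print(f"formula : r={r_formula:.6f}, p={p_formula:.3e}")
print(f"residual: r={r_resid:.6f}, p={p_resid:.3e}")
```

The two coefficients coincide; only the p-values differ, and with n in the thousands, as in the panel, the difference is immaterial.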

In [19]:
# First differences for dynamics 
def corr_cols(df, xcol, ycol="d_fertility"):
    sub = df[[ycol, xcol]].dropna()
    if len(sub) < 10:
        return {"var": xcol, "n": len(sub), "pearson_r": np.nan, "pearson_p": np.nan,
                "spearman_r": np.nan, "spearman_p": np.nan}
    r_p, p_p = pearsonr(sub[ycol].to_numpy(), sub[xcol].to_numpy())
    r_s, p_s = spearmanr(sub[ycol].to_numpy(), sub[xcol].to_numpy())
    return {"var": xcol, "n": len(sub), "pearson_r": float(r_p), "pearson_p": float(p_p),
            "spearman_r": float(r_s), "spearman_p": float(p_s)}

gdp_d = gdp.sort_values(["iso_code","year"]).copy()
gdp_d["d_fertility"] = gdp_d.groupby("iso_code")["fertility_rate"].diff()
for v in ["log_gdppc","annual_rate","deflation"]:
    gdp_d[f"d_{v}"] = gdp_d.groupby("iso_code")[v].diff()

delta_rows = [corr_cols(gdp_d, f"d_{v}") for v in ["log_gdppc","annual_rate","deflation"]]
gdp_delta_sig = pd.DataFrame(delta_rows)[["var","n","pearson_r","pearson_p","spearman_r","spearman_p"]]
gdp_delta_sig


| | var | n | pearson_r | pearson_p | spearman_r | spearman_p |
|---|---|---|---|---|---|---|
| 0 | d_log_gdppc | 13802 | 0.054554 | 1.424389e-10 | 0.048069 | 1.604203e-08 |
| 1 | d_annual_rate | 13362 | 0.011768 | 0.1737588 | -0.019051 | 0.0276536 |
| 2 | d_deflation | 13267 | 0.017481 | 0.04406405 | 0.009674 | 0.2652112 |


Interpretation & decision:
Short-run changes in GDP variables show negligible association with short-run fertility changes. Δlog GDPpc is clearly significant (p ≈ 1e-10 for Pearson, 2e-08 for Spearman) but tiny and positive (r ≈ 0.05). The marginal Δdeflator (Pearson p ≈ 0.044) and Δgrowth (Spearman p ≈ 0.028) results have |r| ≈ 0.02 and would likely not survive multiple-testing correction. Practically, GDP levels (especially log GDPpc) matter far more than year-to-year macro fluctuations.
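The multiple-testing remark above can be checked directly. The sketch below applies a hand-rolled Holm step-down adjustment to the six p-values in the first-difference table. Treating those six tests as a single family, and choosing Holm in particular, are assumptions made for illustration; the notebook itself applies no correction:

```python
import numpy as np

def holm_adjust(pvals):
    """Holm step-down adjusted p-values, returned in the original order."""
    p = np.asarray(pvals, dtype=float)
    m = p.size
    order = np.argsort(p)
    scaled = p[order] * (m - np.arange(m))           # multiply the i-th smallest by (m - i)
    adj_sorted = np.minimum(1.0, np.maximum.accumulate(scaled))  # enforce monotonicity, cap at 1
    adjusted = np.empty(m)
    adjusted[order] = adj_sorted
    return adjusted

# p-values copied from the first-difference table.
labels = [
    "d_log_gdppc pearson", "d_log_gdppc spearman",
    "d_annual_rate pearson", "d_annual_rate spearman",
    "d_deflation pearson", "d_deflation spearman",
]
pvals = [1.424389e-10, 1.604203e-08, 0.1737588, 0.0276536, 0.04406405, 0.2652112]

adjusted = holm_adjust(pvals)
survivors = [lab for lab, p_adj in zip(labels, adjusted) if p_adj < 0.05]

for lab, p_raw, p_adj in zip(labels, pvals, adjusted):
    print(f"{lab:24s} raw={p_raw:.3e}  holm={p_adj:.3e}")
print("Survive at alpha=0.05:", survivors)
```

Under these assumptions only the two Δlog GDPpc tests stay below 0.05 after adjustment; the marginal growth and deflator results do not.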

### Conclusion: “Money isn’t the root issue”

Across our separate analyses, income level (log GDP per capita) has a strong negative association with fertility (pooled and within-country), but macroeconomic dynamics (annual growth and deflator/price changes) show only tiny, inconsistent links. If a purely economic story (“it’s the economy”) were the root driver, we’d expect consistent strength across level and flow variables; we don’t see that. Instead, the evidence points to a broader development bundle as the real backdrop: income co-moving with education, urbanization, housing constraints, childcare costs, women’s opportunities/time costs, and especially family-formation patterns. This aligns with our domain rankings: Marriage prevalence shows the strongest, robust within-country association with fertility; GDP level is also strong; Politics/Safety is strong only cross-sectionally and fades with controls; Employment is weak. Taken together, “money” (short-run macro conditions) isn’t the root issue; **family formation timing/structure and long-run development context** are more central to fertility decline.