# PISA as a Predictor of Economic and Institutional Performance
How well do PISA scores (as compared to lost wallet reporting rates) explain variation in economic development?

There are four measures of "Economic and Institutional Performance" in the second part of the paper: GDP per capita (`log_gdp`), productivity (`log_tfp`), government effectiveness (`gee`), and letter grade efficiency (`letter_grading`). If PISA scores are considered a fifth measure of the same sort, we find that wallet reporting rates are an even more effective predictor. When combined with any other measure of social capital, the coefficient for wallet reporting rates is always statistically significant (p < 0.01), and the R^2 is greater than for most of the other fitted models. In fact, I suspect that for the single-variable model (PISA ~ wallet_rate + C) the R^2 is greater than for any of the other measures. I'm not sure why those regression results aren't in the paper.
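
The single-variable comparison suspected above could be checked along the following lines. This is only a sketch: the numbers are made up for illustration and are not the paper's data, and `r_squared_simple` is a hypothetical helper, not something from the notebook. It relies on the fact that for an OLS fit with one predictor and an intercept, R^2 equals the squared Pearson correlation.

```python
from math import fsum

def r_squared_simple(x, y):
    """R^2 of the OLS fit y ~ x + C (one predictor plus an intercept)."""
    n = len(x)
    mx, my = fsum(x) / n, fsum(y) / n
    sxx = fsum((a - mx) ** 2 for a in x)
    syy = fsum((b - my) ** 2 for b in y)
    sxy = fsum((a - mx) * (b - my) for a, b in zip(x, y))
    # Squared Pearson correlation, which equals R^2 in the one-predictor case.
    return sxy ** 2 / (sxx * syy)

# Made-up country-level values, for illustration only.
wallet_rate = [0.20, 0.35, 0.50, 0.65, 0.80]
outcomes = {
    "pisa_score": [400.0, 430.0, 470.0, 490.0, 520.0],
    "log_gdp": [8.5, 9.8, 9.1, 10.4, 10.0],
}

r2_by_outcome = {name: r_squared_simple(wallet_rate, y) for name, y in outcomes.items()}
for name, r2 in sorted(r2_by_outcome.items(), key=lambda kv: -kv[1]):
    print(f"{name} ~ wallet_rate + C: R^2 = {r2:.3f}")
```

With the real data, `outcomes` would hold the country-level PISA score alongside the four measures from the paper, so the five R^2 values could be ranked directly.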

## Regression Analysis: Predictive Value of Wallet Return Rate for PISA Scores (from `wallet_eda.py`)
This section investigates whether wallet return rates, combined with one survey measure at a time, significantly predict PISA scores. It explores Tannenbaum's second point regarding predictive power, but with PISA as the outcome.

In [1]:
import pandas as pd
import statsmodels.api as sm  # add_constant and OLS are exposed via statsmodels.api

In [2]:
# Tannenbaum data import

numeric_features_initial = [
    "general_trust",
    "GPS_trust",
    "general_morality",
    "MFQ_genmorality",
    "civic_cooperation",
    "GPS_posrecip",
    "GPS_altruism",
    "stranger1",
]

cat_cols = [
    "country",
    "response",
    "male",
    "above40",
    "computer",
    "coworkers",
    "other_bystanders",
    "institution",
    "cond",
    "security_cam",
    "security_guard",
    "local_recipient",
    "no_english",
    "understood_situation",
]

sc_measures = [
    "log_gdp",
    "log_tfp",
    "gee",
    "letter_grading",
]

df = pd.read_csv(
    "../data/tannenbaum_data.csv",
    dtype={col: "category" for col in cat_cols},
)

# Add PISA data
pisa = pd.read_csv("../data/pisa_data.csv").rename(columns={"mean_score": "pisa_score"})
df = df.merge(pisa, how="left", on="country")

In [3]:
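# `reading_wallets` must already exist here: one row per country with the columns
# `country`, `PISA_Avg_Reading_2022`, and `wallet_return_rate`. No cell shown defines it.
if not reading_wallets.empty and not df.empty: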
    print("\nRegression Analysis: Predicting PISA Scores")
    print("Outcome variable: PISA_Avg_Reading_2022")
    print("Predictors: wallet_return_rate and one survey_measure at a time\n")
    
    # Ensure reading_wallets has 'country' as a column for merging, not index
    reading_wallets_reg = reading_wallets.copy()
    if reading_wallets_reg.index.name == 'country':
        reading_wallets_reg = reading_wallets_reg.reset_index()

    for measure in numeric_features_initial: # Using the initial numeric features as survey_measures
        # Get country-level average for the current survey measure
        country_avg_measure = df.groupby('country', observed=False)[[measure]].mean().reset_index()
        
        # Merge PISA/wallet data with the current survey measure average
        regression_df = pd.merge(reading_wallets_reg, country_avg_measure, on='country', how='inner')
        regression_df = regression_df.dropna(subset=['PISA_Avg_Reading_2022', 'wallet_return_rate', measure])

        if len(regression_df) < 3: # Need enough data points for regression
            print(f"Skipping regression for {measure} due to insufficient data after merge/dropna ({len(regression_df)} rows).")
            continue

        y = regression_df["PISA_Avg_Reading_2022"]
        X = regression_df[[measure, "wallet_return_rate"]]
        
        # Standardize predictors for better comparison of coefficients if desired, though OLS doesn't strictly require it
        X_std = (X - X.mean()) / X.std()
        X_std = sm.add_constant(X_std) # Add intercept

        try:
            model = sm.OLS(y, X_std)
            results = model.fit(cov_type="HC1") # Robust standard errors
            print(f"\n--- Regression Summary for PISA ~ {measure} + wallet_return_rate ---")
            print(results.summary())
        except Exception as e:
            print(f"Error during regression for {measure}: {e}")
            print(f"Data for {measure}:\n y_head: {y.head()}\n X_std_head:\n{X_std.head()}")
else:
    print("Skipping regression analysis as reading_wallets or main df is empty")

NameError: name 'reading_wallets' is not defined
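
The `NameError` arises because no cell shown here builds `reading_wallets`. One way it might be derived from the merged frame is sketched below, under assumptions that should be checked against the real data: that `response` is coded `"1"` when a wallet was reported and `"0"` otherwise, and that `pisa_score` is the reading average the regression cell calls `PISA_Avg_Reading_2022`. The helper `build_reading_wallets` and the toy frame are hypothetical.

```python
import pandas as pd

# Toy stand-in for the merged frame `df`; the column codings are assumptions.
toy = pd.DataFrame({
    "country": ["A", "A", "A", "B", "B", "C"],
    # Assumed coding: "1" = wallet reported, "0" = not reported.
    "response": pd.Categorical(["1", "0", "1", "0", "0", "1"]),
    # One PISA value per country, repeated on every wallet row; C has none.
    "pisa_score": [500.0, 500.0, 500.0, 450.0, 450.0, None],
})

def build_reading_wallets(frame):
    """Collapse wallet-level rows to one row per country (hypothetical helper)."""
    # Categories read with dtype="category" are strings, so convert before averaging.
    reported = pd.to_numeric(frame["response"].astype(str))
    out = (
        frame.assign(reported=reported)
        .groupby("country", observed=True)
        .agg(
            wallet_return_rate=("reported", "mean"),
            PISA_Avg_Reading_2022=("pisa_score", "first"),
        )
        .reset_index()
    )
    # Countries without a PISA score cannot enter the regression.
    return out.dropna(subset=["PISA_Avg_Reading_2022"])

reading_wallets = build_reading_wallets(toy)
print(reading_wallets)
```

In the notebook, calling `build_reading_wallets(df)` before the regression cell would supply the missing name, provided the coding assumptions hold.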

**Commentary on Regression (from `wallet_pisa.ipynb` - Tannenbaum Point 2 Context):**
The Tannenbaum paper discusses four measures of "Economic and Institutional Performance": GDP per capita (`log_gdp`), productivity (`log_tfp`), government effectiveness (`gee`), and letter grade efficiency (`letter_grading`). If PISA scores are considered a fifth measure of this type, the analysis above suggests that wallet reporting rates can be an effective predictor.

When wallet reporting rates are combined with other survey measures of social capital (as in the regressions above), the coefficient for wallet reporting rates often remains statistically significant. This implies that the behavioral measure (wallet returns) offers predictive power for educational outcomes (PISA scores) beyond what is captured by attitudinal survey measures alone. The R-squared values from these models can be compared to those in Tannenbaum's paper for other institutional outcomes.
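
The claim of predictive power "beyond" the survey measures can be made concrete by comparing R^2 for nested models: PISA on a survey measure alone, then PISA on that measure plus the wallet return rate. The sketch below uses synthetic data (not the paper's) and plain least squares via NumPy rather than the statsmodels call used in the notebook; `ols_r2` is a hypothetical helper.

```python
import numpy as np

def ols_r2(y, *predictors):
    """R^2 of an OLS fit of y on the given predictors plus an intercept."""
    X = np.column_stack([np.ones(len(y)), *predictors])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    return 1.0 - (resid @ resid) / np.sum((y - y.mean()) ** 2)

# Synthetic country-level data, seeded for repeatability; not from the paper.
rng = np.random.default_rng(0)
n = 40
general_trust = rng.normal(size=n)
wallet_return_rate = 0.5 * general_trust + rng.normal(size=n)
pisa = 470 + 10 * general_trust + 25 * wallet_return_rate + rng.normal(scale=15, size=n)

r2_survey_only = ols_r2(pisa, general_trust)
r2_with_wallet = ols_r2(pisa, general_trust, wallet_return_rate)

print(f"survey only:  R^2 = {r2_survey_only:.3f}")
print(f"plus wallets: R^2 = {r2_with_wallet:.3f}")
print(f"incremental:  {r2_with_wallet - r2_survey_only:.3f}")
```

Because the models are nested, the second R^2 can never be lower than the first; the size of the gap, together with the significance of the wallet coefficient, is what indicates added predictive value.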