# Tannenbaum Paper

## PISA vs. Survey Data

The Tannenbaum paper aimed to validate survey measures of social capital by correlating them with wallet reporting rates. PISA education scores are not a direct measurement of social capital, so it is not obvious that they could validate the same survey measures. However, as shown below, each survey measure correlates with PISA scores about as strongly as it does with wallet reporting rates (for example, r = 0.60 vs. 0.61 for `general_trust` and 0.65 vs. 0.66 for `stranger1`). This suggests that wallet reporting rates and PISA scores carry not only a similar *amount* of information about social capital but also a similar *type* of information.

In [None]:
import pandas as pd
import plotly.express as px
import plotly.io as pio

pio.templates.default = "plotly_white"
pio.renderers.default = "sphinx_gallery"

import statsmodels.api as sm

# Data import
survey_cols = [
    "general_trust",
    "GPS_trust",
    "general_morality",
    "MFQ_genmorality",
    "civic_cooperation",
    "GPS_posrecip",
    "GPS_altruism",
    "stranger1",
]

cat_cols = [
    "country",
    "response",
    "male",
    "above40",
    "computer",
    "coworkers",
    "other_bystanders",
    "institution",
    "cond",
    "security_cam",
    "security_guard",
    "local_recipient",
    "no_english",
    "understood_situation",
]

sc_cols = [
    "log_gdp",
    "log_tfp",
    "gee",
    "letter_grading",
]

# Import Tannenbaum data
df = pd.read_csv(
    "../data/tannenbaum_data.csv",
    dtype={col: "category" for col in cat_cols},
)

# Import PISA data
pisa = pd.read_csv("../data/pisa_data.csv").rename(columns={"mean_score": "pisa_score"})
df = df.merge(pisa, how="left", on="country")

# Columns we want to see correlations for.
cols_for_country_avg_corr = ["response", "pisa_score"] + survey_cols

df_corr = df.copy().astype({"response": int})

# Calculate country averages for these measures
country_avg_data = df_corr.groupby("country")[cols_for_country_avg_corr].mean()

# Compute the correlation matrix
comprehensive_corr_matrix = country_avg_data.corr()

# Show correlations of interest
comprehensive_corr_matrix.columns = pd.MultiIndex.from_product(
    [["Correlation (r)"], comprehensive_corr_matrix.columns]
)
comprehensive_corr_matrix.iloc[:2, 2:]

Correlation (r) between country averages:

| | general_trust | GPS_trust | general_morality | MFQ_genmorality | civic_cooperation | GPS_posrecip | GPS_altruism | stranger1 |
|---|---|---|---|---|---|---|---|---|
| response | 0.603736 | 0.02351 | 0.612047 | 0.461323 | 0.391755 | 0.050279 | -0.214705 | 0.645001 |
| pisa_score | 0.611033 | 0.127325 | 0.674059 | 0.419857 | 0.441642 | -0.157111 | -0.162403 | 0.659283 |
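
One rough way to quantify how closely the two rows track each other is to correlate the rows themselves and to look at the largest gap between them. The sketch below hard-codes the correlations reported in the output above; the helper `pearson` and all variable names are illustrative and not part of the original analysis. With only eight survey measures, the result is descriptive rather than a formal test.

```python
import math

# Country-level correlations copied from the output above.
measures = [
    "general_trust", "GPS_trust", "general_morality", "MFQ_genmorality",
    "civic_cooperation", "GPS_posrecip", "GPS_altruism", "stranger1",
]
r_wallet = [0.603736, 0.02351, 0.612047, 0.461323,
            0.391755, 0.050279, -0.214705, 0.645001]
r_pisa = [0.611033, 0.127325, 0.674059, 0.419857,
          0.441642, -0.157111, -0.162403, 0.659283]


def pearson(x, y):
    """Plain Pearson correlation for two equal-length sequences."""
    mean_x, mean_y = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)


# How similar are the two correlation profiles across survey measures?
profile_r = pearson(r_wallet, r_pisa)

# Which survey measure shows the largest disagreement?
gaps = {m: abs(a - b) for m, a, b in zip(measures, r_wallet, r_pisa)}
largest_gap_measure = max(gaps, key=gaps.get)

print(f"profile correlation: {profile_r:.3f}")
print(f"largest gap: {largest_gap_measure} ({gaps[largest_gap_measure]:.3f})")
```

The largest gap is for `GPS_posrecip`, where the sign of the correlation differs between the two rows. Both values are close to zero, so this does not contradict the overall pattern.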


In [5]:
# Reshape dataframe for graphing ease.
df_reshaped = country_avg_data.reset_index().melt(
    id_vars=["country", "response", "pisa_score"]
)

# Calculate sample size for each survey measure and wallet report rates 
ens_wallet = pd.DataFrame(
    {
        col: country_avg_data[["response", col]].dropna().shape[0]
        for col in survey_cols
    },
    index=["N"],
)

# Wallet report rate vs survey measure facet plot.
fig = px.scatter(
    df_reshaped,
    x="value",
    y="response",
    facet_col="variable",
    facet_col_wrap=4,
    trendline="ols",
    facet_col_spacing=0.06,
    facet_row_spacing=0.15,
)
fig.update_xaxes(showline=True, linecolor="darkgray")
fig.update_yaxes(showline=True, linecolor="darkgray")
fig.for_each_annotation(
    lambda a: a.update(
        text=a.text.split("=")[-1]
        + " (N="
        + str(ens_wallet.loc["N", a.text.split("=")[-1]])
        + ")"
    )
)
fig.show()

The facet plot above replicates Figure 3 from **Tannenbaum**. Compare it with the plot below, which puts PISA scores instead of wallet reporting rates on the y-axes.

In [4]:
# Calculate sample size for each survey measure and PISA 
ens_pisa = pd.DataFrame(
    {
        col: country_avg_data[["pisa_score", col]].dropna().shape[0]
        for col in survey_cols
    },
    index=["N"],
)

# PISA vs Survey measure facet plot
fig = px.scatter(
    df_reshaped,
    x="value",
    y="pisa_score",
    facet_col="variable",
    facet_col_wrap=4,
    trendline="ols",
    facet_col_spacing=0.06,
    facet_row_spacing=0.15,
)
fig.update_xaxes(showline=True, linecolor="darkgray")
fig.update_yaxes(showline=True, linecolor="darkgray")
fig.for_each_annotation(
    lambda a: a.update(
        text=a.text.split("=")[-1]
        + " (N="
        + str(ens_pisa.loc["N", a.text.split("=")[-1]])
        + ")"
    )
)

fig.show()
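
The N in each facet title counts the countries for which both the y-axis variable and the survey measure are available, which is what `dropna().shape[0]` computes in `ens_wallet` and `ens_pisa`. The sketch below illustrates the idea with a handful of made-up country averages; the data and the helper `pairwise_n` are hypothetical and exist only for illustration.

```python
# Hypothetical country averages; None marks a missing value.
country_avg = {
    "A": {"response": 0.55, "pisa_score": 500.0, "general_trust": 0.40},
    "B": {"response": 0.30, "pisa_score": None, "general_trust": 0.25},
    "C": {"response": 0.70, "pisa_score": 510.0, "general_trust": None},
    "D": {"response": 0.45, "pisa_score": 480.0, "general_trust": 0.35},
}


def pairwise_n(rows, y_col, x_col):
    """Count the rows in which both columns are present."""
    return sum(
        1
        for row in rows.values()
        if row[y_col] is not None and row[x_col] is not None
    )


# Countries A, B and D have both a wallet rate and a trust score.
n_wallet = pairwise_n(country_avg, "response", "general_trust")

# Only countries A and D have both a PISA score and a trust score.
n_pisa = pairwise_n(country_avg, "pisa_score", "general_trust")

print(n_wallet, n_pisa)
```

Because PISA scores are missing for some countries, the wallet and PISA facets for the same survey measure can rest on different sets of countries. pandas `DataFrame.corr` likewise excludes missing values pair by pair, so the same caveat applies to the correlation table.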

## PISA as a Predictor of Economic and Institutional Performance
To address the second topic of the Tannenbaum paper, we ask how well PISA scores, compared with lost-wallet reporting rates, explain variation in economic development.

There are four measures of "Economic and Institutional Performance" in the second part of the paper: GDP per capita (`log_gdp`), productivity (`log_tfp`), government effectiveness (`gee`), and letter grade efficiency (`letter_grading`). If PISA scores are treated as a fifth measure of the same sort, wallet reporting rates predict them about as well as they predict the other four. When combined with any of the survey measures of social capital, the coefficient on wallet reporting rates is always statistically significant (p < 0.01), and the $R^2$ is higher than in most of the other fitted models.

### Regression Results
The table below seeks to replicate part of Table 2 from **Tannenbaum** and adds two new columns (Model 9 and Model 10) containing the corresponding OLS results with PISA score as the outcome. Models 7 and 8 are recreated here to show that the table is generated with the same process that generated Table 2 in **Tannenbaum**.
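
For reference, the two-predictor specification fit by `get_model_results` (Models 8 and 10) can be written as follows. This is a reading of the code in this notebook, not a formula quoted from the paper, and the symbols are notation introduced here for illustration.

$$
y_c = \beta_0 + \beta_1 \tilde{s}_c + \beta_2 \tilde{w}_c + \varepsilon_c,
\qquad
\tilde{x}_c = \frac{x_c - \bar{x}}{\operatorname{sd}(x)}
$$

Here $y_c$ is the country-level outcome (`letter_grading` or `pisa_score`), $s_c$ is the country average of the survey measure, and $w_c$ is the country's wallet reporting rate (`response`). Models 7 and 9, fit by `get_model_results_no_pred`, drop the $\beta_2 \tilde{w}_c$ term. Because both predictors are standardized, each coefficient is the change in the outcome associated with a one-standard-deviation change in that predictor. Standard errors are heteroskedasticity-robust (HC1).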

In [1]:
import pandas as pd
import statsmodels.api as sm
from great_tables import GT

# Data import
survey_cols = [
    "general_trust",
    "GPS_trust",
    "general_morality",
    "MFQ_genmorality",
    "civic_cooperation",
    "GPS_posrecip",
    "GPS_altruism",
    "stranger1",
]

econ_cols = [
    "log_gdp",
    "log_tfp",
    "gee",
    "letter_grading",
]

df = pd.read_csv(
    "../data/tannenbaum_data.csv",
)

# Add PISA data
pisa = pd.read_csv("../data/pisa_data.csv")
pisa = pisa.loc[pisa['year'] == 2015]
pisa = pisa.groupby('country')['pisa_score'].mean().reset_index()
df = df.merge(pisa, how="left", on="country")


# p-value stars to award to each parameter coef estimate.
def stars(p):
    if p > 0.1:
        return ""
    elif p > 0.05:
        return "*"
    elif p > 0.01:
        return "**"
    else:
        return "***"


# Run regression for each survey measure.
def get_model_results_no_pred(survey_measure, econ_measure):
    regression_df = (
        df.groupby("country")[[econ_measure, survey_measure]].mean().dropna()
    )

    y = regression_df[econ_measure]
    X = regression_df[[survey_measure]]

    # Standardize predictors
    X_std = (X - X.mean()) / X.std()
    X_std = sm.add_constant(X_std)

    model = sm.OLS(y, X_std)
    results = model.fit(cov_type="HC1")  # Robust standard errors same as in Tannenbaum
    result_df = (
        pd.DataFrame(
            {
                "param": pd.Series(
                    [
                        f"{v:.3f}{stars(p)}"
                        for v, p in zip(results.params[1:], results.pvalues[1:])
                    ],
                    index=results.params.index[1:],
                ),
                "se": results.bse[1:].apply(lambda x: f"({x:.3f})"),
            }
        )
        .astype({"param": object, "se": object})
        .stack()
    )
    result_df.loc[("<i>N</i>", "")] = X.shape[0]
    result_df.loc[("<i>R</i><sup>2</sup>", "")] = f"{results.rsquared:.3f}"
    result_df = result_df.reset_index()
    result_df["measure"] = survey_measure

    return result_df


# Run regression for each survey measure with the wallet reporting rate
# ("response") as an additional predictor.
def get_model_results(survey_measure, econ_measure):
    regression_df = (
        df.groupby("country")[["response", econ_measure, survey_measure]]
        .mean()
        .dropna()
    )

    y = regression_df[econ_measure]
    X = regression_df[[survey_measure, "response"]]

    # Standardize predictors
    X_std = (X - X.mean()) / X.std()
    X_std = sm.add_constant(X_std)

    model = sm.OLS(y, X_std)
    results = model.fit(cov_type="HC1")  # Robust standard errors same as in Tannenbaum

    result_df = (
        pd.DataFrame(
            {
                "param": pd.Series(
                    [
                        f"{v:.3f}{stars(p)}"
                        for v, p in zip(results.params[1:], results.pvalues[1:])
                    ],
                    index=results.params.index[1:],
                ),
                "se": results.bse[1:].apply(lambda x: f"({x:.3f})"),
            }
        )
        .astype({"param": object, "se": object})
        .stack()
    )
    result_df.loc[("<i>N</i>", "")] = X.shape[0]
    result_df.loc[("<i>R</i><sup>2</sup>", "")] = f"{results.rsquared:.3f}"
    result_df = result_df.reset_index()
    result_df["measure"] = survey_measure

    return result_df


model_7_results = [
    get_model_results_no_pred(col, "letter_grading") for col in survey_cols
]
model_7 = pd.concat(model_7_results)
model_7 = model_7.rename(columns={0: "Model 7"})

model_8_results = [get_model_results(col, "letter_grading") for col in survey_cols]
model_8 = pd.concat(model_8_results)
model_8 = model_8.rename(columns={0: "Model 8"})

model_9_results = [get_model_results_no_pred(col, "pisa_score") for col in survey_cols]
model_9 = pd.concat(model_9_results)
model_9 = model_9.rename(columns={0: "Model 9"})

model_10_results = [get_model_results(col, "pisa_score") for col in survey_cols]
model_10 = pd.concat(model_10_results)
model_10 = model_10.rename(columns={0: "Model 10"})

# Combine results and make pretty.
display_df = (
    model_7.merge(model_8, on=["level_0", "level_1", "measure"], how="right")
    .merge(model_9, on=["level_0", "level_1", "measure"], how="left")
    .merge(model_10, on=["level_0", "level_1", "measure"], how="right")
    .iloc[:, [0, 2, 3, 4, 5, 6]]
)

display_df.loc[:, "level_0"] = display_df.loc[:, "level_0"].where(
    display_df.loc[:, "level_0"] != display_df.loc[:, "level_0"].shift(), ""
)

display_df
(
    GT(display_df)
    .tab_header(title="TABLE 2.—PREDICTIVE VALUE OF WALLET REPORTING RATES")
    .tab_stub(rowname_col="level_0", groupname_col="measure")
    .tab_spanner(label="Letter grade efficiency", columns=["Model 7", "Model 8"])
    .tab_spanner(label="PISA Score", columns=["Model 9", "Model 10"])
    .tab_options(
        table_body_hlines_style="none",
    )
    .cols_align(align="center", columns=["Model 7", "Model 8", "Model 9", "Model 10"])
)

**TABLE 2.—PREDICTIVE VALUE OF WALLET REPORTING RATES**

| | Letter grade efficiency: Model 7 | Letter grade efficiency: Model 8 | PISA Score: Model 9 | PISA Score: Model 10 |
|---|---|---|---|---|
| **general_trust** | | | | |
| general_trust | 0.077* | -0.013 | 26.665*** | 9.099 |
| | (0.041) | (0.040) | (4.461) | (6.632) |
| response | | 0.148*** | | 24.956*** |
| | | (0.050) | | (5.819) |
| N | 39 | 39 | 32 | 32 |
| R<sup>2</sup> | 0.078 | 0.263 | 0.455 | 0.656 |
| **GPS_trust** | | | | |
| GPS_trust | -0.016 | -0.018 | 3.309 | 4.477 |
| | (0.050) | (0.039) | (7.499) | (3.642) |
