# PISA vs. Survey Data

The Tannenbaum paper aimed to validate survey measures of social capital by correlating them with wallet report rates. PISA education scores do not measure social capital directly, so it is not obvious that they could validate the same survey measures. However, as can be seen below, wallet report rates and PISA scores correlate with each survey measure in a strikingly similar pattern. This suggests that the two carry not only a similar *amount* of information about social capital but also a similar *type* of information.

In [None]:
import pandas as pd
import plotly.express as px
import plotly.io as pio

# statsmodels is not called directly, but plotly's trendline="ols" needs it installed.
import statsmodels.api as sm

pio.templates.default = "plotly_white"
pio.renderers.default = "sphinx_gallery"

# Data import
survey_cols = [
    "general_trust",
    "GPS_trust",
    "general_morality",
    "MFQ_genmorality",
    "civic_cooperation",
    "GPS_posrecip",
    "GPS_altruism",
    "stranger1",
]

cat_cols = [
    "country",
    "response",
    "male",
    "above40",
    "computer",
    "coworkers",
    "other_bystanders",
    "institution",
    "cond",
    "security_cam",
    "security_guard",
    "local_recipient",
    "no_english",
    "understood_situation",
]

sc_cols = [
    "log_gdp",
    "log_tfp",
    "gee",
    "letter_grading",
]

# Import Tannenbaum data
df = pd.read_csv(
    "../data/tannenbaum_data.csv",
    dtype={col: "category" for col in cat_cols},
)

# Import PISA data
pisa = pd.read_csv("../data/pisa_data.csv").rename(columns={"mean_score": "pisa_score"})
df = df.merge(pisa, how="left", on="country")

# Columns we want to see correlations for.
cols_for_country_avg_corr = ["response", "pisa_score"] + survey_cols

df_corr = df.copy().astype({"response": int})

# Calculate country averages for these measures
country_avg_data = df_corr.groupby("country")[cols_for_country_avg_corr].mean()

# Compute the correlation matrix
comprehensive_corr_matrix = country_avg_data.corr()

# Show correlations of interest
comprehensive_corr_matrix.columns = pd.MultiIndex.from_product(
    [["Correlation (r)"], comprehensive_corr_matrix.columns]
)
comprehensive_corr_matrix.iloc[:2, 2:]

| Correlation (r) | general_trust | GPS_trust | general_morality | MFQ_genmorality | civic_cooperation | GPS_posrecip | GPS_altruism | stranger1 |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| response | 0.603736 | 0.02351 | 0.612047 | 0.461323 | 0.391755 | 0.050279 | -0.214705 | 0.645001 |
| pisa_score | 0.611033 | 0.127325 | 0.674059 | 0.419857 | 0.441642 | -0.157111 | -0.162403 | 0.659283 |
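
One rough way to put a number on how similar the two rows are is to correlate them with each other across the eight survey measures and to check where they differ most. The sketch below hard-codes the values from the table above so that it runs without the data files. The helper `pearson` and the names `profile_r` and `gaps` are illustrative and not part of the original analysis. With only eight measures this is a descriptive summary, not a formal test.

```python
from math import sqrt

# Country-level correlations copied from the table above.
measures = [
    "general_trust", "GPS_trust", "general_morality", "MFQ_genmorality",
    "civic_cooperation", "GPS_posrecip", "GPS_altruism", "stranger1",
]
r_wallet = [0.603736, 0.02351, 0.612047, 0.461323,
            0.391755, 0.050279, -0.214705, 0.645001]
r_pisa = [0.611033, 0.127325, 0.674059, 0.419857,
          0.441642, -0.157111, -0.162403, 0.659283]


def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences (illustrative helper)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    dx = [x - mean_x for x in xs]
    dy = [y - mean_y for y in ys]
    cov = sum(a * b for a, b in zip(dx, dy))
    return cov / sqrt(sum(a * a for a in dx) * sum(b * b for b in dy))


# Similarity of the two correlation profiles across survey measures.
profile_r = pearson(r_wallet, r_pisa)

# Absolute gap between the two correlations for each survey measure.
gaps = {m: abs(w - p) for m, w, p in zip(measures, r_wallet, r_pisa)}
largest_gap_measure = max(gaps, key=gaps.get)

print(f"profile similarity r = {profile_r:.2f}")
print(f"largest gap: {largest_gap_measure} ({gaps[largest_gap_measure]:.3f})")
```

A value of `profile_r` close to 1 would mean that survey measures that track wallet report rates well also track PISA scores well, which is the pattern the text describes. The per-measure gaps show where that agreement is weakest.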


In [5]:
# Reshape dataframe for graphing ease.
df_reshaped = country_avg_data.reset_index().melt(
    id_vars=["country", "response", "pisa_score"]
)

# Count the countries that have both a wallet report rate and each survey measure (panel N).
ens_wallet = pd.DataFrame(
    {
        col: country_avg_data[["response", col]].dropna().shape[0]
        for col in survey_cols
    },
    index=["N"],
)

# Wallet report rate vs survey measure facet plot.
fig = px.scatter(
    df_reshaped,
    x="value",
    y="response",
    facet_col="variable",
    facet_col_wrap=4,
    trendline="ols",
    facet_col_spacing=0.06,
    facet_row_spacing=0.15,
)
fig.update_xaxes(showline=True, linecolor="darkgray")
fig.update_yaxes(showline=True, linecolor="darkgray")
fig.for_each_annotation(
    lambda a: a.update(
        text=a.text.split("=")[-1]
        + " (N="
        + str(ens_wallet.loc["N", a.text.split("=")[-1]])
        + ")"
    )
)
fig.show()
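
Each panel's trend line comes from `trendline="ols"`, an ordinary least squares fit of the y variable on one survey measure. For a single predictor, the R² of that fit equals the square of the Pearson correlation, which links the plots to the correlation table. The sketch below checks this identity on hypothetical toy values (not taken from the Tannenbaum or PISA data) using only the standard library. All variable names are illustrative.

```python
from math import sqrt

# Hypothetical toy values, NOT taken from the Tannenbaum or PISA data.
x = [0.10, 0.25, 0.30, 0.45, 0.60, 0.75]  # stand-in for a survey measure
y = [0.20, 0.35, 0.30, 0.50, 0.55, 0.80]  # stand-in for a report rate

n = len(x)
mean_x, mean_y = sum(x) / n, sum(y) / n

# Sums of squares and cross-products around the means.
sxx = sum((xi - mean_x) ** 2 for xi in x)
syy = sum((yi - mean_y) ** 2 for yi in y)
sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))

# Closed-form simple OLS: y = intercept + slope * x.
slope = sxy / sxx
intercept = mean_y - slope * mean_x

# R^2 from the residuals of the fitted line.
residual_ss = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
r_squared = 1 - residual_ss / syy

# Pearson correlation from the same sums.
r = sxy / sqrt(sxx * syy)

print(f"slope={slope:.3f}, intercept={intercept:.3f}")
print(f"R^2={r_squared:.6f}, r^2={r ** 2:.6f}")
```

The two printed quantities agree up to floating-point error, so a panel's visual fit and its entry in the correlation table convey the same strength of association.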

The facet plot above replicates Figure 3 from **Tannenbaum**. It can be compared with the plot below, which puts PISA scores instead of wallet report rates on the y-axes.

In [4]:
# Count the countries that have both a PISA score and each survey measure (panel N).
ens_pisa = pd.DataFrame(
    {
        col: country_avg_data[["pisa_score", col]].dropna().shape[0]
        for col in survey_cols
    },
    index=["N"],
)

# PISA vs Survey measure facet plot
fig = px.scatter(
    df_reshaped,
    x="value",
    y="pisa_score",
    facet_col="variable",
    facet_col_wrap=4,
    trendline="ols",
    facet_col_spacing=0.06,
    facet_row_spacing=0.15,
)
fig.update_xaxes(showline=True, linecolor="darkgray")
fig.update_yaxes(showline=True, linecolor="darkgray")
fig.for_each_annotation(
    lambda a: a.update(
        text=a.text.split("=")[-1]
        + " (N="
        + str(ens_pisa.loc["N", a.text.split("=")[-1]])
        + ")"
    )
)

fig.show()