# Introduction

In the paper [What Do Cross-country Surveys Tell Us about Social Capital?](https://davetannenbaum.github.io/documents/TannenbaumCohnZundMarechal2025.pdf), Tannenbaum et al. use the [Wallet Return Dataset](https://dataverse.harvard.edu/dataset.xhtml?persistentId=doi:10.7910/DVN/YKBODN) as a direct measure of civic honesty to investigate two types of indirect social capital measures. First, they analyze lost wallet reporting rates and their correlation with survey measures of social capital, showing the quantitative extent to which survey measures contain legitimate information about social capital. Second, they show that lost wallet reporting rates can serve as effective predictors of "Economic and Institutional Performance", confirming social capital's economic explanatory value.

I became curious about how educational assessment data would relate to these findings. The [Programme for International Student Assessment (PISA)](https://www.oecd.org/en/about/programmes/pisa.html) measures the effectiveness of national education systems by testing 15-year-olds, and it is a standard dataset for comparing educational outcomes between countries. Surprisingly, PISA scores were very strongly correlated with lost wallet reporting rates, which led to the following results for the two aims of the Tannenbaum paper:
1) PISA scores correlated with survey measures of social capital in largely the same manner as lost wallet reporting rates.
2) Lost wallet reporting rates were arguably a better predictor of PISA scores than of any of the other measures of "Economic and Institutional Performance".

## Preliminary Inspection of PISA Data and Wallet Data
We calculate wallet reporting rates (proportion of '100' responses) per country and merge this with the PISA data.

```python
import pandas as pd
import plotly.express as px
import plotly.io as pio

pio.templates.default = "plotly_white"
pio.renderers.default = "sphinx_gallery"

# statsmodels must be installed for plotly's trendline="ols" below
import statsmodels.api as sm

# Import Tannenbaum data
df = pd.read_csv(
    "../data/tannenbaum_data.csv",
)

# Import PISA data and attach each country's score to the wallet observations
pisa = pd.read_csv("../data/pisa_data.csv").rename(columns={"mean_score": "pisa_score"})
df = df.merge(pisa, how="left", on="country")

# Per-country mean wallet response and PISA score
wallet_pisa = df.groupby(["country"])[["response", "pisa_score"]].mean()

# Calculate sample size (countries with both a wallet rate and a PISA score)
sample_size = wallet_pisa[["response", "pisa_score"]].reset_index().dropna().shape[0]

# Scatter plot
fig_scatter_pisa_wallet = px.scatter(
    wallet_pisa.reset_index(),
    x="response",
    y="pisa_score",
    hover_data=["country"],
    title="PISA 2022 Reading Score vs. Wallet Return Rate by Country",
    subtitle=f"r={wallet_pisa.corr().iloc[0,1]:.3f}" + ", N=" + str(sample_size),
    trendline="ols",
)
fig_scatter_pisa_wallet.update_xaxes(showline=True, mirror=True, linecolor="darkgray")
fig_scatter_pisa_wallet.update_yaxes(showline=True, mirror=True, linecolor="darkgray")

fig_scatter_pisa_wallet.show()
```

A key observation is the strong correlation between PISA reading scores and wallet reporting rates (`r = 0.816`). As we'll see later, this is greater than the correlation between wallet reporting rates and any other "Economic and Institutional Performance" measure. It suggests a close association between societal honesty (as measured by wallet returns) and educational outcomes.

Since PISA scores and lost wallet reporting rates are so closely correlated, it isn't surprising that they relate similarly to other variables, as we'll see in the next section.
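Because the correlation above is computed across a fairly small number of countries, it can help to put an approximate interval around it. The following is a minimal sketch using the Fisher z-transformation; it is not part of the original analysis. The helper name `fisher_ci` and the sample size `n = 30` are hypothetical placeholders (the actual `N` is the one shown in the plot subtitle), and the sketch uses only the standard library so that it runs without the data files.

```python
import math


def fisher_ci(r, n, z_crit=1.96):
    """Approximate confidence interval for a Pearson r via the Fisher z-transform.

    r      : sample correlation coefficient
    n      : number of paired observations (here: countries), must be > 3
    z_crit : normal critical value (1.96 gives roughly a 95% interval)
    """
    z = math.atanh(r)              # transform r to the z scale
    se = 1.0 / math.sqrt(n - 3)    # approximate standard error on the z scale
    return math.tanh(z - z_crit * se), math.tanh(z + z_crit * se)


# r is the value reported in the text; n = 30 is a hypothetical placeholder.
low, high = fisher_ci(0.816, n=30)
print(f"approx. 95% CI for r: [{low:.3f}, {high:.3f}]")
```

The interval narrows as `n` grows, so substituting the real number of countries gives a quick sense of how precisely the correlation is estimated.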

## PISA Missing Countries

:::{caution}
The PISA data used in this report is from 2022 and was copied directly from the PDF found here:
[OECD PISA 2022 Results Vol I](https://www.oecd.org/en/publications/pisa-2022-results-volume-i_53f23881-en.html), pp. 52-57. Note that the PISA 2022 data is missing some important countries that are included in the Wallet Return Dataset (China, Russia, India, Ghana, Kenya, South Africa). The 2018 PISA results do include measures for China and Russia. However, China proves to be an extreme outlier, with very high education scores and a very low lost wallet reporting rate. Tannenbaum's paper also noted China as a special case. East and Southeast Asian countries are generally underrepresented in the wallet dataset, and the three other countries from the region that are included (Malaysia, Thailand, Indonesia) are very different from China culturally, economically, and governmentally.
:::
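One way to check which wallet-dataset countries have no PISA score is to repeat the left merge with pandas' `indicator=True` option and list the rows that matched only on the left side. This is a sketch under stated assumptions, not the code used for this report: the two small frames are toy stand-ins with made-up country labels and values, and only the column names `country`, `response`, and `pisa_score` follow the code earlier in this section.

```python
import pandas as pd

# Toy stand-ins for the wallet and PISA tables; all values are made-up placeholders.
wallet = pd.DataFrame(
    {
        "country": ["A", "A", "B", "C"],
        "response": [100, 0, 100, 0],
    }
)
pisa = pd.DataFrame({"country": ["A", "B"], "pisa_score": [480.0, 510.0]})

# indicator=True adds a "_merge" column recording where each row was found.
merged = wallet.merge(pisa, how="left", on="country", indicator=True)

# Countries present in the wallet data but absent from the PISA table.
missing = sorted(merged.loc[merged["_merge"] == "left_only", "country"].unique())
print(missing)
```

Running the same check on the real files should reproduce the list of missing countries given in the note above, which is a useful guard against mismatches caused by differently spelled country names.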