# PISA vs. Survey Data
How do the correlations between the PISA data and survey measures of social capital compare to the correlations between lost wallet reporting rates and survey measures?

## Commentary on PISA, Wallets, and Survey Measures (from `wallet_pisa.ipynb`)

**Tannenbaum Point 1 Context:**
The Tannenbaum paper aimed to validate survey measures of social capital using wallet reporting rates. PISA education measures could be seen as validating these survey measures in the same way. Notably, the relationship between PISA scores and the survey measures closely resembles the relationship between wallet reporting rates and the survey measures.

A key observation was the strong correlation between PISA reading scores and wallet reporting rates (rho ≈ 0.78 with 2022 data). This exceeds the correlation between wallet reporting rates and many of the other "Economic and Institutional Performance" measures, which suggests a close link between societal honesty (as measured by wallet returns) and educational outcomes.

**Takeaways/Thoughts:**
- Both honesty and education are challenging to influence directly through policy. The strong correlation might suggest that familial and community factors (components of social capital) are highly influential for educational outcomes.
- Wallet reporting rates are a direct behavioral measure of social capital. Although PISA scores measure institutional (educational system) performance, their strong correlation with wallet reporting rates might imply that they also reflect aspects of broader social capital.

In [1]:
import pandas as pd
import plotly.express as px

In [2]:
# Data import

numeric_features_initial = [
    "general_trust",
    "GPS_trust",
    "general_morality",
    "MFQ_genmorality",
    "civic_cooperation",
    "GPS_posrecip",
    "GPS_altruism",
    "stranger1",
]

cat_cols = [
    "country",
    "response",
    "male",
    "above40",
    "computer",
    "coworkers",
    "other_bystanders",
    "institution",
    "cond",
    "security_cam",
    "security_guard",
    "local_recipient",
    "no_english",
    "understood_situation",
]

sc_measures = [
    "log_gdp",
    "log_tfp",
    "gee",
    "letter_grading",
]

# Import Tannenbaum data
df = pd.read_csv(
    "../data/tannenbaum_data.csv",
    dtype={col: "category" for col in cat_cols},
)

# Import PISA data
pisa = pd.read_csv("../data/pisa_data.csv").rename(columns={"mean_score": "pisa_score"})
df = df.merge(pisa, how="left", on="country")
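
A left merge keeps every wallet observation, so any country missing from the PISA file silently receives a missing `pisa_score` and drops out of the PISA correlations. The following is a minimal sketch of how that coverage could be checked. It uses hypothetical toy frames (`wallets`, `pisa_toy`) rather than the real files, and it assumes the PISA table is meant to have one row per country.

```python
import pandas as pd

# Hypothetical toy data standing in for the wallet and PISA files.
wallets = pd.DataFrame({
    "country": ["A", "A", "B", "C"],
    "response": [1, 0, 1, 0],
})
pisa_toy = pd.DataFrame({
    "country": ["A", "B", "D"],
    "pisa_score": [500.0, 480.0, 510.0],
})

# indicator=True adds a "_merge" column recording where each row came from.
merged = wallets.merge(pisa_toy, how="left", on="country", indicator=True)

# Countries present in the wallet data but absent from the PISA table.
unmatched = sorted(merged.loc[merged["_merge"] == "left_only", "country"].unique())
print("Countries without a PISA score:", unmatched)

# validate="many_to_one" raises if the PISA table has duplicate countries,
# which would otherwise duplicate wallet rows in the merged frame.
checked = wallets.merge(pisa_toy, how="left", on="country", validate="many_to_one")
```

If the unmatched list is non-empty for the real data, the country-level correlations involving `pisa_score` are computed on fewer countries than those involving `response` alone.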

### Correlation Matrix including PISA, Wallet Response, and Survey Measures

Note how `response` and `pisa_score` have similar correlations with all the survey measures of social capital. To do: drop the full matrix and show only the first two rows, perhaps as a plain table. The economic/institutional measures should also be left out.
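
One way to produce such a compact table is sketched below. It runs on hypothetical, randomly generated country averages (`country_avg_toy`), so the numbers it prints are meaningless; only the selection pattern matters. Selecting by label rather than by position keeps the table correct if the column order changes.

```python
import numpy as np
import pandas as pd

# Hypothetical country-level averages; values are random, for illustration only.
rng = np.random.default_rng(0)
cols = ["response", "pisa_score", "general_trust", "general_morality", "stranger1"]
country_avg_toy = pd.DataFrame(rng.normal(size=(12, len(cols))), columns=cols)

targets = ["response", "pisa_score"]
survey_cols = [c for c in cols if c not in targets]

# One row per survey measure, one column per outcome being compared.
compact = country_avg_toy.corr().loc[survey_cols, targets].round(2)
print(compact)
```

Putting the survey measures in rows keeps the table narrow, which reads more easily than two long rows when there are many measures.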

In [15]:
# Columns we want to see correlations for.
cols_for_country_avg_corr = ['response', 'pisa_score'] + numeric_features_initial

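# `response` was read as a category; cast it to int so it can be averaged.
# astype already returns a new DataFrame, so no explicit copy is needed.
df_corr = df.astype({"response": int})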

# Calculate country averages for these measures
country_avg_data = df_corr.groupby("country")[cols_for_country_avg_corr].mean()

# Compute the correlation matrix
comprehensive_corr_matrix = country_avg_data.corr()

# Show correlations of interest: the two outcome rows against the survey measures
comprehensive_corr_matrix.loc[["response", "pisa_score"], numeric_features_initial]

| | general_trust | GPS_trust | general_morality | MFQ_genmorality | civic_cooperation | GPS_posrecip | GPS_altruism | stranger1 |
|---|---|---|---|---|---|---|---|---|
| response | 0.603736 | 0.02351 | 0.612047 | 0.461323 | 0.391755 | 0.050279 | -0.214705 | 0.645001 |
| pisa_score | 0.611033 | 0.127325 | 0.674059 | 0.419857 | 0.441642 | -0.157111 | -0.162403 | 0.659283 |
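
The similarity of the two rows can be summarized rather than judged by eye. The sketch below re-enters the correlations reported above and computes the per-measure gap and the correlation between the two profiles. The names `reported`, `gap`, and `profile_r` are illustrative. With only eight measures, the profile correlation is a rough descriptive summary, not a formal test.

```python
import pandas as pd

measures = [
    "general_trust", "GPS_trust", "general_morality", "MFQ_genmorality",
    "civic_cooperation", "GPS_posrecip", "GPS_altruism", "stranger1",
]

# Correlations copied from the output table above.
reported = pd.DataFrame(
    {
        "response": [0.603736, 0.02351, 0.612047, 0.461323,
                     0.391755, 0.050279, -0.214705, 0.645001],
        "pisa_score": [0.611033, 0.127325, 0.674059, 0.419857,
                       0.441642, -0.157111, -0.162403, 0.659283],
    },
    index=measures,
)

# Signed gap per survey measure: positive means PISA correlates more strongly.
gap = reported["pisa_score"] - reported["response"]

# Measure on which the two outcomes disagree most.
largest_gap = gap.abs().idxmax()

# Pearson correlation between the two correlation profiles.
profile_r = reported["response"].corr(reported["pisa_score"])

print(gap.round(3))
print("Largest gap:", largest_gap)
print("Profile correlation:", round(profile_r, 2))
```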


In [16]:
# Show matrix

fig_comp_corr = px.imshow(
    comprehensive_corr_matrix,
    labels=dict(color="Correlation"),
    color_continuous_scale="RdBu_r",
    zmin=-1,
    zmax=1,
    title="Correlation Matrix: Wallet Response, PISA, Survey (Country Averages)",
    # text_auto=True # Show correlation values on heatmap
)
fig_comp_corr.update_layout(height=800)
fig_comp_corr.show()

**Observations from the Correlation Matrix:**
1. PISA scores (`pisa_score`) generally show stronger correlations with wallet `response` than many individual survey-based social capital measures do.
2. PISA scores also correlate with other macro indicators (`log_gdp`, `gee`, etc.; these are listed in `sc_measures` but are not included in the matrix computed above), sometimes more strongly than the wallet response rate itself does. This might suggest that PISA scores capture a broad aspect of societal development and functional institutions, which overlaps with social capital.
3. The survey measures (such as `general_trust` and `general_morality`) show varied correlations with both wallet response and PISA scores, reinforcing the idea that these are related but distinct constructs.