# Lab 6: Open Banking & Financial Inclusion

API integration and welfare effects analysis

> **Expected Time**
>
> -   FIN510: Exercises 1-2 ≈ 90 min
> -   FIN720: All exercises ≈ 120 min
> -   Directed learning extensions ≈ 60 min

<figure>
<a
href="https://colab.research.google.com/github/quinfer/fin510-colab-notebooks/blob/main/labs/lab06_open_banking.ipynb"><img
src="https://colab.research.google.com/assets/colab-badge.svg" /></a>
<figcaption>Open in Colab</figcaption>
</figure>

## Before You Code: The Big Picture

Open Banking is a regulatory revolution: **banks must share customer
data (with consent) via APIs**. This breaks banks’ data monopoly and
enables FinTech innovation—but does it improve financial inclusion and
consumer welfare?

> **The Open Banking Promise**
>
> **The Regulation:**
>
> -   **EU (PSD2, 2018)**: Banks must provide API access to account data
>     and payment initiation
> -   **UK (CMA9, 2018)**: Nine largest banks forced to open APIs as a
>     competition remedy
> -   **US**: Market-driven standards (Plaid, MX, Finicity), no federal
>     mandate
>
> **The Promise:**
>
> 1.  **Competition**: FinTechs can build better apps on bank data
> 2.  **Innovation**: Account aggregation, budgeting tools, credit
>     decisioning
> 3.  **Inclusion**: Alternative data enables lending to thin-file
>     borrowers
> 4.  **Switching**: Easier to compare accounts and switch banks
>
> **The Reality (So Far):**
>
> -   ~20% adoption in UK after 6 years (OBIE 2025 data)
> -   Privacy concerns: 45% of consumers uncomfortable sharing data
> -   Sticky incumbents: switching rates increased only modestly
> -   Unequal benefits: digitally savvy consumers gain most

### What You’ll Build Today

By the end of this lab, you will have:

-   ✅ Understanding of OAuth 2.0 authentication for open banking APIs
-   ✅ Hands-on experience with API integration patterns
-   ✅ Empirical analysis of financial inclusion using Global Findex
    data
-   ✅ Difference-in-differences estimation of welfare effects
-   ✅ Critical perspective on who benefits (and who doesn’t)

**Time estimate:** 90 minutes (FIN510) \| 120 minutes (FIN720 with all
exercises)

> **Why This Matters**
>
> Open Banking is a natural experiment in data portability and
> competition policy. Did it work? For your Coursework 2 (FIN720), you
> might evaluate an open banking FinTech. This lab gives you the
> empirical toolkit to assess impact.

## Learning Objectives

By the end of this lab, you will be able to:

-   Authenticate with open banking APIs using OAuth 2.0
-   Retrieve and parse account and transaction data programmatically
-   Analyze financial inclusion patterns using Global Findex microdata
-   Estimate welfare effects using difference-in-differences models
-   Interpret heterogeneous treatment effects by demographic
    characteristics
-   Connect empirical findings to platform economics and policy
    evaluation

## Setup and Dependencies

```python
# Core data science libraries
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from scipy import stats
import warnings
warnings.filterwarnings('ignore')  # keeps lab output readable; re-enable when debugging

# Standard-library helpers for API work (Exercise 1); these never need installing
import json
from urllib.parse import urlencode

# Third-party HTTP client (Exercise 1): install on the fly if missing (Colab/Jupyter only)
try:
    import requests
except ImportError:
    print("Installing requests for API work...")
    !pip install -q requests
    import requests

# For statistical modeling (Exercises 2-3)
try:
    import statsmodels.api as sm
    import statsmodels.formula.api as smf
except ImportError:
    print("Installing statsmodels...")
    !pip install -q statsmodels
    import statsmodels.api as sm
    import statsmodels.formula.api as smf

# Set visualization style
sns.set_style("whitegrid")
plt.rcParams['figure.figsize'] = (10, 6)
plt.rcParams['font.size'] = 11

print("✓ Setup complete. All dependencies loaded.")
```

## Exercise 1: Open Banking API Integration

### Understanding OAuth 2.0 Authentication

Open Banking APIs use OAuth 2.0 for authorization—the same protocol used
by Google, Facebook, and Twitter for third-party app access. The flow
involves:

1.  **Authorization request**: Your app redirects the user to the
    bank’s authorization server
2.  **User consent**: The user authenticates with the bank and grants
    permissions
3.  **Authorization code**: The bank redirects back to your app with a
    short-lived code
4.  **Token exchange**: Your app exchanges the code for an access token
5.  **API calls**: Your app uses the access token to call protected
    endpoints

This complexity creates friction for developers and users, explaining
some open banking adoption challenges. Let’s implement a simplified
version using sandbox APIs.

### Plaid Sandbox Integration

Plaid provides a sandbox environment for testing open banking
integrations. We’ll use their Link token flow to simulate account
connection.

```python
# Plaid API configuration (sandbox environment)
# NOTE: these are placeholders, not working credentials. Replace them with the
# keys from your own Plaid sandbox dashboard (educational use only).
# In production, NEVER hard-code credentials: load them from environment variables or a secrets manager.

PLAID_CLIENT_ID = "your_sandbox_client_id"  # Replace with your Plaid sandbox client ID
PLAID_SECRET = "your_sandbox_secret"  # Replace with your sandbox secret
PLAID_ENV = "sandbox"
PLAID_PRODUCTS = ["auth", "transactions"]
PLAID_COUNTRY_CODES = ["GB", "US"]

def create_link_token():
    """
    Create a Link token for initializing Plaid Link.
    This is step 1 of the OAuth-like flow: the token configures the
    connection screen the user will see. Returns None on failure.
    """
    url = f"https://{PLAID_ENV}.plaid.com/link/token/create"
    
    headers = {
        "Content-Type": "application/json",
    }
    
    data = {
        "client_id": PLAID_CLIENT_ID,
        "secret": PLAID_SECRET,
        "user": {
            "client_user_id": "lab-user-001",  # Unique user ID in your system
        },
        "client_name": "FIN510 Lab App",
        "products": PLAID_PRODUCTS,
        "country_codes": PLAID_COUNTRY_CODES,
        "language": "en",
    }
    
    try:
        # timeout stops the notebook hanging if the API is unreachable
        response = requests.post(url, headers=headers, json=data, timeout=30)
        response.raise_for_status()
        result = response.json()
        return result.get("link_token")
    except requests.exceptions.RequestException as e:
        print(f"Error creating link token: {e}")
        return None

# In a real implementation, you'd call create_link_token() and use the token to initialize the Plaid Link UI.
# For this lab, we skip the live call and simulate the response.
print("OAuth 2.0 authentication flow initiated...")
print("In production, user would see bank login screen here.")
print("After consent, we'd receive authorization code to exchange for access token.")
```
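In the sandbox you can skip the Link UI entirely. The sketch below covers the remaining steps and is hedged accordingly:

-   Plaid's `/sandbox/public_token/create` endpoint issues a public token for a test institution.
-   `/item/public_token/exchange` swaps that token for a long-lived `access_token`, Plaid's analogue of OAuth step 4.

The default institution ID `ins_109508` is assumed to be Plaid's "First Platypus Bank" test institution. Check it and both request shapes against Plaid's API reference before relying on them. The helper names are hypothetical. The HTTP function is passed in as a parameter, defaulting to `requests.post`, so the logic can be exercised offline with a fake.

```python
def plaid_post(path, body, client_id, secret, env="sandbox", post=None):
    """POST a JSON body to a Plaid endpoint, adding credentials; return parsed JSON."""
    if post is None:
        import requests  # only needed for live calls
        post = requests.post
    payload = {"client_id": client_id, "secret": secret, **body}
    response = post(f"https://{env}.plaid.com{path}", json=payload, timeout=30)
    response.raise_for_status()
    return response.json()

def sandbox_access_token(client_id, secret, institution_id="ins_109508",
                         products=("transactions",), post=None):
    """Sandbox shortcut: create a public token without Link, then exchange it."""
    created = plaid_post("/sandbox/public_token/create",
                         {"institution_id": institution_id,
                          "initial_products": list(products)},
                         client_id, secret, post=post)
    exchanged = plaid_post("/item/public_token/exchange",
                           {"public_token": created["public_token"]},
                           client_id, secret, post=post)
    return exchanged["access_token"]
```

With real sandbox keys you would call `sandbox_access_token(PLAID_CLIENT_ID, PLAID_SECRET)`. Keep the returned token server-side and treat it like a password, because it grants ongoing access to the connected account's data.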

### Simulating Account Data Retrieval

Since a full OAuth implementation requires a web server to receive the
bank’s redirect, we’ll work with realistic synthetic data that mimics
what you’d receive from open banking APIs.

```python
# Generate realistic synthetic account and transaction data
# This simulates what you'd receive from /auth/get and /transactions/get endpoints

np.random.seed(42)

# Account information
accounts_data = {
    "account_id": ["acc_001", "acc_002"],
    "name": ["Current Account", "Savings Account"],
    "type": ["depository", "depository"],
    "subtype": ["checking", "savings"],
    "balance_current": [2450.75, 15420.30],
    "balance_available": [2450.75, 15420.30],
    "currency": ["GBP", "GBP"]
}

accounts_df = pd.DataFrame(accounts_data)

# Transaction history: one transaction per day for 120 days
n_transactions = 120
dates = pd.date_range(end=pd.Timestamp.now().normalize(), periods=n_transactions, freq='D')

# Mix of income and expenses with realistic patterns
transaction_amounts = []
categories = []
merchants = []

for i in range(n_transactions):
    if np.random.random() < 0.1:  # 10% income transactions
        amount = np.random.choice([2500, 3000, 850])  # Salary, freelance
        category = "Income"
        merchant = np.random.choice(["Employer Deposit", "Freelance Payment"])
    else:  # Expenses (stored as negative amounts)
        category = np.random.choice([
            "Groceries", "Transport", "Dining", "Shopping", 
            "Utilities", "Entertainment", "Healthcare"
        ], p=[0.25, 0.20, 0.15, 0.12, 0.10, 0.10, 0.08])
        
        # Category-specific amounts
        if category == "Groceries":
            amount = -np.random.uniform(15, 85)
        elif category == "Transport":
            amount = -np.random.uniform(5, 45)
        elif category == "Dining":
            amount = -np.random.uniform(10, 50)
        elif category == "Shopping":
            amount = -np.random.uniform(20, 150)
        elif category == "Utilities":
            amount = -np.random.uniform(50, 120)
        elif category == "Entertainment":
            amount = -np.random.uniform(10, 60)
        else:  # Healthcare
            amount = -np.random.uniform(30, 200)
        
        merchant = f"{category} Merchant {np.random.randint(1, 5)}"
    
    transaction_amounts.append(amount)
    categories.append(category)
    merchants.append(merchant)

transactions_df = pd.DataFrame({
    "date": dates,
    "amount": transaction_amounts,
    "category": categories,
    "merchant": merchants,
    "account_id": np.random.choice(["acc_001", "acc_002"], size=n_transactions, p=[0.8, 0.2])
})

# Sort by date descending (most recent first, as APIs typically return)
transactions_df = transactions_df.sort_values("date", ascending=False).reset_index(drop=True)

print(f"✓ Retrieved {len(accounts_df)} accounts and {len(transactions_df)} transactions")
print("\nAccount Summary:")
print(accounts_df[["name", "type", "balance_current"]])
print("\nRecent Transactions:")
print(transactions_df.head(10))
```
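Sign conventions differ by provider, which matters if you swap this synthetic frame for real API output. Plaid's documentation, for example, reports a transaction `amount` as positive when money leaves the account and negative when it comes in. That is the opposite of the convention used above. The sketch below assumes a list of Plaid-style transaction dicts with `date`, `amount`, `name` and `account_id` keys, a simplified subset of the real response. It normalizes them into the same shape as `transactions_df`. The function name is hypothetical.

```python
import pandas as pd

def plaid_to_frame(plaid_transactions):
    """Convert Plaid-style transaction dicts to this lab's sign convention.

    Assumes each dict has 'date', 'amount', 'name' and 'account_id' keys.
    Plaid-style amounts are positive for outflows, so the sign is flipped to
    make expenses negative and income positive, as in transactions_df.
    """
    df = pd.DataFrame(plaid_transactions)
    out = pd.DataFrame({
        "date": pd.to_datetime(df["date"]),
        "amount": -df["amount"].astype(float),
        "merchant": df["name"],
        "account_id": df["account_id"],
    })
    # Most recent first, matching transactions_df
    return out.sort_values("date", ascending=False).reset_index(drop=True)

sample = [
    {"date": "2024-03-01", "amount": 42.50, "name": "Grocer", "account_id": "acc_001"},
    {"date": "2024-03-02", "amount": -2500.00, "name": "Employer", "account_id": "acc_001"},
]
print(plaid_to_frame(sample))
```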

### Analysis Task: Spending Patterns

Now let’s analyze the transaction data to understand spending
patterns—exactly what budgeting apps built on open banking do.

```python
# Calculate spending by category (expenses are negative amounts)
spending_by_category = transactions_df[transactions_df["amount"] < 0].groupby("category").agg({
    "amount": ["sum", "count", "mean"]
}).round(2)

spending_by_category.columns = ["Total_Spent", "Num_Transactions", "Avg_Transaction"]
spending_by_category["Total_Spent"] = spending_by_category["Total_Spent"].abs()
spending_by_category["Avg_Transaction"] = spending_by_category["Avg_Transaction"].abs()
spending_by_category = spending_by_category.sort_values("Total_Spent", ascending=False)

print("Spending Analysis by Category:")
print(spending_by_category)

# Calculate monthly income vs expenses (named aggregation: one column per measure)
transactions_df["month"] = transactions_df["date"].dt.to_period("M")
monthly_summary = transactions_df.groupby("month")["amount"].agg(
    Income=lambda x: x[x > 0].sum(),
    Expenses=lambda x: -x[x < 0].sum(),  # reported as a positive number
    Net="sum",
)

print("\nMonthly Cash Flow:")
print(monthly_summary)

# Visualization
fig, axes = plt.subplots(1, 2, figsize=(14, 5))

# Spending by category
spending_by_category["Total_Spent"].plot(kind="barh", ax=axes[0], color="coral")
axes[0].set_xlabel("Total Spent (£)")
axes[0].set_title("Spending by Category (120 days)")
axes[0].grid(axis="x", alpha=0.3)

# Monthly cash flow (the first and last months may be partial)
monthly_summary[["Income", "Expenses"]].plot(kind="bar", ax=axes[1])
axes[1].set_xlabel("Month")
axes[1].set_ylabel("Amount (£)")
axes[1].set_title("Monthly Income vs Expenses")
axes[1].legend(["Income", "Expenses"])
axes[1].grid(axis="y", alpha=0.3)
axes[1].tick_params(axis="x", rotation=0)

plt.tight_layout()
plt.show()

# Calculate savings rate: share of income not spent
total_income = transactions_df.loc[transactions_df["amount"] > 0, "amount"].sum()
total_expenses = transactions_df.loc[transactions_df["amount"] < 0, "amount"].sum()  # negative
savings_rate = (total_income + total_expenses) / total_income * 100 if total_income > 0 else float("nan")

print(f"\n💡 Financial Health Metrics:")
print(f"Total Income: £{total_income:,.2f}")
print(f"Total Expenses: £{abs(total_expenses):,.2f}")
print(f"Savings Rate: {savings_rate:.1f}%")
```
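Budgeting apps typically turn summaries like these into alerts. The sketch below uses one simple heuristic, which is assumed here and not taken from any particular app. It flags an expense as unusual when its size is more than two standard deviations above the mean for its category. It runs on a small toy frame with the same `amount` and `category` columns as `transactions_df`. The function name and threshold are hypothetical.

```python
import pandas as pd

def flag_unusual_spending(df, z_threshold=2.0):
    """Flag expenses far above their category's typical size.

    Uses a per-category z-score of the expense amount (sample standard
    deviation); z_threshold is an illustrative choice, not an industry standard.
    Categories with a single expense get a NaN z-score and are never flagged.
    """
    expenses = df[df["amount"] < 0].copy()
    expenses["spend"] = -expenses["amount"]
    grouped = expenses.groupby("category")["spend"]
    expenses["z"] = (expenses["spend"] - grouped.transform("mean")) / grouped.transform("std")
    return expenses[expenses["z"] > z_threshold]

toy = pd.DataFrame({
    "amount": [-20.0] * 10 + [-200.0, -5.0, -10.0, 2500.0],
    "category": ["Groceries"] * 11 + ["Transport", "Transport", "Income"],
})
alerts = flag_unusual_spending(toy)
print(alerts[["category", "spend", "z"]])
```

On the lab data you would call `flag_unusual_spending(transactions_df)`. Because z-scores need several transactions per category, a short history produces few or no alerts.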

### Reflection Questions (Exercise 1)

Write 150-200 words addressing:

1.  **API Complexity**: Why is OAuth 2.0 necessary for open banking, and
    how does its complexity affect adoption by third-party providers
    (TPPs) and the consumer experience?

2.  **Data Value**: What insights can budgeting apps extract from
    transaction data that consumers might not get from their bank’s
    native app? Is this value sufficient to justify the friction of
    connecting accounts?

3.  **Privacy Trade-offs**: You’ve granted a TPP access to detailed
    transaction history. What privacy risks does this create? How should
    regulation balance innovation benefits with consumer protection?

## Exercise 2: Financial Inclusion Data Analysis

### Loading Global Findex Data

The World Bank’s Global Findex database surveys roughly 128,000 adults in
123 economies (2021 round) on financial access, usage, and attitudes.
We’ll analyze patterns of mobile money adoption using a synthetic sample
that mimics its structure.

```python
# Generate a synthetic Findex-style sample for this lab
# Real Global Findex 2021 microdata are available at: https://microdata.worldbank.org/index.php/catalog/findex

np.random.seed(123)
n_respondents = 5000

# Generate synthetic but realistic Findex-style data
findex_data = {
    "country": np.random.choice(
        ["Kenya", "Tanzania", "Uganda", "Nigeria", "India", "Bangladesh"], 
        size=n_respondents, 
        p=[0.15, 0.15, 0.15, 0.20, 0.20, 0.15]
    ),
    "age": np.random.choice([18, 25, 35, 45, 55, 65], size=n_respondents),
    "female": np.random.binomial(1, 0.52, size=n_respondents),
    "education": np.random.choice(
        ["None", "Primary", "Secondary", "Tertiary"], 
        size=n_respondents, 
        p=[0.20, 0.35, 0.30, 0.15]
    ),
    "income_quintile": np.random.choice([1, 2, 3, 4, 5], size=n_respondents),
    "urban": np.random.binomial(1, 0.40, size=n_respondents),
    "employed": np.random.binomial(1, 0.60, size=n_respondents),
}

# Financial access outcomes (correlated with demographics realistically)
findex_df = pd.DataFrame(findex_data)

# Bank account ownership (higher for urban, educated, higher income)
bank_prob = 0.15 + 0.20 * findex_df["urban"] + 0.08 * findex_df["income_quintile"] + \
            0.10 * (findex_df["education"] == "Tertiary") + \
            0.05 * (findex_df["country"] == "Kenya")
findex_df["has_bank_account"] = np.random.binomial(1, np.clip(bank_prob, 0, 1))

# Mobile money account (higher in Kenya/Tanzania, less dependent on education)
mobile_prob = 0.10 + 0.40 * (findex_df["country"] == "Kenya") + \
              0.25 * (findex_df["country"] == "Tanzania") + \
              0.15 * (findex_df["country"] == "Uganda") + \
              0.10 * findex_df["urban"] + \
              0.03 * findex_df["income_quintile"] - \
              0.05 * findex_df["female"]  # Gender gap (which policy tries to address)
findex_df["has_mobile_money"] = np.random.binomial(1, np.clip(mobile_prob, 0, 1))

# Received remittances (common use case)
remit_prob = 0.20 + 0.15 * findex_df["has_mobile_money"] + \
             0.05 * (findex_df["country"].isin(["Kenya", "Tanzania", "Uganda"]))
findex_df["received_remittances"] = np.random.binomial(1, np.clip(remit_prob, 0, 1))

print(f"✓ Generated synthetic Findex-style data: {len(findex_df):,} respondents from {findex_df['country'].nunique()} countries")
print("\nSample of data:")
print(findex_df.head(10))
```
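The synthetic sample above treats every respondent equally. The real Findex microdata come from stratified national surveys, so population estimates should use the sampling weights supplied with the data. In the public files the weight column is typically named `wgt`, but that name is an assumption here, so confirm it in the codebook you download. The sketch below computes a weighted adoption rate on a toy frame with made-up weights. The function name is hypothetical.

```python
import pandas as pd

def weighted_rate(df, outcome, weight="wgt", by="country"):
    """Weighted share (in %) of respondents with outcome == 1, within each group."""
    tmp = df.assign(_wy=df[outcome] * df[weight])
    sums = tmp.groupby(by)[["_wy", weight]].sum()
    return 100 * sums["_wy"] / sums[weight]

toy = pd.DataFrame({
    "country": ["A", "A", "A", "B", "B"],
    "has_mobile_money": [1, 0, 0, 1, 1],
    "wgt": [3.0, 1.0, 1.0, 0.5, 1.5],
})
print(weighted_rate(toy, "has_mobile_money"))
```

In country A the unweighted rate is one in three. The weighted rate is 60%, because the single adopter represents three times as many people as each non-adopter.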

### Descriptive Analysis: Mobile Money Adoption Patterns

```python
# Overall adoption rates by country
adoption_by_country = findex_df.groupby("country").agg({
    "has_bank_account": "mean",
    "has_mobile_money": "mean",
    "received_remittances": "mean"
}).round(3) * 100

adoption_by_country.columns = ["Bank Account %", "Mobile Money %", "Received Remittances %"]
adoption_by_country = adoption_by_country.sort_values("Mobile Money %", ascending=False)

print("Financial Access by Country:")
print(adoption_by_country)

# Gender gap analysis
gender_gap = findex_df.groupby("female").agg({
    "has_bank_account": "mean",
    "has_mobile_money": "mean"
}).round(3) * 100
gender_gap.index = ["Male", "Female"]
gender_gap.columns = ["Bank Account %", "Mobile Money %"]

print("\nGender Gap in Financial Access:")
print(gender_gap)
print(f"\nGender gap in bank accounts: {gender_gap.loc['Male', 'Bank Account %'] - gender_gap.loc['Female', 'Bank Account %']:.1f} pp")
print(f"Gender gap in mobile money: {gender_gap.loc['Male', 'Mobile Money %'] - gender_gap.loc['Female', 'Mobile Money %']:.1f} pp")

# Urban-rural divide
urban_rural = findex_df.groupby("urban").agg({
    "has_bank_account": "mean",
    "has_mobile_money": "mean"
}).round(3) * 100
urban_rural.index = ["Rural", "Urban"]
urban_rural.columns = ["Bank Account %", "Mobile Money %"]

print("\nUrban-Rural Divide:")
print(urban_rural)

# Visualizations
fig, axes = plt.subplots(2, 2, figsize=(14, 10))

# Country comparison
adoption_by_country[["Bank Account %", "Mobile Money %"]].plot(
    kind="barh", ax=axes[0, 0], color=["steelblue", "coral"]
)
axes[0, 0].set_xlabel("Adoption Rate (%)")
axes[0, 0].set_title("Financial Access by Country")
axes[0, 0].legend(["Bank Account", "Mobile Money"])

# Gender gap
gender_gap.plot(kind="bar", ax=axes[0, 1], color=["steelblue", "coral"])
axes[0, 1].set_xlabel("Gender")
axes[0, 1].set_ylabel("Adoption Rate (%)")
axes[0, 1].set_title("Gender Gap in Financial Access")
axes[0, 1].set_xticklabels(["Male", "Female"], rotation=0)
axes[0, 1].legend(["Bank Account", "Mobile Money"])

# Income quintile analysis
income_adoption = findex_df.groupby("income_quintile")[["has_bank_account", "has_mobile_money"]].mean() * 100
income_adoption.plot(kind="line", ax=axes[1, 0], marker="o", color=["steelblue", "coral"])
axes[1, 0].set_xlabel("Income Quintile (1=poorest, 5=richest)")
axes[1, 0].set_ylabel("Adoption Rate (%)")
axes[1, 0].set_title("Financial Access by Income Level")
axes[1, 0].legend(["Bank Account", "Mobile Money"])
axes[1, 0].grid(alpha=0.3)

# Education level analysis
education_order = ["None", "Primary", "Secondary", "Tertiary"]
education_adoption = findex_df.groupby("education")[["has_bank_account", "has_mobile_money"]].mean().reindex(education_order) * 100
education_adoption.plot(kind="bar", ax=axes[1, 1], color=["steelblue", "coral"])
axes[1, 1].set_xlabel("Education Level")
axes[1, 1].set_ylabel("Adoption Rate (%)")
axes[1, 1].set_title("Financial Access by Education")
axes[1, 1].set_xticklabels(education_order, rotation=45)
axes[1, 1].legend(["Bank Account", "Mobile Money"])

plt.tight_layout()
plt.show()
```

### Regression Analysis: Determinants of Mobile Money Adoption

```python
# Prepare data for regression
# Create dummy variables for categorical variables. dtype=int avoids the boolean
# columns pandas >= 2.0 creates, which statsmodels cannot cast to float.
# drop_first=True omits the alphabetically first category: Bangladesh and "None" education.
findex_reg = pd.get_dummies(findex_df, columns=["country", "education"], drop_first=True, dtype=int)

# Logistic regression: Pr(has_mobile_money = 1)
# Independent variables: demographics, location, financial status
X_vars = [col for col in findex_reg.columns if col.startswith(("country_", "education_")) or 
          col in ["female", "age", "urban", "income_quintile", "employed"]]
X = sm.add_constant(findex_reg[X_vars])
y = findex_reg["has_mobile_money"]

logit_results = sm.Logit(y, X).fit(disp=0)

print("Logistic Regression: Determinants of Mobile Money Adoption")
print("=" * 70)
print(logit_results.summary2().tables[1])

# Average marginal effects (more interpretable than log-odds).
# dummy=True reports binary regressors as a discrete 0 -> 1 change rather than a derivative.
marginal_effects = logit_results.get_margeff(at="overall", dummy=True)
print("\n\nAverage Marginal Effects (change in adoption probability; x100 = percentage points):")
print("=" * 70)
print(marginal_effects.summary())

# Label effects by variable name (the constant has no marginal effect)
me = pd.Series(marginal_effects.margeff, index=X_vars)

# Interpretation of key coefficients
print("\n\n💡 Key Findings:")
print("=" * 70)

if "country_Kenya" in me.index:
    print(f"• Being in Kenya increases mobile money adoption by {me['country_Kenya']*100:.1f} pp (vs Bangladesh, the omitted country)")

female_effect = me["female"]
print(f"• Being female {'increases' if female_effect > 0 else 'decreases'} mobile money adoption by {abs(female_effect)*100:.1f} pp")

print(f"• Living in urban areas increases adoption by {me['urban']*100:.1f} pp")
print(f"• Each income quintile increase raises adoption by {me['income_quintile']*100:.1f} pp")
```
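It helps to compute by hand what a "marginal effect" of a binary regressor such as `female` means. One common definition is the average discrete change. Predict each respondent's probability with the dummy set to 1, predict it again with the dummy set to 0, hold everything else at observed values, and average the difference. statsmodels reports this via `get_margeff(dummy=True)`; by default it instead treats binary regressors as continuous and reports a derivative. The sketch below uses made-up coefficients and a tiny design matrix, not the fitted model. The function names are hypothetical.

```python
import numpy as np

def logistic(z):
    """Logistic CDF: maps a linear index to a probability."""
    return 1.0 / (1.0 + np.exp(-z))

def average_discrete_change(X, beta, col):
    """Average change in Pr(y=1) when column `col` switches from 0 to 1 for everyone."""
    X1, X0 = X.copy(), X.copy()
    X1[:, col], X0[:, col] = 1.0, 0.0
    return np.mean(logistic(X1 @ beta) - logistic(X0 @ beta))

# Hypothetical logit coefficients: constant, female dummy, urban dummy
beta = np.array([-1.0, -0.3, 0.5])
X = np.array([[1, 0, 0], [1, 1, 1], [1, 0, 1], [1, 1, 0]], dtype=float)
ame_female = average_discrete_change(X, beta, col=1)
print(f"Average discrete change for 'female': {ame_female*100:.1f} pp")
```

The same coefficient (-0.3) produces a different probability change for urban and rural respondents. That is why the effect is averaged over the sample rather than read off the log-odds coefficient.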

### Reflection Questions (Exercise 2)

Write 200-250 words addressing:

1.  **Adoption Patterns**: What demographic and geographic factors most
    strongly predict mobile money adoption? Do these patterns suggest
    mobile money reaches financially excluded populations, or primarily
    serves those already banked?

2.  **Gender Gap**: Why might a gender gap persist in mobile money
    adoption despite the technology’s accessibility? What policy
    interventions could address this?

3.  **Comparison to Open Banking**: How do mobile money adoption
    patterns in developing economies compare to open banking adoption in
    the UK/EU? What explains the differences in speed and breadth of
    adoption?

## Exercise 3: Welfare Effects Replication (Suri & Jack 2016)

### Understanding Difference-in-Differences

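We’ll replicate the core empirical strategy from Suri & Jack’s mobile
money welfare study using simplified synthetic data. Their study
exploits changes in the density of M-Pesa agents near each household;
here we simplify to a binary treatment. The difference-in-differences
(DiD) estimator compares changes over time between treatment and
control groups: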

$$\text{DiD} = [E(Y_{treated,post}) - E(Y_{treated,pre})] - [E(Y_{control,post}) - E(Y_{control,pre})]$$

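This approach removes time-invariant differences between the groups and
time trends common to both. It identifies a causal effect only if the
groups would have followed parallel trends in the absence of treatment.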
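In the simplest case of two groups and two periods, the DiD formula equals the coefficient on the treated × post interaction in an OLS regression of Y on treated, post and their product. That is why the regressions below read the effect off the interaction term. The numerical check below uses made-up outcomes and only NumPy's least-squares solver.

```python
import numpy as np

# Made-up outcomes: 3 observations in each (treated, post) cell
treated = np.repeat([0, 0, 1, 1], 3)
post = np.tile(np.repeat([0, 1], 3), 2)
y = np.array([10, 11, 12, 14, 15, 16, 13, 14, 15, 20, 21, 22], dtype=float)

def cell_mean(t, p):
    """Mean outcome for one treatment group in one period."""
    return y[(treated == t) & (post == p)].mean()

# DiD straight from the formula: (treated change) - (control change)
did_means = (cell_mean(1, 1) - cell_mean(1, 0)) - (cell_mean(0, 1) - cell_mean(0, 0))

# Same quantity as the interaction coefficient in a saturated OLS regression
X = np.column_stack([np.ones_like(y), treated, post, treated * post])
coef, *_ = np.linalg.lstsq(X, y, rcond=None)

print(f"DiD from cell means:         {did_means:.3f}")
print(f"OLS interaction coefficient: {coef[3]:.3f}")
```

Both lines print 3.000. The cell means are 11, 15, 14 and 21, so DiD = (21 - 14) - (15 - 11) = 3. Adding covariates such as `urban` moves the regression estimate slightly away from the raw formula, which is one reason to run the regression rather than compare means.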

```python
# Generate synthetic panel data mimicking Suri & Jack structure
np.random.seed(456)
n_households = 800
survey_years = [2008, 2010, 2012, 2014]
n_periods = len(survey_years)

# Create panel structure (one row per household-year)
household_ids = np.repeat(np.arange(n_households), n_periods)
years = np.tile(survey_years, n_households)

panel_data = pd.DataFrame({
    "household_id": household_ids,
    "year": years,
})

# Treatment assignment: some households gain M-Pesa access in 2010
# (a stand-in for agent network rollout, simulated here as a coin flip)
treatment_assignment = np.random.binomial(1, 0.5, size=n_households)
panel_data["treated"] = treatment_assignment[household_ids]

# Define post-treatment period (2008 is the only pre-treatment year)
panel_data["post"] = (panel_data["year"] >= 2010).astype(int)

# Household characteristics (time-invariant), indexed by household ID
female_headed = np.random.binomial(1, 0.30, size=n_households)
urban = np.random.binomial(1, 0.35, size=n_households)
baseline_income = np.random.lognormal(mean=8, sigma=0.8, size=n_households)
household_effect = np.random.normal(0, 150, size=n_households)  # persistent unobserved differences

panel_data["female_headed"] = female_headed[household_ids]
panel_data["urban"] = urban[household_ids]
panel_data["baseline_income"] = baseline_income[household_ids]

# Generate consumption outcomes
# Baseline consumption (correlated with income and urban status, plus a household-specific effect)
baseline_consumption = 1000 + 0.3 * panel_data["baseline_income"] + \
                       200 * panel_data["urban"] + \
                       household_effect[household_ids]

# Time trend (consumption grows over time, equally for both groups)
time_trend = 50 * (panel_data["year"] - 2008)

# Treatment effect: M-Pesa access increases consumption through smoothing
# Effect is larger for female-headed households (as in Suri & Jack)
treatment_effect = 150 * panel_data["treated"] * panel_data["post"] + \
                   100 * panel_data["treated"] * panel_data["post"] * panel_data["female_headed"]

# Final consumption (plus year-specific noise)
panel_data["consumption"] = baseline_consumption + time_trend + treatment_effect + \
                            np.random.normal(0, 200, size=len(panel_data))

# Poverty indicator (consumption below threshold)
poverty_threshold = 1200
panel_data["poor"] = (panel_data["consumption"] < poverty_threshold).astype(int)

print(f"✓ Generated panel data: {n_households} households over {n_periods} periods")
print("\nSample of panel data:")
print(panel_data.head(12))

# Summary statistics by treatment status and period
summary = panel_data.groupby(["treated", "post"]).agg({
    "consumption": ["mean", "std"],
    "poor": "mean"
}).round(2)
summary.columns = ["Consumption_Mean", "Consumption_SD", "Poverty_Rate"]
summary.index = ["Control Pre", "Control Post", "Treated Pre", "Treated Post"]

print("\nSummary Statistics by Group and Period:")
print(summary)
```

### Estimating Treatment Effects

```python
# Difference-in-Differences regression
# consumption = β0 + β1*treated + β2*post + β3*treated*post + controls + ε
# β3 is the DiD estimator (average treatment effect)

did_formula = "consumption ~ treated + post + treated:post + urban + baseline_income"
# Cluster standard errors by household: repeated observations of a household are correlated
did_model = smf.ols(did_formula, data=panel_data).fit(cov_type="cluster", cov_kwds={"groups": panel_data["household_id"]})

print("Difference-in-Differences Regression Results")
print("=" * 70)
print(did_model.summary())

# Extract treatment effect
treatment_effect_est = did_model.params["treated:post"]
treatment_effect_se = did_model.bse["treated:post"]
treatment_effect_pval = did_model.pvalues["treated:post"]

print(f"\n\n💡 Key Finding:")
print("=" * 70)
print(f"Average Treatment Effect: £{treatment_effect_est:.2f}")
print(f"Standard Error: £{treatment_effect_se:.2f}")
print(f"P-value: {treatment_effect_pval:.4f}")
print(f"95% Confidence Interval: [£{treatment_effect_est - 1.96*treatment_effect_se:.2f}, £{treatment_effect_est + 1.96*treatment_effect_se:.2f}]")

# Poverty: a before/after comparison of treated households alone would also pick up
# the common time trend, so difference out the control group's change too
poverty_rates = panel_data.groupby(["treated", "post"])["poor"].mean()
treated_change = poverty_rates.loc[(1, 1)] - poverty_rates.loc[(1, 0)]
control_change = poverty_rates.loc[(0, 1)] - poverty_rates.loc[(0, 0)]
poverty_did = (treated_change - control_change) * 100

print(f"\nDiD estimate of M-Pesa's effect on the poverty rate: {poverty_did:.1f} percentage points")
print(f"(treated change: {treated_change*100:+.1f} pp; control change: {control_change*100:+.1f} pp)")

# Visualize group means over time
pre_post_means = panel_data.groupby(["treated", "year"])["consumption"].mean().unstack(level=0)
pre_post_means.columns = ["Control", "Treated"]

plt.figure(figsize=(10, 6))
plt.plot(pre_post_means.index, pre_post_means["Control"], marker="o", label="Control", linewidth=2)
plt.plot(pre_post_means.index, pre_post_means["Treated"], marker="s", label="Treated", linewidth=2)
plt.axvline(x=2010, color="red", linestyle="--", alpha=0.5, label="M-Pesa Rollout")
plt.xlabel("Year")
plt.ylabel("Average Consumption (£)")
plt.title("Parallel Trends: Consumption Over Time")
plt.legend()
plt.grid(alpha=0.3)
plt.tight_layout()
plt.show()

print("\n📊 Parallel Trends Check:")
print("With only one pre-treatment year (2008), this plot cannot test for parallel pre-trends;")
print("it only shows the gap opening after 2010. In this simulation, parallel trends hold by construction.")
```
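With two or more pre-treatment periods, a standard placebo check is available. Pretend treatment began earlier and estimate a DiD using pre-treatment years only. A placebo estimate far from zero warns that parallel trends may fail. The lab's panel cannot support this test, so the sketch below uses a hypothetical toy panel whose survey years 2006 and 2008 are invented. In it, treated households were already growing faster before any treatment. The function name is hypothetical.

```python
import numpy as np
import pandas as pd

def placebo_did(df, pre_years, outcome="consumption"):
    """DiD between two pre-treatment years: (treated change) - (control change).

    Under parallel trends this should be close to zero.
    """
    early, late = pre_years
    means = df[df["year"].isin(pre_years)].groupby(["treated", "year"])[outcome].mean()
    return (means.loc[(1, late)] - means.loc[(1, early)]) - (means.loc[(0, late)] - means.loc[(0, early)])

# Hypothetical toy panel: treated households grow 30 more between waves, before any treatment
rng = np.random.default_rng(0)
rows = []
for hh in range(200):
    treated = hh % 2
    for year in (2006, 2008):
        wave_growth = 50 + 30 * treated
        rows.append({"household_id": hh, "treated": treated, "year": year,
                     "consumption": 1000 + wave_growth * (year == 2008) + rng.normal(0, 10)})
toy_panel = pd.DataFrame(rows)

print(f"Placebo DiD (2006 -> 2008): {placebo_did(toy_panel, (2006, 2008)):.1f}")
```

A placebo estimate near 30 here would be attributed to "treatment" by a naive DiD, even though nothing happened. In real data you would also report its standard error before drawing conclusions.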

### Heterogeneous Effects by Gender

```python
# Estimate separate effects for female-headed vs male-headed households
# This replicates Suri & Jack's finding of larger effects for women.
# treated * post * female_headed expands to all main effects and lower-order interactions,
# so the triple interaction does not absorb group differences in levels or trends.

did_formula_gender = "consumption ~ treated * post * female_headed + urban + baseline_income"
did_model_gender = smf.ols(did_formula_gender, data=panel_data).fit(cov_type="cluster", cov_kwds={"groups": panel_data["household_id"]})

print("Heterogeneous Treatment Effects by Gender")
print("=" * 70)
print(did_model_gender.summary().tables[1])

# Calculate effects
ate_male = did_model_gender.params["treated:post"]
additional_effect_female = did_model_gender.params["treated:post:female_headed"]
ate_female = ate_male + additional_effect_female

print(f"\n\n💡 Gender-Specific Treatment Effects:")
print("=" * 70)
print(f"Male-headed households: £{ate_male:.2f} increase in consumption")
print(f"Female-headed households: £{ate_female:.2f} increase in consumption")
print(f"Additional effect for women: £{additional_effect_female:.2f}")
print(f"\nFemale-headed households benefit {(ate_female/ate_male - 1)*100:.0f}% more than male-headed households")

def poverty_did(df):
    """DiD in poverty rates (pp): treated change minus control change. Negative = reduction."""
    rates = df.groupby(["treated", "post"])["poor"].mean()
    return ((rates.loc[(1, 1)] - rates.loc[(1, 0)]) - (rates.loc[(0, 1)] - rates.loc[(0, 0)])) * 100

# Poverty effect by gender of household head (compares treated with control within each subgroup)
poverty_change_female = poverty_did(panel_data[panel_data["female_headed"] == 1])
poverty_change_male = poverty_did(panel_data[panel_data["female_headed"] == 0])

print(f"\nDiD change in poverty rate (negative = reduction):")
print(f"• Female-headed households: {poverty_change_female:.1f} pp")
print(f"• Male-headed households: {poverty_change_male:.1f} pp")
```

### Reflection Questions (Exercise 3)

Write 250-300 words addressing:

1.  **Causal Interpretation**: The DiD estimator assumes parallel
    trends: treated and control groups would have followed the same
    trajectory absent treatment. Does this assumption seem plausible in
    our data, given that we observe only one pre-treatment year (2008)?
    What threats to validity might exist in the real Suri & Jack study?

2.  **Magnitude and Significance**: In our simulation, the treatment
    effect is built in as a £150-250 increase in consumption per survey
    period (the upper end for female-headed households). Is an effect of
    this size economically significant? How would you assess whether
    this justifies mobile money as a development intervention compared
    to alternatives (cash transfers, microfinance, education)?

3.  **Gender Heterogeneity**: Why do female-headed households benefit
    more from mobile money access? What mechanisms explain this? What
    policy implications follow from gender-differentiated effects?

4.  **Generalizability**: These results come from Kenya 2008-2014. Would
    you expect similar effects in other countries today? What contextual
    factors matter for external validity?

## Summary and Key Takeaways

Through these three exercises, you’ve:

1.  **Worked through open banking API patterns**, understanding OAuth
    complexity and the data access patterns that enable budgeting apps
    and financial management tools

2.  **Analyzed financial inclusion patterns** using Global Findex data,
    identifying demographic and geographic determinants of mobile money
    adoption across developing economies

3.  **Replicated welfare effects estimation** using
    difference-in-differences methodology, quantifying how mobile money
    access affects consumption and poverty

4.  **Connected technical implementation to policy evaluation**, seeing
    how empirical evidence informs financial inclusion strategies

### Connections to Course Themes

-   **Week 2 (APIs)**: Open banking mandates transform APIs from
    technical infrastructure to regulatory requirement
-   **Week 3 (Platforms)**: Mobile money exhibits network effects and
    platform dynamics we analyzed theoretically
-   **Week 4 (Robo-advisors)**: Both open banking and mobile money use
    algorithms to personalize financial services
-   **Future weeks**: Digital currency (Week 7) and blockchain (Week 8)
    propose alternative infrastructures for financial inclusion

### Assessment Preparation

**FIN510 Coursework 1**: Set exercises may ask you to explain open
banking regulation, describe mobile money technology, or interpret
welfare effects evidence. Use this lab’s analyses and interpretations as
reference.

**FIN720 Coursework 1 (due next week)**: This lab provides empirical
grounding for essays on open banking (platform design, adoption
barriers) or financial inclusion (welfare effects, policy evaluation).
Reference Suri & Jack appropriately and consider comparing their
findings to open banking evidence.

### Further Exploration

If you have additional time, consider:

-   **Exercise 1 Extension**: Build a simple financial health dashboard
    using the transaction data, adding alerts for unusual spending or
    low balance warnings

-   **Exercise 2 Extension**: Analyze cross-country differences in
    mobile money adoption more deeply. What explains Kenya’s success vs
    slower adoption elsewhere?

-   **Exercise 3 Extension**: Examine consumption volatility (variance
    before/after treatment) to test the consumption smoothing mechanism
    directly

-   **Integration**: Combine Exercises 2 and 3—use Findex data to
    predict which populations would benefit most from mobile money
    expansion
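The Exercise 3 volatility extension can be set up as follows. This is an assumed design, not Suri & Jack's specification.

1.  Compute each household's consumption standard deviation before and after rollout.
2.  Compare how that change differs across treated and control groups, giving a DiD in volatility rather than in levels.

The lab's simulated panel has only one pre-treatment year, so a within-household pre-period standard deviation is undefined there. The sketch therefore uses a hypothetical toy panel with four waves on each side of treatment. In it, treated households' post-treatment shocks are made smaller. The function name is hypothetical.

```python
import numpy as np
import pandas as pd

def volatility_did(df, outcome="consumption"):
    """DiD in within-household standard deviation of the outcome.

    For each household, compute the SD of the outcome in the pre and post
    periods, average by group, then take (treated post - pre) minus
    (control post - pre). Negative means treatment reduced volatility.
    """
    sd = df.groupby(["household_id", "treated", "post"])[outcome].std().reset_index()
    cell = sd.groupby(["treated", "post"])[outcome].mean()
    return (cell.loc[(1, 1)] - cell.loc[(1, 0)]) - (cell.loc[(0, 1)] - cell.loc[(0, 0)])

# Hypothetical toy panel: 4 pre and 4 post waves; treated post-period shocks have SD 50 instead of 100
rng = np.random.default_rng(1)
rows = []
for hh in range(300):
    treated = hh % 2
    for wave in range(8):
        post = int(wave >= 4)
        shock_sd = 50 if (treated and post) else 100
        rows.append({"household_id": hh, "treated": treated, "post": post,
                     "consumption": 1000 + rng.normal(0, shock_sd)})
toy = pd.DataFrame(rows)

print(f"DiD in consumption SD: {volatility_did(toy):.1f}")
```

With only four waves per period, each household's standard deviation is a noisy, downward-biased estimate. The printed DiD therefore sits somewhat closer to zero than the true gap of 50.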

------------------------------------------------------------------------

**Well done! You’ve completed a comprehensive analysis of open banking
and financial inclusion technologies, connecting APIs, data science, and
development economics.**