# Task 3: Event Impact Modeling

**Objective**: Model how events (policies, product launches, infrastructure investments) affect financial inclusion indicators.

This notebook will:
1.  **Load Impact Data**: Join `Impact_sheet` with `Events`.
2.  **Build Event-Indicator Matrix**: Quantify the expected impact of each event on key indicators.
3.  **Validate Model**: Compare predicted impacts against historical data (e.g., Telebirr launch).
4.  **Refine Timing**: Apply event-to-impact lags to estimate when each impact is realized.


In [None]:
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
import sys
import os
import numpy as np

# Make the project's src/ directory importable (path is relative to the notebook's working directory)
sys.path.append(os.path.abspath('../src'))

from data_loader import load_data

sns.set_theme(style="whitegrid")

# Load Data (Returns tuple: df_unified, df_impact)
df, df_impact = load_data(data_path=r"../data/raw/ethiopia_fi_unified_data.xlsx")
print(f"Loaded {len(df)} main records and {len(df_impact)} impact links.")

## 1. Data Preparation: Joining Events & Impacts

We need to associate each `Impact Link` with its parent `Event` to understand *what* is causing the impact and *when* it happened.

In [None]:
# Select event records and the columns needed to describe them.
# .loc plus a non-inplace rename returns a new frame, avoiding SettingWithCopyWarning.
events_df = (
    df.loc[df['record_type'] == 'event', ['record_id', 'indicator', 'start_date', 'data_year']]
    .rename(columns={'indicator': 'event_name', 'start_date': 'event_date', 'data_year': 'event_year'})
)

# Join Impact Links with Parent Events
# impact.parent_id -> event.record_id
impact_model = pd.merge(
    df_impact, 
    events_df, 
    left_on='parent_id', 
    right_on='record_id', 
    how='left', 
    suffixes=('_impact', '_event')
)

print("Impact Model Schema:")
print(impact_model[['record_id_impact', 'event_name', 'event_date', 'related_indicator', 'impact_direction', 'impact_magnitude']].head())
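
Because this is a left join, an impact link whose `parent_id` has no matching event survives with empty event columns, and `pivot_table` later drops rows whose `event_name` is missing. A minimal sketch on hypothetical toy frames (all IDs and names are placeholders, not records from the dataset) shows how pandas' `indicator=True` option flags such orphaned links:

```python
import pandas as pd

# Hypothetical toy frames mirroring the columns used above (values are placeholders)
toy_impact = pd.DataFrame({
    'record_id': ['IMP_1', 'IMP_2', 'IMP_3'],
    'parent_id': ['EVT_1', 'EVT_1', 'EVT_9'],  # EVT_9 has no matching event
    'related_indicator': ['ACC_OWNERSHIP', 'ACC_MM_ACCOUNT', 'ACC_OWNERSHIP'],
})
toy_events = pd.DataFrame({
    'record_id': ['EVT_1'],
    'event_name': ['Example Launch'],
    'event_year': [2021],
})

joined = pd.merge(
    toy_impact, toy_events,
    left_on='parent_id', right_on='record_id',
    how='left', suffixes=('_impact', '_event'),
    indicator=True,  # adds a '_merge' column: 'both' or 'left_only' here
)

# Impact links whose parent event is missing from the events table
orphans = joined[joined['_merge'] == 'left_only']
print(orphans[['record_id_impact', 'parent_id']])
```

Running the same `_merge` check on the real `impact_model` before pivoting makes it visible how many links would silently fall out of the matrix.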

## 2. Event-Indicator Association Matrix

We construct a matrix that answers: **Which events affect which indicators, and by how much?**

We will map qualitative magnitudes (High, Medium, Low) to quantitative estimates for modeling:
- **High**: +/- 5.0 pp (for rates) or 20% growth (for counts)
- **Medium**: +/- 2.5 pp (for rates) or 10% growth
- **Low**: +/- 1.0 pp (for rates) or 5% growth

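*Note: These are initial hypotheses to be refined in validation. The code below applies only the rate (pp) values, to every link; the count growth rates are not yet used.*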
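
The hypothesis table distinguishes rate indicators (percentage-point changes) from count indicators (percentage growth). A minimal sketch of an estimator that honours both branches might look like the following; `MAGNITUDE_TABLE`, `estimate_effect`, the `indicator_type` argument, and the `baseline` level for counts are all hypothetical names, not part of the dataset or `data_loader`:

```python
# Hypothetical lookup mirroring the hypothesis table above:
# magnitude -> (pp change for rate indicators, fractional growth for count indicators)
MAGNITUDE_TABLE = {
    'high': (5.0, 0.20),
    'medium': (2.5, 0.10),
    'low': (1.0, 0.05),
}
DIRECTION_SIGN = {'increase': 1, 'decrease': -1}

def estimate_effect(magnitude, direction, indicator_type, baseline=None):
    """Return the estimated change for one impact link.

    indicator_type: 'rate' returns percentage points; 'count' returns an
    absolute change, so the current level (baseline) must be supplied.
    """
    pp, growth = MAGNITUDE_TABLE[str(magnitude).lower()]  # unknown magnitude -> KeyError
    sign = DIRECTION_SIGN.get(str(direction).lower(), 0)   # neutral/missing -> no effect
    if indicator_type == 'rate':
        return sign * pp
    if indicator_type == 'count':
        if baseline is None:
            raise ValueError("count indicators need a baseline level")
        return sign * growth * baseline
    raise ValueError(f"unknown indicator_type: {indicator_type!r}")

print(estimate_effect('High', 'increase', 'rate'))  # 5.0
print(estimate_effect('medium', 'decrease', 'count', baseline=1_000_000))
```

Raising on an unknown magnitude, rather than silently defaulting as the notebook cell does, is a design choice that surfaces data-entry problems early; either behaviour is defensible for an exploratory model.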

In [None]:
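# Define Estimator Map (Hypothesis): magnitude -> percentage-point effect
magnitude_map = {
    'high': 5.0,
    'medium': 2.5,
    'low': 1.0
}
direction_map = {'increase': 1, 'decrease': -1}

def estimate_impact(row):
    # Unknown or missing magnitudes fall back to the 'low' estimate (1.0 pp)
    base = magnitude_map.get(str(row['impact_magnitude']).lower(), 1.0)
    # Only explicit increase/decrease links get a sign; anything else
    # (e.g. neutral or missing) contributes zero instead of being treated as a decrease
    direction = direction_map.get(str(row['impact_direction']).lower(), 0)
    return base * direction

impact_model['estimated_impact_pp'] = impact_model.apply(estimate_impact, axis=1)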

# Pivot to create the Matrix
# Using 'related_indicator' as the column (contains codes like ACC_OWNERSHIP)
association_matrix = impact_model.pivot_table(
    index='event_name',
    columns='related_indicator',
    values='estimated_impact_pp',
    aggfunc='sum' # Summing if an event has multiple impacts on same indicator (rare)
).fillna(0)

print("Event-Indicator Association Matrix (Estimated Impact PP):")
association_matrix
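
Two properties of this pivot are worth keeping in mind when reading the matrix: `fillna(0)` makes "no link" indistinguishable from "links whose effects cancel out", and rows with a missing `event_name` are dropped. A minimal sketch on hypothetical toy links (placeholder event names and values) illustrates both:

```python
import pandas as pd

# Hypothetical toy links; 'Event B' has two links to the same indicator,
# and the last row has no event name (e.g. an orphaned impact link)
links = pd.DataFrame({
    'event_name': ['Event A', 'Event B', 'Event B', None],
    'related_indicator': ['ACC_OWNERSHIP', 'ACC_MM_ACCOUNT', 'ACC_MM_ACCOUNT', 'ACC_OWNERSHIP'],
    'estimated_impact_pp': [5.0, 2.5, 1.0, 1.0],
})

matrix = links.pivot_table(
    index='event_name',
    columns='related_indicator',
    values='estimated_impact_pp',
    aggfunc='sum',   # Event B's two links are added together
).fillna(0)          # missing event/indicator pairs become 0
print(matrix)
```

If the distinction matters, keeping the NaNs (or pivoting a link count alongside the sum) preserves which cells have no underlying link.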

In [None]:
plt.figure(figsize=(10, 6))
sns.heatmap(association_matrix, annot=True, cmap="RdBu_r", center=0)
plt.title('Event Impact Matrix: Estimated Effect (pp)')
plt.xlabel('Indicator')
plt.ylabel('Event')
plt.show()

## 3. Validation: Historical Case Study (Telebirr)

- **Event**: Telebirr Launch (May 2021)
- **Predicted Impact**: High-magnitude increase in `ACC_MM_ACCOUNT` (mobile money account ownership).
- **Observed Data**: Change in `ACC_MM_ACCOUNT` between Findex 2021 and Findex 2024.
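
The validation step reduces to simple arithmetic: observed change minus predicted change, judged against a tolerance. A minimal sketch as a reusable function, assuming a hypothetical name `validate_event_estimate` and the same 2.0 pp tolerance used in the next cell; the inputs in the usage line are placeholders, not Findex values:

```python
import math

def validate_event_estimate(obs_start, obs_end, predicted_pp, tolerance_pp=2.0):
    """Compare the observed change in a rate indicator with a model estimate.

    Returns None if either observation is missing; otherwise a dict with the
    observed change, the discrepancy (observed - predicted), and a verdict.
    """
    if obs_start is None or obs_end is None or math.isnan(obs_start) or math.isnan(obs_end):
        return None
    observed_change = obs_end - obs_start
    discrepancy = observed_change - predicted_pp
    verdict = 'reasonable' if abs(discrepancy) < tolerance_pp else 'needs refinement'
    return {
        'observed_change_pp': observed_change,
        'discrepancy_pp': discrepancy,
        'verdict': verdict,
    }

# Placeholder numbers for illustration only (not Findex values)
result = validate_event_estimate(10.0, 16.0, predicted_pp=5.0)
print(f"observed {result['observed_change_pp']:+.2f} pp, "
      f"discrepancy {result['discrepancy_pp']:+.2f} pp -> {result['verdict']}")
```

Using the `+` format flag prints the sign correctly for both growth and decline.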

In [None]:
# Extract Observed Data for Mobile Money
mm_obs = df[
    (df['indicator_code'] == 'ACC_MM_ACCOUNT') & 
    (df['record_type'] == 'observation') & 
    (df['gender'] == 'all')
].sort_values('data_year')

print("Observed Mobile Money Account Ownership:")
print(mm_obs[['data_year', 'value_numeric']])

def value_for_year(obs, year):
    # First observed value for the given year, or NaN if that year is absent
    match = obs.loc[obs['data_year'] == year, 'value_numeric']
    return match.iloc[0] if not match.empty else np.nan

obs_2021 = value_for_year(mm_obs, 2021)
obs_2024 = value_for_year(mm_obs, 2024)

if pd.notna(obs_2021) and pd.notna(obs_2024):
    actual_growth = obs_2024 - obs_2021
    # '+' format flag prints the correct sign for both growth and decline
    print(f"\nActual Growth (2021-2024): {actual_growth:+.2f} pp")

    # Compare with the model estimate for Telebirr.
    # Assumes Telebirr is the primary driver of mobile money growth in this period;
    # defaults to 0 if the event or indicator is absent from the matrix.
    has_estimate = ('Telebirr Launch' in association_matrix.index
                    and 'ACC_MM_ACCOUNT' in association_matrix.columns)
    telebirr_impact = association_matrix.loc['Telebirr Launch', 'ACC_MM_ACCOUNT'] if has_estimate else 0

    print(f"Model Estimated Impact (Telebirr): {telebirr_impact:+.2f} pp")

    error = actual_growth - telebirr_impact
    print(f"Discrepancy (observed - estimated): {error:+.2f} pp")

    # Tolerance of 2.0 pp for calling the estimate reasonable
    if abs(error) < 2.0:
        print("\nRESULT: Model estimate is REASONABLE.")
    else:
        print("\nRESULT: Model estimate needs REFINEMENT.")
else:
    print("\nInsufficient data to validate Telebirr impact.")

## 4. Comparable Evidence & Refinement

For future events like **Foreign Bank Entry (2025)**, we have no post-event data, so we rely on comparable-country evidence (e.g., Kenya, Rwanda).

- **Assumption**: Competition from foreign banks typically improves *Quality* and *Affordability* before *Access*.
- **Lag Factor**: We apply a 12-month lag for Policy events based on the Telebirr observation (Launch 2021 -> Impact visible 2022-2024).
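
The `realized_year` calculation in the next cell adds `lag_months / 12` to `event_year`, which discards the month in which an event occurred. If month precision matters, one option is to shift `event_date` instead. A minimal standard-library sketch, assuming a hypothetical helper `realized_date` and a placeholder day of the month (the source gives only "May 2021" for Telebirr):

```python
from datetime import date

def realized_date(event_date: date, lag_months: int) -> date:
    """Shift an event date forward by a whole number of months.

    The day is clamped to the target month's length (e.g. Jan 31 + 1 month -> Feb 28/29).
    """
    month_index = event_date.month - 1 + lag_months
    year = event_date.year + month_index // 12
    month = month_index % 12 + 1
    # Length of the target month: first day of the following month minus first day of this one
    next_month_first = date(year + month // 12, month % 12 + 1, 1)
    last_day = (next_month_first - date(year, month, 1)).days
    return date(year, month, min(event_date.day, last_day))

# Telebirr launch (May 2021; the day is a placeholder) with the 12-month lag
print(realized_date(date(2021, 5, 1), 12))  # 2022-05-01
```

In pandas, adding `pd.DateOffset(months=n)` to a timestamp performs the same month shift with end-of-month clamping, which may be more convenient on a whole column.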

In [None]:
# Refinement: Applying Lag to the Model
# Create a 'Realized Impact Year' column

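def calculate_realized_year(row):
    event_year = row['event_year']
    # Use the lag recorded on the impact link; links without one are treated as immediate (0 months)
    lag = row['lag_months'] if pd.notna(row['lag_months']) else 0
    # Fractional year, e.g. 2021 + 6/12 = 2021.5
    return event_year + (lag / 12.0)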

impact_model['realized_year'] = impact_model.apply(calculate_realized_year, axis=1)

print("Impact Schedule (When will impacts be felt?):")
print(impact_model[['event_name', 'related_indicator', 'event_year', 'lag_months', 'realized_year']].sort_values('realized_year'))
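
To read the schedule as "how much effect lands in each year", the fractional `realized_year` values can be floored to calendar years and pivoted by indicator. A minimal sketch on a hypothetical toy schedule (placeholder values), which, like the `sum` aggregation in the association matrix, assumes effects on the same indicator add linearly:

```python
import numpy as np
import pandas as pd

# Hypothetical toy schedule with the columns produced above (placeholder values)
schedule = pd.DataFrame({
    'related_indicator': ['ACC_MM_ACCOUNT', 'ACC_MM_ACCOUNT', 'ACC_OWNERSHIP'],
    'estimated_impact_pp': [5.0, 2.5, 1.0],
    'realized_year': [2022.0, 2025.5, 2025.0],
})

# Floor fractional years to the calendar year in which the effect first lands
schedule['realized_calendar_year'] = np.floor(schedule['realized_year']).astype(int)

by_year = schedule.pivot_table(
    index='realized_calendar_year',
    columns='related_indicator',
    values='estimated_impact_pp',
    aggfunc='sum',
    fill_value=0,
)
print(by_year)
# Running total per indicator: cumulative estimated effect realized by each year
print(by_year.cumsum())
```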