# Task 3: Event Impact Modeling

**Objective:** Model how events (policies, product launches, infrastructure investments) affect financial inclusion indicators.

This notebook covers:
1.  Loading and understanding impact data.
2.  Building the Event-Indicator Association Matrix.
3.  Validating model assumptions against historical data.
4.  Refining impact estimates.

In [None]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
import sys
import os

# Add the project root (parent directory) to the path so the `src` package can be imported
sys.path.append(os.path.abspath(os.path.join(os.getcwd(), '..')))
from src.utils import load_data, get_observations, get_events

# Set style
sns.set_theme(style="whitegrid")
plt.rcParams['figure.figsize'] = [12, 6]

# Load Data
data_path = "../data/raw/ethiopia_fi_unified_data.csv"
df = load_data(data_path)

## 1. Understand the Impact Data
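We load the `impact_link` records and the `events` they reference to understand the modeled relationships. Because the raw links may be sparse, the cell below also defines a set of hypothesized impacts keyed by event name (`parent_id`).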

In [None]:
# Filter Impact Links and Events
impacts = df[df['record_type'] == 'impact_link'].copy()
events = get_events(df)

print(f"Loaded {len(impacts)} impact links and {len(events)} events.")

# Since our initial dataset might be sparse, let's define a robust set of Hypothesized Impacts here
# This mimics the "Enrichment" phase where we consulted experts or literature.
additional_impacts = [
    {"parent_id": "Telebirr Launch", "related_indicator": "account_ownership", "impact_direction": "Positive", "magnitude": 0.2, "lag_months": 24, "notes": "Slow conversion to formal accounts"},
    {"parent_id": "Telebirr Launch", "related_indicator": "telebirr_users_m", "impact_direction": "Positive", "magnitude": 0.9, "lag_months": 0, "notes": "Direct adoption"},
    {"parent_id": "M-Pesa Ethiopia Launch", "related_indicator": "account_ownership", "impact_direction": "Positive", "magnitude": 0.1, "lag_months": 18, "notes": "Adding competition"},
    {"parent_id": "M-Pesa Ethiopia Launch", "related_indicator": "mpesa_users_m", "impact_direction": "Positive", "magnitude": 0.8, "lag_months": 0, "notes": "Direct adoption"},
    {"parent_id": "NFIS-II Strategy Launch", "related_indicator": "account_ownership", "impact_direction": "Positive", "magnitude": 0.4, "lag_months": 36, "notes": "Long term policy effect"}
]

# Convert to DataFrame
impact_df = pd.DataFrame(additional_impacts)
display(impact_df)
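
Linking impact records to their events can be sketched as a left join followed by a lag shift. This is a minimal, self-contained illustration rather than the project's actual join: the `events_toy` frame and its `event_name` / `event_date` columns are assumptions (the real output of `get_events` may use different column names), and only the Telebirr launch date, which this notebook states, is filled in.

```python
import pandas as pd

# Subset of the hypothesized impacts defined in this notebook.
impact_links = pd.DataFrame([
    {"parent_id": "Telebirr Launch", "related_indicator": "account_ownership",
     "magnitude": 0.2, "lag_months": 24},
    {"parent_id": "Telebirr Launch", "related_indicator": "telebirr_users_m",
     "magnitude": 0.9, "lag_months": 0},
    {"parent_id": "NFIS-II Strategy Launch", "related_indicator": "account_ownership",
     "magnitude": 0.4, "lag_months": 36},
])

# ASSUMPTION: hypothetical events table; column names are illustrative only.
events_toy = pd.DataFrame([
    {"event_name": "Telebirr Launch", "event_date": pd.Timestamp("2021-05-11")},
])

# Left join keeps every impact link; `_merge` flags links whose event is missing.
linked = impact_links.merge(
    events_toy, how="left", left_on="parent_id", right_on="event_name", indicator=True
)

# Date at which the modeled effect is expected to show up (event date + lag).
linked["expected_effect_date"] = [
    d + pd.DateOffset(months=int(m)) if pd.notna(d) else pd.NaT
    for d, m in zip(linked["event_date"], linked["lag_months"])
]

unmatched = linked[linked["_merge"] == "left_only"]
print(linked[["parent_id", "related_indicator", "lag_months", "expected_effect_date"]])
print(f"{len(unmatched)} impact link(s) have no matching event")
```

Checking `unmatched` before building the matrix is a cheap way to catch impact links that point at an event name missing from the events table.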

## 2. Event-Indicator Association Matrix
We build a matrix in which rows are events and columns are indicators. Each value is the estimated **Magnitude** of impact (0-1 scale); event-indicator pairs with no modeled link are filled with 0.

In [None]:
# Pivot to create the matrix
matrix_df = impact_df.pivot(index='parent_id', columns='related_indicator', values='magnitude').fillna(0)

plt.figure(figsize=(10, 6))
sns.heatmap(matrix_df, annot=True, cmap="Greens", fmt=".1f", linewidths=.5)
plt.title("Event-Indicator Association Matrix (Estimated Impact Magnitude)")
plt.ylabel("Event")
plt.xlabel("Indicator")
plt.tight_layout()
plt.show()
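
The magnitude matrix ignores `impact_direction` and `lag_months`. One possible extension, sketched below under stated assumptions, builds a signed magnitude matrix and a companion lag matrix from the same records. The `"Negative"` label and the name `signed_magnitude` are assumptions (the hypothesized impacts above only contain `"Positive"`), and `pivot_table` is used here instead of `pivot` so that duplicate event-indicator pairs are summed rather than raising an error.

```python
import pandas as pd

# Same hypothesized impacts as above, reduced to the columns needed here.
impact_df = pd.DataFrame([
    {"parent_id": "Telebirr Launch", "related_indicator": "account_ownership",
     "impact_direction": "Positive", "magnitude": 0.2, "lag_months": 24},
    {"parent_id": "Telebirr Launch", "related_indicator": "telebirr_users_m",
     "impact_direction": "Positive", "magnitude": 0.9, "lag_months": 0},
    {"parent_id": "M-Pesa Ethiopia Launch", "related_indicator": "account_ownership",
     "impact_direction": "Positive", "magnitude": 0.1, "lag_months": 18},
    {"parent_id": "M-Pesa Ethiopia Launch", "related_indicator": "mpesa_users_m",
     "impact_direction": "Positive", "magnitude": 0.8, "lag_months": 0},
    {"parent_id": "NFIS-II Strategy Launch", "related_indicator": "account_ownership",
     "impact_direction": "Positive", "magnitude": 0.4, "lag_months": 36},
])

# ASSUMPTION: directions are encoded as "Positive" / "Negative".
sign = impact_df["impact_direction"].map({"Positive": 1, "Negative": -1})
impact_df["signed_magnitude"] = sign * impact_df["magnitude"]

# Signed magnitudes; 0 means "no modeled link".
signed_matrix = impact_df.pivot_table(
    index="parent_id", columns="related_indicator",
    values="signed_magnitude", aggfunc="sum", fill_value=0,
)

# Lags in months; NaN (not 0) marks "no modeled link", since 0 is a valid lag.
lag_matrix = impact_df.pivot(
    index="parent_id", columns="related_indicator", values="lag_months"
)

print(signed_matrix)
print(lag_matrix)
```

Leaving missing lags as NaN avoids confusing "no link" with "immediate effect", which a `fillna(0)` on the lag matrix would do.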

## 3. Test Model Against Historical Data
**Case Study: Telebirr Launch (May 2021)**
We predicted a high, direct impact on `telebirr_users_m` (magnitude 0.9, no lag) and a lower, lagged impact on `account_ownership` (magnitude 0.2, 24-month lag).

Let's visualize the actual data around this event.

In [None]:
# Get Observation Data
acc = get_observations(df, 'account_ownership')
tb_users = get_observations(df, 'telebirr_users_m')

launch_date = pd.to_datetime("2021-05-11")

fig, ax1 = plt.subplots(figsize=(12, 6))

# Plot Account Ownership (Left Axis)
color = 'tab:blue'
ax1.set_xlabel('Date')
ax1.set_ylabel('Account Ownership Rate', color=color)
ax1.plot(pd.to_datetime(acc['observation_date']), acc['value_numeric'], color=color, marker='o', label='Account Ownership')
ax1.tick_params(axis='y', labelcolor=color)
ax1.set_ylim(0, 0.6)

# Plot Telebirr Users (Right Axis)
ax2 = ax1.twinx()
color = 'tab:orange'
ax2.set_ylabel('Telebirr Users (Millions)', color=color)
ax2.plot(pd.to_datetime(tb_users['observation_date']), tb_users['value_numeric'], color=color, marker='x', linestyle='--', label='Telebirr Users')
ax2.tick_params(axis='y', labelcolor=color)

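# Mark Event (drawn on ax2, so y=10 is in millions of Telebirr users)
ax2.axvline(launch_date, color='red', linestyle=':', label='Telebirr Launch')
ax2.text(launch_date, 10, " Telebirr Launch", color='red', verticalalignment='bottom')

# Combine legend entries from both axes so the line labels are actually shown
handles1, labels1 = ax1.get_legend_handles_labels()
handles2, labels2 = ax2.get_legend_handles_labels()
ax2.legend(handles1 + handles2, labels1 + labels2, loc='upper left')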

plt.title("Impact Validation: Telebirr Launch vs Indicators")
fig.tight_layout()
plt.show()

### Validation Findings
1.  **Usage Impact:** The model correctly predicts a sharp, unlagged rise in usage: the Telebirr user base grew from 0 to 54M after launch.
2.  **Access Impact:** The model correctly assumed a **lagged, dampened effect**. Account ownership moved only from 46% to 49% in 3 years, so the low "Direct Conversion" magnitude (0.2) was appropriate.
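
The access finding can be checked numerically with a simple before/after comparison around the event date. The sketch below is illustrative only: `pre_post_change` is a hypothetical helper, and the observation dates in the toy frame are placeholders chosen to be three years apart. Only the 46% and 49% values and the launch date come from this notebook.

```python
import pandas as pd

def pre_post_change(obs, event_date,
                    date_col="observation_date", value_col="value_numeric"):
    """Change between the last observation on/before the event and the latest one after it."""
    obs = obs.assign(**{date_col: pd.to_datetime(obs[date_col])}).sort_values(date_col)
    before = obs[obs[date_col] <= event_date]
    after = obs[obs[date_col] > event_date]
    if before.empty or after.empty:
        return None  # not enough data on one side of the event
    first, last = before.iloc[-1], after.iloc[-1]
    years = (last[date_col] - first[date_col]).days / 365.25
    change = last[value_col] - first[value_col]
    return {"change": change, "years": years, "change_per_year": change / years}

# ASSUMPTION: placeholder dates; only the 0.46 and 0.49 values are from the findings above.
acc_toy = pd.DataFrame({
    "observation_date": ["2021-01-01", "2024-01-01"],
    "value_numeric": [0.46, 0.49],
})

result = pre_post_change(acc_toy, pd.Timestamp("2021-05-11"))
print(result)
```

On these inputs the change is about 3 percentage points over roughly 3 years, or about 1 point per year, which is consistent with a small access magnitude. The comparison does not separate the launch effect from any underlying trend.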

**Refinement:** Keep the Access coefficient low (0.1 - 0.2) for pure product launches unless they are accompanied by regulatory changes (e.g., mandatory digital wage payments).
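
One way to encode this refinement is a small rule applied to each access-related magnitude before it enters the matrix. This is a sketch, not an established part of the model: the function name, the `event_type` labels, and the `has_regulatory_support` flag are all hypothetical, and clipping into the 0.1 - 0.2 band (rather than only capping at 0.2) is one possible reading of the guidance.

```python
def refine_access_magnitude(magnitude, event_type, has_regulatory_support=False,
                            low=0.1, high=0.2):
    """Clip the access coefficient into [low, high] for pure product launches.

    ASSUMPTION: `event_type` uses labels such as "product_launch" or "policy";
    these categories are illustrative and not taken from the dataset.
    """
    if event_type == "product_launch" and not has_regulatory_support:
        return min(max(magnitude, low), high)
    # Policies, and launches backed by regulation, keep their original estimate.
    return magnitude

# Hypothetical inputs showing each branch of the rule.
print(refine_access_magnitude(0.4, "product_launch"))        # capped into the band
print(refine_access_magnitude(0.4, "product_launch", True))  # regulation-backed, unchanged
print(refine_access_magnitude(0.4, "policy"))                # policy event, unchanged
```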