# Task 4: Forecasting Access and Usage (2025-2027)

**Objective**: Forecast Account Ownership (Access) and Digital Payment Usage (P2P Counts) for 2025-2027 using trend analysis augmented by event-based impacts.

## 1. Setup and Data Loading

In [None]:
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
import sys
import os
import numpy as np

# Add src to path
sys.path.append(os.path.abspath('../src'))

from data_loader import load_data
from forecaster import run_baseline_forecast, apply_event_impacts

sns.set_theme(style="whitegrid")

# Load Unified Data
df, df_impact = load_data(data_path=r"../data/raw/ethiopia_fi_unified_data.xlsx")

# Prepare Impact Model (same join logic as Task 3)
# Select and rename in one chain: renaming a filtered slice in place can trigger SettingWithCopyWarning
events_df = (
    df.loc[df['record_type'] == 'event', ['record_id', 'indicator', 'start_date', 'data_year']]
    .rename(columns={'indicator': 'event_name', 'start_date': 'event_date', 'data_year': 'event_year'})
)
impact_model = pd.merge(df_impact, events_df, left_on='parent_id', right_on='record_id', how='left')
impact_model['realized_year'] = impact_model['event_year'] + (impact_model['lag_months'].fillna(0) / 12.0)

print(f"Setup complete. {len(impact_model)} impact links prepared.")
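
The `realized_year` column shifts each event's year by its lag, expressed in years. A minimal sketch of that arithmetic on made-up rows (the column names mirror the cell above; the values are hypothetical and not taken from the dataset):

```python
import pandas as pd

# Hypothetical impact links, for illustration only.
links = pd.DataFrame({
    "event_year": [2021, 2023, 2024],
    "lag_months": [6, 18, None],  # None stands for "no lag recorded"
})

# A missing lag is treated as zero, so the effect lands in the event year itself.
links["realized_year"] = links["event_year"] + links["lag_months"].fillna(0) / 12.0

print(links["realized_year"].tolist())  # [2021.5, 2024.5, 2024.0]
```

Note that only the lag is filled. Because the merge is a left join, an impact link whose `parent_id` matches no event keeps a missing `event_year`, and its `realized_year` stays missing as well.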

## 2. Methodology

Given the sparse data (5 Findex points over 13 years), we use a **Trend-Augmented Scenario Model**:
1.  **Baseline**: Linear trend regression on historical observations (2014-2024).
2.  **Scenario Overlay**: We apply persistent adjustments based on the Event Impact Matrix from Task 3.
    - **Rates (Access)**: Additive percentage point (pp) shifts.
    - **Counts (Usage)**: Multiplicative growth factors (+20% / +10% / +5% for High / Medium / Low impact).
3.  **Uncertainty**: 95% confidence intervals derived from the variance of the historical residuals.
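
The baseline and uncertainty steps above can be sketched as follows. This is not the code in `src/forecaster.py`, which is not shown in this notebook; the function name `linear_trend_forecast`, the constant-width band of ±1.96 residual standard deviations, and the sample numbers are all assumptions made for illustration.

```python
import numpy as np

def linear_trend_forecast(years, values, future_years, z=1.96):
    """Fit a straight line by least squares and project it forward.

    The band is prediction +/- z * residual standard deviation. This is a
    simplification: it ignores uncertainty in the fitted slope and intercept.
    """
    years = np.asarray(years, dtype=float)
    values = np.asarray(values, dtype=float)
    if len(years) < 3:
        raise ValueError("need at least 3 points to estimate residual variance")

    # Center the years so the fit is well conditioned.
    center = years.mean()
    slope, intercept = np.polyfit(years - center, values, deg=1)

    residuals = values - (intercept + slope * (years - center))
    # Two parameters were estimated, hence n - 2 degrees of freedom.
    resid_std = np.sqrt(np.sum(residuals ** 2) / (len(years) - 2))

    future = np.asarray(future_years, dtype=float)
    pred = intercept + slope * (future - center)
    return pred, pred - z * resid_std, pred + z * resid_std

# Illustrative observations only, not the notebook's data.
demo_years = [2014, 2017, 2021, 2024]
demo_values = [20.0, 31.0, 39.0, 50.0]
pred, lower, upper = linear_trend_forecast(demo_years, demo_values, [2025, 2026, 2027])
for year, p, lo, hi in zip([2025, 2026, 2027], pred, lower, upper):
    print(f"{year}: {p:.1f} ({lo:.1f} to {hi:.1f})")
```

With only a handful of observations the residual variance is itself poorly estimated, so a band like this is best read as indicative rather than as a calibrated interval.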

In [None]:
# Targets
targets = ['ACC_OWNERSHIP', 'USG_P2P_COUNT']
scenarios = ['pessimistic', 'base', 'optimistic']

final_forecasts = {}

for target in targets:
    baseline = run_baseline_forecast(df, target)
    target_impacts = impact_model[impact_model['related_indicator'] == target]
    
    final_forecasts[target] = {}
    for scn in scenarios:
        final_forecasts[target][scn] = apply_event_impacts(baseline, target_impacts, scenario=scn, indicator_code=target)

print("Forecasts generated for Access and Usage.")
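
`apply_event_impacts` is imported from `src/forecaster.py` and its body is not shown here. A minimal sketch of the overlay rules described in the methodology, under stated assumptions: the helper `apply_overlay`, the magnitude labels, the rule that an effect switches on from its realized year onward, and the compounding of multiple events are all hypothetical choices, and scenario scaling is left out.

```python
import numpy as np

# Multiplicative factors for count indicators, as listed in the methodology.
# The label spellings are an assumption.
COUNT_FACTORS = {"high": 0.20, "medium": 0.10, "low": 0.05}

def apply_overlay(years, baseline, events, is_count):
    """Apply persistent event adjustments to a baseline path.

    events: list of (realized_year, magnitude) tuples. For rates, magnitude is
    a shift in percentage points; for counts, it is a key of COUNT_FACTORS.
    """
    years = np.asarray(years, dtype=float)
    adjusted = np.asarray(baseline, dtype=float).copy()
    for realized_year, magnitude in events:
        # Persistent effect: every forecast year at or after the realized year.
        active = years >= realized_year
        if is_count:
            adjusted[active] *= 1.0 + COUNT_FACTORS[magnitude]
        else:
            adjusted[active] += magnitude
    return adjusted

forecast_years = [2025, 2026, 2027]

# Rate example: a hypothetical +1.0 pp event realized mid-2025.
print(apply_overlay(forecast_years, [50.0, 52.0, 54.0], [(2025.5, 1.0)], is_count=False))
# [50. 53. 55.]

# Count example: two hypothetical events whose factors compound in 2027.
print(apply_overlay(forecast_years, [100.0, 120.0, 140.0],
                    [(2026.0, "high"), (2027.0, "low")], is_count=True))
```

Additive shifts suit a rate bounded between 0 and 100, while multiplicative factors suit counts whose growth scales with their level.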

## 3. Forecast Results (2025-2027)

### Access: Account Ownership Rate (%)

In [None]:
access_base = final_forecasts['ACC_OWNERSHIP']['base']
access_base[['data_year', 'baseline_prediction', 'ci_lower', 'ci_upper']].rename(columns={'baseline_prediction': 'Base Forecast (%)'})

### Usage: P2P Transaction Volume (Millions)

In [None]:
usage_base = final_forecasts['USG_P2P_COUNT']['base']
usage_display = usage_base.copy()
usage_display['baseline_prediction'] = usage_display['baseline_prediction'] / 1e6
usage_display[['data_year', 'baseline_prediction']].rename(columns={'baseline_prediction': 'P2P Volume (Millions)'})

## 4. Scenario Visualizations

In [None]:
def plot_forecast(target, forecasts, historical_df, is_count=False):
    plt.figure(figsize=(10, 5))
    hist = historical_df[(historical_df['indicator_code'] == target) & 
                         (historical_df['record_type'] == 'observation') & 
                         (historical_df['gender'] == 'all')].sort_values('data_year')
    
    denom = 1e6 if is_count else 1.0
    plt.plot(hist['data_year'], hist['value_numeric']/denom, 'ko-', label='Historical', linewidth=2)
    
    colors = {'pessimistic': '#e74c3c', 'base': '#3498db', 'optimistic': '#2ecc71'}
    # Iterate over the forecasts passed in rather than the global `scenarios` list
    for scn, data in forecasts.items():
        plt.plot(data['data_year'], data['baseline_prediction']/denom, '--o', label=scn.capitalize(), color=colors[scn])
        
    plt.title(f'Forecast Scenarios: {target}')
    plt.xlabel('Year')
    plt.ylabel('Count (millions)' if is_count else 'Percentage (%)')
    plt.legend()
    plt.show()

plot_forecast('ACC_OWNERSHIP', final_forecasts['ACC_OWNERSHIP'], df)
plot_forecast('USG_P2P_COUNT', final_forecasts['USG_P2P_COUNT'], df, is_count=True)

## 5. Interpretation & Implications

### Key Findings
1.  **Access (Account Ownership)**: Projected to reach **~62% by 2027** in the base scenario. The primary driver of uncertainty is the speed of Digital ID (Fayda) integration, which could push access above 63% if rolled out aggressively.
2.  **Usage (P2P Transactions)**: Expected to exceed **345 million transactions** by 2027. This rapid growth (CAGR >30%) reflects a deepening ecosystem as interoperability (EthSwitch) and new entrants (M-Pesa) mature.
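
For reference, the compound annual growth rate (CAGR) cited above is the constant yearly rate that links a start value to an end value. A minimal sketch with a hypothetical helper `cagr` and made-up inputs (not the notebook's figures):

```python
def cagr(start_value, end_value, periods):
    """Constant annual growth rate taking start_value to end_value over `periods` years."""
    if start_value <= 0 or periods <= 0:
        raise ValueError("start_value and periods must be positive")
    return (end_value / start_value) ** (1.0 / periods) - 1.0

# Hypothetical series growing from 100 to 219.7 over 3 years (1.3 ** 3 = 2.197).
print(f"{cagr(100.0, 219.7, 3):.1%}")  # 30.0%
```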

### Key Uncertainties
- **Macroeconomic Stability**: Foreign exchange reforms are a "high impact" event that could either stabilize the market or cause short-term affordability shocks.
- **Infrastructure Reliability**: The forecast assumes continued 4G expansion; a slowdown in rural connectivity would push outcomes toward the pessimistic scenario.