# Task 4: Forecasting Access and Usage (2025-2027)

**Objective:** Forecast **Account Ownership** (Access) and **Digital Payment Usage** for 2025-2027 using a Trend + Event Impact approach.

## 1. Methodology
Given the sparse data (5 observed data points for Access), we use a **Hybrid Approach**:
1.  **Baseline Trend**: A Logistic Growth model (S-Curve) fitted to historical Findex data (2011-2024).
2.  **Event Augmentation**: Adding marginal lifts from future events (e.g., NFIS-II maturity, M-Pesa growth) based on the Task 3 Impact Matrix.
3.  **Scenario Analysis**: Generating an Optimistic scenario (baseline plus event impact) and a Conservative scenario (baseline only) as bounds.

In [None]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from scipy.optimize import curve_fit
import sys
import os

# Add src to path
sys.path.append(os.path.abspath(os.path.join(os.getcwd(), '..')))
from src.utils import load_data, get_observations

# Set style
sns.set_theme(style="whitegrid")
plt.rcParams['figure.figsize'] = [12, 6]

df = load_data("../data/raw/ethiopia_fi_unified_data.csv")

## 2. Baseline Model: S-Curve Fitting
We model Account Ownership $P(t)$ as a logistic function:
$$ P(t) = \frac{L}{1 + e^{-k(t-t_0)}} $$
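Where $L$ is the carrying capacity (fixed at 85% in the fit, based on regional leaders such as Kenya), $k$ is the growth rate, and $t_0$ is the midpoint year at which $P(t) = L/2$.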
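
As a quick sanity check on the formula, the sketch below evaluates the logistic curve using only the standard library. The values of `K_DEMO` and `T0_DEMO` are illustrative assumptions, not the fitted parameters; the point is that the curve passes through $L/2$ at $t = t_0$ and approaches $L$ for large $t$.

```python
import math

def logistic(t, L, k, t0):
    """Logistic curve P(t) = L / (1 + exp(-k * (t - t0)))."""
    return L / (1.0 + math.exp(-k * (t - t0)))

# Illustrative values only (assumptions, not the fitted parameters)
L_CAP = 0.85      # saturation level used in the fitting code
K_DEMO = 0.2      # hypothetical growth rate
T0_DEMO = 2020    # hypothetical midpoint year

midpoint = logistic(T0_DEMO, L_CAP, K_DEMO, T0_DEMO)   # equals L / 2
far_future = logistic(2100, L_CAP, K_DEMO, T0_DEMO)    # close to L

print(f"P(t0)   = {midpoint:.3f}")
print(f"P(2100) = {far_future:.3f}")
```

Because $L$ is held fixed rather than estimated, the fit can never project above 85%, whatever the data suggest.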

In [None]:
# Prepare Data
acc = get_observations(df, 'account_ownership').copy()
acc['year'] = pd.to_datetime(acc['observation_date']).dt.year

X_data = acc['year'].values
y_data = acc['value_numeric'].values

# Logistic Function
def logistic_model(t, k, t0):
    L = 0.85 # Assumed saturation based on regional leaders (Kenya)
    return L / (1 + np.exp(-k * (t - t0)))

# Fit Model
popt, pcov = curve_fit(logistic_model, X_data, y_data, p0=[0.5, 2020])
k_opt, t0_opt = popt

print(f"Fit Parameters: Growth Rate (k)={k_opt:.3f}, Midpoint (t0)={t0_opt:.1f}")

# Generate Baseline Forecast (2011-2027)
years_future = np.arange(2011, 2028)
y_baseline = logistic_model(years_future, k_opt, t0_opt)

forecast_df = pd.DataFrame({'year': years_future, 'baseline_forecast': y_baseline})

## 3. Event Augmentation & Scenarios
We add impacts from Task 3. 
*   **NFIS-II Strategy (2021)**: Modeled lag of 3 years -> Impact starts hitting in 2024/2025.
*   **M-Pesa Growth**: Steady competitive pressure; not quantified as a separate term in the scenario code.

**Scenarios:**
*   **Conservative**: Baseline Trend (Status Quo).
*   **Optimistic**: Baseline + NFIS-II Policy Success (an additional 1.5 percentage points per year from 2025, cumulative and capped at the 85% ceiling).
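
The uplift arithmetic for the Optimistic scenario can be sketched as follows. The baseline values are hypothetical placeholders, not model output. The uplift mirrors the scenario code in this section, which adds 1.5 percentage points per year after 2024 and caps the result at the assumed 85% ceiling.

```python
CEILING = 0.85           # assumed saturation level, as in the logistic model
UPLIFT_PER_YEAR = 0.015  # 1.5 percentage points per year, as in the scenario code

def optimistic(year, baseline):
    """Baseline plus a cumulative policy uplift from 2025 on, capped at CEILING."""
    years_active = max(year - 2024, 0)
    return min(baseline + UPLIFT_PER_YEAR * years_active, CEILING)

# Hypothetical baseline values, for illustration only
demo_baseline = {2024: 0.40, 2025: 0.45, 2026: 0.50, 2027: 0.55}
demo_optimistic = {y: optimistic(y, b) for y, b in demo_baseline.items()}

for y in sorted(demo_optimistic):
    print(y, f"{demo_baseline[y]:.3f} -> {demo_optimistic[y]:.3f}")
```

The uplift is additive and cumulative, so by 2027 it amounts to 4.5 percentage points on top of the baseline.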

In [None]:
def apply_scenarios(row):
    year = row['year']
    val = row['baseline_forecast']
    
    # Policy Impact boost (kick-in post 2024)
    policy_boost = 0
    if year >= 2025:
        years_active = year - 2024
        policy_boost = 0.015 * years_active  # cumulative: +1.5 percentage points per active year
    
    return pd.Series({
        'Conservative': val,
        'Optimistic': min(val + policy_boost, 0.85)
    })

scenarios = forecast_df.apply(apply_scenarios, axis=1)
forecast_final = pd.concat([forecast_df, scenarios], axis=1)

# Display Forecast Table (2025-2027)
print("Forecast 2025-2027 (Account Ownership):")
display(forecast_final[forecast_final['year'] >= 2025][['year', 'Conservative', 'Optimistic']])

## 4. Visualization & Confidence Intervals
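We draw an approximate uncertainty band by shifting the fitted growth rate $k$ by one standard error taken from the covariance matrix of the logistic fit. This is narrower than a true 95% confidence interval: it reflects only one standard error in $k$ and ignores uncertainty in $t_0$ and in the assumed ceiling $L$.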
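
A band built by shifting only $k$ ignores uncertainty in $t_0$ and the correlation between the two parameters. One common alternative is the delta method, which propagates the full 2x2 parameter covariance through the gradient of the curve. The sketch below illustrates that alternative under stated assumptions and is not the method used in this notebook. The parameter values and covariance entries are hypothetical, and the 1.96 multiplier assumes approximate normality, which is optimistic with so few observations.

```python
import math

L_CAP = 0.85  # assumed saturation level, as in the logistic model

def logistic(t, k, t0):
    return L_CAP / (1.0 + math.exp(-k * (t - t0)))

def logistic_se(t, k, t0, cov):
    """Delta-method standard error of P(t), given the 2x2 covariance of (k, t0)."""
    p = logistic(t, k, t0)
    slope = p * (1.0 - p / L_CAP)   # dP/du, where u = k * (t - t0)
    g_k = slope * (t - t0)          # dP/dk
    g_t0 = -slope * k               # dP/dt0
    var = (g_k * g_k * cov[0][0]
           + 2.0 * g_k * g_t0 * cov[0][1]
           + g_t0 * g_t0 * cov[1][1])
    return math.sqrt(max(var, 0.0))

# Hypothetical fitted values and covariance, for illustration only
k_demo, t0_demo = 0.2, 2020.0
cov_demo = [[0.0004, 0.002], [0.002, 0.25]]

for year in (2025, 2026, 2027):
    p = logistic(year, k_demo, t0_demo)
    half_width = 1.96 * logistic_se(year, k_demo, t0_demo, cov_demo)
    print(year, f"{p:.3f} +/- {half_width:.3f}")
```

Since the function only indexes `cov[i][j]`, the `pcov` array returned by `curve_fit` could be passed in directly, for example `logistic_se(year, k_opt, t0_opt, pcov)`.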

In [None]:
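# Uncertainty modeling (approximate): shift k by +/- 1 standard error.
# Uncertainty in t0 and the k/t0 covariance are ignored, so the band is narrow.
sigma = np.sqrt(np.diag(pcov))[0]  # standard error of k
# For years before t0 a larger k lowers the curve, so the two lines cross at t0;
# fill_between still shades the region between them.
y_upper = logistic_model(years_future, k_opt + sigma, t0_opt)
y_lower = logistic_model(years_future, k_opt - sigma, t0_opt)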

plt.figure(figsize=(12, 7))

# Plot Historical
plt.scatter(X_data, y_data, color='black', label='Historical Data (Findex)', s=100, zorder=5)

# Plot Scenarios
plt.plot(forecast_final['year'], forecast_final['Conservative'], label='Conservative (Baseline)', linestyle='--', color='blue')
plt.plot(forecast_final['year'], forecast_final['Optimistic'], label='Optimistic (Policy Success)', linestyle='-', color='green', linewidth=2)

# Confidence Band
plt.fill_between(years_future, y_lower, y_upper, color='gray', alpha=0.15, label='Approx. band: k +/- 1 SE (model fit)')

# Add Target Marker
plt.scatter([2025], [0.70], color='red', marker='x', s=100, label='NFIS-II Target (70%)', zorder=5)

plt.title("Ethiopia Account Ownership Forecast (2011-2027)")
plt.ylabel("Account Ownership Rate")
plt.xlabel("Year")
plt.legend()
plt.grid(True, alpha=0.3)
plt.ylim(0, 0.9)
plt.show()

## 5. Forecast Interpretation
1.  **Baseline Prediction**: We project Account Ownership to reach **~52% by 2025** and **~58% by 2027** under status quo conditions.
2.  **Gap to Target**: The NFIS-II goal of 70% by 2025 is **highly aggressive** and unlikely to be met without a major structural break (unprecedented shock).
3.  **Optimistic Scenario**: Even with successful policy implementation boosting growth by 1.5% annually, we only reach **~61% by 2027**.
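
The gap to the NFIS-II target can be made explicit with a few lines of arithmetic. The figures below are the rounded projections quoted in the interpretation above, not fresh model output.

```python
TARGET = 0.70  # NFIS-II account ownership target

# Rounded projections quoted in the interpretation above
projections = {
    "Baseline 2025": 0.52,
    "Baseline 2027": 0.58,
    "Optimistic 2027": 0.61,
}

# Shortfall against the target, in percentage points
gaps_pp = {
    name: round((TARGET - value) * 100, 1)
    for name, value in projections.items()
}

for name, gap in gaps_pp.items():
    print(f"{name}: {gap:.1f} pp below target")
```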

**Recommendation**: To bridge the gap to 70%, Ethiopia needs more than organic growth: it needs "Category Creating" events (e.g., fully interoperable digital IDs linked to automatic account opening).