# Task 4: Forecasting Access and Usage (2025–2027)

This notebook produces point forecasts and uncertainty bounds for **Account Ownership (Access)** and **Digital Payment Usage** for 2025, 2026, and 2027 using baseline trend, event-augmented trend, and scenario analysis. No ML models are used; methods are simple and interpretable given sparse data.

## 1. Forecast targets (definitions)

- **Account Ownership Rate (Access)**  
  % of adults (15+) with an account at a bank or another type of financial institution, or with a mobile money account.

- **Digital Payment Usage**  
  % of adults (15+) who made or received digital payments in the past year.

**Forecast horizon:** 2025, 2026, 2027.

In [None]:
import sys
sys.path.append("..")
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from pathlib import Path

from src.data_loading import load_processed_enriched, load_unified_dataset
from src.forecasting import (
    baseline_trend_forecast,
    event_augmented_forecast,
    scenario_forecasts,
)

FORECAST_YEARS = [2025, 2026, 2027]
INDICATOR_ACCESS = "ACC_OWNERSHIP"
INDICATOR_USAGE = "USG_DIGITAL_PAYMENT"

In [None]:
# Load data: prefer processed enriched; fallback to raw
processed_path = Path("../data/processed/ethiopia_fi_enriched.xlsx")
if processed_path.exists():
    data, events, impact_links = load_processed_enriched(str(processed_path))
else:
    raw = load_unified_dataset("../data/raw/ethiopia_fi_unified_data.xlsx")
    data = raw[raw["record_type"].isin(["observation", "target"])].copy()
    events = raw[raw["record_type"] == "event"].copy()
    impact_links = raw[raw["record_type"] == "impact_link"].copy()

obs = data[data["record_type"] == "observation"].copy()
obs["observation_date"] = pd.to_datetime(obs["observation_date"])
# Resolve indicator codes: keep the default when it is present in the data,
# otherwise use the first code matching a keyword, else a positional fallback.
codes = obs["indicator_code"].dropna().unique().tolist()

def resolve_indicator(default, keywords, fallback_pos):
    if default in codes or not codes:
        return default
    matches = [c for c in codes if any(k in str(c).upper() for k in keywords)]
    return matches[0] if matches else codes[min(fallback_pos, len(codes) - 1)]

INDICATOR_ACCESS = resolve_indicator(INDICATOR_ACCESS, ("OWN", "ACC"), 0)
INDICATOR_USAGE = resolve_indicator(INDICATOR_USAGE, ("DIGITAL", "USG", "PAY"), 1)
print("Access indicator:", INDICATOR_ACCESS, "Usage indicator:", INDICATOR_USAGE)
print("Observations:", len(obs))

## 2. Modeling approach

**A. Baseline trend**  
Linear regression of indicator value on year. Justified by sparse, low-frequency data (~5 points over ~13 years).
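
As a rough illustration of the baseline idea, the sketch below fits a straight line to five survey points and extrapolates it to the forecast years. The values are made up for illustration and are not the notebook's data, and this is not necessarily how `baseline_trend_forecast` is implemented.

```python
import numpy as np

# Illustrative values only, NOT the notebook's data: five survey years and rates in %.
years = np.array([2011, 2014, 2017, 2021, 2024], dtype=float)
values = np.array([10.0, 17.0, 21.0, 31.0, 36.0])

# Center the years so the least-squares fit is numerically well conditioned.
year_mean = years.mean()
slope, intercept = np.polyfit(years - year_mean, values, deg=1)

# Extrapolate the fitted line to the forecast horizon.
forecast_years = np.array([2025, 2026, 2027], dtype=float)
point_forecast = intercept + slope * (forecast_years - year_mean)

for year, value in zip(forecast_years, point_forecast):
    print(f"{int(year)}: {value:.1f}% (slope {slope:.2f} pp/year)")
```

With only five points, the slope is the single quantity driving the extrapolation, which is why the fitted trend is sensitive to any one survey wave.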

**B. Event-augmented trend**  
Baseline trend plus cumulative impacts from Task 3 events (e.g. Telebirr, ID rollout, policy/infrastructure).
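
One way to picture the additive adjustment is sketched below. The events, the column names `effective_year` and `impact_pp`, and the helper `add_event_impacts` are hypothetical and chosen for illustration; the actual logic lives in `event_augmented_forecast` and may differ.

```python
import pandas as pd

# Hypothetical baseline forecasts in % of adults (illustrative values only).
baseline = pd.DataFrame({"year": [2025, 2026, 2027], "forecast": [38.0, 40.0, 42.0]})

# Hypothetical event impacts in percentage points, with the year each takes effect.
event_impacts = pd.DataFrame({
    "event": ["event_a", "event_b"],
    "effective_year": [2025, 2026],
    "impact_pp": [1.0, 0.5],
})

def add_event_impacts(baseline, event_impacts, event_scale=1.0):
    """Add the scaled cumulative impact of all events in effect by each forecast year."""
    out = baseline.copy()
    out["event_effect"] = [
        event_scale
        * event_impacts.loc[event_impacts["effective_year"] <= year, "impact_pp"].sum()
        for year in out["year"]
    ]
    out["forecast"] = out["forecast"] + out["event_effect"]
    return out

augmented = add_event_impacts(baseline, event_impacts, event_scale=1.0)
print(augmented)
```

Once an event takes effect, its impact stays in every later year, which is what makes the adjustment cumulative rather than a one-off bump.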

**C. Scenario analysis**  
- **Pessimistic:** Slowed adoption, low event effectiveness (scale 0.5).  
- **Base:** Continuation of recent trends, central event scale.  
- **Optimistic:** Accelerated uptake, higher event effectiveness (scale 1.5).
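
The scenario scaling can be sketched as follows. The pessimistic and optimistic scales come from the list above; the base scale of 1.0 is an assumption, and the numbers and the helper `scenario_table` are hypothetical. The sketch scales only the event effect and leaves the trend itself unchanged, which may be simpler than what `scenario_forecasts` does.

```python
import pandas as pd

# Scales for the event effect; base = 1.0 is an assumption, not taken from the source.
SCENARIO_SCALES = {"pessimistic": 0.5, "base": 1.0, "optimistic": 1.5}

def scenario_table(baseline_forecast, event_effect, scales=SCENARIO_SCALES):
    """Build a long table of forecasts: baseline plus the scaled event effect."""
    rows = []
    for name, scale in scales.items():
        for year, base_value in baseline_forecast.items():
            rows.append({
                "scenario": name,
                "year": year,
                "forecast": base_value + scale * event_effect[year],
            })
    return pd.DataFrame(rows)

# Hypothetical inputs in % of adults and percentage points (illustrative only).
baseline_forecast = {2025: 38.0, 2026: 40.0, 2027: 42.0}
event_effect = {2025: 1.0, 2026: 1.5, 2027: 1.5}

scenarios = scenario_table(baseline_forecast, event_effect)
wide = scenarios.pivot(index="year", columns="scenario", values="forecast")
print(wide)
```

Because every scenario shares the same baseline, the gap between the pessimistic and optimistic paths in any year equals the event effect for that year.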

In [None]:
# Baseline trend forecasts (point + regression intervals)
base_access = baseline_trend_forecast(obs, INDICATOR_ACCESS, FORECAST_YEARS, confidence=0.95)
base_usage = baseline_trend_forecast(obs, INDICATOR_USAGE, FORECAST_YEARS, confidence=0.95)
print("Baseline trend — Access"); display(base_access)
print("Baseline trend — Usage"); display(base_usage)

In [None]:
# Event-augmented forecasts
evt_access = event_augmented_forecast(obs, INDICATOR_ACCESS, FORECAST_YEARS, events, impact_links, event_scale=1.0, confidence=0.95)
evt_usage = event_augmented_forecast(obs, INDICATOR_USAGE, FORECAST_YEARS, events, impact_links, event_scale=1.0, confidence=0.95)
print("Event-augmented — Access"); display(evt_access)
print("Event-augmented — Usage"); display(evt_usage)

In [None]:
# Scenario forecasts (pessimistic / base / optimistic)
scen_access = scenario_forecasts(obs, INDICATOR_ACCESS, FORECAST_YEARS, events, impact_links)
scen_usage = scenario_forecasts(obs, INDICATOR_USAGE, FORECAST_YEARS, events, impact_links)
scen_all = pd.concat([scen_access, scen_usage], ignore_index=True)
display(scen_all.head(15))

## 3. Uncertainty

- **Regression intervals:** The baseline uses OLS prediction intervals based on an approximate t distribution, so `lower`/`upper` reflect both residual noise and uncertainty in the fitted trend.
- **Scenario ranges:** The pessimistic and optimistic scenarios bound a range around the central base scenario.
- **Why uncertainty is large:** With only ~5 Findex points over ~13 years, the trend slope is imprecisely estimated, and event effectiveness is expert-based rather than estimated from data. Intervals are therefore wide and point forecasts should not be read as precise.
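
To show why so few points produce wide bounds, here is a minimal sketch of a textbook OLS prediction interval for a simple linear trend. The data are made up, the helper `ols_prediction_interval` is hypothetical, and `baseline_trend_forecast` may compute its bounds differently.

```python
import numpy as np
from scipy import stats

def ols_prediction_interval(years, values, new_years, confidence=0.95):
    """Point forecast and two-sided prediction interval for a simple linear trend."""
    x = np.asarray(years, dtype=float)
    y = np.asarray(values, dtype=float)
    x0 = np.asarray(new_years, dtype=float)
    n = x.size
    x_mean = x.mean()
    sxx = np.sum((x - x_mean) ** 2)

    # Closed-form OLS estimates for slope and intercept.
    slope = np.sum((x - x_mean) * (y - y.mean())) / sxx
    intercept = y.mean() - slope * x_mean

    # Residual standard error with n - 2 degrees of freedom.
    residuals = y - (intercept + slope * x)
    s = np.sqrt(np.sum(residuals ** 2) / (n - 2))

    # The half-width grows as the forecast year moves away from the mean year.
    t_crit = stats.t.ppf(0.5 + confidence / 2, df=n - 2)
    pred = intercept + slope * x0
    half_width = t_crit * s * np.sqrt(1 + 1 / n + (x0 - x_mean) ** 2 / sxx)
    return pred, pred - half_width, pred + half_width

# Illustrative values only, NOT the notebook's data.
pred, lower, upper = ols_prediction_interval(
    [2011, 2014, 2017, 2021, 2024], [10.0, 17.0, 21.0, 31.0, 36.0], [2025, 2026, 2027]
)
for year, p, lo, hi in zip([2025, 2026, 2027], pred, lower, upper):
    print(f"{year}: {p:.1f}% [{lo:.1f}, {hi:.1f}]")
```

With five observations there are only three residual degrees of freedom, so the t critical value is large and the interval widens further with each year of extrapolation.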

## 4. Visualizations

In [None]:
# Historical + baseline forecast (Access)
obs["year"] = pd.to_datetime(obs["observation_date"]).dt.year
hist_acc = obs[obs["indicator_code"] == INDICATOR_ACCESS][["year", "value_numeric"]].drop_duplicates("year").sort_values("year")
fig, ax = plt.subplots(figsize=(9, 5))
ax.plot(hist_acc["year"], hist_acc["value_numeric"], "o-", label="Historical", color="C0")
ax.plot(base_access["year"], base_access["forecast"], "s--", label="Baseline forecast")
ax.fill_between(base_access["year"], base_access["lower"], base_access["upper"], alpha=0.3)
ax.set_title("Account Ownership (Access): Historical + Baseline Forecast")
ax.set_ylabel("% adults")
ax.legend()
ax.grid(True, alpha=0.3)
plt.tight_layout()
plt.show()

In [None]:
# Historical + baseline (Usage)
hist_use = obs[obs["indicator_code"] == INDICATOR_USAGE][["year", "value_numeric"]].drop_duplicates("year").sort_values("year")
fig, ax = plt.subplots(figsize=(9, 5))
ax.plot(hist_use["year"], hist_use["value_numeric"], "o-", label="Historical", color="C0")
ax.plot(base_usage["year"], base_usage["forecast"], "s--", label="Baseline forecast")
ax.fill_between(base_usage["year"], base_usage["lower"], base_usage["upper"], alpha=0.3)
ax.set_title("Digital Payment Usage: Historical + Baseline Forecast")
ax.set_ylabel("% adults")
ax.legend()
ax.grid(True, alpha=0.3)
plt.tight_layout()
plt.show()

In [None]:
# Scenario fan chart (Usage)
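# Sort each scenario by year so the fill_between bounds line up position by position
pess_u = scen_usage[scen_usage["scenario"] == "pessimistic"].sort_values("year")
base_u = scen_usage[scen_usage["scenario"] == "base"].sort_values("year")
opt_u = scen_usage[scen_usage["scenario"] == "optimistic"].sort_values("year")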
fig, ax = plt.subplots(figsize=(9, 5))
if len(hist_use):
    ax.plot(hist_use["year"], hist_use["value_numeric"], "o-", label="Historical", color="k")
ax.fill_between(base_u["year"], pess_u["forecast"], opt_u["forecast"], alpha=0.35, color="C2")
ax.plot(base_u["year"], base_u["forecast"], "s-", label="Base", color="C2")
ax.plot(pess_u["year"], pess_u["forecast"], "v--", label="Pessimistic", alpha=0.8)
ax.plot(opt_u["year"], opt_u["forecast"], "^--", label="Optimistic", alpha=0.8)
ax.set_title("Usage: Scenario fan (pessimistic / base / optimistic)")
ax.set_ylabel("% adults")
ax.legend()
ax.grid(True, alpha=0.3)
plt.tight_layout()
plt.show()

In [None]:
# Event-augmented vs baseline comparison (Usage)
fig, ax = plt.subplots(figsize=(9, 5))
if len(hist_use):
    ax.plot(hist_use["year"], hist_use["value_numeric"], "o-", label="Historical", color="C0")
ax.plot(base_usage["year"], base_usage["forecast"], "s--", label="Baseline trend")
ax.plot(evt_usage["year"], evt_usage["forecast"], "^-.", label="Event-augmented")
ax.fill_between(evt_usage["year"], evt_usage["lower"], evt_usage["upper"], alpha=0.2)
ax.set_title("Usage: Baseline vs event-augmented forecast")
ax.set_ylabel("% adults")
ax.legend()
ax.grid(True, alpha=0.3)
plt.tight_layout()
plt.show()

In [None]:
# Scenario fan chart (Access)
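# Sort each scenario by year so the fill_between bounds line up position by position
pess = scen_access[scen_access["scenario"] == "pessimistic"].sort_values("year")
base = scen_access[scen_access["scenario"] == "base"].sort_values("year")
opt = scen_access[scen_access["scenario"] == "optimistic"].sort_values("year")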
fig, ax = plt.subplots(figsize=(9, 5))
ax.plot(hist_acc["year"], hist_acc["value_numeric"], "o-", label="Historical", color="k")
ax.fill_between(base["year"], pess["forecast"], opt["forecast"], alpha=0.35, color="C1")
ax.plot(base["year"], base["forecast"], "s-", label="Base", color="C1")
ax.plot(pess["year"], pess["forecast"], "v--", label="Pessimistic", alpha=0.8)
ax.plot(opt["year"], opt["forecast"], "^--", label="Optimistic", alpha=0.8)
ax.set_title("Access: Scenario fan (pessimistic / base / optimistic)")
ax.set_ylabel("% adults")
ax.legend()
ax.grid(True, alpha=0.3)
plt.tight_layout()
plt.show()

In [None]:
# Event-augmented vs baseline comparison (Access)
fig, ax = plt.subplots(figsize=(9, 5))
ax.plot(hist_acc["year"], hist_acc["value_numeric"], "o-", label="Historical", color="C0")
ax.plot(base_access["year"], base_access["forecast"], "s--", label="Baseline trend")
ax.plot(evt_access["year"], evt_access["forecast"], "^-.", label="Event-augmented")
ax.fill_between(evt_access["year"], evt_access["lower"], evt_access["upper"], alpha=0.2)
ax.set_title("Access: Baseline vs event-augmented forecast")
ax.set_ylabel("% adults")
ax.legend()
ax.grid(True, alpha=0.3)
plt.tight_layout()
plt.show()

## 5. Forecast table (Indicator, Year, Scenario, Forecast, Lower, Upper)

In [None]:
table = scen_all.rename(columns={"forecast": "Forecast", "lower": "Lower", "upper": "Upper", "indicator": "Indicator", "year": "Year", "scenario": "Scenario"})
# Select the report columns and round the numeric ones to two decimals.
# round() returns a new DataFrame, which avoids assigning into a column subset.
table = table[["Indicator", "Year", "Scenario", "Forecast", "Lower", "Upper"]].round(
    {"Forecast": 2, "Lower": 2, "Upper": 2}
)
display(table)

## 6. Interpretation and insights

**A. What does the model predict?**  
Under the base scenario, both Access and Usage are projected to rise over 2025–2027. Usage often grows faster than Access, since digital payment adoption can outpace account ownership. Differences between the two indicators reflect their historical slopes and the events linked to each.

**B. What events matter most?**  
Telebirr and broader mobile money rollout move both Access and Usage; policy/ID rollout events add to Access. Mechanism: new accounts and channels directly increase ownership and payment use.

**C. Key uncertainties**  
Data sparsity (~5 points), expert-based event effectiveness, and risk of structural breaks (e.g. new policies or shocks) mean intervals and scenario bands are wide; point precision is limited.

**D. Policy relevance**  
Forecasts support tracking progress toward financial inclusion goals (e.g. NFIS-II, BRIDGE 2030); scenario ranges inform planning under different adoption and policy assumptions.

## 7. Methodology and limitations

**Assumptions**  
- Linear trends in year.  
- Stable policy environment; no unmodeled shocks.  
- Event impacts additive and scaled by scenario.

**Limitations**  
- Small sample size (few Findex points).  
- No causal inference; event effects are expert-driven.  
- Structural change (e.g. new regulation or crisis) can invalidate extrapolation.