# **0. Title & Summary**
Revised GTWR-GNN notebook (paper-style).

## **1. Introduction**
Summary of the objectives and the workflow.

### **1.1 Contributions & Workflow**
- (i) Build up from OLS to VCM to GWR/GTWR
- (ii) Connect these to GNNWR/SRGCNN/GSTRGCN
- (iii) Implementation pipeline
- (iv) Fix duplicate labels

## **2. Theoretical Background**
### **2.1 OLS**
Model: y = X beta + eps.
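
As a minimal sketch of this baseline, the snippet below fits y = X beta + eps on synthetic data with `numpy.linalg.lstsq`. The data, the coefficient values, and all variable names are illustrative assumptions and are not taken from the notebook's dataset.

```python
import numpy as np

rng = np.random.default_rng(0)
n, p = 200, 3
X = rng.normal(size=(n, p))
beta_true = np.array([1.5, -2.0, 0.5])   # hypothetical coefficients
y = X @ beta_true + 0.1 * rng.normal(size=n)

# Least-squares solution of min ||y - X beta||^2
beta_ols, *_ = np.linalg.lstsq(X, y, rcond=None)
print(beta_ols)
```

OLS returns a single global coefficient vector, which is the restriction that the varying-coefficient models in the next subsections relax.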

### **2.2 VCM**
The coefficients vary with a covariate z. They are estimated locally via kernel weighting.
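
The idea of local estimation via kernel weighting can be sketched as follows. This is an illustration under stated assumptions: a scalar z, a Gaussian kernel, a hypothetical bandwidth `h`, and synthetic data whose slope is 2 + z. The helper `local_beta` is not part of the notebook.

```python
import numpy as np

def local_beta(z0, X, y, z, h=0.1):
    """Kernel-weighted least squares estimate of beta at z0 (Gaussian kernel, bandwidth h)."""
    w = np.exp(-0.5 * ((z - z0) / h) ** 2)       # weights decay with |z - z0|
    Xw = X * w[:, None]                          # W X for diagonal W
    return np.linalg.solve(X.T @ Xw, Xw.T @ y)   # (X'WX)^{-1} X'Wy

rng = np.random.default_rng(1)
n = 2000
z = rng.uniform(0.0, 1.0, size=n)
X = np.column_stack([np.ones(n), rng.normal(size=n)])
# Hypothetical truth: intercept 1, slope varies as 2 + z
y = 1.0 + (2.0 + z) * X[:, 1] + 0.1 * rng.normal(size=n)

slope_low = local_beta(0.2, X, y, z)[1]
slope_high = local_beta(0.8, X, y, z)[1]
print(slope_low, slope_high)
```

The two local slopes differ because each fit gives most of its weight to observations whose z is close to the evaluation point.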

### **2.3 GWR & GTWR**
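Special cases of the VCM, with z = (u, v) for GWR and z = (u, v, t) for GTWR. The coefficients are estimated by local ridge-regularized weighted least squares (ridge-WLS).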
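
For comparison with the GNN-derived weights used later, a classical GTWR-style kernel can be sketched as below. The Gaussian kernel on a combined space-time distance is one common choice. The function name `gtwr_weights`, the bandwidth `h`, and the time-scaling factor `mu` are assumptions made for illustration.

```python
import numpy as np

def gtwr_weights(coords, times, h=1.0, mu=1.0):
    """Gaussian kernel weights on a combined space-time distance.

    d2[i, j] = (du^2 + dv^2) + mu * dt^2 and w[i, j] = exp(-d2[i, j] / h^2).
    h (bandwidth) and mu (time scaling) are hypothetical tuning parameters.
    """
    d_space2 = ((coords[:, None, :] - coords[None, :, :]) ** 2).sum(axis=-1)
    d_time2 = (times[:, None] - times[None, :]) ** 2
    return np.exp(-(d_space2 + mu * d_time2) / h ** 2)

coords = np.array([[0.0, 0.0], [0.0, 1.0], [3.0, 4.0]])
times = np.array([2020.0, 2020.0, 2021.0])
W = gtwr_weights(coords, times, h=2.0, mu=0.5)
print(W.round(3))
```

Setting `mu=0` drops the temporal term, which gives purely spatial (GWR-style) weights.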

In [None]:

# 6. Implementation: Library Imports & Configuration
import os, warnings, random, math
import numpy as np
import pandas as pd
import torch
import torch.nn as nn
import torch.nn.functional as F

try:
    from torch_geometric.data import Data
    from torch_geometric.nn import GCNConv, GATv2Conv
    HAS_PYG = True
except Exception:
    HAS_PYG = False

from sklearn.metrics import mean_squared_error, mean_absolute_error, r2_score
from sklearn.neighbors import kneighbors_graph
from sklearn.preprocessing import StandardScaler

warnings.filterwarnings("ignore")
SEED = 42
random.seed(SEED); np.random.seed(SEED); torch.manual_seed(SEED)
DEVICE = torch.device("cuda" if torch.cuda.is_available() else "cpu")
print("Device:", DEVICE)

PATH_XLSX = "Data BPS Laporan KP - Coded.xlsx"
LAT_COL, LON_COL = "lat", "lon"
TIME_COL, TARGET_COL = "Tahun", "y"
FEATURE_COLS = ["X1","X2","X3","X4","X5","X6","X7","X8"]

K_SPATIAL = 8
USE_TEMPORAL = True
RIDGE_LAMBDA = 0.1


In [None]:

# 6.1 Utility: panel builder (FIX for duplicate labels)
def build_panel_arrays(df, time_col, target_col, feature_cols, lat_col, lon_col, times_sorted):
    df = df.copy()
    # One key per location: (lat, lon) rounded to 6 decimals
    df["coord_key"] = list(zip(df[lat_col].round(6), df[lon_col].round(6)))
    # Keep the first row of each (location, time) pair, so the mean below acts on single rows
    df = df.drop_duplicates(subset=["coord_key", time_col])
    df = df.groupby(["coord_key", time_col], as_index=False).mean(numeric_only=True)

    key_list = sorted(df["coord_key"].unique())

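    X_blocks, y_blocks, coords_blocks = [], [], []
    for t in times_sorted:
        block = df[df[time_col] == t].copy().set_index("coord_key")
        # Guard against any remaining duplicate keys within a time slice
        if not block.index.is_unique:
            block = block[~block.index.duplicated(keep="first")]
        # Align every slice to the same location order; locations missing at time t become NaN rows
        block = block.reindex(key_list).reset_index(drop=False)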

        X_blocks.append(block[feature_cols].values.astype(np.float32))
        y_blocks.append(block[target_col].values.astype(np.float32).reshape(-1,1))
        coords_blocks.append(block[[lat_col, lon_col]].values.astype(np.float32))

    X_all = np.stack(X_blocks, axis=0)
    y_all = np.stack(y_blocks, axis=0)
    coords_blocks = np.stack(coords_blocks, axis=0)
    N_per_year = X_blocks[0].shape[0]

    return {"X_all": X_all, "y_all": y_all, "coords_blocks": coords_blocks,
            "times": np.array(times_sorted), "N_per_year": N_per_year, "key_list": key_list}

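The duplicate handling above can be illustrated on a toy panel. This sketch averages duplicated (location, year) rows and then checks that the panel is balanced. It assumes a hypothetical string column `site` in place of the `coord_key` tuples, and the values are made up.

```python
import pandas as pd

# Toy panel: site B has a duplicated row for 2020 (hypothetical values)
df = pd.DataFrame({
    "site":  ["A", "A", "B", "B", "B"],
    "Tahun": [2020, 2021, 2020, 2020, 2021],
    "y":     [1.0, 2.0, 3.0, 5.0, 6.0],
})

# Average duplicates so that each (site, year) pair appears exactly once
clean = df.groupby(["site", "Tahun"], as_index=False).mean(numeric_only=True)

# Pivot to a (year x site) grid; a missing pair would show up as NaN
grid = clean.pivot(index="Tahun", columns="site", values="y")
print(grid)
```

A NaN in such a grid marks a location that is missing in some year. Those rows would need to be dropped or imputed before the graph construction and the local regressions.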

In [None]:

# 6.2 Spatio-Temporal Graph Construction
def build_spatiotemporal_graph(coords_blocks, k_spatial=8, use_temporal=True):
    T, N, _ = coords_blocks.shape
    rows, cols = [], []
    for ti in range(T):
        coords = coords_blocks[ti]
        A = kneighbors_graph(coords, k_spatial, mode="connectivity", include_self=False)
        Ai = A.tocoo()
        off = ti * N
        rows.extend((Ai.row + off).tolist())
        cols.extend((Ai.col + off).tolist())
    if use_temporal:
        for ti in range(T-1):
            off_curr = ti * N
            off_next = (ti+1) * N
            rows.extend([off_curr + i for i in range(N)])
            cols.extend([off_next + i for i in range(N)])
            rows.extend([off_next + i for i in range(N)])
            cols.extend([off_curr + i for i in range(N)])
    edge_index = np.vstack([np.array(rows, dtype=np.int64), np.array(cols, dtype=np.int64)])
    return edge_index
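
To make the node indexing concrete, the sketch below rebuilds the same edge layout on a toy example: node `i` at time slice `t` gets index `t * N + i`. It uses a small NumPy k-nearest-neighbour helper (`knn_edges`, a stand-in assumed here for `kneighbors_graph`) and made-up coordinates.

```python
import numpy as np

def knn_edges(coords, k):
    """Directed k-nearest-neighbour edges (row -> neighbour), excluding self."""
    d2 = ((coords[:, None, :] - coords[None, :, :]) ** 2).sum(axis=-1)
    np.fill_diagonal(d2, np.inf)             # never pick a node as its own neighbour
    nbrs = np.argsort(d2, axis=1)[:, :k]     # indices of the k closest nodes
    rows = np.repeat(np.arange(len(coords)), k)
    return rows, nbrs.ravel()

T, N, k = 2, 4, 2
coords = np.array([[0.0, 0.0], [0.0, 1.0], [1.0, 0.0], [5.0, 5.0]])

rows, cols = [], []
for t in range(T):                           # spatial edges inside each time slice
    r, c = knn_edges(coords, k)
    rows.extend(r + t * N)
    cols.extend(c + t * N)
for t in range(T - 1):                       # temporal edges, both directions
    a = np.arange(N) + t * N
    b = a + N
    rows.extend(a)
    cols.extend(b)
    rows.extend(b)
    cols.extend(a)

edge_index = np.vstack([rows, cols]).astype(np.int64)
print(edge_index.shape)
```

With these settings the graph has T * N * k spatial edges and 2 * N * (T - 1) temporal edges. Note that k must be smaller than the number of locations per slice.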


In [None]:

# 6.3 Simple GNN Encoder for Weights
class SimpleGCN(nn.Module):
    def __init__(self, in_dim, hidden_dim=32, out_dim=16, use_gat=False):
        super().__init__()
        self.use_gat = use_gat and HAS_PYG
        if HAS_PYG:
            if self.use_gat:
                self.conv1 = GATv2Conv(in_dim, hidden_dim, heads=1, dropout=0.0)
                self.conv2 = GATv2Conv(hidden_dim, out_dim, heads=1, dropout=0.0)
            else:
                self.conv1 = GCNConv(in_dim, hidden_dim)
                self.conv2 = GCNConv(hidden_dim, out_dim)
        else:
            self.fc1 = nn.Linear(in_dim, hidden_dim)
            self.fc2 = nn.Linear(hidden_dim, out_dim)

    def forward(self, x, edge_index=None):
        if HAS_PYG:
            h = self.conv1(x, edge_index).relu()
            h = self.conv2(h, edge_index)
            return h
        else:
            h = self.fc1(x).relu()
            h = self.fc2(h)
            return h

def compute_weights_from_embeddings(H, tau=0.5):
    # Row-normalized similarity weights; softmax avoids overflow in exp(sim)
    with torch.no_grad():
        sim = (H @ H.t()) / max(1e-6, tau)
        W = torch.softmax(sim, dim=1)
    return W
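
The row normalization of the similarity matrix can be checked in isolation. The sketch below is a NumPy analogue (the helper `softmax_rows` and the random embeddings are assumptions) that subtracts the row maximum before exponentiating, so large inner products do not overflow.

```python
import numpy as np

def softmax_rows(S):
    """Row-wise softmax with max subtraction to avoid overflow in exp."""
    S = S - S.max(axis=1, keepdims=True)
    E = np.exp(S)
    return E / E.sum(axis=1, keepdims=True)

rng = np.random.default_rng(0)
H = rng.normal(size=(5, 16)) * 10.0   # hypothetical embeddings with large norms
tau = 0.5
W = softmax_rows((H @ H.T) / tau)
print(W.sum(axis=1))
```

A small `tau` sharpens each row, and since a node's similarity with itself is its squared norm, the diagonal tends to dominate. This is worth inspecting when choosing `tau`.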


In [None]:

# 6.4 Ridge-WLS for Local Coefficients
def ridge_wls_local_beta(X, y, W_row, lam=0.1):
    M, p = X.shape
    w = W_row.reshape(-1,1)
    Xw = X * np.sqrt(w)
    yw = y * np.sqrt(w)
    A = Xw.T @ Xw + lam * np.eye(p)
    b = Xw.T @ yw
    try:
        beta = np.linalg.solve(A, b)
    except np.linalg.LinAlgError:
        beta = np.linalg.pinv(A) @ b
    return beta

def estimate_all_betas(X_all, y_all, W, lam=0.1):
    T, N, p = X_all.shape
    TN = T*N
    beta_hat = np.zeros((T, N, p), dtype=np.float32)
    X_flat = X_all.reshape(TN, p)
    y_flat = y_all.reshape(TN, 1)
    W_np = W.cpu().numpy()
    for idx in range(TN):
        beta = ridge_wls_local_beta(X_flat, y_flat, W_np[idx], lam=lam)
        t = idx // N
        i = idx % N
        beta_hat[t, i, :] = beta.flatten()
    return beta_hat
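
As a sanity check on the estimator, the sketch below solves (X' W X + lam I) beta = X' W y on synthetic data. It verifies two limiting cases: uniform weights with `lam=0` reproduce OLS, and a large `lam` shrinks the coefficients. The helper `ridge_wls` and the data are assumptions for illustration. It multiplies by the weights directly, which is algebraically the same as the square-root weighting used above.

```python
import numpy as np

def ridge_wls(X, y, w, lam):
    """Solve (X' W X + lam I) beta = X' W y for diagonal weights w."""
    Xw = X * w[:, None]
    return np.linalg.solve(X.T @ Xw + lam * np.eye(X.shape[1]), Xw.T @ y)

rng = np.random.default_rng(2)
X = rng.normal(size=(100, 3))
y = X @ np.array([1.0, -1.0, 2.0]) + 0.05 * rng.normal(size=100)

# Uniform weights and lam = 0 reduce to ordinary least squares
beta_wls = ridge_wls(X, y, np.ones(100), lam=0.0)
beta_ols, *_ = np.linalg.lstsq(X, y, rcond=None)

# A larger lam shrinks the coefficient vector toward zero
beta_ridge = ridge_wls(X, y, np.ones(100), lam=50.0)
print(np.linalg.norm(beta_wls), np.linalg.norm(beta_ridge))
```

Because each row of W in the notebook sums to 1, X' W X is a weighted average of outer products, so the effect of a fixed `lam` depends on the scale of the features.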


In [None]:

# 7. Pipeline: Data -> Graph -> GNN -> Local Betas
def run_pipeline(df_full):
    times_sorted = sorted(df_full[TIME_COL].unique())
    P = build_panel_arrays(
        df_full, TIME_COL, TARGET_COL, FEATURE_COLS, LAT_COL, LON_COL, times_sorted
    )
    X_all, y_all = P["X_all"], P["y_all"]
    coords_blocks, times, N = P["coords_blocks"], P["times"], P["N_per_year"]
    T = X_all.shape[0]

    # Standardize features (used as GNN input only; the local regressions below use the raw X_all)
    X_flat = X_all.reshape(T*N, -1)
    scaler = StandardScaler()
    X_flat = scaler.fit_transform(X_flat).astype(np.float32)
    y_flat = y_all.reshape(T*N, 1)

    # Graph
    edge_index_np = build_spatiotemporal_graph(coords_blocks, k_spatial=K_SPATIAL, use_temporal=USE_TEMPORAL)
    edge_index = torch.tensor(edge_index_np, dtype=torch.long)

    # GNN
    gnn = SimpleGCN(in_dim=X_flat.shape[1], hidden_dim=32, out_dim=16, use_gat=False).to(DEVICE)
    x_tensor = torch.tensor(X_flat, dtype=torch.float32, device=DEVICE)
    if HAS_PYG:
        H = gnn(x_tensor, edge_index.to(DEVICE))
    else:
        H = gnn(x_tensor)

    W = compute_weights_from_embeddings(H, tau=0.5)

    # Estimate local betas
    beta_hat = estimate_all_betas(
        X_all.astype(np.float32), y_all.astype(np.float32), W, lam=RIDGE_LAMBDA
    )
    y_hat = np.einsum("tnp,tnp->tn", beta_hat, X_all).reshape(T, N, 1)

    # Evaluation (in-sample: the same observations are used for fitting and scoring)
    y_true = y_all.reshape(T*N, 1)
    y_pred = y_hat.reshape(T*N, 1)
    rmse = float(np.sqrt(mean_squared_error(y_true, y_pred)))
    mae  = float(mean_absolute_error(y_true, y_pred))
    r2   = float(r2_score(y_true, y_pred))

    return {"beta_hat": beta_hat, "y_hat": y_hat, "metrics": {"rmse": rmse, "mae": mae, "r2": r2}, "times": times}


In [None]:

# 7.1 Example Call (adjust PATH_XLSX!)
# df_full = pd.read_excel(PATH_XLSX)
# result = run_pipeline(df_full)
# result["metrics"]



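## **8. Results, Discussion, & Interpretation**
- The local coefficients reveal spatio-temporal variation in the effects of the covariates.
- The weights W derived from the GNN embeddings allow an effective notion of proximity that goes beyond Euclidean distance. Note that `run_pipeline` as written applies the encoder without a training step, so W reflects a randomly initialized network unless training is added.
- Compare against classical GTWR: check the differences in the metrics and in the coefficient maps.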



## **9. Practical Notes & Validation**
1) Diagnostics: local multicollinearity, ridge stability, Moran's I of the residuals.  
2) Robustness: vary the KNN size, tau, and the architecture (GCN vs GAT).  
3) Potential bias: use cross-fitting if W is trained using y.
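
The Moran's I diagnostic from item 1 can be sketched as follows, using the standard form I = (n / S0) * (z' A z) / (z' z), where z holds the centred residuals, A is a spatial weight matrix, and S0 is the sum of its entries. The chain-shaped adjacency and the residual vectors below are hypothetical. In practice A could be built from the same KNN graph as in Section 6.2.

```python
import numpy as np

def morans_i(values, A):
    """Moran's I = (n / S0) * (z' A z) / (z' z), with z the centred values and S0 = sum of A."""
    z = values - values.mean()
    return (len(values) / A.sum()) * (z @ A @ z) / (z @ z)

# Hypothetical chain of 6 locations, each linked to its immediate neighbours
n = 6
A = np.zeros((n, n))
for i in range(n - 1):
    A[i, i + 1] = A[i + 1, i] = 1.0

clustered = np.array([1.0, 1.0, 1.0, -1.0, -1.0, -1.0])     # similar residuals side by side
alternating = np.array([1.0, -1.0, 1.0, -1.0, 1.0, -1.0])   # neighbours disagree
i_clustered = morans_i(clustered, A)
i_alternating = morans_i(alternating, A)
print(i_clustered, i_alternating)
```

A clearly positive value on the residuals suggests that the model has left spatial structure unexplained. Values near zero are what one would hope for after a well-specified local fit.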



## **10. References (Brief)**
- Hastie, T., & Tibshirani, R. (1993). Varying-Coefficient Models. JRSS-B.  
- Fan, J., & Zhang, W. (2008). Statistical Methods with Varying Coefficient Models. Statistics and Its Interface, 1(1), 179–195.  
- GNNWR / GTNNWR: overview of the packages/models that combine neural networks with weighting to handle spatio-temporal non-stationarity (accessed 2025-09-30).  
- SRGCNN: Zhu, D., Liu, Y., Yao, X., & Fischer, M.M. (2022). Spatial Regression Graph Convolutional Neural Networks (SRGCNNs). GeoInformatica.  
- GSTRGCN: Lang Xiong et al. (2024). Generalized Spatial–Temporal Regression Graph Convolutional Transformer for Traffic Forecasting. Complex & Intelligent Systems.
