This repository contains the code for the paper:

Doubly-Unlinked Regression for Dependent Data

Anik Burman¹, Sayantan Choudhury², Debangan Dey³

¹Department of Biostatistics, Johns Hopkins University

²Department of Statistics and Data Science, MBZUAI

³Department of Statistics, Texas A&M University
REPAIR (REgression with Permutation Alignment via varIational infeRence) is a variational Bayes method for spatial regression when both the covariate–outcome linkage and the response–location linkage within spatial blocks are unknown.
Consider
where
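REPAIR jointly recovers both within-block permutations, π_X (covariate–outcome) and π_S (response–location), together with the model parameters.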
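As a toy illustration of what "doubly-unlinked" means at the data level, the sketch below assumes the responses `y` keep their original order while covariates and locations are each shuffled by an independent, unknown permutation inside every block. The function `unlink_within_blocks` is hypothetical and is not the repository's `generate_synthetic_data`; it only mirrors keys such as `s_perm` and `region_assignments` that appear in the data dictionary used in the quick-start example.

```python
import numpy as np

def unlink_within_blocks(x, s, region_assignments, rng):
    """Shuffle covariates and locations independently inside every block (toy illustration).

    x: (n,) covariates; s: (n, 2) locations; region_assignments: (n,) integer block labels.
    The responses y are assumed to stay in their original order, so after this call
    neither x_perm[i] nor s_perm[i] is guaranteed to belong to y[i].
    Returns the shuffled arrays and the index maps, with x_perm == x[idx_x], s_perm == s[idx_s].
    """
    n = len(x)
    idx_x = np.arange(n)
    idx_s = np.arange(n)
    for b in np.unique(region_assignments):
        members = np.flatnonzero(region_assignments == b)
        # Two independent draws: the x-y link and the s-y link are lost separately,
        # but every value stays inside its own block.
        idx_x[members] = rng.permutation(members)
        idx_s[members] = rng.permutation(members)
    return x[idx_x], s[idx_s], idx_x, idx_s

# Usage: 4 blocks of 5 observations each
rng = np.random.default_rng(0)
n_blocks, n_per_block = 4, 5
region_assignments = np.repeat(np.arange(n_blocks), n_per_block)
x = rng.normal(size=n_blocks * n_per_block)
s = rng.uniform(size=(n_blocks * n_per_block, 2))
x_perm, s_perm, idx_x, idx_s = unlink_within_blocks(x, s, region_assignments, rng)
```

Because the shuffles never cross block boundaries, only the within-block order is unknown; the block membership of every observation is still observed.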
A self-contained walkthrough is provided in vignette.ipynb. It covers:
- Generating synthetic doubly-unlinked data for any rectangular $B = n_\text{rows} \times n_\text{cols}$ block grid
- Fitting the FullGP oracle baseline (true links known)
- Fitting the ArealGP baseline (block-averaged data)
- Running REPAIR and inspecting ELBO convergence
- Evaluating permutation recovery accuracy
- Comparing parameter estimates across all three methods
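The quick-start example notes that the grid shape is auto-detected from $B$ (B=6 → 2×3, B=12 → 3×4, B=9 → 3×3). One rule consistent with those examples is to take the largest divisor of $B$ not exceeding $\sqrt{B}$ as the number of rows. The sketch below assumes that rule; `grid_shape` is a hypothetical name, and `generate_synthetic_data` may choose the shape differently.

```python
import math

def grid_shape(B):
    """Split B blocks into an (n_rows, n_cols) grid that is as close to square as possible.

    Hypothetical rule: n_rows is the largest divisor of B not exceeding sqrt(B).
    """
    if B < 1:
        raise ValueError("B must be a positive integer")
    n_rows = max(r for r in range(1, math.isqrt(B) + 1) if B % r == 0)
    return n_rows, B // n_rows

print(grid_shape(6))   # (2, 3)
print(grid_shape(9))   # (3, 3)
print(grid_shape(12))  # (3, 4)
print(grid_shape(7))   # (1, 7): a prime B falls back to a single row
```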
```text
SpatialReg-Unlinked/
├── vignette.ipynb                     # ← start here
├── src/
│   ├── helper_functions.py            # clean API (generate data, train models, evaluate)
│   ├── revised_VIGP_Unlinked.py       # core REPAIR implementation
│   ├── revised_elbo.py                # variational distributions q(π_X), q(π_S)
│   ├── utils.py                       # Sinkhorn, Hungarian, GP kernel utilities
│   ├── GPModel.py                     # FullGP (oracle) likelihood model
│   ├── GPArealModel.py                # ArealGP (block-averaged) likelihood model
│   ├── analysis_varyB_annealed.py     # simulation study script (vary B, K, φ, seed)
│   ├── data_analysis_2.ipynb          # real-data analysis (dataset 1)
│   ├── data_analysis_3.ipynb          # real-data analysis (dataset 2)
│   └── generate_data_new.ipynb        # data generation notebook
├── Figures/                           # paper figures
├── data/                              # simulation inputs and results (not tracked)
└── notebooks/                         # exploratory notebooks
```
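`utils.py` is described as holding Sinkhorn and Hungarian utilities. The sketch below shows generic versions of those two steps as they are commonly used for permutation inference: Sinkhorn normalisation turns a score matrix into an approximately doubly-stochastic "soft" permutation, and the Hungarian algorithm (SciPy's `linear_sum_assignment`) rounds it to a hard permutation matrix. This is not the repository's code; the function names, the temperature `tau`, and the match-rate metric (fraction of rows assigned correctly) are assumptions, and the repository's `permutation_accuracy` may be defined differently.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment
from scipy.special import logsumexp

def sinkhorn(log_alpha, n_iters=50):
    """Alternately normalise rows and columns in log space.

    Columns sum exactly to 1 after the final step; rows sum approximately to 1.
    """
    for _ in range(n_iters):
        log_alpha = log_alpha - logsumexp(log_alpha, axis=1, keepdims=True)  # row step
        log_alpha = log_alpha - logsumexp(log_alpha, axis=0, keepdims=True)  # column step
    return np.exp(log_alpha)

def hungarian_round(soft_perm):
    """Return the hard permutation matrix with the largest total weight under soft_perm."""
    rows, cols = linear_sum_assignment(soft_perm, maximize=True)
    hard = np.zeros_like(soft_perm)
    hard[rows, cols] = 1.0
    return hard

def permutation_match_rate(est_perm, true_perm):
    """Fraction of rows whose estimated assignment equals the true one."""
    return float(np.mean(est_perm.argmax(axis=1) == true_perm.argmax(axis=1)))

# Usage: recover a random 5x5 permutation from noisy scores
rng = np.random.default_rng(1)
K = 5
true_perm = np.eye(K)[rng.permutation(K)]
scores = true_perm + 0.3 * rng.normal(size=(K, K))  # noisy scores favouring the truth
tau = 0.1                                            # lower tau -> sharper soft permutation
soft = sinkhorn(scores / tau)
est = hungarian_round(soft)
print(permutation_match_rate(est, true_perm))
```

Working in log space keeps the normalisation stable when `scores / tau` has large entries; the rounding step is what turns the continuous relaxation back into a valid one-to-one matching within a block.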
```python
import sys

# Run from the repository root so that src/ is importable
sys.path.insert(0, 'src')

from helper_functions import (
    generate_synthetic_data,
    train_oracle_gp,
    train_areal_gp,
    run_vigp_unlinked,
    permutation_accuracy,
    summarise_results,
)

# Generate doubly-unlinked data (any rectangular grid works)
# B=6 → 2×3, B=12 → 3×4, B=9 → 3×3 (auto-detected)
data = generate_synthetic_data(B=9, n_i=5, beta_true=1.5, phi_true=2.0, seed=42)

# Fit all three methods
oracle = train_oracle_gp(s=data["s"], x=data["x"], y=data["y"])
areal = train_areal_gp(s_perm=data["s_perm"],
                       region_assignments=data["region_assignments"],
                       x_perm_flat=data["x_perm_flat"], y=data["y"])
repair = run_vigp_unlinked(data=data, n_iter=50, seed=521)

# Permutation recovery
print(permutation_accuracy(repair["est_perm_x"], data["perm_matrix_x"]))

# Parameter comparison table (True / FullGP / ArealGP / REPAIR)
print(summarise_results(oracle, areal, repair, data))
```

The simulation script varies the number of blocks $B$, the number of observations per block $K$, the parameter $\phi$, and the random seed:
```bash
cd src
python analysis_varyB_annealed.py <B> <K> <seed> [phi]
# e.g.: python analysis_varyB_annealed.py 9 5 1 2.0
```

Results are saved to `data/results/vary_B2_annealed/B_{B}_n_{K}/phi_{phi}/results_seed_{seed}.pt`.
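A sweep over $B$, $K$, $\phi$ and seeds produces many result files. The hypothetical helper below (standard library only) rebuilds the path pattern given above and lists which seeds have finished for one setting. It assumes $\phi$ appears in folder names exactly as Python prints the float (e.g. `phi_2.0`); pass `root=` if your results tree lives elsewhere, for example relative to `src/`, where the script is launched. The `.pt` files are presumably written with `torch.save`, so `torch.load` should open one, but their contents are not documented here.

```python
from pathlib import Path

# Root folder taken from the results pattern in this README; adjust if your layout differs.
RESULTS_ROOT = Path("data/results/vary_B2_annealed")

def result_path(B, K, seed, phi, root=RESULTS_ROOT):
    """Path of one saved run. Assumes phi is rendered like Python's str(float), e.g. 'phi_2.0'."""
    return root / f"B_{B}_n_{K}" / f"phi_{phi}" / f"results_seed_{seed}.pt"

def existing_seeds(B, K, phi, root=RESULTS_ROOT):
    """Seeds that already have a result file for one (B, K, phi) setting; [] if none."""
    folder = root / f"B_{B}_n_{K}" / f"phi_{phi}"
    if not folder.is_dir():
        return []
    return sorted(
        int(p.stem.removeprefix("results_seed_"))  # 'results_seed_7' -> 7
        for p in folder.glob("results_seed_*.pt")
    )

print(result_path(9, 5, 1, 2.0).as_posix())
# data/results/vary_B2_annealed/B_9_n_5/phi_2.0/results_seed_1.pt
print(existing_seeds(9, 5, 2.0))
```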
Requirements:

```text
python >= 3.10
torch
numpy
scipy
pandas
matplotlib
tqdm
```
Install with:
```bash
pip install torch numpy scipy pandas matplotlib tqdm
```

If you use this code, please cite:

```bibtex
@article{burman2025doubly,
  title   = {Doubly-Unlinked Regression for Dependent Data},
  author  = {Burman, Anik and Choudhury, Sayantan and Dey, Debangan},
  journal = {},
  year    = {2025}
}
```