Skip to content
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension


Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
26 changes: 26 additions & 0 deletions .github/workflows/ci.yml
Original file line number Diff line number Diff line change
@@ -0,0 +1,26 @@
name: CI

on:
push:
branches: [ main ]
pull_request:

jobs:
lint-and-test:
runs-on: ubuntu-latest
steps:
- uses: actions/checkout@v3
- name: Set up Python
uses: actions/setup-python@v5
with:
python-version: '3.12'
- name: Install dependencies
run: |
pip install -r requirements.txt
- name: Lint with flake8
run: |
pip install flake8
flake8 src/ dashboard/
- name: Smoke-run Streamlit (health check)
run: |
streamlit run dashboard/app.py -- --headless --run-once || true
139 changes: 139 additions & 0 deletions README.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,139 @@

# Explainable GeoAI: Interpreting Socio-Spatial Patterns

An end-to-end spatial XAI pipeline combining XGBoost, SHAP, GeoShapley, and MGWR to uncover and visualize interpretable spatial effects in U.S. county-level voting data.

---

## 📂 Repository Structure

```

explainable-geoai/
├── data/
│ ├── raw/
│ │ ├── census/ # raw ACS downloads
│ │ ├── shapefiles/ # county geometries
│ │ └── voting\_2021.csv # raw vote share
│ └── processed/
│ ├── voting\_clean.csv # cleaned tabular data
│ ├── voting\_features.csv # with engineered features & spatial lags
│ ├── xgb\_automl\_model.pkl # trained FLAML+XGBoost model
│ ├── shap\_explanations.csv # SHAP outputs
│ ├── geoshapley\_explanations.csv # GeoShapley outputs
│ ├── mgwr\_coefficients.csv # MGWR baseline
│ ├── bootstrap\_shap\_stats.csv # SHAP uncertainty stats
│ └── fairness\_metrics.csv # spatial fairness gaps
├── src/
│ ├── data\_loader.py # load & clean
│ ├── feature\_engineering.py # spatial lags, GeoDataFrame
│ ├── model\_training.py # FLAML + XGBoost training
│ ├── shap\_explainer.py # Kernel SHAP wrapper
│ ├── geoshapley\_explainer.py # GeoShapley computations
│ ├── mgwr\_comparison.py # MGWR baseline scripts
│ ├── bootstrap\_uncertainty.py # bootstrap SHAP stats
│ ├── spatial\_fairness.py # compute residual‐fairness
│ └── config.py # paths & constants
├── dashboard/
│ └── app.py # Streamlit + Folium dashboard
├── docs/
│ ├── implementation\_notes.md # detailed pipeline doc
│ └── paper\_summary.pdf # summary of Li (2025) chapter
├── README.md # this file
└── requirements.txt # pip dependencies

```

## ⚙️ Installation

1. **Clone repo**
```bash
git clone https://github.com/yourusername/explainable-geoai.git
cd explainable-geoai
```

2. **Create & activate** a virtual environment

```bash
python3 -m venv venv
source venv/bin/activate # macOS/Linux
venv\Scripts\activate # Windows
```

3. **Install dependencies**

```bash
pip install -r requirements.txt
```

4. **Download raw data**

* Place `voting_2021.csv` in `data/raw/`
* Download ACS and shapefiles via `src/download_census.py` or manually.



## 🚀 Quick Start

1. **Data & features**

```bash
python src/data_loader.py
python src/feature_engineering.py
```

2. **Train model**

```bash
python src/model_training.py
```

3. **Generate explanations**

```bash
python src/shap_explainer.py
python src/geoshapley_explainer.py
python src/mgwr_comparison.py
python src/bootstrap_uncertainty.py
python src/spatial_fairness.py
```

4. **Launch dashboard**

```bash
cd dashboard
streamlit run app.py
```



## 📝 Scripts & Modules

* **`data_loader.py`**: cleans raw vote + ACS, saves `voting_clean.csv`.
* **`feature_engineering.py`**: builds spatial lags, exports `voting_features.csv`.
* **`model_training.py`**: uses FLAML to find best XGBoost; saves model.
* **`shap_explainer.py`**: Kernel SHAP over FLAML model → `shap_explanations.csv`.
* **`geoshapley_explainer.py`**: computes GeoShapley components → `geoshapley_explanations.csv`.
* **`mgwr_comparison.py`**: fits MGWR baseline → `mgwr_coefficients.csv`.
* **`bootstrap_uncertainty.py`**: bootstraps SHAP → `bootstrap_shap_stats.csv`.
* **`spatial_fairness.py`**: calculates fairness gaps → `fairness_metrics.csv`.
* **`dashboard/app.py`**: interactive Streamlit + Folium map.


## 📊 Dashboard Overview

* **SHAP**: county-level attributions, with uncertainty.
* **GeoShapley**: decomposed intrinsic (GEO), main, and interaction effects.
* **MGWR/OLS**: local regression coefficients for comparison.
* **Fairness**: residual differences across demographic groups.
* **Download** any CSV for offline analysis.



## 🧾 Citing

If you use this work, please cite:

> Li, Ziqi (2025). *Explainable AI in Spatial Analysis*. In:
> *Advances in Spatial Data Science*, Springer.

Loading
Loading