# GWLinearRegression example

A complete workflow illustrating geographically weighted linear regression using the [Guerry](https://geodacenter.github.io/data-and-lab//Guerry/) dataset.

In [None]:
import geopandas as gpd
import matplotlib.pyplot as plt
from geodatasets import get_path
from sklearn import metrics

from gwlearn.linear_model import GWLinearRegression

## Data

Load the Guerry dataset: 85 French départements with socio-moral statistics from the early 1800s.

In [None]:
gdf = gpd.read_file(get_path("geoda.guerry"))
gdf.plot().set_axis_off()

## Fitting the model

Predict the number of suicides (`Suicids`) from property crime, literacy, charitable donations, and lottery revenue.

The model uses an **adaptive** bandwidth of 25 nearest neighbours and a `tricube` kernel. A separate local linear model is fitted at each observation's representative point, with neighbouring observations down-weighted by distance. Setting `fixed=True` instead switches to a fixed-distance bandwidth, expressed in CRS units.

In [None]:
model = GWLinearRegression(bandwidth=25, fixed=False, kernel="tricube")
model.fit(
    gdf[["Crm_prp", "Litercy", "Donatns", "Lottery"]],
    gdf["Suicids"],
    geometry=gdf.representative_point(),
)
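
The adaptive tricube weighting described above can be sketched in plain Python. This is a generic textbook formulation, not necessarily what `gwlearn` does internally: the helper `tricube_weights`, the toy coordinates, and the choice to count the focal point among the `k` nearest are all assumptions made for illustration.

```python
import math


def tricube_weights(focal, points, k):
    """Adaptive tricube weights of `points` relative to `focal`.

    The bandwidth is the distance to the k-th nearest point, so that
    point and anything farther away receive zero weight.
    Assumes the k-th nearest distance is non-zero.
    """
    distances = [math.dist(focal, p) for p in points]
    bandwidth = sorted(distances)[k - 1]  # adaptive: depends on local density
    return [
        (1 - (d / bandwidth) ** 3) ** 3 if d < bandwidth else 0.0
        for d in distances
    ]


# Hypothetical coordinates on a line; the focal point is the first one.
points = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0), (4.0, 0.0)]
weights = tricube_weights(points[0], points, k=3)
```

With a fixed bandwidth, `bandwidth` would simply be a constant distance rather than a per-location quantity.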

## Focal predictions and model fit

Focal predictions are stored in `pred_`. Each observation is predicted by the local model fitted around it; no single global relationship is ever estimated.

In [None]:
model.pred_
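
To make the idea of a focal prediction concrete, here is a minimal sketch of one local fit with a single predictor, solved by weighted least squares. The function `local_fit`, the data, and the kernel weights are hypothetical; how `gwlearn` actually solves each local model (and whether the focal observation takes part in its own fit) is not covered here.

```python
def local_fit(x, y, w):
    """Weighted least-squares line y ~ a + b * x; returns (a, b)."""
    total = sum(w)
    x_bar = sum(wi * xi for wi, xi in zip(w, x)) / total
    y_bar = sum(wi * yi for wi, yi in zip(w, y)) / total
    sxy = sum(wi * (xi - x_bar) * (yi - y_bar) for wi, xi, yi in zip(w, x, y))
    sxx = sum(wi * (xi - x_bar) ** 2 for wi, xi in zip(w, x))
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope


# Hypothetical data following y = 1 + 2x, with made-up kernel weights
# for one focal location (the first observation).
x = [0.0, 1.0, 2.0, 3.0]
y = [1.0, 3.0, 5.0, 7.0]
w = [1.0, 0.67, 0.2, 0.0]

intercept, slope = local_fit(x, y, w)
focal_pred = intercept + slope * x[0]  # prediction at the focal location
```

Repeating this for every location, each with its own weights, yields one intercept, one slope, and one focal prediction per observation.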

A global $R^2$ computed from the focal predictions is comparable to the value reported by `mgwr`.

In [None]:
metrics.r2_score(gdf["Suicids"], model.pred_)
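
For reference, the score computed above is the usual coefficient of determination, $1 - SS_{res} / SS_{tot}$. A small sketch with made-up numbers (the helper `r2` and the values are illustrative only):

```python
def r2(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1 - ss_res / ss_tot


# Hypothetical observed values and focal predictions
observed = [1.0, 2.0, 3.0, 4.0]
predicted = [1.5, 2.0, 3.0, 3.5]
score = r2(observed, predicted)
```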

A local $R^2$ is also computed per location, reflecting how well each neighbourhood-level model fits.

In [None]:
model.local_r2_

Mapping local $R^2$ reveals where the model fits well and where it struggles spatially.

In [None]:
gdf.plot(model.local_r2_, legend=True).set_axis_off()

## Residuals

Focal residuals (observed minus focal prediction) are stored in `resid_`. Mapping them helps detect spatial autocorrelation in the errors, a sign that the model may be missing a spatial pattern.

In [None]:
gdf.plot(model.resid_, legend=True).set_axis_off()
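
Beyond visual inspection, spatial autocorrelation in residuals is commonly summarised with Moran's I. The sketch below implements the statistic with binary neighbour weights on toy data; the function `morans_i`, the neighbour structure, and the residual values are hypothetical and not part of `gwlearn`.

```python
def morans_i(values, neighbours):
    """Moran's I with binary weights.

    `neighbours[i]` lists the indices treated as neighbours of i.
    """
    n = len(values)
    mean = sum(values) / n
    z = [v - mean for v in values]
    s0 = sum(len(nb) for nb in neighbours)  # sum of all weights
    cross = sum(z[i] * z[j] for i, nb in enumerate(neighbours) for j in nb)
    return (n / s0) * cross / sum(zi ** 2 for zi in z)


# Hypothetical residuals at four locations arranged along a line,
# where each location neighbours the ones directly next to it.
chain = [[1], [0, 2], [1, 3], [2]]
clustered = morans_i([-1.5, -0.5, 0.5, 1.5], chain)    # similar values adjacent
alternating = morans_i([1.0, -1.0, 1.0, -1.0], chain)  # opposite values adjacent
```

Positive values indicate that similar residuals cluster in space, negative values that neighbouring residuals tend to differ. On real data, a dedicated library such as `esda` with weights built by `libpysal` would be the more usual choice, since it also provides inference.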

## Local coefficients

`local_coef_` is a `DataFrame` with one column per predictor. Each row holds the coefficients estimated by the local model at that location. Mapping these surfaces reveals **spatial non-stationarity**: where and how each predictor's relationship with suicides varies across France.

In [None]:
model.local_coef_

In [None]:
f, axs = plt.subplots(2, 2, figsize=(12, 10))

for column, ax in zip(model.local_coef_.columns, axs.flat, strict=False):
    gdf.plot(model.local_coef_[column], legend=True, ax=ax)
    ax.set_title(column)
    ax.set_axis_off()

The local intercept captures the baseline suicide level at each location after accounting for the predictors.

In [None]:
gdf.plot(model.local_intercept_, legend=True).set_axis_off()

## Bandwidth selection

The bandwidth controls how many neighbours each local model uses. A narrower bandwidth produces more spatially varying (but noisier) estimates; a wider one smooths toward a global OLS model. For data-driven bandwidth selection via cross-validation or AICc, see the [bandwidth search guide](./bandwidth_search.ipynb).

For details on predicting at new locations, see the [Prediction](./predict.ipynb) guide.