# GWLogisticRegression example

A complete workflow illustrating geographically weighted logistic regression using the [Guerry](https://geodacenter.github.io/data-and-lab//Guerry/) dataset.

In [None]:
import geopandas as gpd
import matplotlib.pyplot as plt
from geodatasets import get_path
from sklearn import metrics

from gwlearn.linear_model import GWLogisticRegression

## Data

Load the Guerry dataset: 85 French départements with socio-moral statistics from the early 1800s.

In [None]:
gdf = gpd.read_file(get_path("geoda.guerry"))
gdf.plot().set_axis_off()

## Binary target

Because of the way `gwlearn` extends scikit-learn, with a separate model fitted in each local neighbourhood, a non-binary target would lead to inconsistent local models, since not all classes may be present in every local neighbourhood. `gwlearn` therefore supports only **binary** logistic regression, akin to the binomial model in `mgwr`.

Split départements into those above and those at or below the median suicide count.

In [None]:
y = gdf["Suicids"] > gdf["Suicids"].median()
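
The strict `>` comparison decides where ties with the median go. The following toy sketch uses made-up counts (not values from Guerry) to show that an observation equal to the median lands in the negative class; with an odd number of observations the median is itself one of the observed values.

```python
from statistics import median

# Made-up counts for illustration only; these are not Guerry values.
counts = [12, 40, 7, 25, 31]

threshold = median(counts)  # the middle value of the sorted list

# Strict ">" puts the observation equal to the median in the False class.
labels = [c > threshold for c in counts]
n_positive = sum(labels)
n_negative = len(labels) - n_positive
```

As a result, the two classes are close to balanced globally, but the balance within any one local neighbourhood can differ a lot.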

## Fitting the model

The API mirrors the linear counterpart and follows scikit-learn conventions. Predict the binary outcome from property crime, literacy, charitable donations, and lottery revenue.

In [None]:
model = GWLogisticRegression(bandwidth=25, fixed=False, max_iter=500)
model.fit(
    gdf[["Crm_prp", "Litercy", "Donatns", "Lottery"]],
    y,
    geometry=gdf.representative_point(),
)
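
To give a feel for what the bandwidth controls, here is a minimal sketch of adaptive kernel weighting. It assumes that `fixed=False` means an adaptive bandwidth, so that `bandwidth=25` counts nearest neighbours rather than a distance, and it assumes a bisquare kernel. The function name `adaptive_bisquare_weights` and the coordinates are hypothetical, and this is not `gwlearn`'s internal implementation.

```python
import math


def adaptive_bisquare_weights(focal, points, k):
    """Bisquare weights with the bandwidth set by the k-th nearest point."""
    distances = [math.dist(focal, p) for p in points]
    # Adaptive bandwidth: the distance to the k-th nearest point.
    bandwidth = sorted(distances)[k - 1]
    if bandwidth == 0:
        # Degenerate case: the k nearest points coincide with the focal point.
        return [1.0 if d == 0 else 0.0 for d in distances]
    # Weight decays from 1 at the focal point to 0 at the bandwidth and beyond.
    return [
        (1 - (d / bandwidth) ** 2) ** 2 if d < bandwidth else 0.0
        for d in distances
    ]


# Hypothetical coordinates on a line; the first point is the focal location.
points = [(0, 0), (1, 0), (2, 0), (3, 0), (10, 0)]
weights = adaptive_bisquare_weights(points[0], points, k=4)
```

Under these assumptions the focal point gets weight 1, nearer neighbours get larger weights than farther ones, and everything at or beyond the k-th neighbour gets weight 0.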

## Focal predictions

Focal predicted probabilities are stored in `proba_`: the probability that the focal observation belongs to the positive class.

In [None]:
model.proba_

Binary class predictions (the class with the higher probability) are in `pred_`.

In [None]:
model.pred_

Locations with missing values (`NaN`) in `proba_` and `pred_` were not fitted because of extreme class imbalance in their local context (fewer than 20% of local `y` values belong to the minority class). Rather than failing, `gwlearn` skips those locations and records them as not predicted.

`prediction_rate_` gives the proportion of focal locations that have a fitted local model.

In [None]:
model.prediction_rate_
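
The skip rule and the resulting prediction rate can be sketched in plain Python. This is an illustration of the 20% rule described above, not `gwlearn`'s code: the helper names `minority_share` and `should_fit`, the `min_share` parameter, and the toy neighbourhoods are all hypothetical.

```python
def minority_share(local_y):
    """Share of the less frequent class among local binary labels."""
    positives = sum(1 for value in local_y if value)
    negatives = len(local_y) - positives
    return min(positives, negatives) / len(local_y)


def should_fit(local_y, min_share=0.2):
    """Fit a local model only if the minority class reaches min_share."""
    return minority_share(local_y) >= min_share


# Hypothetical label sets for three local neighbourhoods.
neighbourhoods = [
    [True, True, True, True, False],  # minority share 0.2: fitted
    [True] * 9 + [False],             # minority share 0.1: skipped
    [True, False, True, False],       # minority share 0.5: fitted
]

fitted = [should_fit(local_y) for local_y in neighbourhoods]
prediction_rate = sum(fitted) / len(fitted)
```

Here two of the three toy neighbourhoods pass the check, so the prediction rate is two thirds.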

## Focal metrics

Focal accuracy is computed only over locations that have a prediction.

In [None]:
# True where a local model was fitted, i.e. the prediction is not missing.
fitted_mask = model.pred_.notna()
metrics.accuracy_score(y[fitted_mask], model.pred_[fitted_mask])
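
Because skipped locations are excluded, a focal accuracy is easiest to interpret next to the share of locations it covers. A small pure-Python sketch with made-up labels, where `None` stands in for a missing prediction:

```python
# Made-up labels and predictions; None marks a location that was skipped.
y_true = [True, False, True, True, False]
pred = [True, None, False, True, None]

# Keep only the locations that received a prediction.
pairs = [(t, p) for t, p in zip(y_true, pred) if p is not None]

accuracy = sum(t == p for t, p in pairs) / len(pairs)
coverage = len(pairs) / len(pred)
```

In this toy case the accuracy of two thirds describes only the 60% of locations that were predicted.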

## Pooled metrics

You can also extract all observations from all local models and measure performance on that pooled set. `y_pooled_` and `pred_pooled_` aggregate the weighted training observations across every local model.

In [None]:
metrics.accuracy_score(model.y_pooled_, model.pred_pooled_)
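
Conceptually, pooling concatenates the observations seen by each local model, so the same observation can appear several times, once per neighbourhood it belongs to. The sketch below assumes a hypothetical list-of-dicts layout for the local models and uses made-up labels; it does not reflect how `gwlearn` stores them.

```python
# Hypothetical local models, each with its neighbourhood labels and predictions.
local_models = [
    {"y": [True, True, False], "pred": [True, True, True]},
    {"y": [False, False, True], "pred": [False, True, True]},
]

# Pool by concatenating across all local models.
y_pooled = [value for m in local_models for value in m["y"]]
pred_pooled = [value for m in local_models for value in m["pred"]]

pooled_accuracy = sum(
    t == p for t, p in zip(y_pooled, pred_pooled)
) / len(y_pooled)
```

The pooled accuracy here is computed over six pooled rows even if the underlying dataset has fewer distinct observations.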

Pooled metrics tend to be lower than focal metrics. The pooled data still include observations far from each focal point, which carry little weight under distance decay and are therefore fit less well by that local model than the focal observation itself.

You can also compute a metric separately for each local model using `local_metric`.

In [None]:
local_accuracy = model.local_metric(metrics.accuracy_score)
local_accuracy

In [None]:
gdf.plot(
    local_accuracy, legend=True, missing_kwds=dict(color="lightgray")
).set_axis_off()

## Local coefficients

`local_coef_` contains the locally estimated **log-odds** coefficients, with one column per predictor and one row per location. Skipped locations appear as `NaN`. Mapping these surfaces shows where each predictor's association with the outcome strengthens, weakens, or reverses direction across space.

In [None]:
model.local_coef_
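
Log-odds coefficients are often easier to read as odds ratios: exponentiating a coefficient gives the multiplicative change in the odds of the positive class for a one-unit increase in the predictor, holding the others fixed. A short sketch with hypothetical coefficient values (the locations `A`, `B`, `C` and the numbers are made up, not model output):

```python
import math

# Hypothetical local log-odds coefficients for one predictor at three locations.
local_log_odds = {"A": 0.0, "B": math.log(2), "C": -0.5}

# exp(coefficient) is the multiplicative change in odds per one-unit increase.
odds_ratios = {loc: math.exp(coef) for loc, coef in local_log_odds.items()}
```

A coefficient of 0 gives an odds ratio of 1 (no association), a positive coefficient gives a ratio above 1, and a negative coefficient gives a ratio below 1, which is what a sign reversal across space corresponds to.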

In [None]:
f, axs = plt.subplots(2, 2, figsize=(12, 10))

for column, ax in zip(model.local_coef_.columns, axs.flat, strict=False):
    gdf.plot(
        model.local_coef_[column],
        legend=True,
        ax=ax,
        missing_kwds=dict(color="lightgray"),
    )
    ax.set_title(column)
    ax.set_axis_off()

For more on dealing with class imbalance, see the [imbalance guide](./imbalance.ipynb). For details on predicting at new locations, see the [Prediction](./predict.ipynb) guide.