## How gwlearn Differs from MGWR

`gwlearn` is a modern, flexible alternative to traditional geographically weighted regression libraries like [`mgwr`](https://github.com/pysal/mgwr). While both enable spatial regression analysis, they have fundamentally different design philosophies.

### Setup: Load Common Data

Let's start by loading a common dataset that can be used with both gwlearn and mgwr.

In [1]:
import geopandas as gpd
import numpy as np
import pandas as pd
import time
from geodatasets import get_path
from sklearn import metrics

# Load dataset
gdf = gpd.read_file(get_path("geoda.guerry"))
X = gdf[['Crm_prp', 'Litercy', 'Donatns', 'Lottery']]
y = gdf['Suicids']
geom = gdf.representative_point()
coords = np.column_stack([geom.x, geom.y])  # (n, 2) coordinate array, needed by mgwr


### Basic Fit: Results and Timing


In [3]:
# NOTE: run the same cell twice to get an accurate result:
# The first gwlearn run is slower due to setup; later runs are faster.

# gwlearn
from gwlearn.linear_model import GWLinearRegression

start = time.time()
gw = GWLinearRegression(geometry=geom, bandwidth=25, fixed=False)
gw.fit(X, y)
gw_time = time.time() - start
print(f"gwlearn R²: {metrics.r2_score(y, gw.pred_):.4f}  AICc: {gw.aicc_:.1f}  Time: {gw_time:.2f}s")

# mgwr
from mgwr.gwr import GWR

start = time.time()
mg = GWR(coords, y.values.reshape(-1,1), X.values, bw=25, fixed=False).fit()
mg_time = time.time() - start
print(f"mgwr R²: {metrics.r2_score(y, mg.predy):.4f}  AICc: {mg.aicc:.1f}  Time: {mg_time:.2f}s")

print(f"\n  mgwr is {gw_time/mg_time:.1f}x faster for basic linear GWR")

gwlearn R²: 0.7251  AICc: 2007.4  Time: 0.54s
mgwr R²: 0.7251  AICc: 2011.2  Time: 0.09s

  mgwr is 6.2x faster for basic linear GWR
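Because the first gwlearn run includes one-off setup cost, a best-of-N timer can give a fairer comparison than a single `time.time()` measurement. The helper below is a minimal sketch using only the standard library; `best_of` is a hypothetical name, and the lambda is a stand-in workload rather than a model fit.

```python
import time


def best_of(fn, repeats=3):
    """Call fn() `repeats` times and return the shortest wall-clock duration in seconds."""
    durations = []
    for _ in range(repeats):
        start = time.perf_counter()  # monotonic, high-resolution clock
        fn()
        durations.append(time.perf_counter() - start)
    return min(durations)


# Stand-in workload; in the notebook this would wrap a call such as gw.fit(X, y).
elapsed = best_of(lambda: sum(i * i for i in range(100_000)))
print(f"best of 3: {elapsed:.4f}s")
```

Taking the minimum rather than the mean discards the slow first run, which is the effect the note in the cell above works around by running the cell twice.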


### Why the Timing Difference?

mgwr estimates all local coefficients in a single vectorized computation using optimized linear algebra routines, so most of the work runs in compiled code.

gwlearn instead fits a separate local model for each location using a scikit-learn estimator, which adds Python-level overhead for every location.
The timing gap therefore reflects how the fitting work is split between Python and compiled routines.
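The "one local model per location" idea can be sketched with plain NumPy on synthetic data. This is an illustration, not either library's implementation: the helper names `adaptive_bisquare_weights` and `local_fit` are hypothetical, and the neighbour-counting convention used for the adaptive bandwidth is an assumption.

```python
import numpy as np


def adaptive_bisquare_weights(coords, i, k):
    """Bisquare kernel weights around focal point i, with the bandwidth set to
    the distance of its k-th nearest neighbour (the focal point counts as the first)."""
    d = np.linalg.norm(coords - coords[i], axis=1)
    h = np.partition(d, k - 1)[k - 1]
    w = (1.0 - (d / h) ** 2) ** 2
    w[d >= h] = 0.0  # observations at or beyond the bandwidth get zero weight
    return w


def local_fit(X, y, w):
    """Weighted least squares with an intercept, solved by scaling rows with sqrt(w)."""
    Xd = np.column_stack([np.ones(len(X)), X])
    sw = np.sqrt(w)
    beta, *_ = np.linalg.lstsq(Xd * sw[:, None], y * sw, rcond=None)
    return beta


# Synthetic data (assumed for illustration only)
rng = np.random.default_rng(0)
n = 60
coords = rng.uniform(0, 10, size=(n, 2))
X = rng.normal(size=(n, 2))
y = 1.0 + 2.0 * X[:, 0] - 0.5 * X[:, 1] + rng.normal(scale=0.1, size=n)

# One separate regression per location: this Python-level loop is the overhead
betas = np.array(
    [local_fit(X, y, adaptive_bisquare_weights(coords, i, 25)) for i in range(n)]
)
print(betas.shape)  # one row of (intercept, b1, b2) per location
```

A vectorized implementation computes the same weighted solutions but batches the linear algebra, which is why it can be faster for plain linear models, while the loop form is what makes it possible to swap in an arbitrary estimator.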

### Differences in API Formats

**MGWR**: Custom API with coordinate arrays

In [4]:
from mgwr.gwr import GWR

# mgwr requires: numpy arrays, coords as separate array, y reshaped
mgwr_model = GWR(
    coords=coords,                    # Coordinate array (n, 2)
    y=y.values.reshape(-1, 1),        # Must reshape to (n, 1)
    X=X.values,                       # Feature array
    bw=25,
    fixed=False,
    kernel='bisquare'
)
mgwr_results = mgwr_model.fit()

print("MGWR API:")
print("- Input: numpy arrays + coordinate matrix")
print("- Output: Custom results object")
print(f"- R²: {metrics.r2_score(y, mgwr_results.predy):.4f}")

MGWR API:
- Input: numpy arrays + coordinate matrix
- Output: Custom results object
- R²: 0.7251


  
**gwlearn**: sklearn-compatible API with GeoSeries

In [5]:
from gwlearn.linear_model import GWLinearRegression

# gwlearn uses: pandas DataFrames, GeoSeries for geometry
gwlearn_model = GWLinearRegression(
    geometry=geom,                    # GeoSeries (integrated)
    bandwidth=25,
    fixed=False,
    kernel='bisquare'
)
gwlearn_model.fit(X, y)               # Standard sklearn .fit(X, y)

print("gwlearn API:")
print("- Input: pandas DataFrame + GeoSeries")
print("- Output: Fitted estimator with attributes")
print(f"- R²: {metrics.r2_score(y, gwlearn_model.pred_):.4f}")

gwlearn API:
- Input: pandas DataFrame + GeoSeries
- Output: Fitted estimator with attributes
- R²: 0.7251


### Difference in Model Fitting: Same Results, Different Approach

In [6]:
print("gwlearn vs MGWR output comparison\n")

print(f"R²: gwlearn={metrics.r2_score(y, gwlearn_model.pred_):.4f}, mgwr={metrics.r2_score(y, mgwr_results.predy):.4f}")
print(f"AIC: gwlearn={gwlearn_model.aic_:.2f}, mgwr={mgwr_results.aic:.2f}")
print(f"BIC: gwlearn={gwlearn_model.bic_:.2f}, mgwr={mgwr_results.bic:.2f}")
print(f"AICc: gwlearn={gwlearn_model.aicc_:.2f}, mgwr={mgwr_results.aicc:.2f}")

gwlearn vs MGWR output comparison

R²: gwlearn=0.7251, mgwr=0.7251
AIC: gwlearn=1960.75, mgwr=1960.75
BIC: gwlearn=2045.63, mgwr=2045.63
AICc: gwlearn=2007.43, mgwr=2011.20


Note: `gwlearn` and `mgwr` produce the same predictions and fitted values, so R², AIC, and BIC agree, but AICc differs slightly (2007.43 vs 2011.20 here). This is because each library counts the effective number of model parameters differently when applying the small-sample correction. The gap reflects how model complexity is penalized, not a difference in model fit.
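To make the parameter-counting point concrete, the sketch below compares two common ways of writing the AICc correction on top of the same AIC. Which convention each library actually uses is not confirmed here, so treat the mapping to gwlearn and mgwr as an assumption. The function names are hypothetical, and `n` and `k` are assumed values chosen for illustration, not read from either model.

```python
def aicc_params_only(aic, n, k):
    """AIC plus the small-sample correction computed from k effective parameters."""
    return aic + 2 * k * (k + 1) / (n - k - 1)


def aicc_params_plus_variance(aic, n, k):
    """Same correction, but counting the error variance as one extra parameter."""
    p = k + 1
    return aic + 2 * p * (p + 1) / (n - p - 1)


aic = 1960.75  # AIC printed above (identical for both libraries)
n = 85         # ASSUMPTION: number of observations in the dataset
k = 33.75      # ASSUMPTION: effective number of parameters, for illustration

variant_a = aicc_params_only(aic, n, k)
variant_b = aicc_params_plus_variance(aic, n, k)
print(f"params only:            {variant_a:.2f}")
print(f"params + error variance: {variant_b:.2f}")
```

With these assumed inputs the two variants land close to the two AICc values printed above. That is consistent with the explanation in the note, but it does not prove which formula either library uses.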

### Bandwidth Search Comparison


In [7]:
#mgwr
from mgwr.sel_bw import Sel_BW

start = time.time()
mgwr_selector = Sel_BW(coords, y.values.reshape(-1, 1), X.values, fixed=False)
mgwr_bw = mgwr_selector.search(criterion='AICc')
mgwr_bw_time = time.time() - start

print(f"mgwr optimal bandwidth: {mgwr_bw}")
print(f"Time: {mgwr_bw_time:.2f}s")

mgwr optimal bandwidth: 70.0
Time: 0.62s


To find the optimal bandwidth, gwlearn provides a `BandwidthSearch` class, which fits models across a range of bandwidths and selects the one with the best score on the chosen criterion.

In [9]:
from gwlearn.search import BandwidthSearch

start = time.time()
gwlearn_search = BandwidthSearch(
    GWLinearRegression,
    geometry=geom,
    fixed=False,
    criterion='aicc',
    min_bandwidth=10,
    max_bandwidth=80,
)
gwlearn_search.fit(X, y)
gwlearn_bw_time = time.time() - start

print(f"gwlearn optimal bandwidth: {gwlearn_search.optimal_bandwidth_}")
print(f"Time: {gwlearn_bw_time:.2f}s")
print(f"\nSpeedup: mgwr is {gwlearn_bw_time/mgwr_bw_time:.1f}x faster for bandwidth search")

gwlearn optimal bandwidth: 69
Time: 3.97s

Speedup: mgwr is 6.4x faster for bandwidth search
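A bandwidth search does not need to fit every candidate. Golden-section search is one common strategy for a score that has a single minimum, and the sketch below shows the idea in pure Python. Whether gwlearn's default search behaves exactly like this is an assumption. `golden_section_min` and `toy_score` are hypothetical names, and `toy_score` stands in for fitting a model at a given bandwidth and reading its AICc.

```python
import math


def golden_section_min(score, lo, hi, tol=1.0):
    """Minimise a unimodal score(bandwidth) over [lo, hi] on integer bandwidths.

    Returns the best bandwidth evaluated and the number of distinct evaluations."""
    inv_phi = (math.sqrt(5) - 1) / 2
    a, b = float(lo), float(hi)
    c = b - inv_phi * (b - a)
    d = a + inv_phi * (b - a)
    cache = {}

    def f(x):
        bw = round(x)  # bandwidths are neighbour counts, so evaluate integers
        if bw not in cache:
            cache[bw] = score(bw)  # in practice: fit a model and return its AICc
        return cache[bw]

    while b - a > tol:
        if f(c) < f(d):
            b, d = d, c  # minimum lies in [a, d]; reuse c as the new upper probe
            c = b - inv_phi * (b - a)
        else:
            a, c = c, d  # minimum lies in [c, b]; reuse d as the new lower probe
            d = a + inv_phi * (b - a)
    return min(cache, key=cache.get), len(cache)


def toy_score(bw):
    """Toy convex score with its minimum at 69 (made-up numbers)."""
    return (bw - 69) ** 2 + 1970.0


best_bw, n_evals = golden_section_min(toy_score, 10, 80)
print(f"best bandwidth: {best_bw}, evaluations: {n_evals}")
```

Each evaluation here is cheap, but with real models every call to `score` is a full model fit, so the number of evaluations largely determines the search time.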


gwlearn's `BandwidthSearch` also supports interval search to track multiple metrics across bandwidths:

In [10]:
interval_search = BandwidthSearch(
    GWLinearRegression,
    geometry=geom,
    fixed=False,
    search_method='interval',
    criterion='aicc',
    metrics=['aic', 'bic'],
    min_bandwidth=15,
    max_bandwidth=60,
    interval=5,
)
interval_search.fit(X, y)

print("Scores at each bandwidth:")
print(interval_search.scores_)
print(f"\nOptimal bandwidth: {interval_search.optimal_bandwidth_}")

Scores at each bandwidth:
15    2112.050494
20    2032.033644
25    2007.433836
30    1991.763417
35    1983.540290
40    1978.314198
45    1974.192606
50    1971.369979
55    1970.657519
60    1969.622931
Name: aicc, dtype: float64

Optimal bandwidth: 60


### Model Support Comparison

**MGWR**: Generalized linear models only (Gaussian, Poisson, and binomial families)  
**gwlearn**: Any scikit-learn estimator

In [11]:
from gwlearn.base import BaseRegressor
from sklearn.linear_model import Ridge

gw_ridge = BaseRegressor(
    model=Ridge,
    geometry=geom,
    bandwidth=25,
    fixed=False,
    alpha=1.0 # Regularization parameter passed to Ridge
)
gw_ridge.fit(X, y)

print(f"GW-Ridge R²: {metrics.r2_score(y, gw_ridge.pred_):.4f}")

GW-Ridge R²: 0.1235
