# Using Spatial Lag regression to estimate the impact of socio-economic factors on *Crime Rate* in Chicago


When evaluating our models, we use a standard statistical significance threshold of 5%.

In [1]:
import geopandas as gpd
import libpysal as ps
import spreg
import warnings
warnings.filterwarnings('ignore')
from constants import PREPROCESSED_DATA_PATH

gdf = gpd.read_file(PREPROCESSED_DATA_PATH)

The code below creates a “Queen” contiguity weight matrix. In spatial analysis, a weight matrix defines the “neighbors” of each observation in the dataset. Under the “Queen” criterion, two observations are neighbors if they share a common edge or a common vertex. The weight matrix is then row-standardized, so that the weights in each row sum to 1.
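
To make the neighbor definition concrete, here is a minimal sketch that applies the Queen criterion to a toy 3x3 grid of cells and then row-standardizes the weights. It uses only plain Python rather than `libpysal`; the grid, the cell numbering, and the helper `queen_neighbors` are illustrative assumptions and are not part of the Chicago dataset.

```python
# Toy 3x3 grid (hypothetical), cells numbered row by row:
#   0 1 2
#   3 4 5
#   6 7 8
NROWS, NCOLS = 3, 3


def queen_neighbors(cell):
    """Return the cells that share an edge or a corner with `cell`."""
    row, col = divmod(cell, NCOLS)
    found = []
    for d_row in (-1, 0, 1):
        for d_col in (-1, 0, 1):
            if (d_row, d_col) == (0, 0):
                continue  # a cell is not its own neighbor
            r, c = row + d_row, col + d_col
            if 0 <= r < NROWS and 0 <= c < NCOLS:
                found.append(r * NCOLS + c)
    return found


neighbors = {cell: queen_neighbors(cell) for cell in range(NROWS * NCOLS)}

# Row-standardization: every neighbor of a cell gets the same weight,
# and the weights of each cell sum to 1.
weights = {cell: [1.0 / len(nbrs)] * len(nbrs) for cell, nbrs in neighbors.items()}

print(neighbors[0])  # corner cell: three neighbors
print(neighbors[4])  # center cell: eight neighbors
print(weights[0])
```

A corner cell has three Queen neighbors while the center cell has eight, so without row-standardization the center cell's neighbors would carry more total weight.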

In [2]:
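# Build Queen contiguity weights from the polygon geometries
w = ps.weights.Queen.from_dataframe(gdf)
# Row-standardize the weights so that each row sums to 1
w.transform = 'r'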

The `spreg.ML_Lag` class estimates the Spatial Lag Model (SLM) by Maximum Likelihood (ML) and provides access to all results and diagnostics. SLM regression using the White Population Percent variable:

In [3]:
Y_name = 'CRIME_RATE'
X_name = ['POP_DENSITY', 'LIQUOR_STORES_DENSITY', 'POP_BELOW_125_POVERTY_PCT', 'WHITE_POP_PCT', 'ASIAN_POP_PCT']

Y = gdf[Y_name].to_numpy()
X = gdf[X_name].to_numpy()

slm = spreg.ML_Lag(Y, X, w=w, name_y=Y_name, name_x=X_name)

print(slm.summary)

REGRESSION RESULTS
------------------

SUMMARY OF OUTPUT: MAXIMUM LIKELIHOOD SPATIAL LAG (METHOD = FULL)
-----------------------------------------------------------------
Data set            :     unknown
Weights matrix      :     unknown
Dependent Variable  :  CRIME_RATE                Number of Observations:          77
Mean dependent var  :     61.8206                Number of Variables   :           7
S.D. dependent var  :     39.4260                Degrees of Freedom    :          70
Pseudo R-squared    :      0.8707
Spatial Pseudo R-squared:  0.8454
Log likelihood      :   -314.9700
Sigma-square ML     :    198.5522                Akaike info criterion :     643.940
S.E of regression   :     14.0909                Schwarz criterion     :     660.347

------------------------------------------------------------------------------------
            Variable     Coefficient       Std.Error     z-Statistic     Probability
---------------------------------------------------------------

SLM regression utilizing the Black Population Percent variable:

In [4]:
X_name = ['POP_DENSITY', 'LIQUOR_STORES_DENSITY', 'POP_BELOW_125_POVERTY_PCT', 'BLACK_POP_PCT', 'ASIAN_POP_PCT']

X = gdf[X_name].to_numpy()

slm = spreg.ML_Lag(Y, X, w=w, name_y=Y_name, name_x=X_name)

print(slm.summary)

REGRESSION RESULTS
------------------

SUMMARY OF OUTPUT: MAXIMUM LIKELIHOOD SPATIAL LAG (METHOD = FULL)
-----------------------------------------------------------------
Data set            :     unknown
Weights matrix      :     unknown
Dependent Variable  :  CRIME_RATE                Number of Observations:          77
Mean dependent var  :     61.8206                Number of Variables   :           7
S.D. dependent var  :     39.4260                Degrees of Freedom    :          70
Pseudo R-squared    :      0.8693
Spatial Pseudo R-squared:  0.8364
Log likelihood      :   -315.1734
Sigma-square ML     :    200.6803                Akaike info criterion :     644.347
S.E of regression   :     14.1662                Schwarz criterion     :     660.753

------------------------------------------------------------------------------------
            Variable     Coefficient       Std.Error     z-Statistic     Probability
---------------------------------------------------------------

The SLM yields a Pseudo R-squared of 0.8707 when the *WHITE_POP_PCT* variable is used, and 0.8693 when *BLACK_POP_PCT* is used. This suggests that the SLM fits the data about as well as the S2SLS model, and better than the OLS model.
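
The information criteria printed in the two summaries can be cross-checked by hand. The sketch below assumes the standard definitions, AIC = -2·logL + 2k and Schwarz = -2·logL + k·ln(n), and takes the log likelihoods, k = 7 and n = 77 from the outputs above; the function names are illustrative, not part of `spreg`.

```python
import math


def akaike(log_likelihood, k):
    """Akaike information criterion: -2 * logL + 2 * k."""
    return -2.0 * log_likelihood + 2.0 * k


def schwarz(log_likelihood, k, n):
    """Schwarz (Bayesian) criterion: -2 * logL + k * ln(n)."""
    return -2.0 * log_likelihood + k * math.log(n)


N_OBS = 77   # "Number of Observations" in the summaries
N_VARS = 7   # "Number of Variables" in the summaries

# Log likelihoods copied from the two regression summaries
log_likelihoods = {
    "WHITE_POP_PCT": -314.9700,
    "BLACK_POP_PCT": -315.1734,
}

criteria = {
    name: (akaike(ll, N_VARS), schwarz(ll, N_VARS, N_OBS))
    for name, ll in log_likelihoods.items()
}

for name, (aic, sc) in criteria.items():
    # Compare with "Akaike info criterion" and "Schwarz criterion" above
    print(f"{name}: AIC = {aic:.3f}, Schwarz = {sc:.3f}")
```

Under these assumptions the computed values agree with the reported ones (643.940 and 660.347 for *WHITE_POP_PCT*, 644.347 and 660.753 for *BLACK_POP_PCT*). Lower values indicate a better trade-off between fit and complexity, so the two specifications are nearly indistinguishable on these criteria.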

Among the predictors, *LIQUOR_STORES_DENSITY*, *POP_BELOW_125_POVERTY_PCT*, *WHITE_POP_PCT*, and *BLACK_POP_PCT* were statistically significant at the 5% level. *ASIAN_POP_PCT* was significant only in the regression using *WHITE_POP_PCT*. These results confirm the observations from the previously used models.

The variable *W_CRIME_RATE*, which represents the spatial lag of the dependent variable *CRIME_RATE*, is significant as well. Its positive coefficient indicates that the crime rate in a given location tends to be higher when the crime rates in neighboring locations are higher.
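
In the usual notation for a spatial lag model, $y = \rho W y + X\beta + \varepsilon$, the spatial lag is the term $Wy$. With row-standardized weights it is simply the average of the dependent variable over each observation's neighbors. The sketch below illustrates this on four hypothetical areas arranged in a line (A - B - C - D); the area names, crime rates, and the helper `spatial_lag` are made up for illustration and do not come from the Chicago data.

```python
# Hypothetical crime rates for four areas arranged in a line: A - B - C - D
crime_rate = {"A": 20.0, "B": 40.0, "C": 90.0, "D": 100.0}

# Contiguity: each area borders only the areas next to it in the line
neighbors = {
    "A": ["B"],
    "B": ["A", "C"],
    "C": ["B", "D"],
    "D": ["C"],
}


def spatial_lag(values, neighbors):
    """Row-standardized spatial lag: the mean of each area's neighbors' values."""
    return {
        area: sum(values[n] for n in nbrs) / len(nbrs)
        for area, nbrs in neighbors.items()
    }


w_crime_rate = spatial_lag(crime_rate, neighbors)

for area in crime_rate:
    print(f"{area}: CRIME_RATE = {crime_rate[area]:.1f}, W_CRIME_RATE = {w_crime_rate[area]:.1f}")
```

For area B, for example, the lag is the mean of A and C, (20 + 90) / 2 = 55. A positive coefficient on this term means that, other things being equal, a higher neighborhood average goes together with a higher predicted crime rate in the area itself.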