---
format: 
  html:
    toc: true
    page-layout: full
execute:
    warning: false
    echo: true
    eval: true
---

## **Poisson Regression Model**

***


To analyze the relationship between urban risk factors and the count of assaults, we implemented a **Poisson regression model**. Below is the step-by-step process undertaken to prepare the data and fit the model:

1.	**Spatial Join and Data Aggregation**:
Using the fishnet grid as the spatial framework, we performed a spatial join between the fishnet and the risk factor dataset (`variable_net`). Each point representing a risk factor (graffiti, street light outages, liquor retail stores, or ShotSpotter incidents) was assigned to the grid cell it intersects. The joined points were then aggregated by fishnet polygon by counting the points of each risk-factor type in every cell.

2.	**Renaming and Cleaning**:
After aggregation, the resulting columns were renamed for clarity (`Graffiti_count`, `StreetLightsOut_count`, `LiquorRetail_count`, and `ShotSpotter_count`). The index column `uniqueID_left` was renamed to `uniqueID` for consistency with the fishnet. Missing values were replaced with zeros so that cells containing no risk-factor points are recorded as zero counts.

3.	**Merging Datasets**:
The aggregated risk factors dataset was merged with the assault dataset (`Assault21_net`) based on shared geometry. This combined dataset enabled a comprehensive view of all relevant predictors alongside the target variable (`countAssault`).

4.	**Feature Selection**:
To prepare the dataset for modeling, the geometry column and the target variable (`countAssault`) were excluded from the features. Only numeric columns were retained so that the feature matrix is compatible with the regression model. Note that this rule also keeps numeric identifier columns (the `uniqueID` columns and `cvID`) among the predictors.

5.	**Target Variable Cleaning**:
The target variable (`countAssault`) was converted to a numeric type, with missing values replaced by zeros. This ensured the data was clean and suitable for the Poisson regression analysis.

6.	**Model Fitting**:
A **Generalized Linear Model** (GLM) with a Poisson family was used to model the count of assaults (`countAssault`) as a function of the selected features. The model was fitted with the `statsmodels` library, and the summary output was used to assess the significance and size of each coefficient.
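
The counting and renaming in steps 1 and 2 can be sketched with plain pandas. This is a minimal illustration on made-up data, not the project's pipeline: the `joined` table and its `Legend` labels are hypothetical stand-ins for the output of the spatial join, and `rename_map` is an assumption about how labels map to column names. Renaming by label rather than by position avoids relying on the column order that `unstack` happens to produce.

```python
import pandas as pd

# Hypothetical stand-in for the spatial-join output: one row per (cell, point).
joined = pd.DataFrame({
    "uniqueID": [0, 0, 0, 1, 1, 3],
    "Legend": ["Graffiti", "Graffiti", "Liquor_Retail",
               "Street_Lights_Out", "Graffiti", "ShotSpotter"],
})

# Assumed mapping from Legend label to output column name.
rename_map = {
    "Graffiti": "Graffiti_count",
    "Street_Lights_Out": "StreetLightsOut_count",
    "Liquor_Retail": "LiquorRetail_count",
    "ShotSpotter": "ShotSpotter_count",
}

# Count points per cell and label, then pivot the labels into columns.
counts = (
    joined.groupby(["uniqueID", "Legend"])
    .size()
    .unstack(fill_value=0)
    .reindex(columns=list(rename_map), fill_value=0)  # explicit column order
    .rename(columns=rename_map)
)

# Cells with no points at all (here cell 2) are added back with zero counts.
all_cells = pd.Index(range(4), name="uniqueID")
counts = counts.reindex(index=all_cells, fill_value=0).reset_index()
counts.columns.name = None

print(counts)
```

The explicit `reindex` on the columns also guarantees that a risk factor with no points anywhere still appears as a column of zeros.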

***


### Poisson Model Summary

The Poisson regression model estimates the association between each predictor and the count of assaults (`countAssault`) across the spatial grid. The results are interpreted below:

1.	**Model Fit and Diagnostics**:
	- Number of Observations: The model was fitted using **1,098 grid cells**.
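	- Log-Likelihood: The log-likelihood is **-9330.8**. On its own this value is hard to interpret; when comparing models fitted to the same data, a higher (less negative) value indicates a better fit.
	- Deviance and Pearson Chi-Square: The deviance **(14,511)** and Pearson chi-square **(19,500)** measure the discrepancy between observed and fitted counts. Under a well-specified Poisson model both would be close to the residual degrees of freedom; here they are far larger than the number of observations (1,098), which points to overdispersion.
	- Pseudo R-Squared (Cox & Snell): The value of **0.9994** is close to its maximum. This statistic is based on the likelihood ratio against a null model and saturates when that ratio is large, so it should not be read as the share of variance explained.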

2.	**Predictor Significance**:
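All predictors in the model have statistically significant coefficients (**p < 0.001**), suggesting they are associated with the count of assaults. Because the Pearson chi-square is large relative to the sample size, the Poisson standard errors are likely too small, so these p-values should be treated as optimistic.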

3.	**Coefficients**:
Each coefficient is the change in the log of the expected assault count for a one-unit increase in the predictor, holding the other variables constant; exponentiating it gives the multiplicative change in the expected count. A positive coefficient is associated with a higher expected count, and a negative coefficient with a lower one.
- *UniqueID_x and UniqueID_y*: These are grid-cell identifiers that were carried into the feature set, not risk factors. Their very small coefficients (**0.0007**) have no substantive interpretation.
- *cvID* (Cross-Validation ID): The coefficient (**0.0154**) is positive, but `cvID` is a fold label rather than a measured characteristic of a cell. Any association likely reflects how folds were assigned across space and should be read with caution.
- *Local Moran’s I*: The coefficient (**0.0609**) indicates that areas with higher spatial clustering of similar values have a slight increase in assault counts.
- *Cluster*: The positive coefficient (**0.1116**) shows that being part of a spatial cluster is associated with a higher assault count.

4.	**Risk Factors**:
	- Graffiti Count: The coefficient (**0.0010**) suggests a minimal but positive association: each additional graffiti incident corresponds to roughly a 0.1% higher expected assault count.
	- Street Lights Out Count: The coefficient (**0.1229**) is the largest among the four risk factors: each additional non-functional streetlight corresponds to roughly a 13% higher expected assault count (exp(0.1229) ≈ 1.13).
	- Liquor Retail Count: The coefficient (**0.0015**) indicates a small but statistically significant association: each additional liquor retail store corresponds to roughly a 0.15% higher expected assault count.
	- ShotSpotter Count: With a coefficient of **0.0333**, each additional incident detected by ShotSpotter corresponds to roughly a 3.4% higher expected assault count.

5.	**Implications**:
	- The positive associations between `StreetLightsOut_count`, `ShotSpotter_count`, and assault counts suggest that urban infrastructure conditions and real-time crime detection are informative about where assaults concentrate. These are associations and do not by themselves establish causation.
	- The association with `LiquorRetail_count` highlights potential links between alcohol availability and assaults, consistent with existing criminological studies.
	- Graffiti, while a minor contributor, might indicate broader socio-environmental conditions affecting assaults.
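
Because the model uses a log link, the coefficients are easier to read after exponentiation. The short sketch below converts the four risk-factor coefficients quoted in this section into incidence rate ratios (IRRs) and percentage changes. The helper name `percent_change` is introduced here for illustration only.

```python
import math

# Risk-factor coefficients as quoted in the model summary.
coefs = {
    "Graffiti_count": 0.0010,
    "StreetLightsOut_count": 0.1229,
    "LiquorRetail_count": 0.0015,
    "ShotSpotter_count": 0.0333,
}


def percent_change(beta: float, delta: float = 1.0) -> float:
    """Percent change in the expected count for a `delta`-unit increase."""
    return (math.exp(beta * delta) - 1.0) * 100.0


for name, beta in coefs.items():
    irr = math.exp(beta)  # incidence rate ratio for a one-unit increase
    print(f"{name:24s} IRR = {irr:.4f}  ({percent_change(beta):+.2f}% per unit)")
```

Effects on the count scale compound multiplicatively, so `percent_change(0.1229, delta=5)` is larger than five times the one-unit change.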

\


In [None]:
#| code-fold: true

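import geopandas as gpd
import numpy as np
import pandas as pd
import statsmodels.api as sm

# Reproject the fishnet to EPSG:3435 (NAD83 / Illinois East, US feet)
fishnet = fishnet.to_crs(epsg=3435)

# Drop the index column left over from an earlier join so sjoin can create its own
variable_net_index = variable_net.drop(columns=['index_right'])
variable_net_poly = gpd.sjoin(fishnet, variable_net_index, how='left', predicate='intersects')

# Count joined points per grid cell and risk-factor type (one column per Legend value)
legend_counts = variable_net_poly.groupby(['uniqueID_left', 'Legend']).size().unstack(fill_value=0)

# Positional rename: unstack() orders the columns by sorted Legend value,
# so this list must match that order
legend_counts.columns = ['Graffiti_count', 'StreetLightsOut_count', 'LiquorRetail_count', 'ShotSpotter_count']
legend_counts = legend_counts.reset_index().rename(columns={'uniqueID_left': 'uniqueID'})

# Attach the counts to the fishnet; cells with no joined points get 0
variable_net_agg = fishnet.merge(legend_counts, on='uniqueID', how='left').fillna(0)

# Join the predictors to the assault counts by matching cell geometry.
# Columns present in both frames receive _x/_y suffixes (e.g. uniqueID_x, uniqueID_y).
combined_net = Assault21_net.merge(variable_net_agg, on='geometry', how='left').fillna(0)

# Clean the target: coerce to numeric and treat missing counts as 0
combined_net['countAssault'] = pd.to_numeric(combined_net['countAssault'], errors='coerce').fillna(0)

# Predictors: every numeric column except the target. This keeps identifier
# columns such as uniqueID_x, uniqueID_y and cvID in the model.
features = combined_net.drop(columns=['geometry', 'countAssault']).select_dtypes(include=[np.number])

# Fit a Poisson GLM (log link). sm.GLM does not add an intercept automatically;
# wrap the predictors in sm.add_constant(...) if one is wanted.
poisson_model = sm.GLM(combined_net['countAssault'], features, family=sm.families.Poisson())
poisson_results = poisson_model.fit()

print(poisson_results.summary())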

![](../images/model.jpeg){width=65%}
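
A quick check for overdispersion divides the Pearson chi-square by the residual degrees of freedom; values near 1 are consistent with the Poisson assumption that the variance equals the mean. The sketch below applies this to the figures quoted in the summary. The parameter count of 9 is an assumption based on the predictors listed in this section, so the exact ratio may differ slightly from what the fitted model reports.

```python
# Figures quoted in the model summary.
n_obs = 1098
pearson_chi2 = 19500.0
deviance = 14511.0

# Assumption: 9 estimated parameters, counted from the predictors named in the text.
n_params = 9
df_resid = n_obs - n_params


def dispersion(stat: float, df: int) -> float:
    """Ratio of a goodness-of-fit statistic to its residual degrees of freedom."""
    return stat / df


pearson_ratio = dispersion(pearson_chi2, df_resid)
deviance_ratio = dispersion(deviance, df_resid)

print(f"Pearson chi2 / df_resid = {pearson_ratio:.1f}")
print(f"Deviance / df_resid     = {deviance_ratio:.1f}")
```

Both ratios come out well above 1. With the fitted result object, the same quantity can be read directly as `poisson_results.pearson_chi2 / poisson_results.df_resid`.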

We then visualized the regression coefficients and their **95% confidence intervals** to better understand the relationships between the predictors and assault counts. The coefficients give the magnitude and direction of the association for each variable on the log scale. The confidence intervals indicate the precision of each estimate: intervals constructed in this way would contain the true coefficient in about 95% of repeated samples.

A bar plot was used to compare the coefficients, with error bars showing the confidence intervals. Variables whose error bars do not cross zero are statistically significant at the 5% level. Statistical significance alone does not indicate a large effect, and coefficient sizes are only directly comparable between predictors measured in the same units.

This visualization provides a quick overview of the predictors' estimated effects on assault counts. Variables whose bars and error bars lie entirely above the zero line (y = 0), such as `StreetLightsOut_count` and `ShotSpotter_count`, show a positive association that is distinguishable from zero, while error bars that cross zero would indicate an effect that cannot be distinguished from zero.


In [None]:
#| code-fold: true

import matplotlib.pyplot as plt
import pandas as pd

# Point estimates and 95% confidence bounds from the fitted model
coefficients = poisson_results.params
conf_intervals = poisson_results.conf_int()
conf_intervals.columns = ['Lower 95%', 'Upper 95%']

coef_df = pd.DataFrame({
    'Coefficient': coefficients,
    'Lower CI': conf_intervals['Lower 95%'],
    'Upper CI': conf_intervals['Upper 95%']
})

# Error-bar lengths below and above each coefficient
lower_err = coef_df['Coefficient'] - coef_df['Lower CI']
upper_err = coef_df['Upper CI'] - coef_df['Coefficient']

plt.figure(figsize=(10, 6))
ax = coef_df['Coefficient'].plot(kind='bar', yerr=(lower_err, upper_err),
                                 capsize=5, color='#d0c7e1', edgecolor='#777181')

# Labels, grid, and tick formatting
ax.grid(axis='y', linestyle='-', alpha=0.1)
ax.set_title('Regression Coefficients with Confidence Intervals')
ax.set_ylabel('Coefficient Value')
ax.set_xlabel('Variables')
plt.xticks(rotation=0, ha='center', fontsize=8)
plt.yticks(fontsize=8)

# Minimal frame: white background, no top/right spines, grey axes
ax.set_facecolor('white')
ax.spines['top'].set_visible(False)
ax.spines['right'].set_visible(False)
ax.spines['left'].set_color('grey')
ax.spines['bottom'].set_color('grey')

plt.tight_layout()
plt.show()

![](../images/bar1.jpeg){width=75%}
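
The error bars above follow the usual Wald construction: the coefficient plus or minus a normal quantile times its standard error. The sketch below reproduces that calculation with the standard library. The standard errors are hypothetical values chosen for illustration, since this section does not quote them; only the coefficient for `StreetLightsOut_count` comes from the summary, and the helper names are introduced here.

```python
from statistics import NormalDist
from typing import Tuple


def wald_interval(coef: float, std_err: float, level: float = 0.95) -> Tuple[float, float]:
    """Two-sided Wald confidence interval for a coefficient."""
    z = NormalDist().inv_cdf(0.5 + level / 2.0)  # about 1.96 for level=0.95
    return coef - z * std_err, coef + z * std_err


def excludes_zero(interval: Tuple[float, float]) -> bool:
    """True when the interval lies entirely above or entirely below zero."""
    lower, upper = interval
    return lower > 0.0 or upper < 0.0


# Coefficient from the summary; both standard errors are hypothetical.
precise = wald_interval(0.1229, std_err=0.005)
imprecise = wald_interval(0.1229, std_err=0.080)

print(precise, excludes_zero(precise))
print(imprecise, excludes_zero(imprecise))
```

The same coefficient is judged significant or not depending only on its standard error, which is why the width of the error bars matters as much as the height of the bars.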