# Empirical Final Project

### SOCI-20559 Spatial Regression Analysis

#### Empirical Assignment - 2

#### Polina Rozhkova

#### 5/5/2023

The objective of this research is to assess the spatial relationship between access to opioid treatment programs and opioid treatment medication clinics and opioid overdose fatalities across Chicago’s 77 Community Areas. Related literature measures treatment accessibility through distance as well as the number of licensed providers, the number of treatment facilities, and the number of treatment facilities accepting Medicaid (Ogneva-Himmelberger 2019). For this project I will use the number of licensed providers and the number of treatment or dispensing facilities in a community area. Additional factors that vary by environment and may affect the rate of fatal overdoses in a given area include demographic data (race, percent of population living in poverty, median income, unemployment rate, percent of population that is uninsured) and access to other healthcare providers such as pharmacies and hospitals. The dependent variable will be the overdose mortality rate per 100 residents, which standardizes across community areas that vary significantly in population size.

Related studies employ geographically weighted regression when the dependent variable is a mortality rate. Other studies use logistic or Poisson regression models when the dependent variable is binary, covering all drug overdoses (fatal and non-fatal) with fatalities coded as 1. I will start by estimating a simple OLS regression and running tests for spatial effects before moving to a regression with spatial dependence. Finally, I hope to add geographically weighted regression (GWR) if it makes sense.

*Background*

Intervention and public discourse around opioid use disorder (and related substance use disorders) have led to the creation of the “opioid epidemic”, a public health emergency devastating every state in the nation and predominantly affecting white, lower-income Americans living in rural areas (Griffith et al. 2018, 843-844). We know, however, that heroin addiction and related substance use disorders are far from a new phenomenon. More recently, it has become evident (or more widely accepted) that prescription opioid overuse and abuse have affected and continue to devastate individuals across racial and socioeconomic groups. Treatment options and care have been primarily targeted at white communities and have ignored the detrimental effects on Black people, thereby reinforcing racial inequities.
Different groups require different and nuanced forms of treatment and support to live with addiction. Substance use disorder, like most mental disorders, can be the result of trauma, environment, negative or unlucky circumstances, and social pressures. Because addiction is so stratified, what works for one individual might not work for another living in a different environment with fewer resources. One category of individuals with opioid use disorder may have greater access to prescription opiates, and another may be more likely to access “street drug” alternatives. A third relevant group is composed of individuals in active treatment for opioid use disorder who relapse with decreased tolerance, which often results in death.


In [1]:
import numpy as np
import pandas as pd
import os
os.environ['USE_PYGEOS'] = '0'
import geopandas as gpd
import libpysal
import esda
import spreg

In [2]:
spreg.__version__

'1.3.2'

In [3]:
pd.set_option('display.max_columns', None)

In [4]:
path = r'/Users/polinarozhkova/Desktop/GitHub/moud_access/data_final'

In [5]:
area_shp = gpd.read_file(os.path.join(path, 'Boundaries - Community Areas (current)',
                        'geo_export_122237a7-de0c-463d-b81f-e6d53bf2e92a.shp'))
od_df = pd.read_csv(os.path.join(path, 'chicago_overdose.csv'))
dem_df = pd.read_csv(os.path.join(path, 'chicago_demograph.csv'))

### Load and Explore Data

**od_df**: Opioid-related mortality records are available through the Cook County Medical Examiner’s data portal and include individual characteristics as well as the community area and coordinates of the deceased individual’s residence. I compiled buprenorphine provider location data from the Substance Abuse and Mental Health Services Administration and pharmacy location data from the Chicago city data portal.

**dem_df**: The Heartland Alliance gathers demographic data from the American Community Survey on community health and economics, including race, population estimates for each community area, median household income, the percentage of individuals living in poverty, the unemployment rate, and the percentage of the population that is uninsured. While all these variables are related and will likely present multicollinearity, they capture slightly different characteristics that might be at play in different communities.

The community areas most impacted by opioid-related overdose deaths are Austin, East Garfield Park, West Garfield Park, Humboldt Park, and North Lawndale. Although deaths increased steadily between 2019 and 2021, there seem to be no major changes in which areas are most heavily impacted: community areas with the highest rates of opioid-related mortality in 2019 remain high through 2020 and 2021. For the project, I plan to use the 2021 opioid mortality data and demographic data from 2020. The findings for these years might not be generalizable but could be compared to cross sections from past years or future data.

In [6]:
area_shp.head()

Unnamed: 0,POLY_ID,area,area_num_1,area_numbe,comarea,comarea_id,community,perimeter,shape_area,shape_len,geometry
0,1.0,0.0,35,35,0.0,0.0,DOUGLAS,0.0,46004620.0,31027.05451,"POLYGON ((-87.60914 41.84469, -87.60915 41.844..."
1,2.0,0.0,36,36,0.0,0.0,OAKLAND,0.0,16913960.0,19565.506153,"POLYGON ((-87.59215 41.81693, -87.59231 41.816..."
2,3.0,0.0,37,37,0.0,0.0,FULLER PARK,0.0,19916700.0,25339.08975,"POLYGON ((-87.62880 41.80189, -87.62879 41.801..."
3,4.0,0.0,38,38,0.0,0.0,GRAND BOULEVARD,0.0,48492500.0,28196.837157,"POLYGON ((-87.60671 41.81681, -87.60670 41.816..."
4,5.0,0.0,39,39,0.0,0.0,KENWOOD,0.0,29071740.0,23325.167906,"POLYGON ((-87.59215 41.81693, -87.59215 41.816..."


Fatal overdoses, buprenorphine providers, and pharmacy locations were joined to the community area shapefile and aggregated.
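
The join and aggregation are not shown in this notebook. The following is a minimal sketch of the counting half only, assuming each point record already carries a community-area label (for example, after a spatial join). The toy records and the column names `community` and `year` are hypothetical, not the author's data.

```python
import pandas as pd

# Hypothetical point-level records: one row per fatal overdose,
# already labeled with the community area it falls in.
deaths = pd.DataFrame({
    "community": ["DOUGLAS", "DOUGLAS", "OAKLAND", "DOUGLAS", "OAKLAND"],
    "year": [2019, 2019, 2019, 2020, 2021],
})

# Count records per community area and year, then spread the years into
# columns named like the od_2019 / od_2020 / od_2021 columns used here.
counts = (
    deaths.groupby(["community", "year"])
    .size()
    .unstack(fill_value=0)   # areas with no deaths in a year get 0, not NaN
    .add_prefix("od_")
    .reset_index()
)
counts.columns.name = None
print(counts)
```

Filling missing area-year combinations with 0 matters here because several community areas record no overdoses in a given year.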

In [7]:
od_df.head()

Unnamed: 0,area_num_1,community,shape_area,shape_len,geometry,od_2019,od_2020,od_2021,bupren_area,pharmacy_area
0,35,DOUGLAS,46004620.0,31027.05451,POLYGON ((-87.60914087617894 41.84469250265398...,5.0,12.0,7.0,6.0,2.0
1,36,OAKLAND,16913960.0,19565.506153,POLYGON ((-87.59215283879394 41.81692934626684...,2.0,3.0,3.0,1.0,2.0
2,37,FULLER PARK,19916700.0,25339.08975,POLYGON ((-87.62879823733725 41.80189303368919...,5.0,6.0,8.0,1.0,1.0
3,38,GRAND BOULEVARD,48492500.0,28196.837157,"POLYGON ((-87.6067081256125 41.81681377057218,...",13.0,18.0,22.0,7.0,4.0
4,39,KENWOOD,29071740.0,23325.167906,POLYGON ((-87.59215283879394 41.81692934626684...,0.0,3.0,11.0,2.0,2.0


In [8]:
od_df.describe()

Unnamed: 0,area_num_1,shape_area,shape_len,od_2019,od_2020,od_2021,bupren_area,pharmacy_area
count,77.0,77.0,77.0,77.0,77.0,77.0,77.0,77.0
mean,39.0,83614530.0,44397.606964,10.909091,15.844156,17.597403,7.181818,5.441558
std,22.371857,54946260.0,20090.463816,13.945554,20.069003,22.436612,17.614832,5.608953
min,1.0,16913960.0,18137.944253,0.0,0.0,0.0,0.0,0.0
25%,20.0,49769640.0,31948.59884,3.0,4.0,4.0,0.0,2.0
50%,39.0,79635750.0,43229.372704,6.0,9.0,8.0,2.0,3.0
75%,58.0,98853170.0,49478.427771,14.0,18.0,26.0,5.0,7.0
max,77.0,371835600.0,173625.98466,88.0,106.0,138.0,131.0,25.0


In [9]:
dem_df.head()

Unnamed: 0,community,% in Poverty,% in Extreme Poverty,Child Poverty Rate,% Female,% Male,% Asian,% Black,% Hispanic,% White,% Aged 0-4,% Aged 5-17,% Aged 18-24,% Aged 25-64,% Aged 65+,Total Pop.,Change in Pop. (2018 to 2019),Overall Unemp.,Unemp. 16-19 yr. olds,Unemp. 20-24 yr. olds,% no HS Diploma,% HS or GED,Some College,% Associates Deg.,% Bachelors Deg.,Forclosure,Homeownership Rate,Rent Burden,SNAP Enrollment Rate,% Cash Assist.,Uninsured Rate,Avg. Med. Household Income (2020 Dollars)
0,CHICAGO OVERALL,17.3,8.0,-0.4,51.4,48.6,6.8,28.8,28.6,33.3,6.1,14.3,9.8,57.1,12.7,2699347.0,0.0,8.1,33.4,16.4,14.1,22.0,17.2,5.7,41.1,1898.0,45.3,47.4,17.5,3.1,10.7,62097.0
1,ALBANY PARK,13.4,5.0,-0.7,49.4,50.6,12.6,5.1,45.9,33.7,5.8,16.4,9.8,58.5,9.5,49454.5,0.0,6.6,27.0,16.6,19.2,23.5,11.5,5.6,40.2,17.0,41.1,41.1,16.0,2.2,19.4,74054.1
2,ARCHER HEIGHTS,11.0,2.1,-0.6,45.4,54.6,4.5,0.8,79.9,14.7,6.7,20.3,9.3,50.4,13.3,13650.2,0.0,8.5,32.4,22.7,29.4,41.7,14.2,4.2,10.5,7.0,64.5,55.4,18.1,1.9,13.1,52218.0
3,ARMOUR SQUARE,28.0,8.2,-1.4,54.0,46.0,70.8,8.6,5.7,14.1,5.2,12.6,7.6,49.8,24.8,13352.6,0.0,6.9,40.1,48.0,33.4,25.0,9.1,3.8,28.6,-,38.9,45.3,27.1,4.1,9.0,50823.7
4,ASHBURN,13.5,5.4,-0.6,51.3,48.7,0.8,45.1,42.7,10.0,6.1,19.7,10.2,51.8,12.2,43074.3,0.0,10.5,39.8,24.4,17.8,30.7,21.8,9.5,20.2,65.0,84.7,61.5,13.2,4.0,11.8,63747.5


In [10]:
dem_df.iloc[1:].describe() 

Unnamed: 0,% in Poverty,% in Extreme Poverty,Child Poverty Rate,% Female,% Male,% Black,% Hispanic,% White,% Aged 0-4,% Aged 5-17,% Aged 18-24,% Aged 25-64,% Aged 65+,Total Pop.,Change in Pop. (2018 to 2019),Overall Unemp.,Unemp. 16-19 yr. olds,Unemp. 20-24 yr. olds,% no HS Diploma,% HS or GED,Some College,% Associates Deg.,% Bachelors Deg.,Homeownership Rate,Rent Burden,SNAP Enrollment Rate,% Cash Assist.,Uninsured Rate,Avg. Med. Household Income (2020 Dollars)
count,77.0,77.0,77.0,77.0,77.0,77.0,77.0,77.0,77.0,77.0,77.0,77.0,77.0,77.0,77.0,77.0,77.0,77.0,77.0,77.0,77.0,77.0,77.0,77.0,77.0,77.0,77.0,77.0,77.0
mean,19.07013,8.907792,-0.406494,52.272727,47.727273,36.907792,26.449351,27.890909,6.238961,15.236364,9.550649,54.898701,14.07013,35060.361039,0.001299,10.536364,33.006494,19.274026,15.242857,24.890909,19.471429,6.155844,34.246753,48.47013,49.896104,21.287013,3.427273,10.607792,60061.219481
std,10.982032,6.205023,2.910523,3.330388,3.330388,38.575052,27.459035,26.347292,1.711813,4.695572,3.251705,6.716936,4.776782,23114.726472,0.025616,6.674084,21.18343,12.692511,9.146576,10.138828,7.052936,1.92494,22.080526,19.223882,11.36022,14.523849,1.806686,4.823238,24285.11598
min,3.3,1.7,-6.1,42.4,38.7,0.4,0.0,0.8,0.9,2.6,2.1,40.5,5.9,2158.2,-0.1,0.7,0.0,0.0,1.8,3.3,6.8,1.8,6.7,8.8,24.5,1.5,0.6,1.5,25700.0
25%,10.9,4.3,-2.2,50.1,45.4,3.0,5.4,4.4,5.4,11.6,7.7,50.5,10.6,18337.8,0.0,5.1,18.5,9.0,8.1,18.9,14.4,4.9,16.3,35.2,41.6,9.2,2.1,8.2,42576.2
50%,15.7,6.7,-0.3,51.7,48.3,13.3,13.2,14.7,6.3,16.0,9.2,52.8,13.2,29489.9,0.0,8.5,29.4,15.3,13.6,25.6,19.3,6.1,28.6,45.3,50.6,17.3,3.3,10.4,52596.6
75%,25.5,12.1,0.5,54.6,49.9,82.6,45.9,48.8,7.2,18.4,10.9,58.5,16.6,46747.8,0.0,16.1,48.5,27.8,20.8,32.1,25.1,7.6,43.6,64.5,58.7,33.0,4.6,12.5,74054.1
max,47.6,34.8,12.3,61.3,57.6,96.5,91.0,82.7,11.4,24.0,24.6,75.9,28.5,101392.0,0.1,29.7,100.0,51.6,41.0,42.2,42.7,10.2,85.3,91.7,72.5,61.6,9.1,24.0,128699.8


In [11]:
dem_df2 = dem_df[['community', 'Total Pop.', '% in Poverty',
                        '% in Extreme Poverty', '% Female',
                        '% Black', '% Hispanic', '% White', '% Aged 0-4',
                        '% Aged 5-17', '% Aged 18-24',
                        '% Aged 25-64', '% Aged 65+',
                        'Overall Unemp.', 'Unemp. 20-24 yr. olds',
                        '% no HS Diploma', '% HS or GED',
                        '% Bachelors Deg.', 'Homeownership Rate',
                        'SNAP Enrollment Rate', '% Cash Assist.',
                        'Uninsured Rate',
                        'Avg. Med. Household Income (2020 Dollars)']]

In [12]:
od_df2 = od_df.merge(dem_df2, how='inner', indicator=True) #drop Chicago overall

In [13]:
od_df2.shape

(77, 33)
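
The inner merge keeps only communities present in both tables, which is how the `CHICAGO OVERALL` row drops out. The sketch below reproduces that behavior on toy rows (values copied from the tables above where available). With `how='inner'` every surviving row is flagged `both`, so an outer merge is the more informative way to use `indicator=True` when checking for unmatched names.

```python
import pandas as pd

# Two toy tables keyed on 'community'; the right one has a citywide row.
left = pd.DataFrame({"community": ["DOUGLAS", "OAKLAND"], "od_2021": [7.0, 3.0]})
right = pd.DataFrame({
    "community": ["CHICAGO OVERALL", "DOUGLAS", "OAKLAND"],
    "Total Pop.": [2699347.0, 21450.0, 6973.0],
})

# Inner merge: the citywide row has no match on the left and is dropped.
inner = left.merge(right, on="community", how="inner")

# Outer merge with an indicator: lists rows that failed to match.
outer = left.merge(right, on="community", how="outer", indicator=True)
unmatched = outer.loc[outer["_merge"] != "both", "community"].tolist()
print(inner.shape, unmatched)
```

An empty `unmatched` list apart from the citywide row would suggest the community names are spelled consistently across the two sources.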

In [14]:
od_df2.columns

Index(['area_num_1', 'community', 'shape_area', 'shape_len', 'geometry',
       'od_2019', 'od_2020', 'od_2021', 'bupren_area', 'pharmacy_area',
       'Total Pop.', '% in Poverty', '% in Extreme Poverty', '% Female',
       '% Black', '% Hispanic', '% White', '% Aged 0-4', '% Aged 5-17',
       '% Aged 18-24', '% Aged 25-64', '% Aged 65+', 'Overall Unemp.',
       'Unemp. 20-24 yr. olds', '% no HS Diploma', '% HS or GED',
       '% Bachelors Deg.', 'Homeownership Rate', 'SNAP Enrollment Rate',
       '% Cash Assist.', 'Uninsured Rate',
       'Avg. Med. Household Income (2020 Dollars)', '_merge'],
      dtype='object')

Next I standardize by population the dependent variable (the number of fatal overdoses in a given area) and the key independent variables (the counts of buprenorphine providers and pharmacies in a given community area).

Overdose rate per 100 (minimum pop. is 2158)

In [15]:
def spat_intensive(df, *cols):
    # Convert raw counts to rates per 100 residents (modifies df in place).
    for col in cols:
        df[col] = (df[col] / df['Total Pop.']) * 100
    return df

od_df2 = spat_intensive(od_df2, 'od_2019', 'od_2020', 'od_2021', 'bupren_area', 'pharmacy_area')
od_df2

Unnamed: 0,area_num_1,community,shape_area,shape_len,geometry,od_2019,od_2020,od_2021,bupren_area,pharmacy_area,Total Pop.,% in Poverty,% in Extreme Poverty,% Female,% Black,% Hispanic,% White,% Aged 0-4,% Aged 5-17,% Aged 18-24,% Aged 25-64,% Aged 65+,Overall Unemp.,Unemp. 20-24 yr. olds,% no HS Diploma,% HS or GED,% Bachelors Deg.,Homeownership Rate,SNAP Enrollment Rate,% Cash Assist.,Uninsured Rate,Avg. Med. Household Income (2020 Dollars),_merge
0,35,DOUGLAS,4.600462e+07,31027.054510,POLYGON ((-87.60914087617894 41.84469250265398...,0.023310,0.055944,0.032634,0.027972,0.009324,21450.0,34.9,21.0,54.1,65.1,5.4,10.6,5.3,10.7,17.0,50.7,16.3,13.8,20.4,11.1,16.3,43.1,17.6,28.0,3.4,9.3,39107.7,both
1,36,OAKLAND,1.691396e+07,19565.506153,POLYGON ((-87.59215283879394 41.81692934626684...,0.028682,0.043023,0.043023,0.014341,0.028682,6973.0,28.6,10.6,56.5,86.7,5.8,4.4,9.6,21.5,7.3,51.0,10.6,19.9,37.6,8.1,21.8,37.1,24.4,34.0,4.0,6.8,41528.1,both
2,37,FULLER PARK,1.991670e+07,25339.089750,POLYGON ((-87.62879823733725 41.80189303368919...,0.226091,0.271309,0.361745,0.045218,0.045218,2211.5,47.2,14.2,51.5,86.6,6.6,5.0,3.9,14.6,12.6,40.5,28.5,24.6,27.2,23.4,37.2,10.6,21.6,57.4,4.4,8.4,35868.0,both
3,38,GRAND BOULEVARD,4.849250e+07,28196.837157,"POLYGON ((-87.6067081256125 41.81681377057218,...",0.054924,0.076049,0.092949,0.029575,0.016900,23669.0,24.8,10.3,59.2,89.6,3.3,4.1,6.7,15.0,8.2,55.7,14.4,13.7,24.8,12.1,20.7,35.9,27.4,35.5,3.1,7.7,32917.2,both
4,39,KENWOOD,2.907174e+07,23325.167906,POLYGON ((-87.59215283879394 41.81692934626684...,0.000000,0.016693,0.061206,0.011128,0.011128,17972.0,23.4,13.7,54.4,66.0,2.2,20.9,7.3,11.6,6.8,55.5,18.7,9.2,4.3,5.2,15.2,58.8,35.1,19.3,1.4,8.7,49367.4,both
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
72,74,MOUNT GREENWOOD,7.558429e+07,48665.130539,POLYGON ((-87.69645961375822 41.70714491233857...,0.015924,0.010616,0.010616,0.000000,0.015924,18840.0,4.6,3.4,52.0,3.2,10.5,81.1,7.2,17.5,9.2,52.5,13.6,4.4,7.9,4.7,22.0,40.1,82.7,2.4,0.7,3.2,79915.6,both
73,75,MORGAN PARK,9.187734e+07,46396.419362,POLYGON ((-87.64215204651398 41.68508211967084...,0.023661,0.028393,0.028393,0.009464,0.037858,21131.7,11.6,6.9,54.1,60.4,5.0,30.9,5.5,16.8,7.2,52.4,18.1,10.7,30.0,6.4,22.6,38.8,70.1,13.2,1.8,7.6,67781.7,both
74,76,OHARE,3.718356e+08,173625.984660,MULTIPOLYGON (((-87.83658087874365 41.98639611...,0.020768,0.034613,0.006923,0.000000,0.006923,14445.6,10.9,4.0,46.1,6.3,10.6,66.0,7.9,9.3,4.3,62.7,15.7,3.9,15.3,9.1,20.7,41.1,38.7,5.8,0.6,18.3,72927.1,both
75,77,EDGEWATER,4.844999e+07,31004.830946,POLYGON ((-87.65455590025104 41.99816614970252...,0.020457,0.015343,0.020457,0.005114,0.022162,58658.8,14.8,6.3,48.6,13.3,15.9,53.5,4.3,7.0,11.2,62.7,14.9,5.5,6.7,9.2,13.7,57.5,35.5,10.5,2.5,10.4,56418.1,both
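
As a check on the conversion, the rate for a single area can be recomputed by hand from the raw count and population shown earlier. This is a small sketch; `rate_per_100` is a hypothetical helper name, not part of the notebook.

```python
def rate_per_100(count, population):
    """Events per 100 residents."""
    return count / population * 100

# Raw 2019 counts and populations taken from the tables above.
douglas_2019 = rate_per_100(5.0, 21450.0)
fuller_park_2019 = rate_per_100(5.0, 2211.5)

print(round(douglas_2019, 6), round(fuller_park_2019, 6))
```

Both values agree with the `od_2019` column in the output above. Fuller Park has the same raw count as Douglas but roughly a tenfold higher rate, which is why the counts are standardized.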


In [16]:
wq = libpysal.io.open(os.path.join(path, "Boundaries - Community Areas (current)/q_order1.gal")).read()
wq.transform = 'r'
# wq.weights

In [17]:
wq.n
wq.weights['77']

[0.25, 0.25, 0.25, 0.25]
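
Setting `transform = 'r'` row-standardizes the weights, so each area's neighbor weights sum to 1. The four weights of 0.25 above indicate that area 77 has four queen neighbors. Here is a minimal numpy sketch of the same operation on a hypothetical five-area contiguity matrix (not the Chicago GAL file):

```python
import numpy as np

# Toy binary contiguity matrix: area 0 touches the other four areas,
# and areas 1-4 form a ring around it.
A = np.array([
    [0, 1, 1, 1, 1],
    [1, 0, 1, 0, 1],
    [1, 1, 0, 1, 0],
    [1, 0, 1, 0, 1],
    [1, 1, 0, 1, 0],
], dtype=float)

# Row-standardize: divide each row by its number of neighbors.
W = A / A.sum(axis=1, keepdims=True)

print(W[0])           # weights for the area with four neighbors
print(W.sum(axis=1))  # every row sums to 1
```

With row-standardized weights, the product `W @ y` is the average of each area's neighbors, which is the spatial lag used later in the diagnostics.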

### Model Specification

I have three dependent variables, the overdose rate in 2019, the overdose rate in 2020, and the overdose rate in 2021. I will start with 2019. 

In [18]:
y_name1 = 'od_2019'
y_name2 = 'od_2020'
y_name3 = 'od_2021'

In [45]:
x_names1 = ['bupren_area', 'pharmacy_area']
x_names2 = ['bupren_area', 'pharmacy_area',
            #'community', 
            #'Total Pop.', 
            '% in Poverty',
            #'% in Extreme Poverty',
            '% Female',
            '% Black',
            '% Hispanic',
            '% White', 
            #'% Aged 0-4',
            #'% Aged 5-17',
            #'% Aged 18-24',
            #'% Aged 25-64',
            '% Aged 65+',
            'Overall Unemp.',
            #'Unemp. 20-24 yr. olds',
            '% no HS Diploma',
            '% HS or GED',
            '% Bachelors Deg.',
            #'Homeownership Rate',
            #'SNAP Enrollment Rate',
            #'% Cash Assist.',
            'Uninsured Rate',
            #'Avg. Med. Household Income (2020 Dollars)'
           ]

In [46]:
ds_name = 'od_df2'
w_name = 'q_order1'

In [47]:
y1 = np.array(od_df2[y_name1])
y2 = np.array(od_df2[y_name2])
y3 = np.array(od_df2[y_name3])
y1.shape

(77,)

In [48]:
x1 = np.array(od_df2[x_names1])
x1.shape

(77, 2)

In [49]:
x2 = np.array(od_df2[x_names2])
x2.shape

(77, 13)

### OLS Regression

In [50]:
ols1 = spreg.OLS(y1, x1, name_y=y_name1, name_x=x_names1, name_ds=ds_name)
print(ols1.summary)

REGRESSION
----------
SUMMARY OF OUTPUT: ORDINARY LEAST SQUARES
-----------------------------------------
Data set            :      od_df2
Weights matrix      :        None
Dependent Variable  :     od_2019                Number of Observations:          77
Mean dependent var  :      0.0364                Number of Variables   :           3
S.D. dependent var  :      0.0482                Degrees of Freedom    :          74
R-squared           :      0.0640
Adjusted R-squared  :      0.0387
Sum squared residual:       0.165                F-statistic           :      2.5306
Sigma-square        :       0.002                Prob(F-statistic)     :     0.08649
S.E. of regression  :       0.047                Log likelihood        :     127.325
Sigma-square ML     :       0.002                Akaike info criterion :    -248.651
S.E of regression ML:      0.0463                Schwarz criterion     :    -241.619

-----------------------------------------------------------------------------

In [51]:
ols2 = spreg.OLS(y1, x2, name_y=y_name1, name_x=x_names2, name_ds=ds_name)

In [52]:
print(ols2.summary)

REGRESSION
----------
SUMMARY OF OUTPUT: ORDINARY LEAST SQUARES
-----------------------------------------
Data set            :      od_df2
Weights matrix      :        None
Dependent Variable  :     od_2019                Number of Observations:          77
Mean dependent var  :      0.0364                Number of Variables   :          14
S.D. dependent var  :      0.0482                Degrees of Freedom    :          63
R-squared           :      0.6701
Adjusted R-squared  :      0.6020
Sum squared residual:       0.058                F-statistic           :      9.8433
Sigma-square        :       0.001                Prob(F-statistic)     :    8.39e-11
S.E. of regression  :       0.030                Log likelihood        :     167.473
Sigma-square ML     :       0.001                Akaike info criterion :    -306.945
S.E of regression ML:      0.0275                Schwarz criterion     :    -274.132

-----------------------------------------------------------------------------
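
The fit statistics in the two OLS summaries above can be cross-checked from the reported R-squared, log likelihood, number of observations, and number of variables. The sketch below uses the standard textbook definitions of adjusted R-squared, AIC, and the Schwarz criterion. That spreg uses exactly these formulas is an assumption, though the reported values agree to rounding.

```python
import math

def adjusted_r2(r2, n, k):
    # k counts all estimated coefficients, including the constant
    return 1 - (1 - r2) * (n - 1) / (n - k)

def aic(loglik, k):
    return -2 * loglik + 2 * k

def schwarz(loglik, n, k):
    return -2 * loglik + k * math.log(n)

n = 77
# (R-squared, log likelihood, number of variables) as reported above
models = {"ols1": (0.0640, 127.325, 3), "ols2": (0.6701, 167.473, 14)}

for name, (r2, ll, k) in models.items():
    print(name,
          round(adjusted_r2(r2, n, k), 4),
          round(aic(ll, k), 3),
          round(schwarz(ll, n, k), 3))
```

The larger model improves both AIC and the Schwarz criterion despite the penalty for eleven extra coefficients, consistent with the jump in adjusted R-squared from 0.0387 to 0.6020.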

### LM Diagnostics 

### Spatial Lag Model 

In [53]:
ols1b = spreg.OLS(y1, x1, w=wq, name_y=y_name1, name_x=x_names1, name_ds=ds_name,
                 spat_diag=True, moran=True, name_w=w_name)
print(ols1b.summary)

REGRESSION
----------
SUMMARY OF OUTPUT: ORDINARY LEAST SQUARES
-----------------------------------------
Data set            :      od_df2
Weights matrix      :    q_order1
Dependent Variable  :     od_2019                Number of Observations:          77
Mean dependent var  :      0.0364                Number of Variables   :           3
S.D. dependent var  :      0.0482                Degrees of Freedom    :          74
R-squared           :      0.0640
Adjusted R-squared  :      0.0387
Sum squared residual:       0.165                F-statistic           :      2.5306
Sigma-square        :       0.002                Prob(F-statistic)     :     0.08649
S.E. of regression  :       0.047                Log likelihood        :     127.325
Sigma-square ML     :       0.002                Akaike info criterion :    -248.651
S.E of regression ML:      0.0463                Schwarz criterion     :    -241.619

-----------------------------------------------------------------------------

In [54]:
ols2b = spreg.OLS(y1, x2, w=wq, name_y=y_name1, name_x=x_names2, name_ds=ds_name,
                 spat_diag=True, moran=True, name_w=w_name)
print(ols2b.summary)

REGRESSION
----------
SUMMARY OF OUTPUT: ORDINARY LEAST SQUARES
-----------------------------------------
Data set            :      od_df2
Weights matrix      :    q_order1
Dependent Variable  :     od_2019                Number of Observations:          77
Mean dependent var  :      0.0364                Number of Variables   :          14
S.D. dependent var  :      0.0482                Degrees of Freedom    :          63
R-squared           :      0.6701
Adjusted R-squared  :      0.6020
Sum squared residual:       0.058                F-statistic           :      9.8433
Sigma-square        :       0.001                Prob(F-statistic)     :    8.39e-11
S.E. of regression  :       0.030                Log likelihood        :     167.473
Sigma-square ML     :       0.001                Akaike info criterion :    -306.945
S.E of regression ML:      0.0275                Schwarz criterion     :    -274.132

-----------------------------------------------------------------------------

There appears to be strong evidence of spatial misspecification from the Moran's I statistic. 
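
For reference, Moran's I itself is simple to compute. The sketch below calculates only the statistic on a hypothetical six-unit line of neighbors. It does not reproduce the inference spreg performs for regression residuals, and `morans_i` is an illustrative helper, not a library function.

```python
import numpy as np

def morans_i(values, W):
    """Moran's I = (n / S0) * (z' W z) / (z' z), with z the centered values."""
    z = values - values.mean()
    n = len(z)
    s0 = W.sum()
    return (n / s0) * (z @ W @ z) / (z @ z)

# Six units on a line, each adjacent to its immediate neighbors.
A = np.zeros((6, 6))
for i in range(5):
    A[i, i + 1] = A[i + 1, i] = 1.0
W = A / A.sum(axis=1, keepdims=True)  # row-standardized

clustered = np.array([1.0, 1.0, 1.0, 5.0, 5.0, 5.0])
alternating = np.array([1.0, 5.0, 1.0, 5.0, 1.0, 5.0])

print(morans_i(clustered, W), morans_i(alternating, W))
```

Similar values sitting next to each other give a positive statistic, while a checkerboard pattern gives a negative one. Applied to OLS residuals, a clearly positive value signals that neighboring areas share unexplained variation.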

The robust LM lag statistic remains significant, so I will proceed to a spatial lag model. Multicollinearity is high, and I should probably remove some of the explanatory variables.
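
One common way to decide which explanatory variables to drop is the variance inflation factor (VIF). This is a generic check and not necessarily the multicollinearity measure behind the statement above. The sketch uses synthetic columns, and `vif` is a hypothetical helper written for illustration.

```python
import numpy as np

def vif(X):
    """VIF per column: 1 / (1 - R^2) from regressing it on the other columns."""
    n, k = X.shape
    out = []
    for j in range(k):
        target = X[:, j]
        others = np.column_stack([np.ones(n), np.delete(X, j, axis=1)])
        beta, *_ = np.linalg.lstsq(others, target, rcond=None)
        resid = target - others @ beta
        centered = target - target.mean()
        r2 = 1 - (resid @ resid) / (centered @ centered)
        out.append(1 / (1 - r2))
    return np.array(out)

# Synthetic example: b is nearly a copy of a, c is unrelated to both.
rng = np.random.default_rng(0)
a = rng.normal(size=77)
b = a + 0.1 * rng.normal(size=77)
c = rng.normal(size=77)

vifs = vif(np.column_stack([a, b, c]))
print(vifs)
```

In this notebook the same function could be applied to `x2`. Columns such as `% Black`, `% Hispanic`, and `% White` are natural candidates for high values because they nearly sum to a constant.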

## Spatial Regimes

For the regimes variable:

In [61]:
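# 'regionno' must already be a column of od_df2; it is not among the columns listed earlier.
# rvar holds the column name, not the Series: passing the Series to astype
# raised "TypeError: unhashable type: 'Series'".
rvar = 'regionno'
od_df2 = od_df2.astype({rvar: 'int'})
regimes = od_df2[rvar].tolist()
type(regimes)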

In [None]:
spreg.ML_Lag_Regimes
spreg.ML_Error_Regimes