------------

# Spatial Panel Models with Fixed Effects

* **This notebook uses the [Panel_FE_Lag](https://pysal.org/spreg/generated/spreg.Panel_FE_Lag.html#spreg.Panel_FE_Lag) and [Panel_FE_Error](https://pysal.org/spreg/generated/spreg.Panel_FE_Error.html#spreg.Panel_FE_Error) classes.**


In [91]:
import numpy
import libpysal
import libpysal.weights as lpw
import spreg
import pandas as pd
import pysal as ps
from datetime import datetime

In [92]:
#df_dummies= pd.read_csv(r"C:\Users\PcLaptop\Documents\GitHub\Climate-and-conflict\df_with_dummies.csv")
#states_gdf = r"C:\Users\PcLaptop\Documents\GitHub\Climate-and-conflict\Datasets\som_adm_ocha_itos_20230308_shp\som_admbnda_adm1_ocha_20230308.shp"

df_dummies= pd.read_csv(r"/home/sara/Documenti/GitHub/Climate-and-conflict/df_CRU4_lag5.csv")
states_gdf = r"/home/sara/Documenti/GitHub/Climate-and-conflict/Datasets/som_adm_ocha_itos_20230308_shp/som_admbnda_adm1_ocha_20230308.shp"

In [93]:
# read the UNHCR PRMN displacement dataset (xlsx)
df = pd.read_excel(r"/home/sara/Documenti/GitHub/Climate-and-conflict/displacements/UNHCR-PRMN-Displacement-Dataset - Somalia.xlsx")
# replace spaces with underscores in both region columns (e.g. 'Lower Juba' -> 'Lower_Juba')
df['Current (Arrival) Region'] = df['Current (Arrival) Region'].str.replace(' ', '_')
df['Previous (Departure) Region'] = df['Previous (Departure) Region'].str.replace(' ', '_')

In [94]:
# Parse the day-first "Month End" strings and add POSIX timestamps as a sort key
dt = [datetime.strptime(s, "%d/%m/%Y") for s in df["Month End"].values]
q = [d.timestamp() for d in dt]

df.insert(loc=3, column='date_timestamp', value=q)
df = df.sort_values("date_timestamp")

df['Month End'] = pd.to_datetime(df['Month End'], dayfirst=True)
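The cell above parses each `Month End` string with `datetime.strptime` and then re-parses the column with `pd.to_datetime`. A vectorized alternative is sketched below on a made-up column (the toy values are hypothetical), assuming every value follows the `dd/mm/YYYY` pattern. Note that these timestamps are computed in UTC, so they can differ from `datetime.timestamp()`, which treats naive datetimes as local time; the sort order is the same.

```python
import pandas as pd

# Hypothetical "Month End" strings in day-first format (dd/mm/YYYY)
toy = pd.DataFrame({"Month End": ["31/03/2016", "31/01/2016", "29/02/2016"]})

# Parse the whole column at once with an explicit format (no per-row loop)
toy["Month End"] = pd.to_datetime(toy["Month End"], format="%d/%m/%Y")

# Seconds since the Unix epoch, treating the dates as UTC
epoch = pd.Timestamp("1970-01-01")
toy["date_timestamp"] = (toy["Month End"] - epoch) // pd.Timedelta(seconds=1)

# Sorting on the datetime column directly gives the same order as the timestamps
toy = toy.sort_values("Month End").reset_index(drop=True)
print(toy)
```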

In [95]:
aggregated_data = df.groupby([pd.Grouper(key='Month End', freq='M'),'Previous (Departure) Region', 'Current (Arrival) Region'])['Number of Individuals'].sum().to_frame()
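The aggregation sums all reports that fall in the same month and on the same origin-destination route. The toy sketch below (hypothetical records) does the same grouping with `dt.to_period("M")` instead of `pd.Grouper(freq='M')`; recent pandas versions deprecate the `'M'` alias for month-end offsets in favour of `'ME'`, while `'M'` remains valid for periods.

```python
import pandas as pd

# Hypothetical displacement records: several reports per month and route
records = pd.DataFrame({
    "Month End": pd.to_datetime(["2016-01-31", "2016-01-31", "2016-02-29", "2016-02-29"]),
    "Previous (Departure) Region": ["Bay", "Bay", "Bay", "Gedo"],
    "Current (Arrival) Region": ["Banadir", "Banadir", "Banadir", "Bay"],
    "Number of Individuals": [100, 50, 30, 20],
})

# Group by calendar month, origin and destination, then sum the individuals
monthly = (
    records.groupby([records["Month End"].dt.to_period("M"),
                     "Previous (Departure) Region",
                     "Current (Arrival) Region"])["Number of Individuals"]
    .sum()
)
print(monthly)
```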

In [96]:
dates = aggregated_data.index.get_level_values('Month End').unique()
districts = aggregated_data.index.get_level_values('Previous (Departure) Region').unique()
all_combinations = pd.MultiIndex.from_product([dates, districts,districts], names=['time', 'Previous (Departure) Region','Current (Arrival) Region'])

disp_data = aggregated_data.reindex(all_combinations, fill_value=0).reset_index()   
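Reindexing on the full month × origin × destination product turns routes with no reported movement into explicit zeros, so every month has a complete flow matrix. Note that the cell above builds both region axes from the departure regions only, which assumes every arrival region also appears as a departure region. A toy sketch with hypothetical months and regions:

```python
import pandas as pd

# Hypothetical aggregated flows: only routes with movements appear
idx = pd.MultiIndex.from_tuples(
    [("2016-01", "Bay", "Banadir"), ("2016-02", "Gedo", "Bay")],
    names=["time", "Previous (Departure) Region", "Current (Arrival) Region"],
)
flows = pd.DataFrame({"Number of Individuals": [150, 20]}, index=idx)

# Every month x origin x destination combination; missing routes become zeros
months = ["2016-01", "2016-02"]
regions = ["Banadir", "Bay", "Gedo"]
full = pd.MultiIndex.from_product([months, regions, regions], names=idx.names)
balanced = flows.reindex(full, fill_value=0).reset_index()
print(balanced.shape)  # 2 months x 3 origins x 3 destinations = 18 rows
```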

In [97]:
disp_matxs = disp_data.pivot_table(index=['time','Current (Arrival) Region'], columns='Previous (Departure) Region', values='Number of Individuals', aggfunc='sum').reset_index()
# rename the arrival-region column to admin1 so it matches the key in df_dummies
disp_matxs = disp_matxs.rename(columns={'Current (Arrival) Region': 'admin1'})
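After the pivot, each row is one arrival region in one month and each column holds the inflow from one departure region. A toy sketch of the same reshaping, with hypothetical values:

```python
import pandas as pd

# Hypothetical balanced long table: one row per month, origin, destination
long = pd.DataFrame({
    "time": ["2016-01"] * 4,
    "Previous (Departure) Region": ["Bay", "Bay", "Gedo", "Gedo"],
    "Current (Arrival) Region": ["Bay", "Gedo", "Bay", "Gedo"],
    "Number of Individuals": [0, 5, 20, 0],
})

# One row per (month, arrival region); one column per departure region
od = long.pivot_table(index=["time", "Current (Arrival) Region"],
                      columns="Previous (Departure) Region",
                      values="Number of Individuals", aggfunc="sum").reset_index()
od = od.rename(columns={"Current (Arrival) Region": "admin1"})
od.columns.name = None  # drop the leftover "Previous (Departure) Region" label
print(od)
```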

In [98]:
# keep only year and month (YYYY-MM) so both tables share a monthly merge key
disp_matxs['yr_mth'] = disp_matxs['time'].dt.strftime('%Y-%m')
df_dummies['yr_mth'] = pd.to_datetime(df_dummies['time'], dayfirst=True).dt.strftime('%Y-%m')

# left join: all rows of df_dummies are kept; rows without displacement data get NaN
df_merged = pd.merge(df_dummies, disp_matxs, on=['yr_mth', 'admin1'], how='left')
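When both frames share non-key column names, `pd.merge` appends the default suffixes `_x` (left) and `_y` (right). Judging by names such as `Awdal_y` and the commented-out `'_x'` line later in the notebook, `df_dummies` seems to hold region-named dummy columns that collide with the region columns of `disp_matxs`. The toy sketch below (hypothetical frames) shows the naming and the NaN rows a left join leaves, which are what `df_merged.dropna()` removes later.

```python
import pandas as pd

# Hypothetical left table: a dummy column named after a region
left = pd.DataFrame({"yr_mth": ["2016-01", "2016-01"], "admin1": ["Bay", "Gedo"],
                     "Bay": [1, 0], "conflicts": [3, 1]})
# Hypothetical right table: displacement inflows from the same region
right = pd.DataFrame({"yr_mth": ["2016-01"], "admin1": ["Gedo"], "Bay": [250]})

merged = pd.merge(left, right, on=["yr_mth", "admin1"], how="left")
print(merged.columns.tolist())  # overlapping "Bay" becomes Bay_x / Bay_y
# Left rows without a match on the right get NaN in the right-hand columns
print(merged)
```

Passing `suffixes=(...)` to `pd.merge` would give the displacement columns more descriptive names.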

--------------------

## Spatial Lag model

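Let's estimate a spatial lag panel model with region fixed effects:

$$
y = \rho Wy + X\beta + \mu_i + e
$$

where $W$ is the row-standardized Queen contiguity matrix of the regions, $\rho$ is the spatial autoregressive coefficient and $\mu_i$ are the region fixed effects.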
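Written out for region $i$ in month $t$, the model is $y_{it} = \rho \sum_j w_{ij} y_{jt} + x_{it}\beta + \mu_i + e_{it}$, and fixed-effects estimators remove $\mu_i$ by subtracting each region's mean over time. The NumPy sketch below shows both pieces on a hypothetical 3-region, 2-period panel stacked period by period; it illustrates the algebra only and is not spreg's internal code (spreg also estimates $\rho$ by maximum likelihood).

```python
import numpy as np

# Hypothetical contiguity: region 0 borders 1, region 1 borders 0 and 2
A = np.array([[0, 1, 0],
              [1, 0, 1],
              [0, 1, 0]], dtype=float)
W = A / A.sum(axis=1, keepdims=True)  # row-standardized, like w.transform = 'r'

n, t = 3, 2
# Outcome stacked by period: rows 0..2 are period 1, rows 3..5 are period 2
y = np.array([4.0, 2.0, 6.0, 8.0, 2.0, 0.0])

# Spatial lag: apply W within each period (block-diagonal I_t kron W)
Wy = np.kron(np.eye(t), W) @ y

# Within transformation: subtract each region's mean over time, removing mu_i
Y = y.reshape(t, n)  # rows = periods, columns = regions
y_within = (Y - Y.mean(axis=0)).ravel()

print(Wy)        # neighbour averages, period by period
print(y_within)  # demeaned outcome
```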

In [99]:
# add a column with the total displacement inflow (sum of columns 61 onward, the displacement columns)
df_merged['sum_disp'] = df_merged.iloc[:, 61:].sum(axis=1)
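`iloc[:, 61:]` relies on the displacement columns starting at position 61, which breaks silently if a column is added or dropped upstream. A less fragile option is to select them by the `_y` suffix the merge added, assuming only displacement columns carry that suffix. A sketch on a hypothetical frame:

```python
import pandas as pd

# Hypothetical merged frame: two displacement columns (suffix "_y") plus other variables
df_toy = pd.DataFrame({
    "admin1": ["Bay", "Gedo"],
    "conflicts": [3, 1],
    "Bay_y": [0, 250],
    "Gedo_y": [40, 0],
})

# Select columns by name pattern instead of by position
disp_cols = df_toy.filter(regex=r"_y$").columns
df_toy["sum_disp"] = df_toy[disp_cols].sum(axis=1)
print(df_toy)
```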

In [100]:
w = libpysal.weights.Queen.from_shapefile(states_gdf)
w.transform = 'r'

# Define dependent variable
name_y = ["conflicts"]
y = numpy.array([df_dummies[name] for name in name_y]).T

# Define independent variables
name_x = ['TA_lag1','PA_lag5','DL_lag5']
x = numpy.array([df_dummies[name] for name in name_x]).T
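`Queen.from_shapefile` links two regions when their polygons share at least one boundary point (an edge or a vertex), and `w.transform = 'r'` rescales each row to sum to one, so `Wy` is an average over neighbours. The sketch below mimics this on a hypothetical 3×3 grid of square cells with plain NumPy rather than real polygons; `queen_lattice` is an illustrative helper, not a libpysal function.

```python
import numpy as np

def queen_lattice(nrows, ncols):
    """Queen contiguity on a regular grid: cells sharing an edge or a corner are neighbours."""
    n = nrows * ncols
    A = np.zeros((n, n))
    for r in range(nrows):
        for c in range(ncols):
            for dr in (-1, 0, 1):
                for dc in (-1, 0, 1):
                    rr, cc = r + dr, c + dc
                    if (dr, dc) != (0, 0) and 0 <= rr < nrows and 0 <= cc < ncols:
                        A[r * ncols + c, rr * ncols + cc] = 1
    return A

A = queen_lattice(3, 3)
W = A / A.sum(axis=1, keepdims=True)  # row standardization ('r')
print(A.sum(axis=1))  # corner cells have 3 neighbours, edge cells 5, the centre 8
```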

In [107]:
fe_lag = spreg.Panel_FE_Lag(y, x, w, name_y=name_y, 
                            name_x=name_x, name_ds="df_dummies")
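With stacked (long) input, our understanding of spreg's panel utilities is that the $n \times t$ rows must list all $n$ regions for the first period, then all regions for the second period, and so on, with regions in the same order as the ids in `w`; please verify this against the spreg documentation for your version. The observation counts reported below are multiples of 18, but nothing in the cells above sorts `df_dummies` that way, so the order is worth checking. A sketch with a hypothetical region order `w_order` and toy data:

```python
import pandas as pd

# Hypothetical region order, e.g. as read from the shapefile used to build w
w_order = ["Awdal", "Bakool", "Banadir"]

# Hypothetical long data in arbitrary row order
panel = pd.DataFrame({
    "time": ["2016-02", "2016-01", "2016-01", "2016-02", "2016-01", "2016-02"],
    "admin1": ["Bakool", "Banadir", "Awdal", "Awdal", "Bakool", "Banadir"],
    "conflicts": [5, 9, 1, 2, 4, 7],
})

# Sort period by period, with regions in the weights-matrix order inside each period
panel["admin1"] = pd.Categorical(panel["admin1"], categories=w_order, ordered=True)
panel = panel.sort_values(["time", "admin1"]).reset_index(drop=True)

# Balanced-panel check: every period must contain every region exactly once
counts = panel.groupby("time")["admin1"].nunique()
assert (counts == len(w_order)).all() and len(panel) == len(w_order) * counts.size

y = panel[["conflicts"]].to_numpy()  # (n*t) x 1 array
print(panel)
```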

In [None]:
print(fe_lag.summary)

REGRESSION
----------
SUMMARY OF OUTPUT: MAXIMUM LIKELIHOOD SPATIAL LAG PANEL - FIXED EFFECTS
-----------------------------------------------------------------------
Data set            :  df_dummies
Weights matrix      :     unknown
Dependent Variable  :   conflicts                Number of Observations:        5616
Mean dependent var  :      0.0000                Number of Variables   :           4
S.D. dependent var  :     10.5150                Degrees of Freedom    :        5612
Pseudo R-squared    :      0.3183
Spatial Pseudo R-squared:  0.1683
Sigma-square ML     :      76.525                Log likelihood        :  -44565.340
S.E of regression   :       8.748                Akaike info criterion :   89138.681
                                                 Schwarz criterion     :   89165.214

------------------------------------------------------------------------------------
            Variable     Coefficient       Std.Error     z-Statistic     Probability
-----------------

In [134]:
df_merged = df_merged.dropna()
#df_dummies = df_dummies[df_dummies['yr_mth'] >= '2016-01']

# Define dependent variable
name_y = ["conflicts"]
y = numpy.array([df_merged[name] for name in name_y]).T

# include as independent variables the displacement inflows from each departure region (the _y columns from disp_matxs)
name_x = ['TA_lag1','PA_lag5','DL_lag5','Awdal_y','Bakool_y','Banadir_y', 'Bari_y', 'Bay_y', 'Galgaduud_y', 'Gedo_y', 'Hiraan_y', 'Lower_Juba_y', 'Lower_Shabelle_y', 'Middle_Juba_y', 'Middle_Shabelle_y', 'Mudug_y', 'Nugaal_y', 'Sanaag_y', 'Sool_y', 'Togdheer_y', 'Woqooyi_Galbeed_y']
x = numpy.array([df_merged[name] for name in name_x]).T
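`df_merged.dropna()` drops any row with a missing value in any column, so in principle it can leave some months with fewer than all regions, which would no longer be a balanced panel. A sketch that keeps only the months in which every region survives, on a hypothetical toy frame:

```python
import numpy as np
import pandas as pd

# Hypothetical merged panel with one missing displacement value
df_toy = pd.DataFrame({
    "yr_mth": ["2016-01", "2016-01", "2016-02", "2016-02"],
    "admin1": ["Bay", "Gedo", "Bay", "Gedo"],
    "conflicts": [3, 1, 4, 2],
    "Bay_y": [0.0, 250.0, np.nan, 10.0],
})

clean = df_toy.dropna()
n_regions = df_toy["admin1"].nunique()

# Keep only months in which every region survived dropna()
complete = clean.groupby("yr_mth")["admin1"].transform("nunique") == n_regions
balanced = clean[complete].reset_index(drop=True)
print(balanced)
```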

In [135]:
fe_lag = spreg.Panel_FE_Lag(y, x, w, name_y=name_y, 
                            name_x=name_x, name_ds="df_merged")

In [136]:
print(fe_lag.summary)

REGRESSION
----------
SUMMARY OF OUTPUT: MAXIMUM LIKELIHOOD SPATIAL LAG PANEL - FIXED EFFECTS
-----------------------------------------------------------------------
Data set            :   df_merged
Weights matrix      :     unknown
Dependent Variable  :   conflicts                Number of Observations:        1512
Mean dependent var  :      0.0000                Number of Variables   :          22
S.D. dependent var  :      7.3422                Degrees of Freedom    :        1490
Pseudo R-squared    :      0.0787
Spatial Pseudo R-squared:  0.0440
Sigma-square ML     :      49.712                Log likelihood        :  -10640.757
S.E of regression   :       7.051                Akaike info criterion :   21325.514
                                                 Schwarz criterion     :   21442.580

------------------------------------------------------------------------------------
            Variable     Coefficient       Std.Error     z-Statistic     Probability
-----------------

In [126]:
y_var_name = 'conflicts'
X_var_names = ['TA_lag1','PA_lag5','DL_lag5']

In [130]:
# Build the formula for OLS with dummies (LSDV): regressors + month dummies + displacement columns
unit_names = sorted(df_dummies['admin1'].unique().tolist())
unit_names_t = df_dummies['month_name'].unique().tolist()

terms = list(X_var_names)
# region dummies, currently left out:
#terms += [name + '_x' for name in unit_names[:-1]]
# month dummies, dropping the last one as the reference category
terms += unit_names_t[:-1]
# displacement columns from name_x, skipping the regressors already in X_var_names
# (looping over name_x[:-1] directly would repeat TA_lag1, PA_lag5 and DL_lag5)
disp_names = [name for name in name_x if name not in X_var_names]
terms += disp_names[:-1]

lsdv_expr = y_var_name + ' ~ ' + ' + '.join(terms)
#lsdv_expr = lsdv_expr + ' - 1'
print('Regression expression for OLS with dummies=' + lsdv_expr)

Regression expression for OLS with dummies=conflicts ~ TA_lag1 + PA_lag5 + DL_lag5 + January + February + March + April + May + June + July + August + September + October + November + Awdal_y + Bakool_y + Banadir_y + Bari_y + Bay_y + Galgaduud_y + Gedo_y + Hiraan_y + Lower_Juba_y + Lower_Shabelle_y + Middle_Juba_y + Middle_Shabelle_y + Mudug_y + Nugaal_y + Sanaag_y + Sool_y + Togdheer_y
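The formula above includes month dummies, while the region-dummy line is commented out. If region dummies were added, their OLS slopes would match those of the within (region-demeaning) transformation used by fixed-effects estimators, by the Frisch-Waugh-Lovell theorem. The sketch below checks this on simulated, hypothetical data; it is a non-spatial illustration and does not reproduce `Panel_FE_Lag`, which also estimates $\rho$.

```python
import numpy as np

rng = np.random.default_rng(0)
n, t = 4, 5                            # hypothetical 4 regions, 5 periods
unit = np.repeat(np.arange(n), t)      # region index of each observation
mu = rng.normal(size=n)                # region fixed effects
x = rng.normal(size=n * t) + mu[unit]  # regressor correlated with the effects
y = 2.0 * x + mu[unit] + rng.normal(scale=0.1, size=n * t)

# LSDV: regress y on x plus one dummy per region (no separate intercept)
D = (unit[:, None] == np.arange(n)).astype(float)
beta_lsdv = np.linalg.lstsq(np.column_stack([x, D]), y, rcond=None)[0][0]

# Within estimator: demean x and y by region, then regress without dummies
def demean(v):
    means = np.bincount(unit, weights=v) / np.bincount(unit)
    return v - means[unit]

xd, yd = demean(x), demean(y)
beta_within = (xd @ yd) / (xd @ xd)

print(beta_lsdv, beta_within)  # equal up to floating-point rounding
```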


In [131]:
import statsmodels.api as sm
import statsmodels.formula.api as smf

In [132]:
lsdv_model = smf.ols(formula=lsdv_expr, data=df_merged)
lsdv_model_results = lsdv_model.fit()
print(lsdv_model_results.summary())

                            OLS Regression Results                            
Dep. Variable:              conflicts   R-squared:                       0.274
Model:                            OLS   Adj. R-squared:                  0.259
Method:                 Least Squares   F-statistic:                     18.03
Date:                Wed, 20 Sep 2023   Prob (F-statistic):           1.08e-81
Time:                        12:54:33   Log-Likelihood:                -6207.0
No. Observations:                1512   AIC:                         1.248e+04
Df Residuals:                    1480   BIC:                         1.265e+04
Df Model:                          31                                         
Covariance Type:            nonrobust                                         
                        coef    std err          t      P>|t|      [0.025      0.975]
-------------------------------------------------------------------------------------
Intercept            13.7230      1.43

In [133]:
lsdv_model_results.summary()

Dep. Variable:      conflicts          R-squared:            0.274
Model:              OLS                Adj. R-squared:       0.259
Method:             Least Squares      F-statistic:          18.03
Date:               Wed, 20 Sep 2023   Prob (F-statistic):   1.08e-81
Time:               12:54:46           Log-Likelihood:       -6207.0
No. Observations:   1512               AIC:                  12480.0
Df Residuals:       1480               BIC:                  12650.0
Df Model:           31
Covariance Type:    nonrobust

                coef    std err          t    P>|t|     [0.025     0.975]
Intercept    13.7230      1.434      9.567    0.000     10.909     16.537
TA_lag1      -0.5052      0.666     -0.759    0.448     -1.811      0.801
PA_lag5       0.2961      1.334      0.222    0.824     -2.321      2.914
DL_lag5       0.0256      0.010      2.627    0.009      0.006      0.045
January      -1.1101      1.882     -0.590    0.555     -4.803      2.582
February     -0.9442      1.880     -0.502    0.616     -4.633      2.744
March        -2.4299      1.879     -1.293    0.196     -6.116      1.257
April        -2.5004      1.885     -1.326    0.185     -6.198      1.198
May          -0.6967      1.876     -0.371    0.710     -4.377      2.984

Omnibus:           590.54    Durbin-Watson:           1.989
Prob(Omnibus):        0.0    Jarque-Bera (JB):     2713.485
Skew:               1.818    Prob(JB):                 0.0
Kurtosis:           8.464    Cond. No.            122000.0
