This notebook demonstrates the functionality of the handler written for spreg.

Essentially, it provides a unified interface to apply the correct function to data, allowing it to serve as the single point of access for a patsy/pandas interface handler. 

First, let's set up.

In [1]:
import handler as h
import pysal as ps
import geopandas as gpd

In [4]:
df = gpd.read_file(ps.examples.get_path('columbus.json'))
dbf = ps.open(ps.examples.get_path('columbus.dbf'))
y = dbf.by_col_array(['HOVAL'])
X = dbf.by_col_array(['INC', 'CRIME'])
W = ps.open(ps.examples.get_path('columbus.gal')).read()

In [10]:
original = ps.spreg.OLS(y,X,W, name_x=['INC', 'CRIME'], name_y='HOVAL')

In [12]:
handled = h.Model(y,X,W,name_x=['INC', 'CRIME'], name_y='HOVAL')

In [13]:
print(original.summary)

REGRESSION
----------
SUMMARY OF OUTPUT: ORDINARY LEAST SQUARES
-----------------------------------------
Data set            :     unknown
Weights matrix      :     unknown
Dependent Variable  :       HOVAL                Number of Observations:          49
Mean dependent var  :     38.4362                Number of Variables   :           3
S.D. dependent var  :     18.4661                Degrees of Freedom    :          46
R-squared           :      0.3495
Adjusted R-squared  :      0.3212
Sum squared residual:   10647.015                F-statistic           :     12.3582
Sigma-square        :     231.457                Prob(F-statistic)     :   5.064e-05
S.E. of regression  :      15.214                Log likelihood        :    -201.368
Sigma-square ML     :     217.286                Akaike info criterion :     408.735
S.E of regression ML:     14.7406                Schwarz criterion     :     414.411

-----------------------------------------------------------------------------

In [14]:
print(handled.summary)

REGRESSION
----------
SUMMARY OF OUTPUT: ORDINARY LEAST SQUARES
-----------------------------------------
Data set            :     unknown
Weights matrix      :     unknown
Dependent Variable  :       HOVAL                Number of Observations:          49
Mean dependent var  :     38.4362                Number of Variables   :           3
S.D. dependent var  :     18.4661                Degrees of Freedom    :          46
R-squared           :      0.3495
Adjusted R-squared  :      0.3212
Sum squared residual:   10647.015                F-statistic           :     12.3582
Sigma-square        :     231.457                Prob(F-statistic)     :   5.064e-05
S.E. of regression  :      15.214                Log likelihood        :    -201.368
Sigma-square ML     :     217.286                Akaike info criterion :     408.735
S.E of regression ML:     14.7406                Schwarz criterion     :     414.411

-----------------------------------------------------------------------------

So, there's really no difference here. I'm trying to swap the handlers into the tests. In fact, the "real" model sits under `handled._called`, so, at worst, we can just forward references to attributes of `handled` down to `_called`. I do this by iterating through `dir(handled._called)`, but there's probably a more elegant way to do that.
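
As a rough illustration of the two options, here is a minimal sketch that does not use pysal at all. `FakeOLS`, `FlattenedModel`, and `DelegatingModel` are hypothetical names invented for this example; only the `_called` attribute comes from the handler described above. The first wrapper copies public attributes over with `dir`, the second forwards lookups lazily with `__getattr__`, which may be one candidate for the "more elegant way".

```python
class FakeOLS:
    """Hypothetical stand-in for an estimator such as ps.spreg.OLS."""

    def __init__(self, y, x):
        self.n = len(y)
        self.k = len(x[0]) + 1  # regressors plus a constant
        self.summary = "SUMMARY OF OUTPUT: FAKE OLS"


class FlattenedModel:
    """Copies the public attributes of the wrapped model onto the wrapper."""

    def __init__(self, *args, **kwargs):
        self._called = FakeOLS(*args, **kwargs)
        for name in dir(self._called):
            if not name.startswith("_"):
                setattr(self, name, getattr(self._called, name))


class DelegatingModel:
    """Looks attributes up on the wrapped model lazily instead of copying."""

    def __init__(self, *args, **kwargs):
        self._called = FakeOLS(*args, **kwargs)

    def __getattr__(self, name):
        # Only invoked when normal lookup on the wrapper itself fails.
        if name == "_called":
            raise AttributeError(name)  # avoid recursing before _called exists
        return getattr(self._called, name)


# Toy values, not the Columbus data.
y = [[1.0], [2.0], [3.0]]
X = [[1.0, 2.0], [2.0, 1.0], [3.0, 5.0]]
flat = FlattenedModel(y, X)
lazy = DelegatingModel(y, X)
print(flat.summary == lazy.summary)
```

The copying version takes a snapshot at construction time, while the delegating version always reflects the current state of `_called`. Which of the two suits the testing framework better is an open question.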

In addition, all the stuff the actual "model" class uses to interpret the arguments that are passed down to the underlying estimators is parsed *around* the model. So, it dispatches the arguments to the specified model type without knowing any special information about the function call.

This means we can do some pretty cool things while keeping the actual wrapper at ~30 LoC.
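
To make the dispatch idea concrete, here is a minimal sketch under stated assumptions. The `OLS` and `ML_Lag` classes below are hypothetical stand-ins, not the pysal estimators, and the `ESTIMATORS` lookup table is my own invention; the real handler may well resolve the name differently. The point is only that the wrapper passes `*args` and `**kwargs` through untouched and picks the target from `mtype`.

```python
class OLS:
    """Hypothetical stand-in for ps.spreg.OLS."""

    def __init__(self, y, x, w=None, name_y=None, name_x=None):
        self.y, self.x, self.w = y, x, w
        self.name_y, self.name_x = name_y, name_x


class ML_Lag:
    """Hypothetical stand-in for ps.spreg.ML_Lag."""

    def __init__(self, y, x, w, name_y=None, name_x=None):
        self.y, self.x, self.w = y, x, w
        self.name_y, self.name_x = name_y, name_x


# Assumed lookup table from model-type name to estimator class.
ESTIMATORS = {"OLS": OLS, "ML_Lag": ML_Lag}


class Model:
    """Forwards every argument to the estimator named by mtype."""

    def __init__(self, *args, mtype="OLS", **kwargs):
        try:
            estimator = ESTIMATORS[mtype]
        except KeyError:
            raise ValueError("unknown model type: %r" % mtype)
        # The wrapper never inspects args or kwargs; it only passes them on.
        self._called = estimator(*args, **kwargs)


# Toy values, not the Columbus data.
y = [[1.0], [2.0]]
X = [[0.5], [1.5]]
W = object()  # placeholder for a spatial weights object
ols = Model(y, X, W, name_y="HOVAL", name_x=["INC"])
lag = Model(y, X, W, mtype="ML_Lag")
print(type(ols._called).__name__, type(lag._called).__name__)
```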

In [15]:
ML = ps.spreg.ML_Lag(y,X,W)



In [16]:
handled_ML = h.Model(y,X,W,mtype='ML_Lag')

In [17]:
print(ML.summary)

REGRESSION
----------
SUMMARY OF OUTPUT: MAXIMUM LIKELIHOOD SPATIAL LAG (METHOD = FULL)
-----------------------------------------------------------------
Data set            :     unknown
Weights matrix      :     unknown
Dependent Variable  :     dep_var                Number of Observations:          49
Mean dependent var  :     38.4362                Number of Variables   :           4
S.D. dependent var  :     18.4661                Degrees of Freedom    :          45
Pseudo R-squared    :      0.3639
Spatial Pseudo R-squared:  0.3384
Sigma-square ML     :     212.490                Log likelihood        :    -200.903
S.E of regression   :      14.577                Akaike info criterion :     409.807
                                                 Schwarz criterion     :     417.374

------------------------------------------------------------------------------------
            Variable     Coefficient       Std.Error     z-Statistic     Probability
-----------------------------

In [18]:
print(handled_ML.summary)

REGRESSION
----------
SUMMARY OF OUTPUT: MAXIMUM LIKELIHOOD SPATIAL LAG (METHOD = FULL)
-----------------------------------------------------------------
Data set            :     unknown
Weights matrix      :     unknown
Dependent Variable  :     dep_var                Number of Observations:          49
Mean dependent var  :     38.4362                Number of Variables   :           4
S.D. dependent var  :     18.4661                Degrees of Freedom    :          45
Pseudo R-squared    :      0.3639
Spatial Pseudo R-squared:  0.3384
Sigma-square ML     :     212.490                Log likelihood        :    -200.903
S.E of regression   :      14.577                Akaike info criterion :     409.807
                                                 Schwarz criterion     :     417.374

------------------------------------------------------------------------------------
            Variable     Coefficient       Std.Error     z-Statistic     Probability
-----------------------------

### Intercepting formulas

So, this is pretty neat, but gives us nothing beyond using *one* function to dispatch models. That's cool and R-like, but it's not necessarily better. Where it does add functionality is in its ability to intercept model formulas.

In [19]:
handled_eq = h.Model("HOVAL ~ INC + CRIME", data=df)

In [21]:
print(handled_eq.summary)

REGRESSION
----------
SUMMARY OF OUTPUT: ORDINARY LEAST SQUARES
-----------------------------------------
Data set            :     unknown
Dependent Variable  :     dep_var                Number of Observations:          49
Mean dependent var  :     38.4362                Number of Variables   :           3
S.D. dependent var  :     18.4661                Degrees of Freedom    :          46
R-squared           :      0.3495
Adjusted R-squared  :      0.3212
Sum squared residual:   10647.015                F-statistic           :     12.3582
Sigma-square        :     231.457                Prob(F-statistic)     :   5.064e-05
S.E. of regression  :      15.214                Log likelihood        :    -201.368
Sigma-square ML     :     217.286                Akaike info criterion :     408.735
S.E of regression ML:     14.7406                Schwarz criterion     :     414.411

------------------------------------------------------------------------------------
            Variable     C

In [22]:
print(handled.summary)

REGRESSION
----------
SUMMARY OF OUTPUT: ORDINARY LEAST SQUARES
-----------------------------------------
Data set            :     unknown
Weights matrix      :     unknown
Dependent Variable  :       HOVAL                Number of Observations:          49
Mean dependent var  :     38.4362                Number of Variables   :           3
S.D. dependent var  :     18.4661                Degrees of Freedom    :          46
R-squared           :      0.3495
Adjusted R-squared  :      0.3212
Sum squared residual:   10647.015                F-statistic           :     12.3582
Sigma-square        :     231.457                Prob(F-statistic)     :   5.064e-05
S.E. of regression  :      15.214                Log likelihood        :    -201.368
Sigma-square ML     :     217.286                Akaike info criterion :     408.735
S.E of regression ML:     14.7406                Schwarz criterion     :     414.411

-----------------------------------------------------------------------------

That means `HOVAL`, `CRIME`, and `INC` all get drawn out of the dataframe using patsy and pushed into arrays. This works for any model class, since we're just turning the equations into their constituent arrays.
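
For readers who have not seen the formula-to-arrays step before, here is a deliberately simplified sketch of the idea. It is a hand-rolled stand-in for what patsy does, not patsy itself: it only understands `lhs ~ a + b` and works on a plain dict of columns instead of a dataframe. The function name `formula_to_arrays` and the toy numbers are assumptions made for illustration.

```python
def formula_to_arrays(formula, data):
    """Split 'y ~ a + b' and pull the named columns out of a dict of lists.

    Returns (y, X, y_name, x_names), with y as an n x 1 nested list and
    X as an n x k nested list, mirroring the shapes passed to the estimators.
    """
    lhs, rhs = (side.strip() for side in formula.split("~"))
    x_names = [term.strip() for term in rhs.split("+")]
    n = len(data[lhs])
    y = [[data[lhs][i]] for i in range(n)]
    X = [[data[name][i] for name in x_names] for i in range(n)]
    return y, X, lhs, x_names


# Toy values, not the Columbus data.
toy = {
    "HOVAL": [10.0, 20.0, 30.0],
    "INC": [1.0, 2.0, 3.0],
    "CRIME": [4.0, 5.0, 6.0],
}
y_arr, X_arr, y_name, x_names = formula_to_arrays("HOVAL ~ INC + CRIME", toy)
print(y_name, x_names)
```

patsy additionally handles transformations, categorical terms, and intercepts, none of which this sketch attempts.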

Where there is a possible bikeshedding point is over the syntax for TSLS-type models. Right now, I have it specified in this way:

In [23]:
y = dbf.by_col_array(['CRIME'])
X = dbf.by_col_array(['INC'])
yend = dbf.by_col_array(['HOVAL'])
q = dbf.by_col_array(['DISCBD'])

In [25]:
tsls = ps.spreg.TSLS(y,X,yend,q,W)

In [26]:
handledtsls = h.Model(y,X,yend,q,W,mtype='TSLS')

In [30]:
handledtsls_eq = h.Model("CRIME ~ INC || HOVAL ~ DISCBD", W, data=df, mtype='TSLS')
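
Reading the formula in the cell above against the array version, `CRIME ~ INC` supplies `y` and `X`, while `HOVAL ~ DISCBD` supplies `yend` and `q`. A minimal sketch of that split could look like the following. The function name `split_tsls_formula` is hypothetical, and allowing `+` on every side is my assumption; this is not the handler's actual parser.

```python
def split_tsls_formula(formula):
    """Split 'y ~ x || yend ~ q' into the four groups of variable names."""

    def names(side):
        return [term.strip() for term in side.split("+")]

    exog_part, endog_part = formula.split("||")
    y_side, x_side = exog_part.split("~")
    yend_side, q_side = endog_part.split("~")
    return {
        "y": names(y_side),
        "x": names(x_side),
        "yend": names(yend_side),
        "q": names(q_side),
    }


parts = split_tsls_formula("CRIME ~ INC || HOVAL ~ DISCBD")
print(parts)
```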

In [33]:
print(tsls.summary)

REGRESSION
----------
SUMMARY OF OUTPUT: TWO STAGE LEAST SQUARES
------------------------------------------
Data set            :     unknown
Weights matrix      :     unknown
Dependent Variable  :     dep_var                Number of Observations:          49
Mean dependent var  :     35.1288                Number of Variables   :           3
S.D. dependent var  :     16.7321                Degrees of Freedom    :          46
Pseudo R-squared    :      0.2794

------------------------------------------------------------------------------------
            Variable     Coefficient       Std.Error     z-Statistic     Probability
------------------------------------------------------------------------------------
            CONSTANT      88.4657958      15.1346096       5.8452645       0.0000000
        endogenous_1      -1.5821659       0.7931892      -1.9946891       0.0460768
               var_1       0.5200379       1.4146781       0.3676016       0.7131703
------------------------

In [34]:
print(handledtsls.summary)

REGRESSION
----------
SUMMARY OF OUTPUT: TWO STAGE LEAST SQUARES
------------------------------------------
Data set            :     unknown
Weights matrix      :     unknown
Dependent Variable  :     dep_var                Number of Observations:          49
Mean dependent var  :     35.1288                Number of Variables   :           3
S.D. dependent var  :     16.7321                Degrees of Freedom    :          46
Pseudo R-squared    :      0.2794

------------------------------------------------------------------------------------
            Variable     Coefficient       Std.Error     z-Statistic     Probability
------------------------------------------------------------------------------------
            CONSTANT      88.4657958      15.1346096       5.8452645       0.0000000
        endogenous_1      -1.5821659       0.7931892      -1.9946891       0.0460768
               var_1       0.5200379       1.4146781       0.3676016       0.7131703
------------------------

In [35]:
print(handledtsls_eq.summary)

REGRESSION
----------
SUMMARY OF OUTPUT: TWO STAGE LEAST SQUARES
------------------------------------------
Data set            :     unknown
Weights matrix      :     unknown
Dependent Variable  :     dep_var                Number of Observations:          49
Mean dependent var  :     35.1288                Number of Variables   :           3
S.D. dependent var  :     16.7321                Degrees of Freedom    :          46
Pseudo R-squared    :      0.2794

------------------------------------------------------------------------------------
            Variable     Coefficient       Std.Error     z-Statistic     Probability
------------------------------------------------------------------------------------
            CONSTANT      88.4657958      15.1346096       5.8452645       0.0000000
        endogenous_1      -1.5821659       0.7931892      -1.9946891       0.0460768
               var_1       0.5200379       1.4146781       0.3676016       0.7131703
------------------------

Unfortunately, this leads us down an interesting road, as far as semantics are concerned: 

In [36]:
handledtsls_ignored = h.Model("CRIME ~ INC || HOVAL ~ DISCBD",'getsignored', W, 'getsignoredtoo', data=df, mtype='GM_Lag')

In [37]:
print(handledtsls_ignored.summary)

REGRESSION
----------
SUMMARY OF OUTPUT: SPATIAL TWO STAGE LEAST SQUARES
--------------------------------------------------
Data set            :     unknown
Weights matrix      :     unknown
Dependent Variable  :     dep_var                Number of Observations:          49
Mean dependent var  :     35.1288                Number of Variables   :           4
S.D. dependent var  :     16.7321                Degrees of Freedom    :          45
Pseudo R-squared    :      0.2377
Spatial Pseudo R-squared:  0.2477

------------------------------------------------------------------------------------
            Variable     Coefficient       Std.Error     z-Statistic     Probability
------------------------------------------------------------------------------------
            CONSTANT      96.5182979      52.5305917       1.8373731       0.0661548
           W_dep_var      -0.0148182       0.0857203      -0.1728667       0.8627562
        endogenous_1      -1.8097627       1.7652028      -

Those other positional arguments get ignored in the equation framework. So, you can't provide some variables in equations and some in vectors: you have to use either equations or vectors.

This is because I assume the equation is a complete specification of the model, and extract only the "W" from the rest of the positional arguments. All the rest are assumed to be needed by the wrapper instead. 
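
The behaviour in cell 36 can be mimicked with a small sketch. How the handler actually recognises the weights object is not shown in this notebook, so the `isinstance` check against a stand-in `W` class below is purely an assumption, as are the names `W` and `pick_weights`.

```python
class W:
    """Hypothetical stand-in for a pysal spatial weights object."""


def pick_weights(args):
    """Treat args[0] as the formula and keep only the first W among the rest."""
    formula, rest = args[0], args[1:]
    weights = [a for a in rest if isinstance(a, W)]
    ignored = [a for a in rest if not isinstance(a, W)]
    return formula, (weights[0] if weights else None), ignored


w = W()
formula, found, ignored = pick_weights(
    ("CRIME ~ INC || HOVAL ~ DISCBD", "getsignored", w, "getsignoredtoo")
)
print(found is w, ignored)
```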

Right now, I'm working on getting this swapped into the testing framework. Since I flatten the `dir` of the pysal call into the handler, it should function exactly the same as a pysal model. In fact, you may be able to rewrite a magic method to fool Python into thinking `Model.__class__ == Model._called.__class__`, but I don't think that would be smart. For a model, you need to check the type of the underlying model rather than the type of the wrapper.
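
Here is a small sketch of what that type check looks like, and of the `__class__` trick I'd rather avoid. All three classes are hypothetical stand-ins; the only thing taken from the handler is the `_called` attribute.

```python
class OLS:
    """Hypothetical stand-in for an estimator class."""


class Model:
    """Minimal wrapper that keeps the real model under _called."""

    def __init__(self):
        self._called = OLS()


class SpoofedModel(Model):
    """Reports the wrapped model's class through __class__ (not recommended)."""

    @property
    def __class__(self):
        return type(self._called)


handled = Model()
spoofed = SpoofedModel()

# The honest check looks at the underlying model.
print(isinstance(handled, OLS), isinstance(handled._called, OLS))

# isinstance consults __class__ as a fallback, but type() is not fooled.
print(isinstance(spoofed, OLS), type(spoofed) is SpoofedModel)
```

With the spoofed version, `isinstance` and `type` disagree about what the object is, which is the kind of surprise that makes me think checking `_called` directly is the safer convention.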