# What is the handler?

The `spreg-handler` provides a unified interface for applying any specified regression function in pysal to data, much like a call to `lm` in `R`:

In [9]:
import handler as h
import pysal as ps

In [10]:
dbf = ps.pdio.read_files(ps.examples.get_path('columbus.dbf'))
y = dbf[['HOVAL']].values
X = dbf[['INC', 'CRIME']].values
W = ps.open(ps.examples.get_path('columbus.gal')).read()

In [11]:
original = ps.spreg.OLS(y,X,W, name_x=['INC', 'CRIME'], name_y='HOVAL')

The handler's default model is `OLS`. So, for a model of type `OLS`, no extra argument needs to be passed. However, for the sake of clarity, I'll pass the model specification argument, `mtype`.

In [12]:
formulaic = h.Model('HOVAL ~ INC + CRIME', data=dbf)

In [13]:
handled = h.Model(y,
                  X,
                  W,
                  name_x=['INC', 'CRIME'], 
                  name_y='HOVAL', 
                  mtype='OLS')

In [15]:
print(original.summary)

REGRESSION
----------
SUMMARY OF OUTPUT: ORDINARY LEAST SQUARES
-----------------------------------------
Data set            :     unknown
Weights matrix      :     unknown
Dependent Variable  :       HOVAL                Number of Observations:          49
Mean dependent var  :     38.4362                Number of Variables   :           3
S.D. dependent var  :     18.4661                Degrees of Freedom    :          46
R-squared           :      0.3495
Adjusted R-squared  :      0.3212
Sum squared residual:   10647.015                F-statistic           :     12.3582
Sigma-square        :     231.457                Prob(F-statistic)     :   5.064e-05
S.E. of regression  :      15.214                Log likelihood        :    -201.368
Sigma-square ML     :     217.286                Akaike info criterion :     408.735
S.E of regression ML:     14.7406                Schwarz criterion     :     414.411

-----------------------------------------------------------------------------

In [7]:
print(handled.summary)

REGRESSION
----------
SUMMARY OF OUTPUT: ORDINARY LEAST SQUARES
-----------------------------------------
Data set            :     unknown
Weights matrix      :        None
Dependent Variable  :       HOVAL                Number of Observations:          49
Mean dependent var  :     38.4362                Number of Variables   :           3
S.D. dependent var  :     18.4661                Degrees of Freedom    :          46
R-squared           :      0.3495
Adjusted R-squared  :      0.3212
Sum squared residual:   10647.015                F-statistic           :     12.3582
Sigma-square        :     231.457                Prob(F-statistic)     :   5.064e-05
S.E. of regression  :      15.214                Log likelihood        :    -201.368
Sigma-square ML     :     217.286                Akaike info criterion :     408.735
S.E of regression ML:     14.7406                Schwarz criterion     :     414.411

-----------------------------------------------------------------------------

In [8]:
print(formulaic.summary)

REGRESSION
----------
SUMMARY OF OUTPUT: ORDINARY LEAST SQUARES
-----------------------------------------
Data set            :     unknown
Weights matrix      :        None
Dependent Variable  :      HOVAL                 Number of Observations:          49
Mean dependent var  :     38.4362                Number of Variables   :           3
S.D. dependent var  :     18.4661                Degrees of Freedom    :          46
R-squared           :      0.3495
Adjusted R-squared  :      0.3212
Sum squared residual:   10647.015                F-statistic           :     12.3582
Sigma-square        :     231.457                Prob(F-statistic)     :   5.064e-05
S.E. of regression  :      15.214                Log likelihood        :    -201.368
Sigma-square ML     :     217.286                Akaike info criterion :     408.735
S.E of regression ML:     14.7406                Schwarz criterion     :     414.411

-----------------------------------------------------------------------------

## How does it work?

The long and short of it is that `Model` classes pass estimation to the function specified in `mtype`, and then contain the results in a reasonable way. 

In fact, the "real" `PySAL` model class sits under `handled._called`, so, at worst, we can just reference aspects of `handled` down to `_called`. I currently do this by iterating through `dir(handled._called)` and using `eval` to flatten all of `_called`'s attributes into `handled` at initialization. 
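A minimal sketch of that flattening idea, under loud assumptions: `Inner` is a hypothetical stand-in for a fitted PySAL model, `Wrapper` is not the actual `Model` code, and I use `getattr`/`setattr` here rather than the `eval` approach described above.

```python
class Inner:
    """Hypothetical stand-in for a fitted PySAL model (illustration only)."""
    def __init__(self, y, x):
        self.y = y
        self.x = x
        self.n = len(y)


class Wrapper:
    """Keeps the inner model in `_called` and mirrors its attributes."""
    def __init__(self, *args, **kwargs):
        self._called = Inner(*args, **kwargs)
        for name in dir(self._called):
            # Skip dunder names so the wrapper keeps its own special methods.
            if name.startswith("__"):
                continue
            # setattr binds a second name to the same object; nothing is copied.
            setattr(self, name, getattr(self._called, name))


y = [1.0, 2.0, 3.0]
x = [[1.0], [2.0], [3.0]]
wrapped = Wrapper(y, x)
print(wrapped.n)                       # 3
print(wrapped.y is wrapped._called.y)  # True
```

The `is` check at the end is the point: the outer attribute and the inner attribute are the same object, not two copies.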

But, eventually, I am thinking about adding plotting, visual diagnostics, out-of-sample prediction, or other stuff to this wrapper. So, I will probably not duplicate the access points for intermediate computations, like `X'X`, `e`, or `TSLS`'s arcane-sounding `zthhthi`.

I'd like to clean up this `Model` interface so that only X, Y, residuals, and some statistics are directly exposed. 

Keep in mind, since [assignment **never** copies data](https://youtu.be/_AEJHKGk9ns?t=296), and the original model sits in `handled._called`, this isn't actually a *loss* of information, just a *hiding*, which is a standard OOP principle. 

### Isn't this wastefully storing multiple copies of data in memory?

No. Let's see where everything lives using the Python built-in `id` function.

Recall that the original model is stuffed into `Model._called`. So, if anything in there has a different memory address from what's being displayed by `Model`, the data is duplicated:

In [None]:
for atname in dir(handled._called):
    # getattr looks an attribute up by name, so eval isn't needed
    attr = getattr(handled._called, atname)
    composed_id = hex(id(attr))
    outattr = getattr(handled, atname)
    outer_id = hex(id(outattr))
    # differing ids mean the wrapper holds a separate object
    if composed_id != outer_id:
        print(atname + " is in two different addresses.")
        print("\t Outer is at " + outer_id + "\n\t Inner is at " + composed_id)

Only the double underscore functions exposed by `Model` are different from `Model._called`.

If we wanted to access `Model._called.__init__`, it's still there. This means we could implement some "refit" method, `Model.refit(y=Model.y, X=Model.X, ...)`, which could use `Model._called.__init__` to revise the estimates in `Model` in place or to return a new model.

I don't know why we might want to do this, but it's kinda neat :)
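For what it's worth, here is a rough sketch of what such a refit could look like. Everything in it is hypothetical: `MeanModel` is a toy estimator whose "fit" is just the mean of `y`, and this `Model` is a stripped-down stand-in, not the handler's real class. No `refit` method exists in the wrapper today.

```python
class MeanModel:
    """Toy estimator (hypothetical): the 'fit' is just the mean of y."""
    def __init__(self, y):
        self.y = y
        self.estimate = sum(y) / len(y)


class Model:
    """Stripped-down wrapper that only knows how to hold and refit."""
    def __init__(self, y):
        self._called = MeanModel(y)

    def refit(self, y=None, inplace=True):
        # Default to the data the inner model was fitted on.
        y = self._called.y if y is None else y
        if inplace:
            # Re-run the inner constructor on the existing object.
            self._called.__init__(y)
            return self
        # Otherwise leave this model alone and build a fresh one.
        return Model(y)


m = Model([1.0, 2.0, 3.0])
m.refit([2.0, 4.0, 6.0])
print(m._called.estimate)  # 4.0
```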

# What does this buy us?

Regardless, everything the wrapping `Model` class does is wrapped *around* the underlying PySAL classes. That is, the wrapper would only inject commands into the API. At minimum, it *is exactly* the underlying class.

This is because it dispatches the arguments to the specified model type without knowing any special information about the function call.  
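Name-based dispatch of this kind can be sketched in a few lines. This is an assumption-laden illustration, not the wrapper's source: `registry` is a hypothetical namespace standing in for `ps.spreg`, and the `OLS` and `ML_Lag` classes below are toys that only record what they were given.

```python
import types


class OLS:
    """Toy estimator (not PySAL's): records the arguments it receives."""
    def __init__(self, y, x, w=None, **kwargs):
        self.args = (y, x, w)
        self.kwargs = kwargs


class ML_Lag(OLS):
    """Second toy estimator, so there is something to dispatch between."""


# Hypothetical stand-in for the `ps.spreg` namespace.
registry = types.SimpleNamespace(OLS=OLS, ML_Lag=ML_Lag)


def dispatch(*args, mtype="OLS", **kwargs):
    """Look `mtype` up by name and forward everything else untouched."""
    estimator = getattr(registry, mtype)
    return estimator(*args, **kwargs)


fit = dispatch([1, 2], [[1], [2]], mtype="ML_Lag", name_y="HOVAL")
print(type(fit).__name__)  # ML_Lag
```

Because `dispatch` never inspects `args` or `kwargs`, supporting a new model type needs no change to the wrapper itself.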

This means we can do some pretty cool things, while keeping the actual wrapper at ~40 LoC!

In [None]:
ML = ps.spreg.ML_Lag(y,X,W)

In [None]:
handled_ML = h.Model(y,X,W,mtype='ML_Lag')

In [None]:
print(ML.summary)

In [None]:
print(handled_ML.summary)

### Intercepting formulas

So, this is pretty neat, but gives us nothing beyond using *one* function to dispatch models. That's cool and R-like, but it's not necessarily better. Where it does add functionality is in its ability to intercept model formulas.

In [None]:
df = ps.pdio.read_files(ps.examples.get_path('columbus.dbf'))
handled_eq = h.Model("HOVAL ~ INC + CRIME", data=df)

In [None]:
print(handled_eq.summary)

That means `HOVAL`, `CRIME`, and `INC` all get drawn out of the dataframe using patsy and pushed into arrays. This works for any class, since we're just turning the equations into their constituent arrays.

Where there is a possible bikeshedding point is over the syntax for TSLS-type models. Right now, I have it specified with (what I think is) a clear syntax reflecting the simultaneous equations approach:

`y ~ x1 + x2 || yend ~ xend1 + xend2`

implies an equation where your exogenous relationship is `y ~ x1 + x2` and your endogenous relationship is `yend ~ xend1 + xend2`. 

For any simultaneous equation-type model, I would suggest using double pipe as the separator. Under the hood, I'm just using `string.split('||')`, since patsy doesn't use the double pipe. 
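To make the split concrete, here is a simplified sketch. `split_formula` is a hypothetical helper, not the handler's function, and it separates variable names naively on `~` and `+` where the real wrapper hands each half to patsy.

```python
def split_formula(formula):
    """Split 'y ~ x1 + x2 || yend ~ q1 + q2' into one pair per equation.

    Returns a list of (lhs_names, rhs_names) tuples.
    """
    equations = []
    for part in formula.split("||"):
        # Each half is an ordinary 'lhs ~ rhs' formula.
        lhs, rhs = part.split("~")
        lhs_names = [name.strip() for name in lhs.split("+")]
        rhs_names = [name.strip() for name in rhs.split("+")]
        equations.append((lhs_names, rhs_names))
    return equations


print(split_formula("CRIME ~ INC || HOVAL ~ DISCBD"))
# [(['CRIME'], ['INC']), (['HOVAL'], ['DISCBD'])]
```

A formula without `||` comes back as a single pair, so the same path would serve both the plain and the two-equation case.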

In [None]:
# dbf is a pandas DataFrame here, so select columns and take .values
y = dbf[['CRIME']].values
X = dbf[['INC']].values
yend = dbf[['HOVAL']].values
q = dbf[['DISCBD']].values

In [None]:
tsls = ps.spreg.TSLS(y,X,yend,q,W)

In [None]:
handledtsls = h.Model(y,X,yend,q,W,mtype='TSLS')

In [None]:
handledtsls_eq = h.Model("CRIME ~ INC || HOVAL ~ DISCBD", w=W, data=df, mtype='TSLS')

In [None]:
print(tsls.summary)

In [None]:
print(handledtsls.summary)

In [None]:
print(handledtsls_eq.summary)

This would also enable adding plotting capabilities to spatial regression models, like the standard four-plot output from plotting an `lm` in `R`, and they wouldn't have to be hacked into each and every model class.