Add Wilkinson formula interface and scikit-learn style estimators #103

tdhoffman · 2022-09-19T18:20:55Z

This pull request supersedes #102. It adds the following features:

spatial and nonspatial Wilkinson formulas (spreg/formula.py)
a submodule for spreg called spreg.sklearn that includes scikit-learn interfaces to Error, Lag, DurbinError, and DurbinLag models, as well as Anselin LM tests in the scikit-learn metrics style. This submodule has its own formula interface independent of spreg.from_formula that properly dispatches to the sklearn models.
two notebooks that demonstrate the usage and value of these features (spreg/notebooks/formula_example.ipynb and spreg/notebooks/sklearn_example.ipynb)
unit tests for all added code (spreg/tests for formulas, spreg/sklearn/tests for the
documentation and doctests for all added code
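
For readers unfamiliar with Wilkinson notation, the following is a minimal sketch of how a nonspatial formula such as `y ~ x1 + x2` maps onto a response and a design matrix. The helpers `split_formula` and `design_columns` are hypothetical names invented for this illustration, the data values are made up, and none of this is the parser in spreg/formula.py. The spatial operators added by this PR are not covered.

```python
# Illustrative only: hypothetical helpers, not the code in spreg/formula.py.

def split_formula(formula):
    """Split a formula like 'y ~ x1 + x2' into (response, [terms])."""
    if formula.count("~") != 1:
        raise ValueError("expected exactly one '~' in the formula")
    lhs, rhs = formula.split("~")
    response = lhs.strip()
    terms = [term.strip() for term in rhs.split("+") if term.strip()]
    if not response or not terms:
        raise ValueError("formula needs a response and at least one term")
    return response, terms


def design_columns(data, formula):
    """Pull y and the rows of X named by the formula out of a dict of lists."""
    response, terms = split_formula(formula)
    y = list(data[response])
    # One row per observation, one entry per right-hand-side term.
    X = [[data[term][i] for term in terms] for i in range(len(y))]
    return y, X, terms


# Toy values, made up for the example.
data = {"y": [1.0, 2.0, 3.0], "x1": [10.0, 20.0, 30.0], "x2": [0.5, 0.1, 0.9]}
y, X, names = design_columns(data, "y ~ x1 + x2")
print(names)  # ['x1', 'x2']
print(X[0])   # [10.0, 0.5]
```

The point of the notation is that the user names columns once, in a string, and the library builds `y` and `X` from them.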

Pursuant to this comment on #101, everything in this PR is fully backwards compatible. These are add-ons to the package which sit on top of existing code and provide alternative ways for users to interface with the core functionality of spreg. To ensure the changes are not breaking, the spreg.sklearn submodule must be directly loaded by users (e.g. via import spreg.sklearn) as it will not be automatically imported otherwise.

Currently, the formula interface function dispatches to all of the spatial regression models. While it does not support regime regression or seemingly unrelated regression, it provides a solid base of code to which these features can be added.

More information about the design process for these features can be found in my GSoC 2022 Progress Journal, and of course I'm more than happy to answer any questions and make any changes!

TaylorOshan · 2023-10-16T23:59:39Z

@jGaboardi just curious, was there any off-line discussion about this before closing? IIRC, this was a pretty good effort towards adding the formula capabilities that we have talked about for a while.

lanselin · 2023-10-17T00:25:25Z

I'm not a fan of the Wilkinson formula. For any but the simplest models it creates all kinds of issues, e.g., how to introduce WX variables without having to explicitly list them. For example, compare to the (in my opinion) overly complicated interface in spatialreg in R to accommodate all the situations related to spatial Durbin, endogenous variables, instruments, etc. Btw, instruments are never part of an actual formula but are added on by means of these (again, in my opinion) very awkward | | options. While not ideal, I very much like the most recent setup (1.4), which allows for specification of a range of models such as SLX, spatial Durbin, SLX error and even GNS, using GMM_Error and the slx_lags and add_wy arguments. The WX and Wy variables never need to be listed explicitly but are computed (and variable names created) under the hood. The fewer variable names need to be passed, the fewer opportunities for typos. My golden rule in life :-)
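
As a rough illustration of what "computed under the hood" means here, the sketch below builds spatially lagged covariates and their names from a weights matrix, so the caller only ever names the original variables. The function names, the dense nested-list representation of W, and the `W_` / `W2_` naming scheme are all assumptions made for this example; this is not spreg's implementation, and the `slx_lags` parameter only mirrors the argument name mentioned above.

```python
# Illustrative only: hypothetical helpers, not spreg internals.

def spatial_lag(W, column):
    """Return the product W @ column for a dense W given as nested lists."""
    return [sum(w_ij * value for w_ij, value in zip(row, column)) for row in W]


def with_lagged_columns(W, columns, slx_lags=1):
    """Return a dict holding the original columns plus W^k x for k = 1..slx_lags.

    Lagged names are generated automatically (assumed scheme: W_x, W2_x, ...).
    """
    out = {name: list(values) for name, values in columns.items()}
    for name, values in columns.items():
        lagged = list(values)
        for k in range(1, slx_lags + 1):
            lagged = spatial_lag(W, lagged)
            prefix = "W_" if k == 1 else f"W{k}_"
            out[prefix + name] = lagged
    return out


# Three units on a line, row-standardized weights (toy example).
W = [
    [0.0, 1.0, 0.0],
    [0.5, 0.0, 0.5],
    [0.0, 1.0, 0.0],
]
cols = with_lagged_columns(W, {"x": [1.0, 2.0, 4.0]}, slx_lags=1)
print(list(cols))   # ['x', 'W_x']
print(cols["W_x"])  # [2.0, 2.5, 2.0]
```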

I think our efforts are better spent on enhancing functionality in supporting different models and estimation methods. I was never a fan of this in the first place, not sure why it even became a GSoC project ...

jGaboardi · 2023-10-17T00:26:10Z

@TaylorOshan I did not close this... Perhaps I screwed something up with the pysal:master -> pysal:main switch. All open PRs were supposed to be automatically updated, but it looks like something funky happened...

jGaboardi · 2023-10-17T00:32:55Z

@TaylorOshan I very much apologize for the screw up. I have never seen this happen with master->main switch. Good news is that @tdhoffman's branch still has all the work. Shall we create a new PR from that?

lanselin · 2023-10-17T01:17:40Z

I say no. Neither Pedro nor I like this approach. As long as I am bdfl for spreg, I nix it. The sklearn interface is built for prediction, spreg is mostly about inference, so it actually turns out to be very awkward.
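
To make the prediction-versus-inference distinction concrete, here is a minimal sketch of the scikit-learn estimator convention: `fit` returns the estimator itself, learned parameters are stored in attributes with a trailing underscore, and the public surface centers on `predict` and `score`. `TinyLinearRegressor` is a hypothetical class written for this illustration (a one-variable least-squares fit in plain Python); it is not code from the spreg.sklearn submodule.

```python
# Illustrative only: a hypothetical estimator following scikit-learn conventions.

class TinyLinearRegressor:
    """One-variable least squares with a fit/predict/score surface."""

    def fit(self, x, y):
        n = len(x)
        mean_x = sum(x) / n
        mean_y = sum(y) / n
        sxx = sum((xi - mean_x) ** 2 for xi in x)
        sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
        # Learned parameters use trailing underscores, as in scikit-learn.
        self.coef_ = sxy / sxx
        self.intercept_ = mean_y - self.coef_ * mean_x
        return self  # returning self allows chaining: est.fit(x, y).predict(...)

    def predict(self, x):
        return [self.intercept_ + self.coef_ * xi for xi in x]

    def score(self, x, y):
        """Coefficient of determination R^2 on the given data."""
        mean_y = sum(y) / len(y)
        ss_res = sum((yi - pi) ** 2 for yi, pi in zip(y, self.predict(x)))
        ss_tot = sum((yi - mean_y) ** 2 for yi in y)
        return 1.0 - ss_res / ss_tot


est = TinyLinearRegressor().fit([0.0, 1.0, 2.0, 3.0], [1.0, 3.0, 5.0, 7.0])
print(est.coef_, est.intercept_)  # 2.0 1.0
print(est.predict([4.0]))         # [9.0]
```

Nothing in this surface reports standard errors, test statistics, or a printed summary, which is the kind of output an inference-oriented workflow relies on.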

lanselin · 2023-10-17T01:26:49Z

Just to make clear, I would like it to be closed. We are done with that.

pedrovma · 2023-10-17T10:31:20Z

The many additional combinations of specifications introduced by v1.4 make this outdated. As @lanselin pointed out, these formulas can't address them all easily, especially when combined with regimes and/or seemingly unrelated regressions. I believe the suggestion made by @martinfleis in #127 is a more interesting approach to simplify the function calls given the changes introduced by v1.4. Regardless, adding other functionalities we have already mapped, such as additional tests and the computation of direct/indirect effects, seems to me to be more of a priority now than enhancing the arguments' structure.

TaylorOshan · 2023-10-17T15:50:40Z

Understood @lanselin. Thanks for the additional feedback @pedrovma. My intention was not to advocate for this in spreg. There were several conversations during the GSoC period about formulas more generally and in the context of other modules, and I was curious whether this would be worth preserving as a proof of concept in case it is useful elsewhere in the future. Thanks to @jGaboardi for pointing out that the original branch is intact, which takes care of that.

tdhoffman added 30 commits July 20, 2022 14:12

add new API code

e81a14e

test commit

6709076

revert edited helper files

73c2fd7

reformat formulas to work with existing API

615b94a

clean up formula code and double check compatibility

305381f

polish testing code

a05e847

remove emacs backup files

893772f

edit docs

4f4a0af

update docs

8ea2a34

exclude intercept column and clean up example

363b6e2

update __init__ and add debug option to formula

5cac771

handful of small updates

1697cc3

minor fix

aa48c27

small fixes

b56c9c3

add combos and example notebook

01aba4c

default to adding X and WX in a lag

17c84a4

add doctest to formula.py

fd62497

add demo notebook, streamline variable names, polish documentation

63d0308

add unit tests

4720231

add dispatching to skedastic and endogenous error models

d8bd94a

add unit tests for skedastic errors and combo skedastic errors

d92085c

update docs and demo from old info

95288db

revert modified files

ae25f3f

finish testing code

aae3e76

begin sklearn restructure

a5c8787

create sklearn gm lag and error

876de09

gm estimation for prop_err and prop_lag

705149e

implemented ML for error but does not match yet

b0b0dd8

add ML to Lag

04b6b11

remove undo trees

32cffcb

tdhoffman and others added 17 commits August 31, 2022 13:30

edit sklearn/formula.pyy for compatibility

425642c

finish LM tests

6898910

add sklearn interface demo

4df7449

add formulas and remove sklearn automatic import

eaa63a8

finish LM tests and update api.api.rst for building docs

3f2ab95

add docs, doctests, and unittests for lm_tests

950eff7

add error docs+doctests, edit lm_tests doctest

95b7983

add unittests for spreg.sklearn.from_formula

8eb7bb8

finish docs and doctests for lag, durbin error, and durbin lag

70ac78a

add error unittests

9d5a621

add lag tests

ac06131

add durbin error tests, begin durbin lag

961c0a5

add durbin lag unit tests

f10f4ac

add intercepts to decision functnctions

2250c5e

finish sklearn_example.ipynb

8ea3e45

roll back diagnostics_panel.py

f3abe04

properly roll back diagnostics_panel.py

0c3c11f

jGaboardi mentioned this pull request Oct 28, 2022

Add support for spatial and nonspatial Wilkinson formulas #102

Closed

jGaboardi added the GSOC2022 label Oct 28, 2022

jGaboardi mentioned this pull request Oct 28, 2022

change default branch from master to main #106

Closed

jGaboardi deleted the branch pysal:master October 5, 2023 21:49

jGaboardi closed this Oct 5, 2023
