PEP 14: PySAL Integrations
When PySAL gets feedback from users, there tend to be two main points:
- "This isn't well integrated with the way I work with Python and the packages I use (beyond `numpy`)."
- "The documentation is comprehensive, but difficult to 'discover.' I've worked on a few related problems and had no clue that X or Y solution was implemented already in PySAL."
This is often driven by users who follow our example code. In that code, weights are built directly from a shapefile, columns are pulled from the database as flat numpy arrays, and then everything is passed to computational classes. In the past, we've written examples this way because they depend only on the internally-consistent ecosystem of functions, data models, and classes we've built. And we can do this, since we've built tools to get from IO to analysis and back. Presenting this angle exclusively, however, makes it appear that you have to use our stuff up and down the stack, rather than showing how well the library integrates with the rest of the ecosystem.
Integrations are methods, functions, classes, or examples that show how a user might get into analysis from not-PySAL or out of analysis to not-PySAL.
Thus, we can directly address the first point and try to resolve the second by showing off our integrations. A few ideas about how to both show our current integrations and make new ones consistent are discussed below.
- For new example code, try to prefer a pandas solution:
  - IO: `pdio` over `open('filepath.shp').by_col_array()`
  - Masking data before regression/ESDA computations (`pd.dropna()`, `pd.replace(np.nan, ...)`)
  - Construction of fixed-effects/regime weights (`pd.get_dummies`)
  - Instead of making a weights object and then post-processing it using `Wsets`, make the weights on the fly from a dataframe munged in Python. Subsetting to a list of IDs becomes constructing directly from that subset: `weights.Rook.from_dataframe(df.query('ID in @filterlist'))`. Intersection becomes constructing directly from an intersected dataset: `weights.KNN.from_dataframe(pd.merge(df1, df2, how='inner'))`
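To make the pandas-first style concrete, here is a minimal sketch of the dataframe-munging steps above. The dataframe `df` and the `filterlist` are hypothetical example data; the filtered frame is what would then be handed to something like `weights.Rook.from_dataframe`.

```python
import pandas as pd

# Hypothetical example data: observations with an ID and a regime label.
df = pd.DataFrame({
    "ID": [1, 2, 3, 4],
    "y": [2.0, 3.5, 1.2, 4.8],
    "regime": ["north", "north", "south", "south"],
})

# Fixed-effects/regime dummies straight from the dataframe.
dummies = pd.get_dummies(df["regime"], prefix="regime")

# Masking data before a regression/ESDA computation.
clean = df.dropna(subset=["y"])

# Subsetting to a list of IDs before weights construction; the result
# would be passed to e.g. weights.Rook.from_dataframe(subset).
filterlist = [1, 2, 3]
subset = df.query("ID in @filterlist")
```

The point is that all the pre-processing stays in ordinary pandas, and PySAL only enters at the last step.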
- Pushing for visibility/code contributions & integration where possible
- Continuing to improve and extend classmethods, like `weights.Rook.from_dataframe`, will help extend our API. This means that, in quite a few cases, we can do new things without breaking the API directly. Classes can gain alternative "paths" into their `__init__` function with `.from_*`:
>>> LMTests.from_statsmodels(my_statsmodels_regression.fit())
>>> GM_Het_Combo.from_formula('patsy ~ regressors')
>>> LISA_Markov.from_timeseries(time_indexed_pandas_series)
>>> W.from_networkx(my_special_weights)
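The `.from_*` pattern is just an alternate constructor implemented as a classmethod that massages foreign data into the shape `__init__` already expects. Below is a simplified sketch of the pattern; `ToyW` and `from_adjlist` are toy stand-ins for illustration, not the library's implementation.

```python
import pandas as pd

class ToyW:
    """Toy stand-in for a weights class, illustrating the .from_* pattern."""

    def __init__(self, neighbors):
        # The "native" constructor: a dict mapping each id to its neighbors.
        self.neighbors = neighbors
        self.n = len(neighbors)

    @classmethod
    def from_adjlist(cls, df, focal="focal", neighbor="neighbor"):
        # Alternative "path" into __init__: build the neighbors dict
        # from a tidy adjacency-list dataframe, then delegate.
        grouped = df.groupby(focal)[neighbor].apply(list).to_dict()
        return cls(grouped)

# Usage: start from a plain pandas adjacency list, not PySAL objects.
adj = pd.DataFrame({"focal": [0, 0, 1, 2],
                    "neighbor": [1, 2, 0, 0]})
w = ToyW.from_adjlist(adj)
```

Because the classmethod delegates to the existing `__init__`, adding it never breaks callers of the original constructor.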
Classes can continue to gain alternative "export" options with `.to_*`:
>>> ML_Lag(Y, X, W).to_statsmodels()  # returns ResultsWrapper
>>> OLS(Y, X, W).to_file('./my_regression.txt')  # writes summary out to file
>>> Moran_Local(X, W).to_frame()  # returns a dataframe of I, p-value
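Symmetrically, a `.to_*` method is just a small adapter that packages the result object's attributes for another ecosystem. A minimal sketch of what a `to_frame` might look like; `MoranLocalLike` and its attributes are hypothetical stand-ins, not the actual PySAL class.

```python
import pandas as pd

class MoranLocalLike:
    """Hypothetical stand-in for a local-autocorrelation result object."""

    def __init__(self, Is, p_sim):
        self.Is = Is        # local statistics
        self.p_sim = p_sim  # pseudo p-values

    def to_frame(self):
        # Export into the pandas ecosystem instead of leaving users
        # to assemble the arrays by hand.
        return pd.DataFrame({"I": self.Is, "p_sim": self.p_sim})

res = MoranLocalLike([0.5, -0.1], [0.02, 0.4])
frame = res.to_frame()
```

Users who live in pandas can then filter, join, or plot the results with their usual tools.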
Or add commonly-used visualizations directly to the classes using soft dependencies.
>>> Moran_Local(X, W).plot()  # passes to a preconfig'd geoplot
>>> W.plot()  # depends on matplotlib and bails if unavailable
This will let us extend the API without introducing breaking changes.