[ENH]: option to return series from lag_spatial #512

knaaptime · 2023-01-20T01:22:55Z

roundtripping a W through an adjacency list will create a valid but re-ordered matrix. That is, all the relationships are preserved with the proper weights, but the ordering of the keys may be different from the index of the dataframe it was created from. In general thats not a problem because all of the internal logic of the W still works fine.

But creating a spatial lag from a combo like this (where W and dataframe are paired but in different orders) will generate a series of values that no longer match the df. A simple fix (and a useful enhancement more broadly) would be to add a kwarg to lag_spatial that optionally returns a pandas series indexed on W.neighbors.keys. Should make things a little easier and more explicit without breaking backwards compat

The text was updated successfully, but these errors were encountered:

martinfleis · 2023-01-20T08:20:52Z

mmmm, I understand the use case. but I'm generally not a fan of a single function returning different objects based on args. It drives me crazy in cuML which returns an object based on an input (so if a cudf object is given it returns cudf, if cupy it returns cupy array...). It sounds like a nice idea but when creating something on top of that, it is a massive pain that you don't know which object comes back and you end up with endless if/else mess.

Controlling this behaviour with a keyword is slightly more predictable but still feels weird. Can we just ensure that the array is properly sorted? Or opt-in for always pandas but that deviates from most of PySAL.

knaaptime · 2023-01-20T15:29:57Z

id be curious to hear what the other devs think. Currently the array is properly sorted (according to the ordering of the W), but that can be very confusing if users dont realize their weights and dfs are not aligned. We cant always give back a series because it will break backwards compatibility

(so if a cudf object is given it returns cudf, if cupy it returns cupy array...). It sounds like a nice idea but when creating something on top of that, it is a massive pain that you don't know which object comes back and you end up with endless if/else mess.

in general im with you, but this feels like a very different case where the return is explicit--you need to ask for a series to be provided a series (and the default is false). So building on top feels pretty stable to me? We use this pattern throughout the library (e.g. alpha shapes have at least three potential returns

knaaptime · 2023-01-20T15:52:12Z

to be clear, i think it would be preferable to always return a series, but i know thats a non-starter for legacy code

jGaboardi · 2023-01-21T16:31:38Z

As a tangential discussion on alpha_shapes see geopandas/geopandas#1739

sjsrey · 2023-01-23T03:09:55Z

My sense is we have, historically, decided to assume that in lag_spatial(w, y), w and y are properly aligned. That is left for the user to determine, not lag_spatial.

That said, we haven't really provided much in the way of help for the user to be able to check that assumption or to reorder y if they need to do so.

With the new API, the keys of the w.neighbors are now ordered, so if y is a series, we could add some internals to lag_spatial to check for alignment and raise a warning if w and y were not aligned, and/or a kw arg to align y prior to calculation of the lag.

I'm generally not in favor of a function returning different return types based on different kw settings, so even with the new suggestions above, I would keep the return from lag_spatial as an array.

knaaptime closed this as completed Jan 30, 2023

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[ENH]: option to return series from lag_spatial #512

[ENH]: option to return series from lag_spatial #512

knaaptime commented Jan 20, 2023

martinfleis commented Jan 20, 2023

knaaptime commented Jan 20, 2023

knaaptime commented Jan 20, 2023

jGaboardi commented Jan 21, 2023

sjsrey commented Jan 23, 2023

[ENH]: option to return series from lag_spatial #512

[ENH]: option to return series from lag_spatial #512

Comments

knaaptime commented Jan 20, 2023

martinfleis commented Jan 20, 2023

knaaptime commented Jan 20, 2023

knaaptime commented Jan 20, 2023

jGaboardi commented Jan 21, 2023

sjsrey commented Jan 23, 2023