Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[ENH]: option to return series from lag_spatial #512

Closed
knaaptime opened this issue Jan 20, 2023 · 5 comments
Closed

[ENH]: option to return series from lag_spatial #512

knaaptime opened this issue Jan 20, 2023 · 5 comments

Comments

@knaaptime
Copy link
Member

roundtripping a W through an adjacency list will create a valid but re-ordered matrix. That is, all the relationships are preserved with the proper weights, but the ordering of the keys may be different from the index of the dataframe it was created from. In general thats not a problem because all of the internal logic of the W still works fine.

But creating a spatial lag from a combo like this (where W and dataframe are paired but in different orders) will generate a series of values that no longer match the df. A simple fix (and a useful enhancement more broadly) would be to add a kwarg to lag_spatial that optionally returns a pandas series indexed on W.neighbors.keys. Should make things a little easier and more explicit without breaking backwards compat

@martinfleis
Copy link
Member

mmmm, I understand the use case. but I'm generally not a fan of a single function returning different objects based on args. It drives me crazy in cuML which returns an object based on an input (so if a cudf object is given it returns cudf, if cupy it returns cupy array...). It sounds like a nice idea but when creating something on top of that, it is a massive pain that you don't know which object comes back and you end up with endless if/else mess.

Controlling this behaviour with a keyword is slightly more predictable but still feels weird. Can we just ensure that the array is properly sorted? Or opt-in for always pandas but that deviates from most of PySAL.

@knaaptime
Copy link
Member Author

id be curious to hear what the other devs think. Currently the array is properly sorted (according to the ordering of the W), but that can be very confusing if users dont realize their weights and dfs are not aligned. We cant always give back a series because it will break backwards compatibility

(so if a cudf object is given it returns cudf, if cupy it returns cupy array...). It sounds like a nice idea but when creating something on top of that, it is a massive pain that you don't know which object comes back and you end up with endless if/else mess.

in general im with you, but this feels like a very different case where the return is explicit--you need to ask for a series to be provided a series (and the default is false). So building on top feels pretty stable to me? We use this pattern throughout the library (e.g. alpha shapes have at least three potential returns

@knaaptime
Copy link
Member Author

to be clear, i think it would be preferable to always return a series, but i know thats a non-starter for legacy code

@jGaboardi
Copy link
Member

As a tangential discussion on alpha_shapes see geopandas/geopandas#1739

@sjsrey
Copy link
Member

sjsrey commented Jan 23, 2023

My sense is we have, historically, decided to assume that in lag_spatial(w, y), w and y are properly aligned. That is left for the user to determine, not lag_spatial.

That said, we haven't really provided much in the way of help for the user to be able to check that assumption or to reorder y if they need to do so.

With the new API, the keys of the w.neighbors are now ordered, so if y is a series, we could add some internals to lag_spatial to check for alignment and raise a warning if w and y were not aligned, and/or a kw arg to align y prior to calculation of the lag.

I'm generally not in favor of a function returning different return types based on different kw settings, so even with the new suggestions above, I would keep the return from lag_spatial as an array.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants