ENH: include Graph.describe() to describe neighbourhood values #717
Just to add some context to this. As we are refactoring momepy, we realised that we rely very often on this internal function, which is fairly generic and shall be tied directly to Graph. The idea behind the |
I think, for generality, this should be called a truncated or trimmed reduction/lag? This is very useful generally... @weikang9009 and I have been working on related concepts recently, so it'd be very nice to have something core here! |
Co-authored-by: Martin Fleischmann <martin@martinfleischmann.net>
Codecov Report
Attention: Patch coverage is
Additional details and impacted files
@@ Coverage Diff @@
## main #717 +/- ##
======================================
Coverage 85.0% 85.1%
======================================
Files 141 145 +4
Lines 15203 15483 +280
======================================
+ Hits 12924 13169 +245
- Misses 2279 2314 +35
Parameters
----------
grouper : pandas.GroupBy
Should this be pandas.Grouper?
I think pandas.Grouper is another type of object, one that is used for filtering columns. I used the name grouper since it is used in other functions, and the type is GroupBy since pandas groupby returns a GroupBy object.
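To illustrate the distinction discussed above, here is a minimal sketch with made-up data (the names `values` and `focal` are hypothetical and not taken from the PR). It only shows that calling `.groupby()` returns a GroupBy object, while `pd.Grouper` is a separate class that describes how to group.

```python
import pandas as pd
from pandas.core.groupby import SeriesGroupBy

# Hypothetical toy data: one value per row and the group each row belongs to.
values = pd.Series([1.0, 2.0, 3.0, 4.0], name="y")
focal = pd.Series(["a", "a", "b", "b"], name="focal")

# Calling .groupby() returns a GroupBy object that holds the split data.
grouped = values.groupby(focal)

# pd.Grouper is only a grouping specification that can be passed to groupby.
spec = pd.Grouper(level=0)

is_groupby = isinstance(grouped, SeriesGroupBy)
is_grouper = isinstance(grouped, pd.Grouper)
means = grouped.mean()
```

Under this reading, documenting the parameter type as a GroupBy object matches what `groupby` actually returns.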
libpysal/graph/base.py
Outdated
Weight values do not affect the calculations, only adjacency does.

Returns nan for all isolates.
Maybe 'nan' is OK here, but also maybe NaN or numpy.nan (or something else?). Probably not a big deal either way.
Changed to numpy.nan.
Co-authored-by: James Gaboardi <jgaboardi@gmail.com>
libpysal/graph/base.py
Outdated
if not isinstance(y, pd.Series):
    y = pd.Series(y)
    y = pd.Series(y, index=self.unique_ids)
Looking at this, we may want to check that y.index matches self.unique_ids in case a custom Series is passed. I suppose that a non-matching index may break this.
Added a check.
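For readers following along, a check of this kind could look roughly like the sketch below. The helper name `align_values` and the error message are hypothetical, and the check actually added in the PR may differ. The sketch assumes that plain arrays are already in the same order as `unique_ids`.

```python
import numpy as np
import pandas as pd


def align_values(y, unique_ids):
    """Hypothetical helper: return y as a Series indexed by unique_ids."""
    unique_ids = pd.Index(unique_ids)
    if not isinstance(y, pd.Series):
        # Plain arrays are assumed to follow the order of unique_ids.
        return pd.Series(np.asarray(y), index=unique_ids)
    if not y.index.equals(unique_ids):
        # A custom Series with a different index would silently misalign values.
        raise ValueError("The index of `y` does not match the graph's unique_ids.")
    return y


ids = ["a", "b", "c"]
from_array = align_values([1, 2, 3], ids)

try:
    align_values(pd.Series([1, 2, 3], index=["x", "y", "z"]), ids)
    mismatch_raised = False
except ValueError:
    mismatch_raised = True
```

Note that `Index.equals` is order-sensitive, so a Series with the right labels in a different order is also rejected here. Whether to reject or reindex in that case is a design choice this sketch does not settle.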
i think i would call this |
But this is not describing cardinalities, no? Where cardinality is a number of elements in a set. It is describing distribution of values within a neighbourhood. |
Oh, I see. It was this note on line 2014 that tripped me up: 'Weight values do not affect the calculations, only adjacency does.'
This PR adds a method to the graph api which takes an array of values and calculates descriptive statistics within each neighborhood.
Optionally, some neighbors can be filtered out based on the percentiles of the passed values.
The supported stats are "count", "mean", "median", "std", "min", "max", "sum", "nunique" and "mode".
The method is similar to .apply, but all values are calculated in one grouping operation and all functions are jitted.
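The idea described above can be sketched in plain pandas. This is not the implementation from the PR: the function name `describe_neighbourhoods`, the `edges` table with `focal` and `neighbor` columns, and the `q` parameter are all assumptions made for illustration. Nothing here is jitted, and "mode" is left out for brevity. The sketch does follow the documented behaviour that only adjacency matters (weights are never consulted) and that isolates get NaN.

```python
import numpy as np
import pandas as pd


def describe_neighbourhoods(edges, y, q=None):
    """Hypothetical sketch: statistics of y within each neighbourhood.

    edges : DataFrame with "focal" and "neighbor" columns (no weights needed)
    y     : Series of values indexed by unit id
    q     : optional (low, high) percentiles used to trim each neighbourhood
    """
    # Look up the value of every neighbor; adjacency alone drives the result.
    vals = pd.DataFrame({"focal": edges["focal"], "value": edges["neighbor"].map(y)})
    if q is not None:
        grouped = vals.groupby("focal")["value"]
        # Per-neighbourhood percentile bounds, broadcast back to each row.
        low = vals["focal"].map(grouped.quantile(q[0] / 100))
        high = vals["focal"].map(grouped.quantile(q[1] / 100))
        vals = vals[(vals["value"] >= low) & (vals["value"] <= high)]
    # All statistics are computed in a single grouping operation.
    stats = vals.groupby("focal")["value"].agg(
        ["count", "mean", "median", "std", "min", "max", "sum", "nunique"]
    )
    # Units without neighbors have no rows, so reindexing fills them with NaN.
    return stats.reindex(y.index)


# Toy data: "d" is an isolate with no neighbors.
y = pd.Series({"a": 1.0, "b": 2.0, "c": 4.0, "d": 8.0})
edges = pd.DataFrame(
    {"focal": ["a", "a", "b", "c"], "neighbor": ["b", "c", "a", "a"]}
)
stats = describe_neighbourhoods(edges, y)
untrimmed = describe_neighbourhoods(edges, y, q=(0, 100))
```

In this toy example the neighbourhood of "a" contains the values 2.0 and 4.0, so its mean is 3.0, while the isolate "d" gets NaN in every column.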