ENH: include Graph.describe() to describe neighbourhood values #717

u3ks · 2024-06-04T12:42:36Z

This PR adds a method to the Graph API that takes an array of values and calculates descriptive statistics within each neighborhood.
Optionally, some neighbors can be filtered out based on percentiles of the passed values.
The supported statistics are "count", "mean", "median", "std", "min", "max", "sum", "nunique" and "mode".

The method is similar to .apply, but all statistics are calculated in a single grouping operation and all functions are jitted.
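The sketch below illustrates the idea in plain pandas. It assumes the graph's adjacency is available as a Series indexed by a (focal, neighbor) MultiIndex, similar in shape to what Graph.adjacency holds. The function name describe_neighbourhoods is hypothetical. This is not the code merged in this PR: it uses ordinary pandas aggregation rather than the jitted functions described above, and "mode" and the q filtering are left out for brevity.

```python
import pandas as pd

STATS = ["count", "mean", "median", "std", "min", "max", "sum", "nunique"]


def describe_neighbourhoods(adjacency: pd.Series, y: pd.Series) -> pd.DataFrame:
    """Summary statistics of y within each neighbourhood (hypothetical helper).

    adjacency: Series indexed by a (focal, neighbor) MultiIndex. Only the
        index is used; the weight values are ignored.
    y: Series of values indexed by the ids of the graph nodes.
    """
    focal = adjacency.index.get_level_values("focal")
    neighbor = adjacency.index.get_level_values("neighbor")
    # Look up the value of every neighbor and label it with its focal unit.
    neighbour_values = pd.Series(y.loc[neighbor].to_numpy(), index=focal)
    # One groupby pass over the focal labels computes every statistic.
    return neighbour_values.groupby(level=0).agg(STATS)


# Toy graph: a is adjacent to b and c; b and c are each adjacent only to a.
idx = pd.MultiIndex.from_tuples(
    [("a", "b"), ("a", "c"), ("b", "a"), ("c", "a")], names=["focal", "neighbor"]
)
adjacency = pd.Series(1.0, index=idx, name="weight")
y = pd.Series({"a": 1.0, "b": 2.0, "c": 4.0})
stats = describe_neighbourhoods(adjacency, y)
```

Here the neighbourhood of a contains the values 2.0 and 4.0, so its mean is 3.0, while b and c each have a single neighbour and therefore an undefined sample standard deviation.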

martinfleis · 2024-06-04T12:49:20Z

Just to add some context to this. As we are refactoring momepy, we realised that we very often rely on this internal function, which is fairly generic and should be tied directly to Graph.

The idea behind using q to limit the range comes from morphology. We often want some sort of spatial average, but given the high likelihood of outliers (think of a church in the middle of a neighborhood), we can't include all the values within each neighborhood.
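A minimal NumPy sketch of percentile trimming for a single neighbourhood follows. It assumes the percentiles are computed from that neighbourhood's own values and that both bounds are inclusive; the helper name trim_to_percentiles and the default q=(10, 90) are hypothetical choices for illustration, not taken from the PR.

```python
import numpy as np


def trim_to_percentiles(values, q=(10, 90)):
    """Keep only the values between the q[0]-th and q[1]-th percentiles.

    Hypothetical helper: the percentiles are computed from this
    neighbourhood's own values, and the bounds are treated as inclusive.
    """
    values = np.asarray(values, dtype=float)
    lower, upper = np.nanpercentile(values, q)
    return values[(values >= lower) & (values <= upper)]


# Building areas in one neighbourhood; 2000 plays the role of the church.
areas = np.array([100.0, 110.0, 120.0, 130.0, 2000.0])
trimmed = trim_to_percentiles(areas)
```

The untrimmed mean is 492.0, dominated by the single large building, while the trimmed mean is 120.0. Because the bounds are symmetric percentiles, the smallest value (100.0) is dropped as well.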

ljwolf · 2024-06-04T12:55:29Z

I think, for generality, this should be called a truncated or trimmed reduction/lag?

This is very useful generally... @weikang9009 and I have been working on related concepts recently, so it'd be very nice to have something core here!

libpysal/graph/base.py

libpysal/graph/tests/test_base.py

martinfleis · 2024-06-04T13:03:29Z

I think, for generality, this should be called a truncated or trimmed reduction/lag?

Only if q is not None. Otherwise it is just a generic lag. I am also not sure what can be called a lag (nunique?). The describe terminology comes from pandas. It felt close enough to what we're doing here.
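To illustrate why the untrimmed case is "just a generic lag": with binary adjacency, the mean of each unit's neighbour values equals the spatial lag computed with a row-standardised weights matrix. The four-unit chain below is a hypothetical example.

```python
import numpy as np

# Binary adjacency for a hypothetical chain of four units: 0-1-2-3.
A = np.array(
    [
        [0, 1, 0, 0],
        [1, 0, 1, 0],
        [0, 1, 0, 1],
        [0, 0, 1, 0],
    ],
    dtype=float,
)
y = np.array([1.0, 2.0, 4.0, 8.0])

# Row-standardise so every row sums to 1; W @ y is then the classic spatial lag.
W = A / A.sum(axis=1, keepdims=True)
lag = W @ y

# The unweighted mean of each unit's neighbours, taken straight from adjacency.
neighbour_mean = np.array([y[A[i] > 0].mean() for i in range(len(y))])
```

Both arrays equal [2.0, 2.5, 5.0, 4.0]. Statistics such as nunique or max have no such lag equivalent, which is part of why a more general name was chosen.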

Co-authored-by: Martin Fleischmann <martin@martinfleischmann.net>

codecov · 2024-06-04T15:16:26Z

Codecov Report

Attention: Patch coverage is 97.82609% with 2 lines in your changes missing coverage. Please review.

Project coverage is 85.1%. Comparing base (bcabdbc) to head (879f3f5).
Report is 18 commits behind head on main.

Additional details and impacted files

@@          Coverage Diff           @@
##            main    #717    +/-   ##
======================================
  Coverage   85.0%   85.1%            
======================================
  Files        141     145     +4     
  Lines      15203   15483   +280     
======================================
+ Hits       12924   13169   +245     
- Misses      2279    2314    +35

Files	Coverage Δ
libpysal/graph/tests/test_base.py	`100.0% <100.0%> (ø)`
libpysal/graph/_utils.py	`97.1% <97.6%> (+2.2%)`	⬆️
libpysal/graph/base.py	`96.8% <92.9%> (-1.1%)`	⬇️

... and 6 files with indirect coverage changes


jGaboardi · 2024-06-04T15:18:44Z

libpysal/graph/_utils.py

+
+    Parameters
+    ----------
+    grouper : pandas.GroupBy


Should this be pandas.Grouper?

I think pandas.Grouper is a different type of object: it specifies how to group rather than being the result of grouping. I used the name grouper since it is used in other functions, and the type is GroupBy since pandas groupby returns a GroupBy object.
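The distinction in this reply can be checked directly in pandas: calling .groupby() returns a GroupBy object, whereas pandas.Grouper is only a grouping specification that holds no data and can itself be passed to .groupby(). The toy values below are hypothetical.

```python
import pandas as pd
from pandas.core.groupby import SeriesGroupBy

values = pd.Series(
    [2.0, 4.0, 1.0, 1.0], index=pd.Index(["a", "a", "b", "c"], name="focal")
)

# Calling .groupby() returns a GroupBy object, which is what the docstring describes.
grouper = values.groupby(level="focal")

# pd.Grouper only describes *how* to group; it holds no data itself.
spec = pd.Grouper(level="focal")

# The specification can be handed to .groupby() to build an equivalent GroupBy.
same_grouper = values.groupby(spec)
```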

libpysal/graph/_utils.py

jGaboardi · 2024-06-04T15:21:01Z

libpysal/graph/base.py

+
+        Weight values do not affect the calculations, only adjacency does.
+
+        Returns nan for all isolates.


Maybe 'nan' is OK here, but also maybe NaN or numpy.nan (or something else?)

Probably not a big deal either way.

changed to numpy.nan
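A small sketch of why isolates come out as numpy.nan when statistics are computed by grouping over focal units: a unit with no neighbours never forms a group, so reindexing the result to all ids fills it with NaN. This assumes isolates simply have no rows in the neighbour table; if a graph stores them differently (for example as zero-weight self-loops), those rows would have to be dropped first so that a unit's own value is not counted as a neighbour value. The ids and values are hypothetical.

```python
import pandas as pd

# Hypothetical graph over ids a, b, c: a and b are neighbours, c is an isolate.
ids = pd.Index(["a", "b", "c"], name="focal")
focal = pd.Index(["a", "b"], name="focal")
neighbor = ["b", "a"]
y = pd.Series([1.0, 3.0, 5.0], index=ids)

# Values of each neighbour, labelled by the focal unit they belong to.
neighbour_values = pd.Series(y.loc[neighbor].to_numpy(), index=focal)
means = neighbour_values.groupby(level=0).mean()

# c never appears as a focal label, so reindexing to all ids yields NaN for it.
means = means.reindex(ids)
```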

libpysal/graph/base.py

Co-authored-by: James Gaboardi <jgaboardi@gmail.com>

martinfleis · 2024-06-05T12:24:50Z

libpysal/graph/base.py

        if not isinstance(y, pd.Series):
-            y = pd.Series(y)
+            y = pd.Series(y, index=self.unique_ids)


Looking at this, we may want to check that y.index matches self.unique_ids in case a custom Series is passed. I suppose a non-matching index may break this.

added a check
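A check of this kind could look roughly like the sketch below. The helper name prepare_values and the error message are hypothetical; only the pd.Series(y, index=...) conversion mirrors the diff above.

```python
import numpy as np
import pandas as pd


def prepare_values(y, unique_ids: pd.Index) -> pd.Series:
    """Align y with the graph's ids, or fail loudly (hypothetical helper)."""
    if not isinstance(y, pd.Series):
        # Arrays carry no labels, so assume they follow the order of unique_ids.
        return pd.Series(y, index=unique_ids)
    if not y.index.equals(unique_ids):
        # A Series with other labels could break or corrupt the neighbour
        # lookups, so refuse it instead of guessing.
        raise ValueError("The index of y does not match the ids of the graph.")
    return y


ids = pd.Index(["a", "b", "c"])
from_array = prepare_values(np.array([1.0, 2.0, 3.0]), ids)
```

Index.equals also requires the same order. A more lenient design could reindex a Series with the same labels in a different order instead of raising.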

knaaptime · 2024-06-05T19:00:00Z

I think I would call this describe_cardinalities or something, because "Graph.describe() to describe neighbourhood values" implies we're looking at the neighbor values.

martinfleis · 2024-06-05T19:27:47Z

But this is not describing cardinalities, is it? Cardinality is the number of elements in a set. This describes the distribution of values within a neighbourhood.

knaaptime · 2024-06-05T20:41:05Z

Oh, I see. It was this note on line 2014 that tripped me up:

'Weight values do not affect the calculations, only adjacency does.'
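The quoted note can be illustrated on a hypothetical toy graph: changing the weight values leaves an adjacency-only neighbourhood mean unchanged, while a weighted average (included purely for contrast) does change.

```python
import pandas as pd

idx = pd.MultiIndex.from_tuples(
    [("a", "b"), ("a", "c"), ("b", "a"), ("c", "a")], names=["focal", "neighbor"]
)
binary = pd.Series([1.0, 1.0, 1.0, 1.0], index=idx, name="weight")
weighted = pd.Series([0.9, 0.1, 1.0, 1.0], index=idx, name="weight")
y = pd.Series({"a": 1.0, "b": 2.0, "c": 4.0})


def neighbour_values(adjacency: pd.Series, y: pd.Series) -> pd.Series:
    # Neighbour values labelled by focal unit; adjacency.values are not used.
    neighbor = adjacency.index.get_level_values("neighbor")
    focal = adjacency.index.get_level_values("focal")
    return pd.Series(y.loc[neighbor].to_numpy(), index=focal)


def neighbourhood_mean(adjacency: pd.Series, y: pd.Series) -> pd.Series:
    # Adjacency-only statistic, in the spirit of the docstring note.
    return neighbour_values(adjacency, y).groupby(level=0).mean()


def weighted_average(adjacency: pd.Series, y: pd.Series) -> pd.Series:
    # For contrast only: this statistic does depend on the weight values.
    values = neighbour_values(adjacency, y)
    w = pd.Series(adjacency.to_numpy(), index=values.index)
    products = pd.Series(adjacency.to_numpy() * values.to_numpy(), index=values.index)
    return products.groupby(level=0).sum() / w.groupby(level=0).sum()
```

With either set of weights the neighbourhood mean of a is 3.0, whereas the weighted average of a drops to about 2.2 once the weights become 0.9 and 0.1.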

u3ks added 3 commits June 4, 2024 14:22

summary statistics for neigbourhood values

aedb687

summary statistics for neigbourhood values

cb4b1e0

formatting

188f02b

martinfleis reviewed Jun 4, 2024

libpysal/graph/base.py (outdated, resolved)

libpysal/graph/base.py (outdated, resolved)

libpysal/graph/tests/test_base.py (resolved)

libpysal/graph/tests/test_base.py (resolved)

Apply suggestions from code review

045e792

Co-authored-by: Martin Fleischmann <martin@martinfleischmann.net>

martinfleis changed the title from "Describe neighbourhood values" to "ENH: include Graph.describe() to describe neighbourhood values" Jun 4, 2024

martinfleis assigned u3ks Jun 4, 2024

martinfleis added enhancement graph labels Jun 4, 2024

u3ks added 2 commits June 4, 2024 16:42

extra tests

b634b38

testing fix

805e5c2

martinfleis approved these changes Jun 4, 2024


martinfleis requested review from sjsrey, ljwolf and jGaboardi June 4, 2024 15:11

jGaboardi reviewed Jun 4, 2024


u3ks and others added 2 commits June 4, 2024 19:24

Apply suggestions from code review

3a0ebed

Co-authored-by: James Gaboardi <jgaboardi@gmail.com>

docstring

8f3a300

u3ks requested a review from jGaboardi June 5, 2024 09:00

ndarray test

4cfa943

martinfleis reviewed Jun 5, 2024


nas equivalence and more filtration tests

879f3f5

jGaboardi approved these changes Jun 5, 2024


martinfleis merged commit 34a7fe1 into pysal:main Jun 5, 2024
11 checks passed
