Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

functional reached calculations #575

Merged
merged 10 commits into from
May 13, 2024
Merged

Conversation

u3ks
Copy link
Collaborator

@u3ks u3ks commented Apr 29, 2024

a new functional version of the mm.Reached function. It calculates several statistics at once similar to mm.describe(

@u3ks
Copy link
Collaborator Author

u3ks commented Apr 29, 2024

Timings:

  1. count without spatial weights
  • old: 24.5 ms ± 216 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
  • new: 15.4 ms ± 368 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
  1. count with spatial weights
  • old: 2.59 s ± 33.4 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
  • new: 322 ms ± 6.73 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
  1. statistics with spatial weights:
    -old: 1min ± 601 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
    -new: 322 ms ± 6.73 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

Copy link
Member

@martinfleis martinfleis left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

First round

Comment on lines 9 to 19
try:
from numba import njit
except (ModuleNotFoundError, ImportError):
warnings.warn(
"The numba package is used extensively in this function to accelerate the"
" computation of statistics but it is not installed or cannot be imported."
" Without numba, these computations may become slow on large data.",
UserWarning,
stacklevel=2,
)
from libpysal.common import jit as njit
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The warning should be printed only if a user calls a function that actually uses numba. That is why it was originally inside describe. I would still keep the warning inside if possible.



def describe_reached(
y, y_group, result_index=None, graph=None, q=None, include_mode=False
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would probably call y_group graph_index following what we did in street_alignment

def street_alignment(
building_orientation: Series,
street_orientation: Series,
street_index: Series,
) -> Series:

it should be clear from the name that this contains ID's aligned with y that link to the index in graph.

y, y_group, result_index=None, graph=None, q=None, include_mode=False
) -> DataFrame:
"""
Calculates statistics of ``y`` objects reached on a street network.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not only street network. Graph or IDs can come from anywhere. You can easily use enclosure ID here instead. The function is pretty generic.

"""
Calculates statistics of ``y`` objects reached on a street network.
Requires a ``y_group`` that links the ``y`` objects to streets (or ``graph``)
assigned beforehand (e.g. using :py:func:`momepy.get_network_id`).
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
assigned beforehand (e.g. using :py:func:`momepy.get_network_id`).
assigned beforehand (e.g. using :py:func:`momepy.get_nearest_street`).

that is the new one

return stat_
if isinstance(y, np.ndarray):
y = pd.Series(y, name="obs_index")
elif isinstance(y, Series):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
elif isinstance(y, Series):
if isinstance(y, Series):

to treat the series created above, no?

)
result.loc[stats.index.values] = stats.values
result.columns = stats.columns
result = result.fillna(0)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No need for this if you are filling the results full of zeros, no?

@@ -11,7 +12,7 @@
GPD_013 = Version(gpd.__version__) >= Version("0.13")


class TestDistribution:
class TestDiscribe:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
class TestDiscribe:
class TestDescribe:

typo

Comment on lines 171 to 172
# not using assert_result since the method
# is returning an aggregation, indexed based on nID
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why not? We use it in describe without issues. I'd prefer to use it as it checks the index.

Copy link

codecov bot commented Apr 29, 2024

Codecov Report

Attention: Patch coverage is 98.88889% with 2 lines in your changes are missing coverage. Please review.

Project coverage is 97.8%. Comparing base (4037c70) to head (e21ca5e).
Report is 20 commits behind head on main.

Additional details and impacted files

Impacted file tree graph

@@           Coverage Diff           @@
##            main    #575     +/-   ##
=======================================
+ Coverage   97.4%   97.8%   +0.4%     
=======================================
  Files         26      37     +11     
  Lines       4328    5299    +971     
=======================================
+ Hits        4214    5183    +969     
- Misses       114     116      +2     
Files Coverage Δ
momepy/functional/tests/conftest.py 100.0% <ø> (ø)
momepy/functional/tests/test_diversity.py 100.0% <100.0%> (ø)
momepy/functional/_diversity.py 97.8% <97.6%> (ø)

... and 13 files with indirect coverage changes

@@ -10,7 +10,7 @@ dependencies:
- networkx=2.7
- numpy=1.22
- packaging
- pandas>=1.4.0,!=1.5.0,<2
- pandas=2.0
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you revert this since we're checking for 2.1.0 anyway?

Comment on lines 10 to 15
throw_numba_warning = False
try:
from numba import njit
except (ModuleNotFoundError, ImportError):
throw_numba_warning = True
from libpysal.common import jit as njit
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
throw_numba_warning = False
try:
from numba import njit
except (ModuleNotFoundError, ImportError):
throw_numba_warning = True
from libpysal.common import jit as njit
try:
from numba import njit
HAS_NUMBA = True
except (ModuleNotFoundError, ImportError):
HAS_NUMBA = False
from libpysal.common import jit as njit

Just a minor nit to keep this consistent across PySAL.

Comment on lines 46 to 53
results = [
values.shape[0],
np.mean(values),
np.median(values),
np.std(values),
np.min(values),
np.max(values),
np.sum(values),
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Needs to work properly with missing values.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I updated the code, I think we can just get rid of the nans beforehand and compute everything on the non_na values, what do you think?

@martinfleis martinfleis requested a review from jGaboardi May 2, 2024 16:08
Parameters
----------
grouper : pandas.GroupBy
Groupby Object which specifies the aggregations to be performed
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
Groupby Object which specifies the aggregations to be performed
Groupby Object which specifies the aggregations to be performed.

"""

if Version(pd.__version__) <= Version("2.1.0"):
raise NotImplementedError("Please update to a newer version of pandas.")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this a proper usage of a NotImplementedError?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should be ImportError I think


if (result_index is None) and (graph is None):
raise ValueError(
"One of result_index or graph has to be specified, but not both."
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's declare this ValueError so it can be used here and below. Or at least declare the message to re-use.

Copy link
Member

@martinfleis martinfleis left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Few notes, mostly on the documentation side.

momepy/functional/_diversity.py Outdated Show resolved Hide resolved
Comment on lines 187 to 189
"""
Calculates statistics of ``y`` objects reached on a neighbourhood graph.
Requires a ``graph_index`` that links the ``y`` objects to ``graph`` or streets
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It do not have to be streets, the graph can be done based on anything. The key point is that it is a Graph based on another object than the one on which we have y. We can include link to streets as an example.

Comment on lines +195 to +197
The statistics calculated are count, sum, mean, median, std.
Optionally, mode can be calculated, or the statistics can be calculated in
quantiles ``q``.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you move this to the top and have the implementation details come after this? You first want to know what it does and then how to use it.

momepy/functional/_diversity.py Outdated Show resolved Hide resolved
momepy/functional/_diversity.py Show resolved Hide resolved
momepy/functional/_diversity.py Outdated Show resolved Hide resolved
u3ks and others added 2 commits May 13, 2024 13:45
Co-authored-by: Martin Fleischmann <martin@martinfleischmann.net>
@martinfleis martinfleis merged commit 889fb44 into pysal:main May 13, 2024
14 checks passed
@martinfleis martinfleis added enhancement New feature or request and removed refactor labels Jun 16, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants