BUG: stats.binned_statistic_dd binnumber is not usable #16195
Comments
Have you tried setting
(Also xref #5449 where this was discussed previously)
Expanded binnumbers make more sense, but
Overall, I think the documentation needs some adjustments and/or the returned values should be more "friendly" to reuse.
@rebelot Forgive me if you already know this, but I thought it was worth mentioning since I don't see it explicitly above. It appears that you are not providing the Do you agree that it is reasonable to add these extra bins in this case? The designer decided to make the behavior consistent regardless of whether the extra bins are needed or not. Would you suggest something different? For instance, would it be reasonable to add a boolean option that by default maintains the current behavior but, when set manually, refuses to add these extra bins? Then, you'd get the indexing you expected, and perhaps the
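The extra bins mentioned in this comment hold samples that fall outside the bin edges in at least one dimension. Below is a minimal sketch of that behavior, assuming a made-up four-point sample and an explicit range that is deliberately narrower than the data. With expand_binnumbers=True, a per-dimension index of 0 or nbins + 1 marks an outlier, and in-range samples get indices 1..nbins. The names sample, res and in_range are illustrative only.

```python
import numpy as np
from scipy import stats

# Made-up sample: rows 0 and 2 lie outside the [0, 1] x [0, 1] range.
sample = np.array([
    [-0.5, 0.5],   # below the range in dimension 0
    [0.25, 0.25],  # inside
    [0.75, 1.5],   # above the range in dimension 1
    [0.5, 0.5],    # inside
])
values = np.ones(len(sample))
nbins = (2, 2)

res = stats.binned_statistic_dd(
    sample, values, statistic="count", bins=nbins,
    range=[(0, 1), (0, 1)], expand_binnumbers=True,
)

# res.binnumber has shape (D, N); each row holds per-dimension bin numbers.
# A sample is in range only if every dimension's number is within 1..nbins.
upper = np.array(nbins)[:, None]
in_range = np.all((res.binnumber >= 1) & (res.binnumber <= upper), axis=0)

print(res.binnumber)
print(in_range)
```

Only the in-range samples contribute to res.statistic, which keeps the shape given by bins.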
Thanks for your reply.
I see. Yeah, that's surprising to me.
Describe your issue.
When using binned_statistic_dd, one may want to get the indices or values of the data points that fall in certain bins, depending on the value of the computed statistic.
After computing the statistic, one could be tempted to use np.nonzero((hist >= x) & (hist < y)) paired with np.ravel_multi_index or other indexer functions, and expect the returned indices to correspond, in principle, to binnumbers. Accessing the data would then be as easy as data[bs.binnumber == indices].
However, this is not the case. If you have (N, M) bins, it looks like binnumbers are counted as if there were 2 extra bins per data dimension, i.e. (N+2, M+2), despite the shape of statistic being (N, M).
So, in order to "fix" the mapping between binnumber and statistic indices, one has to re-ravel the binnumbers or their indices as if they came from a larger array and were "shifted one diagonal element up/down".
I guess this is needed internally when one wants to reuse previous results. However, I think it would be better for binned_statistic_dd to encode/decode binnumbers so that they are consistent with the "user's side" data shapes.
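The re-raveling described above can be sketched as follows. This is a workaround under stated assumptions, not an official recipe: the random data, the bin counts, and the thresholds x and y are made up for illustration, and every sample is assumed to lie inside the given range (an outlier would produce an index of -1 or N, which must be filtered out before indexing).

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
data = rng.random((1000, 2))   # illustrative sample, all points in [0, 1)
values = rng.random(1000)

bs = stats.binned_statistic_dd(
    data, values, statistic="count", bins=(4, 5), range=[(0, 1), (0, 1)]
)
hist = bs.statistic            # shape (4, 5)

# The linearized binnumber refers to an array with 2 extra bins per dimension.
padded_shape = tuple(n + 2 for n in hist.shape)   # (6, 7)

# Unravel against the padded shape, then shift by one to get indices
# into hist. Shape (D, N): one row of bin indices per dimension.
bin_idx = np.array(np.unravel_index(bs.binnumber, padded_shape)) - 1

# Select the data points that fall in bins whose statistic is in [x, y).
x, y = 40, 55                  # arbitrary thresholds for illustration
wanted_bins = (hist >= x) & (hist < y)
in_wanted = wanted_bins[tuple(bin_idx)]   # boolean mask over the samples
selected = data[in_wanted]
```

Passing expand_binnumbers=True returns the per-dimension bin numbers directly, so the unravel step can be skipped and only the shift by one remains.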
Reproducing Code Example
SciPy/NumPy/Python version information
1.7.3 1.21.5 sys.version_info(major=3, minor=9, micro=12, releaselevel='final', serial=0)