Feature/average #650
Thanks @MaximilianR. There has been an open issue on this for a while (#422). @shoyer - I'm actually not sure I love how I implemented this, but I'm teaching a session on open source contributions and code review today, so I threw this up here as an example.
Any thoughts on the tradeoff between adding |
# if NaNs are present, we need individual weights
valid = self.notnull()
if valid.any():
I think you have this logic backwards? I think it should be `if not valid.all()`.
That said, these sorts of conditionals (ones that look at the data) are usually best avoided when using dask, because they bring construction of the computation to a screeching halt while the condition is evaluated.
I think you are right about the logic being backwards. I'm not sure how that happened. There was some discussion about why we needed this conditional in this comment: #422 (comment).
If we can think of a way to sidestep this, I'm fine with removing it.
I think we can just use `sum_of_weights = weights.where(valid).sum(dim=dim, axis=axis)`. That will work regardless of whether there are any nulls in the array.
This line is here because we don't want to count weights corresponding to NaN values in the sum.
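As a rough illustration of the branch-free approach suggested here, the sketch below computes a NaN-aware weighted mean without any conditional that inspects the data. It is written against plain NumPy rather than xarray, so it only approximates the PR's code: `weighted_mean` is a hypothetical helper, and masking the weights with `0.0` stands in for `weights.where(valid)` followed by a NaN-skipping sum.

```python
import numpy as np


def weighted_mean(data, weights, axis=None):
    """NaN-aware weighted mean (hypothetical helper, not the PR's implementation)."""
    data = np.asarray(data, dtype=float)
    weights = np.broadcast_to(np.asarray(weights, dtype=float), data.shape)

    # True where the data is present; computed unconditionally, no `if` on the data.
    valid = ~np.isnan(data)

    # Drop the contribution of missing values from both numerator and denominator.
    weighted_sum = np.where(valid, data * weights, 0.0).sum(axis=axis)
    sum_of_weights = np.where(valid, weights, 0.0).sum(axis=axis)

    # 0 / 0 (no valid data along the axis) silently yields NaN.
    with np.errstate(invalid="ignore", divide="ignore"):
        return weighted_sum / sum_of_weights


no_nans = weighted_mean([1.0, 2.0, 3.0], [1.0, 1.0, 2.0])
with_nan = weighted_mean([1.0, np.nan, 3.0], [1.0, 1.0, 2.0])
all_nan = weighted_mean([np.nan, np.nan], [1.0, 1.0])
```

The same code path handles arrays with and without missing values, which is why the data-dependent branch is unnecessary.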
Ah yes, I must have been thinking in terms of `isnan`.
That would be the main motivation. If pandas is going the way of pandas-dev/pandas#10030 via mean, I think we could do that as well. I actually like that approach more since we tend to call it a "weighted mean" (see title of pandas issue).
If you think the ability to return |
Okay, let's go with the @mathause - any comment? |
Didn't realize you were working on this. Pulling it into mean is fine for me (if you need the weights it is a one-liner).
@jhamman you showed this in a lecture? Cool :)
I'm doing some cleanup on my outstanding issues/PRs. After thinking about this again, I'm not all that keen on pushing this into the |
I am fine having it as an extra method. I think it is an important feature to have - I use this function every day.
I would still lean toward trying to put this into #770 might help with some of the redundant code (if/when we get around to it). |
It seems incorporating this to Anyway, I have tried to put together some corner cases where there are NaN in the data or the weights. Unfortunately there is no https://gist.github.com/mathause/720cbca2d97597a99534581b8ca296a5 The above implementation works fine; however, there are currently two cases where I expect another answer:
I think this should return NaN.
I think these should also return NaN.
Like most new features for pandas (or xarray, for that matter), there isn't anyone who has committed to working on it -- it will depend on the interest of a contributor. |
I could imagine continuing to work on this - however, there are some open design questions:
How about designing this as a So for example And then this is extensible, clean, pandan-tic. |
@MaximilianR great idea! A groupby-like interface is much cleaner than adding more orthogonal code paths to .mean and the like.
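The concrete example from the proposal is not preserved in this thread, so the following is only a guess at the general shape of a groupby-like design: the weights are bound once to an intermediate object, and reductions are then exposed as methods on it. The class name `Weighted` and its methods are hypothetical, and the sketch uses plain NumPy rather than xarray.

```python
import numpy as np


class Weighted:
    """Hypothetical object binding weights to data, in the spirit of a groupby object."""

    def __init__(self, data, weights):
        self.data = np.asarray(data, dtype=float)
        self.weights = np.broadcast_to(
            np.asarray(weights, dtype=float), self.data.shape
        )
        # Mask of present values, shared by every reduction below.
        self._valid = ~np.isnan(self.data)

    def sum_of_weights(self, axis=None):
        # Weights attached to missing data do not count.
        return np.where(self._valid, self.weights, 0.0).sum(axis=axis)

    def sum(self, axis=None):
        return np.where(self._valid, self.data * self.weights, 0.0).sum(axis=axis)

    def mean(self, axis=None):
        # No valid data along the axis gives 0 / 0, i.e. NaN.
        with np.errstate(invalid="ignore", divide="ignore"):
            return self.sum(axis=axis) / self.sum_of_weights(axis=axis)


w = Weighted([[1.0, np.nan], [3.0, 4.0]], [1.0, 3.0])
row_means = w.mean(axis=1)
```

Further weighted reductions could be added as methods on the same object without touching the unweighted code paths, which is the extensibility argument made above.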
@MaximilianR - I really like this idea. I'm going to close this PR and we can continue to discuss this feature in the original issue (#422 (comment)). |
closes #422