-
-
Notifications
You must be signed in to change notification settings - Fork 31.8k
Add optional weighting to statistics.harmonic_mean() #82489
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Comments
Currently, harmonic_mean() is difficult to use in real applications because it assumes equal weighting. While that is sometimes true, the API precludes a broad class of applications where the weights are uneven. That is easily remedied with an optional *weights* argument modeled after the API for random.choices(): harmonic_mean(data, weights=None) Examples Suppose a car travels 40 km/hr for 5 km, and when traffic clears, speeds-up to 60 km/hr for the remaining 30 km of the journey. What is the average speed? >>> harmonic_mean([40, 60], weights=[5, 30])
56.0 Suppose an investor owns shares in each of three companies, with P/E (price/earning) ratios of 2.5, 3 and 10, and with market values of 10,000, 7,200, and 12,900 respectively. What is the weighted average P/E ratio for the investor’s portfolio? >>> avg_pe = harmonic_mean([2.5, 3, 10], weights=[10_000, 7_200, 12_900])
>>> round(avg_pe, 1)
3.9 Existing workarounds It is possible to use the current API for theses tasks, but it is inconvenient, awkward, slow, and only works with integer ratios: >>> harmonic_mean([40]*5 + [60]*30)
56.0
>>> harmonic_mean([2.5]*10_000 + [3]*7_200 + [10]*12_900)
3.9141742522756826 Algorithm Following the formula at https://en.wikipedia.org/wiki/Harmonic_mean#Weighted_harmonic_mean , the algorithm is straight forward: def weighted_harmonic_mean(data, weights):
num = den = 0
for x, w in zip(data, weights):
num += w
den += w / x
return num / den PRIf you're open to this suggestion, I'll work-up a PR modeled after the existing code and that uses _sum() and _fail_neg() for exactness and data validity checks. |
Sounds like a great idea to me. Thanks, Steven |
Great idea! |
@rhettinger |
Thank you, but this is one I would like to do myself. I've already done work on it and would like to wrap it up (also, it's more complicated than it seems because the supporting functions are a bit awkward to use in this context). |
Oh.. Thank you for letting me know |
I like the addition but I'm not sure why you removed the price-earnings ratio example from the docs. I think that it's useful to have an example that shows that harmonic mean is not *just* for speed-related problems. I'm not going to reject your change just on this documentation issue, but I would like to hear why you removed the P/E example instead of just adding additional examples. |
I tried out the existing P/E example in my Python courses and found that it had very little explanatory power — in general, non-finance people know less about P/E ratios than they know about the harmonic mean :-) For people with a finance background who do already understand P/E ratios, the example is weak. The current example only works mathematically if the portfolios are exactly the same market value at the time the ratios are combined — this never happens. Also P/E ratios in real portfolios include zero and negative values — that won't work with our harmonic mean. Also, combining P/Es for non-homogenous securities is a bit of dark art. Given a utility stock, a healthcare stock, and a tech stock, the aggregate P/E is rarely comparable to anything else. All that said, I would be happy to add the example back if you think it is necessary. It's your module and it's important that you're happy with it :-)
I considered using a resistors in parallel example, but that is somewhat specialized and isn't directly applicable because we normally don't want a mean at all, we just want the equivalent resistance. I also thought about adding something like: "The harmonic mean is the smaller of the three Pythagorean means and tends to emphasize the impact of small outliers while minimizing the impact of large outliers." But while this is true, I've never seen a data scientist switch from an arithmetic mean to a harmonic mean to achieve this effect. |
Okay, I'm satisfied with that reasoning, thanks Raymond. Patch looks good to me. Go for it! Have a good Christmas and stay safe. |
Thank you and Merry Christmas. |
The "versionchanged" for *weights* should be 3.10, not 3.8. I've created PR 23919 to fix this. |
Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.
Show more details
GitHub fields:
bugs.python.org fields:
The text was updated successfully, but these errors were encountered: