Add optional weighting to statistics.harmonic_mean() #82489
Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.
assignee = 'https://github.com/rhettinger' closed_at = <Date 2020-12-24.03:53:17.407> created_at = <Date 2019-09-28.18:28:15.804> labels = ['type-feature', 'library', '3.10'] title = 'Add optional weighting to statistics.harmonic_mean()' updated_at = <Date 2021-01-03.12:35:29.887> user = 'https://github.com/rhettinger'
activity = <Date 2021-01-03.12:35:29.887> actor = 'serhiy.storchaka' assignee = 'rhettinger' closed = True closed_date = <Date 2020-12-24.03:53:17.407> closer = 'rhettinger' components = ['Library (Lib)'] creation = <Date 2019-09-28.18:28:15.804> creator = 'rhettinger' dependencies = [] files = [] hgrepos = [] issue_num = 38308 keywords = ['patch'] message_count = 13.0 messages = ['353469', '353526', '353550', '353985', '353992', '353999', '383664', '383665', '383667', '383671', '383672', '383677', '384268'] nosy_count = 6.0 nosy_names = ['rhettinger', 'mark.dickinson', 'steven.daprano', 'serhiy.storchaka', 'corona10', 'ZackerySpytz'] pr_nums = ['23914', '23919'] priority = 'normal' resolution = 'fixed' stage = 'resolved' status = 'closed' superseder = None type = 'enhancement' url = 'https://bugs.python.org/issue38308' versions = ['Python 3.10']
Currently, harmonic_mean() is difficult to use in real applications because it assumes equal weighting. While that is sometimes true, the API precludes a broad class of applications where the weights are uneven.
That is easily remedied with an optional *weights* argument modeled after the API for random.choices():
Suppose a car travels 40 km/hr for 5 km, and when traffic clears, speeds up to 60 km/hr for the remaining 30 km of the journey. What is the average speed?
    >>> harmonic_mean([40, 60], weights=[5, 30])
    56.0
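Spelled out with the weighted formula, taking the speeds as the data x_i and the leg distances (km) as the weights w_i: each term w_i / x_i is the time spent on that leg in hours, so the weighted harmonic mean is just total distance over total time.

```latex
H_w = \frac{\sum_i w_i}{\sum_i w_i / x_i}
    = \frac{5 + 30}{\frac{5}{40} + \frac{30}{60}}
    = \frac{35}{0.125 + 0.5}
    = \frac{35}{0.625}
    = 56
```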
Suppose an investor owns shares in each of three companies, with P/E (price/earnings) ratios of 2.5, 3 and 10, and with market values of 10,000, 7,200, and 12,900 respectively. What is the weighted average P/E ratio for the investor’s portfolio?
    >>> avg_pe = harmonic_mean([2.5, 3, 10], weights=[10_000, 7_200, 12_900])
    >>> round(avg_pe, 1)
    3.9
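As an illustrative check (not the proposed implementation), the 3.9 figure can be reproduced exactly with fractions.Fraction, treating the market values as the weights:

```python
from fractions import Fraction

# P/E ratios and market values from the portfolio example
pe_ratios = [Fraction(5, 2), Fraction(3), Fraction(10)]
market_values = [10_000, 7_200, 12_900]

# Weighted harmonic mean: total weight / weighted sum of reciprocals
total_value = sum(market_values)                                          # 30100
weighted_recips = sum(w / x for w, x in zip(market_values, pe_ratios))    # 4000 + 2400 + 1290
avg_pe = total_value / weighted_recips

print(avg_pe)                    # 3010/769
print(round(float(avg_pe), 1))   # 3.9
```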
It is possible to use the current API for these tasks, but it is inconvenient, awkward, slow, and only works when the weights are in integer ratios:
    >>> harmonic_mean([40]*5 + [60]*30)
    56.0
    >>> harmonic_mean([2.5]*10_000 + [3]*7_200 + [10]*12_900)
    3.9141742522756826
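The replication workaround can be sketched as below. `expand()` is a hypothetical helper for illustration, not part of statistics; it shows why the trick only works when every weight can be used as a whole-number repeat count.

```python
from statistics import harmonic_mean

def expand(data, weights):
    """Hypothetical helper: repeat each value w times (needs non-negative int weights)."""
    if not all(isinstance(w, int) and w >= 0 for w in weights):
        raise TypeError("repetition needs non-negative integer weights")
    out = []
    for x, w in zip(data, weights):
        out.extend([x] * w)
    return out

speeds = expand([40, 60], [5, 30])
print(len(speeds))             # 35 data points for a two-leg trip
print(harmonic_mean(speeds))   # 56.0

# Distances of 5.5 km and 29.5 km cannot be expressed by repetition
try:
    expand([40, 60], [5.5, 29.5])
except TypeError as exc:
    print(exc)
```

Rational weights such as 5.5 and 29.5 could be rescaled to integers (11 and 59), but the expanded list still grows with the sum of the weights, which is where the slowness comes from.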
Following the formula at https://en.wikipedia.org/wiki/Harmonic_mean#Weighted_harmonic_mean , the algorithm is straightforward:
    def weighted_harmonic_mean(data, weights):
        num = den = 0
        for x, w in zip(data, weights):
            num += w        # running total of the weights
            den += w / x    # running weighted sum of reciprocals
        return num / den
If you're open to this suggestion, I'll work up a PR modeled after the existing code, using _sum() and _fail_neg() for exactness and data-validity checks.
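A minimal sketch of an exact, validated variant, assuming the goal is to mirror what _sum() and _fail_neg() provide. Those are private helpers inside statistics and are not used here: `exact_weighted_hmean` is a hypothetical name, fractions.Fraction stands in for exact accumulation, and the rules for zero weights and zero values are assumptions, not the module's actual behavior.

```python
from fractions import Fraction

def exact_weighted_hmean(data, weights):
    """Hypothetical sketch of a weighted harmonic mean using exact arithmetic.

    Assumes real-valued inputs that Fraction() can convert exactly.
    """
    data, weights = list(data), list(weights)
    if not data:
        raise ValueError("need at least one data point")
    if len(data) != len(weights):
        raise ValueError("data and weights must have the same length")

    # Exact conversion (floats included), then validity checks
    pairs = [(Fraction(x), Fraction(w)) for x, w in zip(data, weights)]
    if any(x < 0 or w < 0 for x, w in pairs):
        raise ValueError("negative values are not supported")

    pairs = [(x, w) for x, w in pairs if w]   # assumption: zero weights are ignored
    if not pairs:
        raise ValueError("sum of weights must be positive")
    if any(x == 0 for x, _ in pairs):
        return 0.0                            # assumption: a weighted zero drives H to 0

    num = sum(w for _, w in pairs)            # total weight
    den = sum(w / x for x, w in pairs)        # weighted sum of reciprocals
    return float(num / den)                   # only this final step rounds
```

Converting every input to Fraction keeps the accumulation free of rounding error; the result is rounded once, when converting back to float.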
Thank you, but this is one I would like to do myself. I've already done work on it and would like to wrap it up (also, it's more complicated than it seems because the supporting functions are a bit awkward to use in this context).
I like the addition but I'm not sure why you removed the price-earnings ratio example from the docs. I think that it's useful to have an example that shows that harmonic mean is not *just* for speed-related problems.
I'm not going to reject your change just on this documentation issue, but I would like to hear why you removed the P/E example instead of just adding additional examples.
I tried out the existing P/E example in my Python courses and found that it had very little explanatory power — in general, non-finance people know less about P/E ratios than they know about the harmonic mean :-)
For people with a finance background who already understand P/E ratios, the example is weak. The current example only works mathematically if the holdings have exactly the same market value at the time the ratios are combined, which never happens. P/E ratios in real portfolios also include zero and negative values, which won't work with our harmonic mean. Finally, combining P/Es for non-homogeneous securities is a bit of a dark art: given a utility stock, a healthcare stock, and a tech stock, the aggregate P/E is rarely comparable to anything else.
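To make two of these objections concrete, here is a sketch using the existing unweighted statistics.harmonic_mean() and the market values from the original proposal: the unweighted result silently assumes equal holdings, and a negative P/E is rejected with StatisticsError.

```python
from statistics import StatisticsError, harmonic_mean

pe_ratios = [2.5, 3, 10]

# Unweighted: implicitly assumes an equal market value in each holding
print(round(harmonic_mean(pe_ratios), 2))    # 3.6

# Weighted by hand, using the market values from the proposal
values = [10_000, 7_200, 12_900]
weighted = sum(values) / sum(v / pe for v, pe in zip(values, pe_ratios))
print(round(weighted, 2))                    # 3.91

# A loss-making company has a negative P/E, which harmonic_mean() rejects
try:
    harmonic_mean([2.5, 3, -4])
except StatisticsError as exc:
    print("rejected:", exc)
```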
All that said, I would be happy to add the example back if you think it is necessary. It's your module and it's important that you're happy with it :-)
I considered using a resistors-in-parallel example, but that is somewhat specialized and isn't directly applicable, because we normally don't want a mean at all; we just want the equivalent resistance.
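For comparison, the equivalent resistance of resistors in parallel is 1 / sum(1 / R_i), which equals the harmonic mean divided by the number of resistors, so the mean is an extra step away from what is usually wanted. A small sketch with illustrative values; `parallel_resistance` is a hypothetical helper:

```python
from statistics import harmonic_mean

def parallel_resistance(resistances):
    """Equivalent resistance of resistors in parallel: 1 / sum(1 / R_i)."""
    return 1 / sum(1 / r for r in resistances)

rs = [100, 200, 300]          # ohms (illustrative values)
r_eq = parallel_resistance(rs)
hm = harmonic_mean(rs)

print(round(r_eq, 2))          # 54.55
print(round(hm / len(rs), 2))  # 54.55 (harmonic mean divided by n)
```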
I also thought about adding something like: "The harmonic mean is the smallest of the three Pythagorean means and tends to emphasize the impact of small outliers while minimizing the impact of large outliers." But while this is true, I've never seen a data scientist switch from an arithmetic mean to a harmonic mean to achieve this effect.
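A quick illustration of the ordering claim, using made-up data with one small and one large outlier (geometric_mean() requires Python 3.8 or later):

```python
from statistics import geometric_mean, harmonic_mean, mean

data = [1, 10, 100]   # illustrative data only

am = mean(data)            # 37
gm = geometric_mean(data)  # about 10
hm = harmonic_mean(data)   # about 2.70, pulled toward the small value

# For positive data the Pythagorean means satisfy HM <= GM <= AM,
# with equality only when all values are equal.
assert hm <= gm <= am
print(am, round(gm, 6), round(hm, 2))
```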