The geometric mean is a measure of central tendency that summarizes a set of positive numbers through the **product** of their values: for n values, it is the n-th root of their product (as opposed to the arithmetic mean, which is based on their sum).
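For reference, the definition can be written out as below. The symbols `x_i` and `n` are notation chosen here for illustration. The second form, an average of logarithms, is a standard equivalent and shows why the values have to be positive.

```latex
\[
\mathrm{GM}(x_1, \dots, x_n)
  = \Bigl( \prod_{i=1}^{n} x_i \Bigr)^{1/n}
  = \exp\!\Bigl( \frac{1}{n} \sum_{i=1}^{n} \ln x_i \Bigr),
  \qquad x_i > 0 .
\]
```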

The geometric mean is a statistical measure that is particularly useful in situations where you are dealing with quantities that have a multiplicative relationship. It is commonly used in various fields including finance, biology, engineering, and more. Here are some scenarios where using the geometric mean is appropriate:

- Growth Rates and Compound Interest: When calculating growth rates or compound interest over multiple periods, the geometric mean is preferred because it reflects the compounding effect of growth over time. The arithmetic mean does not account for compounding.
- Average Rates of Change: If you're analyzing data that involves rates of change, such as percentage changes in stock prices or population growth rates, the geometric mean gives a more representative average.
- Data with Multiplicative Relationships: When dealing with quantities that interact multiplicatively, such as ratios, indices, and relative changes, the geometric mean is more appropriate. For example, to find the average rate of return for an investment over several periods, you would use the geometric mean.
- Normalized Data: If you're comparing values that have different scales and you want to remove the effect of scale differences, the geometric mean can be useful. The geometric mean of ratios equals the ratio of the geometric means, so the comparison does not depend on which value was chosen as the reference for normalization.
- Ratios and Proportions: When you're dealing with ratios or proportions, for example in scientific measurements, the geometric mean can give a more accurate picture of the central tendency.
- Certain Biological and Environmental Data: Biological and environmental data that involve growth rates, concentrations, or other multiplicative relationships are in some cases better analyzed using the geometric mean.
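The point about normalized data and ratios can be illustrated with a small sketch. The run times below are made up, and `geo_mean` is a helper written for this example, not a library function. It shows that the geometric mean gives the same verdict whichever system is used as the reference, while the arithmetic mean of ratios does not.

```python
import numpy as np


def geo_mean(values):
    """Geometric mean of strictly positive values, computed via logarithms."""
    values = np.asarray(values, dtype=float)
    return float(np.exp(np.mean(np.log(values))))


# Hypothetical run times (seconds) of two systems on three benchmarks
times_a = np.array([2.0, 8.0, 50.0])
times_b = np.array([4.0, 4.0, 100.0])

# Normalize A against B, then B against A
a_over_b = times_a / times_b
b_over_a = times_b / times_a

gm_a_over_b = geo_mean(a_over_b)
gm_b_over_a = geo_mean(b_over_a)
am_a_over_b = float(np.mean(a_over_b))
am_b_over_a = float(np.mean(b_over_a))

print(f"Geometric means:  A/B = {gm_a_over_b:.4f}, B/A = {gm_b_over_a:.4f}")
print(f"Arithmetic means: A/B = {am_a_over_b:.4f}, B/A = {am_b_over_a:.4f}")
```

The two geometric means are reciprocals of each other, so both normalizations tell the same story. The arithmetic means are 1.0 and 1.5, which suggests "A equals B" in one direction and "B is 50% slower" in the other.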

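Note that the geometric mean is sensitive to values close to zero: a single very small value can pull it down sharply, and a single zero makes the whole result zero. Very large values, by contrast, influence it less than they influence the arithmetic mean. The geometric mean is also not suitable for data that includes negative values, because it is equivalent to averaging logarithms, and the logarithm of a non-positive number is not a real number. This is why growth rates such as -5% are converted to growth factors (0.95) before averaging.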
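A minimal sketch of this behavior, using made-up data and a hypothetical helper `safe_geo_mean` that rejects non-positive input:

```python
import numpy as np


def safe_geo_mean(values):
    """Geometric mean via logs; raises on zero or negative values."""
    values = np.asarray(values, dtype=float)
    if np.any(values <= 0):
        raise ValueError("geometric mean via logs requires strictly positive values")
    return float(np.exp(np.mean(np.log(values))))


# Made-up datasets: a flat baseline, one large outlier, one tiny outlier
datasets = {
    "baseline":   [10.0, 10.0, 10.0, 10.0],
    "with_large": [10.0, 10.0, 10.0, 1000.0],
    "with_small": [10.0, 10.0, 10.0, 0.001],
}

results = {}
for name, data in datasets.items():
    results[name] = (float(np.mean(data)), safe_geo_mean(data))
    am, gm = results[name]
    print(f"{name:<11} arithmetic = {am:9.4f}   geometric = {gm:8.4f}")
```

The large outlier moves the arithmetic mean far more than the geometric mean, while the tiny value has the opposite effect: it barely changes the arithmetic mean but drags the geometric mean from 10 down to about 1.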

If you're dealing with data that doesn't have a clear multiplicative relationship or you're unsure whether the geometric mean is appropriate, it's a good idea to consider other measures of central tendency like the arithmetic mean or the median. The choice of which measure to use ultimately depends on the nature of your data and the specific goals of your analysis.

In [13]:
import numpy as np

# Given annual growth rates
growth_rates = [0.10, 0.20, -0.05]

# Compute arithmetic mean
arithmetic_mean = sum(growth_rates) / len(growth_rates)

# Compound a $100 investment at the arithmetic-mean rate for each period
initial_investment = 100
investment_using_arithmetic_mean = initial_investment
for _ in growth_rates:
    investment_using_arithmetic_mean *= (1 + arithmetic_mean)

# Compound the same investment at the actual yearly rates
investment_with_actual_growth = initial_investment
for rate in growth_rates:
    investment_with_actual_growth *= (1 + rate)

# Geometric mean of the growth factors (1 + rate), not of the raw rates
growth_factors = np.array([1 + r for r in growth_rates])
geometric_mean = np.prod(growth_factors) ** (1 / len(growth_rates))

# Compound the investment at the geometric-mean factor for each period
investment_using_geometric_mean = initial_investment
for _ in growth_rates:
    investment_using_geometric_mean *= geometric_mean


# Print results
print(f"Arithmetic Mean of Growth Rates: {arithmetic_mean*100:.2f}%")
print(f"Investment using Arithmetic Mean after 3 years: ${investment_using_arithmetic_mean:.2f}")
print(f"Geometric Mean: {geometric_mean:.4f}")
print(f"Investment using Geometric Mean after 3 years:  ${investment_using_geometric_mean:.2f}")
print(f"Investment with Actual Growth after 3 years:    ${investment_with_actual_growth:.2f}")

Arithmetic Mean of Growth Rates: 8.33%
Investment using Arithmetic Mean after 3 years: $127.14
Geometric Mean: 1.0784
Investment using Geometric Mean after 3 years:  $125.40
Investment with Actual Growth after 3 years:    $125.40


In [18]:
from scipy.stats import hmean

harmonic_mean = hmean([1 + rate for rate in growth_rates])

investment_using_harmonic_mean = initial_investment
for rate in growth_rates:
    investment_using_harmonic_mean *= harmonic_mean
print(f"Harmonic Mean of Growth Multipliers: {harmonic_mean:.4f}")
print(f"Investment using Harmonic Mean after 3 years:  ${investment_using_harmonic_mean:.2f}")
print(f"Investment with Actual Growth after 3 years:   ${investment_with_actual_growth:.2f}")

Harmonic Mean of Growth Multipliers: 1.0733
Investment using Harmonic Mean after 3 years:  $123.65
Investment with Actual Growth after 3 years:   $125.40


---

The harmonic mean is particularly useful when the quantities being averaged are rates or ratios. A common data science example comes from the evaluation of classifiers: the F1 score.

In binary classification, two commonly used metrics are precision and recall:

- Precision: The number of correct positive results (true positives) divided by the number of all results predicted as positive (true positives plus false positives).
- Recall: The number of correct positive results (true positives) divided by the number of positive results that should have been returned (true positives plus false negatives).

The F1 score is the harmonic mean of precision and recall:

$$F_1 = 2 \times \frac{\text{Precision} \times \text{Recall}}{\text{Precision} + \text{Recall}}$$

Why do we use the harmonic mean here and not the arithmetic mean? The answer lies in how the two means treat unbalanced inputs. The harmonic mean is dominated by the smaller of its inputs, whereas the arithmetic mean lets a high value compensate for a low one. If either precision or recall is extremely low, the F1 score is therefore low as well. Using the harmonic mean ensures that a classifier is not considered to have a good F1 score unless both precision and recall are reasonable.
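To make the contrast concrete, here is a minimal sketch with made-up numbers: a classifier that is almost never wrong when it predicts "positive" but finds only 1% of the actual positives. The helper `f1_from_pr` is written for this example and is not a library function.

```python
def f1_from_pr(precision, recall):
    """Harmonic mean of precision and recall (returns 0.0 if both are 0)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Hypothetical, deliberately unbalanced classifier
lopsided_precision = 1.0
lopsided_recall = 0.01

lopsided_arithmetic = (lopsided_precision + lopsided_recall) / 2
lopsided_f1 = f1_from_pr(lopsided_precision, lopsided_recall)

print(f"Arithmetic mean: {lopsided_arithmetic:.4f}")
print(f"F1 (harmonic):   {lopsided_f1:.4f}")
```

The arithmetic mean comes out a little above 0.5, which looks mediocre but acceptable, while the F1 score is about 0.02 and reflects that the classifier misses nearly every positive case.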

Example:

Suppose you've built a binary classifier to predict whether a given email is spam or not. After testing, you get the following results:

- TP = 90: 90 spam emails correctly classified as spam
- FP = 10: 10 non-spam emails incorrectly classified as spam
- TN = 800: 800 non-spam emails correctly classified as non-spam
- FN = 100: 100 spam emails incorrectly classified as non-spam

From this:

Precision = TP / (TP + FP) = 90 / 100 = 0.9

Recall = TP / (TP + FN) = 90 / 190 ≈ 0.4737

Now, compute the F1 score using the formula:

F1 = 2 × 0.9 × 0.4737 / (0.9 + 0.4737) ≈ 0.621

Thus, even though precision is high at 0.9, the lower recall of 0.4737 pulls the F1 score down to around 0.621, below the arithmetic mean of the two values (about 0.687). This highlights the importance of achieving a balance between precision and recall.
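The same calculation can be sketched directly from the confusion-matrix counts given in the example. The variable names are chosen here for illustration. The second F1 expression, written in terms of counts, is algebraically the same as the harmonic-mean form and serves as a cross-check.

```python
# Counts from the spam example
tp, fp, tn, fn = 90, 10, 800, 100

spam_precision = tp / (tp + fp)   # share of predicted spam that is spam
spam_recall = tp / (tp + fn)      # share of actual spam that was caught

# F1 as the harmonic mean of precision and recall
spam_f1 = 2 * spam_precision * spam_recall / (spam_precision + spam_recall)

# Equivalent form in terms of raw counts (TN does not enter F1 at all)
spam_f1_from_counts = 2 * tp / (2 * tp + fp + fn)

print(f"Precision: {spam_precision:.4f}")
print(f"Recall:    {spam_recall:.4f}")
print(f"F1:        {spam_f1:.4f}")
```

With these counts the F1 score evaluates to 180 / 290, roughly 0.6207. Note that the 800 true negatives play no role in the result.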

Harmonic Mean: `n / ((1 / x_1) + (1 / x_2) + ... + (1 / x_n))`
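The formula above can be sketched in plain Python. The function name `harmonic_mean_by_formula` and the travel example are illustrative assumptions: a trip covers the same distance at 60 km/h one way and 40 km/h on the way back, and the harmonic mean of the two speeds should match total distance divided by total time.

```python
def harmonic_mean_by_formula(values):
    """n / (1/x_1 + ... + 1/x_n) for strictly positive values."""
    values = list(values)
    if not values or any(v <= 0 for v in values):
        raise ValueError("harmonic mean requires a non-empty list of positive values")
    return len(values) / sum(1.0 / v for v in values)


# Hypothetical trip: same distance out and back at different speeds (km/h)
speeds = [60.0, 40.0]
leg_distance = 120.0  # km per leg; any positive value gives the same average

harmonic_speed = harmonic_mean_by_formula(speeds)

# Cross-check: total distance divided by total travel time
total_time = sum(leg_distance / s for s in speeds)
true_average_speed = len(speeds) * leg_distance / total_time

print(f"Harmonic mean of speeds: {harmonic_speed:.2f} km/h")
print(f"Distance / time:         {true_average_speed:.2f} km/h")
print(f"Arithmetic mean:         {sum(speeds) / len(speeds):.2f} km/h")
```

Both the harmonic mean and the distance-over-time check give 48 km/h, while the arithmetic mean of 50 km/h overstates the average because more time is spent on the slower leg.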

In [28]:
hmean([0.9, 0.4737])

0.6207032103079275

In [29]:
np.mean([0.9, 0.4737])

0.68685

---