# Imports

In [1]:
import numpy as np 
import pandas as pd 

In [2]:
target=["Rings"]

# Mathematical Generalization

<font size="3">Predictions from several submissions can be combined with different types of means: the Geometric Mean, the Arithmetic Mean, and the Harmonic Mean.</font>

* **Geometric Mean:**
The Geometric Mean is calculated by taking the nth root of the product of n numbers.

Geometric Mean = (x₁ * x₂ * x₃ * ... * xₙ)^(1/n)

* **Arithmetic Mean:**
The Arithmetic Mean is the sum of all the numbers in a series divided by the total number of values.

Arithmetic Mean = (x₁ + x₂ + x₃ + ... + xₙ) / n

* **Harmonic Mean:**
The Harmonic Mean is calculated by taking the reciprocal of the arithmetic mean of the reciprocals of the numbers in the series.

Harmonic Mean = n / ((1/x₁) + (1/x₂) + (1/x₃) + ... + (1/xₙ))

where x₁, x₂, x₃, ..., xₙ are the prediction vectors from the n submissions, and each mean is computed element-wise (row by row).

<font size="3">For any set of positive predictions, the Harmonic Mean is the smallest, followed by the Geometric Mean, and then the Arithmetic Mean, which is the largest. The three are equal only when all predictions are identical.</font>
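The ordering stated above can be checked with a minimal sketch. The three prediction values are hypothetical toy numbers, and `three_means` is an illustrative helper name that does not appear elsewhere in this notebook.

```python
import numpy as np

def three_means(preds):
    """Return (harmonic, geometric, arithmetic) means of strictly positive values."""
    x = np.asarray(preds, dtype=float)
    hm = len(x) / np.sum(1.0 / x)       # n / sum of reciprocals
    gm = np.exp(np.mean(np.log(x)))     # nth root of the product, computed in log space
    am = np.mean(x)                     # sum / n
    return hm, gm, am

# Hypothetical predictions for one row from three submissions
hm, gm, am = three_means([8.0, 10.0, 12.5])
print(f"HM={hm:.4f}  GM={gm:.4f}  AM={am:.4f}")
```

The geometric mean is computed as the exponential of the mean logarithm, which is algebraically the same as the nth root of the product.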

**When to use:**
1. Geometric Mean is less sensitive than the Arithmetic Mean to extremely large values (outliers). This can give better results and dampen overfitted predictions.
2. Arithmetic Mean is sensitive to extreme values: one very large or very small number can shift it significantly. It suits cases where large predictions are important.
3. Harmonic Mean is rarely used; it gives more weight to the smaller numbers among the predictions.
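To make the sensitivity points in this list concrete, the sketch below perturbs one of five otherwise identical predictions and reports how far each mean moves. All numbers are hypothetical, and `means` is an illustrative helper, not part of the original notebook.

```python
import numpy as np

def means(values):
    """Compute the harmonic, geometric and arithmetic means of positive values."""
    x = np.asarray(values, dtype=float)
    return {
        "HM": float(len(x) / np.sum(1.0 / x)),
        "GM": float(np.exp(np.mean(np.log(x)))),
        "AM": float(np.mean(x)),
    }

baseline = means([10, 10, 10, 10, 10])   # all submissions agree
high = means([10, 10, 10, 10, 30])       # one unusually large prediction
low = means([10, 10, 10, 10, 1])         # one unusually small prediction

for name in ("HM", "GM", "AM"):
    shift_high = high[name] - baseline[name]
    shift_low = low[name] - baseline[name]
    print(f"{name}: shift with large outlier = {shift_high:+.3f}, "
          f"with small outlier = {shift_low:+.3f}")
```

In this toy setup the arithmetic mean moves the most when the outlier is large, while the harmonic mean moves the most when the outlier is small; the geometric mean sits between the two in both cases.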

**Weights:** Weights are assigned by repeating a submission's predictions in the computation. A common approach is to weight by rank, so that the submission with the lowest RMSE gets the largest rank number.
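A small sketch of the two ideas in this paragraph, under stated assumptions: the RMSE scores and prediction arrays are hypothetical, and the rank formula is one possible way to turn scores into integer weights. It also checks that repeating predictions gives the same arithmetic mean as an explicit weighted average.

```python
import numpy as np

# Hypothetical validation RMSE of three submissions (lower is better)
rmse = np.array([0.1455, 0.1456, 0.1465])

# Rank-based weights: the lowest RMSE receives the largest rank
ranks = len(rmse) - np.argsort(np.argsort(rmse))
weights = ranks.tolist()

# Hypothetical predictions (two rows) from the three submissions
preds = [np.array([9.0, 11.0]), np.array([10.0, 12.0]), np.array([12.0, 9.0])]

# Weighting by repetition: each prediction vector appears `weight` times
repeated = [p for p, w in zip(preds, weights) for _ in range(w)]
am_repeated = sum(repeated) / len(repeated)

# Equivalent explicit weighted arithmetic mean
am_weighted = np.average(np.stack(preds), axis=0, weights=weights)

print(weights, am_repeated, am_weighted)
```

Repetition keeps the mean functions unchanged at the cost of extra memory; an explicit weight vector avoids the copies but needs a weighted version of each mean.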

<font size="3">My notebook on solving the problem statement, with detailed feature engineering and modeling, is located [here](https://www.kaggle.com/code/arunklenin/ps4e4-abalone-age-prediction-regression)</font>

<font size="3">Results from two external notebooks are also included:</font>
 1. [https://www.kaggle.com/code/igorvolianiuk/abalone-rings-ensemble](https://www.kaggle.com/code/igorvolianiuk/abalone-rings-ensemble) 
 2. [https://www.kaggle.com/code/mfmfmf3/clean-code-voting-regressor-base-3-models](https://www.kaggle.com/code/mfmfmf3/clean-code-voting-regressor-base-3-models)

In [3]:
sub_external1=pd.read_csv("/kaggle/input/clean-code-voting-regressor-base-3-models/submission_0.14550.csv")
sub_external2=pd.read_csv("/kaggle/input/abalone-rings-ensemble/submission.csv")
submission1=pd.read_csv("/kaggle/input/ps4e4-ensemble-ancillary/sub_pure_14556.csv")
submission2=pd.read_csv("/kaggle/input/ps4e4-ensemble-ancillary/sub_pure_14563.csv")
submission3=pd.read_csv("/kaggle/input/ps4e4-ensemble-ancillary/sub_pure_14572.csv")
submission4=pd.read_csv("/kaggle/input/ps4e4-ensemble-ancillary/sub_pure_14648.csv")
submission5=pd.read_csv("/kaggle/input/ps4e4-ensemble-ancillary/sub_pure_14651.csv")

"""
Select submissions from different versions of your own work or from others' predictions. Combining different frameworks tends to generalize better.
"""


sub_list=[sub_external1,sub_external2,submission1,submission2,submission3, submission4, submission5] # list all the results

"""
Since there are 7 predictions, I'm assigning a weight of 6 to each of the best three results and 1 to each of the others
"""
weights=[6,6,6,1,1,1,1]

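if len(sub_list)!=len(weights):
    raise ValueError("sub_list and weights must have the same length")
# Repeat each submission `weight` times so that a plain mean acts as a weighted mean
weighted_list = [sub for sub, weight in zip(sub_list, weights) for _ in range(weight)]
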

def ensemble_mean(sub_list,cols, mean="AM"):
    
    """
    Compute the Arithmetic Mean ("AM"), Geometric Mean ("GM"), or Harmonic Mean ("HM")
    of the columns in `cols` across a list of submission DataFrames.
    """
    
    sub_out=sub_list[0].copy()
    if mean=="AM":
        for col in cols:
            sub_out[col]=sum(df[col] for df in sub_list)/len(sub_list)
    elif mean=="GM":
        for df in sub_list[1:]:
            for col in cols:
                sub_out[col]*=df[col]
        for col in cols:
            sub_out[col]=(sub_out[col])**(1/len(sub_list))
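    elif mean=="HM":
        for col in cols:
            sub_out[col]=len(sub_list)/sum(1/df[col] for df in sub_list)
    else:
        # Fail loudly instead of silently returning the first submission
        raise ValueError(f"Unknown mean {mean!r}; expected 'AM', 'GM' or 'HM'")

    return sub_out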
    
sub_ensemble=ensemble_mean(weighted_list,target,mean="HM")
sub_ensemble.head()

Unnamed: 0,id,Rings
0,90615,9.721186
1,90616,9.688203
2,90617,9.685821
3,90618,10.545066
4,90619,7.597252
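
The `GM` branch of `ensemble_mean` multiplies all submissions together before taking the root. With a long weighted list or large target values, that running product could overflow or lose precision. A possible alternative, assuming all predictions are strictly positive, is to average in log space. `geometric_mean_frames` is a hypothetical helper name, and the two small DataFrames are toy data rather than real submissions.

```python
import numpy as np
import pandas as pd

def geometric_mean_frames(frames, cols):
    """Geometric mean of `cols` across DataFrames, computed via logarithms."""
    out = frames[0].copy()
    for col in cols:
        # Stack log-predictions: one row per submission, one column per sample
        logs = np.stack([np.log(df[col].to_numpy(dtype=float)) for df in frames])
        # exp(mean(log x)) equals the nth root of the product of x
        out[col] = np.exp(logs.mean(axis=0))
    return out

# Toy data standing in for two submissions
a = pd.DataFrame({"id": [1, 2], "Rings": [4.0, 9.0]})
b = pd.DataFrame({"id": [1, 2], "Rings": [9.0, 4.0]})

gm_frame = geometric_mean_frames([a, b], ["Rings"])
print(gm_frame)
```

Like the original function, this sketch assumes every DataFrame lists the same `id` values in the same order.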


In [4]:
sub_ensemble.to_csv('submission.csv',index=False)
sub_ensemble.head()

Unnamed: 0,id,Rings
0,90615,9.721186
1,90616,9.688203
2,90617,9.685821
3,90618,10.545066
4,90619,7.597252
