# Median wins against weighted mean

In this competition, several public notebooks compute a weighted arithmetic mean of other submissions. This notebook uses the same inputs as the currently top-scoring public notebook, but takes the median rather than the mean, and gets a better score.

In a competition scored by mean absolute error (MAE), optimizing weights is a waste of time: the median is better and doesn't need any weights.
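
One intuition for this, sketched with made-up numbers rather than real submissions: for a single row, the median of the input predictions is a value that minimizes the sum of absolute distances to those predictions, while the mean minimizes squared distances. The snippet below only illustrates that property. It does not prove that the median scores better against the true pressures, which remains an empirical result. The array `preds` and the helper `l1_cost` are hypothetical.

```python
import numpy as np

# Hypothetical predictions of three models for one row (made-up numbers).
preds = np.array([10.0, 10.5, 14.0])

def l1_cost(c, values):
    """Sum of absolute distances from the candidate c to all values."""
    return np.abs(values - c).sum()

median = np.median(preds)
mean = preds.mean()

# Scan a fine grid of candidates and keep the one with the smallest L1 cost.
grid = np.linspace(preds.min(), preds.max(), 401)
best = grid[np.argmin([l1_cost(c, preds) for c in grid])]

print(median, mean, best)
print(l1_cost(median, preds), l1_cost(mean, preds))
```

In this toy case the grid search lands on the median, and the mean, pulled towards the outlying third prediction, has a larger absolute-distance cost.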

See https://www.kaggle.com/c/ventilator-pressure-prediction/discussion/280573 for a discussion of the topic.

<font size="1">There is one exception to the rule: if you have only two inputs, a weighted arithmetic mean gives a better result than the median.</font>
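
The two-input exception can be checked directly: `np.median` of two values returns their midpoint, so with two inputs the median is just the weighted mean with both weights fixed at 0.5. The arrays `a` and `b` and the weight `w` below are hypothetical values chosen for illustration.

```python
import numpy as np

a = np.array([10.0, 20.0, 30.0])  # hypothetical predictions of model A
b = np.array([12.0, 18.0, 33.0])  # hypothetical predictions of model B

stacked = np.column_stack([a, b])
median_2 = np.median(stacked, axis=1)   # row-wise median of two inputs
mean_2 = stacked.mean(axis=1)           # unweighted mean of the same inputs

# A weighted mean is free to move away from the midpoint; w is an arbitrary choice.
w = 0.7
weighted = w * a + (1 - w) * b

print(median_2, mean_2, weighted)
```

Since the median coincides with the equal-weight mean here, tuning `w` can only match or improve on it for a given validation set.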

In [None]:
import pandas as pd
import numpy as np

files = ['../input/gb-vpp-pulp-fiction/median_submission.csv',
         '../input/basic-ensemble-of-public-notebooks/submission_median.csv',
         '../input/gaps-features-tf-lstm-resnet-like-ff/sub.csv']

sub = pd.read_csv('../input/ventilator-pressure-prediction/sample_submission.csv')
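# Stack the predictions column-wise (one column per submission) and take the row-wise median.
# This assumes every file lists its rows in the same order as sample_submission.csv.
predictions = np.column_stack([pd.read_csv(f)['pressure'].to_numpy() for f in files])
sub['pressure'] = np.median(predictions, axis=1)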
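
The row-wise median is only meaningful if all submissions are aligned row by row. A minimal sketch of a guarded version follows, assuming each submission has an `id` column next to `pressure`. The helper `median_blend` and the toy frames are hypothetical and not part of the original notebook.

```python
import numpy as np
import pandas as pd

def median_blend(frames, id_col='id', value_col='pressure'):
    """Row-wise median of value_col across frames, after checking that the ids match."""
    ids = frames[0][id_col].to_numpy()
    for frame in frames[1:]:
        if not np.array_equal(ids, frame[id_col].to_numpy()):
            raise ValueError('submissions are not aligned on ' + id_col)
    stacked = np.column_stack([frame[value_col].to_numpy() for frame in frames])
    return np.median(stacked, axis=1)

# Toy stand-ins for three submission files (made-up numbers).
toy = [pd.DataFrame({'id': [1, 2, 3], 'pressure': p})
       for p in ([5.0, 6.0, 7.0], [5.5, 9.0, 6.0], [4.0, 6.5, 8.0])]
blend = median_blend(toy)
print(blend)
```

With real files, the frames would come from `pd.read_csv(f)` for each entry in `files`.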

In [None]:
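# Only the pressure column is needed to recover the grid of valid pressure values.
data = pd.read_csv('../input/ventilator-pressure-prediction/train.csv', usecols=['pressure'])

# The code assumes the training pressures lie on an evenly spaced grid,
# so the step is taken from the two smallest unique values.
pressure_sorted = np.sort(data['pressure'].unique())
PRESSURE_MIN = pressure_sorted[0]
PRESSURE_MAX = pressure_sorted[-1]
PRESSURE_STEP = pressure_sorted[1] - pressure_sorted[0]

def post_process(pressure):
    """Snap predictions to the nearest grid value and clip them to the observed range."""
    pressure = np.round((pressure - PRESSURE_MIN) / PRESSURE_STEP) * PRESSURE_STEP + PRESSURE_MIN
    pressure = np.clip(pressure, PRESSURE_MIN, PRESSURE_MAX)
    return pressure

sub['pressure'] = post_process(sub['pressure'])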
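
To make the rounding step concrete, here is the same snap-and-clip logic on a toy grid. The constants `TOY_MIN`, `TOY_MAX`, `TOY_STEP` and the helper `snap_to_grid` are hypothetical. The real values come from `train.csv` as computed above.

```python
import numpy as np

# Hypothetical grid for illustration, not the competition's actual pressure values.
TOY_MIN, TOY_MAX, TOY_STEP = 0.0, 2.0, 0.5

def snap_to_grid(values, lo, hi, step):
    """Round each value to the nearest multiple of step above lo, then clip to [lo, hi]."""
    snapped = np.round((values - lo) / step) * step + lo
    return np.clip(snapped, lo, hi)

raw = np.array([-0.3, 0.2, 0.3, 1.1, 2.6])
snapped = snap_to_grid(raw, TOY_MIN, TOY_MAX, TOY_STEP)
print(snapped)
```

Values outside the grid range are pulled back to its ends, and everything else moves to the nearest grid point.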

In [None]:
sub.to_csv('submission.csv', index=False)
sub.head(5)