# Combining Lyft Multimode Models

### <font color="red">Notice</font>

> <b><font color="blue">This notebook is intended to help you combine your own internal models, not to probe the public test data. Please use it wisely.</font></b>

# The main idea

Early in this competition, we found that <font color="blue">combining solutions from different single-mode models considerably improves results</font>. Combining **single-mode** models was also straightforward: we just take 3 models, one for each channel, and tune their probabilities accordingly.

Things get harder with **multi-mode models**, where there is no obvious correspondence between the predictions of different models. We work around this missing order by sorting the 3 channels of every model by probability: the channel with the highest probability comes first, the one with the second-highest probability comes second, and the remaining one comes third. This gives a reasonably consistent order that allows us to combine different models.
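To make the sorting idea concrete, here is a minimal sketch on made-up data. The helper name `sort_modes`, the toy values, and the `(n_agents, 3, n_coords)` layout are assumptions for illustration only, and `np.take_along_axis` is used here for readability rather than because the notebook uses it.

```python
import numpy as np

def sort_modes(conf, traj):
    """Reorder the modes of each row by descending confidence.

    conf: (n_agents, 3) confidences.
    traj: (n_agents, 3, n_coords) trajectories, one per mode.
    """
    order = np.argsort(-conf, axis=1)  # most confident mode first
    conf_sorted = np.take_along_axis(conf, order, axis=1)
    traj_sorted = np.take_along_axis(traj, order[:, :, None], axis=1)
    return conf_sorted, traj_sorted

# Toy data (made up): one agent, 3 modes, 2 coordinates per mode.
conf_a = np.array([[0.2, 0.7, 0.1]])
traj_a = np.array([[[1.0, 1.0], [2.0, 2.0], [3.0, 3.0]]])

conf_sorted, traj_sorted = sort_modes(conf_a, traj_a)
print(conf_sorted)  # confidences from most to least confident
print(traj_sorted)  # each trajectory stays paired with its confidence
```

Once every model is passed through the same reordering, "mode 0" means "the most confident mode" for all of them, which is what makes a mode-by-mode combination meaningful.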

As the whole process was slow with **pure pandas**, we switched to **numpy** to speed everything up.
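The sketch below illustrates the kind of vectorization this refers to. It compares a row-by-row reordering with a single NumPy integer-array indexing call on random toy data. The loop is only a stand-in for a slow row-wise approach, not the original pandas code, and the array sizes are assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 1000
conf = rng.random((n, 3))     # toy confidences (random, for illustration)
XY = rng.random((n, 3, 100))  # 3 modes x (50 steps * 2 coordinates)

order = np.argsort(-conf, axis=1)

# Row-by-row version: easy to read, but it loops in Python.
looped = np.stack([XY[i, order[i]] for i in range(n)])

# Vectorized version: one indexing call reorders the modes of every row.
vectorized = XY[np.arange(n)[:, None], order]

print(np.array_equal(looped, vectorized))
```

Both versions produce the same array; the vectorized one avoids the per-row Python overhead.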

Let's see it in practice!

**In the rest of this notebook, I use [this public dataset](https://www.kaggle.com/kneroma/lyft-best-performing-public-kernels) for demonstration only.**

In [None]:
import numpy as np
import pandas as pd
pd.options.display.max_columns = 305  # 2 ids + 3 confidences + 300 coordinates

In [None]:
paths = [
    "../input/lyft-best-performing-public-kernels/lyft-ensembling-raster-sizes.csv", 
    "../input/lyft-best-performing-public-kernels/lyft-prediction-with-multi-mode-confidence.csv",
]
# One weight per file; they should sum to 1 because coordinates are blended as a weighted sum.
weights = [0.4, 0.6]

In [None]:
conf_cols = np.array(["conf_0", "conf_1", "conf_2"])

In [None]:
# One list of coordinate columns per mode j: 50 time steps, x and y interleaved.
xy_cols = [[], [], []]
for i in range(50):
    for j in range(3):
        xy_cols[j].append(f"coord_x{j}{i}")
        xy_cols[j].append(f"coord_y{j}{i}")
xy_cols[0][:10]

In [None]:
COLUMNS = ["timestamp", "track_id"] + list(conf_cols) + xy_cols[0] + xy_cols[1] + xy_cols[2]

# Sorting

In [None]:
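def sort_df(df, sort_timestamp_track_id=True):
    """Reorder the 3 modes of each row by descending confidence."""
    # Per-row mode order, most confident first: shape (n_rows, 3).
    conf_orders = np.argsort(-df[conf_cols].values, axis=1)
    # Stack the modes into shape (n_rows, 3, 100), then reorder them row by row.
    XY = np.stack([df[xy_cols[0]].values, df[xy_cols[1]].values, df[xy_cols[2]].values], axis=1)
    XY = XY[np.arange(len(XY))[:, None], conf_orders]

    df2 = pd.DataFrame(columns=COLUMNS)
    df2["timestamp"] = df["timestamp"].values
    df2["track_id"] = df["track_id"].values
    df2[xy_cols[0] + xy_cols[1] + xy_cols[2]] = XY.reshape(-1, 300)
    # Apply the same order to the confidences so they stay paired with their mode.
    df2[conf_cols] = df[conf_cols].values[np.arange(len(df))[:, None], conf_orders]

    if sort_timestamp_track_id:
        # A shared row order lets the frames be added index by index later on.
        df2.sort_values(["timestamp", "track_id"], inplace=True)
        df2.reset_index(inplace=True, drop=True)
    return df2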

# Combining
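Once the modes are sorted, combining reduces to a mode-by-mode weighted sum of coordinates, followed by a renormalization of the confidences. The sketch below mirrors that logic on tiny made-up arrays. The function name `combine` and the toy values are hypothetical, and the weights are assumed to sum to 1.

```python
import numpy as np

def combine(confs, trajs, weights):
    """Blend already-sorted predictions from several models.

    confs: list of (n, 3) arrays; trajs: list of (n, 3, n_coords) arrays.
    weights: one weight per model, assumed to sum to 1.
    """
    weights = np.asarray(weights, dtype=float)
    # Weighted sum of coordinates, mode by mode.
    traj = sum(w * t for w, t in zip(weights, trajs))
    # Confidences are summed as-is and rescaled so each row adds up to 1.
    conf = sum(confs)
    conf = conf / conf.sum(axis=1, keepdims=True)
    return conf, traj

# Toy data (made up): one agent, 3 sorted modes, 1 coordinate per mode.
conf_a = np.array([[0.6, 0.3, 0.1]])
conf_b = np.array([[0.8, 0.1, 0.1]])
traj_a = np.array([[[0.0], [10.0], [20.0]]])
traj_b = np.array([[[1.0], [11.0], [21.0]]])

conf, traj = combine([conf_a, conf_b], [traj_a, traj_b], [0.4, 0.6])
print(conf)  # rows sum to 1
print(traj)  # each mode is a 0.4 / 0.6 blend of the two models
```

As in the cell below, the weights act on the coordinates only; the confidences of the models contribute equally before renormalization.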

In [None]:
%%time

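df = None
for path, w in zip(paths, weights):
    print(w, path)
    temp = pd.read_csv(path)
    temp = sort_df(temp)
    # Scale only the coordinates (COLUMNS[5:]) by the model weight.
    temp[COLUMNS[5:]] *= w
    if df is None:
        df = temp
    else:
        # Unweighted confidences and weighted coordinates are accumulated together.
        df[COLUMNS[2:]] += temp[COLUMNS[2:]]
# Renormalize the summed confidences so that each row sums to 1.
df[conf_cols] /= df[conf_cols].sum(1).values[:, None]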

sample = pd.read_csv("../input/lyft-motion-prediction-autonomous-vehicles/multi_mode_sample_submission.csv")

df = sample[["timestamp", "track_id"]].merge(df, on=["timestamp", "track_id"])
sample.shape, df.shape

In [None]:
df.head()

In [None]:
df.to_csv("submission.csv", index=False, float_format='%.6f')