
# Voting Baseline - Solving the Synthanic using Democracy
This notebook provides a simple baseline to ensemble submissions using voting.


### Credits
I used a random selection of public notebooks. All credit goes to the creators:

@andreshg: https://www.kaggle.com/andreshg/tps-apr-data-visualization-and-engineering

@Alexander Ryzhkov: https://www.kaggle.com/alexryzhkov/n3-tps-april-21-lightautoml-starter

@tomwarrens: https://www.kaggle.com/tomwarrens/tps-april-2021-lgbm-optuna

@springmanndaniel: https://www.kaggle.com/springmanndaniel/bagged-lgbms

## What is voting ensembling?
The answer is fairly simple. 

We look at each row of the submission and retrieve the prediction of each model (the votes). Then we count the votes for "Survived" (1) and "Not Survived" (0), and the prediction with the most votes wins. With an even number of voters a tie is possible; the code in this notebook resolves ties to 0.
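
The counting rule above can be sketched on a small toy example. The model names, the vote values, and the helper `majority_vote` are hypothetical and only serve as illustration; the threshold is assumed to be a strict majority, so ties fall to 0, matching the strict `>` comparison used later in this notebook.

```python
import pandas as pd

# Hypothetical votes from three models on four rows
# (1 = "Survived", 0 = "Not Survived"); not real submission data.
toy_votes = pd.DataFrame({
    "model_a": [1, 0, 1, 0],
    "model_b": [1, 0, 0, 0],
    "model_c": [0, 1, 1, 0],
})


def majority_vote(votes: pd.DataFrame) -> pd.Series:
    """Return 1 where strictly more than half of the columns vote 1, else 0."""
    n_voters = votes.shape[1]
    # Summing 0/1 votes across columns counts the votes for "Survived" per row.
    return (votes.sum(axis=1) > n_voters / 2).astype(int)


toy_result = majority_vote(toy_votes)
print(toy_result.tolist())
```

With an odd number of voters, as in this toy example, a tie cannot occur; with an even number, a row that splits evenly is predicted as 0 under this rule.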

In [None]:
import pandas as pd
import numpy as np

import matplotlib.pyplot as plt
import seaborn as sns

In [None]:
# Paths to all the submissions we ensemble
voters = [
    "../input/tps-apr-data-visualization-and-engineering/lightautoml_utilized_300s_f1_metric.csv",
    "../input/n3-tps-april-21-lightautoml-starter/submission_N3.csv",
    "../input/tps-april-2021-lgbm-optuna/submission.csv",
    "../input/bagged-lgbms/submission_prob.csv"
]

# Display name for each voter, in the same order as the paths above
voter_tags = ["AndresHG", "alexryzhkov", "Tommaso Guerrini", "danzel"]

In [None]:
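# Read the "Survived" column of every submission and place them side by side,
# one column per voter (assumes every file lists its rows in the same order)
combined_votes = pd.concat(
    [pd.read_csv(voter)["Survived"] for voter in voters],
    axis=1,
)
combined_votes.columns = voter_tags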

combined_votes_corr = combined_votes.corr()

sns.set(font_scale=1.3)

fig, ax = plt.subplots(figsize=(12, 12))

# Heatmap of pairwise correlations between the voters' predictions
sns.heatmap(combined_votes_corr,
            annot=True,
            vmin=0.7,
            vmax=1,
            fmt='.3f',
            linewidths=1,
            annot_kws={"fontsize": 8},
            ax=ax)

plt.title('Vote Correlations')
plt.tight_layout()

In [None]:
# Count the votes for "Survived" (1) in each row
combined_votes["Results"] = combined_votes.sum(axis=1)

In [None]:
# Predict 1 only if a strict majority of voters say so; a tie resolves to 0
combined_votes["Survived"] = (combined_votes["Results"] > len(voters) / 2).astype(int)

In [None]:
# create our submission
sub_df = pd.read_csv("../input/tabular-playground-series-apr-2021/sample_submission.csv")
sub_df["Survived"] = combined_votes["Survived"]
sub_df.to_csv("submission.csv", index=False)