# Ensembling

In this notebook, the best predictions from each of the previously presented models are combined into one dataframe. The mean of the predicted prices is then calculated for each test-set item and used as the new prediction. The ensemble scores a root-mean-squared error (RMSE) of 0.147, so ensembling did not improve on the best single model, which scored 0.1335 (lower is better).
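Why can an average of models score worse than the best model? A minimal sketch with made-up numbers (the arrays below are hypothetical and not the notebook's data): because RMSE is a scaled Euclidean norm of the error vector, the triangle inequality guarantees that the averaged prediction scores no worse than the *average* of the individual scores. It can still score worse than the *best* individual model when one model is clearly weaker than the other.

```python
import numpy as np

def rmse(y_true, y_pred):
  """Root-mean-squared error between two equal-length arrays."""
  y_true = np.asarray(y_true, dtype=float)
  y_pred = np.asarray(y_pred, dtype=float)
  return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))

# Hypothetical targets and two hypothetical model predictions.
y_true = np.array([12.0, 11.5, 12.3, 11.9])
pred_a = np.array([12.1, 11.4, 12.2, 12.0])  # stronger model
pred_b = np.array([11.7, 11.9, 12.6, 11.6])  # weaker model

# Plain mean ensemble of the two models.
ensemble = (pred_a + pred_b) / 2

score_a = rmse(y_true, pred_a)
score_b = rmse(y_true, pred_b)
score_ens = rmse(y_true, ensemble)

# Triangle inequality: the ensemble is never worse than the average score...
assert score_ens <= (score_a + score_b) / 2 + 1e-12
# ...but here it is still worse than the best single model.
print(score_a, score_b, score_ens)
```

With these numbers the ensemble lands between the two models, the same pattern as 0.147 versus 0.1335 reported above.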

In [8]:
import os
import pandas as pd

Combine all the model predictions into a single dataframe.

In [9]:
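folder = "csvs/"
predictions = None

# Sort the file names so the column order is reproducible (os.listdir order is arbitrary).
csv_files = sorted(f for f in os.listdir(folder) if f.endswith('.csv'))

for i, filename in enumerate(csv_files):
  df = pd.read_csv(os.path.join(folder, filename))
  # Keep only Id and the prediction, renamed to a per-model column (col_0, col_1, ...).
  df = df[['Id', 'SalePrice']].rename(columns={'SalePrice': f'col_{i}'})

  if predictions is None:
    predictions = df
  else:
    # Match predictions by Id rather than by row position.
    predictions = predictions.merge(df, on='Id')

predictions.head()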


|   | Id | col_0 | col_1 |
|---|---|---|---|
| 0 | 1461 | 130043.7525 | 127734.846667 |
| 1 | 1462 | 161584.36 | 154519.313333 |
| 2 | 1463 | 184692.07 | 182991.743333 |
| 3 | 1464 | 194002.5 | 188732.326667 |
| 4 | 1465 | 190903.7525 | 189549.58 |
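Combining per-model files is only safe if each prediction is matched to the right `Id`. A column assignment such as `predictions['col_1'] = df['SalePrice']` aligns on the default integer index, which for freshly read CSVs is just the row position. The sketch below uses two small hypothetical frames saved in different row orders to show how positional assignment mismatches rows while a merge on `Id` does not.

```python
import pandas as pd

# Two hypothetical prediction files for the same test items, in different row orders.
model_a = pd.DataFrame({"Id": [1461, 1462, 1463], "SalePrice": [130000.0, 160000.0, 185000.0]})
model_b = pd.DataFrame({"Id": [1463, 1461, 1462], "SalePrice": [183000.0, 128000.0, 155000.0]})

# Positional column assignment pairs rows by index position, not by Id.
positional = model_a.copy()
positional["col_1"] = model_b["SalePrice"]

# Merging on Id pairs each prediction with the right house regardless of row order;
# validate="one_to_one" raises if either side has duplicate Ids.
aligned = (
  model_a.rename(columns={"SalePrice": "col_0"})
  .merge(model_b.rename(columns={"SalePrice": "col_1"}), on="Id", validate="one_to_one")
)
print(positional)
print(aligned)
```

In `positional`, house 1461 receives model B's prediction for house 1463; in `aligned` it receives model B's own prediction for 1461.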


In [10]:
# This function averages all the model predictions for each row (a mean, not a majority vote).
def sale_price(row):
  return row.drop('Id').mean()
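The `sale_price` helper averages a row's model columns through `DataFrame.apply`, which calls Python once per row. The same result can be computed in a single vectorized call with `DataFrame.mean(axis=1)`. The sketch below checks the two against each other on a small frame built from the first two rows of the output above.

```python
import pandas as pd

# Frame shaped like `predictions`: an Id column plus one column per model.
preds = pd.DataFrame({
  "Id": [1461, 1462],
  "col_0": [130043.7525, 161584.36],
  "col_1": [127734.846667, 154519.313333],
})

# Row-wise apply, mirroring sale_price(): drop Id, then average the rest.
via_apply = preds.apply(lambda row: row.drop("Id").mean(), axis=1)

# Vectorized equivalent: average across the model columns in one call.
via_mean = preds.drop(columns="Id").mean(axis=1)

assert (via_apply - via_mean).abs().max() < 1e-6
print(via_mean)
```

Both versions skip missing values by default (`skipna=True`), so a model with no prediction for an item simply does not contribute to that item's average.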

In [11]:
# Get the ensembled prediction for each test-set item.
predictions['SalePrice'] = predictions.apply(sale_price, axis=1)

In [12]:
predictions.head()

|   | Id | col_0 | col_1 | SalePrice |
|---|---|---|---|---|
| 0 | 1461 | 130043.7525 | 127734.846667 | 128889.299583 |
| 1 | 1462 | 161584.36 | 154519.313333 | 158051.836667 |
| 2 | 1463 | 184692.07 | 182991.743333 | 183841.906667 |
| 3 | 1464 | 194002.5 | 188732.326667 | 191367.413333 |
| 4 | 1465 | 190903.7525 | 189549.58 | 190226.66625 |
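Since the plain mean did not beat the best single model, one common variant is a weighted average that leans toward the stronger model. The sketch below is an assumption, not part of this notebook: the weights are hypothetical and would normally be chosen on held-out validation data. Equal weights reproduce the plain mean used above.

```python
import numpy as np
import pandas as pd

# Frame shaped like `predictions` (first two rows of the output above).
preds = pd.DataFrame({
  "Id": [1461, 1462],
  "col_0": [130043.7525, 161584.36],
  "col_1": [127734.846667, 154519.313333],
})

# Hypothetical weights per model column; they should sum to 1.
weights = {"col_0": 0.7, "col_1": 0.3}

model_cols = list(weights)
w = np.array([weights[c] for c in model_cols])
assert np.isclose(w.sum(), 1.0)

# Weighted average of the model columns for each row (matrix-vector product).
preds["SalePrice"] = preds[model_cols].to_numpy() @ w
print(preds)
```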


Keep only the `Id` and the ensembled `SalePrice` columns for the output file.

In [13]:
ensembled_predictions = predictions[['Id', 'SalePrice']]
ensembled_predictions.head()

|   | Id | SalePrice |
|---|---|---|
| 0 | 1461 | 128889.299583 |
| 1 | 1462 | 158051.836667 |
| 2 | 1463 | 183841.906667 |
| 3 | 1464 | 191367.413333 |
| 4 | 1465 | 190226.66625 |


In [14]:
ensembled_predictions.to_csv('ensemble_preds_01.csv', index=False)
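Before submitting, it can help to read the written file back and run a few sanity checks. The helper `check_submission` below is hypothetical. The Ids starting at 1461 suggest the Kaggle House Prices test set, whose test file has 1459 rows; treat that row count as an assumption to pass via `expected_rows` if it applies. The demo writes a tiny made-up file to a temporary directory so the block runs on its own.

```python
import os
import tempfile

import pandas as pd

def check_submission(path, expected_rows=None):
  """Read a submission CSV back and run basic sanity checks (hypothetical helper)."""
  sub = pd.read_csv(path)
  assert list(sub.columns) == ["Id", "SalePrice"], list(sub.columns)
  assert sub["Id"].is_unique, "duplicate Ids"
  assert sub["SalePrice"].notna().all(), "missing predictions"
  assert (sub["SalePrice"] > 0).all(), "non-positive prices"
  if expected_rows is not None:
    assert len(sub) == expected_rows, len(sub)
  return sub

# Demo on a tiny hypothetical submission written to a temporary directory.
with tempfile.TemporaryDirectory() as tmp:
  demo_path = os.path.join(tmp, "demo_submission.csv")
  pd.DataFrame({"Id": [1461, 1462], "SalePrice": [128889.3, 158051.8]}).to_csv(demo_path, index=False)
  demo = check_submission(demo_path, expected_rows=2)
```

In the notebook itself, the call would be `check_submission('ensemble_preds_01.csv')`.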