Model Stacking

This repository demonstrates how to stack three different models to obtain more accurate predictions.

The data used for this demo is from the Kaggle Employee Access Challenge.

The code is self-contained within the file complete_pipeline.py and can be run directly by placing it in a folder together with train.csv and test.csv.

Description

The three models that are stacked together are:

A 3-layer neural network with 100, 1000 and 1 units in its three layers, respectively. Dropout with probability 0.3 is applied between consecutive layers to reduce overfitting.
An XGBoost classifier with parameters max_depth = 6 and eta (the learning rate) = 0.8.
A KNN classifier using the 10 nearest neighbors, where each neighbor's influence on the final prediction is weighted by the inverse of its distance to the point being predicted, so closer neighbors count more.
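The distance weighting used by the KNN model can be made concrete with a small NumPy sketch. It assumes binary 0/1 labels and inverse-distance weights (the same convention as scikit-learn's `weights="distance"` option); `knn_predict_proba` is a hypothetical helper written for illustration, not code taken from `complete_pipeline.py`.

```python
import numpy as np


def knn_predict_proba(X_train, y_train, X_query, k=10, eps=1e-12):
    """Distance-weighted k-NN estimate of P(y = 1) for binary 0/1 labels."""
    X_train = np.asarray(X_train, dtype=float)
    y_train = np.asarray(y_train, dtype=float)
    X_query = np.asarray(X_query, dtype=float)
    # Pairwise Euclidean distances, shape (n_query, n_train)
    dists = np.linalg.norm(X_query[:, None, :] - X_train[None, :, :], axis=2)
    # Indices and distances of the k closest training points per query point
    nearest = np.argsort(dists, axis=1)[:, :k]
    nearest_d = np.take_along_axis(dists, nearest, axis=1)
    # Inverse-distance weights: closer neighbors count more
    # (eps avoids division by zero when a query coincides with a training point)
    weights = 1.0 / (nearest_d + eps)
    # Weighted share of positive labels among the k neighbors
    return (weights * y_train[nearest]).sum(axis=1) / weights.sum(axis=1)


# Toy 1-D example with k=3: the query at 1.9 sits right next to a positive point
p = knn_predict_proba([[0.0], [1.0], [2.0], [10.0]], [0, 0, 1, 1], [[1.9]], k=3)
print(p)  # ≈ 0.859, versus 1/3 for an unweighted vote of the same 3 neighbors
```

With uniform weights the two more distant negative neighbors would outvote the very close positive one; inverse-distance weighting lets the closest neighbor dominate.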

A so-called "meta-learner" then takes the predictions of these three models and uses them as input features to make the final prediction. Here, a simple logistic regression with L2 regularization is used as the meta-learner.
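As a minimal sketch of the meta-learner step, assuming scikit-learn is available: the meta-learner is an ordinary logistic regression fitted on a matrix with one column per base model. The base-model outputs below are simulated (hypothetical noisy copies of the labels), not real predictions from the three models.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
n = 200
y = rng.integers(0, 2, size=n)

# Hypothetical base-model outputs: one column per model (e.g. neural network,
# XGBoost, KNN), simulated as noisy copies of the label with different noise levels
noise = rng.normal(0.0, [0.2, 0.3, 0.5], size=(n, 3))
base_preds = np.clip(y[:, None] + noise, 0.0, 1.0)

# Logistic-regression meta-learner; scikit-learn's default penalty is L2,
# and C is the inverse of the regularization strength
meta = LogisticRegression(C=1.0).fit(base_preds, y)

# One learned weight per base model, then the stacked probability of class 1
print("weights per base model:", meta.coef_.ravel())
final_proba = meta.predict_proba(base_preds)[:, 1]
```

The learned coefficients indicate how much the meta-learner relies on each base model; the L2 penalty keeps them from growing large when the base predictions are strongly correlated.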

To train the ensemble model, the training data set is first divided into 10 disjoint folds. For each fold, the three models are trained on the data outside that fold (the other nine folds) and then make predictions for the data points inside it. Collecting these out-of-fold predictions gives a new matrix with three columns, one per model, and one row for each point in the training set. Finally, the meta-learner is trained on this new matrix.
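The fold procedure can be sketched with scikit-learn as follows. This is an illustration under stated assumptions, not the repository's code: the data are synthetic, `out_of_fold_matrix` is a hypothetical helper, and the neural network and XGBoost models are replaced by scikit-learn stand-ins (a smaller `MLPClassifier` without dropout and `GradientBoostingClassifier`) so the sketch runs without a deep-learning framework or the xgboost package.

```python
import numpy as np
from sklearn.base import clone
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import KFold
from sklearn.neighbors import KNeighborsClassifier
from sklearn.neural_network import MLPClassifier


def out_of_fold_matrix(base_models, X, y, n_folds=10, seed=0):
    """Return an (n_samples, n_models) matrix of out-of-fold P(y = 1) estimates."""
    oof = np.zeros((X.shape[0], len(base_models)))
    folds = KFold(n_splits=n_folds, shuffle=True, random_state=seed)
    for train_idx, hold_idx in folds.split(X):
        for j, model in enumerate(base_models):
            # Fit a fresh copy on the remaining folds ...
            fitted = clone(model).fit(X[train_idx], y[train_idx])
            # ... and predict only the held-out fold, which it has never seen
            oof[hold_idx, j] = fitted.predict_proba(X[hold_idx])[:, 1]
    return oof


# Synthetic stand-in for the Kaggle training set
X, y = make_classification(n_samples=600, n_features=8, random_state=0)

base_models = [
    # Stand-in for the neural network (smaller, no dropout)
    MLPClassifier(hidden_layer_sizes=(100,), max_iter=500, random_state=0),
    # Stand-in for XGBoost, reusing max_depth = 6 and learning rate 0.8
    GradientBoostingClassifier(max_depth=6, learning_rate=0.8, random_state=0),
    # 10 nearest neighbors with inverse-distance weighting
    KNeighborsClassifier(n_neighbors=10, weights="distance"),
]

Z = out_of_fold_matrix(base_models, X, y, n_folds=10)
meta = LogisticRegression().fit(Z, y)  # default penalty is L2
print("meta-learner weight per base model:", meta.coef_.ravel())
```

Every training point lands in exactly one held-out fold, so each row of `Z` is filled once, by models that were not trained on that row.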

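It is important to train the individual models using the disjoint-fold approach. If a base model made predictions for points it had already been trained on, those predictions would look more accurate than its predictions on unseen data, and the meta-learner would learn to trust that model too much. Using out-of-fold predictions ensures that the stacked model generalizes properly.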
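The effect of skipping the folds can be seen with a short sketch on synthetic data (scikit-learn assumed). It compares a distance-weighted 10-NN classifier's predictions on its own training points with out-of-fold predictions for the same points. Every training point is at distance zero from itself and scikit-learn then gives it all of the weight, so the in-sample predictions simply reproduce the training labels: exactly the kind of over-optimistic input a meta-learner should not be trained on.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_predict
from sklearn.neighbors import KNeighborsClassifier

X, y = make_classification(n_samples=500, n_features=5, random_state=1)
knn = KNeighborsClassifier(n_neighbors=10, weights="distance")

# Leaky: fit on all points, then predict those same points
in_sample = knn.fit(X, y).predict(X)

# Out-of-fold: each point is predicted by a model fitted on the other 9 folds
out_of_fold = cross_val_predict(knn, X, y, cv=10)

in_sample_acc = (in_sample == y).mean()
oof_acc = (out_of_fold == y).mean()
print(f"in-sample accuracy: {in_sample_acc:.3f}, out-of-fold accuracy: {oof_acc:.3f}")
```

Only the out-of-fold accuracy reflects how the model behaves on new data, which is why the stacking matrix is built from out-of-fold predictions.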

leandreeberhard/Model-Stacking