Kaggle Pipeline

The design of this pipeline is based on my experience in Kaggle competitions and other data challenges. It decouples the steps of a typical workflow: data IO, feature extraction, model building, validation, and stacking. The main purpose is fast prototyping, not scalability. As for stacking, I am not a fan of multi-level stacking, so the Stacker only provides basic stacking.
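To make "basic stacking" concrete, here is a minimal single-level sketch in plain Python. It is not the repository's Stacker: the names `k_fold_indices`, `out_of_fold_predictions`, `stack_features`, `MeanModel`, and `SlopeModel` are all hypothetical, and the toy models stand in for real learners. The idea shown is that each base model's out-of-fold predictions are appended to the feature matrix, so a final model can be trained on both.

```python
from statistics import mean


def k_fold_indices(n_samples, n_folds):
    """Yield (train_idx, valid_idx) pairs for contiguous, unshuffled folds."""
    sizes = [n_samples // n_folds + (1 if i < n_samples % n_folds else 0)
             for i in range(n_folds)]
    start = 0
    for size in sizes:
        valid = list(range(start, start + size))
        train = [i for i in range(n_samples) if i < start or i >= start + size]
        yield train, valid
        start += size


class MeanModel:
    """Hypothetical base model: always predicts the mean training target."""

    def fit(self, X, y):
        self.mean_ = mean(y)
        return self

    def predict(self, X):
        return [self.mean_ for _ in X]


class SlopeModel:
    """Hypothetical base model: least-squares line through the origin on feature 0."""

    def fit(self, X, y):
        num = sum(row[0] * target for row, target in zip(X, y))
        den = sum(row[0] ** 2 for row in X)
        self.slope_ = num / den if den else 0.0
        return self

    def predict(self, X):
        return [self.slope_ * row[0] for row in X]


def out_of_fold_predictions(make_model, X, y, n_folds):
    """Predict every row with a model that never saw that row during fitting."""
    oof = [0.0] * len(X)
    for train, valid in k_fold_indices(len(X), n_folds):
        model = make_model().fit([X[i] for i in train], [y[i] for i in train])
        for i, pred in zip(valid, model.predict([X[i] for i in valid])):
            oof[i] = pred
    return oof


def stack_features(X, y, base_factories, n_folds=3):
    """Append one out-of-fold prediction column per base model to the features."""
    columns = [out_of_fold_predictions(f, X, y, n_folds) for f in base_factories]
    return [list(row) + [col[i] for col in columns] for i, row in enumerate(X)]


X = [[1.0], [2.0], [3.0], [4.0], [5.0], [6.0]]
y = [2.0, 4.0, 6.0, 8.0, 10.0, 12.0]
stacked = stack_features(X, y, [MeanModel, SlopeModel], n_folds=3)
```

Each row of `stacked` holds the original feature followed by one column per base model. Using out-of-fold predictions rather than in-sample ones keeps the target from leaking into the stacked features.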

File Structure

├── input	<-- train and test data
├── README.md
├── submission	<-- final submission folder
└── src	<-- where code is stored
    ├── main.py	<-- main process
    ├── DataWarehouse.py	<-- handle data IO
    ├── FeatureGenerator.py	<-- convert raw data to feature matrix
    ├── Model.py	<-- used to initialize different machine learning models
    ├── Stacker.py	<-- stacking different models together with feature matrix
    ├── Validator.py	<-- k-fold validation of Model/Stacker
    ├── common.py	<-- define where files are stored and setup logging
    └── config	<-- storing config files for stacking, models, features to extract
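
The separation of stages in the layout above can be illustrated with a small, self-contained sketch. It does not reproduce the actual modules: `CONFIG`, `FEATURE_FUNCS`, `load_rows`, `build_features`, `MeanModel`, `validate`, and the inline CSV data are all made up for illustration, and only the Python standard library is used. Each function plays the role of one stage (data IO, feature generation, model, k-fold validation), and the last lines wire them together the way a main script might.

```python
import csv
import io
import logging
from statistics import mean

logging.basicConfig(level=logging.INFO)
log = logging.getLogger("pipeline_sketch")

# Hypothetical config: which features to extract, the target, and the fold count.
CONFIG = {"features": ["area", "rooms"], "target": "price", "n_folds": 2}

# Hypothetical registry mapping feature names to functions of a raw row.
FEATURE_FUNCS = {
    "area": lambda row: float(row["width"]) * float(row["length"]),
    "rooms": lambda row: float(row["rooms"]),
}


def load_rows(handle):
    """Data IO stage: read CSV rows as dictionaries."""
    return list(csv.DictReader(handle))


def build_features(rows, config):
    """Feature stage: turn raw rows into a feature matrix and a target vector."""
    X = [[FEATURE_FUNCS[name](row) for name in config["features"]] for row in rows]
    y = [float(row[config["target"]]) for row in rows]
    return X, y


class MeanModel:
    """Model stage: toy regressor that predicts the mean training target."""

    def fit(self, X, y):
        self.mean_ = mean(y)
        return self

    def predict(self, X):
        return [self.mean_] * len(X)


def validate(make_model, X, y, n_folds):
    """Validation stage: k-fold mean absolute error, folds assigned by index modulo k."""
    fold_errors = []
    for fold in range(n_folds):
        train = [i for i in range(len(X)) if i % n_folds != fold]
        valid = [i for i in range(len(X)) if i % n_folds == fold]
        model = make_model().fit([X[i] for i in train], [y[i] for i in train])
        preds = model.predict([X[i] for i in valid])
        fold_errors.append(mean(abs(p - y[i]) for p, i in zip(preds, valid)))
    return mean(fold_errors)


# Made-up data standing in for a file in the input folder.
RAW_CSV = "width,length,rooms,price\n2,3,1,10\n4,5,2,20\n3,3,2,30\n5,6,3,40\n"

rows = load_rows(io.StringIO(RAW_CSV))
X, y = build_features(rows, CONFIG)
score = validate(MeanModel, X, y, CONFIG["n_folds"])
log.info("cross-validated MAE: %.2f", score)
```

Because every stage only consumes the previous stage's output, swapping the feature list in the config or passing a different model factory to `validate` does not require touching the other stages.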

randxie/kaggle-pipeline

Folders and files

Latest commit

History

Repository files navigation

Kaggle Pipeline

File Structure

About

Resources

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages