
Sklearn and XGBoost algorithms and groupby features


Seedling 🌱

🌱 code

Run experiment!

Random search in the hyper-parameter space with the XGBoost model

  • Put the correct paths to the data: neptune_random_search.yaml:L21.
  • Put your neptune.ml user-name (you specify it upon registration 😉) in the first line: neptune_random_search.yaml:L1.
  • Consider your own hyper-parameter ranges and number of runs. We put in some reasonable values, so the setup as-is should give you decent results.
  • Run the experiment 🚀
neptune run --config neptune_random_search.yaml main.py train_evaluate_predict --pipeline_name XGBoost
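
For intuition, here is a minimal, self-contained sketch of what a random search over XGBoost hyper-parameters looks like. The parameter ranges, toy data, and number of iterations below are illustrative assumptions, not the values from neptune_random_search.yaml.

```python
import numpy as np
import xgboost as xgb
from sklearn.datasets import make_classification
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import ParameterSampler, train_test_split

# Toy data standing in for the real training set.
X, y = make_classification(n_samples=2000, n_features=20, random_state=42)
X_train, X_valid, y_train, y_valid = train_test_split(X, y, test_size=0.2, random_state=42)

# Illustrative hyper-parameter ranges (assumed, not the yaml values).
param_space = {
    "max_depth": [3, 4, 5, 6],
    "learning_rate": [0.01, 0.05, 0.1],
    "subsample": [0.6, 0.8, 1.0],
    "colsample_bytree": [0.6, 0.8, 1.0],
}

best_score, best_params = -np.inf, None
for params in ParameterSampler(param_space, n_iter=10, random_state=42):
    model = xgb.XGBClassifier(n_estimators=200, **params)
    model.fit(X_train, y_train)
    score = roc_auc_score(y_valid, model.predict_proba(X_valid)[:, 1])
    if score > best_score:
        best_score, best_params = score, params

print("best AUC: {:.4f}, params: {}".format(best_score, best_params))
```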

How to train other models?

Once you have set your data paths and neptune.ml user-name, you have five end-to-end pipelines available:

  • lightGBM: neptune run --config neptune_random_search.yaml main.py train_evaluate_predict --pipeline_name lightGBM
  • XGBoost: neptune run --config neptune_random_search.yaml main.py train_evaluate_predict --pipeline_name XGBoost
  • random_forest: neptune run --config neptune_random_search.yaml main.py train_evaluate_predict --pipeline_name random_forest
  • log_reg: neptune run --config neptune_random_search.yaml main.py train_evaluate_predict --pipeline_name log_reg
  • svc: neptune run --config neptune_random_search.yaml main.py train_evaluate_predict --pipeline_name svc

'log_reg' is logistic regression and 'svc' is a Support Vector Classifier.

All of them are parameterizable in neptune_random_search.yaml, so you can easily run multiple experiments 😃
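
To illustrate how a single --pipeline_name flag can select between these models, here is a sketch of one possible name-to-estimator mapping. The registry and helper below are assumptions for illustration, not the repository's actual pipeline code.

```python
from lightgbm import LGBMClassifier
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.svm import SVC
from xgboost import XGBClassifier

# Hypothetical registry keyed by the --pipeline_name values listed above.
PIPELINES = {
    "lightGBM": LGBMClassifier,
    "XGBoost": XGBClassifier,
    "random_forest": RandomForestClassifier,
    "log_reg": LogisticRegression,
    "svc": SVC,
}

def build_model(pipeline_name, **params):
    """Instantiate the estimator registered under the given pipeline name."""
    return PIPELINES[pipeline_name](**params)

model = build_model("random_forest", n_estimators=100, random_state=42)
```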

Single run

neptune run --config neptune.yaml main.py train_evaluate_predict --pipeline_name lightGBM

Solution Overview

New features πŸ““

We have added a substantial number of group-by based features. The recipe for extracting them is shown here: aggregation_recipes. The transformer that calculates these features is implemented here: GroupbyAggregations.
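
As a rough illustration of group-by aggregation features, here is a minimal pandas sketch in the spirit of aggregation_recipes and GroupbyAggregations. The column names and recipes below are hypothetical examples, not the ones used in the solution.

```python
import pandas as pd

# Toy data; "SK_ID_CURR" and "AMT_CREDIT" are example column names.
df = pd.DataFrame({
    "SK_ID_CURR": [1, 1, 2, 2, 2],
    "AMT_CREDIT": [1000.0, 1500.0, 200.0, 300.0, 250.0],
})

# Each recipe: (groupby columns, column to aggregate, aggregation function).
recipes = [
    (["SK_ID_CURR"], "AMT_CREDIT", "mean"),
    (["SK_ID_CURR"], "AMT_CREDIT", "max"),
]

for group_cols, col, agg in recipes:
    feature_name = "{}_{}_{}".format("_".join(group_cols), col, agg)
    grouped = df.groupby(group_cols)[col].agg(agg).rename(feature_name).reset_index()
    df = df.merge(grouped, on=group_cols, how="left")

print(df.head())
```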

New models πŸ’»

The XGBoost model is defined in models.py:L12 and all of its parameters are listed in the configuration files:

  1. random search: neptune_random_search.yaml:L71
  2. single run: neptune.yaml:L71
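
As a sketch of how model parameters can be kept in a configuration file and passed to XGBoost, consider the snippet below. The parameter names and values are assumed placeholders, not copied from neptune_random_search.yaml or neptune.yaml.

```python
import yaml
from xgboost import XGBClassifier

# Placeholder configuration; the real values live in the yaml files above.
config_text = """
xgboost:
  n_estimators: 500
  learning_rate: 0.05
  max_depth: 5
  subsample: 0.8
"""

params = yaml.safe_load(config_text)["xgboost"]
model = XGBClassifier(**params)
print(model)
```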

Besides this, you have three sklearn-based algorithms available, namely: random forest, logistic regression, and Support Vector Classifier.