
Sklearn and XGBoost algorithms and groupby features


Seedling 🌱

🌱 code

Run experiment!

Random search in the hyper-parameter space with the XGBoost model

  • Put the correct paths to the data: neptune_random_search.yaml:L21.
  • Put your neptune.ml user-name (you specify it upon registration 😉) in the first line: neptune_random_search.yaml:L1.
  • Consider your own hyper-parameter ranges and number of runs. We put in some reasonable values, so the setup as-is should give you decent results.
  • Run the experiment 🚀
neptune run --config neptune_random_search.yaml main.py train_evaluate_predict --pipeline_name XGBoost
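
For intuition, here is a minimal, self-contained sketch of what a random search over XGBoost hyper-parameters looks like. The parameter ranges, toy data, and number of iterations below are illustrative assumptions, not the values from neptune_random_search.yaml.

```python
import numpy as np
import xgboost as xgb
from sklearn.datasets import make_classification
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import ParameterSampler, train_test_split

# Toy data standing in for the real training set.
X, y = make_classification(n_samples=2000, n_features=20, random_state=42)
X_train, X_valid, y_train, y_valid = train_test_split(X, y, test_size=0.2, random_state=42)

# Illustrative hyper-parameter ranges (assumed, not the yaml values).
param_space = {
    "max_depth": [3, 4, 5, 6],
    "learning_rate": [0.01, 0.05, 0.1],
    "subsample": [0.6, 0.8, 1.0],
    "colsample_bytree": [0.6, 0.8, 1.0],
}

best_score, best_params = -np.inf, None
for params in ParameterSampler(param_space, n_iter=10, random_state=42):
    model = xgb.XGBClassifier(n_estimators=200, **params)
    model.fit(X_train, y_train)
    score = roc_auc_score(y_valid, model.predict_proba(X_valid)[:, 1])
    if score > best_score:
        best_score, best_params = score, params

print("best AUC: {:.4f}, params: {}".format(best_score, best_params))
```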

How to train other models?

Once you have set your data paths and neptune.ml user-name, you have five end-to-end pipelines available:

  • lightGBM: neptune run --config neptune_random_search.yaml main.py train_evaluate_predict --pipeline_name lightGBM
  • XGBoost: neptune run --config neptune_random_search.yaml main.py train_evaluate_predict --pipeline_name XGBoost
  • random_forest: neptune run --config neptune_random_search.yaml main.py train_evaluate_predict --pipeline_name random_forest
  • log_reg: neptune run --config neptune_random_search.yaml main.py train_evaluate_predict --pipeline_name log_reg
  • svc: neptune run --config neptune_random_search.yaml main.py train_evaluate_predict --pipeline_name svc

'log_reg' is logistic regression and 'svc' is a Support Vector Classifier.

All of them are parameterizable in neptune_random_search.yaml, so you can easily run multiple experiments 😃
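
To illustrate how a single --pipeline_name flag can select between these models, here is a sketch of one possible name-to-estimator mapping. The registry and helper below are assumptions for illustration, not the repository's actual pipeline code.

```python
from lightgbm import LGBMClassifier
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.svm import SVC
from xgboost import XGBClassifier

# Hypothetical registry keyed by the --pipeline_name values listed above.
PIPELINES = {
    "lightGBM": LGBMClassifier,
    "XGBoost": XGBClassifier,
    "random_forest": RandomForestClassifier,
    "log_reg": LogisticRegression,
    "svc": SVC,
}

def build_model(pipeline_name, **params):
    """Instantiate the estimator registered under the given pipeline name."""
    return PIPELINES[pipeline_name](**params)

model = build_model("random_forest", n_estimators=100, random_state=42)
```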

Single run

neptune run --config neptune.yaml main.py train_evaluate_predict --pipeline_name lightGBM

Solution Overview

New features πŸ““

We have added a substantial number of group-by based features. The recipe for extracting them is shown here: aggregation_recipes. The transformer that calculates these features is implemented here: GroupbyAggregations.
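
As a rough illustration of group-by aggregation features, here is a minimal pandas sketch in the spirit of aggregation_recipes and GroupbyAggregations. The column names and recipes below are hypothetical examples, not the ones used in the solution.

```python
import pandas as pd

# Toy data; "SK_ID_CURR" and "AMT_CREDIT" are example column names.
df = pd.DataFrame({
    "SK_ID_CURR": [1, 1, 2, 2, 2],
    "AMT_CREDIT": [1000.0, 1500.0, 200.0, 300.0, 250.0],
})

# Each recipe: (groupby columns, column to aggregate, aggregation function).
recipes = [
    (["SK_ID_CURR"], "AMT_CREDIT", "mean"),
    (["SK_ID_CURR"], "AMT_CREDIT", "max"),
]

for group_cols, col, agg in recipes:
    feature_name = "{}_{}_{}".format("_".join(group_cols), col, agg)
    grouped = df.groupby(group_cols)[col].agg(agg).rename(feature_name).reset_index()
    df = df.merge(grouped, on=group_cols, how="left")

print(df.head())
```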

New models πŸ’»

The XGBoost model is defined in models.py:L12 and all of its parameters are listed in the configuration files:

  1. random search: neptune_random_search.yaml:L71
  2. single run: neptune.yaml:L71
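
As a sketch of how model parameters can be kept in a configuration file and passed to XGBoost, consider the snippet below. The parameter names and values are assumed placeholders, not copied from neptune_random_search.yaml or neptune.yaml.

```python
import yaml
from xgboost import XGBClassifier

# Placeholder configuration; the real values live in the yaml files above.
config_text = """
xgboost:
  n_estimators: 500
  learning_rate: 0.05
  max_depth: 5
  subsample: 0.8
"""

params = yaml.safe_load(config_text)["xgboost"]
model = XGBClassifier(**params)
print(model)
```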

Besides this, you have three sklearn-based algorithms available, namely: random forest, logistic regression, and Support Vector Classifier.