This repository has been archived by the owner on Jun 22, 2022. It is now read-only.

Sklearn and XGBoost algorithms and groupby features

Kamil A. Kaczmarek edited this page Jun 20, 2018 · 7 revisions

Run experiment!

Random search in the hyper-parameters space with XGBoost model

  • Set the correct paths to your data: neptune_random_search.yaml:L21.
  • Put your neptune user-name in the first line: neptune_random_search.yaml:L1.
  • Consider setting your own ranges of hyper-parameters and number of runs. The defaults are reasonable, so the setup as-is should give you decent results.
  • neptune run --config neptune_random_search.yaml main.py train_evaluate_predict --pipeline_name XGBoost
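What a random search over hyper-parameter ranges does can be sketched in plain Python. This is only an illustration, not the project's implementation: the parameter names and ranges in `SPACE` are hypothetical stand-ins for whatever you configure in neptune_random_search.yaml, and `toy_score` is a placeholder for actually training and validating a model.

```python
import random


def sample_params(space, rng):
    """Draw one configuration from a search space.

    `space` maps a parameter name to either a (low, high) tuple of floats,
    sampled uniformly, or a list of discrete choices.
    """
    params = {}
    for name, spec in space.items():
        if isinstance(spec, tuple):
            low, high = spec
            params[name] = rng.uniform(low, high)
        else:
            params[name] = rng.choice(spec)
    return params


def random_search(space, evaluate, n_runs, seed=0):
    """Evaluate `n_runs` random configurations and return the best one."""
    rng = random.Random(seed)
    best_score, best_params = float("-inf"), None
    for _ in range(n_runs):
        params = sample_params(space, rng)
        score = evaluate(params)
        if score > best_score:
            best_score, best_params = score, params
    return best_score, best_params


# Hypothetical ranges, for illustration only.
SPACE = {
    "learning_rate": (0.01, 0.3),
    "max_depth": [3, 5, 7, 9],
    "subsample": (0.5, 1.0),
}


def toy_score(params):
    """Placeholder objective; a real run would train and validate a model."""
    return -abs(params["learning_rate"] - 0.1)


best_score, best_params = random_search(SPACE, toy_score, n_runs=20)
```

More runs cover the ranges more densely at the cost of more training time, which is the trade-off behind choosing the number of runs.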

How to train other models?

Once you have put your paths to data and your neptune user-name (you specify it upon registration) in neptune_random_search.yaml, you have five end-to-end pipelines available:

model | run command
lightGBM | neptune run --config neptune_random_search.yaml main.py train_evaluate_predict --pipeline_name lightGBM
XGBoost | neptune run --config neptune_random_search.yaml main.py train_evaluate_predict --pipeline_name XGBoost
random_forest | neptune run --config neptune_random_search.yaml main.py train_evaluate_predict --pipeline_name random_forest
log_reg | neptune run --config neptune_random_search.yaml main.py train_evaluate_predict --pipeline_name log_reg
svc | neptune run --config neptune_random_search.yaml main.py train_evaluate_predict --pipeline_name svc

'log_reg' is logistic regression and 'svc' is Support Vector Classifier.

All of them are parameterizable in the neptune_random_search.yaml file, so you can easily run multiple experiments 😃
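If you want to launch all five pipelines one after another, a small helper script along these lines could assemble the commands for you. It is a sketch rather than part of the repository: `build_command` and `run_all` are hypothetical helper names, and the only thing taken from this page is the command line itself. By default it only prints the commands (dry run).

```python
import subprocess

# Pipeline names listed on this page.
PIPELINES = ["lightGBM", "XGBoost", "random_forest", "log_reg", "svc"]


def build_command(pipeline_name, config="neptune_random_search.yaml"):
    """Return the `neptune run` command for one pipeline as an argument list."""
    if pipeline_name not in PIPELINES:
        raise ValueError(f"Unknown pipeline: {pipeline_name!r}")
    return [
        "neptune", "run",
        "--config", config,
        "main.py", "train_evaluate_predict",
        "--pipeline_name", pipeline_name,
    ]


def run_all(dry_run=True):
    """Print every command, or execute them sequentially if dry_run is False."""
    for name in PIPELINES:
        cmd = build_command(name)
        if dry_run:
            print(" ".join(cmd))
        else:
            # check=True stops the loop as soon as one experiment fails.
            subprocess.run(cmd, check=True)


if __name__ == "__main__":
    run_all(dry_run=True)
```

Calling `run_all(dry_run=False)` would execute the experiments for real, so it assumes the neptune command-line client is installed and configured.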

Single run

  • Make sure that you put correct paths to data: neptune.yaml:L21
  • neptune run --config neptune.yaml main.py train_evaluate_predict --pipeline_name lightGBM

In both cases the pipeline is called lightGBM.

Solution Overview

New features

We have added a substantial number of group-by based features. The recipe for extracting them is shown here: aggregation_recipes. The transformer that calculates the features is implemented here: GroupbyAggregations
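To make the idea of a group-by based feature concrete, here is a minimal standard-library sketch. It is not the GroupbyAggregations transformer from the repository: the function `groupby_features`, the column names `region` and `income`, and the `<group>_<agg>_<value>` naming scheme are all assumptions made for illustration. Each row receives aggregate statistics computed over every row that shares its group key.

```python
from collections import defaultdict
from statistics import mean

# Supported aggregations; extend as needed.
AGGREGATIONS = {"mean": mean, "max": max, "min": min}


def groupby_features(rows, group_key, value_key, agg_names):
    """Return copies of `rows` with `<group_key>_<agg>_<value_key>` columns added."""
    # Collect the values of `value_key` for every group.
    groups = defaultdict(list)
    for row in rows:
        groups[row[group_key]].append(row[value_key])

    # Compute each requested aggregate once per group.
    stats = {
        key: {name: AGGREGATIONS[name](values) for name in agg_names}
        for key, values in groups.items()
    }

    # Attach the group-level statistics back to every row.
    enriched = []
    for row in rows:
        new_row = dict(row)
        for name in agg_names:
            column = f"{group_key}_{name}_{value_key}"
            new_row[column] = stats[row[group_key]][name]
        enriched.append(new_row)
    return enriched


# Hypothetical toy data.
rows = [
    {"region": "A", "income": 100.0},
    {"region": "A", "income": 300.0},
    {"region": "B", "income": 50.0},
]
features = groupby_features(rows, "region", "income", ["mean", "max"])
```

With a DataFrame library the same pattern is usually written as a group-by followed by an aggregation that is merged back onto the original rows.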

New models

The XGBoost model is defined in models.py:L12, and its parameters are all listed in the configuration file:

  1. random search: neptune_random_search.yaml:L71
  2. single run: neptune.yaml:L71