missing blocks for sklearn pipelines



We love scikit-learn, but we often find ourselves writing custom transformers, metrics and models. This project aims to consolidate these into a single package with consistent code quality and testing. It is a collaboration between multiple companies in the Netherlands. Note that we are not formally affiliated with the scikit-learn project.


Install scikit-lego via pip with

pip install scikit-lego

Alternatively, to edit and contribute you can fork/clone and run:

pip install -e ".[dev]"
python setup.py develop


The documentation can be found here.


from sklego.preprocessing import RandomAdder
from sklego.mixture import GMMClassifier

from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import Pipeline


mod = Pipeline([
    ("scale", StandardScaler()),
    ("random_noise", RandomAdder()),
    ("model", GMMClassifier())
])



Here's a list of features that this library currently offers:

  • sklego.preprocessing.PatsyTransformer applies a patsy formula
  • sklego.preprocessing.RandomAdder adds randomness in training
  • sklego.preprocessing.PandasTypeSelector selects columns based on pandas type
  • sklego.preprocessing.ColumnSelector selects columns based on column name
  • sklego.dummy.RandomRegressor benchmark that predicts random values
  • sklego.mixture.GMMClassifier classifies by training a GMM per class
  • sklego.mixture.GMMOutlierDetector detects outliers based on a trained GMM
  • sklego.pandas_utils.log_step a simple logger-decorator for pandas pipeline steps
  • sklego.pandas_utils.add_lags adds lag values of certain columns in pandas
  • sklego.pipeline.DebugPipeline adds debug information to make debugging easier
  • sklego.meta.GroupedEstimator can split the data into runs and run a model on each
  • sklego.meta.EstimatorTransformer adds a model output as a feature
  • sklego.datasets.load_chicken loads in the joyful chickweight dataset
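
To give a feel for what a "logger-decorator for pandas pipeline steps" can look like, here is a minimal sketch using only the standard library. It is not sklego's implementation: the name `log_step_sketch`, the logger name, and the log format are all assumptions made for illustration.

```python
import functools
import logging
import time

# Hypothetical logger name, chosen for this sketch only.
logger = logging.getLogger("pipeline_steps")


def log_step_sketch(func):
    """Log the name, output shape, and runtime of a pipeline step.

    Illustrative only; this is not the sklego.pandas_utils.log_step source.
    """

    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        start = time.perf_counter()
        result = func(*args, **kwargs)
        elapsed = time.perf_counter() - start
        # DataFrames expose .shape; other return types are logged as None.
        shape = getattr(result, "shape", None)
        logger.info("[%s] shape=%s time=%.3fs", func.__name__, shape, elapsed)
        return result

    return wrapper
```

A function decorated this way can be passed to `DataFrame.pipe` as usual; each call then emits one log line, which makes it easier to see where rows or columns appear or disappear in a chain of steps.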

New Features

We want to be fairly open about what we accept, but we require three things before a feature is added to the project:

  1. any new feature contributes towards a demonstrable real-world use case
  2. any new feature passes standard unit tests (we have a few for transformers and predictors)
  3. the feature has been discussed in the issue list beforehand