MLBox, Machine Learning Box

MLBox is a powerful Automated Machine Learning Python library. It provides the following features:

Fast reading and distributed data preprocessing/cleaning/formatting
Highly robust feature selection and leak detection
Accurate hyper-parameter optimization in high-dimensional space
State-of-the-art predictive models for classification and regression (Deep Learning, Stacking, LightGBM, ...)
Prediction with model interpretation

For installation instructions, please refer to https://github.com/AxeldeRomblay/MLBox/blob/master/python-package/README.md

For more details, please refer to the docs.

Experiments : https://www.kaggle.com/c/two-sigma-connect-rental-listing-inquiries/leaderboard | Rank : 85/2488

Getting started: 30 seconds to MLBox

The MLBox main package is divided into 3 sub-packages: preprocessing, optimisation and prediction. They are aimed, respectively, at reading and preprocessing data, testing and optimising a wide range of learners, and predicting the target on a test dataset.

Here are a few lines to import MLBox:

from mlbox.preprocessing import *
from mlbox.optimisation import *
from mlbox.prediction import *

Then, all you need to provide is:

the list of paths to your train and test datasets
the name of the target you are trying to predict (classification or regression)

paths = ["<file_1>.csv", "<file_2>.csv", ..., "<file_n>.csv"] #to modify
target_name = "<my_target>" #to modify

Now, let MLBox do the job!

... to read and preprocess your files :

data = Reader(sep=",").train_test_split(paths, target_name)  #reading
data = Drift_thresholder().fit_transform(data)  #deleting non-stable variables

... to evaluate models (here default configuration):

Optimiser().evaluate(None, data)

... or to test and optimize the whole Pipeline [OPTIONAL]:

missing data encoder, aka 'ne'
categorical variables encoder, aka 'ce'
feature selector, aka 'fs'
meta-features stacker, aka 'stck'
final estimator, aka 'est'

NB : please have a look at all the possibilities you have to configure the Pipeline (steps, parameters and values...)

space = {
        'ne__numerical_strategy' : {"search":"choice", "space":[0, 'mean']},

        'ce__strategy' : {"search":"choice", "space":["label_encoding", "random_projection"]},

        'fs__strategy' : {"search":"choice", "space":["variance", "l1"]},
        'fs__threshold': {"search":"choice", "space":[0.1,0.2,0.3]},

        'est__strategy' : {"search":"choice", "space":["XGBoost"]},
        'est__max_depth' : {"search":"choice", "space":[5,6]},
        'est__subsample' : {"search":"uniform", "space":[0.6,0.9]}
        }

opt = Optimiser()  #the optimiser that explores the space defined above
best = opt.optimise(space, data, max_evals = 5)  #best parameters found after 5 evaluations
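To make the structure of the search space easier to read, here is a minimal, library-free sketch of what such a dictionary expresses. It assumes that "choice" means picking one value from the list, that "uniform" means drawing a float between the two bounds, and that the prefix before the double underscore names the pipeline step. The helper names sample_configuration and group_by_step are hypothetical and are not part of MLBox, whose optimiser performs the actual search internally.

```python
import random


def sample_configuration(space, rng):
    """Draw one candidate configuration from a search-space dictionary.

    Hypothetical helper for illustration only; not part of MLBox.
    """
    config = {}
    for name, spec in space.items():
        if spec["search"] == "choice":
            # Assumption: "choice" picks one of the listed values.
            config[name] = rng.choice(spec["space"])
        elif spec["search"] == "uniform":
            # Assumption: "uniform" draws a float between the two bounds.
            low, high = spec["space"]
            config[name] = rng.uniform(low, high)
        else:
            raise ValueError("unsupported search type: %r" % spec["search"])
    return config


def group_by_step(config):
    """Group 'step__parameter' keys by pipeline step ('ne', 'ce', 'fs', 'est')."""
    grouped = {}
    for name, value in config.items():
        step, _, param = name.partition("__")
        grouped.setdefault(step, {})[param] = value
    return grouped


space = {
    "ne__numerical_strategy": {"search": "choice", "space": [0, "mean"]},
    "est__strategy": {"search": "choice", "space": ["XGBoost"]},
    "est__max_depth": {"search": "choice", "space": [5, 6]},
    "est__subsample": {"search": "uniform", "space": [0.6, 0.9]},
}

rng = random.Random(0)  # fixed seed so the draw is reproducible
candidate = sample_configuration(space, rng)
grouped = group_by_step(candidate)
print(grouped)
```

Each call to sample_configuration yields one point of the space; with max_evals = 5, an optimiser would evaluate five such candidates and keep the best one.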

... finally to predict on the test set with the best parameters (or None for default configuration):

Predictor().fit_predict(best, data)

That's all! Have a look at the "save" folder, where you can find:

your predictions
feature importances
drift coefficients of your variables (0.5 = very stable, 1. = not stable at all)
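To give some intuition for the 0.5 to 1 scale of the drift coefficients, here is a simplified sketch. It assumes the coefficient can be read like an AUC: how well a variable's values alone separate test rows from train rows. The function drift_coefficient below is a hypothetical stand-in that uses a rank statistic on a single numerical variable; it is not the computation performed by Drift_thresholder.

```python
def drift_coefficient(train_values, test_values):
    """Rank-based AUC for telling test values apart from train values.

    Hypothetical illustration, not MLBox's implementation. The result is
    folded into [0.5, 1]: 0.5 means the two samples are indistinguishable,
    1.0 means they are perfectly separable.
    """
    wins = 0.0
    for test_value in test_values:
        for train_value in train_values:
            if test_value > train_value:
                wins += 1.0
            elif test_value == train_value:
                wins += 0.5  # ties count as half a win
    auc = wins / (len(train_values) * len(test_values))
    # A variable that separates the sets "in reverse" drifts just as much.
    return max(auc, 1.0 - auc)


stable = drift_coefficient([1, 2, 3, 4], [1, 2, 3, 4])
shifted = drift_coefficient([1, 2, 3, 4], [10, 11, 12, 13])
print(stable, shifted)
```

Under this reading, a variable whose distribution is identical in train and test scores 0.5, while a variable whose test values all lie outside the train range scores 1.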

How to Contribute

MLBox has been developed and used by many active community members. Your help is very valuable to make it better for everyone.

Check out the call for contributions to see what can be improved, or open an issue if you would like to request something.
Contribute to the tests to make it more reliable.
Contribute to the documents to make it clearer for everyone.
Contribute to the examples to share your experience with other users.
Open an issue if you run into problems during development.

For more details, please refer to CONTRIBUTING.
