# MLJAR AutoML 

MLJAR is an Automated Machine Learning framework. It is available as a Python package, with source code on GitHub: https://github.com/mljar/mljar-supervised

The MLJAR AutoML can work in several modes:
- Explain - ideal for initial data exploration
- Perform - perfect for production-level ML systems
- Compete - a mode for ML competitions under a restricted time budget. By default, it performs advanced feature engineering (golden features search, k-means features, feature selection) and stacks models.
- Optuna - uses Optuna to heavily tune the algorithms: Random Forest, Extra Trees, Xgboost, LightGBM, CatBoost, Neural Network. Each algorithm is tuned with the `Optuna` hyperparameter optimization framework within a selected time budget (controlled with `optuna_time_budget`). By default, feature engineering is not enabled (you need to switch it on manually through the `AutoML()` parameters).


## Explain

Example usage of the `Explain` mode with `MLJAR`:

```python

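from supervised.automl import AutoML  # mljar-supervised

# X (features) and y (target) are assumed to be loaded already
automl = AutoML(mode="Explain")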
automl.fit(X, y)
```

This is the best choice for getting initial information about your data. The mode produces a lot of explanations, and all details can be viewed in the notebook by calling the `automl.report()` method.


## Compete

Example usage of the `Compete` mode with `MLJAR`:

```python

automl = AutoML(mode="Compete",
                total_time_limit=8*3600)  # 8 hours, in seconds
automl.fit(X, y)
```

That's it. It will train Random Forest, Extra Trees, Xgboost, LightGBM, CatBoost, Neural Network, and Ensemble models, and stack all of them. Feature engineering will be applied if there is enough training time.


## Optuna

Example usage of the `Optuna` mode with `MLJAR`:

```python

automl = AutoML(mode="Optuna",
                optuna_time_budget=1800,   # seconds of tuning per algorithm
                optuna_init_params={},     # no precomputed parameters
                algorithms=["LightGBM", "Xgboost", "Extra Trees"],
                total_time_limit=24*3600)  # 24 hours, in seconds
automl.fit(X, y)
```

Description of parameters:
- `optuna_time_budget` - the time budget (in seconds) for `Optuna` to tune each algorithm,
- `optuna_init_params` - if you have precomputed parameters for `Optuna`, pass them here; `Optuna` will then be skipped for the already optimized models,
- `algorithms` - the algorithms that will be checked,
- `total_time_limit` - the total time limit (in seconds) for AutoML training.

(In the `Optuna` mode, only the first fold is used for model tuning.)
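
As a rough planning aid, the tuning phase can be budgeted before launching a run. The sketch below assumes that tuning costs about `optuna_time_budget` seconds per listed algorithm and ignores everything else (final training, ensembling). `estimate_tuning_seconds` is a hypothetical helper written for this illustration, not part of mljar-supervised.

```python
def estimate_tuning_seconds(algorithms, optuna_time_budget):
    """Rough estimate: one Optuna budget per algorithm (an assumption)."""
    return len(algorithms) * optuna_time_budget


# Values taken from the Optuna example above
algorithms = ["LightGBM", "Xgboost", "Extra Trees"]
optuna_time_budget = 1800      # seconds per algorithm
total_time_limit = 24 * 3600   # seconds

tuning_seconds = estimate_tuning_seconds(algorithms, optuna_time_budget)
remaining_seconds = total_time_limit - tuning_seconds

print(f"Estimated tuning time: {tuning_seconds / 3600:.1f} h")
print(f"Left for everything else: {remaining_seconds / 3600:.1f} h")
```

If the estimated tuning time approaches `total_time_limit`, either the per-algorithm budget or the list of algorithms probably needs to shrink.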

---

MLJAR GitHub: https://github.com/mljar/mljar-supervised

<img src="https://raw.githubusercontent.com/mljar/visual-identity/main/media/kaggle_banner_white.png" style="width: 70%;"/>

In [1]:
!pip install -q -U git+https://github.com/mljar/mljar-supervised.git@dev

ERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts.
yellowbrick 1.3.post1 requires numpy<1.20,>=1.16.0, but you have numpy 1.20.2 which is incompatible.
tensorflow 2.4.1 requires numpy~=1.19.2, but you have numpy 1.20.2 which is incompatible.
pdpbox 0.2.1 requires matplotlib==3.1.1, but you have matplotlib 3.4.1 which is incompatible.
mxnet 1.8.0.post0 requires graphviz<0.9.0,>=0.8.1, but you have graphviz 0.16 which is incompatible.
matrixprofile 1.1.10 requires protobuf==3.11.2, but you have protobuf 3.15.8 which is incompatible.
distributed 2021.4.0 requires cloudpickle>=1.5.0, but you have cloudpickle 1.3.0 which is incompatible.
autogluon-core 0.1.0 requires graphviz<0.9.0,>=0.8.1, but you have graphviz 0.16 which is incompatible.
autogluon-core 0.1.0 requires numpy==1.19.5, but you have numpy 1.20.2 which is incompatible.
autogluon-core 0.1.0 re

In [2]:
import numpy as np
import pandas as pd
from supervised.automl import AutoML # mljar-supervised

In [3]:
train = pd.read_csv("../input/tabular-playground-series-may-2021/train.csv")
test = pd.read_csv("../input/tabular-playground-series-may-2021/test.csv")

In [4]:
train.head()

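| | id | feature_0 | feature_1 | feature_2 | feature_3 | feature_4 | feature_5 | feature_6 | feature_7 | feature_8 | ... | feature_41 | feature_42 | feature_43 | feature_44 | feature_45 | feature_46 | feature_47 | feature_48 | feature_49 | target |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0 | 0 | 0 | 1 | 0 | 1 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 21 | 0 | 0 | 0 | 0 | 0 | 0 | Class_2 |
| 1 | 1 | 0 | 0 | 0 | 0 | 2 | 1 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | Class_1 |
| 2 | 2 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 1 | 0 | 0 | 0 | 0 | 13 | 2 | 0 | Class_1 |
| 3 | 3 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 3 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | Class_4 |
| 4 | 4 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | Class_2 |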


In [5]:
x_cols = train.columns[1:-1].tolist()
y_col = train.columns[-1]

In [6]:
automl = AutoML(
    mode="Compete", 
    total_time_limit=4*3600
)
automl.fit(train[x_cols], train[y_col])

Linear algorithm was disabled.
AutoML directory: AutoML_1
The task is multiclass_classification with evaluation metric logloss
AutoML will use algorithms: ['Decision Tree', 'Random Forest', 'Extra Trees', 'LightGBM', 'Xgboost', 'CatBoost', 'Neural Network', 'Nearest Neighbors']
AutoML will stack models
AutoML will ensemble availabe models
AutoML steps: ['adjust_validation', 'simple_algorithms', 'default_algorithms', 'not_so_random', 'golden_features', 'kmeans_features', 'insert_random_feature', 'features_selection', 'hill_climbing_1', 'hill_climbing_2', 'boost_on_errors', 'ensemble', 'stack', 'ensemble_stacked']
* Step adjust_validation will try to check up to 1 model
1_DecisionTree logloss 1.114614 trained in 2.28 seconds
Adjust validation. Remove: 1_DecisionTree
Validation strategy: 10-fold CV Shuffle,Stratify
* Step simple_algorithms will try to check up to 3 models
1_DecisionTree logloss 1.114737 trained in 19.17 seconds
2_DecisionTree logloss 1.114112 trained in 19.69 seconds
3_De

OSError: /opt/conda/lib/python3.7/site-packages/numpy.libs/libopenblasp-r0-09e95953.3.13.so: cannot open shared object file: No such file or directory


29_CatBoost_KMeansFeatures logloss 1.092307 trained in 420.66 seconds



6_Default_CatBoost_KMeansFeatures logloss 1.093256 trained in 353.57 seconds
Not enough time to perform features selection. Skip
Time needed for features selection ~ 1835.0 seconds
Please increase total_time_limit to at least (18406 seconds) to have features selection
Skip insert_random_feature because no parameters were generated.
Skip features_selection because no parameters were generated.
* Step hill_climbing_1 will try to check up to 34 models
69_CatBoost logloss 1.091727 trained in 297.81 seconds
70_CatBoost logloss 1.091684 trained in 300.41 seconds
71_CatBoost_GoldenFeatures logloss 1.091889 trained in 295.75 seconds
72_CatBoost_GoldenFeatures logloss 1.091668 trained in 298.45 seconds



73_CatBoost logloss 1.092827 trained in 385.02 seconds



74_CatBoost logloss 1.092619 trained in 389.64 seconds
* Step hill_climbing_2 will try to check up to 32 models
75_CatBoost_GoldenFeatures logloss 1.091919 trained in 410.12 seconds
76_CatBoost_GoldenFeatures logloss 1.091947 trained in 247.31 seconds
77_CatBoost logloss 1.091748 trained in 381.81 seconds
78_CatBoost logloss 1.091851 trained in 229.53 seconds
* Step boost_on_errors will try to check up to 1 model
72_CatBoost_GoldenFeatures_BoostOnErrors logloss 1.094246 trained in 257.73 seconds
* Step ensemble will try to check up to 1 model
Ensemble logloss 1.090917 trained in 138.88 seconds
* Step stack will try to check up to 37 models
72_CatBoost_GoldenFeatures_Stacked logloss 1.090532 trained in 254.17 seconds
15_Xgboost_Stacked logloss 1.09152 trained in 384.44 seconds
20_LightGBM_Stacked logloss 1.090562 trained in 199.19 seconds
39_RandomForest_Stacked not trained. Stop training after the first fold. Time needed to train on the first fold 251.0 seconds. The time estimate for t

AutoML(mode='Compete', total_time_limit=14400)
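
The training log reports that features selection was skipped and asks for a `total_time_limit` of at least 18406 seconds, while this run used `4*3600`. The sketch below works through that arithmetic. Rounding up to whole hours is an illustrative choice, and `hours_needed` is a hypothetical helper, not a library function.

```python
import math


def hours_needed(required_seconds):
    """Round a time requirement in seconds up to whole hours."""
    return math.ceil(required_seconds / 3600)


current_limit = 4 * 3600    # total_time_limit used in this run
required_seconds = 18406    # threshold reported in the training log

shortfall = required_seconds - current_limit
suggested_hours = hours_needed(required_seconds)

print(f"Short by {shortfall} seconds")
print(f"Suggested setting: total_time_limit={suggested_hours}*3600")
```

Under these assumptions, a limit of `6*3600` would clear the reported threshold, although the time actually needed may differ from run to run.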

In [7]:
preds = automl.predict_proba(test[x_cols])  # same feature columns as used in training

In [8]:
sub = pd.read_csv("../input/tabular-playground-series-may-2021/sample_submission.csv")
sub[sub.columns[1:]] = preds
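
Before writing class probabilities into a submission file, it can be worth checking that each row is a valid probability distribution with one value per class. The sketch below is dependency-free and runs on toy rows, not on the real `preds`. The helper `check_probability_rows` and the choice of four classes for the toy data are assumptions made for illustration.

```python
import math


def check_probability_rows(rows, n_classes, tol=1e-6):
    """Raise ValueError if any row is not a valid probability vector."""
    for i, row in enumerate(rows):
        if len(row) != n_classes:
            raise ValueError(f"row {i}: expected {n_classes} values, got {len(row)}")
        if any(p < 0.0 or p > 1.0 for p in row):
            raise ValueError(f"row {i}: probability outside [0, 1]")
        if not math.isclose(sum(row), 1.0, abs_tol=tol):
            raise ValueError(f"row {i}: probabilities sum to {sum(row)}")
    return True


# Toy predictions (made-up values), one column per class
toy_preds = [
    [0.10, 0.60, 0.20, 0.10],
    [0.25, 0.25, 0.25, 0.25],
]
ok = check_probability_rows(toy_preds, n_classes=4)
print("All rows valid:", ok)
```

The same idea applies to the real predictions once they are converted to a list of rows. It is also worth confirming that the column order of the predictions matches the class columns of the sample submission.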

In [9]:
sub.to_csv("1_submission.csv", index=False)

In [10]:
automl.report()
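
To put the logloss values from the training log (around 1.09) in context, it helps to compare them with a trivial baseline. The sketch below implements multiclass log loss in plain Python and evaluates a model that always predicts a uniform distribution. It assumes four target classes (the data sample shows labels up to `Class_4`), uses made-up labels, and `multiclass_logloss` is an illustrative helper rather than the metric implementation used by the library.

```python
import math


def multiclass_logloss(y_true, probs, eps=1e-15):
    """Mean negative log-probability assigned to the true class."""
    total = 0.0
    for label, row in zip(y_true, probs):
        # Clip to avoid log(0) for overconfident wrong predictions
        p = min(max(row[label], eps), 1.0 - eps)
        total -= math.log(p)
    return total / len(y_true)


n_classes = 4                 # assumption for this illustration
toy_labels = [0, 1, 3]        # made-up class indices
uniform = [[1.0 / n_classes] * n_classes for _ in toy_labels]

baseline = multiclass_logloss(toy_labels, uniform)
print(f"Uniform-guess logloss: {baseline:.4f}")  # equals ln(4), about 1.3863
```

Under this assumption, any score below roughly 1.386 is better than uniform guessing, and lower values are better.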