<div align='center'>
    <h1>AutoML Tutorial</h1>
    <img src='https://github.com/vopani/fortyone/blob/main/images/automl_banner_530_x_455.png?raw=true'>
</div>

**Auto**mated **M**achine **L**earning (**AutoML**) is widely used to build, experiment with, and productionize machine learning models across business use cases.

Several open source AutoML libraries are available. This notebook builds a simple baseline with six of them on the [Kaggle TPS (December 2021) competition](https://www.kaggle.com/c/tabular-playground-series-dec-2021).

* [AutoGluon](#AutoGluon)
* [EvalML](#EvalML)
* [FLAML](#FLAML)
* [H2O AutoML](#H2O-AutoML)
* [LightAutoML](#LightAutoML)
* [MLJAR](#MLJAR)

In [None]:
## define configuration
PATH_TRAIN = '../input/tabular-playground-series-dec-2021/train.csv'
PATH_TEST = '../input/tabular-playground-series-dec-2021/test.csv'

PATH_AUTOGLUON_SUBMISSION = 'submission_autogluon.csv'
PATH_EVALML_SUBMISSION = 'submission_evalml.csv'
PATH_FLAML_SUBMISSION = 'submission_flaml.csv'
PATH_H2OAML_SUBMISSION = 'submission_h2oaml.csv'
PATH_LAML_SUBMISSION = 'submission_laml.csv'
PATH_MLJAR_SUBMISSION = 'submission_mljar.csv'

MAX_MODEL_RUNTIME_MINS = 5
MAX_MODEL_RUNTIME_SECS = MAX_MODEL_RUNTIME_MINS * 60
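
Every library below is given the same time limit, so the total fitting time can be estimated up front. The sketch below is a rough estimate only, assuming each library uses its full budget and ignoring installs and predictions. `LIBRARIES` and `total_budget_mins` are illustrative names that are not part of the original configuration.

```python
# Rough estimate of total fitting time, assuming each library uses its full budget.
# LIBRARIES and total_budget_mins are illustrative names, not part of the notebook.
MAX_MODEL_RUNTIME_MINS = 5
MAX_MODEL_RUNTIME_SECS = MAX_MODEL_RUNTIME_MINS * 60

LIBRARIES = ['AutoGluon', 'EvalML', 'FLAML', 'H2O AutoML', 'LightAutoML', 'MLJAR']

total_budget_mins = len(LIBRARIES) * MAX_MODEL_RUNTIME_MINS
print(f'{len(LIBRARIES)} libraries x {MAX_MODEL_RUNTIME_MINS} min = {total_budget_mins} min of fitting time')
```

Raising `MAX_MODEL_RUNTIME_MINS` scales this estimate linearly.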

In [None]:
## prepare data
import gc
import os
import shutil

import datatable as dt
import numpy as np
from pathlib import Path
import warnings

warnings.filterwarnings('ignore')

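# read the data, keeping only the first 100,000 training rows
train = dt.fread(PATH_TRAIN)[:100000, :]
test = dt.fread(PATH_TEST)

# keep the labels as an object array and the test ids for the submissions
target = train['Cover_Type'].to_numpy().ravel().astype(object)
test_ids = test['Id']

# drop the non-feature columns and align the test columns with train
del train[:, ['Id', 'Cover_Type']]
test = test[:, train.names]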
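
Because only the first 100,000 rows are used, some classes may be rare or missing in the sample. A quick count of the labels can confirm what the models will actually see. The sketch below is a minimal illustration with synthetic labels. `class_counts` is a hypothetical helper and `example_target` stands in for `target`.

```python
from collections import Counter


def class_counts(labels):
    """Return a {label: count} dict sorted by label (hypothetical helper)."""
    return dict(sorted(Counter(labels).items()))


# Synthetic labels standing in for `target`; the real values come from Cover_Type.
example_target = [1, 2, 2, 3, 2, 1, 7]

counts = class_counts(example_target)
print(counts)
```

In the notebook this could be called as `class_counts(target)`.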

## AutoGluon
<img src='https://user-images.githubusercontent.com/16392542/77208906-224aa500-6aba-11ea-96bd-e81806074030.png' width='250px'>

[AutoGluon](https://auto.gluon.ai/stable/index.html) is an AutoML library open-sourced by [Amazon](http://amazon.com/aws).

In [None]:
## install packages
!python3 -m pip install -q "mxnet<2.0.0"
!python3 -m pip install -q autogluon
!python3 -m pip install -q -U graphviz
!python3 -m pip install -q -U scikit-learn

In [None]:
## import packages
from autogluon.tabular import TabularPredictor

In [None]:
## run model
train['target'] = dt.Frame(target)

model_autogluon = TabularPredictor(label='target')
model_autogluon.fit(train_data=train.to_pandas(), excluded_model_types=['KNN'], time_limit=MAX_MODEL_RUNTIME_SECS)

del train['target']

In [None]:
## check leaderboard
model_autogluon.leaderboard()

In [None]:
## generate predictions
preds_autogluon = model_autogluon.predict(test.to_pandas())

In [None]:
## create submission
submission = dt.Frame(Id=test_ids, Cover_Type=dt.Frame(preds_autogluon))
submission.head()

In [None]:
## save submission
submission.to_csv(PATH_AUTOGLUON_SUBMISSION)
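
Each library section ends by writing a CSV with an `Id` and a `Cover_Type` column. A small check on the saved file can catch a wrong header or a missing row before uploading. The sketch below uses only the standard library and a toy file. `check_submission` is a hypothetical helper, the ids are made up, and it assumes the file has a header row and no blank lines.

```python
import csv
import os
import tempfile


def check_submission(path, expected_ids):
    """Return True if the CSV has the expected header and one row per expected id, in order."""
    with open(path, newline='') as f:
        reader = csv.reader(f)
        header = next(reader)
        ids = [row[0] for row in reader]
    return header == ['Id', 'Cover_Type'] and ids == [str(i) for i in expected_ids]


# Toy file standing in for a saved submission; the ids are made up.
with tempfile.TemporaryDirectory() as tmp:
    toy_path = os.path.join(tmp, 'submission_toy.csv')
    with open(toy_path, 'w', newline='') as f:
        writer = csv.writer(f)
        writer.writerow(['Id', 'Cover_Type'])
        writer.writerows([[4000000, 2], [4000001, 1], [4000002, 2]])

    is_valid = check_submission(toy_path, [4000000, 4000001, 4000002])
    is_invalid = check_submission(toy_path, [4000000, 4000001])

print(is_valid, is_invalid)
```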

In [None]:
## clear memory
shutil.rmtree('AutogluonModels')
del model_autogluon

gc.collect()

Read more in [Documentation of AutoGluon](https://auto.gluon.ai/stable/index.html)

## EvalML
<img src='https://evalml.alteryx.com/en/stable/_images/evalml_horizontal.svg' width='250px'>

[EvalML](https://evalml.alteryx.com/en/stable) is an AutoML library open-sourced by [Alteryx](http://www.alteryx.com).

In [None]:
## install packages
!python3 -m pip install -q evalml==0.28.0

In [None]:
## import packages
from evalml.automl import AutoMLSearch

In [None]:
## run model
model_evalml = AutoMLSearch(X_train=train.to_pandas(), y_train=target, problem_type='multiclass', objective='accuracy multiclass', max_time=MAX_MODEL_RUNTIME_SECS)
model_evalml.search()

In [None]:
## check leaderboard
model_evalml.rankings

In [None]:
## generate predictions
preds_evalml = model_evalml.best_pipeline.predict(test.to_pandas())

In [None]:
## create submission
submission = dt.Frame(Id=test_ids, Cover_Type=dt.Frame(preds_evalml))
submission.head()

In [None]:
## save submission
submission.to_csv(PATH_EVALML_SUBMISSION)

In [None]:
## clear memory
# remove the debug log only if it was created
if os.path.exists('evalml_debug.log'):
    os.remove('evalml_debug.log')
del model_evalml

gc.collect()

Read more in [Documentation of EvalML](https://evalml.alteryx.com)

## FLAML
<img src='https://github.com/microsoft/FLAML/raw/main/docs/images/FLAML.png' width='150px'>

[FLAML](https://microsoft.github.io/FLAML) is a fast and lightweight AutoML library open-sourced by [Microsoft](https://opensource.microsoft.com).

In [None]:
## install packages
!python3 -m pip install -q flaml
!python3 -m pip install -q -U graphviz
!python3 -m pip install -q -U scikit-learn

In [None]:
## import packages
from flaml import AutoML

In [None]:
## run model
model_flaml = AutoML()
model_flaml.fit(X_train=train.to_pandas(), y_train=target, time_budget=MAX_MODEL_RUNTIME_SECS)

In [None]:
## generate predictions
preds_flaml = model_flaml.predict(test.to_pandas())

In [None]:
## create submission
submission = dt.Frame(Id=test_ids, Cover_Type=preds_flaml.astype(int))
submission.head()

In [None]:
## save submission
submission.to_csv(PATH_FLAML_SUBMISSION)

In [None]:
## clear memory
del model_flaml

gc.collect()

Read more in [Documentation of FLAML](https://microsoft.github.io/FLAML)

## H2O AutoML
<img src='https://docs.h2o.ai/h2o/latest-stable/h2o-docs/_images/h2o-automl-logo.jpg' width='150px'>

[H2O AutoML](https://www.h2o.ai/products/h2o-automl) is an automated machine learning library open-sourced by [H2O.ai](https://h2o.ai).

In [None]:
## import packages
import h2o
from h2o.automl import H2OAutoML

In [None]:
## prepare data
h2o.init()

h2o_train = h2o.H2OFrame(train.to_pandas())
h2o_test = h2o.H2OFrame(test.to_pandas())

h2o_train['target'] = h2o.H2OFrame(target).asfactor()

In [None]:
## run model
features = [x for x in h2o_train.columns if x != 'target']

model_h2o = H2OAutoML(stopping_metric='misclassification', max_runtime_secs=MAX_MODEL_RUNTIME_SECS)
model_h2o.train(x=features, y='target', training_frame=h2o_train)

In [None]:
## check leaderboard
model_h2o.leaderboard

In [None]:
## generate predictions
preds_h2o = model_h2o.leader.predict(h2o_test).as_data_frame()['predict']

In [None]:
## create submission
submission = dt.Frame(Id=test_ids, Cover_Type=dt.Frame(preds_h2o))
submission.head()

In [None]:
## save submission
submission.to_csv(PATH_H2OAML_SUBMISSION)

In [None]:
## clear memory
h2o.cluster().shutdown()
del model_h2o

gc.collect()

Read more in [Documentation of H2O AutoML](https://docs.h2o.ai/h2o/latest-stable/h2o-docs/automl.html)

## LightAutoML
<img src='https://github.com/sberbank-ai-lab/LightAutoML/blob/master/imgs/LightAutoML_logo_small.png?raw=true' width='150px'>

[LightAutoML](https://github.com/sberbank-ai-lab/LightAutoML) is a framework for automatic classification and regression model creation open sourced by [Sberbank](https://www.sberbank.com) AI Lab.

In [None]:
## install packages
!python3 -m pip install -q lightautoml
!python3 -m pip install -q -U torch
!python3 -m pip install -q -U torchvision

In [None]:
## import packages
from lightautoml.automl.presets.tabular_presets import TabularAutoML
from lightautoml.tasks import Task

In [None]:
## run model
train['target'] = dt.Frame(target)

model_laml = TabularAutoML(task=Task('multiclass'), timeout=MAX_MODEL_RUNTIME_SECS)
model_laml.fit_predict(train_data=train.to_pandas(), roles={'target': 'target'})

del train['target']

In [None]:
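## generate predictions
# invert the reader's class mapping so each predicted column index maps back to its original label
index_to_label = {v: k for k, v in model_laml.reader.class_mapping.items()}
preds_laml = np.vectorize(index_to_label.get)(np.argmax(model_laml.predict(test.to_pandas()).data, axis=1))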
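
The prediction step above takes the highest-scoring column for each row and translates that column index back to an original class label. The sketch below reproduces that logic on toy data with NumPy only. `class_mapping` and `probs` are hypothetical stand-ins for `model_laml.reader.class_mapping` and the predicted score matrix, and the label-to-index direction of the mapping is assumed from the inversion used in the cell above.

```python
import numpy as np

# Hypothetical stand-ins: original label -> column index, and per-class scores per row.
class_mapping = {1: 0, 2: 1, 3: 2}
probs = np.array([
    [0.1, 0.7, 0.2],
    [0.8, 0.1, 0.1],
    [0.2, 0.3, 0.5],
])

# Invert the mapping so that a column index gives back the original label.
index_to_label = {idx: label for label, idx in class_mapping.items()}

# Pick the highest-scoring column per row, then translate indices to labels.
best_index = np.argmax(probs, axis=1)
labels = np.array([index_to_label[int(i)] for i in best_index])
print(labels)
```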

In [None]:
## create submission
submission = dt.Frame(Id=test_ids, Cover_Type=dt.Frame(preds_laml))
submission.head()

In [None]:
## save submission
submission.to_csv(PATH_LAML_SUBMISSION)

In [None]:
## clear memory
del model_laml

gc.collect()

Read more in [Documentation of LightAutoML](https://lightautoml.readthedocs.io/en/latest/index.html)

## MLJAR
<img src='https://mljar.com/images/logo/mljar_circle3.svg' width='150px'>

[MLJAR](https://mljar.com) is an automated machine learning tool for tabular data.

In [None]:
## install packages
!python3 -m pip install -q -U --ignore-installed mljar-supervised
!python3 -m pip install -q -U graphviz

In [None]:
## import packages
from supervised import AutoML

In [None]:
## run model
model_mljar = AutoML(eval_metric='accuracy', total_time_limit=MAX_MODEL_RUNTIME_SECS, results_path='./model_mljar')
model_mljar.fit(X=train.to_pandas(), y=target)

In [None]:
## check leaderboard
model_mljar.get_leaderboard()

In [None]:
## generate predictions
preds_mljar = model_mljar.predict(test.to_pandas())

In [None]:
## create submission
submission = dt.Frame(Id=test_ids, Cover_Type=dt.Frame(preds_mljar))
submission.head()

In [None]:
## save submission
submission.to_csv(PATH_MLJAR_SUBMISSION)

In [None]:
## clear memory
shutil.rmtree('model_mljar')

del model_mljar

gc.collect()

Read more in [Documentation of MLJAR](https://supervised.mljar.com)
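
With all six submissions saved, one simple way to compare the baselines without the true test labels is to measure how often two libraries predict the same class. The sketch below is a minimal illustration on toy predictions. `agreement` is a hypothetical helper and `toy_preds` holds made-up values, not results from this notebook.

```python
from itertools import combinations


def agreement(preds_a, preds_b):
    """Fraction of positions where two equally long prediction lists match."""
    if len(preds_a) != len(preds_b):
        raise ValueError('prediction lists must have the same length')
    if not preds_a:
        return 0.0
    matches = sum(a == b for a, b in zip(preds_a, preds_b))
    return matches / len(preds_a)


# Made-up predictions for illustration only; real values would be read from the saved CSV files.
toy_preds = {
    'autogluon': [2, 1, 2, 3],
    'flaml': [2, 1, 1, 3],
    'mljar': [2, 2, 1, 3],
}

pairwise = {
    (a, b): agreement(toy_preds[a], toy_preds[b])
    for a, b in combinations(toy_preds, 2)
}
for pair, score in pairwise.items():
    print(pair, score)
```

High agreement does not imply high accuracy; it only shows where the libraries make the same calls.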

## Similar Tutorials
Similar tutorials on other Kaggle TPS competitions are published here:

* [AutoML Tutorial: TPS (January 2021)](https://www.kaggle.com/rohanrao/automl-tutorial-tps-january-2021)
* [AutoML Tutorial: TPS (February 2021)](https://www.kaggle.com/rohanrao/automl-tutorial-tps-february-2021)
* [AutoML Tutorial: TPS (March 2021)](https://www.kaggle.com/rohanrao/automl-tutorial-tps-march-2021)
* [AutoML Tutorial: TPS (April 2021)](https://www.kaggle.com/rohanrao/automl-tutorial-tps-april-2021)
* [AutoML Tutorial: TPS (May 2021)](https://www.kaggle.com/rohanrao/automl-tutorial-tps-may-2021)
* [AutoML Tutorial: TPS (June 2021)](https://www.kaggle.com/rohanrao/automl-tutorial-tps-june-2021)
* [AutoML Tutorial: TPS (July 2021)](https://www.kaggle.com/rohanrao/automl-tutorial-tps-july-2021)
* [AutoML Tutorial: TPS (August 2021)](https://www.kaggle.com/rohanrao/automl-tutorial-tps-august-2021)
* [AutoML Tutorial: TPS (September 2021)](https://www.kaggle.com/rohanrao/automl-tutorial-tps-september-2021)
* [AutoML Tutorial: TPS (October 2021)](https://www.kaggle.com/rohanrao/automl-tutorial-tps-october-2021)
* [AutoML Tutorial: TPS (November 2021)](https://www.kaggle.com/rohanrao/automl-tutorial-tps-november-2021)