<div align='center'>
    <h1>AutoML Tutorial</h1>
    <img src='https://github.com/vopani/fortyone/blob/main/images/automl_banner_530_x_455.png?raw=true'>
</div>

**Auto**mated **M**achine **L**earning (**AutoML**) is widely used to build, experiment with, and productionize machine learning models across business use cases.

Several open-source solutions are available. This notebook builds a simple baseline with two of them on the [Kaggle TPS (May 2021) competition](https://www.kaggle.com/c/tabular-playground-series-may-2021).

* [Auto-Sklearn](#Auto-Sklearn)
* [H2O AutoML](#H2O-AutoML)

In [None]:
## define configuration
PATH_TRAIN = '../input/tabular-playground-series-may-2021/train.csv'
PATH_TEST = '../input/tabular-playground-series-may-2021/test.csv'

PATH_AUTOSKLEARN_SUBMISSION = 'submission_autosklearn.csv'
PATH_H2OAML_SUBMISSION = 'submission_h2oaml.csv'

## Auto-Sklearn
[auto-sklearn](https://automl.github.io/auto-sklearn) is an automated machine learning toolkit and a drop-in replacement for a scikit-learn estimator.

In [None]:
## install package
!curl https://raw.githubusercontent.com/automl/auto-sklearn/master/requirements.txt | xargs -n 1 -L 1 pip3 install
!pip3 install auto-sklearn

In [None]:
## import packages
import pandas as pd

from autosklearn.classification import AutoSklearnClassifier
from autosklearn.metrics import log_loss

In [None]:
## prepare data
train = pd.read_csv(PATH_TRAIN)
test = pd.read_csv(PATH_TEST)

target = train.target.values
train.drop(['id', 'target'], axis=1, inplace=True)

In [None]:
## run model
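autosklearnml = AutoSklearnClassifier(
    time_left_for_this_task=600,   # total search budget in seconds (10 minutes)
    metric=log_loss,               # metric optimized during the search
    scoring_functions=[log_loss]   # additional metrics recorded for each evaluated model
)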

autosklearnml.fit(X=train, y=target, dataset_name='tps_may_2021')
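
Both the search above and the competition score rely on multi-class log loss. The sketch below shows what that metric measures, using only the standard library. It is an illustration under stated assumptions, not auto-sklearn's own implementation: the function name `multiclass_log_loss`, the clipping constant `eps`, and the toy labels and probabilities are all made up for this example.

```python
import math

def multiclass_log_loss(y_true, probs, classes, eps=1e-15):
    """Mean negative log-probability assigned to the true class of each row."""
    index = {label: i for i, label in enumerate(classes)}
    total = 0.0
    for label, row in zip(y_true, probs):
        # Clip so that a predicted probability of 0 does not produce log(0).
        p = min(max(row[index[label]], eps), 1 - eps)
        total -= math.log(p)
    return total / len(y_true)

# Hypothetical toy data, not taken from the competition files.
classes = ['Class_1', 'Class_2', 'Class_3', 'Class_4']
y_true = ['Class_2', 'Class_1']
probs = [
    [0.1, 0.7, 0.1, 0.1],      # confident and correct: small penalty
    [0.25, 0.25, 0.25, 0.25],  # uniform guess: penalty of log(4)
]

score = multiclass_log_loss(y_true, probs, classes)
print(round(score, 4))
```

Lower is better: a model that spreads probability uniformly over four classes scores `log(4)` on every row, so a useful baseline should land below that value.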

In [None]:
## check statistics
print(autosklearnml.sprint_statistics())

In [None]:
## generate predictions
preds_autosklearnml = autosklearnml.predict_proba(test[train.columns])

In [None]:
## create submission
submission = pd.concat([
    pd.DataFrame({'id': test.id}),
    pd.DataFrame(preds_autosklearnml, columns=autosklearnml.classes_)
], axis=1)

submission.head()

In [None]:
## save submission
submission.to_csv(PATH_AUTOSKLEARN_SUBMISSION, index=False)
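
Before uploading, it can help to sanity-check the saved file. The sketch below is one possible check, written with the standard library only: it verifies the header and confirms that each row holds valid probabilities summing to roughly 1. The helper name `check_submission`, the tolerance `tol`, and the class column names `Class_1` to `Class_4` are assumptions made for this example; confirm the expected columns against the competition's sample submission.

```python
import csv
import io

def check_submission(handle, expected_classes, tol=1e-6):
    """Validate header and per-row probabilities; return the number of data rows."""
    reader = csv.reader(handle)
    header = next(reader)
    expected_header = ['id'] + list(expected_classes)
    if header != expected_header:
        raise ValueError(f'unexpected header: {header}')
    n_rows = 0
    for row in reader:
        probs = [float(value) for value in row[1:]]
        if any(p < 0 or p > 1 for p in probs):
            raise ValueError(f'probability out of range for id {row[0]}')
        if abs(sum(probs) - 1.0) > tol:
            raise ValueError(f'probabilities do not sum to 1 for id {row[0]}')
        n_rows += 1
    return n_rows

# Hypothetical in-memory file standing in for a saved submission.
sample = io.StringIO(
    'id,Class_1,Class_2,Class_3,Class_4\n'
    '100000,0.1,0.6,0.2,0.1\n'
    '100001,0.25,0.25,0.25,0.25\n'
)
n_checked = check_submission(sample, ['Class_1', 'Class_2', 'Class_3', 'Class_4'])
print(n_checked)
```

To run the same check on the real file, open it with `open(PATH_AUTOSKLEARN_SUBMISSION, newline='')` and pass the handle in place of `sample`.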

This is only a baseline submission, and there is plenty of room for improvement. You can read more about Auto-Sklearn's workflow, settings, hyperparameters, and optimizations here:

* [Documentation of auto-sklearn](https://automl.github.io/auto-sklearn)
* [Deep dive of auto-sklearn](https://github.com/vopani/fortyone#automl-series-)

## H2O AutoML
<img src='https://docs.h2o.ai/h2o/latest-stable/h2o-docs/_images/h2o-automl-logo.jpg' width='150px'>

[H2O AutoML](https://www.h2o.ai/products/h2o-automl) is an automated machine learning library open-sourced by [H2O.ai](https://h2o.ai).

In [None]:
## import packages
import pandas as pd

import h2o
from h2o.automl import H2OAutoML

In [None]:
## prepare data
h2o.init()

h2o_train = h2o.import_file(PATH_TRAIN)
h2o_test = h2o.import_file(PATH_TEST)

h2o_train['target'] = h2o_train['target'].asfactor()

In [None]:
## run model
features = [x for x in h2o_train.columns if x not in ['id', 'target']]

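h2oaml = H2OAutoML(
    max_runtime_secs=600,        # total AutoML run budget in seconds (10 minutes)
    stopping_metric='logloss',   # metric used for early stopping
    sort_metric='logloss'        # metric used to rank models on the leaderboard
)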

h2oaml.train(x=features, y='target', training_frame=h2o_train)

In [None]:
## check leaderboard
h2oaml.leaderboard

In [None]:
## generate predictions
preds_h2oaml = h2oaml.leader.predict(h2o_test)

In [None]:
## create submission
submission = pd.concat([
    pd.DataFrame({'id': h2o_test['id'].as_data_frame().id}),
    preds_h2oaml.as_data_frame().drop('predict', axis=1)
], axis=1)

submission.head()

In [None]:
## save submission
submission.to_csv(PATH_H2OAML_SUBMISSION, index=False)

This is only a baseline submission, and there is plenty of room for improvement. You can read more about H2O AutoML's workflow, settings, hyperparameters, and interpretability here:

* [Documentation of H2O AutoML](https://docs.h2o.ai/h2o/latest-stable/h2o-docs/automl.html)
* [Deep dive of H2O AutoML](https://github.com/vopani/fortyone#automl-series-)

## Similar Tutorials
Similar tutorials on other Kaggle TPS competitions are published here:

* [AutoML Tutorial: TPS (January 2021)](https://www.kaggle.com/rohanrao/automl-tutorial-tps-january-2021)
* [AutoML Tutorial: TPS (February 2021)](https://www.kaggle.com/rohanrao/automl-tutorial-tps-february-2021)
* [AutoML Tutorial: TPS (March 2021)](https://www.kaggle.com/rohanrao/automl-tutorial-tps-march-2021)
* [AutoML Tutorial: TPS (April 2021)](https://www.kaggle.com/rohanrao/automl-tutorial-tps-april-2021)
* [AutoML Tutorial: TPS (June 2021)](https://www.kaggle.com/rohanrao/automl-tutorial-tps-june-2021)