__XGBoost__ is one of the most popular machine-learning algorithms, but the number of possible parameter combinations is practically unlimited:
- booster: gbtree, gblinear or dart; gbtree and dart use tree-based models, gblinear uses linear functions
- disable_default_eval_metric 
- eta [default=0.3, alias: learning_rate]
- gamma [default=0, alias: min_split_loss]
- max_depth [default=6]
- min_child_weight [default=1]
- max_delta_step [default=0]
- subsample [default=1]
- colsample_bytree 
- colsample_bylevel 
- colsample_bynode 
- lambda [default=1, alias: reg_lambda]
- alpha [default=0, alias: reg_alpha]
- tree_method [default=auto]
- sketch_eps [default=0.03]
- scale_pos_weight [default=1]
- refresh_leaf [default=1]
- process_type [default=default]
- grow_policy [default=depthwise]
- max_leaves [default=0]
- max_bin [default=256]
- sample_type [default=uniform]
- normalize_type [default=tree]
- rate_drop [default=0.0]
- one_drop [default=0]
- skip_drop [default=0.0]
- updater [default=shotgun]
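
To make the combinatorial growth concrete, here is a minimal sketch that counts the configurations in a small grid over six of the parameters listed above. The candidate values are illustrative assumptions, not recommendations; only the parameter names come from the list.

```python
from itertools import product

# Hypothetical search grid: parameter names are taken from the list above,
# the candidate values are assumptions chosen only for illustration.
grid = {
    "eta": [0.01, 0.05, 0.1, 0.3],
    "max_depth": [3, 4, 6, 8, 10],
    "min_child_weight": [1, 5, 10],
    "subsample": [0.5, 0.75, 1.0],
    "colsample_bytree": [0.5, 0.75, 1.0],
    "gamma": [0, 1, 5],
}

# The grid size is the product of the number of values per parameter.
n_combinations = 1
for values in grid.values():
    n_combinations *= len(values)
print(n_combinations)  # 1620

# Lazily enumerate the configurations as dictionaries.
configs = (dict(zip(grid, values)) for values in product(*grid.values()))
first = next(configs)
print(first)
```

Even this coarse grid over only six parameters requires 1620 training runs, and each additional parameter multiplies that number again.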



 - [BigML](https://bigml.com/)
 - [H2O.ai](http://docs.h2o.ai/h2o/latest-stable/h2o-docs/automl.html)
 - [RapidMiner](https://rapidminer.com/products/go/)
 - [DataRobot](https://www.datarobot.com/solutions/data-scientists/)
 - [Microsoft Azure](https://azure.microsoft.com/en-us/services/machine-learning/automatedml/)
 - [Google Cloud AutoML](https://cloud.google.com/automl)
 - [Amazon AutoML](https://aws.amazon.com/blogs/machine-learning/code-free-machine-learning-automl-with-autogluon-amazon-sagemaker-and-aws-lambda/)

A competitor from Zurich:
 - [Modulos.ai](https://www.modulos.ai/)

<img alt="" caption="Bayesian Optimization: surrogate function (black, blue) and acquisition function (green)" 
id="bayesian_optimization" src="../images/image4.png" width="320" height="320">


<img alt="" caption="Auto-Sklearn" 
id="auto-sklearn" src="../images/image3.png" width="720" height="520">


__SMAC__ (sequential model-based algorithm configuration)

 - The data set is divided into n folds.
 - For each fold, characteristics of the data are determined and a signature for the fold is calculated with PCA.
 - Applying a hyperparameter configuration to a fold yields a cost c: `[h1, h2, h3, h4, h5][s1, s2, s3] -> c`
 - Initially, random combinations of hyperparameters and data folds are evaluated to obtain measurement points.
 - A random forest is fitted to these measurement points.
 - New configurations (candidates) are combined with the signatures of all data folds and evaluated by the random forest.
 - Within each tree, the leaf predictions are averaged over all data-fold signatures, and these per-tree results are then aggregated over all trees in the forest. This yields a mean and a variance for each candidate, which are used in the acquisition function (max. objective, min. uncertainty).
 - In this way, many different parameter combinations can be assessed without having to train the actual ML algorithm on each new configuration.
 - The hyperparameter combinations with the highest values of the acquisition function are tested against the incumbent (the best combination so far) on the actual ML algorithm. This creates new measurement points, and the random forest is refitted.
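
The steps above can be sketched roughly as follows. This is a simplified illustration, not the SMAC implementation: `true_cost`, `mean_cost` and `predict_mean_std` are hypothetical helpers, the fold signatures are random numbers instead of PCA features, scikit-learn's `RandomForestRegressor` stands in for SMAC's own forest, and the acquisition function is an assumed simple "low predicted cost minus spread across trees" rule.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(0)
n_folds, n_hyper, n_sig = 3, 2, 3

# Assumption: random stand-ins for the PCA-based fold signatures [s1, s2, s3].
signatures = rng.normal(size=(n_folds, n_sig))

def true_cost(config, signature):
    """Hypothetical stand-in for training the real ML algorithm on one fold."""
    return float(np.sum((config - np.array([0.3, 0.7])) ** 2) + 0.1 * signature.sum())

def mean_cost(config):
    """Average cost of one configuration over all folds."""
    return float(np.mean([true_cost(config, s) for s in signatures]))

# Random initial (configuration, fold) combinations serve as measurement points.
configs = rng.uniform(size=(10, n_hyper))
fold_ids = rng.integers(n_folds, size=10)
X = np.hstack([configs, signatures[fold_ids]])
y = np.array([true_cost(c, signatures[f]) for c, f in zip(configs, fold_ids)])

def predict_mean_std(forest, candidates):
    """Combine each candidate with every fold signature, average each tree's
    predictions over the folds, then take mean and spread across the trees."""
    n_cand = len(candidates)
    rows = np.hstack([np.repeat(candidates, n_folds, axis=0),
                      np.tile(signatures, (n_cand, 1))])
    per_tree = np.stack([
        tree.predict(rows).reshape(n_cand, n_folds).mean(axis=1)
        for tree in forest.estimators_
    ])
    return per_tree.mean(axis=0), per_tree.std(axis=0)

incumbent = configs[np.argmin(y)]
incumbent_cost = mean_cost(incumbent)
initial_cost = incumbent_cost

for _ in range(5):
    # Fit the surrogate model to all measurement points collected so far.
    forest = RandomForestRegressor(n_estimators=30, random_state=0).fit(X, y)
    candidates = rng.uniform(size=(200, n_hyper))
    mean, std = predict_mean_std(forest, candidates)
    acquisition = -(mean - std)  # assumed acquisition rule, higher is better
    challenger = candidates[np.argmax(acquisition)]
    # Only the selected challenger is evaluated with the expensive cost function.
    challenger_costs = [true_cost(challenger, s) for s in signatures]
    X = np.vstack([X, np.hstack([np.tile(challenger, (n_folds, 1)), signatures])])
    y = np.append(y, challenger_costs)
    if np.mean(challenger_costs) < incumbent_cost:
        incumbent, incumbent_cost = challenger, float(np.mean(challenger_costs))

print(incumbent, incumbent_cost)
```

In each round, 200 candidates are scored by the surrogate alone, while the expensive cost function is called only for the single selected challenger on each of the three folds.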


[install auto-sklearn](https://automl.github.io/auto-sklearn/master/installation.html)

In [1]:
# curl https://raw.githubusercontent.com/automl/auto-sklearn/master/requirements.txt | xargs -n 1 -L 1 pip3 install
! cat auto-sklearn-requirements.txt | xargs -n 1 -L 1 pip3 install

Collecting dask
  Using cached dask-2021.3.0-py3-none-any.whl (925 kB)
Collecting pyyaml
  Downloading PyYAML-5.4.1-cp37-cp37m-manylinux1_x86_64.whl (636 kB)
Installing collected packages: pyyaml, dask
Successfully installed dask-2021.3.0 pyyaml-5.4.1
Collecting distributed>=2.2.0
  Using cached distributed-2021.3.0-py3-none-any.whl (675 kB)
Collecting msgpack>=0.6.0
  Downloading msgpack-1.0.2-cp37-cp37m-manylinux1_x86_64.whl (273 kB)
Collecting psutil>=5.0
  Downloading psutil-5.8.0-cp37-cp37m-manylinux2010_x86_64.whl (296 kB)
Collecting toolz>=0.8.2
  Using cached toolz-0.11.1-py3-none-any.whl (55 kB)
Collecting cloudpickle>=1.5.0
  Using cached cloudpickle-1.6.0-py3-none-any.whl (23 kB)
Collecting click>=6.6
  Using cached click-7.1.2-py2.py3-none-any.whl (82 kB)
Col

Building wheels for collected packages: smac, lazy-import
  Building wheel for smac (setup.py) ... done
  Created wheel for smac: filename=smac-0.13.1-py3-none-any.whl size=252181 sha256=161a519b699add7c1b47d45d48d3db74b403fcf362a994d54d440823bf8c537a
  Stored in directory: /home/martin/.cache/pip/wheels/35/b9/6b/17b5f3d627b1be6cdcc5357f797bd9e4ea8cbae3d1ff00e621
  Building wheel for lazy-import (setup.py) ... done
  Created wheel for lazy-import: filename=lazy_import-0.2.2-py2.py3-none-any.whl size=16486 sha256=a0eeb389a827cda86e1b1ef7baa0b6383100c786e20f4ac691a5aa6ec0802f7f
  Stored in directory: /home/martin/.cache/pip/wheels/e6/8e/c7/c338956a635caa3b3153cd8e49b183badb75230ecf19144dff
Successfully built smac lazy-import
Installing collected packages: lazy-import, smac
Successfully installed lazy-import-0.2.2 smac-0.13.1


In [2]:
!pip install auto-sklearn

Collecting auto-sklearn
  Using cached auto-sklearn-0.12.4.tar.gz (6.1 MB)
Building wheels for collected packages: auto-sklearn
  Building wheel for auto-sklearn (setup.py) ... done
  Created wheel for auto-sklearn: filename=auto_sklearn-0.12.4-py3-none-any.whl size=6367618 sha256=3b6a3d1c9c393eba0823d1b7f0b0415fe83932d33d8e4054b867ad711050542e
  Stored in directory: /home/martin/.cache/pip/wheels/ab/00/0d/f7edef58d6fce191e5f8f9396a7a2e1fd860e5a9256b2dc7d4
Successfully built auto-sklearn
Installing collected packages: auto-sklearn
Successfully installed auto-sklearn-0.12.4


In [6]:
import sklearn.metrics
import autosklearn.regression
import pandas as pd

## Let's attack our house-prices example

In [5]:
!pwd

/home/martin/python/fhnw_lecture/notebooks


In [7]:
train = pd.read_csv('../data/train.csv', sep=",")
test = pd.read_csv('../data/test.csv')

# autogluon.tabular

In [8]:
!pip install mxnet==1.7.0.post1

Collecting mxnet==1.7.0.post1
  Downloading mxnet-1.7.0.post1-py2.py3-none-manylinux2014_x86_64.whl (55.0 MB)

In [9]:
!pip install autogluon-core==0.0.16b20210114 autogluon-tabular==0.0.16b20210114

Collecting autogluon-core==0.0.16b20210114
  Downloading autogluon.core-0.0.16b20210114-py3-none-any.whl (246 kB)
Collecting autogluon-tabular==0.0.16b20210114
  Downloading autogluon.tabular-0.0.16b20210114-py3-none-any.whl (322 kB)
Collecting matplotlib
  Downloading matplotlib-3.3.4-cp37-cp37m-manylinux1_x86_64.whl (11.5 MB)
Collecting paramiko>=2.4
  Downloading paramiko-2.7.2-py2.py3-none-any.whl (206 kB)
Collecting scipy<1.5.0,>=1.3.3
  Downloading scipy-1.4.1-cp37-cp37m-manylinux1_x86_64.whl (26.1 MB)