Copyright (c) Microsoft Corporation. All rights reserved. 

Licensed under the MIT License.

# Run FLAML in AzureML


## 1. Introduction

FLAML is a Python library (https://github.com/microsoft/FLAML) designed to automatically produce accurate machine learning models 
with low computational cost. It is fast and economical. Its simple, lightweight design makes it easy 
to use and extend, for example by adding new learners. FLAML can 
- serve as an economical AutoML engine,
- be used as a fast hyperparameter tuning tool, or 
- be embedded in self-tuning software that requires low latency and low resource usage in repetitive
   tuning tasks.

In this notebook, we use one real-data example (binary classification) to show how to use the FLAML library together with AzureML.

FLAML requires `Python>=3.8`. To run this notebook example, install flaml with the `[automl,azureml]` options:
```bash
pip install flaml[automl,azureml]
```

In [None]:
%pip install flaml[automl,azureml]

### Enable mlflow in AzureML workspace

In [None]:
import mlflow
from azureml.core import Workspace

ws = Workspace.from_config()
mlflow.set_tracking_uri(ws.get_mlflow_tracking_uri())

## 2. Classification Example
### Load data and preprocess

Download the [Airlines dataset](https://www.openml.org/d/1169) from OpenML. The task is to predict whether a given flight will be delayed, given information about its scheduled departure.

In [None]:
from flaml.automl.data import load_openml_dataset
X_train, X_test, y_train, y_test = load_openml_dataset(dataset_id=1169, data_dir='./')

### Run FLAML
In the FLAML automl run configuration, users can specify the task type, time budget, error metric, learner list, whether to subsample, the resampling strategy type, and so on. All of these arguments have default values, which are used when the user does not provide them. For example, the default ML learners of FLAML are `['lgbm', 'xgboost', 'catboost', 'rf', 'extra_tree', 'lrl1']`. 
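
The way unspecified arguments fall back to defaults can be sketched in plain Python. This is only an illustration of the keyword-argument mechanism, not FLAML's implementation: `fit_sketch` and the default values in its signature are hypothetical stand-ins, and the real defaults are defined by `AutoML.fit`.

```python
def fit_sketch(task="classification", time_budget=60, metric="auto", **other):
    """Hypothetical stand-in for a fit call.

    Returns the configuration it would run with, so the effect of
    defaults versus user-provided values is visible.
    """
    return {"task": task, "time_budget": time_budget, "metric": metric, **other}


# Only two arguments are provided; `task` falls back to its default.
user_settings = {"time_budget": 30, "metric": "accuracy"}

# `**user_settings` expands the dict into keyword arguments.
resolved = fit_sketch(**user_settings)
print(resolved)
```

Keeping the options in a dictionary and expanding it with `**` makes it easy to reuse or log the same configuration across runs.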

In [None]:
# Import the AutoML class from the flaml package
from flaml import AutoML

automl = AutoML()

In [None]:
settings = {
    "time_budget": 60,  # total running time in seconds
    # see the documentation for the available metrics:
    # https://microsoft.github.io/FLAML/docs/Use-Cases/Task-Oriented-AutoML#optimization-metric
    "metric": 'accuracy',
    "estimator_list": ['lgbm', 'rf', 'xgboost'],  # list of ML learners
    "task": 'classification',  # task type
    "sample": False,  # whether to subsample training data
    "log_file_name": 'airlines_experiment.log',  # flaml log file
}


In [None]:
experiment = mlflow.set_experiment("flaml")
with mlflow.start_run() as run:
    automl.fit(X_train=X_train, y_train=y_train, **settings)
    # log the model
    mlflow.sklearn.log_model(automl, "automl")


### Load the model

In [None]:
automl = mlflow.sklearn.load_model(f"{run.info.artifact_uri}/automl")
print(automl.predict_proba(X_test))
print(automl.predict(X_test))
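
Since the run above optimizes `accuracy`, a natural follow-up is to score the reloaded model on the held-out test set. The sketch below is a minimal pure-Python version of that metric; the `accuracy` helper and the toy labels are illustrative assumptions, not part of FLAML or mlflow.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    y_true, y_pred = list(y_true), list(y_pred)
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if not y_true:
        raise ValueError("cannot compute accuracy on empty inputs")
    matches = sum(t == p for t, p in zip(y_true, y_pred))
    return matches / len(y_true)


# Toy labels standing in for y_test and automl.predict(X_test).
toy_truth = [1, 0, 1, 1]
toy_pred = [1, 0, 0, 1]
toy_score = accuracy(toy_truth, toy_pred)
print(f"accuracy = {toy_score:.2f}")
```

In the notebook, the same helper could be called as `accuracy(y_test, automl.predict(X_test))`, assuming the predicted and true labels share the same type.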

### Retrieve logs

In [None]:
mlflow.search_runs(experiment_ids=[experiment.experiment_id], filter_string="params.learner = 'xgboost'")