
<div style="text-align: center; line-height: 0; padding-top: 9px;">
  <img src="https://databricks.com/wp-content/uploads/2018/03/db-academy-rgb-1200px.png" alt="Databricks Learning">
</div>



# LAB - AutoML

Welcome to the AutoML Lab! In this lab, you will explore the capabilities of AutoML using the Databricks AutoML UI and the AutoML API.


**Lab Outline:**

In this lab, you will complete the following tasks:

* **Task 1:** Load the dataset.

* **Task 2:** Create a classification experiment using the AutoML UI.

* **Task 3:** Create a classification experiment with the AutoML API.

* **Task 4:** Retrieve the best run and show the model URI.

* **Task 5:** Import the notebook for a run.



## Requirements

Please review the following requirements before starting the lab:

* To run this notebook, you need to use one of the following Databricks runtimes: **13.3.x-cpu-ml-scala2.12**, **13.3.x-scala2.12**


## Classroom Setup

Before starting the lab, run the provided classroom setup script. This script defines the configuration variables needed for the lab. Execute the following cell:

In [0]:
%run ../Includes/Classroom-Setup-03.LAB

Note: you may need to restart the kernel using dbutils.library.restartPython() to use updated packages.


Resetting the learning environment:
| dropping the catalog "labuser8100238_1734917322_mvxq_da"...(1 seconds)

Skipping install of existing datasets to "dbfs:/mnt/dbacademy-datasets/machine-learning-model-development/v01"

Validating the locally installed datasets:
| listing local files...(0 seconds)
| validation completed...(0 seconds total)
Creating & using the catalog "labuser8100238_1734917322_mvxq_da"...(2 seconds)

Predefined tables in "labuser8100238_1734917322_mvxq_da.default":
| bank_loan

Predefined paths variables:
| DA.paths.working_dir: dbfs:/mnt/dbacademy-users/labuser8100238_1734917322@vocareum.com/machine-learning-model-development
| DA.paths.datasets:    dbfs:/mnt/dbacademy-datasets/machine-learning-model-development/v01

Setup completed (36 seconds)


**Other Conventions:**

Throughout this lab, we'll refer to the object `DA`. This object, provided by Databricks Academy, contains variables such as your username, catalog name, schema name, working directory, and dataset locations. Run the code block below to view these details:

In [0]:
print(f"Username:          {DA.username}")
print(f"Catalog Name:      {DA.catalog_name}")
print(f"Schema Name:       {DA.schema_name}")
print(f"Working Directory: {DA.paths.working_dir}")
print(f"User DB Location:  {DA.paths.datasets}")

Username:          labuser8100238_1734917322@vocareum.com
Catalog Name:      labuser8100238_1734917322_mvxq_da
Schema Name:       default
Working Directory: dbfs:/mnt/dbacademy-users/labuser8100238_1734917322@vocareum.com/machine-learning-model-development
User DB Location:  dbfs:/mnt/dbacademy-datasets/machine-learning-model-development/v01


## Task 1: Load the Dataset

Load the dataset that will be used for the AutoML experiment.
+ Load the table named `bank_loan`.
+ Display the dataset.

In [0]:
# Load the bank_loan table into a Spark DataFrame and display it
loan_data = spark.sql("SELECT * FROM bank_loan")
display(loan_data)

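| ID | Age | Experience | Income | ZIP_Code | Family | CCAvg | Education | Mortgage | Personal_Loan | Securities_Account | CD_Account | Online | CreditCard |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | 25 | 1 | 49 | 91107 | 4 | 1.6 | 1 | 0 | 0 | 1 | 0 | 0 | 0 |
| 2 | 45 | 19 | 34 | 90089 | 3 | 1.5 | 1 | 0 | 0 | 1 | 0 | 0 | 0 |
| 3 | 39 | 15 | 11 | 94720 | 1 | 1.0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 |
| 4 | 35 | 9 | 100 | 94112 | 1 | 2.7 | 2 | 0 | 0 | 0 | 0 | 0 | 0 |
| 5 | 35 | 8 | 45 | 91330 | 4 | 1.0 | 2 | 0 | 0 | 0 | 0 | 0 | 1 |
| 6 | 37 | 13 | 29 | 92121 | 4 | 0.4 | 2 | 155 | 0 | 0 | 0 | 1 | 0 |
| 7 | 53 | 27 | 72 | 91711 | 2 | 1.5 | 2 | 0 | 0 | 0 | 0 | 1 | 0 |
| 8 | 50 | 24 | 22 | 93943 | 1 | 0.3 | 3 | 0 | 0 | 0 | 0 | 0 | 1 |
| 9 | 35 | 10 | 81 | 90089 | 3 | 0.6 | 2 | 104 | 0 | 0 | 0 | 1 | 0 |
| 10 | 34 | 9 | 180 | 93023 | 1 | 8.9 | 3 | 0 | 1 | 0 | 0 | 0 | 0 |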
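Before configuring AutoML, it can help to check how imbalanced the target column is, since that affects how you read the classification metrics later. The sketch below is plain Python and uses only the `Personal_Loan` values from the 10 preview rows above, so it says nothing reliable about the full table; on the cluster, something like `loan_data.groupBy("Personal_Loan").count()` would give the real counts.

```python
from collections import Counter

# Personal_Loan values from the 10 preview rows above (IDs 1-10).
# This is only the displayed preview, not the full bank_loan table.
preview_targets = [0, 0, 0, 0, 0, 0, 0, 0, 0, 1]

def class_balance(labels):
    """Return the fraction of rows in each class, keyed by label."""
    counts = Counter(labels)
    total = len(labels)
    return {label: count / total for label, count in sorted(counts.items())}

balance = class_balance(preview_targets)
print(balance)  # {0: 0.9, 1: 0.1}
```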


## Task 2: Create Classification Experiment Using AutoML UI

Follow these steps to create an AutoML experiment using the AutoML UI:

  ***Step 1.*** Navigate to the **Experiments** section.

  ***Step 2.*** Click on **Create AutoML Experiment** in the top-right corner.

  ***Step 3.*** Choose a cluster for experiment execution.

  ***Step 4.*** For the ML problem type, select **`Classification`**.

  ***Step 5.*** Select the input training dataset as **`catalog > schema > bank_loan`**.

  ***Step 6.*** Specify **`Personal_Loan`** as the prediction target.

  ***Step 7.*** Deselect the **`ID`** and **`ZIP_Code`** fields, as they are not needed as features.

  ***Step 8.*** In the **Advanced Configuration** section, set the **Timeout** to **5 minutes**.

  ***Step 9.*** Enter a name for your experiment, like `Bank_Loan_Prediction_AutoML_Experiment`.


## Task 3: Create a Classification Experiment with the AutoML API

Utilize the AutoML API to set up and run a classification experiment. Follow these steps:

1. **Setting up the Experiment:**
   - **Specify the Dataset:** Specify the dataset using the Spark table name, which is `bank_loan`.
   - **Set Target Column:** Set `target_col` to the column you want to predict, which is `Personal_Loan`.
   - **Adjust Exclude Columns:** After reviewing the displayed dataset, provide a list of columns to exclude from modeling (for example, `ID` and `ZIP_Code`).
   - **Set Timeout Duration:** Set `timeout_minutes` for the AutoML experiment, for example `5` minutes.

2. **Running AutoML:**
   - Use the AutoML API to explore various machine learning models.
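The setup steps above boil down to a handful of keyword arguments. As a minimal sketch, assuming the column list shown in the Task 1 preview, the hypothetical helper `build_automl_args` below checks those settings before you launch a run (unknown columns, a target that is also excluded, a non-positive timeout) and returns the remaining candidate feature columns. It does not call Databricks; the checks are illustrative, not the validation AutoML itself performs.

```python
# Columns of bank_loan, as shown in the Task 1 preview
BANK_LOAN_COLUMNS = [
    "ID", "Age", "Experience", "Income", "ZIP_Code", "Family", "CCAvg",
    "Education", "Mortgage", "Personal_Loan", "Securities_Account",
    "CD_Account", "Online", "CreditCard",
]

def build_automl_args(columns, target_col, exclude_cols, timeout_minutes):
    """Hypothetical helper: validate the settings, then return keyword
    arguments for automl.classify() plus the candidate feature columns."""
    missing = [c for c in [target_col, *exclude_cols] if c not in columns]
    if missing:
        raise ValueError(f"Unknown columns: {missing}")
    if target_col in exclude_cols:
        raise ValueError("The target column cannot also be excluded")
    if timeout_minutes <= 0:
        raise ValueError("timeout_minutes must be positive")
    # Everything that is neither the target nor excluded is a candidate feature
    features = [c for c in columns if c != target_col and c not in exclude_cols]
    args = {
        "target_col": target_col,
        "exclude_cols": list(exclude_cols),
        "timeout_minutes": timeout_minutes,
    }
    return args, features

args, features = build_automl_args(
    BANK_LOAN_COLUMNS, "Personal_Loan", ["ID", "ZIP_Code"], timeout_minutes=5
)
print(args)
print(len(features))  # 11
```

On the cluster, the returned dictionary could then be unpacked into the real call, e.g. `automl.classify(dataset=spark.table("bank_loan"), **args)`.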



In [0]:
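from databricks import automl
from datetime import datetime  # used in Task 5 to timestamp the notebook path

# Launch an AutoML classification experiment on the bank_loan table
summary = automl.classify(
    dataset=spark.table("bank_loan"),
    target_col="Personal_Loan",       # column to predict
    exclude_cols=["ID", "ZIP_Code"],  # columns not needed as features
    timeout_minutes=5                 # maximum time to wait for trials to complete
)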

2024/12/23 01:36:30 INFO databricks.automl.client.manager: AutoML will optimize for F1 score metric, which is tracked as val_f1_score in the MLflow experiment.
2024/12/23 01:36:31 INFO databricks.automl.client.manager: MLflow Experiment ID: 1648172618107916
2024/12/23 01:36:31 INFO databricks.automl.client.manager: MLflow Experiment: https://dbc-d9456cf0-b376.cloud.databricks.com/?o=1569305309123318#mlflow/experiments/1648172618107916
2024/12/23 01:38:27 INFO databricks.automl.client.manager: Data exploration notebook: https://dbc-d9456cf0-b376.cloud.databricks.com/?o=1569305309123318#notebook/1648172618107934
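The log shows that AutoML ranks trials by F1 score, tracked as `val_f1_score`. As a reminder of what that metric measures, here is a minimal sketch computing precision, recall, and F1 from confusion-matrix counts for the positive class of a binary classifier. The counts are made up for illustration and are not results from this experiment.

```python
def f1_from_counts(tp, fp, fn):
    """Return (precision, recall, f1) for the positive class."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    # F1 is the harmonic mean of precision and recall
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1

# Illustrative counts only (not from this experiment)
p, r, f1 = f1_from_counts(tp=8, fp=2, fn=4)
print(round(p, 4), round(r, 4), round(f1, 4))  # 0.8 0.6667 0.7273
```

Equivalently, F1 = 2·TP / (2·TP + FP + FN), which is 16/22 for these counts.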


## Task 4: Retrieve the best run and show the model URI

Identify the best model generated by AutoML based on a chosen metric. Retrieve information about the best run, including the model URI, to further explore and analyze the model.
 + Find the experiment ID associated with your AutoML experiment.
 + Define a filter string to select runs. Adjust it based on the desired run status, such as `FINISHED` or `RUNNING`.
 + Specify the run view type to view only active runs or all runs.
 + Provide the metric to use for ordering, and specify whether to order the runs in descending or ascending order.

In [0]:
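import mlflow
from mlflow.entities import ViewType

# Search the AutoML experiment for finished, active runs, best validation F1 first
automl_runs_pd = mlflow.search_runs(
  experiment_ids=[summary.experiment.experiment_id],
  filter_string="attributes.status = 'FINISHED'",
  run_view_type=ViewType.ACTIVE_ONLY,
  order_by=["metrics.val_f1_score DESC"]
)

# The first row is the best run; build its model URI
# (assumes the model was logged under the artifact path "model")
best_run_id = automl_runs_pd.iloc[0]["run_id"]
model_uri = f"runs:/{best_run_id}/model"
print(model_uri)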

In [0]:
# Print information about the best trial
print(summary.best_trial)

# Show the location of the best trial's logged MLflow model
print(summary.best_trial.model_path)
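To make the filtering and ordering concrete, the plain-Python sketch below mimics what `filter_string="attributes.status = 'FINISHED'"` and `order_by=["metrics.val_f1_score DESC"]` do, then builds an MLflow `runs:/<run_id>/<artifact_path>` URI for the top run. The run records are hypothetical, shaped loosely like rows of the DataFrame returned by `mlflow.search_runs()`, and the default artifact path `"model"` is an assumption about how the model was logged.

```python
# Hypothetical run records (not from this experiment)
runs = [
    {"run_id": "run_a", "status": "FINISHED", "metrics.val_f1_score": 0.81},
    {"run_id": "run_b", "status": "FAILED", "metrics.val_f1_score": None},
    {"run_id": "run_c", "status": "FINISHED", "metrics.val_f1_score": 0.87},
    {"run_id": "run_d", "status": "RUNNING", "metrics.val_f1_score": None},
]

def best_model_uri(runs, metric="metrics.val_f1_score", artifact_path="model"):
    """Keep finished runs that have the metric, pick the highest value,
    and return a runs:/ URI pointing at that run's model artifact."""
    finished = [r for r in runs if r["status"] == "FINISHED" and r[metric] is not None]
    if not finished:
        raise ValueError("No finished runs with the requested metric")
    best = max(finished, key=lambda r: r[metric])
    return f"runs:/{best['run_id']}/{artifact_path}"

print(best_model_uri(runs))  # runs:/run_c/model
```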


## Task 5: Import Notebook for a Run

AutoML automatically generates the best run's notebook and makes it available to you. To access other runs' notebooks, you need to import them.

In this task, you will import the **5th run's notebook** to the **`destination_path`**. 

Show the `url` and `path` of the imported notebook.

In [0]:
destination_path = f"/Users/{DA.username}/imported_notebooks/lab.3-{datetime.now().strftime('%Y%m%d%H%M%S')}"

# Import the 5th trial's notebook (trials is zero-based, so index 4),
# then print the workspace path and URL of the imported notebook
result = automl.import_notebook(summary.trials[4].artifact_uri, destination_path)
print(result.path)
print(result.url)
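Two details in this task are easy to get wrong: `summary.trials` is a zero-based Python list, so the "5th run" is index 4, and a 5-minute timeout may produce fewer trials than you expect. The sketch below uses a hypothetical `pick_trial` helper with stand-in trial values to select a trial by its 1-based position with a clear error, plus a hypothetical `destination_for` helper that rebuilds the timestamped path pattern used in the cell above.

```python
from datetime import datetime

def pick_trial(trials, ordinal):
    """Return the trial at a 1-based position ("5th run" -> trials[4]),
    raising a clear error if fewer trials are available."""
    if ordinal < 1:
        raise ValueError("ordinal is 1-based and must be >= 1")
    if ordinal > len(trials):
        raise IndexError(f"Only {len(trials)} trials available; cannot pick #{ordinal}")
    return trials[ordinal - 1]

def destination_for(username, now=None):
    """Timestamped workspace path following the pattern in the cell above."""
    now = now or datetime.now()
    return f"/Users/{username}/imported_notebooks/lab.3-{now.strftime('%Y%m%d%H%M%S')}"

# Hypothetical stand-ins for summary.trials
trials = ["trial_1", "trial_2", "trial_3", "trial_4", "trial_5", "trial_6"]
print(pick_trial(trials, 5))  # trial_5
print(destination_for("someone@example.com", datetime(2024, 12, 23, 1, 40, 0)))
# /Users/someone@example.com/imported_notebooks/lab.3-20241223014000
```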


## Clean up Classroom

Run the following cell to remove lesson-specific assets created during this lesson.

In [0]:
DA.cleanup()


## Conclusion

In this lab, you got hands-on with Databricks AutoML. You loaded a dataset and created a classification experiment using both the AutoML UI and the AutoML API. You then searched the experiment's runs with specific filters and ordering to identify the best model and retrieve its model URI, and imported the notebook for another trial.


&copy; 2024 Databricks, Inc. All rights reserved.<br/>
Apache, Apache Spark, Spark and the Spark logo are trademarks of the 
<a href="https://www.apache.org/">Apache Software Foundation</a>.<br/>
<br/><a href="https://databricks.com/privacy-policy">Privacy Policy</a> | 
<a href="https://databricks.com/terms-of-use">Terms of Use</a> | 
<a href="https://help.databricks.com/">Support</a>