In [1]:
%load_ext autoreload
%autoreload 2

# MLflow Regression Pipeline Notebook

This notebook runs the MLflow Regression Pipeline on Databricks and inspects its results. For more information about the MLflow Regression Pipeline, including usage examples, see the [Regression Pipeline overview documentation](https://mlflow.org/docs/latest/pipelines.html#regression-pipeline) and the [Regression Pipeline API documentation](https://mlflow.org/docs/latest/python_api/mlflow.pipelines.html#module-mlflow.pipelines.regression.v1.pipeline).

In [2]:
from mlflow.recipes import Recipe

r = Recipe(profile="local")

2022/11/02 15:31:41 INFO mlflow.recipes.recipe: Creating MLflow Recipe 'mlp-regression-example' with profile: 'local'


In [3]:
r.clean()

In [4]:
r.inspect()

In [5]:
r.run("ingest")

2022/11/02 15:32:12 INFO mlflow.recipes.step: Running step ingest...
Loading dataset CSV using `pandas.read_csv()` with default arguments and assumed index column 0 which may not produce the desired schema. If the schema is not correct, you can adjust it by modifying the `load_file_as_dataframe()` function in `steps/ingest.py`


name,type
sepal_length,number
sepal_width,number
petal_length,number
petal_width,number
species,string

sepal_length,sepal_width,petal_length,petal_width,species
5.1,3.5,1.4,0.2,Iris-setosa
4.9,3.0,1.4,0.2,Iris-setosa
4.7,3.2,1.3,0.2,Iris-setosa
4.6,3.1,1.5,0.2,Iris-setosa
5.0,3.6,1.4,0.2,Iris-setosa


In [6]:
r.run("split")

2022/11/02 15:32:17 INFO mlflow.recipes.utils.execution: ingest: No changes. Skipping.


2022/11/02 15:32:18 INFO mlflow.recipes.step: Running step split...


In [7]:
r.run("transform")

2022/11/02 15:32:20 INFO mlflow.recipes.utils.execution: ingest, split: No changes. Skipping.


2022/11/02 15:32:20 INFO mlflow.recipes.step: Running step transform...


Name,Type
sepal_length,float64
sepal_width,float64
petal_length,float64
petal_width,float64
species,object

Name,Type
std_scaler__sepal_length,float64
std_scaler__sepal_width,float64
std_scaler__petal_length,float64
std_scaler__petal_width,float64
species,object

std_scaler__sepal_length,std_scaler__sepal_width,std_scaler__petal_length,std_scaler__petal_width,species
-0.674094,0.9135,-1.103207,-1.108894,Iris-setosa
-1.004532,-0.124568,-1.103207,-1.108894,Iris-setosa
-0.839313,1.121114,-1.103207,-1.108894,Iris-setosa
-0.839313,0.705887,-1.034083,-1.108894,Iris-setosa
-1.830627,-0.332182,-1.103207,-1.108894,Iris-setosa
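
The `std_scaler__` prefix on the transformed columns suggests that the numeric features were standardized, meaning each value was replaced by its z-score. The sketch below shows that computation with the standard library only, using the population standard deviation as scikit-learn's `StandardScaler` does. It is an illustration rather than the recipe's actual transformer, and `standardize` is a hypothetical helper. The input values come from the ingest preview, so the results will not match the table above, where the scaler was fitted on the training split.

```python
from statistics import fmean, pstdev


def standardize(values):
    """Return z-scores: (x - mean) / population standard deviation."""
    mean = fmean(values)
    std = pstdev(values)
    if std == 0:
        # A constant column carries no spread; map every value to 0.
        return [0.0 for _ in values]
    return [(x - mean) / std for x in values]


sepal_length = [5.1, 4.9, 4.7, 4.6, 5.0]
scaled = standardize(sepal_length)
print([round(z, 3) for z in scaled])
```

After scaling, the values have mean 0 and unit standard deviation, which puts features measured on different scales on a comparable footing.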


In [8]:
r.run("train")

2022/11/02 15:32:21 INFO mlflow.recipes.utils.execution: ingest, split, transform: No changes. Skipping.


2022/11/02 15:32:22 INFO mlflow.recipes.step: Running step train...
2022/11/02 15:32:23 INFO mlflow.recipes.steps.train: Training data has less than 5000 rows, skipping rebalancing.
2022/11/02 15:32:29 INFO mlflow.models.evaluation.base: Evaluating the model with the default evaluator.
2022/11/02 15:32:29 INFO mlflow.models.evaluation.default_evaluator: The evaluation dataset is inferred as binary dataset, positive label is Iris-setosa, negative label is Iris-versicolor.
2022/11/02 15:32:30 INFO mlflow.models.evaluation.base: Evaluating the model with the default evaluator.
2022/11/02 15:32:30 INFO mlflow.models.evaluation.default_evaluator: The evaluation dataset is inferred as binary dataset, positive label is Iris-setosa, negative label is Iris-versicolor.


Metric,training,validation
f1_score,1.0,1.0
accuracy_score,1.0,1.0
example_count,75.0,14.0
false_negatives,0.0,0.0
false_positives,0.0,0.0
log_loss,8.20587e-06,0.00517063
precision_recall_auc,0.279811,0.398828
precision_score,1.0,1.0
recall_score,1.0,1.0
roc_auc,1.0,1.0

Name,Type
sepal_length,double
sepal_width,double
petal_length,double
petal_width,double

Name,Type
-,"Tensor('str', (-1,))"

absolute_error,prediction,species,sepal_length,sepal_width,petal_length,petal_width
False,Iris-versicolor,Iris-versicolor,5.7,2.8,4.1,1.3
False,Iris-setosa,Iris-setosa,5.0,3.4,1.6,0.4
False,Iris-setosa,Iris-setosa,5.2,3.4,1.4,0.2
False,Iris-setosa,Iris-setosa,4.7,3.2,1.6,0.2
False,Iris-setosa,Iris-setosa,5.4,3.4,1.5,0.4
False,Iris-setosa,Iris-setosa,5.2,4.1,1.5,0.1
False,Iris-setosa,Iris-setosa,5.5,4.2,1.4,0.2
False,Iris-setosa,Iris-setosa,4.9,3.1,1.5,0.1
False,Iris-setosa,Iris-setosa,5.0,3.2,1.2,0.2
False,Iris-setosa,Iris-setosa,5.5,3.5,1.3,0.2

Unnamed: 0,Latest,Best,2nd Best
Model Rank,1,1,1
f1_score,1,1,1
accuracy_score,1,1,1
false_negatives,0,0,0
false_positives,0,0,0
precision_score,1,1,1
recall_score,1,1,1
true_negatives,8,8,8
true_positives,6,6,6
Run Time,2022-11-02 15:32:23,2022-10-31 16:00:33,2022-10-31 15:53:01
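
The comparison table lists 6 true positives, 8 true negatives, and no false positives or false negatives, which accounts for all 14 validation examples. The headline scores follow directly from those counts. The sketch below recomputes them by hand; `classification_metrics` is a hypothetical helper written for this illustration, not part of MLflow.

```python
def classification_metrics(tp, tn, fp, fn):
    """Compute binary classification scores from confusion-matrix counts."""
    total = tp + tn + fp + fn
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    accuracy = (tp + tn) / total if total else 0.0
    return {
        "precision_score": precision,
        "recall_score": recall,
        "f1_score": f1,
        "accuracy_score": accuracy,
    }


# Counts taken from the comparison table above.
metrics = classification_metrics(tp=6, tn=8, fp=0, fn=0)
print(metrics)
```

With no misclassified examples, every score evaluates to 1.0, which agrees with the validation column reported by the train step.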


In [38]:
r.run("evaluate")

2022/10/31 16:09:43 INFO mlflow.pipelines.utils.execution: ingest, split, transform, train: No changes. Skipping.


2022/10/31 16:09:44 INFO mlflow.pipelines.step: Running step evaluate...
2022/10/31 16:09:45 INFO mlflow.models.evaluation.base: Evaluating the model with the default evaluator.
2022/10/31 16:09:45 INFO mlflow.models.evaluation.default_evaluator: Shap explainer _PatchedKernelExplainer is used.

  0%|          | 0/10 [00:00<?, ?it/s]
100%|██████████| 10/10 [00:00<00:00, 96.28it/s]
100%|██████████| 10/10 [00:00<00:00, 96.17it/s]
elementwise comparison failed; returning scalar instead, but in the future will perform elementwise comparison
elementwise comparison failed; returning scalar instead, but in the future will perform elementwise comparison
2022/10/31 16:09:48 INFO mlflow.models.evaluation.base: Evaluating the model with the default evaluator.


Metric,validation,test
root_mean_squared_error,3.92421,2.261969
weighted_mean_squared_error,17.6921,3.478776
example_count,959.0,990.0
max_error,52.7717,17.987546
mean_absolute_error,1.78055,1.567069
mean_absolute_percentage_error,0.158525,0.598125
mean_on_target,13.1517,12.15102
mean_squared_error,15.3994,5.116504
r2_score,0.870413,0.947016
score,0.870413,0.947016

metric,greater_is_better,value,threshold,validated
root_mean_squared_error,False,2.26197,10,✅
mean_absolute_error,False,1.56707,50,✅
weighted_mean_squared_error,False,3.47878,20,✅
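
The validation table compares each test-set metric against a configured threshold, taking into account whether a larger value is better. The sketch below reproduces that check for the three rows above. The inclusive comparison rule is an assumption made for illustration, and `validate_metric` is a hypothetical helper rather than an MLflow function. The last lines also confirm that the reported test RMSE is the square root of the reported test MSE.

```python
import math


def validate_metric(value, threshold, greater_is_better):
    """Assumed rule: pass when the value is on the good side of the threshold."""
    if greater_is_better:
        return value >= threshold
    return value <= threshold


# Test-set values and thresholds copied from the tables above.
test_metrics = {
    "root_mean_squared_error": 2.261969,
    "mean_absolute_error": 1.567069,
    "weighted_mean_squared_error": 3.478776,
}
thresholds = {
    "root_mean_squared_error": 10,
    "mean_absolute_error": 50,
    "weighted_mean_squared_error": 20,
}

# All three metrics are errors, so lower is better.
results = {
    name: validate_metric(test_metrics[name], thresholds[name], greater_is_better=False)
    for name in thresholds
}
print(results)

# Consistency check: RMSE should equal sqrt(MSE) on the test set.
test_mse = 5.116504
rmse_consistent = math.isclose(
    math.sqrt(test_mse), test_metrics["root_mean_squared_error"], rel_tol=1e-4
)
print(rmse_consistent)
```

Every metric falls well below its threshold, which matches the check marks in the `validated` column.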


In [39]:
r.run("register")

2022/10/31 16:09:48 INFO mlflow.pipelines.utils.execution: ingest, split, transform, train, evaluate: No changes. Skipping.


2022/10/31 16:09:49 INFO mlflow.pipelines.step: Running step register...
Registered model 'taxi_fare_regressor' already exists. Creating a new version of this model...
2022/10/31 16:09:49 INFO mlflow.tracking._model_registry.client: Waiting up to 300 seconds for model version to finish creation.                     Model name: taxi_fare_regressor, version 6
Created version '6' of model 'taxi_fare_regressor'.


In [None]:
r.inspect("train")

In [None]:
training_data = r.get_artifact("training_data")
training_data.describe()

In [None]:
trained_model = r.get_artifact("model")
print(trained_model)