# Cloud Workshop Azure Databricks
## 14. Azure Databricks and MLflow - Batch

## Setup

1. Use an existing cluster, or create one, with the following settings:
   * **Databricks Runtime Version:** Databricks Runtime 5.0 or above
   * **Python Version:** Python 3
1. Install the required libraries. If you are using Databricks Runtime 5.1 or above (but not Databricks Runtime for ML), you can instead run Cmd 6, which installs them from the notebook.
   1. Create the required libraries:
      * Source **PyPI** and enter `mlflow`.
      * Source **PyPI** and enter `scikit-learn==0.19.1`.
   1. Install the libraries into the cluster.
1. Attach this notebook to the cluster.

In [4]:
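```python
# Notebook-scoped install (Databricks Runtime 5.1+, not Runtime for ML)
dbutils.library.installPyPI("mlflow")
dbutils.library.installPyPI("scikit-learn", "0.19.1")
# Restart the Python process so the newly installed libraries can be imported
dbutils.library.restartPython()
```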
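
After the restart, you may want to confirm which library versions are actually active. The following is a minimal sketch, assuming Python 3.8 or later (needed for `importlib.metadata`). The helper name `installed_version` is hypothetical:

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(package):
    """Return the installed distribution version of `package`, or None if it is absent."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


# If the pinned install above succeeded, scikit-learn should report 0.19.1
for pkg in ("mlflow", "scikit-learn"):
    print(pkg, installed_version(pkg))
```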

## Review the experiment

1. Open the experiment `/Shared/experiments/Test2` in the workspace.
1. Click a date to view a run.

Choose the run ID of an ElasticNet training run logged during the Quick Start training and logging step. The run ID and the model path are shown on the run details page of the experiment:

![image](https://docs.databricks.com/_static/images/mlflow/mlflow-deployment-example-run-info.png)

## Load MLflow Model as a scikit-learn Model
You can use the MLflow API to load the model produced by a given run from the MLflow tracking server. To do so, specify the run's `run_id`.

Once loaded, it is just a scikit-learn model, which you can inspect or use for predictions.

### To use a model stored in MLflow, you need the `run_id` associated with it.
1. Go to the MLflow experiment `/Shared/experiments/Test2`.
2. Copy the `run_id` of the model you want to use and paste it into the following command.

In [9]:
```python
import mlflow.sklearn

# Replace this run ID with the one copied from the MLflow experiment page
model = mlflow.sklearn.load_model(path="model", run_id="96f31984d9454f02ba23fd4529bf2804")

# Inspect the fitted ElasticNet coefficients
model.coef_
```
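
The `path=`/`run_id=` form above matches the MLflow release this workshop was written for. In MLflow 1.0 and later, `mlflow.sklearn.load_model` instead takes a single model URI of the form `runs:/<run_id>/<artifact_path>`. The following is a minimal sketch, assuming MLflow 1.0+ and a model logged under the artifact path `model`. The helpers `runs_model_uri` and `load_sklearn_model` are hypothetical:

```python
def runs_model_uri(run_id, artifact_path="model"):
    """Build an MLflow run-relative model URI such as runs:/<run_id>/model."""
    run_id = run_id.strip()  # tolerate stray whitespace from copy/paste
    if not run_id:
        raise ValueError("run_id must not be empty")
    return "runs:/{}/{}".format(run_id, artifact_path.strip("/"))


def load_sklearn_model(run_id, artifact_path="model"):
    """Load a logged scikit-learn model (assumes MLflow 1.0+ is installed)."""
    import mlflow.sklearn  # imported lazily so the URI helper works without MLflow

    return mlflow.sklearn.load_model(runs_model_uri(run_id, artifact_path))


print(runs_model_uri("96f31984d9454f02ba23fd4529bf2804"))
# runs:/96f31984d9454f02ba23fd4529bf2804/model
```

In MLflow 1.0+, the same URI can also be passed as the `model_uri` argument of `mlflow.pyfunc.spark_udf(spark, model_uri)`.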

In [10]:
```python
from sklearn import datasets
import numpy as np
import pandas as pd

# Load the Diabetes dataset
diabetes = datasets.load_diabetes()
X = diabetes.data
y = diabetes.target

# Build a pandas DataFrame (features plus target) for the sklearn ElasticNet linear_model
Y = np.array([y]).transpose()
d = np.concatenate((X, Y), axis=1)
cols = ['age', 'sex', 'bmi', 'bp', 's1', 's2', 's3', 's4', 's5', 's6', 'progression']
data = pd.DataFrame(d, columns=cols)
```
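
The concatenate-then-DataFrame step above can also be written by building the DataFrame directly from the feature matrix and then adding the target as a column. The following is a minimal sketch, assuming a scikit-learn version whose `load_diabetes()` result exposes `feature_names` (recent versions return `age`, `sex`, `bmi`, `bp`, `s1` to `s6`):

```python
import pandas as pd
from sklearn import datasets

diabetes = datasets.load_diabetes()

# Named feature columns first, then the target as an extra column
data = pd.DataFrame(diabetes.data, columns=diabetes.feature_names)
data["progression"] = diabetes.target

print(data.shape)  # (442, 11)
```

This avoids the intermediate `np.concatenate` array, and the column names come from the dataset itself rather than from a hand-typed list.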

In [11]:
```python
# Predict the target for the first row, passing only the feature columns
model.predict(data[0:1].drop(["progression"], axis=1))
```
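
If the tracking server or the run is not available, you can try the same single-row prediction against a locally fitted model. The following is a minimal sketch that substitutes a freshly trained `ElasticNet` for the logged one. The hyperparameters `alpha=0.05` and `l1_ratio=0.05` are illustrative assumptions, not the values used in the Quick Start run:

```python
import pandas as pd
from sklearn import datasets
from sklearn.linear_model import ElasticNet

diabetes = datasets.load_diabetes()
cols = ['age', 'sex', 'bmi', 'bp', 's1', 's2', 's3', 's4', 's5', 's6']
features = pd.DataFrame(diabetes.data, columns=cols)

# Stand-in for the MLflow model: hyperparameters are illustrative only
model = ElasticNet(alpha=0.05, l1_ratio=0.05)
model.fit(features, diabetes.target)

# Slicing with [0:1] keeps a one-row 2-D DataFrame, which predict() expects
first_row = features[0:1]
print(first_row.shape)           # (1, 10)
print(model.predict(first_row))  # one predicted progression value
```

Using `[0:1]` rather than `.iloc[0]` matters because the latter returns a 1-D Series, which recent scikit-learn versions reject in `predict`.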

## Use an MLflow Model for Batch Inference
You can wrap one of the logged models in a PySpark UDF to run batch inference over a Spark DataFrame.

In [13]:
```python
# Create a Spark DataFrame from the pandas DataFrame, minus the column to predict.
# This simulates scoring a large, regularly updated data set (e.g. click logs).
dataframe = spark.createDataFrame(data.drop(["progression"], axis=1))
```

Use the **MLflow API** to create a PySpark **UDF** from a run. See [Export a python_function model as an Apache Spark UDF](https://mlflow.org/docs/latest/models.html#export-a-python-function-model-as-an-apache-spark-udf).

### Copy the model's run ID from the MLflow page and paste it into the following line

In [16]:
```python
import mlflow.pyfunc

# Wrap the model logged under the artifact path "model" in this run as a Spark UDF
pyfunc_udf = mlflow.pyfunc.spark_udf(spark, "model", run_id="96f31984d9454f02ba23fd4529bf2804")
```

Add a column to the data by applying the PySpark UDF to the DataFrame.

In [18]:
```python
predicted_df = dataframe.withColumn("prediction", pyfunc_udf('age', 'sex', 'bmi', 'bp', 's1', 's2', 's3', 's4', 's5', 's6'))
display(predicted_df)
```

| age | sex | bmi | bp | s1 | s2 | s3 | s4 | s5 | s6 | prediction |
|---|---|---|---|---|---|---|---|---|---|---|
| 0.0380759064334241 | 0.0506801187398187 | 0.0616962065186885 | 0.0218723549949558 | -0.0442234984244464 | -0.0348207628376986 | -0.0434008456520269 | -0.0025922619981828 | 0.0199084208763183 | -0.0176461251598052 | 199.95454016293704 |
| -0.001882016527791 | -0.044641636506989 | -0.0514740612388061 | -0.0263278347173518 | -0.0084487241112169 | -0.019163339748222 | 0.0744115640787594 | -0.0394933828740919 | -0.0683297436244215 | -0.09220404962683 | 67.53374220060797 |
| 0.0852989062966783 | 0.0506801187398187 | 0.0444512133365941 | -0.0056706105549342 | -0.0455994512826475 | -0.0341944659141195 | -0.0323559322397657 | -0.0025922619981828 | 0.0028637705189401 | -0.0259303389894746 | 167.07994136589085 |
| -0.0890629393522603 | -0.044641636506989 | -0.0115950145052127 | -0.0366564467985606 | 0.0121905687618 | 0.0249905933641021 | -0.0360375700438527 | 0.0343088588777263 | 0.0226920225667445 | -0.0093619113301358 | 173.65759671436638 |
| 0.005383060374248 | -0.044641636506989 | -0.0363846922044735 | 0.0218723549949558 | 0.0039348516125931 | 0.0155961395104161 | 0.0081420836051921 | -0.0025922619981828 | -0.0319914449413559 | -0.0466408735636482 | 126.43804053293005 |
| -0.0926954778032799 | -0.044641636506989 | -0.0406959404999971 | -0.0194420933298793 | -0.0689906498720667 | -0.0792878444118122 | 0.0412768238419757 | -0.076394503750001 | -0.0411803851880079 | -0.0963461565416647 | 110.17208957619022 |
| -0.0454724779400257 | 0.0506801187398187 | -0.0471628129432825 | -0.015999222636143 | -0.040095639849843 | -0.0248000120604336 | 0.000778807997017968 | -0.0394933828740919 | -0.0629129499162512 | -0.0383566597339788 | 86.87053008755379 |
| 0.063503675590561 | 0.0506801187398187 | -0.0018947058402846 | 0.0666296740135272 | 0.0906198816792644 | 0.108914381123697 | 0.0228686348215404 | 0.0177033544835672 | -0.0358167281015492 | 0.0030644094143683 | 122.21922463575306 |
| 0.0417084448844436 | 0.0506801187398187 | 0.0616962065186885 | -0.0400993174922969 | -0.0139525355440215 | 0.0062016856567301 | -0.0286742944356786 | -0.0025922619981828 | -0.0149564750249113 | 0.0113486232440377 | 160.24072898975876 |
| -0.0709002470971626 | -0.044641636506989 | 0.0390621529671896 | -0.0332135761048244 | -0.0125765826858204 | -0.034507614375909 | -0.0249926566315915 | -0.0025922619981828 | 0.0677363261102861 | -0.0135040182449705 | 219.1750832622509 |
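
Conceptually, the Spark UDF evaluates the model on batches of rows and returns one prediction per row. The following sketch mimics that locally with pandas. It scores the data in fixed-size chunks and adds a `prediction` column, as `withColumn` does above. A locally fitted `ElasticNet` with illustrative hyperparameters stands in for the MLflow model. The helper `score_in_batches` and the batch size of 100 are hypothetical:

```python
import pandas as pd
from sklearn import datasets
from sklearn.linear_model import ElasticNet

diabetes = datasets.load_diabetes()
cols = ['age', 'sex', 'bmi', 'bp', 's1', 's2', 's3', 's4', 's5', 's6']
features = pd.DataFrame(diabetes.data, columns=cols)

# Stand-in for the MLflow model (illustrative hyperparameters)
model = ElasticNet(alpha=0.05, l1_ratio=0.05).fit(features, diabetes.target)


def score_in_batches(model, df, batch_size=100):
    """Predict chunk by chunk and return a Series aligned to df's index."""
    parts = []
    for start in range(0, len(df), batch_size):
        batch = df.iloc[start:start + batch_size]
        parts.append(pd.Series(model.predict(batch), index=batch.index))
    return pd.concat(parts)


# Local equivalent of dataframe.withColumn("prediction", pyfunc_udf(...))
predicted = features.assign(prediction=score_in_batches(model, features))
print(predicted.head())
```

Each batch is scored independently, so the predictions do not depend on the batch size; only memory use does. This independence is what allows the scoring work to be split across partitions.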


> End