# MLflow in Jupyter

MLflow is an open-source framework for managing the full lifecycle of a machine learning project. The figure below summarizes the functionality it offers:

<img src="data/mlflow.png" width=500 height=500 />

Because this Jupyter environment is integrated with MLflow, this notebook shows some examples of how the two work together.

## Libraries

Load the libraries and print the basic settings.

In [3]:
import os
from pathlib import Path
import mlflow
from mlflow import log_metric, log_param, log_artifact

In [4]:
# Print the configured tracking URI, the artifact URI, and the ID of the active run.
# get_artifact_uri() starts a run if none is active, so active_run() is not None below.
print("MLflow Tracking: ", mlflow.tracking.get_tracking_uri())
print("MLflow Artifacts: ", mlflow.get_artifact_uri())
print("MLflow run-id: " + mlflow.active_run().info.run_id)

MLflow Tracking:  http://jupyterhubmlflow-mlflow:5000
MLflow Artifacts:  hdfs://jupyterhubmlflow-namenode:8020/mlflow-artifacts/0/7c2e8e5f96d94dd7bafdd95473c959ea/artifacts
MLflow run-id: 7c2e8e5f96d94dd7bafdd95473c959ea
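
The artifact URI printed above appears to follow MLflow's default layout, `<artifact root>/<experiment id>/<run id>/artifacts`. As a minimal sketch under that assumption, the hypothetical helper `split_artifact_uri` (not part of MLflow) recovers the experiment ID and run ID from such a URI using only the standard library:

```python
from urllib.parse import urlparse


def split_artifact_uri(artifact_uri):
    """Return (experiment_id, run_id) from a default-layout artifact URI.

    Assumes the path ends in <experiment_id>/<run_id>/artifacts; a custom
    artifact location may not follow this layout.
    """
    parts = urlparse(artifact_uri).path.rstrip("/").split("/")
    if len(parts) < 3 or parts[-1] != "artifacts":
        raise ValueError(f"unexpected artifact URI layout: {artifact_uri}")
    return parts[-3], parts[-2]


# The URI printed by the cell above.
uri = ("hdfs://jupyterhubmlflow-namenode:8020/mlflow-artifacts/0/"
       "7c2e8e5f96d94dd7bafdd95473c959ea/artifacts")
experiment_id, run_id = split_artifact_uri(uri)
print(experiment_id, run_id)
```

Inside a notebook the same values are available directly from `mlflow.active_run().info`, so a helper like this is mainly useful when only the URI is at hand.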


## Basic Tracking

The following example shows how to log the tags, metrics, and parameters of an ML experiment to MLflow:

In [None]:
# Set experiment name
mlflow.set_experiment("Experiment1")

# End any run that is still active
mlflow.end_run()

with mlflow.start_run():

    # Set a tag on the run
    mlflow.set_tag("Experiment", "test")

    # Log metrics for the run (placeholder values)
    mlflow.log_metrics({"Score": 1, "Recall": 2, "Precision": 3, "F": 4})
    mlflow.log_metrics({"Area Under ROC": 1, "Area Under PR": 2})
    mlflow.log_metrics({"r2": 1, "rmse": 2, "mse": 2, "mae": 3})

    # Log a parameter for the run
    mlflow.log_param('param1', 1)

Now open the MLflow tracking UI and verify that Experiment1 has been created.
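
The metric values logged above are placeholders. As a sketch of how real values could be produced, the hypothetical helper below derives precision, recall and F1 from confusion-matrix counts and returns them as a dictionary in the shape that `mlflow.log_metrics` accepts. The function names and the example counts are assumptions made for illustration, not part of the original notebook.

```python
def classification_metrics(tp, fp, fn):
    """Compute precision, recall and F1 from confusion-matrix counts."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {"Precision": precision, "Recall": recall, "F1": f1}


def log_classification_metrics(tp, fp, fn):
    """Log the derived metrics to the active MLflow run."""
    import mlflow  # imported lazily so the helper above works without MLflow

    metrics = classification_metrics(tp, fp, fn)
    mlflow.log_metrics(metrics)
    return metrics


# Illustrative counts, not results from a real model.
metrics = classification_metrics(tp=6, fp=2, fn=6)
print(metrics)
```

Within a `with mlflow.start_run():` block, calling `log_classification_metrics(tp, fp, fn)` would then replace the hard-coded dictionaries.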

## Model tracking

In [None]:
# Set experiment name
mlflow.set_experiment("Experiment2")

# End any run that is still active
mlflow.end_run()

# Run the experiment
# featureModel, sample and bestModel are assumed to be already defined in the session
with mlflow.start_run():

    # Log the MLeap model
    mlflow.mleap.log_model(spark_model=featureModel, sample_input=sample, artifact_path="mleappath")

    # Log the Spark model
    mlflow.spark.log_model(bestModel, "spark-model")

    # Log the Flask model (flavor and arguments not specified)
    # mlflow.???.log_model(???)
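
Once a model has been logged, it can be referenced with a `runs:/<run_id>/<artifact_path>` URI. The sketch below builds such a URI with a hypothetical helper, `run_model_uri`, and shows, in a function that is not called here, how it might be passed to `mlflow.spark.load_model`. The run ID in the example is a made-up placeholder, and `load_spark_model` is likewise an illustrative name.

```python
def run_model_uri(run_id, artifact_path):
    """Build a runs:/ URI for an artifact logged under the given run."""
    if not run_id or not artifact_path:
        raise ValueError("run_id and artifact_path must be non-empty")
    return f"runs:/{run_id}/{artifact_path.strip('/')}"


def load_spark_model(run_id, artifact_path="spark-model"):
    """Load a Spark model that was logged with mlflow.spark.log_model."""
    import mlflow.spark  # imported lazily; needs MLflow and PySpark installed

    return mlflow.spark.load_model(run_model_uri(run_id, artifact_path))


# Hypothetical run ID, for illustration only.
example_uri = run_model_uri("0123456789abcdef0123456789abcdef", "spark-model")
print(example_uri)
```

The artifact path must match the one passed to `log_model`, which is `"spark-model"` in the cell above.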

## Getting parameters from Spark for MLflow tracking

In [None]:
# Set experiment name
mlflow.set_experiment("Experiment3")

# Run the experiment
with mlflow.start_run():
    
    # df and crossval are assumed to be already defined in the session
    # Split the data 90/10 (randomSplit normalizes the weights) and fit the cross-validator
    (dfTraining, dfValidation) = df.randomSplit([90.0, 10.0])
    crossvalPredictionModel = crossval.fit(dfTraining)

    # Get the best model, score the validation set, and pick the random forest stage
    bestModel = crossvalPredictionModel.bestModel
    bestModelPredictions = bestModel.transform(dfValidation)
    rfModel = bestModel.stages[2]

    # Log every parameter of the random forest model
    for param, value in rfModel.extractParamMap().items():
        mlflow.log_param(param.name, value)
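
Logging parameters one call at a time works, but the whole parameter map can also be collected into a dictionary first and sent with a single `mlflow.log_params` call. The sketch below assumes each key of the map exposes a `name` attribute, as PySpark `Param` objects do. `param_map_to_dict`, `log_param_map` and the `FakeParam` stand-in are hypothetical names introduced here so that the example runs without Spark.

```python
from collections import namedtuple


def param_map_to_dict(param_map):
    """Convert a {Param: value} map into {name: str(value)} for logging."""
    return {param.name: str(value) for param, value in param_map.items()}


def log_param_map(param_map):
    """Log every parameter in the map to the active MLflow run in one call."""
    import mlflow  # imported lazily so the conversion above works without MLflow

    mlflow.log_params(param_map_to_dict(param_map))


# Stand-in for pyspark.ml.param.Param; only the `name` attribute is used.
FakeParam = namedtuple("FakeParam", ["name"])
example_map = {FakeParam("numTrees"): 20, FakeParam("maxDepth"): 5}
print(param_map_to_dict(example_map))
```

With the model from the cell above, this could be called as `log_param_map(rfModel.extractParamMap())` inside the `with mlflow.start_run():` block.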