# Leveraging MLflow with SASCTL and Model Manager for SKLearn
[MLflow](https://mlflow.org/) is an open-source platform used to manage the ML lifecycle, including experimentation, reproducibility, deployment, and a central model registry. 

While MLflow and Model Manager overlap in functionality, there are places where MLflow can strengthen Model Manager. For example, by leveraging MLflow, Model Manager can better support various complex model architectures. We will continue to extend the SASCTL integrations with MLflow; currently, we support models developed with sklearn, statsmodels, scipy, and numpy.

In this notebook, we will push a model generated in MLflow into the Model Manager registry.
***
## Getting Started
To import MLflow models into SAS Model Manager, the MLflow script needs a few extra lines. First, add the infer_signature function to the import statements. Later, the signature must be inferred after any parameter logging is defined, and then passed as the signature argument when the model is logged.


In [None]:
from mlflow.models.signature import infer_signature

Next, rename any data columns whose names are not valid Python variable names.

In [None]:
import pandas as pd
data = pd.read_csv('./data/hmeq.csv')
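# Replace non-word characters and prefix leading digits with an underscore
# (raw string avoids invalid-escape warnings for \W and \d)
data.columns = data.columns.str.replace(r'\W|^(?=\d)', '_', regex=True)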
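
To make the renaming rule concrete, the sketch below applies the same regular expression to a few made-up column names using only the standard library. The sample names are hypothetical and do not come from hmeq.csv, and `sanitize` is an illustrative helper rather than part of pandas or SASCTL.

```python
import re

# Same pattern as in the cell above: replace every non-word character with an
# underscore, and insert an underscore in front of a leading digit.
INVALID = re.compile(r'\W|^(?=\d)')

def sanitize(name):
    """Apply the renaming rule to a single column name."""
    return INVALID.sub('_', name)

# Hypothetical column names, for illustration only
examples = ['DEBT-INC', 'loan amount', '2ndLien', 'VALUE']
cleaned = [sanitize(name) for name in examples]

for before, after in zip(examples, cleaned):
    print(f'{before!r} -> {after!r} (valid identifier: {after.isidentifier()})')
```

Names that are already valid, such as the hmeq column names used later in this notebook, pass through unchanged.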

***
## Building a Model
Next, let's build a logistic regression. First, we will prepare our data. 

In [None]:
# Impute missing values 
data = data.fillna(value={'MORTDUE': 65019, 'VALUE': 89235, 'YOJ': 7, 'DEROG': 0, 'DELINQ': 0, 'CLAGE': 173, 'NINQ': 1, 'CLNO': 20, 'DEBTINC': 35})

# One-hot-encode job
one_hot_job = pd.get_dummies(data["JOB"], prefix = "JOB", drop_first=True)
data = data.join(one_hot_job)
data = data.drop('JOB', axis = 1)

# One-hot-encode reason
one_hot_reason = pd.get_dummies(data["REASON"], prefix = "REASON", drop_first=True)
data = data.join(one_hot_reason)
data = data.drop('REASON', axis = 1)

# Separate target 
y = data.pop('BAD').values
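
As a small illustration of what the one-hot-encoding step produces, the sketch below runs `pd.get_dummies` with `drop_first=True` on a tiny made-up series. The values are hypothetical stand-ins for the JOB column. It also shows a detail worth keeping in mind: a missing category is encoded as all zeros, which makes it indistinguishable from the dropped baseline category.

```python
import pandas as pd

# Hypothetical job values, including one missing entry
jobs = pd.Series(["Mgr", "Office", None, "Other", "Mgr"], name="JOB")

# drop_first=True removes the first category ("Mgr") to avoid redundant columns;
# astype(int) gives 0/1 values regardless of the pandas version's default dtype
dummies = pd.get_dummies(jobs, prefix="JOB", drop_first=True).astype(int)

encoded_columns = list(dummies.columns)
rows = dummies.values.tolist()

print(encoded_columns)
print(rows)
```

If missing categories should be kept distinct from the baseline, one option is passing `dummy_na=True` to `pd.get_dummies`, at the cost of an extra input column for the model.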

Next, we will build our SKLearn model. 

In [None]:
from sklearn.linear_model import LogisticRegression
model = LogisticRegression().fit(data, y)

Now, let’s generate our signature. For this simple example, we assume that the model will not encounter missing values at scoring time, so we suppress MLflow’s warning about missing values. Note that the filter below silences all Python warnings, not only MLflow’s.

In [None]:
import warnings
warnings.filterwarnings("ignore")

signature = infer_signature(data, model.predict(data))

Finally, let’s log our MLflow model and include our signature. 

In [None]:
import mlflow
import os
os.chdir("./data/MLFlowModels/")
    
score = model.score(data, y)

print("Score: %s" % score)
mlflow.log_metric("score", score)

mlflow.sklearn.log_model(model, "model", signature=signature)
# run_id is the current name for the deprecated run_uuid attribute
print("Model saved in run %s" % mlflow.active_run().info.run_id)

os.chdir("../../")
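
The cell above changes the working directory and changes it back at the end. If logging fails partway through, the notebook stays in the wrong directory. One possible way to guard against that is a small context manager, sketched below with the standard library only. The name `working_directory` is hypothetical, and the demonstration uses a temporary folder in place of ./data/MLFlowModels/.

```python
import os
import tempfile
from contextlib import contextmanager
from pathlib import Path

@contextmanager
def working_directory(path):
    """Temporarily switch the working directory, restoring it afterwards."""
    previous = Path.cwd()
    os.chdir(path)
    try:
        yield Path.cwd()
    finally:
        # Runs even if the body raises, so the original directory is restored
        os.chdir(previous)

# Demonstration with a throwaway directory and a simulated failure
start = Path.cwd()
with tempfile.TemporaryDirectory() as tmp:
    tmp_resolved = Path(tmp).resolve()
    try:
        with working_directory(tmp):
            inside = Path.cwd().resolve()
            raise RuntimeError("simulated logging failure")
    except RuntimeError:
        pass
    restored = Path.cwd()

print(restored == start)
```

With such a helper, the MLflow logging calls could sit inside `with working_directory("./data/MLFlowModels/"):` instead of between two `os.chdir` calls.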

## Register Model
Now, let’s use SASCTL to register our MLflow SKLearn model. First, let’s import the necessary packages. 

In [None]:
# Pathing support
from pathlib import Path

# sasctl interface for importing models
import sasctl.pzmm as pzmm 
from sasctl import Session

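Next, point SASCTL to the MLflow model files. The path below assumes that the run was logged to the default experiment (ID 0) in the local mlruns folder.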

In [None]:
mlPath = Path(f'./data/MLFlowModels/mlruns/0/{mlflow.active_run().info.run_id}/artifacts/model')
varDict, inputsDict, outputsDict = pzmm.MLFlowModel.read_mlflow_model_file(m_path=mlPath)
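
Because the model path is assembled by hand, a typo or a different experiment ID only surfaces later as a read error. The sketch below shows one way to build the path and check it up front. It assumes the local file store layout used in the cell above (mlruns/<experiment_id>/<run_id>/artifacts/model) and relies on the MLmodel file that MLflow writes into a logged model folder. `local_model_dir` is a hypothetical helper, and the layout may differ for other MLflow versions or tracking backends.

```python
import tempfile
from pathlib import Path

def local_model_dir(tracking_root, experiment_id, run_id, artifact_path="model"):
    """Build the model folder path for a local mlruns layout and
    confirm that it contains an MLmodel file."""
    model_dir = Path(tracking_root) / str(experiment_id) / run_id / "artifacts" / artifact_path
    if not (model_dir / "MLmodel").is_file():
        raise FileNotFoundError(f"No MLmodel file found in {model_dir}")
    return model_dir

# Demonstration against a fake directory tree (run ID "abc123" is made up)
with tempfile.TemporaryDirectory() as tmp:
    fake = Path(tmp) / "mlruns" / "0" / "abc123" / "artifacts" / "model"
    fake.mkdir(parents=True)
    (fake / "MLmodel").write_text("flavors: {}\n")

    found = local_model_dir(Path(tmp) / "mlruns", 0, "abc123")
    found_parts = found.parts[-5:]

    try:
        local_model_dir(Path(tmp) / "mlruns", 1, "abc123")
        missing_detected = False
    except FileNotFoundError:
        missing_detected = True

print(found_parts, missing_detected)
```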

Next, let’s create a folder for our SASCTL assets and pickle our model. 

In [None]:
modelPrefix = 'MLFlowDemo'
zipFolder = Path.cwd() / f'data/MLFlowModels/{modelPrefix}'
pzmm.PickleModel.pickle_trained_model(model_prefix=modelPrefix, pickle_path=zipFolder, mlflow_details=varDict)

We can leverage the information from MLflow to generate metadata files for SASCTL. 

In [None]:
J = pzmm.JSONFiles()
J.write_var_json(inputsDict, isInput=True, jPath=zipFolder)
J.write_var_json(outputsDict, isInput=False, jPath=zipFolder)

In [None]:
# Write model properties to a json file
J.write_model_properties_json(modelName=modelPrefix,
                            modelDesc='MLFlow Model ',
                            targetVariable='BAD',
                            modelType='Logistic Regression',
                            modelPredictors='',
                            targetEvent=1,
                            numTargetCategories=1,
                            eventProbVar='tensor',
                            jPath=zipFolder,
                            modeler='sasdemo')

# Write model metadata to a json file
J.write_file_metadata_json(modelPrefix, jPath=zipFolder)
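
Before uploading, it can help to glance at what was actually written to the asset folder. The sketch below is a generic, standard-library way to summarize every JSON file in a folder. It makes no assumptions about the file names SASCTL uses, `summarize_json_assets` is a hypothetical helper, and the demonstration writes two made-up files to a temporary folder.

```python
import json
import tempfile
from pathlib import Path

def summarize_json_assets(folder):
    """Map each *.json file name in `folder` to a short description of its content."""
    summary = {}
    for path in sorted(Path(folder).glob("*.json")):
        with open(path) as f:
            content = json.load(f)
        if isinstance(content, (list, dict)):
            summary[path.name] = f"{type(content).__name__} with {len(content)} entries"
        else:
            summary[path.name] = type(content).__name__
    return summary

# Demonstration with made-up files; real contents will differ
with tempfile.TemporaryDirectory() as tmp:
    (Path(tmp) / "example_vars.json").write_text(
        json.dumps([{"name": "LOAN"}, {"name": "VALUE"}]))
    (Path(tmp) / "example_props.json").write_text(
        json.dumps({"name": "MLFlowDemo", "targetVariable": "BAD"}))
    summary = summarize_json_assets(tmp)

print(summary)
```

In this notebook, the equivalent call would be `summarize_json_assets(zipFolder)`.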

We have generated our metadata and modeling assets. Next, we will need our SAS Viya host, username, and password to create a session within SASCTL.

In [None]:
import getpass
username = getpass.getpass("Username: ")
password = getpass.getpass("Password: ")
host = getpass.getpass("Hostname: ")
sess = Session(host, username, password, protocol='http')
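
When the notebook runs unattended, interactive prompts are inconvenient. A possible alternative, sketched below, reads each setting from an environment variable and only falls back to a prompt when the variable is unset. The helper `read_setting` and the variable names mentioned afterwards are hypothetical, not something SASCTL defines.

```python
import os
import getpass

def read_setting(env_name, prompt, secret=False):
    """Return the value of an environment variable, or prompt for it if unset."""
    value = os.environ.get(env_name)
    if value:
        return value
    # Hide the typed value only for secrets such as the password
    return getpass.getpass(prompt) if secret else input(prompt)
```

For example, `host = read_setting("VIYA_HOST", "Hostname: ")` and `password = read_setting("VIYA_PASSWORD", "Password: ", secret=True)` could replace the getpass calls above, with the resulting values passed to Session as before.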

We can use our session to push our modeling assets into Model Manager. 

In [None]:
I = pzmm.ImportModel()
I.import_model(zipFolder, modelPrefix, 'MLFlowTest', inputsDict, None, '{}.predict({})', metrics=['tensor'], force=True)

Success! Now we can view our model score code, pickle file, and metadata within Model Manager. 
***