
"""
*******************    MLFlow Model Registry    *********************
Save as MLFlow-Model-Registry.ipynb


The MLflow Model Registry component is a centralized model store, set of APIs, 
and UI, to collaboratively manage the full lifecycle of an MLflow Model. 
It provides 
1> model lineage (which MLflow Experiment and Run produced the model), 
2> model versioning, 
3> stage transitions (e.g. from staging to production), 
4> annotations (e.g. with comments, tags), and 
5> deployment management (e.g. which production jobs have requested a specific model version).


Central Repository: Register MLflow models with the MLflow Model Registry. 
A registered model has a unique name, version, stage, and other metadata.

Model Versioning: Automatically keep track of versions for registered models when updated.

Model Stage: Assign preset or custom stages to each model version, 
like “Staging” and “Production”, to represent the lifecycle of a model.

Model Stage Transitions: Record new registration events or changes as activities 
that automatically log users, changes, and additional metadata such as comments.

CI/CD Workflow Integration: Record stage transitions, request, review 
and approve changes as part of CI/CD pipelines for better control and governance.

https://mlflow.org/docs/latest/registry.html
https://docs.databricks.com/applications/mlflow/databricks-autologging.html

Here are a few ways to use autologging:

Call mlflow.autolog() before your training code. This enables autologging 
for each supported library you have installed as soon as you import it.

Enable autologging at the workspace level from the admin console.

Use library-specific autolog calls for each library you use in your code 
(e.g. mlflow.spark.autolog()).
"""

import mlflow
import mlflow.sklearn
from mlflow.models.signature import infer_signature

import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split

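# datasets_dir is not defined in this notebook; it is assumed to come from the workspace setup.
df = pd.read_csv(f"{datasets_dir}/airbnb/sf-listings/airbnb-cleaned-mlflow.csv".replace("dbfs:/", "/dbfs/"))

# Hold out a test set; "price" is the regression target.
X_train, X_test, y_train, y_test = train_test_split(
    df.drop(["price"], axis=1), df["price"].values, random_state=42
)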

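with mlflow.start_run(run_name="LR Model") as run:
    # Autologging records parameters, metrics, the fitted model, its signature, and an input example.
    mlflow.sklearn.autolog(log_input_example=True, log_model_signatures=True, log_models=True)
    lr = LinearRegression()
    lr.fit(X_train, y_train)
    # Inferred here for reference only; autologging already records a signature with the model.
    signature = infer_signature(X_train, lr.predict(X_train))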

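## Create a unique model name so you don't clash with other workspace users.
## cleaned_username is not defined in this notebook; it is assumed to come from the workspace setup.
model_name = f"{cleaned_username}_sklearn_lr"
run_id = run.info.run_id
## URI of the model artifact logged by the run above (autologging stores it under "model").
model_uri = f"runs:/{run_id}/model"
model_details = mlflow.register_model(model_uri=model_uri, name=model_name)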

from mlflow.tracking.client import MlflowClient
client = MlflowClient()
model_version_details = client.get_model_version(name=model_name, version=1)
model_version_details.status

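## Describe the registered model as a whole (the wording of this description is illustrative).
client.update_registered_model(
    name=model_details.name,
    description="Linear regression model that predicts price from the SF Airbnb listings data.",
)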

## Describe this specific model version.
client.update_model_version(
    name=model_details.name,
    version=model_details.version,
    description="This model version was built using OLS linear regression with sklearn.",
)

"""
Deploying a model:
The MLflow Model Registry defines several model stages: **`None`**, **`Staging`**, **`Production`**, and **`Archived`**. 
Each stage has a unique meaning. For example, **`Staging`** is meant for model testing, while **`Production`** is for models that 
have completed the testing or review processes and have been deployed to applications.

"""
import time
time.sleep(10) ## In case registration is still pending
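
"""
A fixed sleep can be too short or needlessly long. As a minimal sketch, the version status
could instead be polled until it reports READY. The helper name wait_until_ready and its
parameters are hypothetical (they are not part of MLflow). The sketch only assumes that
client.get_model_version(...).status returns strings such as "PENDING_REGISTRATION",
"READY", or "FAILED_REGISTRATION".
"""

```python
import time


def wait_until_ready(get_status, timeout_s=60.0, poll_interval_s=1.0):
    """Poll get_status() until it returns "READY".

    get_status is any zero-argument callable returning the current status string.
    Returns True once the status is "READY", False if timeout_s elapses first,
    and raises RuntimeError if registration is reported as failed.
    """
    deadline = time.monotonic() + timeout_s
    while True:
        status = get_status()
        if status == "READY":
            return True
        if status == "FAILED_REGISTRATION":
            raise RuntimeError("model version registration failed")
        if time.monotonic() >= deadline:
            return False
        time.sleep(poll_interval_s)


# With the objects defined earlier in this notebook, a call might look like:
# wait_until_ready(
#     lambda: client.get_model_version(model_details.name, model_details.version).status
# )
```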

client.transition_model_version_stage(
    name=model_details.name,
    version=model_details.version,
    stage="Production",
)

model_version_details = client.get_model_version(
    name=model_details.name,
    version=model_details.version,
)

print(f"Current model stage: '{model_version_details.current_stage}'")

"""
Fetch the registered model version using **`pyfunc`**. Loading the model this way 
lets us use it regardless of the package that was used to train it.
"""

import mlflow.pyfunc

model_version_uri = f"models:/{model_name}/1"
print(f"Loading registered model version from URI: '{model_version_uri}'")
model_version_1 = mlflow.pyfunc.load_model(model_version_uri)
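
"""
The URI above pins a specific version number. A models:/ URI can also name a stage, in which
case MLflow resolves it to the latest version currently in that stage. A minimal sketch of
building either form follows. registry_model_uri is a hypothetical helper, not an MLflow
function, and the stage names are the four listed earlier in this notebook.
"""

```python
VALID_STAGES = ("None", "Staging", "Production", "Archived")


def registry_model_uri(name, version=None, stage=None):
    """Build a models:/ URI from a registered model name plus a version or a stage."""
    # Exactly one of version or stage must be given.
    if (version is None) == (stage is None):
        raise ValueError("pass exactly one of version or stage")
    if stage is not None:
        if stage not in VALID_STAGES:
            raise ValueError(f"unknown stage: {stage!r}")
        return f"models:/{name}/{stage}"
    return f"models:/{name}/{version}"


# With the objects defined earlier in this notebook, usage might look like:
# production_model = mlflow.pyfunc.load_model(registry_model_uri(model_name, stage="Production"))
# predictions = production_model.predict(X_test)
```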

