# Model Lifecycle in Databricks Machine Learning

Thursday, May 29, 2025

[Invitation on Luma](https://lu.ma/2c4p2267), [LinkedIn](https://www.linkedin.com/feed/update/urn:li:activity:7332401688881815553/), [Meetup](https://www.meetup.com/warsaw-data-engineering/events/308041585/)


# 📚 Agenda

1. Demo: Model Lifecycle (From Training to Serving)

⏰ Total meeting duration: **1h 15min**


# 📈 LinkedIn Poll

[Will you attend the in-person meetup?](https://www.linkedin.com/feed/update/urn:li:groupPost:9307761-7327398491633156096/)

![Poll Results](./poll-meetup-stationary.png)


# 🙋‍♀️ Event Question

[What would you like to hear about during the meetup? Pitch an interesting idea for future editions](https://www.meetup.com/warsaw-data-engineering/events/308041585//attendees/) 🙏

1. Lots of usage examples

# 📢 News

Things worth watching out for...


## New members in Warsaw Data Engineering!

[You now have 597 members!](https://www.meetup.com/warsaw-data-engineering/)

What interested you in the Warsaw Data Engineering Meetup that made you decide to join?

1. Topics that interest me
1. ML topics


## 🚀 New Versions

What has changed in the tooling space since we last met? These are the releases worth hunting through for features to learn more about.

* ✨✨ [Apache Spark 4.0.0](https://spark.apache.org/releases/spark-release-4-0-0.html) ✨✨
* [PydanticAI 0.2.12](https://github.com/pydantic/pydantic-ai/releases/tag/v0.2.12)
* [Dagster 1.10.17](https://github.com/dagster-io/dagster/releases/tag/1.10.17)


# 🧑‍💻 Demo: Model Lifecycle (From Training to Serving)

## Create Schema

⚠️ This is Unity Catalog only. No MLflow yet.

A model is a directory that...FIXME

In [0]:
%sql

CREATE SCHEMA IF NOT EXISTS jacek_laskowski.mlflow


👉 [jacek_laskowski.mlflow](https://curriculum-dev.cloud.databricks.com/explore/data/jacek_laskowski/mlflow)

## Train Model

⚠️ This is pure scikit-learn. No MLflow. No Databricks.

In [0]:
from sklearn.datasets import make_regression
help(make_regression)

In [0]:
# Generate a random regression problem.
from sklearn.datasets import make_regression

X, y = make_regression(n_features=4, n_informative=2, random_state=0, shuffle=False)

In [0]:
from sklearn.ensemble import RandomForestRegressor

params = {"n_estimators": 3, "random_state": 42}
rfr = RandomForestRegressor(**params).fit(X, y)
rfr

In [0]:
rfr.predict(X)

## Register Model

This is what `mlflow.sklearn.log_model` is all about.

This is MLflow! ❤️

In [0]:
%pip install --upgrade mlflow-skinny[databricks]
dbutils.library.restartPython()

In [0]:
import mlflow

help(mlflow.sklearn.log_model)

# Log a scikit-learn model as an MLflow artifact for the current run.
# Produces an MLflow Model containing the following two flavors:
# 1. mlflow.sklearn
# 2. mlflow.pyfunc (only for scikit-learn models that define `predict()` that is required for pyfunc model inference)

In [0]:
import mlflow
from mlflow.models import infer_signature
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor

# Same synthetic regression problem as in the Train Model section.
X, y = make_regression(n_features=4, n_informative=2, random_state=0, shuffle=False)

params = {"n_estimators": 3, "random_state": 42}

with mlflow.start_run() as run:
    rfr = RandomForestRegressor(**params).fit(X, y)
    # The signature records the model's input and output schema.
    signature = infer_signature(X, rfr.predict(X))
    mlflow.log_params(params)
    # Two throwaway demo parameters (the Polish names mean "probably a name here").
    mlflow.log_param("chyba_nazwa_tutaj", False)
    mlflow.log_param("chyba_nazwa_tutaj_v2", True)
    # registered_model_name also registers the logged model as a new model version.
    mlflow.sklearn.log_model(
        sk_model=rfr,
        artifact_path="sklearn-model",
        signature=signature,
        registered_model_name="jacek_laskowski.mlflow.sklearn_model",
    )


## Registered Models

### Databricks UI


Review [Registered Models](https://curriculum-dev.cloud.databricks.com/ml/models?o=3551974319838082) (Owned by me)

There could be one from my previous experiments.


![Registered Models](./databricks_ml_registered_models.png)

### Databricks CLI


#### No models command 😢

<br>

```
❯ databricks models
Error: unknown command "models" for "databricks"
```

#### databricks registered-models

* Databricks provides a hosted version of MLflow Model Registry in Unity Catalog.
* Models in Unity Catalog provide centralized access control, auditing, lineage, and discovery of ML models across Databricks workspaces.
* An **MLflow registered model** resides in the third layer of Unity Catalog’s three-level namespace.
* Registered models contain model versions, which correspond to actual ML models (MLflow models).
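
The three-level namespace mentioned above can be sketched in a few lines of plain Python. This is only an illustration: `ModelName` and `parse_model_name` are hypothetical helpers, not part of MLflow or the Databricks SDK, and the sketch assumes names without backtick-quoted dots.

```python
from typing import NamedTuple


class ModelName(NamedTuple):
    """Parts of a Unity Catalog three-level model name (hypothetical helper)."""

    catalog: str
    schema: str
    model: str

    @property
    def full_name(self) -> str:
        return f"{self.catalog}.{self.schema}.{self.model}"


def parse_model_name(full_name: str) -> ModelName:
    """Split 'catalog.schema.model' into its three parts."""
    parts = full_name.split(".")
    if len(parts) != 3 or not all(parts):
        raise ValueError(f"Expected catalog.schema.model, got {full_name!r}")
    return ModelName(*parts)


name = parse_model_name("jacek_laskowski.mlflow.sklearn_model")
print(name.catalog, name.schema, name.model)
```

The first two parts are what the CLI takes as `--catalog-name` and `--schema-name`; the third is the registered model itself.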

⚠️ Creating new model versions currently requires use of the MLflow Python client.

⚠️ The securable type for models is **FUNCTION**. When using REST APIs (e.g. tagging, grants) that specify a securable type, use "FUNCTION" as the securable type.

Once model versions are created:

* For **batch inference**, load them using the MLflow Python client API.
* For **real-time serving**, deploy them using Databricks Model Serving.
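
For the batch inference path, a minimal sketch could look as follows. It assumes a Databricks notebook with MLflow installed and the model registered in Unity Catalog; `model_version_uri` and `load_for_batch_inference` are hypothetical helper names, while `mlflow.set_registry_uri` and `mlflow.pyfunc.load_model` are regular MLflow calls.

```python
def model_version_uri(full_name: str, version: int) -> str:
    """Build a 'models:/<catalog>.<schema>.<model>/<version>' URI."""
    return f"models:/{full_name}/{version}"


def load_for_batch_inference(full_name: str, version: int):
    """Load a registered model version as a generic PyFunc model."""
    import mlflow  # imported lazily so the URI helper works without MLflow

    # Assumption: the registry is Unity Catalog.
    mlflow.set_registry_uri("databricks-uc")
    return mlflow.pyfunc.load_model(model_version_uri(full_name, version))


uri = model_version_uri("jacek_laskowski.mlflow.sklearn_model", 1)
print(uri)

# In a Databricks notebook (not executed here):
# model = load_for_batch_inference("jacek_laskowski.mlflow.sklearn_model", 1)
# predictions = model.predict(X)  # X as generated in the Train Model section
```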

```
❯ databricks registered-models list --catalog-name jacek_laskowski --schema-name mlflow
[
  {
    "catalog_name": "jacek_laskowski",
    "created_at": 1746973647937,
    "created_by": "jacek@japila.pl",
    "full_name": "jacek_laskowski.mlflow.sklearn_model",
    "metastore_id": "6820268e-1541-4b52-b49e-e7135e528491",
    "name": "sklearn_model",
    "owner": "jacek@japila.pl",
    "schema_name": "mlflow",
    "storage_location": "s3://curriculum-storage/6820268e-1541-4b52-b49e-e7135e528491/models/debf1786-b6b4-4119-8f31-940dac8036de",
    "updated_at": 1747324659304,
    "updated_by": "jacek@japila.pl"
  }
]
```
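
The `created_at` and `updated_at` values in the listing above look like epoch milliseconds. Assuming that is the case, a short sketch to turn them into readable UTC timestamps (the JSON is a trimmed copy of the output above, and `from_epoch_millis` is a hypothetical helper):

```python
import json
from datetime import datetime, timezone

# Trimmed copy of the CLI output above (only the fields used here).
cli_output = """
[
  {
    "full_name": "jacek_laskowski.mlflow.sklearn_model",
    "created_at": 1746973647937,
    "updated_at": 1747324659304
  }
]
"""


def from_epoch_millis(millis: int) -> datetime:
    """Convert epoch milliseconds to a timezone-aware UTC datetime."""
    return datetime.fromtimestamp(millis / 1000, tz=timezone.utc)


for model in json.loads(cli_output):
    created = from_epoch_millis(model["created_at"])
    updated = from_epoch_millis(model["updated_at"])
    print(model["full_name"], created.isoformat(), updated.isoformat())
```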

## Model Versions

### Databricks UI


Review [Model Versions](https://curriculum-dev.cloud.databricks.com/explore/data/models/jacek_laskowski/mlflow/sklearn_model)

There could be many from my previous experiments.


![Model Versions](./databricks_ml_model_versions.png)

### Databricks CLI


|Databricks Command|Description|
|-|-|
|`databricks model-registry`|⛔️ Manages models in the Workspace Model Registry, not in Unity Catalog.|
|`databricks model-versions`|👍 Manages model versions in the MLflow Model Registry hosted in Unity Catalog.|


```
❯ databricks model-versions list jacek_laskowski.mlflow.sklearn_model | jq 'sort_by(.version)| .[] | [.version, .storage_location]'
[
  1,
  "s3://curriculum-storage/6820268e-1541-4b52-b49e-e7135e528491/models/debf1786-b6b4-4119-8f31-940dac8036de/versions/51854a98-c4ff-460c-9b60-4883ef748022"
]
[
  2,
  "s3://curriculum-storage/6820268e-1541-4b52-b49e-e7135e528491/models/debf1786-b6b4-4119-8f31-940dac8036de/versions/1b810a18-852e-4016-a706-595a32385ec7"
]
[
  3,
  "s3://curriculum-storage/6820268e-1541-4b52-b49e-e7135e528491/models/debf1786-b6b4-4119-8f31-940dac8036de/versions/92489608-a3ab-4994-8b39-b9ab4a227a1a"
]
```
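
The `jq` filter above has a direct Python equivalent. The sketch below assumes the CLI returns a JSON array of objects with at least `version` and `storage_location` fields (inferred from the filter itself); the sample data reuses the locations shown above, deliberately out of order.

```python
import json

# Storage location of the registered model, as reported by `registered-models list`.
base = (
    "s3://curriculum-storage/6820268e-1541-4b52-b49e-e7135e528491"
    "/models/debf1786-b6b4-4119-8f31-940dac8036de"
)

# Assumed shape of the `databricks model-versions list` output (two fields only).
version_ids = {
    3: "92489608-a3ab-4994-8b39-b9ab4a227a1a",
    1: "51854a98-c4ff-460c-9b60-4883ef748022",
    2: "1b810a18-852e-4016-a706-595a32385ec7",
}
cli_output = json.dumps(
    [
        {"version": version, "storage_location": f"{base}/versions/{uuid}"}
        for version, uuid in version_ids.items()
    ]
)

# Equivalent of: sort_by(.version) | .[] | [.version, .storage_location]
versions = sorted(json.loads(cli_output), key=lambda v: v["version"])
pairs = [[v["version"], v["storage_location"]] for v in versions]

for version, location in pairs:
    # Print the version number and the last path segment of its location.
    print(version, location.rsplit("/", 1)[-1])
```

Note that every version's location sits under the registered model's own `storage_location`, in a `versions/` subdirectory.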

## Experiments


An **experiment** is a top-level organizational unit in MLflow.

An experiment is a collection of one or more **Runs**.

A **Run** (_training_) is an execution of ML code that is part of a single experiment. A run is a collection of model training metadata and artifacts.

A model logged in an experiment run typically has the default [PyFunc](https://mlflow.org/docs/latest/api_reference/python_api/mlflow.pyfunc.html) flavor, an MLflow wrapper (code) around a native model.

Experiments are divided into:

* Notebook experiments, which are tied to the code in a notebook.
* Workspace experiments, whose path you specify manually (it could be a notebook, too).
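
The relationship between an experiment and its runs can be sketched with the MLflow fluent API. This is a minimal illustration, not the demo's code: `log_demo_run` is a hypothetical helper, and the experiment path and metric value in the usage comment are placeholders.

```python
def log_demo_run(experiment_path: str, params: dict, metrics: dict) -> str:
    """Create (or reuse) an experiment, record one run, and return the run ID."""
    import mlflow  # imported lazily; needs MLflow and a tracking server

    # Selects the experiment at this path, creating it if it does not exist.
    mlflow.set_experiment(experiment_path)
    with mlflow.start_run(run_name="demo-run") as run:
        mlflow.log_params(params)
        mlflow.log_metrics(metrics)
    return run.info.run_id


# In a Databricks notebook (not executed here); path and metric are placeholders:
# run_id = log_demo_run(
#     "/Users/jacek@japila.pl/model-lifecycle-demo",
#     params={"n_estimators": 3, "random_state": 42},
#     metrics={"r2": 0.9},
# )
```

Calling the helper twice with the same path adds a second run to the same experiment, which is the "collection of runs" described above.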


### Experiment Tracking

[Databricks UI](https://curriculum-dev.cloud.databricks.com/ml/experiments?o=3551974319838082) and [MlflowClient.search_runs](https://mlflow.org/docs/latest/getting-started/logging-first-model/step2-mlflow-client)

In [0]:
help(mlflow.search_runs)

In [0]:
mlflow.search_runs(filter_string="tags.`mlflow.source.name` = 'Databricks'")

## Create Serving Endpoint

[Create custom model serving endpoints](https://docs.databricks.com/aws/en/machine-learning/model-serving/create-manage-serving-endpoints)


👉 [jacek_laskowski_demo](https://curriculum-dev.cloud.databricks.com/ml/endpoints/jacek_laskowski_demo?o=3551974319838082)


`databricks serving-endpoints list | jq '.[].name'`


Once the endpoint is up and running, get its query schema in OpenAPI format.

<br>

```
databricks serving-endpoints get-open-api jacek_laskowski_demo
```
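
With the endpoint running, a scoring request could be built as sketched below using only the standard library. Treat the details as assumptions to verify against the OpenAPI schema returned above: the `/serving-endpoints/<name>/invocations` path, the `inputs` key (used for tensor-style signatures such as the one `infer_signature` produces for NumPy arrays), and bearer-token authentication. `build_invocation_request` is a hypothetical helper and the feature values are arbitrary.

```python
import json
import urllib.request


def build_invocation_request(
    host: str, endpoint: str, token: str, rows: list
) -> urllib.request.Request:
    """Build a POST request for a model serving endpoint's invocations URL."""
    url = f"https://{host}/serving-endpoints/{endpoint}/invocations"
    # Assumption: the model has a tensor-based signature, so rows go under "inputs".
    body = json.dumps({"inputs": rows}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        method="POST",
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
    )


request = build_invocation_request(
    host="curriculum-dev.cloud.databricks.com",
    endpoint="jacek_laskowski_demo",
    token="<personal-access-token>",  # placeholder; never hard-code real tokens
    rows=[[0.1, 0.2, 0.3, 0.4]],  # one row with the 4 features used in training
)
print(request.full_url)

# Sending the request needs a running endpoint and a valid token:
# with urllib.request.urlopen(request) as response:
#     print(json.load(response))
```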

## ☠️ Delete Model


```
❯ databricks registered-models delete --help
Delete a Registered Model.

  Deletes a registered model and all its model versions from the specified
  parent catalog and schema.

  Arguments:
    FULL_NAME: The three-level (fully qualified) name of the registered model
```


```
❯ databricks registered-models delete jacek_laskowski.mlflow.sklearn_model
Error: Function 'jacek_laskowski.mlflow.sklearn_model' is not empty. The function has 3 model versions(s)
```

⚠️ Note **Function** in the error message!


What a trick! 💡

<br>

```
databricks functions delete jacek_laskowski.mlflow.sklearn_model --force
```


⚠️ Besides **Model** and **Function**, there's also **Routine** 😬

<br>

```
❯ databricks model-versions list jacek_laskowski.mlflow.sklearn_model
Error: Routine or Model 'jacek_laskowski.mlflow.sklearn_model' does not exist.
```
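
An alternative to the `functions delete --force` trick is to delete the model versions first and the registered model afterwards, which is what the "is not empty" error hints at. The sketch below assumes an object that behaves like `mlflow.MlflowClient`; `delete_model_with_versions` is a hypothetical helper, not something the demo used.

```python
def delete_model_with_versions(client, full_name: str) -> list:
    """Delete every version of a registered model, then the model itself.

    `client` is expected to behave like `mlflow.MlflowClient`.
    Returns the deleted version numbers.
    """
    versions = client.search_model_versions(f"name='{full_name}'")
    deleted = []
    for model_version in versions:
        client.delete_model_version(name=full_name, version=model_version.version)
        deleted.append(model_version.version)
    # Only an empty registered model can be deleted.
    client.delete_registered_model(name=full_name)
    return deleted


# In a Databricks notebook (not executed here):
# from mlflow import MlflowClient
# client = MlflowClient(registry_uri="databricks-uc")
# delete_model_with_versions(client, "jacek_laskowski.mlflow.sklearn_model")
```

Passing the client in as an argument keeps the helper easy to try out against a stub before pointing it at a real registry.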

# That's all Folks! 👋

![Warner Bros., Public domain, via Wikimedia Commons](https://upload.wikimedia.org/wikipedia/commons/e/ea/Thats_all_folks.svg)


# 🙋 Questions and Answers


# 💡 Ideas for Future Events

➡️ [Ideas for Future Events]($./Ideas for Future Events)