
# Meetup Next

This notebook should help you set up your next great meetup ❤️

1. Fill in the fields marked with **FIXME**
1. Announce the meetup
    * [lu.ma](https://lu.ma/warsaw-data-engineering)
    * [LinkedIn](https://www.linkedin.com/groups/9307761/)
    * (Optionally) [meetup](https://www.meetup.com/warsaw-data-engineering/) for greater visibility
1. Remove this cell once all the action items are done

# Model Lifecycle in Databricks Machine Learning

Thursday, May 29, 2025

[Invitation on Luma](https://lu.ma/2c4p2267), [LinkedIn](https://www.linkedin.com/feed/update/urn:li:activity:7332401688881815553/), [Meetup](https://www.meetup.com/warsaw-data-engineering/events/308041585/)


# 📚 Agenda

1. Demo: Model Lifecycle (From Training to Serving)
1. Demo: Running MLflow's examples/databricks (and Editable Install in Python)
1. Demo: MLflow's dev/pyproject.py and uv
1. Bonus Demo 😉

⏰ Total meetup duration: **1h 15min**


# 📈 LinkedIn Poll

[Will you attend an in-person meetup?](https://www.linkedin.com/feed/update/urn:li:groupPost:9307761-7327398491633156096/)

![Poll Results](./poll-meetup-stationary.png)


# FIXME 🙋‍♀️ Event Question

[What would you like to hear about at the meetup? Pitch an interesting idea for future editions](https://www.meetup.com/warsaw-data-engineering/events/308041585//attendees/) 🙏

1. FIXME

# 📢 News

Things worth watching out for...


## FIXME New members in Warsaw Data Engineering!

[You now have 595 members!](https://www.meetup.com/warsaw-data-engineering/)

What caught your interest in the Warsaw Data Engineering Meetup and made you decide to join?

1. FIXME


## FIXME 🚀 New Versions

What has changed in the tooling space since we last met? In other words, hunting down the features worth learning more about.

* [MLflow 3.0.0rc2](https://github.com/mlflow/mlflow/releases/tag/v3.0.0rc2)
    * [MLflow 3.0 (Preview)](https://mlflow.org/docs/3.0.0rc2/mlflow-3/) will be the topic of the next meetup! 🤞
* [OpenAI Agents SDK 0.0.16](https://github.com/openai/openai-agents-python/releases/tag/v0.0.16)
    * [feat: pass extra_body through to LiteLLM acompletion #638](https://github.com/openai/openai-agents-python/pull/638)
    * [feat: Streamable HTTP support #643](https://github.com/openai/openai-agents-python/pull/643)
    * Uses [Makefile and uv](https://github.com/openai/openai-agents-python/pull/707/files) 🔥
        * Executed `make format` and got "interesting" result! 😜
* [DSPy 2.6.24](https://github.com/stanfordnlp/dspy/releases/tag/2.6.24)
    * Programming—not prompting—Foundation Models
    * [Make it easier to do sync streaming #8183](https://github.com/stanfordnlp/dspy/pull/8183)
* [fast-agent 0.2.25](https://github.com/evalstate/fast-agent/releases/tag/v0.2.25)
    * [feat: Add Azure OpenAI Service Support to FastAgent #160](https://github.com/evalstate/fast-agent/pull/160)
* [Dagster 1.10.15](https://github.com/dagster-io/dagster/releases/tag/1.10.15)
* [PydanticAI 0.2.6](https://github.com/pydantic/pydantic-ai/releases/tag/v0.2.6)


# 🧑‍💻 Demo: Model Lifecycle (From Training to Serving)

## Create Schema

⚠️ This is Unity Catalog at play here. No MLflow yet.

A model is a directory that...FIXME

In [0]:
%sql

CREATE SCHEMA IF NOT EXISTS jacek_laskowski.mlflow


👉 [jacek_laskowski.mlflow](https://curriculum-dev.cloud.databricks.com/explore/data/jacek_laskowski/mlflow)

## Train Model

⚠️ This is pure scikit-learn. No MLflow. No Databricks.

In [0]:
from sklearn.datasets import make_regression
help(make_regression)

In [0]:
# Generate a random regression problem.
from sklearn.datasets import make_regression

X, y = make_regression(n_features=4, n_informative=2, random_state=0, shuffle=False)

In [0]:
from sklearn.ensemble import RandomForestRegressor

params = {"n_estimators": 3, "random_state": 42}
rfr = RandomForestRegressor(**params).fit(X, y)
rfr

In [0]:
rfr.predict(X)

## Register Model

This is what `mlflow.sklearn.log_model` (with `registered_model_name`) is all about.

This is MLflow! ❤️

In [0]:
%pip install mlflow-skinny[databricks]

In [0]:
%restart_python

In [0]:
import mlflow

help(mlflow.sklearn.log_model)

# Log a scikit-learn model as an MLflow artifact for the current run.
# Produces an MLflow Model containing the following two flavors:
# 1. mlflow.sklearn
# 2. mlflow.pyfunc (only for scikit-learn models that define `predict()`, which pyfunc model inference requires)

In [0]:
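from mlflow.models import infer_signature
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor

# %restart_python cleared the interpreter state, so recreate the data and params.
X, y = make_regression(n_features=4, n_informative=2, random_state=0, shuffle=False)
params = {"n_estimators": 3, "random_state": 42}

with mlflow.start_run() as run:
    rfr = RandomForestRegressor(**params).fit(X, y)
    # Infer the input/output schema from the training data and predictions.
    signature = infer_signature(X, rfr.predict(X))
    mlflow.log_params(params)
    # Log the model and register it under a Unity Catalog three-level name.
    mlflow.sklearn.log_model(
        sk_model=rfr,
        artifact_path="sklearn-model",
        signature=signature,
        registered_model_name="jacek_laskowski.mlflow.sklearn_model",
    )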


## Registered Models

### Databricks UI


Review [Registered Models](https://curriculum-dev.cloud.databricks.com/ml/models?o=3551974319838082) (Owned by me)

There could be one from my previous experiments.


![Registered Models](./databricks_ml_registered_models.png)

### Databricks CLI


#### No models command 😢

<br>

```
❯ databricks models
Error: unknown command "models" for "databricks"
```

#### databricks registered-models

* Databricks provides a hosted version of MLflow Model Registry in Unity Catalog
* Models in Unity Catalog provide centralized access control, auditing, lineage, and discovery of ML models across Databricks workspaces.
* An **MLflow registered model** resides in the third layer of Unity Catalog’s three-level namespace.
* Registered models contain model versions, which correspond to actual ML models (MLflow models).

⚠️ Creating new model versions currently requires use of the MLflow Python client.

⚠️ The securable type for models is **FUNCTION**. When using REST APIs (e.g. tagging, grants) that specify a securable type, use "FUNCTION" as the securable type.

Once model versions are created:

* For **batch inference**, load them using MLflow Python Client API.
* For **real-time serving**, deploy them using Databricks Model Serving.
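
To make the three-level namespace concrete, here is a minimal sketch that splits a full model name into its levels and builds the `models:/<name>/<version>` URI that MLflow uses to address a model version. The helper names (`ModelName`, `parse_full_name`, `model_version_uri`) are made up for this illustration and are not part of MLflow or Databricks.

```python
from typing import NamedTuple


class ModelName(NamedTuple):
    """The three levels of a Unity Catalog model name (hypothetical helper)."""

    catalog: str
    schema: str
    model: str


def parse_full_name(full_name: str) -> ModelName:
    """Split 'catalog.schema.model' into its three levels."""
    parts = full_name.split(".")
    if len(parts) != 3 or not all(parts):
        raise ValueError(f"expected catalog.schema.model, got {full_name!r}")
    return ModelName(*parts)


def model_version_uri(full_name: str, version: int) -> str:
    """Build a 'models:/<name>/<version>' URI for a model version."""
    parse_full_name(full_name)  # validation only
    return f"models:/{full_name}/{version}"


name = parse_full_name("jacek_laskowski.mlflow.sklearn_model")
print(name.catalog, name.schema, name.model)
print(model_version_uri("jacek_laskowski.mlflow.sklearn_model", 1))
# prints "models:/jacek_laskowski.mlflow.sklearn_model/1"
```

With MLflow installed and the registry pointing at Unity Catalog, such a URI can be passed to `mlflow.pyfunc.load_model` for batch inference.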

```
❯ databricks registered-models list --catalog-name jacek_laskowski --schema-name mlflow
[
  {
    "catalog_name": "jacek_laskowski",
    "created_at": 1746973647937,
    "created_by": "jacek@japila.pl",
    "full_name": "jacek_laskowski.mlflow.sklearn_model",
    "metastore_id": "6820268e-1541-4b52-b49e-e7135e528491",
    "name": "sklearn_model",
    "owner": "jacek@japila.pl",
    "schema_name": "mlflow",
    "storage_location": "s3://curriculum-storage/6820268e-1541-4b52-b49e-e7135e528491/models/debf1786-b6b4-4119-8f31-940dac8036de",
    "updated_at": 1747324659304,
    "updated_by": "jacek@japila.pl"
  }
]
```
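
The `created_at` and `updated_at` fields in the output above are not very readable. A small sketch that converts them, assuming (as their magnitude suggests) that they are epoch milliseconds. The JSON below is a trimmed copy of the CLI output and `from_epoch_millis` is a hypothetical helper name.

```python
import json
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def from_epoch_millis(ms: int) -> datetime:
    """Convert epoch milliseconds to a timezone-aware UTC datetime."""
    # Integer timedelta arithmetic avoids float rounding of the milliseconds.
    return EPOCH + timedelta(milliseconds=ms)


# A trimmed copy of the `databricks registered-models list` output.
cli_output = """
[
  {
    "full_name": "jacek_laskowski.mlflow.sklearn_model",
    "created_at": 1746973647937,
    "updated_at": 1747324659304
  }
]
"""

for model in json.loads(cli_output):
    created = from_epoch_millis(model["created_at"])
    updated = from_epoch_millis(model["updated_at"])
    print(model["full_name"], created.isoformat(), updated.isoformat())
```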

## Model Versions

### Databricks UI


Review [Model Versions](https://curriculum-dev.cloud.databricks.com/explore/data/models/jacek_laskowski/mlflow/sklearn_model)

There could be many from my previous experiments.


![Model Versions](./databricks_ml_model_versions.png)

### Databricks CLI


|Databricks Command|Description|
|-|-|
|`databricks model-registry`|⛔️ This API reference documents APIs for the Workspace Model Registry.|
|`databricks model-versions`|👍 Databricks provides a hosted version of MLflow Model Registry in Unity Catalog.|


```
❯ databricks model-versions list jacek_laskowski.mlflow.sklearn_model | jq 'sort_by(.version)| .[] | [.version, .storage_location]'
[
  1,
  "s3://curriculum-storage/6820268e-1541-4b52-b49e-e7135e528491/models/debf1786-b6b4-4119-8f31-940dac8036de/versions/51854a98-c4ff-460c-9b60-4883ef748022"
]
[
  2,
  "s3://curriculum-storage/6820268e-1541-4b52-b49e-e7135e528491/models/debf1786-b6b4-4119-8f31-940dac8036de/versions/1b810a18-852e-4016-a706-595a32385ec7"
]
[
  3,
  "s3://curriculum-storage/6820268e-1541-4b52-b49e-e7135e528491/models/debf1786-b6b4-4119-8f31-940dac8036de/versions/92489608-a3ab-4994-8b39-b9ab4a227a1a"
]
```
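
For those who prefer Python over `jq`, the same "sort by version, keep version and storage location" step could look roughly like this. The sample data uses shortened placeholder locations, not the real ones, and `versions_with_locations` is a hypothetical helper name.

```python
import json


def versions_with_locations(cli_json: str) -> list[tuple[int, str]]:
    """Return (version, storage_location) pairs sorted by version."""
    versions = json.loads(cli_json)
    pairs = [(v["version"], v["storage_location"]) for v in versions]
    return sorted(pairs, key=lambda pair: pair[0])


# Placeholder input shaped like `databricks model-versions list` output.
sample = json.dumps(
    [
        {"version": 3, "storage_location": "s3://bucket/models/m/versions/c"},
        {"version": 1, "storage_location": "s3://bucket/models/m/versions/a"},
        {"version": 2, "storage_location": "s3://bucket/models/m/versions/b"},
    ]
)

for version, location in versions_with_locations(sample):
    print(version, location)
```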

## Experiments


An **experiment** is a top-level organizational unit in MLflow.

An experiment is a collection of one or many **Runs**.

A **Run** (_training_) is an execution of ML code that is part of a single experiment. A run is a collection of model training metadata and artifacts.

A model logged in a run usually gets the default [PyFunc](https://mlflow.org/docs/latest/api_reference/python_api/mlflow.pyfunc.html) flavor, an MLflow wrapper (code) around a native model.

Experiments are divided into:

* Notebook experiments, which are associated with a notebook
* Workspace experiments, for which you specify the path manually (it could be a notebook, too).
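
The experiment/run relationship can be pictured as a tiny data model. This is only a toy sketch for building intuition: the `Experiment` and `Run` classes below are hypothetical and are not MLflow's own entities.

```python
from dataclasses import dataclass, field


@dataclass
class Run:
    """One execution of ML code: training metadata plus artifacts."""

    run_id: str
    params: dict[str, str] = field(default_factory=dict)
    metrics: dict[str, float] = field(default_factory=dict)
    artifacts: list[str] = field(default_factory=list)


@dataclass
class Experiment:
    """A top-level unit that groups one or many runs."""

    name: str
    runs: list[Run] = field(default_factory=list)

    def best_run(self, metric: str) -> Run:
        """Return the run with the lowest value of the given metric."""
        candidates = [run for run in self.runs if metric in run.metrics]
        return min(candidates, key=lambda run: run.metrics[metric])


experiment = Experiment(name="/Users/someone@example.com/demo-experiment")
experiment.runs.append(Run("run-1", {"n_estimators": "3"}, {"rmse": 12.5}))
experiment.runs.append(Run("run-2", {"n_estimators": "10"}, {"rmse": 9.0}))

print(experiment.best_run("rmse").run_id)  # prints "run-2"
```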


### Experiment Tracking

[Databricks UI](https://curriculum-dev.cloud.databricks.com/ml/experiments?o=3551974319838082) and [MlflowClient.search_runs](https://mlflow.org/docs/latest/getting-started/logging-first-model/step2-mlflow-client)

## Create Serving Endpoint

[Create custom model serving endpoints](https://docs.databricks.com/aws/en/machine-learning/model-serving/create-manage-serving-endpoints)


https://curriculum-dev.cloud.databricks.com/ml/endpoints/jacek_laskowski_demo?o=3551974319838082


```
databricks serving-endpoints list | grep jacek_laskowski
```


Once up and running, get the query schema of the serving endpoint in OpenAPI format.

<br>

```
databricks serving-endpoints get-open-api jacek_laskowski_demo
```
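
Once the endpoint is ready, querying it is a plain HTTPS POST. The sketch below only builds the request and does not send it. It assumes the `/serving-endpoints/<name>/invocations` path, a bearer token, and the tensor-style `inputs` payload (the signature in this demo was inferred from NumPy arrays); check all three against the OpenAPI schema returned by `get-open-api`. The host and token are placeholders and `build_invocation_request` is a hypothetical helper.

```python
import json
import urllib.request


def build_invocation_request(host, endpoint_name, token, rows):
    """Build (but do not send) a scoring request for a serving endpoint."""
    url = f"{host.rstrip('/')}/serving-endpoints/{endpoint_name}/invocations"
    body = json.dumps({"inputs": rows}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


request = build_invocation_request(
    host="https://example.cloud.databricks.com",  # placeholder workspace URL
    endpoint_name="jacek_laskowski_demo",
    token="<personal-access-token>",  # placeholder
    rows=[[0.1, 0.2, 0.3, 0.4]],  # one row with the 4 features used in training
)
print(request.get_method(), request.full_url)
print(request.data.decode("utf-8"))
# To actually send it: urllib.request.urlopen(request)
```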

## ☠️ Delete Model


```
> databricks registered-models delete --help
Delete a Registered Model.

  Deletes a registered model and all its model versions from the specified
  parent catalog and schema.

  Arguments:
    FULL_NAME: The three-level (fully qualified) name of the registered model
```


```
❯ databricks registered-models delete jacek_laskowski.mlflow.sklearn_model
Error: Function 'jacek_laskowski.mlflow.sklearn_model' is not empty. The function has 3 model versions(s)
```

⚠️ Note **Function** in the error message!


What a trick! 💡

<br>

```
databricks functions delete jacek_laskowski.mlflow.sklearn_model --force
```


⚠️ Besides **Model** and **Function**, there's also **Routine** 😬

<br>

```
❯ databricks model-versions list jacek_laskowski.mlflow.sklearn_model
Error: Routine or Model 'jacek_laskowski.mlflow.sklearn_model' does not exist.
```
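
A less surprising alternative to the `functions delete --force` trick is to delete the model versions first and the registered model last. A minimal sketch using the method names of `mlflow.MlflowClient` (`search_model_versions`, `delete_model_version`, `delete_registered_model`). The `FakeClient` is a hypothetical in-memory stand-in so the snippet runs without a workspace, and pagination of the search results is ignored.

```python
from types import SimpleNamespace


def delete_model_with_versions(client, full_name):
    """Delete every version of a registered model, then the model itself."""
    versions = client.search_model_versions(f"name='{full_name}'")
    deleted = []
    for mv in sorted(versions, key=lambda mv: int(mv.version)):
        client.delete_model_version(name=full_name, version=mv.version)
        deleted.append(int(mv.version))
    client.delete_registered_model(name=full_name)
    return deleted


class FakeClient:
    """Hypothetical stand-in for mlflow.MlflowClient, for illustration only."""

    def __init__(self, versions):
        self.versions = list(versions)
        self.calls = []

    def search_model_versions(self, filter_string):
        return [SimpleNamespace(version=v) for v in self.versions]

    def delete_model_version(self, name, version):
        self.calls.append(("delete_model_version", name, version))

    def delete_registered_model(self, name):
        self.calls.append(("delete_registered_model", name))


client = FakeClient(versions=[3, 1, 2])
deleted = delete_model_with_versions(client, "jacek_laskowski.mlflow.sklearn_model")
print(deleted)  # prints [1, 2, 3]
```

With MLflow installed, the fake could be swapped for `mlflow.MlflowClient(registry_uri="databricks-uc")`.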


# 🧑‍💻 Demo: Running MLflow's examples/databricks (and Editable Install in Python)

[examples/databricks](https://github.com/mlflow/mlflow/tree/master/examples/databricks)


## Step 0. Clone MLflow Repo

`git clone https://github.com/mlflow/mlflow`


## Step 1. Install Dependencies


```
uv pip install databricks-connect
uv pip install scikit-learn
```


## Step 2. Run Experiment


```
❯ python examples/databricks/dbconnect.py --cluster-id xxx
2025/05/08 17:51:04 INFO mlflow.tracking.fluent: Experiment with name '/Users/jacek@japila.pl/dbconnect' does not exist. Creating a new experiment.
🏃 View run smiling-ox-667 at: https://curriculum-dev.cloud.databricks.com/ml/experiments/1275781889574864/runs/b88fd8406e7d410bac8992258093ef5d
🧪 View experiment at: https://curriculum-dev.cloud.databricks.com/ml/experiments/1275781889574864
Traceback (most recent call last):
  File "/Users/jacek/oss/mlflow/examples/databricks/dbconnect.py", line 56, in <module>
    main()
    ~~~~^^
  File "/Users/jacek/oss/mlflow/examples/databricks/dbconnect.py", line 37, in main
    model_info = mlflow.sklearn.log_model(model, name="model", signature=signature)
TypeError: log_model() got an unexpected keyword argument 'name'
```
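
The `TypeError` suggests that the installed MLflow is older than the example on `master`, which already passes `name=`. One way to check this without running the whole script is to inspect the function signature. The two `log_model` functions below are hypothetical stand-ins for an older and a newer API; they are not MLflow's actual signatures.

```python
import inspect


def accepts_keyword(func, keyword: str) -> bool:
    """Check whether `func` can be called with `keyword=...`."""
    params = inspect.signature(func).parameters
    param = params.get(keyword)
    keyword_kinds = (
        inspect.Parameter.POSITIONAL_OR_KEYWORD,
        inspect.Parameter.KEYWORD_ONLY,
    )
    if param is not None and param.kind in keyword_kinds:
        return True
    # A **kwargs parameter accepts any keyword.
    return any(p.kind is inspect.Parameter.VAR_KEYWORD for p in params.values())


def old_log_model(sk_model, artifact_path, signature=None):
    """Hypothetical older API without `name`."""


def new_log_model(sk_model, artifact_path=None, signature=None, name=None):
    """Hypothetical newer API with `name`."""


print(accepts_keyword(old_log_model, "name"))  # prints False
print(accepts_keyword(new_log_model, "name"))  # prints True
```

In a real environment the check would be `accepts_keyword(mlflow.sklearn.log_model, "name")`.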


## Step 3. Editable Install

[Development Mode (a.k.a. “Editable Installs”)](https://setuptools.pypa.io/en/latest/userguide/development_mode.html)


```
uv pip install -e .
```
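
After an editable install, `import mlflow` should resolve to the cloned repository rather than to `site-packages`. A small sketch to verify where a module comes from; `module_origin` and `is_imported_from` are hypothetical helper names, and `json` is used in the demo only because it is always available.

```python
import importlib.util
from pathlib import Path


def module_origin(module_name: str):
    """Return the file a module would be imported from, or None if not found."""
    spec = importlib.util.find_spec(module_name)
    if spec is None or spec.origin is None:
        return None
    return Path(spec.origin)


def is_imported_from(module_name: str, directory) -> bool:
    """Check whether the module lives somewhere under `directory`."""
    origin = module_origin(module_name)
    if origin is None:
        return False
    return Path(directory).resolve() in origin.resolve().parents


print(module_origin("json"))
print(module_origin("no_such_module_xyz"))  # prints None
```

For this demo the interesting call would be `is_imported_from("mlflow", "/Users/jacek/oss/mlflow")`.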


```
❯ python examples/databricks/dbconnect.py --cluster-id xxx
🏃 View run carefree-duck-680 at: https://curriculum-dev.cloud.databricks.com/ml/experiments/1275781889574864/runs/89a774fb54da4a5c844764d3e40ad638
🧪 View experiment at: https://curriculum-dev.cloud.databricks.com/ml/experiments/1275781889574864
Traceback (most recent call last):
  File "/Users/jacek/oss/mlflow/examples/databricks/dbconnect.py", line 56, in <module>
    main()
    ~~~~^^
  File "/Users/jacek/oss/mlflow/examples/databricks/dbconnect.py", line 37, in main
    model_info = mlflow.sklearn.log_model(model, name="model", signature=signature)
  File "/Users/jacek/oss/mlflow/mlflow/sklearn/__init__.py", line 426, in log_model
    return Model.log(
           ~~~~~~~~~^
        artifact_path=artifact_path,
        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
    ...<18 lines>...
        model_id=model_id,
        ^^^^^^^^^^^^^^^^^^
    )
    ^
  File "/Users/jacek/oss/mlflow/mlflow/models/model.py", line 928, in log
    model = mlflow.initialize_logged_model(
        # TODO: Update model name
    ...<6 lines>...
        else None,
    )
  File "/Users/jacek/oss/mlflow/mlflow/tracking/fluent.py", line 2122, in initialize_logged_model
    model = _create_logged_model(
        name=name,
    ...<4 lines>...
        experiment_id=experiment_id,
    )
  File "/Users/jacek/oss/mlflow/mlflow/tracking/fluent.py", line 2232, in _create_logged_model
    return MlflowClient().create_logged_model(
           ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^
        experiment_id=experiment_id,
        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
    ...<4 lines>...
        model_type=model_type,
        ^^^^^^^^^^^^^^^^^^^^^^
    )
    ^
  File "/Users/jacek/oss/mlflow/mlflow/tracking/client.py", line 5218, in create_logged_model
    return self._tracking_client.create_logged_model(
           ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^
        experiment_id, name, source_run_id, tags, params, model_type
        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
    )
    ^
  File "/Users/jacek/oss/mlflow/mlflow/tracking/_tracking_service/client.py", line 815, in create_logged_model
    return self.store.create_logged_model(
           ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^
        experiment_id=experiment_id,
        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
    ...<8 lines>...
        model_type=model_type,
        ^^^^^^^^^^^^^^^^^^^^^^
    )
    ^
  File "/Users/jacek/oss/mlflow/mlflow/store/tracking/rest_store.py", line 904, in create_logged_model
    response_proto = self._call_endpoint(CreateLoggedModel, req_body)
  File "/Users/jacek/oss/mlflow/mlflow/store/tracking/rest_store.py", line 129, in _call_endpoint
    return call_endpoint(
        self.get_host_creds(),
    ...<4 lines>...
        retry_timeout_seconds=retry_timeout_seconds,
    )
  File "/Users/jacek/oss/mlflow/mlflow/utils/rest_utils.py", line 474, in call_endpoint
    response = verify_rest_response(response, endpoint)
  File "/Users/jacek/oss/mlflow/mlflow/utils/rest_utils.py", line 261, in verify_rest_response
    raise RestException(json.loads(response.text))
mlflow.exceptions.RestException: BAD_REQUEST: This API is not enabled.
```

## Step 4. BAD_REQUEST: This API is not enabled.

Hunting down the root cause of the exception.


Modify `mlflow/utils/rest_utils.py:261` (in `verify_rest_response`) to print the endpoint of every REST call.


```
❯ python examples/databricks/dbconnect.py --cluster-id xxx
>>> endpoint /api/2.0/mlflow/experiments/get-by-name
>>> endpoint /api/2.0/mlflow/runs/create
>>> endpoint /api/2.0/mlflow/runs/get
>>> endpoint /api/2.0/mlflow/logged-models
>>> endpoint /api/2.0/mlflow/runs/get
🏃 View run zealous-worm-360 at: https://curriculum-dev.cloud.databricks.com/ml/experiments/1275781889574864/runs/8cad690b9bdd45ab96658987f4039180
🧪 View experiment at: https://curriculum-dev.cloud.databricks.com/ml/experiments/1275781889574864
>>> endpoint /api/2.0/mlflow/runs/update
Traceback (most recent call last):
  File "/Users/jacek/oss/mlflow/examples/databricks/dbconnect.py", line 56, in <module>
    main()
    ~~~~^^
...
  File "/Users/jacek/oss/mlflow/mlflow/utils/rest_utils.py", line 262, in verify_rest_response
    raise RestException(json.loads(response.text))
mlflow.exceptions.RestException: BAD_REQUEST: This API is not enabled.
```
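
The exact change to `rest_utils.py` is not shown here, so the following is only a guess at its shape: a self-contained stand-in that prints the endpoint before checking the response. `RestException`, `FakeResponse`, and this `verify_rest_response` are simplified, hypothetical versions and not MLflow's code. Note that the Step 3 stack trace points at `create_logged_model`, which matches the `/api/2.0/mlflow/logged-models` line in the output above.

```python
import json


class RestException(Exception):
    """Hypothetical stand-in for mlflow.exceptions.RestException."""


class FakeResponse:
    """Hypothetical minimal HTTP response object."""

    def __init__(self, status_code, text):
        self.status_code = status_code
        self.text = text


def verify_rest_response(response, endpoint):
    print(f">>> endpoint {endpoint}")  # the added tracing line
    if response.status_code != 200:
        error = json.loads(response.text)
        raise RestException(f"{error['error_code']}: {error['message']}")
    return response


verify_rest_response(FakeResponse(200, "{}"), "/api/2.0/mlflow/runs/get")
try:
    verify_rest_response(
        FakeResponse(400, '{"error_code": "BAD_REQUEST", "message": "This API is not enabled."}'),
        "/api/2.0/mlflow/logged-models",
    )
except RestException as e:
    print(e)  # prints "BAD_REQUEST: This API is not enabled."
```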


## Step 5. MLflow API reference

[MLflow API reference](https://docs.databricks.com/aws/en/reference/mlflow-api)


### Experiments

[Experiments](https://docs.databricks.com/api/workspace/experiments)

1. **Experiments** are the primary unit of organization in MLflow.
1. All **MLflow runs** belong to an experiment.
1. Each experiment lets you visualize, search, and compare runs, as well as download run artifacts or metadata for analysis in other tools.
1. Experiments are maintained in a Databricks-hosted MLflow tracking server.
1. Experiments are located in the workspace file tree.
1. You manage experiments using the same tools you use to manage other workspace objects such as folders, notebooks, and libraries.

### Databricks CLI

<br>

```
❯ databricks | more
...
Machine Learning
  experiments                            Experiments are the primary unit of organization in MLflow; all MLflow runs belong to an experiment.
  model-registry                         Note: This API reference documents APIs for the Workspace Model Registry.
Real-time Serving
  serving-endpoints                      The Serving Endpoints API allows you to create, update, and delete model serving endpoints.
Unity Catalog
  model-versions                         Databricks provides a hosted version of MLflow Model Registry in Unity Catalog.
  registered-models                      Databricks provides a hosted version of MLflow Model Registry in Unity Catalog.
...
```


```
❯ databricks experiments list-experiments | jq '.[].name' | grep 'jacek@japila.pl'
"/Users/jacek@japila.pl/dbconnect"
"/Users/jacek@japila.pl/demo-experiment"
```


```
databricks registered-models list | jq '.[].full_name'
```

# 🧑‍💻 Demo: MLflow's dev/pyproject.py and uv

1. What I learnt while reviewing the source code of MLflow and trying to execute [dev/pyproject.py](https://github.com/mlflow/mlflow/blob/master/dev/pyproject.py) locally.
1. And how uv helped.

Why does it even matter?! 🤨


## Step 0. Clone MLflow Repo

`git clone https://github.com/mlflow/mlflow`


## Step 1. uvx python dev/pyproject.py

<br>

```
❯ uvx python dev/pyproject.py
Traceback (most recent call last):
  File "/Users/jacek/oss/mlflow/./dev/pyproject.py", line 10, in <module>
    import toml
ModuleNotFoundError: No module named 'toml'
```
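
Instead of discovering missing modules one traceback at a time, the current interpreter can be asked up front which imports would fail. A minimal sketch; the module list mirrors the packages installed in the steps of this demo (`toml`, `pyyaml` imported as `yaml`, `packaging`) and may not be everything the script needs. `missing_modules` is a hypothetical helper name.

```python
import importlib.util


def missing_modules(module_names):
    """Return the top-level modules that cannot be imported in this interpreter."""
    return [name for name in module_names if importlib.util.find_spec(name) is None]


required = ["toml", "yaml", "packaging"]
print(missing_modules(required))
```

An empty list means all of the listed modules are importable in the active environment.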


## Step 2. Set Up Dev Env


`uv venv .dev_pyproject_py_deep_dive`

`source .dev_pyproject_py_deep_dive/bin/activate`


## Step 3. Virtual Envs in Python

Please note that I'm a JVM dev (and only very recently switched to Python).


`uv pip install toml`

`python ./dev/pyproject.py`

`type python` and it finally clicked how virtual envs work 🔥

[venv — Creation of virtual environments](https://docs.python.org/3/library/venv.html)
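
What `type python` shows from the shell can also be seen from inside the interpreter. The `venv` documentation describes comparing `sys.prefix` with `sys.base_prefix` as the way to detect a virtual environment; `in_virtualenv` is just a hypothetical wrapper around that check.

```python
import sys


def in_virtualenv() -> bool:
    """True when the interpreter runs inside a venv-style virtual environment."""
    return sys.prefix != sys.base_prefix


print("interpreter:", sys.executable)
print("prefix:     ", sys.prefix)
print("base prefix:", sys.base_prefix)
print("virtual env:", in_virtualenv())
```

Run it once before and once after `source .dev_pyproject_py_deep_dive/bin/activate` to see the difference.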

## Step 4. It Works 🥳


`uv pip install pyyaml`

> ⚠️ NOTE
>
> All the dev deps are in [dev/requirements.txt](https://github.com/mlflow/mlflow/blob/master/dev/requirements.txt)

`uv pip install packaging`

`brew install taplo`

`python ./dev/pyproject.py` seems to change nothing, huh?! 🤨

💎 Think about what the script does and you will know why nothing seems to have changed 😉

# ✨ Bonus Demo ✨

[RestException: INVALID_PARAMETER_VALUE while searching for registered models from model registry](https://stackoverflow.com/q/79630371/1305344)

# That's all Folks! 👋

![Warner Bros., Public domain, via Wikimedia Commons](https://upload.wikimedia.org/wikipedia/commons/e/ea/Thats_all_folks.svg)


# 🙋 Questions and Answers


# 💡 Ideas for Future Events

➡️ [Ideas for Future Events]($./Ideas for Future Events)