# Databricks Machine Learning and MLflow Client API

Thursday, May 22, 2025

[Invitation on Luma](https://lu.ma/mnl5suva), [LinkedIn](https://www.linkedin.com/feed/update/urn:li:activity:7329845559609978881/), [Meetup](https://www.meetup.com/warsaw-data-engineering/events/307894572/)


# 📚 Agenda

1. Demo: Serve Model
1. Demo: Running MLflow's `examples/databricks` (and Editable Install in Python)
1. Demo: MLflow's `dev/pyproject.py` and uv

⏰ Total meetup duration: **1h 15min**


# 📈 LinkedIn Poll

[Will you attend the in-person meetup?](https://www.linkedin.com/feed/update/urn:li:groupPost:9307761-7327398491633156096/)


# Event Question

[What would you like to hear about during the meetup? Share an interesting idea for future editions](https://www.meetup.com/warsaw-data-engineering/events/307894572/attendees/) 🙏

* Testing and data quality. Perhaps more on pydantic, and a look at what DQX offers
* GenAI in Databricks
* More about MLflow
* How to build an effective data model in a Data Lakehouse based on data from a transactional system

# 📢 News

Things worth watching out for...


## 🎉 New members joined Warsaw Data Engineering

[You now have 595 members!](https://www.meetup.com/warsaw-data-engineering/)

What interested you in Warsaw Data Engineering Meetup that made you decide to join?

1. Databricks


## 🚀 New Versions

What has changed in the tooling space since we last met? In other words, hunting down new features worth learning more about.

* [MLflow 3.0.0rc2](https://github.com/mlflow/mlflow/releases/tag/v3.0.0rc2)
    * [MLflow 3.0 (Preview)](https://mlflow.org/docs/3.0.0rc2/mlflow-3/) will be the topic of the next meetup! 🤞
* [OpenAI Agents SDK 0.0.16](https://github.com/openai/openai-agents-python/releases/tag/v0.0.16)
    * [feat: pass extra_body through to LiteLLM acompletion #638](https://github.com/openai/openai-agents-python/pull/638)
    * [feat: Streamable HTTP support #643](https://github.com/openai/openai-agents-python/pull/643)
    * Uses [Makefile and uv](https://github.com/openai/openai-agents-python/pull/707/files) 🔥
        * Executed `make format` and got "interesting" result! 😜
* [DSPy 2.6.24](https://github.com/stanfordnlp/dspy/releases/tag/2.6.24)
    * Programming—not prompting—Foundation Models
    * [Make it easier to do sync streaming #8183](https://github.com/stanfordnlp/dspy/pull/8183)
* [fast-agent 0.2.25](https://github.com/evalstate/fast-agent/releases/tag/v0.2.25)
    * [feat: Add Azure OpenAI Service Support to FastAgent #160](https://github.com/evalstate/fast-agent/pull/160)
* [Dagster 1.10.15](https://github.com/dagster-io/dagster/releases/tag/1.10.15)
* [PydanticAI 0.2.6](https://github.com/pydantic/pydantic-ai/releases/tag/v0.2.6)


# 🧑‍💻 Demo: Install MLflow 3.0.0rc2

```
uv add --upgrade mlflow==3.0.0rc2 mlflow-skinny==3.0.0rc2
```


```
❯ uv tree --depth 1
Resolved 77 packages in 0.96ms
mlflow-sandbox v0.1.0
├── mlflow v3.0.0rc2
└── mlflow-skinny v3.0.0rc2
```

```
❯ uv run python -c 'import mlflow; print(mlflow.__version__)'
3.0.0rc2
```
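
The same check can be scripted so that `mlflow` and `mlflow-skinny` do not drift apart. The sketch below is only an illustration and uses nothing beyond the standard library's `importlib.metadata`. The helper names `installed_version` and `versions_in_sync` are made up for this example and are not part of MLflow or uv.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(package: str) -> Optional[str]:
    """Return the installed version of `package`, or None if it is not installed."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


def versions_in_sync(*packages: str) -> bool:
    """True only if every package is installed and all report the same version."""
    found = {installed_version(p) for p in packages}
    return len(found) == 1 and None not in found


# Hypothetical usage: both distributions were pinned to 3.0.0rc2 above.
for name in ("mlflow", "mlflow-skinny"):
    print(name, installed_version(name))
print("in sync:", versions_in_sync("mlflow", "mlflow-skinny"))
```

Run it with `uv run python` inside the project so that it inspects the same environment that `uv tree` reports on.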


# 🧑‍💻 Demo: Model Deployment

Aka _model serving_

## Train Random Forest Model (scikit-learn)


[RandomForestRegressor](https://scikit-learn.org/stable/modules/generated/sklearn.ensemble.RandomForestRegressor.html):

1. A random forest regressor.
1. A **random forest** is a meta estimator that fits a number of decision tree regressors on various sub-samples of the dataset and uses averaging to improve the predictive accuracy and control over-fitting.
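
The "uses averaging" part of the definition can be checked by hand. The sketch below assumes scikit-learn and NumPy are installed and reuses the dataset and hyperparameters from this demo. It averages the predictions of the individual trees (exposed through the fitted `estimators_` attribute) and compares the result with the forest's own prediction.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor

# Same toy dataset and hyperparameters as in the demo cells.
X, y = make_regression(n_features=4, n_informative=2, random_state=0, shuffle=False)
forest = RandomForestRegressor(n_estimators=3, random_state=42).fit(X, y)

# One row of predictions per fitted decision tree.
per_tree = np.stack([tree.predict(X) for tree in forest.estimators_])

# The forest's prediction should be the mean over its trees.
mean_of_trees = per_tree.mean(axis=0)
forest_pred = forest.predict(X)

averaging_holds = bool(np.allclose(forest_pred, mean_of_trees))
print("forest prediction equals mean of trees:", averaging_holds)
```

With only three estimators the forest is deliberately tiny, which keeps the per-tree predictions easy to inspect.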


Quite a few examples to learn from:

* [mlflow/tracking/_model_registry/fluent.py](https://github.com/mlflow/mlflow/blob/v2.22.0/mlflow/tracking/_model_registry/fluent.py#L20-L49)
    * scikit-learn's `RandomForestRegressor`
* [Register a model to Unity Catalog using autologging](https://docs.databricks.com/aws/en/machine-learning/manage-model-lifecycle/#register-a-model-to-unity-catalog-using-autologging)
    * scikit-learn's `RandomForestClassifier`
* scikit-learn's [1.11.2. Random forests and other randomized tree ensembles](https://scikit-learn.org/stable/modules/ensemble.html#forest)
* Databricks Machine Learning's [Example notebook](https://docs.databricks.com/aws/en/machine-learning/manage-model-lifecycle/#example-notebook)

```python
from sklearn.datasets import make_regression
help(make_regression)
```

```python
# Generate a random regression problem.
from sklearn.datasets import make_regression

X, y = make_regression(n_features=4, n_informative=2, random_state=0, shuffle=False)
```

```python
from sklearn.ensemble import RandomForestRegressor

params = {"n_estimators": 3, "random_state": 42}
rfr = RandomForestRegressor(**params).fit(X, y)
rfr
```

```python
rfr.predict(X)
```


⚠️ **NOTE**

Up to this cell, the code was all scikit-learn-specific.


## Registered Models

### Databricks UI


Review [Registered Models](https://curriculum-dev.cloud.databricks.com/ml/models?o=3551974319838082) (Owned by me)

There could be one from my previous experiments.


![Registered Models](./databricks_ml_registered_models.png)

### Databricks CLI


#### No models command 😢


```
❯ databricks models
Error: unknown command "models" for "databricks"
```

#### databricks registered-models

* Databricks provides a hosted version of MLflow Model Registry in Unity Catalog.
* Models in Unity Catalog provide centralized access control, auditing, lineage, and discovery of ML models across Databricks workspaces.
* An **MLflow registered model** resides in the third layer of Unity Catalog’s three-level namespace.
* Registered models contain model versions, which correspond to actual ML models (MLflow models).

⚠️ Creating new model versions currently requires use of the MLflow Python client.

⚠️ The securable type for models is **FUNCTION**. When using REST APIs (e.g. tagging, grants) that specify a securable type, use "FUNCTION" as the securable type.

Once model versions are created:

* For **batch inference**, load them using MLflow Python Client API.
* For **real-time serving**, deploy them using Databricks Model Serving.
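
The notes above can be tied together in a small sketch, assuming the `mlflow` package is installed and Databricks authentication is already configured. The helper names (`uc_model_name`, `model_version_uri`, `register_version`, `load_for_batch_inference`) are hypothetical and exist only for this illustration. The MLflow calls used are `mlflow.set_registry_uri`, `mlflow.register_model` and `mlflow.pyfunc.load_model`.

```python
def uc_model_name(catalog: str, schema: str, model: str) -> str:
    """Join the three Unity Catalog levels into `catalog.schema.model`."""
    parts = (catalog, schema, model)
    if any(not part for part in parts):
        raise ValueError("catalog, schema and model must all be non-empty")
    return ".".join(parts)


def model_version_uri(full_name: str, version: int) -> str:
    """Build a `models:/<name>/<version>` URI for a registered model version."""
    return f"models:/{full_name}/{version}"


def register_version(source_uri: str, full_name: str):
    """Create a new model version from an already logged model (e.g. a runs:/ URI)."""
    import mlflow  # imported lazily so the pure helpers work without MLflow

    mlflow.set_registry_uri("databricks-uc")  # target the Unity Catalog registry
    return mlflow.register_model(model_uri=source_uri, name=full_name)


def load_for_batch_inference(full_name: str, version: int):
    """Load a registered model version as a generic pyfunc model."""
    import mlflow

    mlflow.set_registry_uri("databricks-uc")
    return mlflow.pyfunc.load_model(model_version_uri(full_name, version))


full_name = uc_model_name("jacek_laskowski", "mlflow", "sklearn_model")
print(model_version_uri(full_name, 3))  # models:/jacek_laskowski.mlflow.sklearn_model/3
```

Only the two pure helpers run without a workspace. `register_version` and `load_for_batch_inference` need a reachable Databricks workspace and the appropriate Unity Catalog privileges.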

```
❯ databricks registered-models list --catalog-name jacek_laskowski --schema-name mlflow
[
  {
    "catalog_name": "jacek_laskowski",
    "created_at": 1746973647937,
    "created_by": "jacek@japila.pl",
    "full_name": "jacek_laskowski.mlflow.sklearn_model",
    "metastore_id": "6820268e-1541-4b52-b49e-e7135e528491",
    "name": "sklearn_model",
    "owner": "jacek@japila.pl",
    "schema_name": "mlflow",
    "storage_location": "s3://curriculum-storage/6820268e-1541-4b52-b49e-e7135e528491/models/debf1786-b6b4-4119-8f31-940dac8036de",
    "updated_at": 1747324659304,
    "updated_by": "jacek@japila.pl"
  }
]
```
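
The `created_at` and `updated_at` fields are large integers. The sketch below assumes they are epoch timestamps in milliseconds and converts them to UTC datetimes with the standard library only. The JSON literal is a trimmed copy of the CLI output above, and `epoch_ms_to_utc` is a hypothetical helper name.

```python
import json
from datetime import datetime, timezone

# Trimmed copy of the `databricks registered-models list` output shown above.
cli_output = """
[
  {
    "full_name": "jacek_laskowski.mlflow.sklearn_model",
    "created_at": 1746973647937,
    "updated_at": 1747324659304
  }
]
"""


def epoch_ms_to_utc(ms: int) -> datetime:
    """Convert epoch milliseconds (assumed unit) to a timezone-aware UTC datetime."""
    return datetime.fromtimestamp(ms / 1000, tz=timezone.utc)


summary = [
    {
        "full_name": model["full_name"],
        "created_at": epoch_ms_to_utc(model["created_at"]),
        "updated_at": epoch_ms_to_utc(model["updated_at"]),
    }
    for model in json.loads(cli_output)
]

for row in summary:
    print(row["full_name"], row["created_at"].isoformat(), row["updated_at"].isoformat())
```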

## Model Versions

### Databricks UI


Review [Model Versions](https://curriculum-dev.cloud.databricks.com/explore/data/models/jacek_laskowski/mlflow/sklearn_model)

There could be many from my previous experiments.


![Model Versions](./databricks_ml_model_versions.png)

### Databricks CLI


|Databricks Command|Description|
|-|-|
|`databricks model-registry`|⛔️ Manages models in the Workspace Model Registry, not in Unity Catalog.|
|`databricks model-versions`|👍 Manages model versions in the MLflow Model Registry hosted in Unity Catalog.|


```
❯ databricks model-versions list jacek_laskowski.mlflow.sklearn_model | jq 'sort_by(.version)| .[] | [.version, .storage_location]'
[
  1,
  "s3://curriculum-storage/6820268e-1541-4b52-b49e-e7135e528491/models/debf1786-b6b4-4119-8f31-940dac8036de/versions/51854a98-c4ff-460c-9b60-4883ef748022"
]
[
  2,
  "s3://curriculum-storage/6820268e-1541-4b52-b49e-e7135e528491/models/debf1786-b6b4-4119-8f31-940dac8036de/versions/1b810a18-852e-4016-a706-595a32385ec7"
]
[
  3,
  "s3://curriculum-storage/6820268e-1541-4b52-b49e-e7135e528491/models/debf1786-b6b4-4119-8f31-940dac8036de/versions/92489608-a3ab-4994-8b39-b9ab4a227a1a"
]
```
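
For anyone who prefers Python to `jq`, the same projection can be sketched as follows. It assumes the Databricks CLI is installed, authenticated, and prints JSON as it did above. The function names are hypothetical, and the sample storage locations are placeholders rather than real paths.

```python
import json
import subprocess
from typing import Any, List


def list_model_versions_json(full_name: str) -> str:
    """Run the CLI command used above and return its raw output."""
    result = subprocess.run(
        ["databricks", "model-versions", "list", full_name],
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout


def version_locations(cli_json: str) -> List[List[Any]]:
    """Mirror of the jq filter: sort by version, keep [version, storage_location]."""
    versions = json.loads(cli_json)
    ordered = sorted(versions, key=lambda v: v["version"])
    return [[v["version"], v["storage_location"]] for v in ordered]


# Placeholder input (not real storage paths), deliberately out of order.
sample = json.dumps(
    [
        {"version": 3, "storage_location": "s3://example-bucket/models/demo/versions/c"},
        {"version": 1, "storage_location": "s3://example-bucket/models/demo/versions/a"},
        {"version": 2, "storage_location": "s3://example-bucket/models/demo/versions/b"},
    ]
)

pairs = version_locations(sample)
for version, location in pairs:
    print(version, location)
```

Against a real workspace, `version_locations(list_model_versions_json("jacek_laskowski.mlflow.sklearn_model"))` would replace the placeholder input.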

# That's all Folks! 👋

![Warner Bros., Public domain, via Wikimedia Commons](https://upload.wikimedia.org/wikipedia/commons/e/ea/Thats_all_folks.svg)


# 🙋 Questions and Answers


# 💡 Ideas for Future Events

➡️ [Ideas for Future Events]($./Ideas for Future Events)