
# Intro to MLflow (with uv)

Thursday, March 6, 2025

[lu.ma](https://lu.ma/ew7alfw1) ([meetup](https://www.meetup.com/warsaw-data-engineering/events/306501683/))


# Agenda

1. **10 minutes** Announcements. Time for crazy ideas for future meetups 👻
    * News (new versions, new features, etc.)
1. **55 minutes** Live coding session with MLflow and uv:
    * Project setup
    * [MLflow Tracking Quickstart](https://mlflow.org/docs/latest/getting-started/intro-quickstart/index.html)
    * (potentially) MLflow and Databricks Asset Bundles
1. **10 minutes** Q&A and collecting ideas for future editions

Total meetup duration: **1h 15min**


# LinkedIn Poll

[Poll](https://www.linkedin.com/feed/update/urn:li:activity:7302063410647638016?utm_source=share&utm_medium=member_desktop&rcm=ACoAAAAutA8BdJM8iCUgt2VrqrjL8sihttmw9FE)


# Event Question

What would you like to hear about during the meetup? Share an interesting idea for future editions 🙏

1. Apache Flink
1. streaming - checkpoints, offsets, commits


# 📢 News

Things worth watching out for...


## New members in Warsaw Data Engineering!

1. Growing knowledge
1. Data engineering


## New Versions

What has changed in the tooling space we keep an eye on since we last met?

* [Databricks CLI 0.243.0](https://github.com/databricks/cli/releases/tag/v0.243.0)
* [MLflow 2.20.3](https://github.com/mlflow/mlflow/releases/tag/v2.20.3)
* [uv 0.6.4](https://github.com/astral-sh/uv/releases/tag/0.6.4)
* [Pydantic 2.10.6](https://github.com/pydantic/pydantic/releases/tag/v2.10.6)
* [brickflow 1.3.2](https://github.com/Nike-Inc/brickflow/releases/tag/v1.3.2)
* [dqx 0.1.13](https://github.com/databrickslabs/dqx/releases/tag/v0.1.13)
* [ruff 0.9.9](https://github.com/astral-sh/ruff/releases/tag/0.9.9)


# 👀 In the spotlight: `pytest`

This is a regular section in our schedule, at least until we run out of... interest in digging deeper and learning more.


## Parametrizing tests

[Parametrizing tests](https://docs.pytest.org/en/stable/example/parametrize.html)

1. Create a brand new Python sandbox project (`uv init --bare`)
1. Add a dev / test dependency on `pytest` (`uv add --dev pytest`)
1. Write a test to demo the feature (run it with `uv run pytest -s` to see the `print` output)


```py
import pytest


@pytest.mark.parametrize("version", [0, "0"])
def test_one(version):
    print(f"version={version} (type: {type(version)})")
    assert True
```
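
A slightly extended sketch of the same idea, assuming we want readable test names and more than one argument per case. The test name `test_version_type` and the ids `int-zero` / `str-zero` are made up for illustration.

```python
import pytest


# Each pytest.param is one test case; `id` gives it a readable name in the report.
@pytest.mark.parametrize(
    ("version", "expected_type"),
    [
        pytest.param(0, int, id="int-zero"),
        pytest.param("0", str, id="str-zero"),
    ],
)
def test_version_type(version, expected_type):
    # The same test body runs once per parameter set.
    assert isinstance(version, expected_type)
```

Running `pytest -v` lists each case by its id, e.g. `test_version_type[int-zero]`.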


# Live Coding Session


I asked Copilot:

> Imagine you're an expert in mlflow. Please give an advice on how I can get started with mlflow? Write your advice in 5 action items.

Copilot replied with 5 action items, which became the headings of this section.

According to Copilot, by following these steps, you'll be well on your way to leveraging MLflow for managing your machine learning lifecycle.


## Create MLflow Project

```bash
uv init --bare mlflow-sandbox
cd mlflow-sandbox
```


## Install MLflow

Copilot:

> Installing MLflow is the first step. Ensure you have Python and pip installed on your system. This will set up the necessary environment to start using MLflow.

We'll use `uv` instead.

```bash
uv add mlflow
```


## Set Up a Tracking Server

Copilot:

> Start the MLflow tracking server.
> 
> Setting up a tracking server allows you to log and visualize your experiments.
> Using SQLite for the backend store is a simple way to get started, and you can store artifacts locally.

Again, we'll use `uv` for this.

> [!NOTE]
> 
> MLflow tracking server is also known as [MLflow Experiment Tracking Server](https://github.com/mlflow/mlflow/issues/6348).

```bash
uv run mlflow server \
    --host 127.0.0.1 \
    --port 8080
```
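
Before logging anything, it can help to confirm the server is actually listening. The sketch below assumes the tracking server answers `GET /health` with HTTP 200 (check this against your MLflow version); the helper name `is_tracking_server_up` is hypothetical.

```python
import urllib.error
import urllib.request


def is_tracking_server_up(base_url: str, timeout: float = 2.0) -> bool:
    """Return True if GET <base_url>/health answers with HTTP 200."""
    health_url = f"{base_url.rstrip('/')}/health"
    try:
        with urllib.request.urlopen(health_url, timeout=timeout) as resp:
            return resp.status == 200
    except (urllib.error.URLError, OSError):
        # Connection refused, timeout, DNS failure, non-2xx answer, ...
        return False


if __name__ == "__main__":
    # Host and port match the `mlflow server` command above.
    print(is_tracking_server_up("http://127.0.0.1:8080"))
```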


## Log Your First Experiment

Copilot:

> Use MLflow to log parameters, metrics, and models in your code.
> 
> Logging experiments helps you keep track of different runs and their outcomes.
> This is crucial for comparing models and understanding what works best.

Run the Python interpreter and mimic a regular experiment job.

<br>

```bash
uv run python
```

<br>

Type the following code in the Python interpreter.

<br>

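```py
import mlflow
from mlflow.models import infer_signature
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Train a basic logistic regression model on the iris dataset
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
params = {"solver": "lbfgs", "max_iter": 1000, "random_state": 8888}
lr = LogisticRegression(**params).fit(X_train, y_train)
accuracy = accuracy_score(y_test, lr.predict(X_test))

# Point the client at the tracking server started above
mlflow.set_tracking_uri(uri="http://127.0.0.1:8080")

# Create a new MLflow Experiment
mlflow.set_experiment("Meetup Demo")

with mlflow.start_run():
    mlflow.log_params(params)
    mlflow.log_metric("accuracy", accuracy)
    mlflow.set_tag("Training Info", "Basic LR model for iris data")
    # Infer the model signature from the training inputs and predictions
    signature = infer_signature(X_train, lr.predict(X_train))
    model_info = mlflow.sklearn.log_model(
        sk_model=lr,
        artifact_path="iris_model",
        signature=signature,
        input_example=X_train,
        registered_model_name="tracking-quickstart",
    )
```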


## Explore the MLflow UI

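The tracking server above listens on port 8080, so open http://127.0.0.1:8080 rather than port 5000 (MLflow's default) that Copilot suggests.

Copilot: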

> Access the MLflow UI by navigating to http://localhost:5000 in your browser.
> 
> The MLflow UI provides a visual interface to explore your experiments.
> You can compare runs, visualize metrics, and manage artifacts, making it easier to analyze your results.


## Integrate MLflow with Your Workflow

Copilot:

> Incorporate MLflow into your existing machine learning workflow.
> 
> MLflow can be integrated with various ML libraries like TensorFlow, PyTorch, and Scikit-learn.
> This allows you to log models and parameters seamlessly within your existing projects, enhancing reproducibility and collaboration.


## Follow-Up Prompts

1. Can you explain how to deploy models with MLflow?
    * Save (and Register) Your Model
    * Serve the Model
    * Create a Docker Image
    * Deploy to a Cloud Service / Deploy the Model to SageMaker
    * Monitor and Update the Model


## Learn More

1. https://mlflow.org/docs/latest/deployment/deploy-model-to-sagemaker.html
1. https://docs.aws.amazon.com/sagemaker/latest/dg/mlflow-track-experiments-model-deployment.html
1. https://docs.aws.amazon.com/sagemaker/latest/dg/mlflow-tutorials.html
1. https://aws.amazon.com/blogs/aws/manage-ml-and-generative-ai-experiments-using-amazon-sagemaker-with-mlflow/
1. https://aws.amazon.com/blogs/machine-learning/managing-your-machine-learning-lifecycle-with-mlflow-and-amazon-sagemaker/


# `uv sync -U`


# uv as the learning environment

<br>

```bash
uv init --bare uv-spark
cd uv-spark
uv add pyspark
uv run pyspark
```


# 💡 Ideas for Future Events

1. [Delta Live Tables](https://docs.databricks.com/en/delta-live-tables/index.html) with uv and pydantic
1. Explore more [Pydantic](https://docs.pydantic.dev/latest/) features
1. Create a new DAB template with `uv` as the project management tool (based on `default-python` template). Start from `databricks bundle init --help`.
