# MLflow Tracking

MLflow Tracking is an API and UI for logging parameters, code versions, metrics, and output files when you run machine learning code, and for visualizing the results later.

## Concepts

![Taken from MLflow Docs](https://mlflow.org/docs/latest/_images/tracking-basics.png)

**Runs**

MLflow Tracking is organized around the concept of runs. A run is one execution of a piece of data science code, for example a single `python train.py` execution.


**Experiments** 

An experiment groups together runs for a specific task. 




In [1]:
import mlflow 
from mlflow_for_ml_dev.utils.utils import get_root_project

# Create an MLflow Experiment

We can create an MLflow experiment using:

```python
mlflow.create_experiment(name: str, artifact_location: Optional[str] = None, tags: Optional[Dict[str, Any]] = None) → str
```

Parameters:

* **name** – The experiment name, which must be a unique string.

* **artifact_location** – The location to store run artifacts. If not provided, the server picks an appropriate default.

* **tags** – An optional dictionary of string keys and values to set as tags on the experiment.

In [2]:
# creating a new experiment providing the name
experiment_id = mlflow.create_experiment(name="test-experiment")
print(experiment_id)

962178648349282381


In [3]:
with mlflow.start_run(run_name="test-run") as run:
    print(f"Run ID: {run.info.run_id}")
    mlflow.log_param("param1", 1)

Run ID: 4566c1fc7d9d46dd94a726f8362a8524


## Set an experiment as the active experiment

We can set an experiment as the active experiment using:

```python
mlflow.set_experiment(experiment_name: Optional[str] = None, experiment_id: Optional[str] = None) → Experiment
```

Parameters:

* **experiment_name** – Case sensitive name of the experiment to be activated.

* **experiment_id** – ID of the experiment to be activated. If an experiment with this ID does not exist, an exception is thrown.

In [4]:
experiment = mlflow.set_experiment(experiment_name="test-experiment")

In [5]:
print(f"Object Type: {type(experiment).__name__} \n")
print(experiment.to_proto())

Object Type: Experiment 

experiment_id: "962178648349282381"
name: "test-experiment"
artifact_location: "file:///c:/Users/manue/projects/mlflow_for_ml_dev/mlflow_for_ml_dev/notebooks/experiments/mlruns/962178648349282381"
lifecycle_stage: "active"
last_update_time: 1726370197024
creation_time: 1726370197024



In [6]:
with mlflow.start_run(run_name="test-run") as run:
    print(f"Run ID: {run.info.run_id}")
    mlflow.log_param("param1", 1)

Run ID: ea113fca822f4a638fb26c55768d4aa5


In [7]:
# Logging an artifact
with mlflow.start_run(run_name="test-run") as run:
    print(f"Run ID: {run.info.run_id}")
    mlflow.log_text("This is a text artifact", "artifact.txt")


Run ID: 6ecfdcd3dbec4259ae85a54dc47d9652


## Creating an experiment using a different artifact location

In [8]:
experiment_id = mlflow.create_experiment(name="test-experiment-2", artifact_location=get_root_project().as_uri())
print(experiment_id)

378796031032188106


In [9]:
experiment = mlflow.set_experiment(experiment_name="test-experiment-2")

In [10]:
# Logging a parameter and an artifact
run_ids = []
with mlflow.start_run(run_name="test-run") as run:
    print(f"Run ID: {run.info.run_id}")
    mlflow.log_param("param1", 1)
    mlflow.log_text("This is a text artifact", "artifact.txt")

    run_ids.append(run.info.run_id)

Run ID: bb3e79f219474599b398ef040b2f4f87


In [11]:
# Logging an artifact and a parameter
with mlflow.start_run(run_name="test-run") as run:
    print(f"Run ID: {run.info.run_id}")
    mlflow.log_text("This is a text artifact", "artifact.txt")
    mlflow.log_param("param1", 1)

    run_ids.append(run.info.run_id)

Run ID: eea788bbed5a4c3b8cab1888292bc57f


### Clean Up

In [12]:
import shutil

shutil.rmtree("mlruns")
print("mlruns removed")

# Remove the run ID folders created under the project root
for run_id in run_ids:
    shutil.rmtree(get_root_project() / run_id)
    print(f"Run {run_id} removed")

mlruns removed
Run bb3e79f219474599b398ef040b2f4f87 removed
Run eea788bbed5a4c3b8cab1888292bc57f removed
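
Running the clean-up cell a second time fails, because `shutil.rmtree` raises `FileNotFoundError` when the folder is already gone. A minimal sketch of a more forgiving clean-up is shown below. The helper name `remove_directories` is hypothetical and not part of this project; the demonstration uses a temporary directory instead of the real project root.

```python
import shutil
import tempfile
from pathlib import Path
from typing import Iterable, List


def remove_directories(paths: Iterable[Path]) -> List[Path]:
    """Delete each existing directory in `paths` and return the ones removed."""
    removed = []
    for path in paths:
        # Skip anything that is missing so the cell can be re-run safely.
        if path.is_dir():
            shutil.rmtree(path)
            removed.append(path)
    return removed


# Demonstration on a throwaway directory rather than a real project.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "mlruns").mkdir()
    removed = remove_directories([root / "mlruns", root / "missing-run-id"])
    removed_names = [p.name for p in removed]
    print(removed_names)  # ['mlruns']
```

In the notebook, the same helper could be called with `Path("mlruns")` and the `get_root_project() / run_id` folders.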


# Specify Tracking URI

We can specify the tracking URI using:

```python
mlflow.set_tracking_uri(uri: Union[str, pathlib.Path]) → None
```
* uri
    * An empty string, or a local file path, prefixed with file:/. Data is stored locally at the provided file (or ./mlruns if empty).

    * An HTTP URI like https://my-tracking-server:5000.

    * A Databricks workspace, provided as the string “databricks” or, to use a Databricks CLI profile, “databricks://<profileName>”.
    
    * A pathlib.Path instance
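
For the local file option, the URI can be built from a `pathlib.Path` with the standard library alone. The sketch below assumes a hypothetical helper name, `local_tracking_uri`, and a relative `mlruns` folder; neither comes from MLflow itself.

```python
from pathlib import Path


def local_tracking_uri(directory: Path) -> str:
    """Return a file URI for a local tracking directory."""
    # as_uri() only works on absolute paths, so resolve the path first.
    return directory.resolve().as_uri()


uri = local_tracking_uri(Path("mlruns"))
# Prints a "file://" URI; the exact value depends on the working directory.
print(uri)
```

The resulting string is the kind of value that can be passed to `mlflow.set_tracking_uri`.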


In [13]:
artifact_location = get_root_project() / "mlruns"

mlflow.set_tracking_uri(artifact_location.as_uri())
print(artifact_location)

C:\Users\manue\projects\mlflow_for_ml_dev\mlruns


In [14]:
# Create experiment
experiment_name = "test-experiment-3"
experiment_id = mlflow.create_experiment(name=experiment_name)
print(experiment_id)

783704071033849895


After creating an experiment, you need to set it as the active experiment; otherwise MLflow logs new runs to the Default experiment.

In [15]:
experiment = mlflow.set_experiment(experiment_name=experiment_name)

print(f"Object Type: {type(experiment).__name__} \n")
print(experiment.to_proto())

Object Type: Experiment 

experiment_id: "783704071033849895"
name: "test-experiment-3"
artifact_location: "file:///C:/Users/manue/projects/mlflow_for_ml_dev/mlruns/783704071033849895"
lifecycle_stage: "active"
last_update_time: 1726370197570
creation_time: 1726370197570



In [16]:
with mlflow.start_run(run_name="test-run") as run:
    
    print(f"Run ID: {run.info.run_id}")

    mlflow.log_param("param1", 1)
    mlflow.log_text("This is a text artifact", "artifact.txt")

Run ID: 3bbd4ec48e774c19a697b14cac1d8f2f


### Error!

In [17]:
mlflow.create_experiment(name=experiment_name)

MlflowException: Experiment 'test-experiment-3' already exists.

# Two Options for Handling an Existing Experiment

## Capture the exception

In [20]:
try:
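    # create_experiment returns the new experiment's ID, not an Experiment object
    experiment_id = mlflow.create_experiment(name=experiment_name)
except mlflow.exceptions.MlflowException as e: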
    print(f"Error: {e}")

Error: Experiment 'test-experiment-3' already exists.


## Use `mlflow.set_experiment`

In [21]:
# Since the experiment already exists, we can set it as the active experiment
experiment = mlflow.set_experiment(experiment_name = experiment_name)

# set_experiment also creates the experiment if it does not exist yet
new_experiment_name = "main-concepts"
experiment = mlflow.set_experiment(experiment_name = new_experiment_name)

print(f"Object Type: {type(experiment).__name__} \n")
print(experiment.to_proto())

2024/09/14 22:17:01 INFO mlflow.tracking.fluent: Experiment with name 'main-concepts' does not exist. Creating a new experiment.


Object Type: Experiment 

experiment_id: "830047635022214398"
name: "main-concepts"
artifact_location: "file:///C:/Users/manue/projects/mlflow_for_ml_dev/mlruns/830047635022214398"
lifecycle_stage: "active"
last_update_time: 1726370221634
creation_time: 1726370221634



In [22]:
# adding some data to the experiment
with mlflow.start_run(run_name="test-run") as run:
    
    print(f"Run ID: {run.info.run_id}")

    mlflow.log_param("param1", 1)
    mlflow.log_text("This is a text artifact", "artifact.txt")

Run ID: 47c3f3bdb5e442eeb103c876ec3c2d95


In [23]:
from mlflow_for_ml_dev.experiments.exp_utils import get_or_create_experiment

In [24]:
experiment = get_or_create_experiment(experiment_name = new_experiment_name)

print(f"Object Type: {type(experiment).__name__} \n")
print(experiment.to_proto())

Object Type: Experiment 

experiment_id: "830047635022214398"
name: "main-concepts"
artifact_location: "file:///C:/Users/manue/projects/mlflow_for_ml_dev/mlruns/830047635022214398"
lifecycle_stage: "active"
last_update_time: 1726370221634
creation_time: 1726370221634
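
The notebook imports `get_or_create_experiment` from the project package without showing its body. The following is a minimal sketch of how such a helper might work, not the project's actual implementation. It assumes `client` behaves like `mlflow.MlflowClient()`, and the function name `get_or_create_experiment_sketch` is hypothetical.

```python
from typing import Any, Dict, Optional


def get_or_create_experiment_sketch(
    client: Any,
    experiment_name: str,
    tags: Optional[Dict[str, str]] = None,
) -> Any:
    """Return the experiment called `experiment_name`, creating it if needed.

    `client` is assumed to expose the MlflowClient methods
    get_experiment_by_name, create_experiment and get_experiment.
    """
    experiment = client.get_experiment_by_name(experiment_name)
    if experiment is None:
        # create_experiment returns only the new ID, so fetch the full entity.
        experiment_id = client.create_experiment(experiment_name, tags=tags)
        experiment = client.get_experiment(experiment_id)
    return experiment
```

A call could look like `get_or_create_experiment_sketch(mlflow.MlflowClient(), "main-concepts")`. In this sketch, `tags` are only applied when the experiment is created, and the active experiment is left unchanged, unlike `mlflow.set_experiment`.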



## Adding tags

In [25]:
experiment_name = "main-concepts-02"
experiment = get_or_create_experiment(
    experiment_name=experiment_name,
    tags={"topic":"experiment_management", "project_name":"UNKNOWN"}
)

In [26]:
# get the experiment tags
experiment.tags

{'project_name': 'UNKNOWN', 'topic': 'experiment_management'}

## Adding a description

In [27]:
experiment_name = "main-concepts-03"
experiment = get_or_create_experiment(
    experiment_name=experiment_name,
    tags={
        "topic":"experiment_management",
        "project_name":"UNKNOWN",
        "mlflow.note.content":"This is a test experiment"})

In [28]:
# get the experiment tags
experiment.tags

{'mlflow.note.content': 'This is a test experiment',
 'project_name': 'UNKNOWN',
 'topic': 'experiment_management'}

## Update Tags

In [29]:
experiment_name = "main-concepts-04"
experiment = get_or_create_experiment(experiment_name)
# experiment tags
experiment.tags

{}

In [30]:
tags = {
    "tag1": "value1",
    "tag2": "value2"
}
mlflow.set_experiment_tags(tags=tags)


In [31]:
# get the updated experiment object
experiment = get_or_create_experiment(experiment_name)

# get the experiment tags
experiment.tags

{'tag1': 'value1', 'tag2': 'value2'}

In [32]:
# Update Value of tag1
mlflow.set_experiment_tag(key="tag1", value="new_value1")

# get the updated experiment object
experiment = get_or_create_experiment(experiment_name)

In [33]:
# get the experiment tags
experiment.tags

{'tag1': 'new_value1', 'tag2': 'value2'}

## Using the client to set a tag

In [34]:
client = mlflow.MlflowClient()

In [35]:
experiment.experiment_id

'919778303368787007'

In [36]:
client.set_experiment_tag(experiment_id = experiment.experiment_id, key="tag3", value="value3")

In [37]:
experiment = get_or_create_experiment(experiment_name)

# get the experiment tags
experiment.tags

{'tag1': 'new_value1', 'tag2': 'value2', 'tag3': 'value3'}
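
`mlflow.set_experiment_tags` acts on the currently active experiment, while the client call takes an explicit experiment ID. The sketch below sets several tags on a specific experiment by looping over the client method. The helper name `set_experiment_tags_by_id` is hypothetical, and `client` is assumed to behave like `mlflow.MlflowClient()`.

```python
from typing import Any, Dict


def set_experiment_tags_by_id(
    client: Any, experiment_id: str, tags: Dict[str, str]
) -> None:
    """Set every key/value pair in `tags` on the given experiment."""
    for key, value in tags.items():
        # One client call per tag; an existing key is overwritten.
        client.set_experiment_tag(experiment_id, key, value)
```

With the objects from the cells above, this could be called as `set_experiment_tags_by_id(client, experiment.experiment_id, {"tag3": "value3"})`.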

## Rename Experiment

In [38]:
new_name = "main-concepts-04-renamed"
client.rename_experiment(experiment_id = experiment.experiment_id, new_name=new_name)

In [39]:
experiment = get_or_create_experiment(new_name)

experiment.name

'main-concepts-04-renamed'

## Clean Up

In [40]:
main_concepts_experiments = mlflow.search_experiments(filter_string="name LIKE 'main-concepts%'")
test_experiments = mlflow.search_experiments(filter_string="name LIKE 'test-experiment%'")
experiments = main_concepts_experiments + test_experiments

In [41]:
for exp in experiments:
    print(exp.experiment_id, exp.name, exp.tags)

919778303368787007 main-concepts-04-renamed {'tag1': 'new_value1', 'tag2': 'value2', 'tag3': 'value3'}
515914290847591603 main-concepts-03 {'mlflow.note.content': 'This is a test experiment', 'project_name': 'UNKNOWN', 'topic': 'experiment_management'}
597953269128446408 main-concepts-02 {'project_name': 'UNKNOWN', 'topic': 'experiment_management'}
830047635022214398 main-concepts {}
783704071033849895 test-experiment-3 {}


In [42]:
for experiment in experiments:
    print(f"Deleting: {experiment.name}")
    mlflow.delete_experiment(experiment.experiment_id)

Deleting: main-concepts-04-renamed
Deleting: main-concepts-03
Deleting: main-concepts-02
Deleting: main-concepts
Deleting: test-experiment-3
