# Artifact retrieval with MLflow run tracking, experiment tracking, and the model registry

This notebook demonstrates how MLflow versions artifacts per run.

To follow the steps in the cells below, first run:
```
mlflow server
```
in a terminal opened in the demo's subdirectory (the current working directory).
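
Before running the cells, it can help to confirm that the tracking server is reachable. The sketch below assumes the server listens on MLflow's default address, `http://127.0.0.1:5000`, and answers on a `/health` endpoint; `server_is_up` is a hypothetical helper and not part of enginora. Adjust the URL if the server was started with a different host or port.

```python
import http.client
import urllib.error
import urllib.request


def server_is_up(base_url: str = "http://127.0.0.1:5000", timeout: float = 2.0) -> bool:
    """Return True if a GET on <base_url>/health answers with HTTP 200."""
    try:
        with urllib.request.urlopen(f"{base_url}/health", timeout=timeout) as resp:
            return resp.status == 200
    except (urllib.error.URLError, http.client.HTTPException, OSError, ValueError):
        # Connection refused, timeout, bad response, or malformed URL:
        # treat all of these as "server not reachable".
        return False


if __name__ == "__main__":
    print("MLflow server reachable:", server_is_up())
```

If this prints `False`, check that `mlflow server` is still running in the terminal before continuing.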

In [1]:
import os
import logging

In [2]:
import enginora

  from .autonotebook import tqdm as notebook_tqdm


In [3]:
os.environ['WANDB_DISABLED'] = 'true'

logging.basicConfig(format="[%(filename)s:%(lineno)s - %(funcName)20s() ] %(message)s", level=logging.INFO)

In [4]:
enginora.loop()

[MlflowManager.py:69 -     set_tracking_uri() ] mlflow: Tracking uri set


Some weights of the model checkpoint at bert-base-cased were not used when initializing BertForSequenceClassification: ['cls.predictions.transform.LayerNorm.weight', 'cls.predictions.transform.LayerNorm.bias', 'cls.predictions.decoder.weight', 'cls.seq_relationship.bias', 'cls.predictions.bias', 'cls.predictions.transform.dense.bias', 'cls.seq_relationship.weight', 'cls.predictions.transform.dense.weight']
- This IS expected if you are initializing BertForSequenceClassification from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing BertForSequenceClassification from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
Some weights of BertForSequenceClassification were not initialized from the model checkpoint at b

{'loss': 2.6633, 'learning_rate': 0.0005, 'epoch': 1.0}


                                              
 50%|█████     | 5/10 [00:17<00:12,  2.58s/it]Saving model checkpoint to ClassificationBERT\checkpoint-5
Configuration saved in ClassificationBERT\checkpoint-5\config.json


{'eval_loss': 2.6577999591827393, 'eval_f1_score': 0.04761904761904762, 'eval_runtime': 4.2622, 'eval_samples_per_second': 2.346, 'eval_steps_per_second': 1.173, 'epoch': 1.0}


Model weights saved in ClassificationBERT\checkpoint-5\pytorch_model.bin
100%|██████████| 10/10 [00:32<00:00,  3.01s/it]***** Running Evaluation *****
  Num examples = 10
  Batch size = 2


{'loss': 1.703, 'learning_rate': 0.0, 'epoch': 2.0}


                                               
100%|██████████| 10/10 [00:36<00:00,  3.01s/it]Saving model checkpoint to ClassificationBERT\checkpoint-10
Configuration saved in ClassificationBERT\checkpoint-10\config.json


{'eval_loss': 3.1825222969055176, 'eval_f1_score': 0.0, 'eval_runtime': 4.2376, 'eval_samples_per_second': 2.36, 'eval_steps_per_second': 1.18, 'epoch': 2.0}


Model weights saved in ClassificationBERT\checkpoint-10\pytorch_model.bin


Training completed. Do not forget to share your model on huggingface.co/models =)


Loading best model from ClassificationBERT\checkpoint-5 (score: 0.04761904761904762).
100%|██████████| 10/10 [00:39<00:00,  3.92s/it]


{'train_runtime': 39.3932, 'train_samples_per_second': 0.508, 'train_steps_per_second': 0.254, 'train_loss': 2.1831647872924806, 'epoch': 2.0}


Registered model 'my_registered_model' already exists. Creating a new version of this model...
2023/09/01 00:05:59 INFO mlflow.tracking._model_registry.client: Waiting up to 300 seconds for model version to finish creation. Model name: my_registered_model, version 6
Created version '6' of model 'my_registered_model'.
[MlflowManager.py:65 -            log_model() ] mlflow: logging model
***** Running Prediction *****
  Num examples = 10
  Batch size = 2
100%|██████████| 5/5 [00:03<00:00,  1.37it/s][MlflowManager.py:46 -          log_metrics() ] mlflow: logging metrics :test_loss_test test_f1_score_test test_runtime_test test_samples_per_second_test test_steps_per_second_test
***** Running Prediction *****
  Num examples = 10
  Batch size = 2
10it [00:07,  1.21it/s]                      [MlflowManager.py:46 -          log_metrics() ] mlflow: logging metrics :test_loss_cont test_f1_score_cont test_runtime_cont test_samples_per_second_cont test_steps_per_second_cont
10it [00:07,  1.31it/s]

{'train_results': TrainOutput(global_step=10, training_loss=2.1831647872924806, metrics={'train_runtime': 39.3932, 'train_samples_per_second': 0.508, 'train_steps_per_second': 0.254, 'train_loss': 2.1831647872924806, 'epoch': 2.0}),
 'test_results': {'test_loss': 2.2039055824279785,
  'test_f1_score': 0.1111111111111111,
  'test_runtime': 4.1144,
  'test_samples_per_second': 2.431,
  'test_steps_per_second': 1.215},
 'control_results': {'test_loss': 2.461996555328369,
  'test_f1_score': 0.0,
  'test_runtime': 4.2001,
  'test_samples_per_second': 2.381,
  'test_steps_per_second': 1.19}}

To see which metrics were logged for the runs of an experiment, use:

In [7]:
from enginora.utils.mlflow.MlflowManager import MlflowManager
MlflowManager.display_metrics(experiment_names = ["my_experiment_3"]);

The metrics logged which you can retrieve from this experiment are: 

 - loss
 - test_steps_per_second_test
 - learning_rate
 - train_samples_per_second
 - train_steps_per_second
 - eval_runtime
 - eval_f1_score
 - epoch
 - train_loss
 - test_f1_score_cont
 - test_runtime_cont
 - eval_loss
 - total_flos
 - test_f1_score_test
 - test_steps_per_second_cont
 - eval_steps_per_second
 - test_samples_per_second_cont
 - test_loss_test
 - train_runtime
 - test_runtime_test
 - test_samples_per_second_test
 - test_loss_cont
 - eval_samples_per_second
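
The `_test` and `_cont` suffixes in this list line up with the two prediction passes in the training log, whose results are returned as `test_results` and `control_results`. How enginora attaches the suffixes is not shown in this notebook, so the following is only a guess at the idea, with `add_suffix` as a hypothetical helper. The values are copied from the output of `enginora.loop()` above.

```python
def add_suffix(metrics: dict, suffix: str) -> dict:
    """Return a copy of `metrics` with `_<suffix>` appended to every key."""
    return {f"{name}_{suffix}": value for name, value in metrics.items()}


# Two result sets that share the same metric names.
test_results = {"test_loss": 2.2039055824279785, "test_f1_score": 0.1111111111111111}
control_results = {"test_loss": 2.461996555328369, "test_f1_score": 0.0}

# Suffixing keeps the two sets from overwriting each other in a single run.
logged = {**add_suffix(test_results, "test"), **add_suffix(control_results, "cont")}

for name in sorted(logged):
    print(name, logged[name])
```

Without the suffix, both passes would write to `test_loss` and `test_f1_score`, and only one set of values would be easy to tell apart in the run.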


To inspect the runs in more detail without leaving the notebook:

In [8]:
MlflowManager.display_runs_filtered_on_metric(experiment_names = ["my_experiment_3"], metric = 'test_f1_score_test')

| Unnamed: 0 | run_id | experiment_id | status | artifact_uri | start_time | end_time | metrics.loss | metrics.test_steps_per_second_test | metrics.learning_rate | metrics.train_samples_per_second | ... | params.evaluation_strategy | params.jit_mode_eval | params.num_beam_groups | params.problem_type | params.do_eval | tags.mlflow.source.name | tags.mlflow.user | tags.mlflow.source.type | tags.mlflow.runName | tags.mlflow.log-model.history |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1e8ed1f1095f4c91a207641bde96c86c | 497116012591044347 | FINISHED | mlflow-artifacts:/497116012591044347/1e8ed1f10... | 2023-08-31 22:04:58.032000+00:00 | 2023-08-31 22:06:07.589000+00:00 | 1.703 | 1.215 | 0.0 | 0.508 | ... | epoch | False | 1 | | True | `c:\Users\ismyn\miniconda3\envs\enginora_env\li...` | janina | LOCAL | redolent-hare-140 | `[{"run_id": "1e8ed1f1095f4c91a207641bde96c86c"...` |
| 1 | 76af2a4bba124ad1832706d339650aac | 497116012591044347 | FINISHED | mlflow-artifacts:/497116012591044347/76af2a4bb... | 2023-08-27 17:55:21.686000+00:00 | 2023-08-27 17:56:14.825000+00:00 | 2.2225 | 1.367 | 0.0 | 0.597 | ... | epoch | False | 1 | | True | `c:\Users\ismyn\miniconda3\envs\enginora_env\li...` | janina | LOCAL | judicious-worm-435 | `[{"run_id": "76af2a4bba124ad1832706d339650aac"...` |
| 2 | f65cd62ed53244b69e243074c7d21aa7 | 497116012591044347 | FINISHED | mlflow-artifacts:/497116012591044347/f65cd62ed... | 2023-08-31 21:58:36.889000+00:00 | 2023-08-31 21:59:52.216000+00:00 | 2.694 | 1.264 | 0.0 | 0.42 | ... | epoch | False | 1 | | True | `c:\Users\ismyn\miniconda3\envs\enginora_env\li...` | janina | LOCAL | able-flea-223 | `[{"run_id": "f65cd62ed53244b69e243074c7d21aa7"...` |
| 3 | c221b3a836f2427089a1ef81b9d10d29 | 497116012591044347 | FINISHED | mlflow-artifacts:/497116012591044347/c221b3a83... | 2023-08-27 17:57:52.725000+00:00 | 2023-08-27 17:58:44.770000+00:00 | 2.2308 | 1.37 | 0.0 | 0.58 | ... | epoch | False | 1 | | True | `c:\Users\ismyn\miniconda3\envs\enginora_env\li...` | janina | LOCAL | sneaky-ram-43 | `[{"run_id": "c221b3a836f2427089a1ef81b9d10d29"...` |
| 4 | ee30dc7bfed446c995b491e46b8062af | 497116012591044347 | FINISHED | mlflow-artifacts:/497116012591044347/ee30dc7bf... | 2023-08-27 09:36:34.229000+00:00 | 2023-08-27 09:37:33.306000+00:00 | 2.0603 | 1.207 | 0.0 | 0.512 | ... | epoch | False | 1 | | True | `c:\Users\ismyn\miniconda3\envs\enginora_env\li...` | janina | LOCAL | resilient-slug-19 | `[{"run_id": "ee30dc7bfed446c995b491e46b8062af"...` |
| 5 | 1fa37633e8f847fb9b83f1cd0e5f5475 | 497116012591044347 | FINISHED | mlflow-artifacts:/497116012591044347/1fa37633e... | 2023-08-27 09:04:22.510000+00:00 | 2023-08-27 09:05:42.797000+00:00 | 2.4392 | 0.663 | 0.0 | 0.315 | ... | epoch | False | 1 | | True | `c:\Users\ismyn\miniconda3\envs\enginora_env\li...` | janina | LOCAL | amazing-bat-841 | |
| 6 | de274d22b65940249b04d69f50a5d0b6 | 497116012591044347 | FINISHED | mlflow-artifacts:/497116012591044347/de274d22b... | 2023-08-27 12:09:08.269000+00:00 | 2023-08-27 12:11:40.988000+00:00 | 2.947 | 0.54 | 0.0 | 0.225 | ... | epoch | False | 1 | | True | `c:\Users\ismyn\miniconda3\envs\enginora_env\li...` | janina | LOCAL | redolent-hog-322 | `[{"run_id": "de274d22b65940249b04d69f50a5d0b6"...` |
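
For readers who want a similar table without the enginora wrapper, MLflow's own `mlflow.search_runs` returns runs as a pandas DataFrame and accepts an `order_by` clause. The sketch below assumes a recent `mlflow` is installed and that the tracking server runs at the default `http://127.0.0.1:5000`. The names `order_clause` and `runs_sorted_by_metric` are hypothetical, and this notebook does not confirm that `display_runs_filtered_on_metric` works this way internally.

```python
def order_clause(metric: str, descending: bool = True) -> str:
    """Build an MLflow order_by entry such as 'metrics.<name> DESC'."""
    direction = "DESC" if descending else "ASC"
    return f"metrics.{metric} {direction}"


def runs_sorted_by_metric(experiment_name: str, metric: str,
                          tracking_uri: str = "http://127.0.0.1:5000"):
    """Return the runs of one experiment, best value of `metric` first."""
    import mlflow  # imported lazily so order_clause works without mlflow

    mlflow.set_tracking_uri(tracking_uri)
    return mlflow.search_runs(
        experiment_names=[experiment_name],
        order_by=[order_clause(metric)],
    )


print(order_clause("test_f1_score_test"))  # metrics.test_f1_score_test DESC
```

With the server running, `runs_sorted_by_metric("my_experiment_3", "test_f1_score_test")` should give a DataFrame with the same kind of `metrics.*`, `params.*`, and `tags.*` columns as the table above.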


The next cell needs an initialised MlflowManager, so make sure you have run the `loop` function first. If you would rather not run it, you can initialise the MlflowManager yourself in a cell before the download:
```
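from enginora.flow import get_configurations

# Only the first returned configuration (the MLflow one) is needed here.
mlflow_config, _, _, _, _ = get_configurations("./config.yaml")
# Initialise the manager by hand instead of calling enginora.loop().
MlflowManager(mlflow_config)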
```

In [9]:
MlflowManager.download_artifacts_filtered_on_metric(experiment_names = ["my_experiment_3"], metric = 'test_f1_score_test')

You can also download the artifacts of one specific run by passing its run_id:

In [11]:
my_run_id = '1e8ed1f1095f4c91a207641bde96c86c'
MlflowManager.download_artifacts_from_run(my_run_id)
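
A comparable call in plain MLflow is `mlflow.artifacts.download_artifacts`, which takes a `run_id` and an optional `dst_path`. This is a sketch under assumptions: `mlflow` is installed, the server runs at the default address, and the `downloads/<run_id>` layout and both helper names are illustrative choices rather than enginora's behaviour.

```python
from pathlib import Path


def destination_for(run_id: str, root: str = "downloads") -> Path:
    """Choose a local folder for a run's artifacts (hypothetical layout)."""
    return Path(root) / run_id


def download_run_artifacts(run_id: str,
                           tracking_uri: str = "http://127.0.0.1:5000") -> str:
    """Download all artifacts of `run_id` and return the local path."""
    import mlflow  # imported lazily so destination_for works without mlflow
    from mlflow.artifacts import download_artifacts

    mlflow.set_tracking_uri(tracking_uri)
    dst = destination_for(run_id)
    dst.mkdir(parents=True, exist_ok=True)
    return download_artifacts(run_id=run_id, dst_path=str(dst))


print(destination_for("1e8ed1f1095f4c91a207641bde96c86c").as_posix())
```

Keeping one folder per run_id avoids mixing files from different runs when several downloads land in the same working directory.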