# Reproducing Experiments

## Installing our packages

Our pipeline makes extensive use of our own Python modules, which are stored in the `mlops` package. We can install this package using the following command:

In [3]:
!pip install ../mlops

Processing c:\users\danie\repos\tec\itesm-mna-mlops-eq40\mlops
  Preparing metadata (setup.py): started
  Preparing metadata (setup.py): finished with status 'done'
Building wheels for collected packages: mlops
  Building wheel for mlops (setup.py): started
  Building wheel for mlops (setup.py): finished with status 'done'
  Created wheel for mlops: filename=mlops-0.1-py3-none-any.whl size=7603 sha256=7a572f84d39680429c437e2ec9d893e774657ebd9b5b47dcbf50ecf3a2112c58
  Stored in directory: C:\Users\danie\AppData\Local\Temp\pip-ephem-wheel-cache-ytx16mh2\wheels\51\fc\f7\c6c3d6aac1b0db5c708c09a6c515ddf81f34136d9edfa66add
Successfully built mlops
Installing collected packages: mlops
  Attempting uninstall: mlops
    Found existing installation: mlops 0.1
    Uninstalling mlops-0.1:
      Successfully uninstalled mlops-0.1
Successfully installed mlops-0.1



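To confirm that the install took effect without reading through the pip log, one option is to query the installed distribution metadata. The sketch below is a minimal example using only the standard library; the helper name `installed_version` is hypothetical and is not part of the `mlops` package.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(package: str) -> Optional[str]:
    """Return the installed version of `package`, or None if it is not installed."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


# "mlops" is the distribution name shown in the pip log above.
print(installed_version("mlops"))
```

If the install succeeded, this should report the same version as the log (`0.1` in the run shown here); a result of `None` suggests the package did not end up in the active environment.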


Once the package is installed, we can import any of the functions and classes that our modules export. For example, we can import the `load_data` function from the `load_data` module like this:

In [4]:
from mlops.load_data import load_data


path = '../data/raw/heart_failure_clinical_records_dataset.csv'
data = load_data(path)

display(data)

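| | age | anaemia | creatinine_phosphokinase | diabetes | ejection_fraction | high_blood_pressure | platelets | serum_creatinine | serum_sodium | sex | smoking | time | DEATH_EVENT |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 75.0 | 0 | 582 | 0 | 20 | 1 | 265000.00 | 1.9 | 130 | 1 | 0 | 4 | 1 |
| 1 | 55.0 | 0 | 7861 | 0 | 38 | 0 | 263358.03 | 1.1 | 136 | 1 | 0 | 6 | 1 |
| 2 | 65.0 | 0 | 146 | 0 | 20 | 0 | 162000.00 | 1.3 | 129 | 1 | 1 | 7 | 1 |
| 3 | 50.0 | 1 | 111 | 0 | 20 | 0 | 210000.00 | 1.9 | 137 | 1 | 0 | 7 | 1 |
| 4 | 65.0 | 1 | 160 | 1 | 20 | 0 | 327000.00 | 2.7 | 116 | 0 | 0 | 8 | 1 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 294 | 62.0 | 0 | 61 | 1 | 38 | 1 | 155000.00 | 1.1 | 143 | 1 | 1 | 270 | 0 |
| 295 | 55.0 | 0 | 1820 | 0 | 38 | 0 | 270000.00 | 1.2 | 139 | 0 | 0 | 271 | 0 |
| 296 | 45.0 | 0 | 2060 | 1 | 60 | 0 | 742000.00 | 0.8 | 138 | 0 | 0 | 278 | 0 |
| 297 | 45.0 | 0 | 2413 | 0 | 38 | 0 | 140000.00 | 1.4 | 140 | 1 | 1 | 280 | 0 |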
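Before going further, it can help to check that the loaded data has the columns the rest of the pipeline relies on. The following is a small sketch of such a check; `EXPECTED_COLUMNS` is copied from the table above, while the helper `missing_columns` is hypothetical and not part of the `mlops` package.

```python
from typing import Iterable, List

# Column names copied from the table above.
EXPECTED_COLUMNS = [
    "age", "anaemia", "creatinine_phosphokinase", "diabetes",
    "ejection_fraction", "high_blood_pressure", "platelets",
    "serum_creatinine", "serum_sodium", "sex", "smoking", "time",
    "DEATH_EVENT",
]


def missing_columns(actual: Iterable[str],
                    expected: Iterable[str] = EXPECTED_COLUMNS) -> List[str]:
    """Return the expected column names that are absent from `actual`."""
    present = set(actual)
    return [name for name in expected if name not in present]


# Example: a column list that lacks the target column.
print(missing_columns(EXPECTED_COLUMNS[:-1]))  # ['DEATH_EVENT']
```

Assuming `data` is a pandas DataFrame, as the displayed output suggests, `missing_columns(data.columns)` would be expected to return an empty list for this dataset.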


## Connecting to the MLflow tracking server

We can also use our modules to streamline MLflow tracking. For example, calling `setup_default_mlflow_connection` sets up the MLflow connection automatically:

In [5]:
from mlops.model_tracking.connection import setup_default_mlflow_connection

setup_default_mlflow_connection()
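The notebook does not show what `setup_default_mlflow_connection` does internally. As a rough illustration of the kind of setup such a helper might perform, the sketch below picks a tracking URI and passes it to `mlflow.set_tracking_uri`. Everything here is an assumption: the names `resolve_tracking_uri` and `connect` are hypothetical, and the default address is simply the server that appears in the run URL quoted later in this notebook.

```python
import os

# Assumption: the server address is taken from the run URL quoted later in this notebook.
DEFAULT_TRACKING_URI = "http://13.93.214.226:5000"


def resolve_tracking_uri(default: str = DEFAULT_TRACKING_URI) -> str:
    """Return MLFLOW_TRACKING_URI if it is set and non-empty, otherwise `default`."""
    return os.environ.get("MLFLOW_TRACKING_URI") or default


def connect(default: str = DEFAULT_TRACKING_URI) -> str:
    """Point the MLflow client at the resolved tracking server and return its URI."""
    import mlflow  # imported lazily so resolve_tracking_uri works without MLflow installed

    uri = resolve_tracking_uri(default)
    mlflow.set_tracking_uri(uri)
    return uri
```

Letting the environment variable take precedence keeps the default convenient for this notebook while allowing a different server to be selected without editing code.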

## Reproducing an existing experiment

With MLflow tracking set up, we can easily reproduce experiments. For example, the following snippet loads a model from a previous run and re-evaluates it on the test data. It also reuses our existing preprocessing and evaluation functions, which are deterministic, so the same inputs always produce the same split and the same metrics.

In [6]:
from mlops.reproducibility.load_model import load_model
from mlops.split import load_and_split
from mlops.evaluation.classification_metrics import evaluate_classification

# This is from an existing experiment run: http://13.93.214.226:5000/#/experiments/368036918666935739/runs/bc8cf7556c204f698695eef704dfaf8b
mlflow_run_id = 'bc8cf7556c204f698695eef704dfaf8b'

# Expected metrics based on the results of the prior run
expected_accuracy = 0.7333333333333333
expected_recall = 0.64
expected_precision = 0.6956521739130435

# Loads the existing model
model = load_model(mlflow_run_id)

# Loads the original dataset and splits it into training and test sets.
# This will work to reproduce the experiment because the split is deterministic.
x_train, x_test, y_train, y_test = load_and_split(path)

# Predicts the test data
y_pred = model.predict(x_test)

# Re-evaluates the model and compares it with the expected metrics
classification_metrics = evaluate_classification(y_test, y_pred)

print(f'Expected accuracy: {expected_accuracy}, obtained accuracy: {classification_metrics.accuracy}')
print(f'Expected recall: {expected_recall}, obtained recall: {classification_metrics.recall}')
print(f'Expected precision: {expected_precision}, obtained precision: {classification_metrics.precision}')

Downloading artifacts: 100%|██████████| 7/7 [00:00<00:00, 61.40it/s]

Expected accuracy: 0.7333333333333333, obtained accuracy: 0.7333333333333333
Expected recall: 0.64, obtained recall: 0.64
Expected precision: 0.6956521739130435, obtained precision: 0.6956521739130435
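Comparing the printed values by eye works for three metrics, but the comparison can also be automated. The sketch below is one possible approach using `math.isclose`; the helper `metric_mismatches` is hypothetical and not part of the `mlops` package, and the `obtained` dictionary is a stand-in for the values read from `classification_metrics`.

```python
import math
from typing import Dict, List


def metric_mismatches(expected: Dict[str, float],
                      obtained: Dict[str, float],
                      rel_tol: float = 1e-9) -> List[str]:
    """Return the names of metrics that are missing or differ from the expected value."""
    return [
        name
        for name, want in expected.items()
        if name not in obtained
        or not math.isclose(want, obtained[name], rel_tol=rel_tol)
    ]


# Expected values from the prior run, as listed in the cell above.
expected = {
    "accuracy": 0.7333333333333333,
    "recall": 0.64,
    "precision": 0.6956521739130435,
}
# Stand-in for the re-evaluated metrics; here they match, as in the output above.
obtained = dict(expected)

assert metric_mismatches(expected, obtained) == []
```

A tolerance-based comparison is used instead of `==` so that harmless floating-point differences between environments are not reported as a failed reproduction.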



