# MLflow: Read data from Parent Runs (PyCaret)

The experiment and model searched for below results from the demonstrations in

- When models are created with PyCaret (including via `create_regression_model`), runs are organized into parent and child runs.
- Parent runs contain the data (train/test).
- Child runs contain the individual model trainings.
- When experimenting with a run's results, it can be handy to read the run's data, which is stored in the parent run.
- I don't think an implementation is needed here; this is rather about trying things out and presenting an example.
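
The parent/child relationship described above is recorded by MLflow in the child run's `mlflow.parentRunId` tag. Below is a minimal sketch, in plain Python, of how the run that holds the data could be resolved from a run's tags. The helper name `resolve_data_run_id` and the fallback to the run's own ID are assumptions for illustration, not part of PyCaret or MLflow.

```python
PARENT_RUN_TAG = "mlflow.parentRunId"  # tag MLflow sets on nested (child) runs


def resolve_data_run_id(run_id: str, tags: dict[str, str]) -> str:
    """Return the ID of the run expected to hold Train.csv / Test.csv.

    Hypothetical helper: a child run points to its parent via the tag;
    a run without that tag is treated as its own data holder (assumption).
    """
    return tags.get(PARENT_RUN_TAG, run_id)


# Example with the run IDs that appear in the manual approach of this notebook
child_tags = {PARENT_RUN_TAG: "b437c14275bc4b0795b5e7f8d55d4421"}
print(resolve_data_run_id("3bd4ae8206c740b8b3251dd99b83c3ac", child_tags))
```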


## PyCaret Experiment Result Management Class

The examples below illustrate data and model retrieval from MLflow for models built with the PyCaret convenience function. Using this function implies a particular experiment organization, which lets this class simplify the retrieval of models and data.

Class: `PyCaretModelManagement`


In [1]:
from fhdw.modelling.tracking import ModelManagement

# name of the registered model
model_name = "test-modelling-tools-basic-workflow"

# "mm" is short for "model management"
mm = ModelManagement(model_name=model_name)

In [2]:
test_data = mm.get_test_data()
test_data

Unnamed: 0.1,Unnamed: 0,age,sex,bmi,children,smoker,region,charges
0,941,50,female,46.090,1,no,southeast,9549.5650
1,1219,38,female,30.210,3,no,northwest,7537.1640
2,710,18,male,35.200,1,no,southeast,1727.5400
3,1218,46,female,34.600,1,yes,southwest,41661.6000
4,668,62,male,32.015,0,yes,northeast,45710.2070
...,...,...,...,...,...,...,...,...
397,786,60,male,36.955,0,no,northeast,12741.1670
398,182,22,male,19.950,3,no,northeast,4005.4226
399,1111,38,male,38.390,3,yes,southeast,41949.2420
400,33,63,male,28.310,0,no,northwest,13770.0980


In [3]:
# Stage 'Production' is the default
train_data = mm.get_train_data()
train_data

Unnamed: 0.1,Unnamed: 0,age,sex,bmi,children,smoker,region,charges
0,89,55,female,26.98,0,no,northwest,11082.5770
1,309,41,female,33.06,2,no,northwest,7749.1562
2,582,39,male,45.43,2,no,southeast,6356.2705
3,847,23,male,50.38,1,no,southeast,2438.0552
4,11,62,female,26.29,0,yes,southeast,27808.7250
...,...,...,...,...,...,...,...,...
931,153,42,female,23.37,0,yes,northeast,19964.7460
932,425,45,male,24.31,5,no,southeast,9788.8660
933,361,35,male,30.50,1,no,southwest,4751.0700
934,904,60,female,35.10,0,no,southwest,12644.5890


In [4]:
# Stage 'Production' is the default
model = mm.get_model_at_stage()
model

ModuleNotFoundError: No module named 'catboost'
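
The error above suggests that the environment loading the model lacks a package the model was trained with (here `catboost`). Below is a minimal sketch, using only the standard library, for checking such dependencies before loading the model. The helper name `missing_modules` is hypothetical, and the list of modules to check is an assumption based on the error message.

```python
from importlib.util import find_spec


def missing_modules(module_names: list[str]) -> list[str]:
    """Return the top-level modules that cannot be found in this environment."""
    return [name for name in module_names if find_spec(name) is None]


# "catboost" is taken from the error message above
missing = missing_modules(["catboost"])
if missing:
    print(f"install before loading the model: {', '.join(missing)}")
```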

In [None]:
predict_data = test_data.set_index("Unnamed: 0").drop("charges", axis=1)
predictions = model.predict(predict_data)
predictions

array([10933.06430378,  8578.47176518,  4284.54520248, 42203.10380004,
       45595.14619939,  9023.68664342, 41856.10582783, 17948.78513324,
       15043.60671168, 20476.40479361, 12897.42615635,  4363.90721369,
        7974.13470604,  8429.71410773,  3296.11991607,  8359.44147301,
        7350.80568275, 24980.6104901 , 13715.46149288, 20004.85411078,
        2726.3645421 ,  8667.5705888 ,  9604.50576876, 44852.59215914,
       14035.49170983,  2405.66718598, 14622.53317533, 10563.29411235,
        8199.08041675,  9767.30198818, 13010.73713587, 11144.53258178,
       15763.8081674 , 19030.51129148,  7807.98215255,  2791.70364365,
       11349.31997391, 13660.90909874, 13834.78006661,  2755.89059627,
        6004.81871154, 12317.76499306,  2863.79071042, 12948.37822466,
       14483.43189432, 15997.09883761,  7273.62481919, 17408.42739625,
        2959.432491  , 13294.28417641, 11935.39244747, 12712.12311882,
       18084.4678207 , 13961.67332855,  8352.6387974 ,  4312.01826883,
      

## Manual Approach (inner workings)

The examples below illustrate the inner workings of the data and model retrieval.


In [None]:
import mlflow
import pandas as pd

client = mlflow.tracking.MlflowClient()

In [None]:
# get run_id of a Production-staged model; normally we know the name
model_name = "test-modelling-tools-basic-workflow"

production_model = client.get_latest_versions(name=model_name, stages=["Production"])

if len(production_model) != 1:
    raise ValueError(f"expected exactly 1 Production version, got {len(production_model)}")

production_model

[<ModelVersion: creation_timestamp=1704159843917, current_stage='Production', description='', last_updated_timestamp=1704159852255, name='test-modelling-tools-basic-workflow', run_id='3bd4ae8206c740b8b3251dd99b83c3ac', run_link='', source='s3://bucket/artifacts/53/3bd4ae8206c740b8b3251dd99b83c3ac/artifacts/model', status='READY', status_message='', tags={}, user_id='', version='1'>]

In [None]:
run = client.get_run(str(production_model[0].run_id))
run

<Run: data=<RunData: metrics={'MAE': 2518.8546,
 'MAPE': 0.2781,
 'MSE': 20499804.1872,
 'R2': 0.858,
 'RMSE': 4495.3765,
 'RMSLE': 0.4068,
 'TT': 2.99}, params={'CatBoost Regressor': '<catboost.core.CatBoostRegressor object at '
                       '0x7f539b6c6b10>',
 'CatBoost Regressor__border_count': '254',
 'CatBoost Regressor__depth': '6',
 'CatBoost Regressor__eta': '0.05',
 'CatBoost Regressor__l_leaf_reg': '2',
 'CatBoost Regressor__loss_function': 'RMSE',
 'CatBoost Regressor__n_estimators': '80',
 'CatBoost Regressor__random_state': '3244',
 'CatBoost Regressor__random_strength': '0.4',
 'CatBoost Regressor__task_type': 'CPU',
 'CatBoost Regressor__verbose': 'False',
 'Gradient Boosting Regressor': 'GradientBoostingRegressor(random_state=3244)',
 'Gradient Boosting Regressor__alpha': '0.9',
 'Gradient Boosting Regressor__ccp_alpha': '0.0',
 'Gradient Boosting Regressor__criterion': 'friedman_mse',
 'Gradient Boosting Regressor__init': 'None',
 'Gradient Boosting Regressor

In [None]:
parent_id = run.data.tags["mlflow.parentRunId"]
parent_id

'b437c14275bc4b0795b5e7f8d55d4421'

In [None]:
# PyCaret always saves the data in the parent run under the names "Train.csv" and "Test.csv"
data = client.download_artifacts(run_id=parent_id, path="Train.csv")
pd.read_csv(data)

Unnamed: 0.1,Unnamed: 0,age,sex,bmi,children,smoker,region,charges
0,89,55,female,26.98,0,no,northwest,11082.5770
1,309,41,female,33.06,2,no,northwest,7749.1562
2,582,39,male,45.43,2,no,southeast,6356.2705
3,847,23,male,50.38,1,no,southeast,2438.0552
4,11,62,female,26.29,0,yes,southeast,27808.7250
...,...,...,...,...,...,...,...,...
931,153,42,female,23.37,0,yes,northeast,19964.7460
932,425,45,male,24.31,5,no,southeast,9788.8660
933,361,35,male,30.50,1,no,southwest,4751.0700
934,904,60,female,35.10,0,no,southwest,12644.5890
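
The manual steps above can be combined into one function. The following is a sketch under stated assumptions, not the actual implementation of `ModelManagement`: the function name `download_parent_artifact` is hypothetical, and `client` is assumed to be an `mlflow.tracking.MlflowClient` (or any object offering the three methods used above).

```python
def download_parent_artifact(client, model_name, artifact="Train.csv", stage="Production"):
    """Download an artifact from the parent run of the model at `stage`.

    Hypothetical helper; returns whatever local path the client reports
    for the downloaded file.
    """
    versions = client.get_latest_versions(name=model_name, stages=[stage])
    if len(versions) != 1:
        raise ValueError(
            f"expected exactly 1 version in stage {stage!r}, got {len(versions)}"
        )

    # the registered model version points to the child run that trained it
    run = client.get_run(str(versions[0].run_id))

    # the child run references its parent, which holds the data
    parent_id = run.data.tags["mlflow.parentRunId"]

    return client.download_artifacts(run_id=parent_id, path=artifact)
```

With a real client, `pd.read_csv(download_parent_artifact(client, model_name, "Test.csv"))` would be expected to yield the test data in the same way as the cell above yields the training data.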
