# MLflow: Read data from Parent Runs (PyCaret)

The experiment and model searched for below result from the demonstrations in

- When creating models with PyCaret (also via `create_regression_model`), runs are organized into parent and child runs.
- Parent runs contain the data (train/test).
- Child runs contain the individual model trainings.
- When experimenting with a run's results, it can be handy to read the data the run was trained on, which is stored in the parent run.
- We probably don't need an implementation here; this is rather about trying things out and presenting an example.
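
The lookup described in the list above can be sketched as two small helpers. This is only an illustration: the function names `get_parent_run_id` and `download_parent_artifact` are hypothetical, and `client` is assumed to be an `mlflow.tracking.MlflowClient` (or any object offering the same `get_run` and `download_artifacts` methods used in the cells below).

```python
from typing import Optional

# Tag that MLflow sets on a child (nested) run to point at its parent run.
PARENT_TAG = "mlflow.parentRunId"


def get_parent_run_id(client, run_id: str) -> Optional[str]:
    """Return the parent run id of `run_id`, or None for a top-level run."""
    run = client.get_run(run_id)
    return run.data.tags.get(PARENT_TAG)


def download_parent_artifact(client, run_id: str, artifact_path: str = "Train.csv") -> str:
    """Download an artifact from the parent of `run_id` and return its local path."""
    parent_id = get_parent_run_id(client, run_id)
    if parent_id is None:
        raise ValueError(f"run {run_id} has no parent run")
    return client.download_artifacts(run_id=parent_id, path=artifact_path)
```

Using `.get` on the tags means a run without a parent yields `None` instead of a `KeyError`, so the caller can decide how to handle top-level runs.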


In [7]:
import mlflow
import pandas as pd

client = mlflow.tracking.MlflowClient()

In [8]:
# get run_id of a Production-staged model; normally we know the name
model_name = "test-modelling-tools-basic-workflow"

production_model = client.get_latest_versions(name=model_name, stages=["Production"])

if len(production_model) != 1:
    raise ValueError(
        f"unexpected number of Production versions: expected 1, got {len(production_model)}"
    )

production_model

[<ModelVersion: creation_timestamp=1704159843917, current_stage='Production', description='', last_updated_timestamp=1704159852255, name='test-modelling-tools-basic-workflow', run_id='3bd4ae8206c740b8b3251dd99b83c3ac', run_link='', source='s3://bucket/artifacts/53/3bd4ae8206c740b8b3251dd99b83c3ac/artifacts/model', status='READY', status_message='', tags={}, user_id='', version='1'>]

In [9]:
run = client.get_run(str(production_model[0].run_id))
run

<Run: data=<RunData: metrics={'MAE': 2518.8546,
 'MAPE': 0.2781,
 'MSE': 20499804.1872,
 'R2': 0.858,
 'RMSE': 4495.3765,
 'RMSLE': 0.4068,
 'TT': 2.99}, params={'CatBoost Regressor': '<catboost.core.CatBoostRegressor object at '
                       '0x7f539b6c6b10>',
 'CatBoost Regressor__border_count': '254',
 'CatBoost Regressor__depth': '6',
 'CatBoost Regressor__eta': '0.05',
 'CatBoost Regressor__l_leaf_reg': '2',
 'CatBoost Regressor__loss_function': 'RMSE',
 'CatBoost Regressor__n_estimators': '80',
 'CatBoost Regressor__random_state': '3244',
 'CatBoost Regressor__random_strength': '0.4',
 'CatBoost Regressor__task_type': 'CPU',
 'CatBoost Regressor__verbose': 'False',
 'Gradient Boosting Regressor': 'GradientBoostingRegressor(random_state=3244)',
 'Gradient Boosting Regressor__alpha': '0.9',
 'Gradient Boosting Regressor__ccp_alpha': '0.0',
 'Gradient Boosting Regressor__criterion': 'friedman_mse',
 'Gradient Boosting Regressor__init': 'None',
 'Gradient Boosting Regressor

In [10]:
parent_id = run.data.tags["mlflow.parentRunId"]
parent_id

'b437c14275bc4b0795b5e7f8d55d4421'
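
Going the other way, from a parent run to its child runs, can be sketched with `search_runs` and a filter on the same tag. The helper name `list_child_runs` is hypothetical, and the sketch assumes the child runs live in the same experiment as the parent.

```python
def list_child_runs(client, parent_run_id: str):
    """Return the runs whose `mlflow.parentRunId` tag points at `parent_run_id`."""
    parent = client.get_run(parent_run_id)
    # Filter on the tag that MLflow sets on every child (nested) run.
    query = f"tags.mlflow.parentRunId = '{parent_run_id}'"
    return client.search_runs(
        experiment_ids=[parent.info.experiment_id],  # assumption: same experiment
        filter_string=query,
    )
```

With the `client` and `parent_id` from the cells above, `list_child_runs(client, parent_id)` would be expected to include the run behind the Production model.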

In [11]:
# PyCaret always saves the data to the parent run under the names "Train.csv" and "Test.csv"
data = client.download_artifacts(run_id=parent_id, path="Train.csv")
pd.read_csv(data)

Unnamed: 0.1,Unnamed: 0,age,sex,bmi,children,smoker,region,charges
0,89,55,female,26.98,0,no,northwest,11082.5770
1,309,41,female,33.06,2,no,northwest,7749.1562
2,582,39,male,45.43,2,no,southeast,6356.2705
3,847,23,male,50.38,1,no,southeast,2438.0552
4,11,62,female,26.29,0,yes,southeast,27808.7250
...,...,...,...,...,...,...,...,...
931,153,42,female,23.37,0,yes,northeast,19964.7460
932,425,45,male,24.31,5,no,southeast,9788.8660
933,361,35,male,30.50,1,no,southwest,4751.0700
934,904,60,female,35.10,0,no,southwest,12644.5890
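
The `Unnamed: 0.1` and `Unnamed: 0` columns in the output above look like index columns that were written into the CSV when it was saved; this is an assumption based on their names, not something the run metadata confirms. A minimal sketch for dropping them after loading follows. The helper name `drop_unnamed_columns` is hypothetical, and the inline sample reuses two rows and a subset of the columns from the output above.

```python
import io

import pandas as pd


def drop_unnamed_columns(df: pd.DataFrame) -> pd.DataFrame:
    """Return `df` without the columns whose name starts with 'Unnamed:'."""
    keep = ~df.columns.str.startswith("Unnamed:")
    return df.loc[:, keep]


# Small inline sample so the sketch runs without an MLflow server.
sample_csv = io.StringIO(
    "Unnamed: 0.1,Unnamed: 0,age,smoker,charges\n"
    "0,89,55,no,11082.5770\n"
    "1,309,41,no,7749.1562\n"
)
cleaned = drop_unnamed_columns(pd.read_csv(sample_csv))
print(list(cleaned.columns))  # ['age', 'smoker', 'charges']
```

In the notebook, the same helper could be applied as `drop_unnamed_columns(pd.read_csv(data))`.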
