# MLflow Notes



## Setting/Connecting servers and experiments

https://mlflow.org/docs/latest/tracking/server.html

`mlflow server`: Starts a tracking server with default settings. Running this command from a different directory in the command prompt can lead to a different dashboard, since the server may be reading a different `mlruns` directory on your local device. <br>
`mlflow server --host 127.0.0.1 --port 8080`: Starts the tracking server on the given host and port. <br>
`mlflow ui`: Starts the local UI dashboard.


#### Connect to local host

In [None]:
import mlflow
from mlflow import MlflowClient

mlflow.set_tracking_uri(uri="http://127.0.0.1:8080")
client = MlflowClient(tracking_uri="http://127.0.0.1:8080")

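Alternatively, set the environment variable in your shell before starting Python: `export MLFLOW_TRACKING_URI=http://127.0.0.1:8080` (in Windows cmd, use `set` instead of `export`).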


#### Create an experiment

In [None]:
experiment_description = (
    "Write your description here."
)

# You can set your own experiment tags here
experiment_tags = {
    "project_name": "the project name",
    "your tags": "value",
    "mlflow.note.content": experiment_description,
}

# Raises an error if an experiment with this name already exists
client.create_experiment(name="experiment_name", tags=experiment_tags)

#### Search for an experiment

In [None]:
all_experiments = client.search_experiments()
# Filter experiments by tag
apples_experiment = client.search_experiments(
    filter_string="tags.`project_name` = 'grocery-forecasting'"
)

#### Set an experiment

In [None]:
# Makes the named experiment active; it is created if it does not exist yet
apple_experiment = mlflow.set_experiment("Apple_Models")

## Run configuration

#### Initialize the run

In [None]:
# If run_name is not set, MLflow auto-generates one. Note that run names are not unique: two runs can share the same name
run_name = "apples_rf_test"

with mlflow.start_run(run_name=run_name) as run:
    print(run.info.run_id) # run id of the current run

#### Log params and metrics

Key-value pairs

In [None]:
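# Log several parameters at once from a dict of key-value pairs
params = {"solver": "lbfgs", "max_iter": 1000, "multi_class": "auto", "random_state": 8888}
mlflow.log_params(params)
# Or log a single parameter
mlflow.log_param("solver", "lbfgs")

# log_metrics takes a dict of metric name -> numeric value, like log_params;
# `metrics` is assumed to be defined earlier
mlflow.log_metrics(metrics)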

#### Log artifact

You can log a local file or directory to MLflow as an artifact of the active run.

In [None]:
mlflow.log_artifact("local file or directory path")

#### Set tag

Key-value pairs

In [None]:
mlflow.set_tag("Training Info", "Basic LR model for iris data")

#### Set signature

In MLflow, a signature describes the inputs and outputs of a machine learning model or function. It **defines the expected data types and shapes of the inputs and outputs**, which lets MLflow validate input data when the model is served.

In [None]:
from mlflow.models import infer_signature

# Passed as the `signature` argument of log_model, see below
signature = infer_signature(X_train, lr.predict(X_train))

#### Manually log the model

In [None]:
model_info = mlflow.sklearn.log_model(
    sk_model=lr,  # the trained model
    artifact_path="iris_model",  # path of the model within the run's artifacts
    signature=signature,
    input_example=X_train,
    # Optional: if registered_model_name is omitted, the model is logged but not registered.
    # You can still register it manually in the dashboard with "Register Model" on the right-hand side.
    registered_model_name="tracking-quickstart",
)

#### Autolog

`mlflow.autolog()` is a convenience function that automatically logs several aspects of your machine learning code and environment during training or model fitting. It lets you track and record important information without writing explicit logging statements.

In [None]:
import mlflow
from sklearn.linear_model import LogisticRegression

# Enable automatic logging
mlflow.autolog()
# To turn autologging off again later, call:
# mlflow.autolog(disable=True)

# Start an MLflow run
with mlflow.start_run():

    # Create and fit a scikit-learn model
    model = LogisticRegression()
    model.fit(X_train, y_train)

    # The metrics, parameters, artifacts, and model are automatically logged

#### Query runs

In [None]:
client = mlflow.tracking.MlflowClient()
experiment_id = "0"  # experiment ID; "0" refers to the 'Default' experiment
best_run = client.search_runs(
    [experiment_id], order_by=["metrics.val_loss ASC"], max_results=1
)[0]
print(best_run.info)  # run metadata such as run_id, status, and start time
print(best_run.data.metrics)  # logged metrics, including val_loss

#### Creating Child runs

https://mlflow.org/docs/latest/tracking/tracking-api.html#creating-child-runs

You can also create multiple runs inside a single run. This is useful for scenarios such as hyperparameter tuning or cross-validation folds, where you need another level of organization within an experiment. You can create child runs by passing `parent_run_id` to `mlflow.start_run()`, or, as in the example further down, by calling `mlflow.start_run(nested=True)` inside an active parent run.

A child run is recorded in the same experiment as its parent and is linked to it through the `mlflow.parentRunId` tag. It has its own unique run ID and its own set of metrics, parameters, and artifacts.

By utilizing nested runs, you can have a hierarchical structure that provides a more detailed view of your experiment, making it easier to analyze and compare different iterations or variations.


In [None]:
# Start parent run
with mlflow.start_run() as parent_run:
    param = [0.01, 0.02, 0.03]

    # Create a child run for each parameter setting
    for p in param:
        with mlflow.start_run(nested=True) as child_run:
            mlflow.log_param("p", p)
            ...
            mlflow.log_metric("val_loss", val_loss)

#### Parallel runs - Multiprocessing

https://mlflow.org/docs/latest/tracking/tracking-api.html#parallel-runs

In [None]:
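import multiprocessing as mp

import mlflow

def train_model(params):
    # Each worker process opens its own run
    with mlflow.start_run():
        mlflow.log_param("p", params)
        ...

if __name__ == "__main__":
    params = [0.01, 0.02, ...]
    # The context manager shuts the pool down once map() returns
    with mp.Pool(processes=4) as pool:
        pool.map(train_model, params)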

#### Parallel runs - Multithreading

https://mlflow.org/docs/latest/tracking/tracking-api.html#parallel-runs

#### Add tags to run - for better organization

https://mlflow.org/docs/latest/tracking/tracking-api.html#adding-tags-to-runs