# Configuring MLflow

[MLflow](https://mlflow.org/) is an open-source tool for automatically tracking, logging, and displaying data about your ML models and training runs as they happen. It plays a key role in comparing different models and debugging issues with code. ClimatExML uses MLflow to log metrics, generate artifacts (such as plots and images), and save trained models.

Users should refer to the [MLflow documentation](https://mlflow.org/docs/latest/tracking.html) to fine-tune the configuration for their needs. This page gives a quick overview of two main options: (1) tracking to a remote MLflow instance with artifacts stored in an S3 bucket, and (2) tracking locally on an HPC system that has no internet access during training.

## Option 1: Remote MLflow Instance
This option requires an MLflow server backed by a PostgreSQL database on an external machine. It also requires an object store, such as an S3 bucket, to be set up.
```{note}
If you are a member of the UVic climate lab or ECCC, send an email to request access.
```

### Remote Tracking Server

Alliance conveniently provides persistent storage on Arbutus, accessible at https://arbutus.cloud.computecanada.ca/. You can also create object stores there. An OpenStack user interface is available for managing cloud resources.

MLflow artifacts can be logged to an instance running an MLflow server. This instance must have both `mlflow` and `boto3` installed.

```{note}
SSH access for additional machines has to be added manually on the server itself; it cannot be done from OpenStack.
```

In your training environment, create a file called `mlflow.env` that sets the MLflow environment variables:

```bash
export MLFLOW_TRACKING_URI='http://<public-ip-of-instance-on-arbutus>:5000'
export MLFLOW_S3_ENDPOINT_URL='https://object-arbutus.cloud.computecanada.ca/'
export AWS_ACCESS_KEY_ID="<get-this-from-openstack>"
export AWS_SECRET_ACCESS_KEY="<get-this-from-openstack>"
```
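
Before launching a training run, it can help to confirm that these variables are actually visible to the Python process. The following is a minimal sketch and not part of ClimatExML: the helper name `check_mlflow_env` is hypothetical, and the check that the tracking URI begins with `http://` or `https://` is an assumption based on how MLflow tells remote servers apart from local paths.

```python
import os

# Variables set in mlflow.env for the remote tracking setup.
REQUIRED_VARS = (
    "MLFLOW_TRACKING_URI",
    "MLFLOW_S3_ENDPOINT_URL",
    "AWS_ACCESS_KEY_ID",
    "AWS_SECRET_ACCESS_KEY",
)


def check_mlflow_env(env=None):
    """Return a list of problems found; an empty list means everything looks fine."""
    env = os.environ if env is None else env
    # Report every variable that is missing or empty.
    problems = [f"{name} is not set" for name in REQUIRED_VARS if not env.get(name)]
    # Assumption: a remote tracking server is addressed with an explicit scheme.
    uri = env.get("MLFLOW_TRACKING_URI", "")
    if uri and not uri.startswith(("http://", "https://")):
        problems.append("MLFLOW_TRACKING_URI should start with http:// or https://")
    return problems


if __name__ == "__main__":
    for problem in check_mlflow_env():
        print("warning:", problem)
```

Run it in the same shell session in which you sourced `mlflow.env`; no output means all four variables were found.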

### Keys
To get the keys from OpenStack, first download the `...openrc.sh` file from the OpenStack client. The `...` stands for your specific allocation and instance names, which OpenStack generates automatically.

```
source ...openrc.sh
openstack ec2 credentials create
```

This generates a set of credentials. Set `AWS_ACCESS_KEY_ID` to the `access` value and `AWS_SECRET_ACCESS_KEY` to the `secret` value from the generated info. Then `source` the `mlflow.env` file to set the variables.

```{note}
Set the same environment variables on the machine running your MLflow server instance.
```

Then you can spin up your MLflow server on the remote machine with:

```
mlflow server --backend-store-uri postgresql://<username>:<password>@localhost:5432/mlflowdb --host 0.0.0.0 --default-artifact-root s3://<name_of_bucket>
```

You will have to configure a PostgreSQL database named `mlflowdb` and create a user and password for it beforehand.

Also note that `<name_of_bucket>` is the name of the object store you create in OpenStack.
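
MLflow hands the backend store URI to SQLAlchemy, so characters such as `@` or `/` in the username or password generally need to be percent-encoded. Below is a small sketch of building the URI in Python, assuming the defaults from the command above (`localhost`, port `5432`, database `mlflowdb`). The function name `postgres_backend_uri` and the sample credentials are made up for illustration.

```python
from urllib.parse import quote


def postgres_backend_uri(username, password, host="localhost", port=5432, database="mlflowdb"):
    """Build a PostgreSQL URI with percent-encoded credentials."""
    # safe="" also encodes "/" so that no reserved character leaks into the URI.
    user = quote(username, safe="")
    pwd = quote(password, safe="")
    return f"postgresql://{user}:{pwd}@{host}:{port}/{database}"


# Hypothetical credentials, for illustration only.
print(postgres_backend_uri("mlflow_user", "p@ss/word"))
# prints: postgresql://mlflow_user:p%40ss%2Fword@localhost:5432/mlflowdb
```

The resulting string is what you would pass to `--backend-store-uri`.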

## Option 2: On Alliance Machines

These instructions are specific to Alliance machines, but the general process should be similar on other machines.

### Spin up MLflow User Interface with SQLite backend

Start with a fresh Python environment and install MLflow:
```
# load the compiler, Arrow, and Python modules before creating the environment
module load gcc/9.3.0 arrow/8 python/3.8
virtualenv --no-download ENV
# python -m venv ENV # for non-Alliance machines
source ENV/bin/activate
pip install --no-index --upgrade pip
pip install --no-index mlflow
# now you can run mlflow server commands!
```

Next, as in the remote MLflow setup, create a file called `mlflow.env` in your training environment to hold the MLflow environment variables. This time the variables describe a local SQLite database and a local artifact directory instead of a remote server:

```bash
export MLFLOW_SQLITE_DB_PATH='sqlite:////path/to/database.db'
export MLFLOW_ARTIFACTS_PATH='/path/to/artifacts/storage' # make sure this path is identically replicated in the container
```

```{note}
Choose a logical location for the database and include the `.db` extension in `MLFLOW_SQLITE_DB_PATH`. The number of slashes in `MLFLOW_SQLITE_DB_PATH` is important: the `sqlite:///` prefix followed by an absolute path, which itself starts with `/`, gives four slashes in total. Also, make sure that `MLFLOW_ARTIFACTS_PATH` exists on both your host and in your container; otherwise the artifacts will not be logged or accessible from the MLflow server.
```
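
To make the slash count concrete, here is a small sketch that builds the URI from an absolute path. The helper name `sqlite_uri` is hypothetical, and the sketch assumes a POSIX-style path, as on a Linux HPC system.

```python
from pathlib import PurePosixPath


def sqlite_uri(db_path):
    """Build an SQLite URI from an absolute POSIX path."""
    path = PurePosixPath(db_path)
    if not path.is_absolute():
        raise ValueError(f"expected an absolute path, got {db_path!r}")
    # "sqlite:///" plus the leading "/" of the path gives four slashes in total.
    return "sqlite:///" + str(path)


print(sqlite_uri("/path/to/database.db"))
# prints: sqlite:////path/to/database.db
```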

Now, launch an MLflow user interface. It doesn't have to be permanent; this is just an easy way to create an SQLite database for MLflow to log metrics to.

```
mlflow server --backend-store-uri $MLFLOW_SQLITE_DB_PATH --default-artifact-root $MLFLOW_ARTIFACTS_PATH --serve-artifacts 
```

Once it starts successfully, press `Ctrl-C` to stop the server.
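
To confirm that the server really created the database, you can inspect the file with Python's built-in `sqlite3` module. This is a rough sketch: the helper name `list_tables` is hypothetical, and the expectation that an initialized MLflow database contains tables such as `experiments` and `runs` is an assumption about MLflow's schema, not something ClimatExML checks.

```python
import sqlite3


def list_tables(db_path):
    """Return the sorted table names in an SQLite file, opened read-only."""
    # mode=ro avoids silently creating an empty database if the path is wrong.
    conn = sqlite3.connect(f"file:{db_path}?mode=ro", uri=True)
    try:
        rows = conn.execute(
            "SELECT name FROM sqlite_master WHERE type = 'table'"
        ).fetchall()
    finally:
        conn.close()
    return sorted(name for (name,) in rows)
```

Call it with the plain filesystem path, for example `list_tables("/path/to/database.db")`, not with the `sqlite:////...` URI. If the file does not exist, `sqlite3` raises an `OperationalError` instead of creating it.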