# governance & API

## MLflow

MLflow makes experiments repeatable, brings structure to them, and tracks the best parameters.
See https://mlflow.org


If you run this notebook locally, start the UI first:

In [None]:
!mlflow ui

Otherwise, go directly to http://localhost:5000 to view the UI.


Let's log some data:

In [None]:
!mlflow experiments create --experiment-name first_run

In [None]:
!mlflow experiments list

In [None]:
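import mlflow
from mlflow import log_metric, log_param, log_artifact

# Log into the experiment created above instead of the default one
mlflow.set_experiment("first_run")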

# Log a parameter (key-value pair)
log_param("param1", 5)

# Log a metric; metrics can be updated throughout the run
log_metric("foo", 1)
log_metric("foo", 2)
log_metric("foo", 3)

# Log an artifact (output file)
with open("output.txt", "w") as f:
    f.write("Hello world!")
log_artifact("output.txt")

Then go back to the UI to look at the result.

Be aware of https://github.com/mlflow/mlflow/issues/884: artifact logging does not fully work in the Docker container, but the artifact is logged correctly when running locally.


## MLflow projects

Structuring the code as an MLflow project, as in https://github.com/mlflow/mlflow-example, allows for easy experimentation because the project declares its:

- dependencies
- parameters


Launch:

```bash
mlflow ui
```
in a new terminal (in the same folder).

Then, create a new experiment:

In [None]:
!mlflow experiments create --experiment-name second_run

In [None]:
!mlflow run --experiment-name second_run https://github.com/mlflow/mlflow-example.git -P alpha=5
!mlflow run --experiment-name second_run https://github.com/mlflow/mlflow-example.git -P alpha=2

Go back to http://localhost:5000/#/ again:

- look at the results
- compare them
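
The two launches above can also be scripted when more parameter values should be compared. The following is a minimal sketch, assuming the `mlflow` CLI is on the `PATH`; the helper names `build_run_command` and `run_sweep` are hypothetical and not part of MLflow.

```python
import subprocess

# Example project used in this notebook
PROJECT_URI = "https://github.com/mlflow/mlflow-example.git"


def build_run_command(alpha, experiment_name="second_run", uri=PROJECT_URI):
    """Return the argument list for one `mlflow run` invocation (hypothetical helper)."""
    return [
        "mlflow", "run",
        "--experiment-name", experiment_name,
        uri,
        "-P", f"alpha={alpha}",
    ]


def run_sweep(alphas):
    """Launch one project run per alpha value, stopping at the first failure."""
    for alpha in alphas:
        subprocess.run(build_run_command(alpha), check=True)


# Example (requires the mlflow CLI and network access):
# run_sweep([5, 2])
```

Each run should then show up under the `second_run` experiment in the UI, where the runs can be selected and compared.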

## API

- Models only bring value when others can use them.
- MLflow has a simple and standardized way to serve models. See https://mlflow.org/docs/latest/models.html for a list of supported models.
- A very large-scale production setup with demanding requirements for high availability (HA) or latency might need a more sophisticated or resilient way to serve models.

In [None]:
import numpy as np
from sklearn.linear_model import LogisticRegression

import mlflow
import mlflow.sklearn

X = np.array([-2, -1, 0, 1, 2, 1]).reshape(-1, 1)
y = np.array([0, 0, 1, 1, 1, 0])
lr = LogisticRegression(solver='lbfgs')
lr.fit(X, y)
score = lr.score(X, y)
print(f"Score: {score}")
mlflow.log_metric("score", score)
mlflow.sklearn.log_model(lr, "model")
# Use run_id (run_uuid is deprecated) and avoid shadowing the built-in id()
run_id = mlflow.active_run().info.run_id
print(f"Model saved in run {run_id}")

To serve the model, execute in a terminal:

```bash
mlflow models serve -m runs:/<RUN_ID>/model --port 1234

# e.g.
mlflow models serve -m runs:/83d6af88e83f45ec9c9edff16a0a94b1/model --port 1234
```

and query it:

In [None]:
!curl -d '{"columns":["x"], "data":[[1], [-1]]}' -H 'Content-Type: application/json; format=pandas-split' -X POST localhost:1234/invocations
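
The same request can be sent from Python using only the standard library. This is a minimal sketch that mirrors the `curl` call above (same port, header, and payload). The helper names `build_request` and `predict` are hypothetical, and the payload format the server accepts may depend on the installed MLflow version.

```python
import json
import urllib.request


def build_request(rows, columns, url="http://localhost:1234/invocations"):
    """Build the POST request that the curl call above sends (hypothetical helper)."""
    payload = json.dumps({"columns": columns, "data": rows}).encode("utf-8")
    headers = {"Content-Type": "application/json; format=pandas-split"}
    return urllib.request.Request(url, data=payload, headers=headers, method="POST")


def predict(rows, columns=("x",)):
    """Send the rows to the model server and return the decoded JSON response."""
    request = build_request(rows, list(columns))
    with urllib.request.urlopen(request, timeout=10) as response:
        return json.loads(response.read().decode("utf-8"))


# With the model server from the previous step running:
# predict([[1], [-1]])
```

Keeping the request construction separate from the network call makes it easy to inspect the payload without a running server.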

## cleanup

Cleanup is only relevant for a local run: remove the conda environments that are no longer needed.
If running via Docker, simply remove the containers that are no longer used.

```bash
conda env list

conda remove --name <env_name> --all
```