## Scenario 3: Multiple data scientists working on multiple ML models

MLflow setup:
* Tracking server: yes, remote server (EC2).
* Backend store: postgresql database.
* Artifacts store: s3 bucket.

The experiments can be explored by accessing the remote server.

The exampe uses AWS to host a remote server. In order to run the example you'll need an AWS account. Follow the steps described in the file `mlflow_on_aws.md` to create a new AWS account and launch the tracking server. 

In [9]:
import mlflow
import os

os.environ["AWS_PROFILE"] = "inm-aws" # fill in with your AWS profile. More info: https://docs.aws.amazon.com/sdk-for-java/latest/developer-guide/setup.html#setup-credentials

TRACKING_SERVER_HOST = "ec2-13-49-159-70.eu-north-1.compute.amazonaws.com" # fill in with the public DNS of the EC2 instance
mlflow.set_tracking_uri(f"http://{TRACKING_SERVER_HOST}:5000")

In [10]:
print(f"tracking URI: '{mlflow.get_tracking_uri()}'")

tracking URI: 'http://ec2-13-49-159-70.eu-north-1.compute.amazonaws.com:5000'


In [11]:
mlflow.list_experiments()

AttributeError: module 'mlflow' has no attribute 'list_experiments'

In [13]:
!pip3 install boto3 psycopg2-binary


Collecting boto3
  Using cached boto3-1.34.114-py3-none-any.whl.metadata (6.6 kB)
Collecting psycopg2-binary
  Downloading psycopg2_binary-2.9.9-cp39-cp39-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata (4.4 kB)
Collecting botocore<1.35.0,>=1.34.114 (from boto3)
  Using cached botocore-1.34.114-py3-none-any.whl.metadata (5.7 kB)
Collecting jmespath<2.0.0,>=0.7.1 (from boto3)
  Downloading jmespath-1.0.1-py3-none-any.whl.metadata (7.6 kB)
Collecting s3transfer<0.11.0,>=0.10.0 (from boto3)
  Using cached s3transfer-0.10.1-py3-none-any.whl.metadata (1.7 kB)
Using cached boto3-1.34.114-py3-none-any.whl (139 kB)
Downloading psycopg2_binary-2.9.9-cp39-cp39-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (3.0 MB)
[2K   [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m3.0/3.0 MB[0m [31m42.5 MB/s[0m eta [36m0:00:00[0m:00:01[0m
[?25hUsing cached botocore-1.34.114-py3-none-any.whl (12.3 MB)
Downloading jmespath-1.0.1-py3-none-any.whl (20 kB)
Using cached s3transfer-0.10.1-py3

In [14]:
from sklearn.linear_model import LogisticRegression
from sklearn.datasets import load_iris
from sklearn.metrics import accuracy_score

mlflow.set_experiment("my-experiment-1")

with mlflow.start_run():

    X, y = load_iris(return_X_y=True)

    params = {"C": 0.1, "random_state": 42}
    mlflow.log_params(params)

    lr = LogisticRegression(**params).fit(X, y)
    y_pred = lr.predict(X)
    mlflow.log_metric("accuracy", accuracy_score(y, y_pred))

    mlflow.sklearn.log_model(lr, artifact_path="models")
    print(f"default artifacts URI: '{mlflow.get_artifact_uri()}'")



default artifacts URI: 's3://mlflow-artifacts-remotes/1/8ab21bf34e824c6992f89829254f2b85/artifacts'


In [17]:
import mlflow.experiments

mlflow.experiments

<module 'mlflow.experiments' from '/home/codespace/anaconda3/envs/exp-tracking-env/lib/python3.9/site-packages/mlflow/experiments.py'>

### Interacting with the model registry

In [18]:
from mlflow.tracking import MlflowClient


client = MlflowClient(f"http://{TRACKING_SERVER_HOST}:5000")

In [29]:
client.list_artifacts(run_id='8ab21bf34e824c6992f89829254f2b85', path='models')

[<FileInfo: file_size=526, is_dir=False, path='models/MLmodel'>,
 <FileInfo: file_size=248, is_dir=False, path='models/conda.yaml'>,
 <FileInfo: file_size=None, is_dir=True, path='models/metadata'>,
 <FileInfo: file_size=835, is_dir=False, path='models/model.pkl'>,
 <FileInfo: file_size=112, is_dir=False, path='models/python_env.yaml'>,
 <FileInfo: file_size=125, is_dir=False, path='models/requirements.txt'>]

In [25]:
run_id = client.list_run_infos(experiment_id='1')[0].run_id
mlflow.register_model(
    model_uri=f"runs:/{run_id}/models",
    name='iris-classifier'
)

AttributeError: 'MlflowClient' object has no attribute 'list_run_infos'

In [33]:
client.get_experiment(experiment_id='1')

<Experiment: artifact_location='s3://mlflow-artifacts-remotes/1', creation_time=1716928708899, experiment_id='1', last_update_time=1716928708899, lifecycle_stage='active', name='my-experiment-1', tags={}>

In [36]:
client.create_registered_model(name="iris-classifier")


<RegisteredModel: aliases={}, creation_timestamp=1716930812285, description='', last_updated_timestamp=1716930812285, latest_versions=[], name='iris-classifier', tags={}>

In [39]:
result = client.create_model_version(
    name="iris-classifier",
    source="s3://mlflow-artifacts-remotes/1/8ab21bf34e824c6992f89829254f2b85/artifacts/models/MLmodel",
    run_id="8ab21bf34e824c6992f89829254f2b85"
)


2024/05/28 21:23:09 INFO mlflow.store.model_registry.abstract_store: Waiting up to 300 seconds for model version to finish creation. Model name: iris-classifier, version 2
