## Scenario 2: Multiple data scientists working on multiple ML models

MLflow setup:
- tracking server: yes, remote server (EC2)
- backend store: PostgreSQL database
- artifact store: S3 bucket
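For reference, a tracking server with this layout is typically started on the EC2 instance with `mlflow server`, with `--backend-store-uri` pointing at PostgreSQL and `--default-artifact-root` pointing at the bucket. The sketch below only assembles that command as an argument list. The database user, password, host, database name and the PostgreSQL port 5432 are hypothetical placeholders, not values taken from this setup.

```python
import shlex
from urllib.parse import quote


def build_server_command(db_user, db_password, db_host, db_name, bucket,
                         host="0.0.0.0", port=5000):
    """Return the argv for `mlflow server` with a PostgreSQL backend store
    and an S3 default artifact root (all connection values are placeholders)."""
    # URL-encode the password in case it contains characters such as '@' or '/'
    backend_uri = (f"postgresql://{db_user}:{quote(db_password, safe='')}"
                   f"@{db_host}:5432/{db_name}")
    return [
        "mlflow", "server",
        "--host", host,                               # listen on all interfaces for remote clients
        "--port", str(port),                          # the notebook below connects on port 5000
        "--backend-store-uri", backend_uri,           # experiments, runs, metrics, registry metadata
        "--default-artifact-root", f"s3://{bucket}",  # models and other run artifacts
    ]


cmd = build_server_command("mlflow", "PASSWORD", "DB_ENDPOINT", "mlflow_db",
                           "mlflow-artifacts-remote")
print(shlex.join(cmd))
```

Because the experiment artifact locations further down are plain `s3://` URIs, the notebook client appears to write artifacts to S3 directly, which is why local AWS credentials are listed as a prerequisite.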

Experiments can be explored in the MLflow UI served by the remote tracking server (port 5000 on the EC2 host).

Prerequisites:
- AWS account
- AWS CLI, configured with credentials that can access the S3 bucket
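A quick, stdlib-only way to check these prerequisites from inside the notebook is to look for the `aws` executable on `PATH` and for credentials in environment variables or the shared credentials file (`~/.aws/credentials` unless `AWS_SHARED_CREDENTIALS_FILE` overrides it). This is a heuristic sketch with a hypothetical helper name; it does not validate the credentials themselves, which `aws sts get-caller-identity` can do.

```python
import os
import shutil
from pathlib import Path


def check_aws_prerequisites():
    """Heuristic check for the AWS CLI and locally configured credentials."""
    creds_file = Path(os.environ.get("AWS_SHARED_CREDENTIALS_FILE",
                                     Path.home() / ".aws" / "credentials"))
    return {
        "aws_cli_on_path": shutil.which("aws") is not None,  # is the CLI installed?
        "env_credentials": "AWS_ACCESS_KEY_ID" in os.environ,  # keys exported directly?
        "credentials_file": creds_file.is_file(),              # keys stored by `aws configure`?
        "profile": os.environ.get("AWS_PROFILE", "default"),   # profile the SDKs will use
    }


print(check_aws_prerequisites())
```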

In [1]:
!mlflow --version
#mlflow is version 2.9.2 
!python --version
#Python 3.9.18

mlflow, version 2.9.2
Python 3.9.18


In [2]:
import mlflow

In [3]:
import os
# os.environ["AWS_PROFILE"] = "default"  # uncomment to select a named AWS credentials profile

TRACKING_SERVER_HOST = "ec2-43-204-214-229.ap-south-1.compute.amazonaws.com"
mlflow.set_tracking_uri(f"http://{TRACKING_SERVER_HOST}:5000")

In [4]:
print(mlflow.get_tracking_uri())

http://ec2-43-204-214-229.ap-south-1.compute.amazonaws.com:5000
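Instead of calling `mlflow.set_tracking_uri` in every notebook, each team member can set the `MLFLOW_TRACKING_URI` environment variable, which MLflow reads when no URI has been set explicitly in the process. A minimal sketch, assuming the same host and port as above; the helper name is illustrative.

```python
import os


def configure_tracking_env(host, port=5000, scheme="http"):
    """Set MLFLOW_TRACKING_URI for this process and any child processes."""
    uri = f"{scheme}://{host}:{port}"
    os.environ["MLFLOW_TRACKING_URI"] = uri
    return uri


uri = configure_tracking_env("ec2-43-204-214-229.ap-south-1.compute.amazonaws.com")
print(uri)  # http://ec2-43-204-214-229.ap-south-1.compute.amazonaws.com:5000
```

An explicit `mlflow.set_tracking_uri(...)` call takes precedence over the environment variable, so the variable is most useful in scripts and shells shared by the whole team.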


In [6]:
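import os
# `!export` runs in a throwaway subshell and does not change the kernel's environment;
# setting os.environ makes the profile visible to boto3 (used by MLflow) and to later `!` commands
os.environ["AWS_PROFILE"] = "default"
!aws s3 ls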

2024-02-07 16:08:08 mlflow-artifacts-remote


In [7]:
mlflow.search_experiments()

[<Experiment: artifact_location='s3://mlflow-artifacts-remote/2', creation_time=1707307209126, experiment_id='2', last_update_time=1707307209126, lifecycle_stage='active', name='my-experiment-1', tags={}>,
 <Experiment: artifact_location='s3://mlflow-artifacts-remote/1', creation_time=1707306659344, experiment_id='1', last_update_time=1707306659344, lifecycle_stage='active', name='my-aws-experiments', tags={}>,
 <Experiment: artifact_location='s3://mlflow-artifacts-remote/0', creation_time=1707305800276, experiment_id='0', last_update_time=1707305800276, lifecycle_stage='active', name='Default', tags={}>]
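The `creation_time` and `last_update_time` fields are Unix timestamps in milliseconds. A small conversion makes them readable; adding an integer `timedelta` to the epoch avoids float rounding of the milliseconds.

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def ms_to_utc(ms):
    """Convert an MLflow millisecond timestamp to a timezone-aware UTC datetime."""
    return EPOCH + timedelta(milliseconds=ms)


print(ms_to_utc(1707307209126).isoformat())  # 2024-02-07T12:00:09.126000+00:00
```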

In [8]:
import mlflow.sklearn  # explicit import of the scikit-learn flavor used by log_model below
from sklearn.linear_model import LogisticRegression
from sklearn.datasets import load_iris
from sklearn.metrics import accuracy_score

mlflow.set_experiment("my-experiment-1")

with mlflow.start_run() as run:

    X, y = load_iris(return_X_y=True)

    params = {"C": 0.1, "random_state": 42}
    mlflow.log_params(params)

    lr = LogisticRegression(**params).fit(X, y)
    y_pred = lr.predict(X)
    mlflow.log_metric("accuracy", accuracy_score(y, y_pred))

    mlflow.sklearn.log_model(lr, artifact_path="models")
    print(f"default artifacts URI: '{mlflow.get_artifact_uri()}'")



default artifacts URI: 's3://mlflow-artifacts-remote/2/0dbcb7e976d84189ad0cc9b17c5dbd1d/artifacts'
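The artifact URI shows how the bucket is laid out in this setup: `s3://<bucket>/<experiment_id>/<run_id>/artifacts`. Splitting it with `urllib.parse` recovers those parts, which helps when browsing the bucket with `aws s3 ls`. The helper is hypothetical and assumes exactly the layout seen in this output; other artifact roots may be organised differently.

```python
from urllib.parse import urlparse


def split_artifact_uri(uri):
    """Split an s3://<bucket>/<experiment_id>/<run_id>/... URI (layout assumed)."""
    parsed = urlparse(uri)
    if parsed.scheme != "s3":
        raise ValueError(f"not an S3 URI: {uri}")
    # Path looks like "/2/<run_id>/artifacts"; drop the leading slash before splitting
    experiment_id, run_id, *rest = parsed.path.lstrip("/").split("/")
    return {"bucket": parsed.netloc, "experiment_id": experiment_id,
            "run_id": run_id, "suffix": "/".join(rest)}


parts = split_artifact_uri(
    "s3://mlflow-artifacts-remote/2/0dbcb7e976d84189ad0cc9b17c5dbd1d/artifacts")
print(parts)
```

With these parts, the logged model files for this run would sit under `s3://mlflow-artifacts-remote/2/<run_id>/artifacts/models/`.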


In [9]:
mlflow.search_experiments()

[<Experiment: artifact_location='s3://mlflow-artifacts-remote/2', creation_time=1707307209126, experiment_id='2', last_update_time=1707307209126, lifecycle_stage='active', name='my-experiment-1', tags={}>,
 <Experiment: artifact_location='s3://mlflow-artifacts-remote/1', creation_time=1707306659344, experiment_id='1', last_update_time=1707306659344, lifecycle_stage='active', name='my-aws-experiments', tags={}>,
 <Experiment: artifact_location='s3://mlflow-artifacts-remote/0', creation_time=1707305800276, experiment_id='0', last_update_time=1707305800276, lifecycle_stage='active', name='Default', tags={}>]

In [10]:
from mlflow.tracking import MlflowClient
client = MlflowClient(f"http://{TRACKING_SERVER_HOST}:5000")

In [11]:
mlflow.search_model_versions()

[]
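`search_model_versions()` without arguments returns every version in the registry, so it is empty until a model is registered. When several people share one registry, filtering by model name is usually more useful. The sketch below passes a `filter_string` such as `name='iris-classifier'` to `MlflowClient.search_model_versions` and returns the version numbers; `StubClient` is a hypothetical stand-in so the example runs without a tracking server.

```python
from types import SimpleNamespace


def version_numbers(client, model_name):
    """Return the sorted version numbers of one registered model.

    `client` is expected to behave like mlflow.tracking.MlflowClient.
    """
    versions = client.search_model_versions(filter_string=f"name='{model_name}'")
    return sorted(int(v.version) for v in versions)


class StubClient:
    """Hypothetical stand-in for MlflowClient, for illustration only."""

    def __init__(self, versions):
        self._versions = versions

    def search_model_versions(self, filter_string=None):
        # Only understands filters of the form name='<model>'
        name = filter_string.split("=", 1)[1].strip("'")
        return [v for v in self._versions if v.name == name]


stub = StubClient([SimpleNamespace(name="iris-classifier", version="2"),
                   SimpleNamespace(name="iris-classifier", version="1"),
                   SimpleNamespace(name="other-model", version="1")])
print(version_numbers(stub, "iris-classifier"))  # [1, 2]
```

Against the real server, the same helper would be called as `version_numbers(client, "iris-classifier")` with the `client` created above.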

In [12]:
run_id = run.info.run_id
mlflow.register_model(
    model_uri=f"runs:/{run_id}/models",
    name='iris-classifier'
)

Successfully registered model 'iris-classifier'.
2024/02/07 18:17:24 INFO mlflow.store.model_registry.abstract_store: Waiting up to 300 seconds for model version to finish creation. Model name: iris-classifier, version 1
Created version '1' of model 'iris-classifier'.


<ModelVersion: aliases=[], creation_timestamp=1707310044350, current_stage='None', description='', last_updated_timestamp=1707310044350, name='iris-classifier', run_id='0dbcb7e976d84189ad0cc9b17c5dbd1d', run_link='', source='s3://mlflow-artifacts-remote/2/0dbcb7e976d84189ad0cc9b17c5dbd1d/artifacts/models', status='READY', status_message='', tags={}, user_id='', version='1'>
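Once registered, the model can be referenced through the registry instead of the run ID, so other team members can load it without knowing who trained it or in which run. The helper below only builds the URI strings: `runs:/<run_id>/<path>`, `models:/<name>/<version>` and, for aliases in MLflow 2.x, `models:/<name>@<alias>`. The alias name `champion` is hypothetical.

```python
def model_uris(run_id, artifact_path, name, version, alias=None):
    """Build the URIs that can point at the same logged model."""
    uris = {
        "run": f"runs:/{run_id}/{artifact_path}",  # tied to one specific run
        "version": f"models:/{name}/{version}",    # tied to a registry version
    }
    if alias is not None:
        uris["alias"] = f"models:/{name}@{alias}"  # follows whatever version the alias points to
    return uris


uris = model_uris("0dbcb7e976d84189ad0cc9b17c5dbd1d", "models",
                  "iris-classifier", 1, alias="champion")  # "champion" is hypothetical
for kind, uri in uris.items():
    print(kind, uri)
```

On a machine with the same tracking URI and AWS access, `mlflow.pyfunc.load_model(uris["version"])` would load version 1. The alias form only resolves after the alias has been set, for example with `client.set_registered_model_alias("iris-classifier", "champion", 1)`.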