## Scenario 3: Multiple data scientists working on multiple ML models

MLflow setup:
* Tracking server: yes, remote server (EC2).
* Backend store: PostgreSQL database.
* Artifact store: S3 bucket.
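For orientation, the three components above map onto the options of the `mlflow server` command. The sketch below only assembles that command line; it does not start anything. The database credentials, endpoint, and bucket name are placeholders (assumptions, not values from this example), and `build_server_command` is a hypothetical helper.

```python
import shlex


def build_server_command(backend_store_uri, artifact_root, host="0.0.0.0", port=5000):
    """Return the `mlflow server` command line as a list of arguments."""
    return [
        "mlflow", "server",
        "--host", host,
        "--port", str(port),
        "--backend-store-uri", backend_store_uri,   # PostgreSQL backend store
        "--default-artifact-root", artifact_root,   # S3 artifact store
    ]


# Placeholder values: substitute your own database endpoint and bucket name
cmd = build_server_command(
    backend_store_uri="postgresql://DB_USER:DB_PASSWORD@DB_ENDPOINT:5432/DB_NAME",
    artifact_root="s3://S3_BUCKET_NAME",
)
print(shlex.join(cmd))
```

The printed command would be run on the EC2 instance itself; `mlflow_on_aws.md` remains the reference for the actual setup steps.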

Experiments can be explored in the MLflow UI served by the remote tracking server.

This example hosts the tracking server on AWS, so you need an AWS account to run it. Follow the steps in `mlflow_on_aws.md` to create an AWS account and launch the tracking server.
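Once the server is launched, it can be useful to confirm that it is reachable before pointing MLflow at it. The following is a minimal sketch using only the standard library. It assumes the server listens on port 5000 and relies on the `/health` endpoint of the MLflow tracking server. `tracking_uri` and `server_is_healthy` are hypothetical helper names, and the host in the commented call is a placeholder.

```python
from urllib.error import URLError
from urllib.request import urlopen


def tracking_uri(host, port=5000):
    """Build the HTTP tracking URI for a server host and port."""
    return f"http://{host}:{port}"


def server_is_healthy(host, port=5000, timeout=5):
    """Return True if the server answers its /health endpoint with HTTP 200."""
    try:
        with urlopen(f"{tracking_uri(host, port)}/health", timeout=timeout) as resp:
            return resp.status == 200
    except (URLError, OSError):
        # Unreachable host, refused connection, timeout, or HTTP error
        return False


# Example call (placeholder host, replace with the public DNS of your EC2 instance):
# server_is_healthy("ec2-xx-xx-xx-xx.eu-north-1.compute.amazonaws.com")
```

If the check returns `False`, the usual suspects are the EC2 security group not allowing inbound traffic on the server port, or the server process not running.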

In [None]:
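import os
from dotenv import load_dotenv

# Load AWS credentials from a local .env file into the environment
load_dotenv()

# Confirm the credentials are set without printing the secret values
print("AWS_ACCESS_KEY_ID set:", os.getenv("AWS_ACCESS_KEY_ID") is not None)
print("AWS_SECRET_ACCESS_KEY set:", os.getenv("AWS_SECRET_ACCESS_KEY") is not None)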


In [6]:
import boto3
s3 = boto3.client("s3")
print(s3.list_buckets())


{'ResponseMetadata': {'RequestId': 'WZZD7NK2ESAFPA97', 'HostId': 'IoMKLXAQLqaPfgsDT571cbNUYyHE8B2K6wjmCJBoL4JDcrwF3H8I7PyaiMpGYVy0MkSHJev4WQXbPiTiwRcGt3tskr1Qdgo9', 'HTTPStatusCode': 200, 'HTTPHeaders': {'x-amz-id-2': 'IoMKLXAQLqaPfgsDT571cbNUYyHE8B2K6wjmCJBoL4JDcrwF3H8I7PyaiMpGYVy0MkSHJev4WQXbPiTiwRcGt3tskr1Qdgo9', 'x-amz-request-id': 'WZZD7NK2ESAFPA97', 'date': 'Mon, 20 Oct 2025 21:26:24 GMT', 'content-type': 'application/xml', 'transfer-encoding': 'chunked', 'server': 'AmazonS3'}, 'RetryAttempts': 0}, 'Buckets': [{'Name': 'mlflow-artifacts-remote4', 'CreationDate': datetime.datetime(2025, 10, 20, 19, 29, 6, tzinfo=tzlocal())}], 'Owner': {'ID': '523eba7f31d05e3a0071813691adcc55f62cdaea4187b804a029c3d032de92b7'}}


In [7]:
import mlflow
import os

# os.environ["AWS_PROFILE"] = "" # fill in with your AWS profile. More info: https://docs.aws.amazon.com/sdk-for-java/latest/developer-guide/setup.html#setup-credentials

TRACKING_SERVER_HOST = "ec2-13-48-57-232.eu-north-1.compute.amazonaws.com" # fill in with the public DNS of the EC2 instance
mlflow.set_tracking_uri(f"http://{TRACKING_SERVER_HOST}:5000")

In [8]:
print(f"tracking URI: '{mlflow.get_tracking_uri()}'")

tracking URI: 'http://ec2-13-48-57-232.eu-north-1.compute.amazonaws.com:5000'


In [9]:
mlflow.search_experiments()  # the list_experiments API has been removed; use search_experiments instead

[<Experiment: artifact_location='s3://mlflow-artifacts-remote4/0', creation_time=1760990864655, experiment_id='0', last_update_time=1760990864655, lifecycle_stage='active', name='Default', tags={}>]

In [10]:
from sklearn.linear_model import LogisticRegression
from sklearn.datasets import load_iris
from sklearn.metrics import accuracy_score

mlflow.set_experiment("my-experiment-1")

with mlflow.start_run():

    X, y = load_iris(return_X_y=True)

    params = {"C": 0.1, "random_state": 42}
    mlflow.log_params(params)

    lr = LogisticRegression(**params).fit(X, y)
    y_pred = lr.predict(X)
    mlflow.log_metric("accuracy", accuracy_score(y, y_pred))

    mlflow.sklearn.log_model(lr, artifact_path="models")
    print(f"default artifacts URI: '{mlflow.get_artifact_uri()}'")

2025/10/20 21:27:29 INFO mlflow.tracking.fluent: Experiment with name 'my-experiment-1' does not exist. Creating a new experiment.


default artifacts URI: 's3://mlflow-artifacts-remote4/1/54f5bdc992824cea8311c5fcb475b19d/artifacts'
🏃 View run resilient-sponge-870 at: http://ec2-13-48-57-232.eu-north-1.compute.amazonaws.com:5000/#/experiments/1/runs/54f5bdc992824cea8311c5fcb475b19d
🧪 View experiment at: http://ec2-13-48-57-232.eu-north-1.compute.amazonaws.com:5000/#/experiments/1


In [11]:
mlflow.search_experiments()

[<Experiment: artifact_location='s3://mlflow-artifacts-remote4/1', creation_time=1760995649199, experiment_id='1', last_update_time=1760995649199, lifecycle_stage='active', name='my-experiment-1', tags={}>,
 <Experiment: artifact_location='s3://mlflow-artifacts-remote4/0', creation_time=1760990864655, experiment_id='0', last_update_time=1760990864655, lifecycle_stage='active', name='Default', tags={}>]

### Interacting with the model registry

In [12]:
from mlflow.tracking import MlflowClient


client = MlflowClient(f"http://{TRACKING_SERVER_HOST}:5000")

In [15]:
client.search_registered_models()

[<RegisteredModel: aliases={}, creation_timestamp=1760995722713, deployment_job_id='', deployment_job_state='DEPLOYMENT_JOB_CONNECTION_STATE_UNSPECIFIED', description='', last_updated_timestamp=1760995723322, latest_versions=[<ModelVersion: aliases=[], creation_timestamp=1760995723322, current_stage='None', deployment_job_state=<ModelVersionDeploymentJobState: current_task_name='', job_id='', job_state='DEPLOYMENT_JOB_CONNECTION_STATE_UNSPECIFIED', run_id='', run_state='DEPLOYMENT_JOB_RUN_STATE_UNSPECIFIED'>, description='', last_updated_timestamp=1760995723322, metrics=None, model_id=None, name='iris-classifier', params=None, run_id='54f5bdc992824cea8311c5fcb475b19d', run_link='', source='models:/m-0e1e6f97b65843a2a749209733d3df48', status='READY', status_message=None, tags={}, user_id='', version='1'>], name='iris-classifier', tags={}>]

In [14]:
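# Take the first run returned for experiment '1' (my-experiment-1)
run_id = client.search_runs(experiment_ids=['1'])[0].info.run_id

# "models" matches the artifact_path passed to log_model above
mlflow.register_model(
    model_uri=f"runs:/{run_id}/models",
    name='iris-classifier'
)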

Successfully registered model 'iris-classifier'.
2025/10/20 21:28:43 INFO mlflow.store.model_registry.abstract_store: Waiting up to 300 seconds for model version to finish creation. Model name: iris-classifier, version 1
Created version '1' of model 'iris-classifier'.


<ModelVersion: aliases=[], creation_timestamp=1760995723322, current_stage='None', deployment_job_state=<ModelVersionDeploymentJobState: current_task_name='', job_id='', job_state='DEPLOYMENT_JOB_CONNECTION_STATE_UNSPECIFIED', run_id='', run_state='DEPLOYMENT_JOB_RUN_STATE_UNSPECIFIED'>, description='', last_updated_timestamp=1760995723322, metrics=None, model_id=None, name='iris-classifier', params=None, run_id='54f5bdc992824cea8311c5fcb475b19d', run_link='', source='models:/m-0e1e6f97b65843a2a749209733d3df48', status='READY', status_message=None, tags={}, user_id='', version='1'>
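Once a version is registered, other team members can load it by name instead of by run ID. The sketch below shows one way to do that with `mlflow.pyfunc.load_model` and a `models:/<name>/<version>` URI. It assumes the tracking URI has already been set as in the cells above. `registered_model_uri` and `predict_with_registered_model` are hypothetical helper names, not part of MLflow.

```python
def registered_model_uri(name, version):
    """Build a `models:/<name>/<version>` URI for a registered model version."""
    return f"models:/{name}/{version}"


def predict_with_registered_model(name, version, data):
    """Load a registered model version from the registry and run predictions."""
    # Imported here so the URI helper above works without mlflow installed
    import mlflow.pyfunc

    model = mlflow.pyfunc.load_model(registered_model_uri(name, version))
    return model.predict(data)


print(registered_model_uri("iris-classifier", 1))
```

For example, `predict_with_registered_model("iris-classifier", 1, X)` would fetch version 1 from the S3 artifact store and score the iris features `X` loaded earlier.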