# Tutorial: MLflow with S3

In [None]:
import sys
!{sys.executable} -m pip install -r requirements.txt

## I. S3 initialization

#### The artifact store

The artifact store is where MLflow saves your model files.
It also holds the environment specification and the other files you need to recreate and deploy the model.

By default, model artifacts are stored on the local file system in a directory named **mlruns**.

Here, we'll store model artifacts in an S3 bucket instead.
We'll use MinIO, a relatively lightweight S3-compatible server, to provide it.

#### Pull the latest MinIO image (S3)

If you don't have Docker installed, you can install it by downloading Docker Desktop: https://www.docker.com/products/docker-desktop/

In [None]:
!docker pull quay.io/minio/minio:latest

#### Launch a Minio (S3) server from a terminal

Launch a MinIO server and publish its ports: 9000 for the S3 API and 9090 for the web console. The S3 API port must be reachable so that a remote client can upload and download model artifacts.

Minio will serve as the default artifact root. You can run the following command from a terminal.

```shell
# Linux/macOS
mkdir -p ~/minio/mlflow
# Windows
mkdir %USERPROFILE%\minio\mlflow

# Mount the host directory at /data, the path the MinIO server is told to serve
docker run -p 9000:9000 -p 9090:9090 --rm --name minio -v ~/minio/mlflow:/data -e "MINIO_ROOT_USER=ROOTNAME" -e "MINIO_ROOT_PASSWORD=CHANGEME123" quay.io/minio/minio server /data --console-address ":9090"

 ```
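
Before moving on, it can help to confirm that the container is actually answering on the S3 API port. The sketch below polls MinIO's liveness endpoint (`/minio/health/live`) with the standard library only. The function name `is_minio_live` and the default URL are assumptions made for this tutorial's local setup.

```python
import urllib.request


def is_minio_live(base_url="http://127.0.0.1:9000", timeout=2.0):
    """Return True if MinIO answers its liveness probe with HTTP 200."""
    url = base_url.rstrip("/") + "/minio/health/live"
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status == 200
    except OSError:
        # Connection refused, timeout, or an HTTP error status:
        # urllib's URLError and HTTPError both derive from OSError.
        return False
```

Calling `is_minio_live()` from a notebook cell should return `True` once the `docker run` command above is up, and `False` while the server is still starting or the port is wrong.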

### Create an S3 bucket and name it "mlflow"

We could use `mc`, MinIO's CLI (https://min.io/docs/minio/linux/reference/minio-mc.html), but we'll use the web console instead:
http://127.0.0.1:9090/buckets/add-bucket
- User: ROOTNAME
- Password: CHANGEME123

Create a bucket and name it "mlflow".
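
The same bucket can also be created from Python, which is convenient when the setup has to be repeated. This is a minimal sketch that assumes the MinIO root credentials and the S3 API port from the `docker run` command. The helper names `make_s3_client` and `ensure_bucket` are hypothetical; `list_buckets` and `create_bucket` are standard boto3 S3 client calls.

```python
def make_s3_client(endpoint_url="http://127.0.0.1:9000",
                   access_key="ROOTNAME", secret_key="CHANGEME123"):
    """Build a boto3 S3 client that talks to the local MinIO server."""
    import boto3  # imported lazily so ensure_bucket can be used without boto3

    return boto3.client(
        "s3",
        endpoint_url=endpoint_url,          # S3 API port, not the console port
        aws_access_key_id=access_key,
        aws_secret_access_key=secret_key,
        region_name="us-east-1",            # assumed: MinIO's default region
    )


def ensure_bucket(client, name="mlflow"):
    """Create the bucket if it is missing; return True if it was created."""
    existing = {b["Name"] for b in client.list_buckets().get("Buckets", [])}
    if name in existing:
        return False
    client.create_bucket(Bucket=name)
    return True
```

With the MinIO container running, `ensure_bucket(make_s3_client())` creates the "mlflow" bucket on the first call and leaves it untouched afterwards.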

## II. Launching the MLFlow server

#### Launching an MLFlow server from another terminal

```shell

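export MLFLOW_TRACKING_URI='http://127.0.0.1:5000'
# Point MLflow at the S3 API port of the local MinIO server started above
export MLFLOW_S3_ENDPOINT_URL='http://127.0.0.1:9000'
# Use the MinIO root credentials passed to `docker run`
export AWS_ACCESS_KEY_ID='ROOTNAME'
export AWS_SECRET_ACCESS_KEY='CHANGEME123'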

 ```

```bash
mlflow server --default-artifact-root s3://mlflow/ --backend-store-uri sqlite:///mlflow.db --host 0.0.0.0 --port 5000
```
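
If you would rather start the server from Python than from a second terminal, the environment variables and the command line can be assembled programmatically. The following is only a sketch: the function names are hypothetical, and the endpoint and credentials are the values assumed for the local MinIO container in this tutorial.

```python
import os
import subprocess


def mlflow_server_env(base_env=None):
    """Return a copy of the environment with the MinIO settings added."""
    env = dict(os.environ if base_env is None else base_env)
    env.update({
        "MLFLOW_S3_ENDPOINT_URL": "http://127.0.0.1:9000",  # S3 API port
        "AWS_ACCESS_KEY_ID": "ROOTNAME",
        "AWS_SECRET_ACCESS_KEY": "CHANGEME123",
    })
    return env


def mlflow_server_command(port=5000):
    """Build the `mlflow server` command line as a list of arguments."""
    return [
        "mlflow", "server",
        "--default-artifact-root", "s3://mlflow/",
        "--backend-store-uri", "sqlite:///mlflow.db",
        "--host", "0.0.0.0",
        "--port", str(port),
    ]


def start_mlflow_server(port=5000):
    """Start the server in the background and return the process handle."""
    return subprocess.Popen(mlflow_server_command(port), env=mlflow_server_env())
```

Keeping the handle returned by `start_mlflow_server()` lets you stop the server later with its `terminate()` method.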

The default-artifact-root is now `s3://mlflow/`!

## III. MLFlow client

#### Configure local environment variables to save data on your minio artifacts server

In [None]:
!pip install mlflow boto3

#### Create the "experiment_s3" experiment

In [None]:
import mlflow
import os

In [None]:
os.environ['MLFLOW_TRACKING_URI'] = 'http://127.0.0.1:5000'
mlflow.set_tracking_uri('http://127.0.0.1:5000')
os.environ['MLFLOW_S3_IGNORE_TLS'] = "True"
# MinIO root credentials from the `docker run` command
os.environ['AWS_ACCESS_KEY_ID'] = 'ROOTNAME'
os.environ['AWS_SECRET_ACCESS_KEY'] = 'CHANGEME123'
# S3 API port (9000), not the web console port (9090)
os.environ['MLFLOW_S3_ENDPOINT_URL'] = 'http://127.0.0.1:9000'
# The model registry is served by the tracking server, not by MinIO
mlflow.set_registry_uri(os.environ['MLFLOW_TRACKING_URI'])
expr_name = "experiment_s3"
s3_bucket = "s3://mlflow"  # lowercase scheme, matching the server's artifact root

mlflow.create_experiment(expr_name, artifact_location=s3_bucket)
mlflow.set_experiment(expr_name)

In [None]:
import mlflow
import os

# Set MLflow tracking URI (your tracking server)
os.environ['MLFLOW_TRACKING_URI'] = 'http://127.0.0.1:5000'
mlflow.set_tracking_uri('http://127.0.0.1:5000')

# Tell MLflow to ignore TLS if you are running locally without HTTPS
os.environ['MLFLOW_S3_IGNORE_TLS'] = "True"

# MinIO root credentials, passed through the AWS variables that boto3 reads
os.environ['AWS_ACCESS_KEY_ID'] = 'ROOTNAME'
os.environ['AWS_SECRET_ACCESS_KEY'] = 'CHANGEME123'

# Endpoint URL: the MinIO S3 API port (default 9000), with no trailing slash
os.environ['MLFLOW_S3_ENDPOINT_URL'] = 'http://127.0.0.1:9000'

# Optional: set registry URI if you use model registry, but it usually points to tracking URI
mlflow.set_registry_uri(os.environ.get('MLFLOW_TRACKING_URI'))

expr_name = "experiment_s3_3"

# Bucket URI: lowercase 's3://' and the bucket name, with no trailing slash
s3_bucket = "s3://mlflow"

# Create experiment with S3 bucket as artifact location
mlflow.create_experiment(expr_name, artifact_location=s3_bucket)

mlflow.set_experiment(expr_name)
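
Note that `mlflow.create_experiment` raises an error when an experiment with that name already exists, so the cell above fails if it is run twice. One way around this is a small get-or-create helper, sketched below. The name `get_or_create_experiment` is hypothetical; it relies only on `get_experiment_by_name` and `create_experiment`, which both the `mlflow` module and `MlflowClient` provide.

```python
def get_or_create_experiment(tracking, name, artifact_location="s3://mlflow"):
    """Return the experiment ID for `name`, creating the experiment if needed.

    `tracking` is any object exposing get_experiment_by_name and
    create_experiment, such as the `mlflow` module or an MlflowClient.
    """
    experiment = tracking.get_experiment_by_name(name)
    if experiment is not None:
        # An existing experiment keeps the artifact location it was created with.
        return experiment.experiment_id
    return tracking.create_experiment(name, artifact_location=artifact_location)
```

In the notebook this would be called as `get_or_create_experiment(mlflow, expr_name, s3_bucket)`, followed by `mlflow.set_experiment(expr_name)` as before.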

#### Run the example run on this experiment

In [None]:
!mlflow run \
    https://github.com/mlflow/mlflow-example.git \
    --env-manager=local \
    -P alpha=6.0

#### View the results and where the model has been saved:

http://127.0.0.1:5000

In [None]:
# With no active run, this call starts a new run and prints that run's artifact URI
print(mlflow.get_artifact_uri())
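
The printed URI has the form `s3://<bucket>/<path>`. To look the same location up with an S3 client, you need the bucket name and the key prefix separately. The sketch below splits the URI with the standard library; `split_s3_uri` is a hypothetical helper name, and the run path in the usage note is made up for illustration.

```python
from urllib.parse import urlparse


def split_s3_uri(uri):
    """Split 's3://bucket/some/prefix' into ('bucket', 'some/prefix')."""
    parsed = urlparse(uri)
    if parsed.scheme != "s3" or not parsed.netloc:
        raise ValueError(f"not an S3 URI: {uri!r}")
    # The bucket is the network location; the key prefix is the path
    # without its leading slash.
    return parsed.netloc, parsed.path.lstrip("/")
```

For example, `split_s3_uri("s3://mlflow/0123abcd/artifacts")` yields the bucket `mlflow` and the prefix `0123abcd/artifacts`.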

#### Check the model with Boto to make sure it's on S3

In [None]:
import boto3

session = boto3.Session(
    aws_access_key_id='ROOTNAME',
    aws_secret_access_key='CHANGEME123'
)

s3 = session.resource(
    's3',
    endpoint_url=os.environ.get('MLFLOW_S3_ENDPOINT_URL')
)

my_bucket = s3.Bucket('mlflow')

for my_bucket_object in my_bucket.objects.all():
    print(my_bucket_object.key)
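
Listing every object gets noisy once several runs have been logged. A narrower check, sketched here under the assumption that the model was saved in MLflow's standard model format (which includes an `MLmodel` descriptor file), is to list only the keys under one prefix and keep the `MLmodel` entries. The helper names are hypothetical; `bucket.objects.filter(Prefix=...)` is the boto3 resource call for prefix listing.

```python
def list_keys(bucket, prefix=""):
    """Return the sorted object keys in `bucket` that start with `prefix`."""
    return sorted(obj.key for obj in bucket.objects.filter(Prefix=prefix))


def find_mlmodel_files(keys):
    """Keep only the keys whose file name is exactly `MLmodel`."""
    return [key for key in keys if key.rsplit("/", 1)[-1] == "MLmodel"]
```

With the `my_bucket` object from the cell above, `find_mlmodel_files(list_keys(my_bucket))` returns one key per model stored in the bucket.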

You can also check the bucket: http://127.0.0.1:9090/buckets.

Congratulations! You have learned how to send a model to S3 and retrieve it, so you can scale your models without the risk of losing them. You can also store full datasets this way, which lets you run feature engineering experiments reproducibly.