**Q1. What's the MLflow version that you have?**

In [1]:
! mlflow --version

mlflow, version 2.19.0


**Q2. How many files were saved to OUTPUT_FOLDER?**

In [2]:
! python preprocess_data.py --raw_data_path data --dest_path ./output
! files_number=$(ls output | wc -l); echo "output_files=$files_number"

output_files=4
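The same count can be taken from Python instead of the shell pipeline. This is a minimal sketch: `count_files` is a hypothetical helper, not part of the homework scripts. Unlike `ls output | wc -l`, it skips subdirectories and includes hidden files.

```python
from pathlib import Path


def count_files(folder):
    """Count regular files directly inside `folder` (subdirectories are not counted)."""
    return sum(1 for entry in Path(folder).iterdir() if entry.is_file())
```

Calling `count_files("output")` after the preprocessing step should agree with the shell result, as long as the folder holds only visible regular files.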


**Q3. What is the value of the min_samples_split parameter?**

In [3]:
! python train.py

2025/01/20 11:37:44 INFO mlflow.tracking.fluent: Autologging successfully enabled for sklearn.


In [4]:
from mlflow.tracking import MlflowClient
import mlflow

EXPERIMENT_NAME = "nyc-taxi-experiment"
MLFLOW_TRACKING_URI = "sqlite:///mlflow.db"

mlflow.set_tracking_uri(MLFLOW_TRACKING_URI)
mlflow.set_experiment(EXPERIMENT_NAME)

client = MlflowClient(tracking_uri=MLFLOW_TRACKING_URI)

experiment = client.get_experiment_by_name(EXPERIMENT_NAME)

latest_run = client.search_runs(
    experiment_ids=[experiment.experiment_id], order_by=["start_time desc"]
    )[0]

# Single quotes inside the f-string keep this valid on Python versions before 3.12
print(f"min_samples_split={latest_run.data.params['min_samples_split']}")

min_samples_split=2


**Q4. In addition to backend-store-uri, what else do you need to pass to properly configure the server?**

**Answer:** The server also needs the "--default-artifact-root" parameter, set to the location where artifacts should be stored.

For example:

mlflow server --port 8080 --backend-store-uri sqlite:///mlflow.db --default-artifact-root ./artifacts



**Q5. What's the best validation RMSE that you got?**

In [5]:
! python hpo.py

🏃 View run redolent-bat-630 at: http://127.0.0.1:8080/#/experiments/2/runs/045228b0247d4f40936a59428eaa3b36

🧪 View experiment at: http://127.0.0.1:8080/#/experiments/2                    

🏃 View run nebulous-shrimp-669 at: http://127.0.0.1:8080/#/experiments/2/runs/1d035239f27e49d78146ec6327ce6470

🧪 View experiment at: http://127.0.0.1:8080/#/experiments/2                    

🏃 View run bustling-duck-275 at: http://127.0.0.1:8080/#/experiments/2/runs/f798b8d9335545429843eb34e6f1a712

🧪 View experiment at: http://127.0.0.1:8080/#/experiments/2                    

🏃 View run tasteful-wolf-351 at: http://127.0.0.1:8080/#/experiments/2/runs/4178da54a07b42b7ba1f1326911a98d2

🧪 View experiment at: http://127.0.0.1:8080/#/experiments/2                    

🏃 View run resilient-fawn-214 at: http://127.0.0.1:8080/#/experiments/2/runs/538bfe80e74d445d89dd8dfac4928f68

🧪 View experiment at: http://127.0.0.1:8080/#/experiments/2                    

🏃 View run gaudy-elk-245 at: http://127.0.0

In [6]:
HPO_EXPERIMENT_NAME = "random-forest-hyperopt"

experiment = client.get_experiment_by_name(HPO_EXPERIMENT_NAME)
runs = client.search_runs(experiment_ids=[experiment.experiment_id])

min_rmse = float('inf')

for run in runs:
    rmse = run.data.metrics.get("rmse")
    if rmse is not None and rmse < min_rmse:
        min_rmse = rmse

print(f"The best validation RMSE is {min_rmse:.3f}")

The best validation RMSE is 5.335


**Q6. What is the test RMSE of the best model?**

In [7]:
! python register_model.py

🏃 View run magnificent-jay-290 at: http://127.0.0.1:8080/#/experiments/3/runs/e8c9071e824042398ebc15fcc8c11d09
🧪 View experiment at: http://127.0.0.1:8080/#/experiments/3
🏃 View run zealous-doe-295 at: http://127.0.0.1:8080/#/experiments/3/runs/04cdfe60e42a415aae6804a8ce78b232
🧪 View experiment at: http://127.0.0.1:8080/#/experiments/3
🏃 View run fortunate-horse-198 at: http://127.0.0.1:8080/#/experiments/3/runs/92ac850816ad40008acf0f602093e6d3
🧪 View experiment at: http://127.0.0.1:8080/#/experiments/3
🏃 View run secretive-wren-761 at: http://127.0.0.1:8080/#/experiments/3/runs/b68a99991ebf450db2f0f50dfc11d6e5
🧪 View experiment at: http://127.0.0.1:8080/#/experiments/3
🏃 View run judicious-shad-917 at: http://127.0.0.1:8080/#/experiments/3/runs/b0f7d1bb0db5466db4a5068f54159ab5
🧪 View experiment at: http://127.0.0.1:8080/#/experiments/3
Registered model 'nyc-taxi-regressor' already exists. Creating a new version of this model...
2025/01/20 11:39:33 INFO mlflow.store.model_registry.abst

In [8]:
EXPERIMENT_NAME = "random-forest-best-models"

experiment = client.get_experiment_by_name(EXPERIMENT_NAME)
runs = client.search_runs(experiment_ids=[experiment.experiment_id])

min_rmse = float('inf')

for run in runs:
    rmse = run.data.metrics.get("test_rmse")
    if rmse is not None and rmse < min_rmse:
        min_rmse = rmse

print(f"The best test RMSE is {min_rmse:.3f}")

The best test RMSE is 5.567
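
The loops in Q5 and Q6 differ only in the metric name, so they could share one helper. The sketch below is an assumption, not part of the homework scripts: `best_metric` is a hypothetical name, and the stand-in runs only mimic the `run.data.metrics` shape with made-up values so the example runs without a tracking server.

```python
from types import SimpleNamespace


def best_metric(runs, metric_name):
    """Return the smallest value of `metric_name` across runs, or None if no run logged it."""
    values = [
        run.data.metrics[metric_name]
        for run in runs
        if metric_name in run.data.metrics
    ]
    return min(values) if values else None


# Stand-in objects with the same attribute layout as MLflow runs; values are made up
fake_runs = [
    SimpleNamespace(data=SimpleNamespace(metrics={"rmse": 6.0})),
    SimpleNamespace(data=SimpleNamespace(metrics={"rmse": 5.5})),
    SimpleNamespace(data=SimpleNamespace(metrics={})),  # run without the metric
]

print(best_metric(fake_runs, "rmse"))  # 5.5
```

With real data, the list returned by `client.search_runs(...)` would be passed in place of `fake_runs`, using "rmse" for Q5 and "test_rmse" for Q6.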
