# Wrap-up Summary: Using a Remote MLflow Tracking Server in Databricks (AWS)

This notebook demonstrates how to configure and use a remote MLflow Tracking Server from a Databricks notebook on AWS. The steps include:

1. Setting the MLflow tracking URI to point to your remote tracking server.
2. Training and logging a RandomForestRegressor model, parameters, metrics, and artifacts to the remote server.
3. Verifying your run and viewing results in the MLflow Tracking UI.

Before running the notebook, replace the placeholder tracking server URI (`http://your-tracking-server:5000`) with your actual server address. After the run completes, you can view your experiment runs, models, and artifacts in the MLflow Tracking Server UI. For more details, see the [Databricks MLflow Tracking Server documentation](https://docs.databricks.com/aws/en/mlflow/tracking-server-configuration.html).

---

# Example: Using a Remote MLflow Tracking Server in Databricks (AWS)

This notebook demonstrates how to configure and use a remote MLflow Tracking Server from a Databricks notebook on AWS. You will learn how to:

* Set the MLflow tracking URI to point to your remote tracking server
* Train and log a model, parameters, metrics, and artifacts to the remote server
* Verify your run and view results in the MLflow Tracking UI

**Prerequisites:**
* You have access to a running MLflow Tracking Server (hosted on EC2, ECS, EKS, or another managed service)
* You have the tracking server URI and necessary credentials (if authentication is enabled)
* The tracking server is reachable from your Databricks workspace
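
If authentication is enabled on the server, the MLflow client can read credentials from environment variables. The sketch below is a minimal illustration, assuming the server uses HTTP basic auth or a bearer token. The helper name `set_tracking_credentials` and the sample values are hypothetical; in a Databricks notebook the real values would normally be read from a secret scope rather than typed inline.

```python
import os


def set_tracking_credentials(username=None, password=None, token=None):
    """Export MLflow client credentials as environment variables.

    Hypothetical helper for illustration. Only the values that are
    supplied get exported; the rest are left untouched.
    """
    values = {
        "MLFLOW_TRACKING_USERNAME": username,
        "MLFLOW_TRACKING_PASSWORD": password,
        "MLFLOW_TRACKING_TOKEN": token,
    }
    exported = []
    for name, value in values.items():
        if value:  # skip credentials that were not supplied
            os.environ[name] = value
            exported.append(name)
    return exported


# Placeholder values only; never commit real credentials to a notebook.
exported = set_tracking_credentials(
    username="example-user", password="example-password"
)
print(f"Exported: {exported}")
```

Run this before the first MLflow call so the client sees the variables when it opens a connection to the server.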


## Step 2: Set Up MLflow Tracking URI
The cell below imports `mlflow`, points the client at the remote tracking server (a placeholder address here), and confirms that the setting took effect. Every later logging call in this notebook depends on this URI.

In [0]:
import mlflow

# Placeholder address: replace with the URI of your own tracking server
TRACKING_URI = "http://your-tracking-server:5000"

# Point the MLflow client at the remote tracking server
mlflow.set_tracking_uri(TRACKING_URI)
print(f"Current MLflow tracking URI: {mlflow.get_tracking_uri()}")

# Confirm the client picked up the URI (this does not test that the server is reachable)
assert mlflow.get_tracking_uri() == TRACKING_URI, "Tracking URI not set correctly!"
print("MLflow tracking URI is set and ready.")
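
Setting the URI only configures the client, so it can help to confirm that the server actually responds before logging anything. The following is a minimal sketch using only the standard library. It assumes the server exposes the `/health` endpoint that the open-source `mlflow server` provides; a server behind a proxy or a custom gateway may expose a different path. The function names `health_url` and `server_is_reachable` are hypothetical.

```python
import urllib.error
import urllib.request


def health_url(tracking_uri):
    """Build the health-check URL from a tracking server base URI."""
    return tracking_uri.rstrip("/") + "/health"


def server_is_reachable(tracking_uri, timeout=5.0):
    """Return True if the health endpoint answers with HTTP 200."""
    try:
        with urllib.request.urlopen(health_url(tracking_uri), timeout=timeout) as response:
            return response.status == 200
    except (urllib.error.URLError, OSError):
        # DNS failure, refused connection, timeout, or an HTTP error status
        return False


# Placeholder address, matching the one used in the cell above
print(health_url("http://your-tracking-server:5000"))
```

In the notebook, calling `server_is_reachable(mlflow.get_tracking_uri())` before the training step gives an early, readable failure instead of a network error partway through a run.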

## Step 3: Train and Log a Model to the Tracking Server
The cell below trains a `RandomForestRegressor` on the scikit-learn diabetes dataset, then logs the hyperparameter, the evaluation metrics (MSE and R²), the fitted model, and a prediction plot as a sample artifact to the remote MLflow tracking server. Together these cover the end-to-end logging workflow.

In [0]:
import os

import matplotlib.pyplot as plt
import mlflow.sklearn
from sklearn.datasets import load_diabetes
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.model_selection import train_test_split

# Load dataset
X, y = load_diabetes(return_X_y=True, as_frame=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Train model
model = RandomForestRegressor(n_estimators=100, random_state=42)
model.fit(X_train, y_train)

# Predict and evaluate
y_pred = model.predict(X_test)
mse = mean_squared_error(y_test, y_pred)
r2 = r2_score(y_test, y_pred)

# Log to MLflow
with mlflow.start_run() as run:
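    # Read the value from the fitted model so the logged parameter cannot drift from the code
    mlflow.log_param("n_estimators", model.n_estimators)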
    mlflow.log_metric("mse", mse)
    mlflow.log_metric("r2", r2)
    mlflow.sklearn.log_model(model, "model")
    
    # Log a sample artifact: prediction plot
    plt.figure(figsize=(6,4))
    plt.scatter(y_test, y_pred, alpha=0.7)
    plt.xlabel("Actual")
    plt.ylabel("Predicted")
    plt.title("Actual vs Predicted")
    plot_path = "prediction_plot.png"
    plt.savefig(plot_path)
    mlflow.log_artifact(plot_path)
    plt.close()
    os.remove(plot_path)
    
    print(f"Run ID: {run.info.run_id}")
    print(f"Tracking URI: {mlflow.get_tracking_uri()}")

## Step 4: View Your Run in the MLflow Tracking Server UI

1. Open your MLflow Tracking Server UI in a browser. The address is typically the tracking URI you set earlier (e.g., `http://your-tracking-server:5000`).
2. Navigate to the experiment that holds the run. Runs are stored in the `Default` experiment unless you selected another one with `mlflow.set_experiment` before starting the run.
3. Find the run by the Run ID printed by the training cell.
4. You can view parameters, metrics, the logged model, and any artifacts (such as the prediction plot) in the UI.
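
The steps above can be shortened by building a direct link to the run. This is a small sketch that assumes the open-source MLflow UI's `#/experiments/<experiment_id>/runs/<run_id>` route; the route may differ between MLflow versions or when the server sits behind a proxy. The helper name `run_ui_url` and the sample IDs are hypothetical.

```python
def run_ui_url(tracking_uri, experiment_id, run_id):
    """Build a direct link to a run page in the MLflow Tracking UI.

    Assumes the UI is served at the same base address as the tracking URI.
    """
    base = tracking_uri.rstrip("/")
    return f"{base}/#/experiments/{experiment_id}/runs/{run_id}"


# Hypothetical IDs; in the notebook, use run.info.experiment_id and run.info.run_id
print(run_ui_url("http://your-tracking-server:5000", "0", "example-run-id"))
```

Printing this link at the end of the training cell saves searching the experiment table for the Run ID.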

For more details, see the [Databricks MLflow Tracking Server documentation](https://docs.databricks.com/aws/en/mlflow/tracking-server-configuration.html).