
# 01\_Local\_Remote\_Backends\_and\_UI

## 🧠 What are the pieces?

* 🛰️ **Tracking server**: API that receives logs.
* 🗃️ **Backend store**: runs/params/metrics DB (file/SQL).
* 📦 **Artifact store**: big files (prompts, eval sets, traces).
* 🪟 **UI**: compare/search runs in a browser.

---

## 💻 Local (single-dev)

* 📂 **Default**: `./mlruns` (file backend + artifacts).
* ▶️ **UI**: `mlflow ui --port 5000`
* ✅ **Pros**: zero setup, fast.
* ⚠️ **Cons**: hard to share/collab.
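
A minimal sketch of logging one run to the local file store, assuming `mlflow` is installed. The experiment name `local-smoke-test` and the logged values are hypothetical placeholders, not part of any real project.

```python
from pathlib import Path


def local_tracking_uri(root: str = "./mlruns") -> str:
    """Return a file: URI for a local store directory (resolved to an absolute path)."""
    return Path(root).resolve().as_uri()


def log_smoke_run(tracking_uri: str) -> str:
    """Log one tiny run to the given store and return its run ID."""
    import mlflow  # imported lazily so the helper above works without mlflow

    mlflow.set_tracking_uri(tracking_uri)
    mlflow.set_experiment("local-smoke-test")  # hypothetical experiment name
    with mlflow.start_run() as run:
        mlflow.log_param("model", "gpt-4o")      # placeholder value
        mlflow.log_metric("latency_p95", 850.0)  # placeholder value
    return run.info.run_id
```

Calling `log_smoke_run(local_tracking_uri())` and then starting `mlflow ui --port 5000` from the same directory should show the run in the browser.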

---

## ☁️ Remote (team/prod)

* 🧭 **Point clients**: set `MLFLOW_TRACKING_URI` to the server URL (or call `mlflow.set_tracking_uri(...)` in code).
* 🚀 **Server (example)**
  `mlflow server --backend-store-uri postgresql://... --artifacts-destination s3://bucket/path --host 0.0.0.0 --port 5000`
* 🔐 **Auth**: put behind HTTPS + reverse proxy (basic/token/OIDC), restrict bucket access.
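
The server command and the client setting above can be assembled programmatically, which helps keep deploy scripts and client config in sync. This is a sketch only: the function names are made up for illustration, and any URIs passed in (database host, bucket, server URL) are assumed to come from your own config.

```python
import os


def build_server_command(backend_store_uri, artifacts_destination,
                         host="0.0.0.0", port=5000):
    """Build the argv list for `mlflow server` using the flags shown above."""
    return [
        "mlflow", "server",
        "--backend-store-uri", backend_store_uri,
        "--artifacts-destination", artifacts_destination,
        "--host", host,
        "--port", str(port),
    ]


def client_env(server_url, base_env=None):
    """Return a copy of the environment with MLFLOW_TRACKING_URI set for clients."""
    env = dict(os.environ if base_env is None else base_env)
    env["MLFLOW_TRACKING_URI"] = server_url
    return env
```

The list from `build_server_command(...)` can be handed to `subprocess.run(cmd, check=True)`, and the dict from `client_env(...)` can be passed as `env=` when launching training or eval jobs.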

---

## 🗃️ Backend store options

* 🧱 **File** (dev) → `file:/path/to/mlruns`
* 🐘 **Postgres/MySQL** (team/prod) → reliable, concurrent
* 💡 Tip: for teams, prefer **SQL** to avoid file-locking issues.

---

## 📦 Artifact store options

* 💻 **Local/NFS** (simple, small)
* 🪣 **S3 / MinIO** → `s3://bucket/prefix`
* 📦 **GCS** → `gs://bucket/prefix`
* 🔷 **Azure Blob** → `wasbs://container@account.blob.core.windows.net/prefix`
* 💡 Tip: store **prompt templates, eval sets, traces, charts** here.
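
One way to follow the tip above is to write the prompt templates and eval set to a temporary directory and upload the whole directory in one call. This is a sketch assuming `mlflow` is installed; the helper names, the `bundle` artifact path, and the file layout are illustrative choices, not an MLflow convention.

```python
import json
import tempfile
from pathlib import Path


def write_prompt_bundle(out_dir, templates, eval_rows):
    """Write templates as prompts/<name>.txt and eval rows as eval_set.jsonl."""
    out = Path(out_dir)
    (out / "prompts").mkdir(parents=True, exist_ok=True)
    written = []
    for name, text in templates.items():
        path = out / "prompts" / f"{name}.txt"
        path.write_text(text, encoding="utf-8")
        written.append(path)
    eval_path = out / "eval_set.jsonl"
    with eval_path.open("w", encoding="utf-8") as fh:
        for row in eval_rows:
            fh.write(json.dumps(row, ensure_ascii=False) + "\n")
    written.append(eval_path)
    return written


def log_bundle(templates, eval_rows):
    """Upload the bundle to the artifact store of a new run."""
    import mlflow  # imported lazily; only needed when actually logging

    with tempfile.TemporaryDirectory() as tmp:
        write_prompt_bundle(tmp, templates, eval_rows)
        with mlflow.start_run():
            mlflow.log_artifacts(tmp, artifact_path="bundle")
```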

---

## 🪟 UI essentials

* 🔎 **Search**: `params.model='gpt-4o' and metrics.latency_p95 < 1200`
* 📊 **Compare**: select runs → charts (latency, cost, quality).
* 🏷️ **Tags** help slice runs: `task=qa`, `dataset=v1.2`, `pipeline=rag`.
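
The same filter syntax works from Python through `mlflow.search_runs`, which is handy for CI checks. The sketch below builds a filter string from dictionaries; `build_filter` and `search` are hypothetical helper names, and the quoting is deliberately simple (values containing a single quote are rejected rather than escaped).

```python
ALLOWED_OPS = {"=", "!=", "<", "<=", ">", ">="}


def _quote(value):
    """Wrap a value in single quotes; reject values that would break the filter."""
    text = str(value)
    if "'" in text:
        raise ValueError(f"single quotes are not supported in filter values: {text!r}")
    return f"'{text}'"


def build_filter(params=None, metric_bounds=None, tags=None):
    """Join param, tag, and metric clauses with 'and' (in that order)."""
    clauses = []
    for key, value in (params or {}).items():
        clauses.append(f"params.{key} = {_quote(value)}")
    for key, value in (tags or {}).items():
        clauses.append(f"tags.{key} = {_quote(value)}")
    for key, (op, bound) in (metric_bounds or {}).items():
        if op not in ALLOWED_OPS:
            raise ValueError(f"unsupported comparison operator: {op!r}")
        clauses.append(f"metrics.{key} {op} {bound}")
    return " and ".join(clauses)


def search(experiment_name, filter_string):
    """Run the query against the configured tracking server."""
    import mlflow  # imported lazily; only needed when actually querying

    return mlflow.search_runs(
        experiment_names=[experiment_name],
        filter_string=filter_string,
    )
```

For example, `build_filter(params={"model": "gpt-4o"}, metric_bounds={"latency_p95": ("<", 1200)})` produces the search string shown above, with spaces around `=`.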

---

## 🧪 LLM-specific checklist

* 🧾 Log **prompt\_id/hash**, model ID, temp/top\_p, retrieval **k**, reranker.
* ⏱️ Log **latency p50/p95**, tokens in/out, **\$ cost**, cache hit-rate.
* ✅ Log **quality** (EM/F1/preference) & **safety** (toxicity/PII) as metrics.
* 📎 Save **dataset snapshot** + **prompt bundle** as artifacts for reproducibility.
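
A sketch of turning the checklist into one logging call, assuming `mlflow` is installed. The metric and param names, the 12-character hash length, the nearest-rank percentile, and the per-1k-token price arguments are all illustrative assumptions; substitute your own conventions and your provider's real pricing.

```python
import hashlib
import math


def prompt_hash(template: str) -> str:
    """Short, stable identifier for a prompt template (first 12 hex chars of SHA-256)."""
    return hashlib.sha256(template.encode("utf-8")).hexdigest()[:12]


def percentile(samples, pct):
    """Nearest-rank percentile of a non-empty list of numbers."""
    if not samples:
        raise ValueError("samples must not be empty")
    ordered = sorted(samples)
    rank = max(1, math.ceil(pct / 100 * len(ordered)))
    return ordered[rank - 1]


def cost_usd(tokens_in, tokens_out, usd_per_1k_in, usd_per_1k_out):
    """Cost from token counts and per-1k-token prices (prices are caller-supplied)."""
    return tokens_in / 1000 * usd_per_1k_in + tokens_out / 1000 * usd_per_1k_out


def log_llm_run(template, model_id, temperature, top_p, retrieval_k,
                latencies_ms, tokens_in, tokens_out, usd_per_1k_in, usd_per_1k_out):
    """Log the checklist items as params and metrics on a new run."""
    import mlflow  # imported lazily; only needed when actually logging

    with mlflow.start_run():
        mlflow.log_params({
            "prompt_hash": prompt_hash(template),
            "model": model_id,
            "temperature": temperature,
            "top_p": top_p,
            "retrieval_k": retrieval_k,
        })
        mlflow.log_metrics({
            "latency_p50": percentile(latencies_ms, 50),
            "latency_p95": percentile(latencies_ms, 95),
            "tokens_in": tokens_in,
            "tokens_out": tokens_out,
            "cost_usd": cost_usd(tokens_in, tokens_out, usd_per_1k_in, usd_per_1k_out),
        })
```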

---

## 🧭 Choose a setup (quick matrix)

| Scenario      | Backend store | Artifact store | Notes                  |
| ------------- | ------------- | -------------- | ---------------------- |
| 👤 Solo dev   | File          | Local          | `mlflow ui` is enough  |
| 👥 Small team | Postgres      | S3/MinIO       | Add HTTPS + auth       |
| 🏭 Prod       | Managed SQL   | S3/Blob/GCS    | Backups, IAM, CI gates |

---

## ⚠️ Gotchas

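* 🧷 A local file backend paired with a remote artifact store splits each run in two: metadata stays on one machine while artifacts land in the shared bucket, so teammates cannot find the runs. **Keep both remote** for teams.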
* 🔁 NFS + file backend → locking issues; use **SQL**.
* ⏳ **Clock skew** between client machines breaks time sorting; sync clocks with NTP.
* 🔒 Don’t expose server without **TLS + auth**.
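
Some of these gotchas can be caught with a small preflight check on the configured URIs. The sketch below is an illustrative helper, not part of MLflow: it only inspects URI schemes, so it cannot detect NFS mounts or clock skew, and the scheme lists cover just the stores named in this sheet.

```python
from urllib.parse import urlparse

REMOTE_ARTIFACT_SCHEMES = {"s3", "gs", "wasbs"}
SQL_SCHEMES = {"postgresql", "mysql"}
LOCAL_HOSTS = {"localhost", "127.0.0.1"}


def backend_kind(uri):
    """Classify a backend store URI as 'file', 'sql', or 'other' by its scheme."""
    scheme = urlparse(uri).scheme.split("+")[0]  # drop a driver suffix like +psycopg2
    if scheme in ("", "file"):
        return "file"
    if scheme in SQL_SCHEMES:
        return "sql"
    return "other"


def check_store_pairing(backend_uri, artifact_uri, tracking_url=""):
    """Return a list of warning strings; an empty list means no issue was detected."""
    warnings = []
    artifact_remote = urlparse(artifact_uri).scheme in REMOTE_ARTIFACT_SCHEMES
    if backend_kind(backend_uri) == "file" and artifact_remote:
        warnings.append("file backend with remote artifacts: keep both remote for teams")
    parsed = urlparse(tracking_url)
    if parsed.scheme == "http" and parsed.hostname not in LOCAL_HOSTS:
        warnings.append("tracking URL is plain HTTP: add TLS + auth")
    return warnings
```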

---

## 🔧 Env vars to remember

* 🌐 `MLFLOW_TRACKING_URI`
* 🧪 `MLFLOW_EXPERIMENT_NAME`
* ☁️ Cloud creds (e.g., `AWS_ACCESS_KEY_ID`, `AWS_SECRET_ACCESS_KEY`, `AWS_DEFAULT_REGION`)
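
A quick way to catch a missing variable before a long job starts is to check the environment up front. This is a sketch with a hypothetical helper name; note that AWS credentials can also come from instance roles or config files, so a missing `AWS_*` variable is a hint, not necessarily an error.

```python
import os

TRACKING_VARS = ("MLFLOW_TRACKING_URI", "MLFLOW_EXPERIMENT_NAME")
S3_VARS = ("AWS_ACCESS_KEY_ID", "AWS_SECRET_ACCESS_KEY", "AWS_DEFAULT_REGION")


def missing_env(env=None, needs_s3=False):
    """Return the names of expected variables that are unset or empty."""
    env = os.environ if env is None else env
    names = TRACKING_VARS + (S3_VARS if needs_s3 else ())
    return [name for name in names if not env.get(name)]
```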

---

## 🗣️ One-liner

* “Pick **file+local** for speed, **SQL+cloud artifacts** for teams, and always use the **UI** to compare latency, cost, and quality.”
