
# 01\_Tracking\_Projects\_Models\_Registry

## 🔎 MLflow **Tracking**

* 🏷️ **Experiments & runs** — one place to log and compare.
* 📥 **Log params**: model ID, temp, top\_p, max\_tokens, k, reranker.
* ⏱️ **Log metrics**: latency (p50/p95), tokens in/out, \$ cost, cache-hit, quality score.
* 📦 **Artifacts**: prompt templates, eval sets, traces, charts, RAG config.
* 🧩 **Tags**: `dataset=v1.3`, `task=qa`, `pipeline=rag`, `prompt=v7`.
* 🪟 **UI**: filter, sort, and compare runs side by side (including parallel-coordinates plots) to find the best run fast.

> **LLM tip:** Treat prompts as **versioned artifacts**; log a **prompt hash** as a param.
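
A minimal sketch of the prompt-hash tip, assuming MLflow is installed and a tracking location is configured. The helper names `prompt_hash` and `log_prompt_run`, the 12-character truncation, and the artifact path `prompts/prompt_template.txt` are illustrative choices, not MLflow conventions.

```python
import hashlib


def prompt_hash(template: str, length: int = 12) -> str:
    """Return a short, stable SHA-256 fingerprint of a prompt template."""
    return hashlib.sha256(template.encode("utf-8")).hexdigest()[:length]


def log_prompt_run(template: str, params: dict, tags: dict) -> str:
    """Log the template as an artifact and its hash as a param; return the run ID."""
    import mlflow  # imported here so prompt_hash() is usable without MLflow

    with mlflow.start_run() as run:
        mlflow.log_params({**params, "prompt_hash": prompt_hash(template)})
        mlflow.set_tags(tags)
        # log_text stores the string as a text artifact under the given path.
        mlflow.log_text(template, "prompts/prompt_template.txt")
        return run.info.run_id
```

A call such as `log_prompt_run(template, {"model_id": "gpt-4o", "temperature": 0.2}, {"task": "qa", "prompt": "v7"})` would then let you filter runs by `params.prompt_hash` in the UI. Any whitespace change in the template produces a different hash, which is usually what you want for versioning.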

---

## 📦 MLflow **Projects**

* 📁 **Self-contained repo/package** with an `MLproject` file.
* ⚙️ **Entry points** define repeatable commands (e.g., `train`, `eval`).
* 🧪 **Reproducible envs** (conda/virtualenv) + parameterized runs.
* ☁️ **Remote execution** (local → server/Databricks) without changing code.

> **LLM tip:** Make `eval_llm` an entry point that runs your **fixed eval set** across candidates.
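
One way to drive such an entry point from Python is sketched below. It assumes the project's `MLproject` file declares an `eval_llm` entry point that accepts `model_id` and `eval_set` parameters; those parameter names, the helper names, and the default path `eval/qa_v1.jsonl` are hypothetical.

```python
def build_eval_params(model_id: str, eval_set: str = "eval/qa_v1.jsonl") -> dict:
    """Parameters passed to the (assumed) `eval_llm` entry point for one candidate."""
    return {"model_id": model_id, "eval_set": eval_set}


def run_eval_for_candidates(candidates: list[str], project_uri: str = ".") -> list[str]:
    """Launch the `eval_llm` entry point once per candidate; return the run IDs."""
    import mlflow.projects  # imported lazily so build_eval_params() works without MLflow

    run_ids = []
    for model_id in candidates:
        # Every candidate is scored against the same fixed eval set.
        submitted = mlflow.projects.run(
            uri=project_uri,
            entry_point="eval_llm",
            parameters=build_eval_params(model_id),
        )
        run_ids.append(submitted.run_id)
    return run_ids
```

Because only `model_id` varies between launches, the resulting runs are directly comparable in the Tracking UI.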

---

## 🤖 MLflow **Models**

* 🧪 **Flavors** (pyfunc, transformers, etc.) for portable packaging.
* 🔌 **`pyfunc`** = universal predict API → wrap **pre/post-processing** + LLM call.
* 🧾 **Signatures** document the **input/output schema** → safer serving.
* 🚀 **Serving**: `mlflow models serve`, Docker, or batch/scoring jobs.

> **LLM tip:** Package the **entire RAG pipeline** (retriever + reranker + prompt) as one `pyfunc` model.
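
The sketch below shows one possible shape for this, under loud assumptions: `RagPipeline` is a toy stand-in (word-overlap scoring replaces a real retriever and reranker), the LLM is an injected callable, and `save_as_pyfunc` is a hypothetical helper. Only `mlflow.pyfunc.PythonModel` and `mlflow.pyfunc.save_model` are MLflow APIs.

```python
from typing import Callable


class RagPipeline:
    """Toy retriever + reranker + prompt wrapped around an injected LLM callable."""

    PROMPT = "Context:\n{context}\n\nQuestion: {question}\nAnswer:"

    def __init__(self, documents: list[str], llm: Callable[[str], str], k: int = 2):
        self.documents = documents
        self.llm = llm
        self.k = k

    def _overlap(self, question: str, doc: str) -> int:
        # Stand-in relevance score: number of shared lowercase words.
        return len(set(question.lower().split()) & set(doc.lower().split()))

    def retrieve(self, question: str) -> list[str]:
        # Retrieval and reranking are collapsed into one sort for brevity.
        ranked = sorted(self.documents, key=lambda d: self._overlap(question, d), reverse=True)
        return ranked[: self.k]

    def answer(self, question: str) -> str:
        context = "\n".join(self.retrieve(question))
        return self.llm(self.PROMPT.format(context=context, question=question))


def save_as_pyfunc(pipeline: RagPipeline, path: str) -> None:
    """Save the whole pipeline as a single pyfunc model directory."""
    import mlflow.pyfunc  # imported lazily so RagPipeline works without MLflow

    class RagModel(mlflow.pyfunc.PythonModel):
        def predict(self, context, model_input, params=None):
            # Assumes the input has a "question" column (e.g. a pandas DataFrame).
            return [pipeline.answer(q) for q in model_input["question"]]

    mlflow.pyfunc.save_model(path=path, python_model=RagModel())
```

The pipeline object is serialized along with the model, so the injected `llm` must be picklable. With a real API client it is usually safer to create the client inside `load_context` than to pickle it.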

---


### ⚙️ **Key Functions with Usage**

| Function                 | Description                                                      | Example Code                                         |
| ------------------------ | ---------------------------------------------------------------- | ---------------------------------------------------- |
| `mlflow.start_run()`     | Start an experiment run context                                  | `mlflow.start_run(run_name="gpt4o_eval")`            |
| `mlflow.log_params()`    | Log all hyperparameters (e.g., temp, top\_p, retriever\_type)    | `mlflow.log_params({"temp": 0.7, "top_k": 20})`      |
| `mlflow.log_metrics()`   | Log numeric metrics like accuracy, BLEU, latency, etc.           | `mlflow.log_metrics({"BLEU": 0.72, "latency": 102})` |
| `mlflow.log_artifacts()` | Save artifacts: prompt templates, tokenizer files, configs, etc. | `mlflow.log_artifacts("./outputs/prompts")`          |
| `mlflow.get_run()`       | Retrieve metadata, params, metrics of a specific run             | `mlflow.get_run(run_id="12345abcde")`                |
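
As a rough illustration of how these functions fit together for the latency, token, and cost metrics listed under Tracking, the sketch below aggregates per-request records and logs the result. The record keys, metric names, and per-1k-token prices are assumptions for the example, and `percentile` uses the simple nearest-rank method rather than interpolation.

```python
import math


def percentile(values: list[float], pct: float) -> float:
    """Nearest-rank percentile: smallest value with at least pct% of data at or below it."""
    ordered = sorted(values)
    rank = max(1, math.ceil(pct / 100 * len(ordered)))
    return ordered[rank - 1]


def summarize_requests(records: list[dict], usd_per_1k_in: float, usd_per_1k_out: float) -> dict:
    """Aggregate per-request records into run-level metrics."""
    latencies = [r["latency_ms"] for r in records]
    tokens_in = sum(r["tokens_in"] for r in records)
    tokens_out = sum(r["tokens_out"] for r in records)
    return {
        "latency_p50_ms": percentile(latencies, 50),
        "latency_p95_ms": percentile(latencies, 95),
        "tokens_in": tokens_in,
        "tokens_out": tokens_out,
        "cost_usd": tokens_in / 1000 * usd_per_1k_in + tokens_out / 1000 * usd_per_1k_out,
    }


def log_and_read_back(metrics: dict, params: dict) -> dict:
    """Log one run, then fetch it again with get_run to confirm what was stored."""
    import mlflow  # imported lazily so the helpers above work without MLflow

    with mlflow.start_run(run_name="latency-summary") as run:
        mlflow.log_params(params)
        mlflow.log_metrics(metrics)
    return mlflow.get_run(run.info.run_id).data.metrics
```

Putting the unit in the metric name (`_ms`, `_usd`) keeps charts unambiguous when runs from different scripts share one experiment.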


---

## 📚 **Model Registry**

* 🧬 **Versioned models** with descriptions, tags, lineage.
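* 🧭 **Stages**: `None` → **Staging** → **Production** (→ Archived). Deprecated since MLflow 2.9; newer code should prefer aliases and tags.
* 🔁 **Rollbacks**: repoint a stage or alias to an earlier version; every version stays in the registry as an audit trail.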
* 🔔 **Webhooks/CI**: auto-test on stage change; block on failed checks.
* 🏷️ **Aliases** (e.g., `champion`, `canary`) for stable references.

> **LLM tip:** Promote only models that **pass eval gates** (quality ≥ target, safety pass, cost within budget).
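
A hedged sketch of such a gate, using the registry's alias API. The metric keys (`quality`, `safety_pass`, `cost_usd`), the default thresholds, and the helper names are assumptions for illustration; `MlflowClient.set_registered_model_alias` requires an MLflow version with alias support.

```python
def passes_gates(metrics: dict, min_quality: float, max_cost_usd: float) -> bool:
    """True only if quality, safety, and cost all clear their thresholds."""
    return (
        metrics.get("quality", 0.0) >= min_quality
        and metrics.get("safety_pass", 0.0) == 1.0
        and metrics.get("cost_usd", float("inf")) <= max_cost_usd
    )


def promote_if_passing(name: str, version: str, metrics: dict,
                       min_quality: float = 0.8, max_cost_usd: float = 5.0) -> bool:
    """Point the `champion` alias at this version when every gate passes."""
    if not passes_gates(metrics, min_quality, max_cost_usd):
        return False
    from mlflow import MlflowClient  # imported lazily; only needed on promotion

    MlflowClient().set_registered_model_alias(name, "champion", version)
    return True
```

Missing metrics fail the gate by default, so a version whose evaluation never ran cannot be promoted by accident.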

---

## 🧠 Mental model (end-to-end)

**Track runs** ➜ **Compare** ➜ **Package model** ➜ **Register/version** ➜ **Stage-gate tests** ➜ **Promote/Serve** ➜ **Monitor & iterate**.

---

## ✅ Quick conventions

* 🧪 **Experiment naming**: `llm-<task>-<dataset>` (e.g., `llm-qa-finance-v1`).
* 🏷️ **Run tags**: `prompt=v8`, `embed=bge-base`, `retriever=faiss`, `reranker=cross-encoder`.
* 📂 **Artifacts layout**: `prompts/`, `eval/`, `traces/`, `reports/`.
* 🔒 **Governance**: log safety metrics (**toxicity/PII**), attach eval report to the **model version**.
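
These conventions are easier to keep when they live in small helpers rather than in each script. The following is one possible sketch; the function names are hypothetical, and only `mlflow.set_experiment` and `mlflow.start_run` are MLflow calls.

```python
def experiment_name(task: str, dataset: str) -> str:
    """Build a name following the `llm-<task>-<dataset>` convention."""
    return f"llm-{task}-{dataset}".lower()


def run_tags(prompt: str, embed: str, retriever: str, reranker: str) -> dict:
    """Collect the standard run tags so every run carries the same keys."""
    return {"prompt": prompt, "embed": embed, "retriever": retriever, "reranker": reranker}


def start_conventional_run(task: str, dataset: str, tags: dict):
    """Select (or create) the experiment and open a tagged run."""
    import mlflow  # imported lazily so the naming helpers work without MLflow

    mlflow.set_experiment(experiment_name(task, dataset))
    return mlflow.start_run(tags=tags)
```

For example, `start_conventional_run("qa", "finance-v1", run_tags("v8", "bge-base", "faiss", "cross-encoder"))` would open a run in the `llm-qa-finance-v1` experiment; it can be used as a context manager like any other `mlflow.start_run` call.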

---

## 🎯 One-liners

* **Tracking**: “Make every LLM tweak **measurable & comparable**.”
* **Projects**: “Run the same experiment **anywhere, identically**.”
* **Models**: “Ship your **whole pipeline** as one portable unit.”
* **Registry**: “Control **who/what** goes to prod—with **versions, stages, and rollbacks**.”


### ✅ Real-World LangChain Example

```python
import time

import mlflow
from langchain_openai import ChatOpenAI  # replaces the deprecated langchain.chat_models import

# Start a run; everything logged inside the block is attached to it
with mlflow.start_run(run_name="retrieval-qa-agent"):

    # Log parameters
    mlflow.log_params({
        "model_name": "gpt-4o",
        "temperature": 0.2,
        "retriever": "Chroma",
    })

    # Your LangChain logic (requires OPENAI_API_KEY in the environment)
    llm = ChatOpenAI(model="gpt-4o", temperature=0.2)
    start = time.perf_counter()
    result = llm.invoke("Explain LangGraph")
    elapsed = time.perf_counter() - start

    # Log the measured latency (seconds) and the model output
    mlflow.log_metrics({"response_time_s": elapsed})
    with open("response.txt", "w") as f:
        f.write(result.content)
    mlflow.log_artifact("response.txt")
```

