
# 01\_CI\_CD\_Custom\_Agents\_Logging\_Extensions\_Ecosystem

## 🎯 Why this page

Ship GenAI fast **without breaking prod**: automate tests, gate releases, observe everything.

---

## 🔁 CI/CD for GenAI

**CI (every PR):**

* 🧪 **Unit & contract tests** (prompt variables, tool schemas, model signature).
* 🧷 **Determinism**: fixed seeds; pin deps.
* 📊 **Offline eval** on a **fixed gold set** → log to MLflow.
* 🧱 **Build**: model (pyfunc) + Docker image; attach artifacts (prompt bundle, eval report).
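A contract test for prompt variables can be as small as the sketch below. It assumes prompts are plain `str.format` templates; `PROMPT_TEMPLATE`, `DECLARED_VARS`, and both helper names are hypothetical, not part of any library.

```python
from string import Formatter

# Hypothetical prompt template and the variables the caller promises to supply.
PROMPT_TEMPLATE = (
    "Answer the question using the context.\n"
    "Context: {context}\n"
    "Question: {question}"
)
DECLARED_VARS = {"context", "question"}


def template_vars(template: str) -> set[str]:
    """Return the named placeholders used in a str.format-style template."""
    return {field for _, field, _, _ in Formatter().parse(template) if field}


def check_prompt_contract(template: str, declared: set[str]) -> list[str]:
    """Return contract violations as readable messages (empty list means OK)."""
    used = template_vars(template)
    problems = []
    for name in sorted(used - declared):
        problems.append(f"template uses undeclared variable: {name}")
    for name in sorted(declared - used):
        problems.append(f"declared variable never used: {name}")
    return problems
```

In CI, a test would fail the PR whenever `check_prompt_contract(...)` returns a non-empty list.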

**CD (on main):**

* 🧪 **Stage gates**: quality ✅, safety ✅, p95 latency ✅, cost ✅.
* 🐤 **Canary** 1–5% traffic → watch latency/cost/safety.
* 🚀 **Promote** via Registry alias (`champion`), not by image swap.
* 🔙 **Rollback** = flip alias to previous version.
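The four stage gates above can be expressed as one pure function, so the same check runs in CI and before promotion. This is a minimal sketch: the metric keys (`quality`, `safety_pass_rate`, `p95_latency_ms`, `cost_usd_per_request`) and the threshold values in the usage note are assumptions, not values from this page.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GateThresholds:
    min_quality: float               # e.g. gold-set score in [0, 1]
    min_safety_pass_rate: float      # fraction of safety checks passed
    max_p95_latency_ms: float
    max_cost_usd_per_request: float


def evaluate_gates(metrics: dict[str, float], t: GateThresholds) -> dict[str, bool]:
    """Return a pass/fail flag per gate so a failed release names its cause."""
    return {
        "quality": metrics["quality"] >= t.min_quality,
        "safety": metrics["safety_pass_rate"] >= t.min_safety_pass_rate,
        "p95_latency": metrics["p95_latency_ms"] <= t.max_p95_latency_ms,
        "cost": metrics["cost_usd_per_request"] <= t.max_cost_usd_per_request,
    }


def can_promote(metrics: dict[str, float], t: GateThresholds) -> bool:
    """Promotion is allowed only when every gate passes."""
    return all(evaluate_gates(metrics, t).values())
```

Example: with `GateThresholds(0.8, 0.99, 1500, 0.01)`, a candidate with `p95_latency_ms=1800` fails only the `p95_latency` gate, and `can_promote` returns `False`.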

**Secrets:** from vault/IAM, **never** in code or images.

---

## 🛠️ Custom Agents (reliable by design)

* 🧭 **Controller**: plan → act → observe → reflect (max steps + timeouts).
* 🔧 **Tools**: strict **JSON schemas**, helpful error messages.
* 🧠 **Memory/RAG**: budget context; log `retriever_k`, `hit@k`, `context_use_rate`.
* 🛡️ **Safety**: refusal rules, PII guards, tool allow-list, rate limits.
* 🔁 **Resilience**: retries/backoff, fallbacks (cheaper model/shorter prompt), loop detection.
* 📦 **Packaging**: full agent as **MLflow pyfunc** with a stable `predict()` schema.
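A bounded controller loop might look like the sketch below. It assumes a hypothetical `plan_next(history)` callable that returns a `(tool_name, tool_input)` pair or `None` when finished; the status strings and the "repeat of an identical action" loop heuristic are illustrative choices, not a standard.

```python
import time
from typing import Callable, Optional

Action = tuple[str, str]  # (tool_name, tool_input)


def run_agent(
    plan_next: Callable[[list], Optional[Action]],
    tools: dict[str, Callable[[str], str]],
    max_steps: int = 8,
    timeout_s: float = 30.0,
) -> dict:
    """Plan -> act -> observe until done, with step, time, and loop limits."""
    deadline = time.monotonic() + timeout_s
    history: list[tuple[Action, str]] = []
    seen: set[Action] = set()

    def result(status: str) -> dict:
        return {"status": status, "steps": len(history), "history": history}

    for _ in range(max_steps):
        if time.monotonic() > deadline:
            return result("timeout")
        action = plan_next(history)
        if action is None:
            return result("done")
        if action in seen:  # same tool + same input again: treat as a loop
            return result("loop_detected")
        seen.add(action)
        name, arg = action
        if name not in tools:  # allow-list: only registered tools may run
            observation = f"error: unknown tool '{name}'; allowed: {sorted(tools)}"
        else:
            try:
                observation = tools[name](arg)
            except Exception as exc:  # feed the error back instead of crashing
                observation = f"error: {name} failed: {exc}"
        history.append((action, observation))
    return result("max_steps")
```

Returning tool errors as observations gives the planner a chance to recover, while the three limits guarantee the loop terminates.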

---

## 🧾 Logging & Observability

* 🏷️ **MLflow Tracking**: params (llm/rag/prompt), metrics (latency p50/p95, tokens, cost, quality, safety), artifacts (prompt templates, eval report, traces).
* 🧵 **Tracing** (waterfall): `preproc → retrieval → rerank → llm → postproc → safety` with `trace_id`/`span_id`.
* 🔤 **Token & cost**: `tokens_in`/`tokens_out`, cached tokens, **`cost_usd`** per request.
* 🔔 **Alerts**: p95 latency ↑, `error_rate` ↑, cost/min ↑, safety blocks spike.
* 🧽 **Privacy**: redact PII; store **IDs/hashes**, not raw content; set retention.
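To make the waterfall concrete, here is a stdlib-only sketch of spans sharing one `trace_id`. The `Trace` class is a hypothetical stand-in; in a real deployment OpenTelemetry (or a similar tracer) would typically play this role.

```python
import time
import uuid
from contextlib import contextmanager


class Trace:
    """Collects timed spans that all carry the same trace_id."""

    def __init__(self) -> None:
        self.trace_id = uuid.uuid4().hex
        self.spans: list[dict] = []

    @contextmanager
    def span(self, name: str):
        record = {
            "trace_id": self.trace_id,
            "span_id": uuid.uuid4().hex[:16],
            "name": name,
        }
        start = time.perf_counter()
        try:
            yield record
        finally:
            # Record the duration even if the stage raised an exception.
            record["duration_ms"] = (time.perf_counter() - start) * 1000.0
            self.spans.append(record)


trace = Trace()
for stage in ("preproc", "retrieval", "rerank", "llm", "postproc", "safety"):
    with trace.span(stage):
        pass  # the real stage work would run here
```

Each entry in `trace.spans` can then be emitted as one JSON log line or exported to a tracing backend.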

---

## 🧩 Extensions & Ecosystem (pick what fits)

* 🧭 **Orchestration**: Airflow/Flyte/Ray/LangGraph for pipelines & agents.
* 🧰 **Frameworks**: LangChain/LlamaIndex as building blocks (optional).
* 🔎 **Vector stores**: FAISS/pgvector/Milvus/Pinecone; hybrid BM25+vec + reranker.
* ☸️ **Serving**: FastAPI/Docker/K8s or managed endpoints (cloud).
* 📈 **Telemetry**: Prometheus/Grafana → metrics; OpenTelemetry → traces; log store for JSON logs.
* 🔔 **Registry webhooks**: on stage change → auto-eval, smoke tests, notifications.

---

## 🚦 Release playbook (TL;DR)

1. 👩‍💻 Dev → run offline eval → log to MLflow
2. 🧱 Build model + image
3. 🧪 CI gates pass → **Register** new version
4. 🐤 Canary with guardrails & dashboards
5. ✅ Promote alias to **champion**
6. 🔙 Rollback by alias flip if any SLO/gate breaks
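Steps 5 and 6 reduce to moving a pointer. The sketch below uses a hypothetical in-memory `AliasRegistry` to show the idea; with MLflow 2.3 or later the equivalent move is `MlflowClient().set_registered_model_alias(name, alias, version)`, and serving code can load `models:/<name>@champion`.

```python
class AliasRegistry:
    """In-memory stand-in for a model registry that supports aliases."""

    def __init__(self) -> None:
        self._aliases: dict[str, int] = {}   # alias -> current version
        self._previous: dict[str, int] = {}  # alias -> version before last promote

    def promote(self, alias: str, version: int) -> None:
        """Point the alias at a new version, remembering the old one."""
        if alias in self._aliases:
            self._previous[alias] = self._aliases[alias]
        self._aliases[alias] = version

    def rollback(self, alias: str) -> int:
        """Flip the alias back to the version it pointed at before."""
        if alias not in self._previous:
            raise LookupError(f"no previous version recorded for alias '{alias}'")
        self._aliases[alias] = self._previous.pop(alias)
        return self._aliases[alias]

    def resolve(self, alias: str) -> int:
        return self._aliases[alias]


registry = AliasRegistry()
registry.promote("champion", 3)   # current production version
registry.promote("champion", 4)   # canary passed, promote
registry.rollback("champion")     # SLO broke, flip back to 3
```

Because consumers resolve the alias at load time, neither promotion nor rollback requires rebuilding or redeploying an image.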

---

## ✅ Checklists

**CI**

* [ ] Lint/tests pass
* [ ] Eval on gold set logged
* [ ] Prompt/agent schema validated
* [ ] Docker build reproducible

**CD**

* [ ] Quality ≥ target
* [ ] Safety pass
* [ ] p95 latency ≤ SLO
* [ ] Cost ≤ budget
* [ ] Canary OK → promote alias

**Agent reliability**

* [ ] Max steps/timeouts set
* [ ] Retries/backoff/fallbacks
* [ ] Tool JSON schemas validated
* [ ] Loop detection enabled

**Minimal logs (per request)**

* `trace_id`, `model_id@version`, `prompt_id@version`
* `latency_ms_total`, `latency_ms_llm`, `tokens_in`, `tokens_out`, `cost_usd`
* `retriever_k`, `hit@k`, `context_precision`, `context_use_rate`
* `safety_flags`, `error?`

**Security**

* [ ] Secrets via IAM/vault
* [ ] TLS + auth on endpoints
* [ ] Rate limits/WAF
* [ ] PII redaction & retention policy

---

## 🗣️ One-liner

**“Automate eval-gated releases, package agents as pyfuncs, trace every step, and promote by alias—not by hope.”**
