
---

# 🧾 Structured Logging & Request IDs

> **Intent** → Produce **machine-parseable logs** with **correlation IDs** so you can trace a request across services, debug fast, and power alerts/dashboards.

---

## 🎯 Goals

* **Structured** (JSON) logs, not free-text
* **Consistent fields** → easy search & aggregation
* **Correlation** across services via **Request ID / Trace ID**
* **Safe**: no secrets/PII; redaction in place

---

## 🧱 Structured Logging (why)

* Parse into columns: `ts`, `level`, `msg`, `route`, `status`, `latency_ms`, `request_id`, `user_id?`, `service`
* Works with ELK, Loki, Cloud Logging, Datadog, Splunk
* Enables **metrics from logs** (error rate, p95 latency)
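
A minimal sketch of a JSON formatter built on Python's standard `logging` module, emitting one object per line with the fields listed above. The class name `JsonFormatter` and the `EXTRA_FIELDS` tuple are illustrative assumptions, not an established API.

```python
import json
import logging
from datetime import datetime, timezone

# Hypothetical list of optional fields copied from the record when present.
EXTRA_FIELDS = ("route", "status", "latency_ms", "request_id", "user_id")


class JsonFormatter(logging.Formatter):
    """Render each log record as a single JSON object."""

    def __init__(self, service: str):
        super().__init__()
        self.service = service

    def format(self, record: logging.LogRecord) -> str:
        payload = {
            # UTC timestamp in ISO 8601 form
            "ts": datetime.fromtimestamp(record.created, tz=timezone.utc).isoformat(),
            "level": record.levelname,
            "msg": record.getMessage(),
            "service": self.service,
        }
        # Fields passed via `extra={...}` become attributes on the record.
        for name in EXTRA_FIELDS:
            if hasattr(record, name):
                payload[name] = getattr(record, name)
        return json.dumps(payload)
```

Attach it with `handler.setFormatter(JsonFormatter("my-service"))`, then pass per-request fields through `extra=`, for example `log.info("done", extra={"route": "/x", "status": 200})`.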

---

## 🧬 Request / Correlation IDs

* Accept an incoming `X-Request-ID` (or W3C `traceparent`) header; generate a new ID if none is present
* **Attach to every log line** within the request lifecycle
* Forward downstream in **HTTP headers**; include in responses
* On errors, return the **request\_id** to users for support
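
One way to attach the ID to every log line is a `contextvars` variable plus a logging filter, sketched below. The names `begin_request`, `request_id_var`, and `RequestIdFilter` are hypothetical, and the `headers` argument is assumed to be a plain dict with already-normalized header names.

```python
import contextvars
import logging
import uuid

# Holds the current request's ID; "-" marks log lines emitted outside a request.
request_id_var = contextvars.ContextVar("request_id", default="-")


def begin_request(headers: dict) -> str:
    """Reuse the caller's X-Request-ID if present, otherwise generate one."""
    rid = headers.get("X-Request-ID") or uuid.uuid4().hex
    request_id_var.set(rid)
    return rid


class RequestIdFilter(logging.Filter):
    """Stamp every record with the request ID from the current context."""

    def filter(self, record: logging.LogRecord) -> bool:
        record.request_id = request_id_var.get()
        return True
```

Add the filter to a handler (`handler.addFilter(RequestIdFilter())`) so application code never has to pass the ID explicitly.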

---

## 🧪 What to Log (per request)

* **Start**: method, path, request\_id, user/tenant (if known)
* **Important events**: auth result, DB/HTTP spans, retries, rate-limit decisions
* **End**: status, latency, response size, error code (if any)
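
The start and end events can be sketched as a thin wrapper around a handler. The `handler` here is a hypothetical zero-argument callable returning `(status, body_bytes)`; a real framework would expose this through its own middleware hooks.

```python
import logging
import time

log = logging.getLogger("access")


def handle_with_logging(handler, method: str, path: str, request_id: str):
    """Log one start event and one end event around a handler call."""
    log.info(
        "request.start",
        extra={"method": method, "path": path, "request_id": request_id},
    )
    started = time.perf_counter()
    status, body = handler()  # assumed to return (status, body_bytes)
    latency_ms = round((time.perf_counter() - started) * 1000, 2)
    log.info(
        "request.end",
        extra={
            "request_id": request_id,
            "status": status,
            "latency_ms": latency_ms,
            "response_bytes": len(body),
        },
    )
    return status, body
```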

---

## 🧯 Errors & Exceptions

* Log **once** at the edge with stack trace + request\_id
* Add **error class**, **message**, **where** (route/handler)
* Avoid duplicate logs across middlewares/handlers
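
A sketch of the "log once at the edge" rule: inner code lets exceptions propagate, and only the outermost wrapper logs them. `run_at_edge` and the response shape are assumptions for illustration.

```python
import logging

log = logging.getLogger("edge")


def run_at_edge(handler, route: str, request_id: str):
    """Outermost wrapper: the single place where unhandled errors are logged."""
    try:
        return 200, handler()
    except Exception as exc:
        # log.exception logs at ERROR level and attaches the stack trace.
        log.exception(
            "request.failed",
            extra={
                "request_id": request_id,
                "route": route,
                "error_class": type(exc).__name__,
                "error_message": str(exc),
            },
        )
        # Return the request ID so the user can quote it to support.
        return 500, {"error": "internal_error", "request_id": request_id}
```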

---

## 🔐 Redaction & Safety

* Never log secrets (tokens, passwords, API keys)
* Hash or drop **PII**; mask emails/phones if needed
* Enforce redaction in a **processor** (central place)
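
A central redaction step could look like the filter below. The key list and the email pattern are illustrative assumptions and far from exhaustive; real deployments need a reviewed list of sensitive fields.

```python
import logging
import re

# Hypothetical names of `extra` fields that must never reach the log sink.
SENSITIVE_KEYS = {"password", "token", "api_key", "authorization"}

# Keeps the first character of the local part and the full domain.
EMAIL_RE = re.compile(
    r"([A-Za-z0-9._%+-])[A-Za-z0-9._%+-]*@([A-Za-z0-9.-]+\.[A-Za-z]{2,})"
)


def mask_emails(text: str) -> str:
    return EMAIL_RE.sub(r"\1***@\2", text)


class RedactionFilter(logging.Filter):
    """Mask emails in the message and blank out sensitive extra fields."""

    def filter(self, record: logging.LogRecord) -> bool:
        # Render the message first so values passed as args are masked too.
        record.msg = mask_emails(record.getMessage())
        record.args = ()
        for key in SENSITIVE_KEYS:
            if key in record.__dict__:
                setattr(record, key, "[REDACTED]")
        return True
```

Installing the filter on the handler, rather than at each call site, keeps redaction in one place.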

---

## 🎚️ Levels & Volume

* `INFO` for high-level events; `DEBUG` only in dev or sampled
* `WARN` for user-correctable problems; `ERROR` for failed requests
* Sample verbose logs on hot routes to control cost/noise

---

## 🔗 Propagation & Context

* Pass **request\_id** into dependencies (DB/HTTP clients)
* Include **span\_id/trace\_id** if using tracing; align field names
* Add **service**, **env**, **version** for multi-service clarity

---

## 📤 Destinations

* Write JSON to **stdout** (12-factor) → ship via sidecar/agent
* Keep timestamps in **UTC**; consistent format
* Ensure that backpressure from the log pipeline or a full buffer cannot block request handling
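
One possible setup using only the standard library: request threads enqueue records through a `QueueHandler`, and a background `QueueListener` writes JSON lines with UTC timestamps to stdout. `build_logger`, `UtcJsonFormatter`, and the queue size are illustrative assumptions.

```python
import json
import logging
import logging.handlers
import queue
import sys
import time


class UtcJsonFormatter(logging.Formatter):
    converter = time.gmtime  # format timestamps in UTC, not local time

    def format(self, record: logging.LogRecord) -> str:
        return json.dumps({
            "ts": self.formatTime(record, "%Y-%m-%dT%H:%M:%SZ"),
            "level": record.levelname,
            "msg": record.getMessage(),
        })


def build_logger(name: str, stream=None):
    """Return (logger, listener); call listener.stop() at shutdown to flush."""
    # Bounded queue: when full, QueueHandler's put_nowait fails and the record
    # is dropped via the handler's error path instead of blocking the request.
    log_queue = queue.Queue(maxsize=10_000)
    sink = logging.StreamHandler(stream if stream is not None else sys.stdout)
    sink.setFormatter(UtcJsonFormatter())
    listener = logging.handlers.QueueListener(log_queue, sink)
    listener.start()

    logger = logging.getLogger(name)
    logger.setLevel(logging.INFO)
    logger.addHandler(logging.handlers.QueueHandler(log_queue))
    logger.propagate = False
    return logger, listener
```

Dropping records under pressure is a trade-off: it protects request latency at the cost of log completeness.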

---

## 📊 Ops & Dashboards

* Build views: **errors by route**, **p95 latency**, **top 404s**, **rate-limit hits**
* Alert on spikes in `ERROR`, sustained `WARN`, or sustained high latency
* Correlate logs with **metrics** and **traces** for root cause
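
To illustrate how such views fall out of structured logs, here is a small sketch that computes errors by route and p95 latency from JSON log lines. In practice the log backend's query language does this; the field names `route`, `level`, and `latency_ms` are assumed, and the nearest-rank method is one of several percentile definitions.

```python
import json
from collections import Counter, defaultdict


def nearest_rank(values, pct: int):
    """Nearest-rank percentile of a non-empty list; pct is an integer 1..100."""
    ordered = sorted(values)
    rank = max(1, (pct * len(ordered) + 99) // 100)  # integer ceiling
    return ordered[rank - 1]


def summarize(lines):
    """Return (errors_by_route, p95_latency_by_route) from JSON log lines."""
    errors_by_route = Counter()
    latencies = defaultdict(list)
    for line in lines:
        event = json.loads(line)
        route = event.get("route", "unknown")
        if event.get("level") == "ERROR":
            errors_by_route[route] += 1
        if "latency_ms" in event:
            latencies[route].append(event["latency_ms"])
    p95 = {route: nearest_rank(vals, 95) for route, vals in latencies.items()}
    return errors_by_route, p95
```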

---

## ✅ Outcome

You get **searchable, safe, and correlated** logs: every request carries a **request\_id**, incidents are traceable, and dashboards/alerts become **trustworthy and actionable**.

---
