```{contents}
```
## Logging

### 1. Definition and Purpose

**Logging** is the systematic recording of events, data, and decisions occurring inside a Generative AI system during **training**, **inference**, and **deployment**.

**Objectives**

| Goal          | Description                                     |
| ------------- | ----------------------------------------------- |
| Observability | Understand what the model and system are doing  |
| Debugging     | Detect failures, hallucinations, latency, drift |
| Monitoring    | Track health, performance, and cost             |
| Auditability  | Ensure compliance and traceability              |
| Optimization  | Improve quality, speed, and reliability         |

---

### 2. Where Logging Occurs in a GenAI Pipeline

```text
User Prompt
   ↓
Preprocessing → Retrieval → Prompt Assembly → LLM Inference → Postprocessing → Output
        ↑              ↑            ↑                ↑              ↑
     Log inputs     Log hits    Log prompts     Log tokens      Log scores
```

**Logging layers**

1. **Application layer** — user events, API calls, latency
2. **Prompt layer** — prompts, templates, system messages
3. **Model layer** — tokens, probabilities, reasoning steps (when available)
4. **Data layer** — retrieved documents, embeddings, sources
5. **Infrastructure layer** — GPU/CPU usage, memory, failures
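
The per-stage logging points in the pipeline diagram can be sketched with a small context manager that times each stage and emits one structured record for it. This is a minimal illustration, not a reference implementation: the logger name `genai.pipeline`, the helper `log_stage`, the function `handle_request`, and the placeholder stage bodies are all hypothetical.

```python
import json
import logging
import time
from contextlib import contextmanager

logger = logging.getLogger("genai.pipeline")  # hypothetical logger name


@contextmanager
def log_stage(request_id, stage, records):
    """Time one pipeline stage and record its outcome when it finishes."""
    start = time.perf_counter()
    status = "ok"
    try:
        yield
    except Exception:
        status = "error"
        raise
    finally:
        record = {
            "request_id": request_id,
            "stage": stage,
            "status": status,
            "latency_ms": round((time.perf_counter() - start) * 1000, 2),
        }
        records.append(record)
        logger.info(json.dumps(record))


def handle_request(request_id, prompt):
    """Run placeholder stages; a real system would call its own components."""
    records = []
    with log_stage(request_id, "preprocessing", records):
        cleaned = prompt.strip()
    with log_stage(request_id, "retrieval", records):
        docs = ["doc7", "doc19"]  # stand-in for a real retriever
    with log_stage(request_id, "prompt_assembly", records):
        full_prompt = f"Context: {docs}\n\nQuestion: {cleaned}"
    return full_prompt, records
```

Because every record carries the same `request_id`, the stages of one request can be joined later to see which component was slow or failed.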

---

### 3. Core Logging Categories

| Category         | What is Logged                           |
| ---------------- | ---------------------------------------- |
| Interaction logs | user prompt, system prompt, model output |
| Token logs       | input/output tokens, cost, latency       |
| Retrieval logs   | documents retrieved, scores, sources     |
| Quality logs     | feedback, ratings, hallucination flags   |
| Safety logs      | refusals, moderation events              |
| Performance logs | throughput, failures, retries            |

---

### 4. Why Logging is Critical in Generative AI

| Problem           | How Logging Helps                     |
| ----------------- | ------------------------------------- |
| Hallucination     | Compare output with retrieved sources |
| Prompt regression | Track which prompt versions degrade   |
| Model drift       | Observe output quality over time      |
| Latency spikes    | Identify slow components              |
| Cost explosion    | Monitor token usage per request       |
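
As one illustration of the cost row, token counts in the logs can be turned into a per-request and per-user cost. The sketch below assumes each record carries `user_id`, `model`, `input_tokens`, and `output_tokens`. The model name `example-model` and its prices are made-up placeholders, not real provider rates.

```python
from collections import defaultdict

# Hypothetical prices in USD per 1M tokens; substitute your provider's real rates.
PRICE_PER_MILLION = {
    "example-model": {"input": 1.00, "output": 4.00},
}


def request_cost(model, input_tokens, output_tokens):
    """Return the cost in USD of a single request."""
    price = PRICE_PER_MILLION[model]
    total = input_tokens * price["input"] + output_tokens * price["output"]
    return total / 1_000_000


def cost_by_user(records):
    """Sum request costs per user from a list of log records."""
    totals = defaultdict(float)
    for r in records:
        totals[r["user_id"]] += request_cost(
            r["model"], r["input_tokens"], r["output_tokens"]
        )
    return dict(totals)
```

Aggregating by user (or by prompt version, or by endpoint) is what makes a sudden cost increase attributable to a specific source.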

---

### 5. Logging Workflow

```text
Event occurs → Structured log record → Storage → Indexing → Analysis → Alerting
```

**Typical fields**

```json
{
  "timestamp": "2025-12-26T10:00:12Z",
  "user_id": "u42",
  "model": "gpt-4.1",
  "prompt": "...",
  "response": "...",
  "input_tokens": 430,
  "output_tokens": 210,
  "latency_ms": 980,
  "retrieval_docs": ["doc7", "doc19"],
  "hallucination_flag": false
}
```
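
One way to keep these fields consistent across the codebase is to define the record as a typed structure and serialize it in a single place. The sketch below mirrors the fields in the JSON example above; the class name `LLMLogRecord` and the helper `utc_now_iso` are illustrative choices, not part of any library.

```python
import json
from dataclasses import asdict, dataclass, field
from datetime import datetime, timezone


def utc_now_iso():
    """Current UTC time in the same format as the example timestamp."""
    return datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")


@dataclass
class LLMLogRecord:
    user_id: str
    model: str
    prompt: str
    response: str
    input_tokens: int
    output_tokens: int
    latency_ms: float
    retrieval_docs: list = field(default_factory=list)
    hallucination_flag: bool = False
    timestamp: str = field(default_factory=utc_now_iso)

    def to_json(self):
        """Serialize to a single-line JSON string suitable for a log file."""
        return json.dumps(asdict(self), ensure_ascii=False)
```

A fixed schema like this makes missing or misspelled fields fail at construction time instead of surfacing later during analysis.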

---

### 6. Demonstration with Code

#### Minimal Python Logging for LLM Calls

```python
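import json
import logging
import time

from openai import OpenAI

client = OpenAI()  # reads the API key from the OPENAI_API_KEY environment variable

# Use a dedicated logger that writes one bare JSON object per line.
# Logging through the root logger would prefix each line with "INFO:root:"
# and could mix in INFO messages from other libraries.
logger = logging.getLogger("genai")
logger.setLevel(logging.INFO)
handler = logging.FileHandler("genai.log")
handler.setFormatter(logging.Formatter("%(message)s"))
logger.addHandler(handler)
logger.propagate = False


def call_llm(prompt):
    """Send one prompt to the model and log the call as a JSON record."""
    start = time.perf_counter()  # monotonic clock, suited to measuring durations

    response = client.chat.completions.create(
        model="gpt-4.1-mini",
        messages=[{"role": "user", "content": prompt}],
    )

    latency = (time.perf_counter() - start) * 1000  # milliseconds
    usage = response.usage

    record = {
        "prompt": prompt,
        "response": response.choices[0].message.content,
        "input_tokens": usage.prompt_tokens,
        "output_tokens": usage.completion_tokens,
        "latency_ms": round(latency, 2),
    }

    logger.info(json.dumps(record))
    return record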
```

---

### 7. Types of Logging in GenAI Systems

| Type                 | Description                                           |
| -------------------- | ----------------------------------------------------- |
| Synchronous logging  | Logs during request handling                          |
| Asynchronous logging | Logs streamed to queue or data lake                   |
| Structured logging   | JSON-formatted, machine-readable                      |
| Trace logging        | End-to-end request tracing                            |
| Semantic logging     | Meaning-based signals (e.g., hallucination, toxicity) |
| Feedback logging     | Human ratings and corrections                         |
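
The asynchronous and structured rows can be combined using only the standard library: a `QueueHandler` hands records to a background `QueueListener`, which formats them as JSON. This is a sketch under assumptions: the names `JsonFormatter`, `build_async_logger`, the logger name `genai.async`, and the `fields` key passed through `extra` are illustrative conventions, not a standard.

```python
import io
import json
import logging
import logging.handlers
import queue


class JsonFormatter(logging.Formatter):
    """Render each log record as one JSON object."""

    def format(self, record):
        payload = {
            "level": record.levelname,
            "logger": record.name,
            "message": record.getMessage(),
        }
        # "fields" is a convention of this sketch for extra structured data.
        payload.update(getattr(record, "fields", {}))
        return json.dumps(payload)


def build_async_logger(stream):
    """Return a logger whose records are written by a background thread."""
    log_queue = queue.Queue(-1)  # unbounded queue
    sink = logging.StreamHandler(stream)
    sink.setFormatter(JsonFormatter())
    listener = logging.handlers.QueueListener(log_queue, sink)

    logger = logging.getLogger("genai.async")
    logger.setLevel(logging.INFO)
    logger.propagate = False
    logger.handlers.clear()
    logger.addHandler(logging.handlers.QueueHandler(log_queue))

    listener.start()
    return logger, listener


if __name__ == "__main__":
    buffer = io.StringIO()
    logger, listener = build_async_logger(buffer)
    logger.info("llm_call", extra={"fields": {"latency_ms": 980}})
    listener.stop()  # flushes queued records before returning
    print(buffer.getvalue())
```

The request thread only pays for putting a record on the queue; formatting and I/O happen on the listener thread, so slow log storage does not add to response latency.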

---

### 8. Logging + Evaluation Loop

```text
Logs → Dataset → Error Analysis → Prompt Fix → Re-deploy → New Logs
```

This creates a **continuous improvement cycle** for the GenAI system.
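
The first step of this loop, turning logs into a dataset for error analysis, might look like the sketch below. It assumes the log holds one bare JSON object per line with `prompt`, `response`, and `hallucination_flag` fields; the `rating` field (a 1 to 5 user score) and the function names are hypothetical.

```python
import json


def load_records(lines):
    """Parse JSON Lines input, skipping blank or malformed lines."""
    records = []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        try:
            records.append(json.loads(line))
        except json.JSONDecodeError:
            continue  # tolerate corrupt lines rather than abort the whole run
    return records


def build_eval_set(records, max_rating=2):
    """Keep records that were flagged or rated poorly, with the reason."""
    eval_set = []
    for r in records:
        flagged = r.get("hallucination_flag", False)
        rating = r.get("rating")  # hypothetical 1-5 user rating; may be absent
        low_rated = rating is not None and rating <= max_rating
        if flagged or low_rated:
            eval_set.append({
                "prompt": r["prompt"],
                "response": r["response"],
                "reason": "hallucination_flag" if flagged else "low_rating",
            })
    return eval_set
```

The resulting examples can be replayed against a revised prompt to check whether the fix helps before it is deployed.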

---

### 9. Advanced Logging Practices

| Practice                  | Benefit                          |
| ------------------------- | -------------------------------- |
| Prompt versioning         | Reproduce behaviors              |
| Embedding logging         | Diagnose retrieval failures      |
| Chain-of-thought masking  | Preserve privacy while debugging |
| Redaction & PII filtering | Security & compliance            |
| Sampling                  | Control storage costs            |
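
Three of these practices (redaction, prompt versioning, and sampling) are small enough to sketch directly. The regular expressions below are deliberately simple and will miss many real PII formats, so treat them as placeholders for a proper PII detection step. All function names are illustrative.

```python
import hashlib
import re
import zlib

# Simplistic patterns for illustration only; real PII detection needs more.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
PHONE_RE = re.compile(r"\+?\d[\d\s().-]{7,}\d")


def redact(text):
    """Replace email addresses and phone-like numbers before logging."""
    text = EMAIL_RE.sub("[EMAIL]", text)
    return PHONE_RE.sub("[PHONE]", text)


def prompt_version(template):
    """Derive a short, stable version ID from the prompt template text."""
    return hashlib.sha256(template.encode("utf-8")).hexdigest()[:12]


def should_log(request_id, sample_rate):
    """Deterministically keep roughly `sample_rate` of requests (0.0 to 1.0)."""
    bucket = zlib.crc32(request_id.encode("utf-8")) % 10_000
    return bucket < sample_rate * 10_000
```

Hashing the request ID, rather than drawing a random number, means every component makes the same keep-or-drop decision for a given request, so sampled traces stay complete.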

---

### 10. Relationship to Other Concepts

| Concept             | Connection                         |
| ------------------- | ---------------------------------- |
| Monitoring          | Built from logs                    |
| Evaluation          | Uses logs as dataset               |
| Observability       | Logging is foundational            |
| Knowledge grounding | Logs show which knowledge was used |
| Prompt engineering  | Logs guide prompt optimization     |
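
To make the monitoring row concrete, the sketch below derives a few health metrics from a batch of log records. It assumes each record has `latency_ms`, `input_tokens`, and `output_tokens`; the nearest-rank percentile and the choice of metrics are illustrative rather than a recommended standard.

```python
import math


def percentile(values, pct):
    """Nearest-rank percentile for pct in (0, 100]."""
    if not values:
        raise ValueError("percentile() needs at least one value")
    ordered = sorted(values)
    rank = math.ceil(pct / 100 * len(ordered))
    return ordered[max(rank, 1) - 1]


def summarize(records):
    """Aggregate log records into simple monitoring metrics."""
    latencies = [r["latency_ms"] for r in records]
    total_tokens = sum(r["input_tokens"] + r["output_tokens"] for r in records)
    return {
        "requests": len(records),
        "p50_latency_ms": percentile(latencies, 50),
        "p95_latency_ms": percentile(latencies, 95),
        "avg_tokens": total_tokens / len(records),
    }
```

Running a summary like this over a sliding time window and alerting on thresholds is one simple way that monitoring is built on top of logs.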

---

### 11. Summary

**Logging is the nervous system of Generative AI systems.**
Without robust logging, LLM-based applications are hard to debug, audit, and maintain.

Comprehensive logging is therefore a prerequisite for running a GenAI system in production.
