#### 05 - RAG Logging (Observability Layer)

This notebook defines and validates the logging schema for the RAG system.
It creates the Delta table, prints its schema for verification, and runs an example query over the most recent log entries.

All reusable logging functions (e.g., `log_rag_event`) live in `00_utils.ipynb`.

```python
%run ./00_constants
```

```python
%run ./00_utils
```

### Design notes

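Each row of this table records one RAG request end to end:

- A unique ID for the request (`query_id`)
- What was asked (`question`) and how many chunks were requested (`top_k`)
- What was retrieved (`retrieved_chunks`)
- What prompt was sent to the chat model (`prompt`)
- What was generated (`answer`)
- Which models were used (`embedding_deployment`, `chat_deployment`)
- When it happened (`created_at`)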

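This enables:
- Debugging hallucinations, by checking each answer against the chunks it was grounded on
- Reproducing failures, since the exact prompt and deployments are recorded
- Offline evaluation of retrieval and answer quality
- Quality monitoring over time
- A/B testing across embedding or chat deployments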
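As one illustration of offline evaluation, the logged `retrieved_chunks` can be scored against a small hand-labeled set of relevant documents. The sketch below is a hypothetical helper, not part of `00_utils`: it assumes log rows have been collected into Python as dicts (for example with `[r.asDict(recursive=True) for r in spark.table(RAG_LOG_TABLE).collect()]`), that an `expected_doc_ids` mapping from question text to relevant `doc_id`s exists, and that chunks are logged in rank order.

```python
from collections.abc import Iterable, Mapping
from typing import Any


def hit_rate_at_k(
    log_rows: Iterable[Mapping[str, Any]],
    expected_doc_ids: Mapping[str, set],
    k: int = 5,
) -> float:
    """Share of labeled questions whose top-k logged chunks contain a relevant doc_id.

    Hypothetical helper: `expected_doc_ids` maps question text to the set of
    doc_ids judged relevant (an assumed, hand-labeled evaluation set).
    """
    hits = evaluated = 0
    for row in log_rows:
        relevant = expected_doc_ids.get(row["question"])
        if not relevant:
            continue  # no label for this question: leave it out of the metric
        evaluated += 1
        # Assumes retrieved_chunks is logged in rank order (best match first)
        top_doc_ids = {chunk["doc_id"] for chunk in (row["retrieved_chunks"] or [])[:k]}
        if top_doc_ids & relevant:
            hits += 1
    return hits / evaluated if evaluated else 0.0


# Toy example: q1 finds a relevant doc in its top 2, q2 does not
rows = [
    {"question": "q1", "retrieved_chunks": [{"doc_id": "a"}, {"doc_id": "b"}]},
    {"question": "q2", "retrieved_chunks": [{"doc_id": "c"}]},
]
print(hit_rate_at_k(rows, {"q1": {"b"}, "q2": {"z"}}, k=2))  # 0.5
```

Questions without labels are skipped rather than counted as misses, so the metric only reflects the labeled subset.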

The logging function is intentionally separated into `00_utils.ipynb` so it can be reused by:
- Batch pipelines
- APIs
- Streaming endpoints
- UI demos
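`log_rag_event` itself lives in `00_utils` and is not reproduced here. As a hedged sketch of what such a helper has to produce, the hypothetical `build_rag_log_record` below assembles one row whose keys and Python types line up with the table created in the next cell. The argument names and the shape of the retrieval hits (one dict per chunk) are assumptions.

```python
import uuid
from datetime import datetime, timezone
from typing import Any

# Per-chunk fields kept in the log, mirroring the retrieved_chunks STRUCT in the DDL
CHUNK_FIELDS = ("chunk_id", "doc_id", "title", "url", "chunk_index", "category", "score")


def _to_chunk(hit: dict[str, Any]) -> dict[str, Any]:
    # Keep only schema fields; missing keys become None (NULL in the table)
    chunk = {field: hit.get(field) for field in CHUNK_FIELDS}
    if chunk["score"] is not None:
        chunk["score"] = float(chunk["score"])  # score is a DOUBLE column
    return chunk


def build_rag_log_record(
    question: str,
    top_k: int,
    hits: list[dict[str, Any]],
    prompt: str,
    answer: str,
    embedding_deployment: str,
    chat_deployment: str,
) -> dict[str, Any]:
    """Hypothetical helper: one log row shaped like the RAG_LOG_TABLE schema."""
    return {
        "query_id": str(uuid.uuid4()),  # random UUID string per request
        "question": question,
        "top_k": int(top_k),
        "retrieved_chunks": [_to_chunk(hit) for hit in hits],
        "prompt": prompt,
        "answer": answer,
        "embedding_deployment": embedding_deployment,
        "chat_deployment": chat_deployment,
        "created_at": datetime.now(timezone.utc),  # timezone-aware UTC timestamp
    }
```

In the notebook, a record like this could be appended with `spark.createDataFrame([record], schema=spark.table(RAG_LOG_TABLE).schema).write.mode("append").saveAsTable(RAG_LOG_TABLE)`. Passing the table's own schema avoids relying on type inference, which would map a Python `int` to `BIGINT` rather than the `INT` declared for `top_k`.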

```python
# Create the query log table (Unity Catalog).
# IF NOT EXISTS makes the cell safe to re-run: existing logs are left untouched.

spark.sql(f"""
CREATE TABLE IF NOT EXISTS {RAG_LOG_TABLE} (
  query_id STRING,
  question STRING,
  top_k INT,

  retrieved_chunks ARRAY<STRUCT<
    chunk_id: STRING,
    doc_id: STRING,
    title: STRING,
    url: STRING,
    chunk_index: INT,
    category: STRING,
    score: DOUBLE
  >>,

  prompt STRING,
  answer STRING,
  embedding_deployment STRING,
  chat_deployment STRING,
  created_at TIMESTAMP
)
USING DELTA
""")
```


```python
# Verify that the created table matches the DDL above
spark.table(RAG_LOG_TABLE).printSchema()
```

```text
root
 |-- query_id: string (nullable = true)
 |-- question: string (nullable = true)
 |-- top_k: integer (nullable = true)
 |-- retrieved_chunks: array (nullable = true)
 |    |-- element: struct (containsNull = true)
 |    |    |-- chunk_id: string (nullable = true)
 |    |    |-- doc_id: string (nullable = true)
 |    |    |-- title: string (nullable = true)
 |    |    |-- url: string (nullable = true)
 |    |    |-- chunk_index: integer (nullable = true)
 |    |    |-- category: string (nullable = true)
 |    |    |-- score: double (nullable = true)
 |-- prompt: string (nullable = true)
 |-- answer: string (nullable = true)
 |-- embedding_deployment: string (nullable = true)
 |-- chat_deployment: string (nullable = true)
 |-- created_at: timestamp (nullable = true)
```
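`printSchema()` works well for a visual check. To make the check repeatable (for example at the start of a job that writes logs), the actual schema can be compared against an expected mapping. This is a minimal sketch: `EXPECTED_RAG_LOG_SCHEMA` and `schema_diff` are hypothetical names, and the expected types are transcribed from the DDL in Spark's `simpleString()` notation, which is assumed to be how the actual types are collected.

```python
# Expected column -> Spark simpleString() type, transcribed from the CREATE TABLE DDL
EXPECTED_RAG_LOG_SCHEMA = {
    "query_id": "string",
    "question": "string",
    "top_k": "int",
    "retrieved_chunks": (
        "array<struct<chunk_id:string,doc_id:string,title:string,url:string,"
        "chunk_index:int,category:string,score:double>>"
    ),
    "prompt": "string",
    "answer": "string",
    "embedding_deployment": "string",
    "chat_deployment": "string",
    "created_at": "timestamp",
}


def schema_diff(expected: dict[str, str], actual: dict[str, str]) -> dict[str, list]:
    """Report missing, unexpected, and type-mismatched columns (all lists empty = no drift)."""
    shared = set(expected) & set(actual)
    return {
        "missing": sorted(set(expected) - set(actual)),
        "unexpected": sorted(set(actual) - set(expected)),
        "type_mismatch": sorted(
            (name, expected[name], actual[name])
            for name in shared
            if expected[name] != actual[name]
        ),
    }


# In the notebook (requires an active Spark session):
# actual = {f.name: f.dataType.simpleString() for f in spark.table(RAG_LOG_TABLE).schema.fields}
# drift = schema_diff(EXPECTED_RAG_LOG_SCHEMA, actual)
# assert not any(drift.values()), drift
```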



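```python
# Inspect the 10 most recent log entries.
# top_source is the URL of the first logged chunk; this assumes chunks are
# stored in rank order (best match first).

spark.sql(f"""
SELECT
  created_at,
  query_id,
  question,
  top_k,
  retrieved_chunks[0].url AS top_source
FROM {RAG_LOG_TABLE}
ORDER BY created_at DESC
LIMIT 10
""").display()
```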

| created_at | query_id | question | top_k | top_source |
|---|---|---|---|---|
| 2026-01-11T20:03:16.140603Z | 4bc8e7bf-6803-416b-af93-0859f5b7f4ca | What is difference between a normal Azure VM and ephemeral VM | 5 | https://learn.microsoft.com/en-us/azure/virtual-machines/ephemeral-os-disks.md |
| 2026-01-11T18:49:47.778871Z | 717e7faf-79e6-40d9-9a63-a449ed7a28c3 | What is difference between a normal Azure VM and ephemeral VM | 5 | https://learn.microsoft.com/en-us/azure/virtual-machines/ephemeral-os-disks.md |
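The same rows can feed simple quality monitoring. One possible signal is the best retrieval score per query: a question whose best chunk scores low may be a retrieval miss, where the answer is more likely to be unsupported. This is a hypothetical sketch, assuming rows are collected as dicts (for example with `[r.asDict(recursive=True) for r in df.collect()]`) and that a higher `score` means a better match; whether that holds, and what threshold is sensible, depends on the vector index's scoring function, so the `0.5` default is only a placeholder.

```python
from collections.abc import Iterable, Mapping
from typing import Any, Optional


def flag_weak_retrievals(
    log_rows: Iterable[Mapping[str, Any]], min_score: float = 0.5
) -> list[tuple[str, str, Optional[float]]]:
    """Return (query_id, question, best_score) for queries whose best chunk scores below min_score.

    best_score is None when nothing (or nothing scored) was retrieved; such
    queries are always flagged. Assumes higher scores mean better matches.
    """
    flagged = []
    for row in log_rows:
        chunks = row.get("retrieved_chunks") or []
        scores = [c["score"] for c in chunks if c.get("score") is not None]
        best = max(scores) if scores else None
        if best is None or best < min_score:
            flagged.append((row["query_id"], row["question"], best))
    return flagged
```

Grouping the flagged rows by `chat_deployment` or `embedding_deployment` would turn the same check into a rough comparison between deployments.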


```python
# Development only: uncomment to delete every logged row.
# TRUNCATE keeps the table and its schema; only the data is removed.

# spark.sql(f"""
# TRUNCATE TABLE {RAG_LOG_TABLE}
# """)
```
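`TRUNCATE TABLE` removes every logged row. Where the goal is to bound table growth rather than reset it, a time-based `DELETE` on `created_at` is a gentler option. The helper below is a hypothetical sketch that only builds the statement; the 30-day default is an arbitrary placeholder, not a retention policy from this project. On Delta tables, deleted rows' data files remain in storage until `VACUUM` removes them.

```python
def build_retention_delete(table: str, keep_days: int = 30) -> str:
    """Build a Delta DELETE statement that drops log rows older than keep_days (hypothetical helper)."""
    # Reject bools explicitly: bool is a subclass of int in Python
    if isinstance(keep_days, bool) or not isinstance(keep_days, int) or keep_days <= 0:
        raise ValueError("keep_days must be a positive integer")
    # current_timestamp() is evaluated by Spark when the statement runs
    return (
        f"DELETE FROM {table} "
        f"WHERE created_at < current_timestamp() - INTERVAL {keep_days} DAYS"
    )


# In the notebook (maintenance only):
# spark.sql(build_retention_delete(RAG_LOG_TABLE, keep_days=30))
```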