## 05: RAG Logging (Observability Layer)

This notebook defines and validates the logging schema for the RAG system.
It creates the Delta table, verifies the schema, and demonstrates example queries.

All reusable logging functions (e.g., `log_rag_event`) live in `00_utils.ipynb`.

In [0]:
%run ./00_constants

In [0]:
%run ./00_utils

### Design notes

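The log table records what is needed to trace a RAG request end to end:

- What was asked (`question`)
- How retrieval was configured (`top_k`, `retriever_type`)
- What was retrieved (`retrieved_chunks`)
- What was sent to the model (`prompt`)
- What was generated (`answer`)
- Which models were used (`embedding_deployment`, `chat_deployment`)
- When it happened (`created_at`)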

This enables:
- Debugging hallucinations
- Reproducing failures
- Offline evaluation
- Quality monitoring
- A/B testing

The logging function is intentionally separated into `00_utils.ipynb` so it can be reused by:
- Batch pipelines
- APIs
- Streaming endpoints
- UI demos
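
As a rough illustration of why a shared helper is convenient, the sketch below assembles one log row as a plain dict that any of these callers could produce. The function name `build_rag_log_record` and the example argument values are hypothetical and are not taken from `00_utils.ipynb`; only the column names come from the schema printed by `printSchema()` in this notebook. The real `log_rag_event` may have a different signature.

```python
import uuid
from datetime import datetime, timezone


def build_rag_log_record(question, retrieved_chunks, prompt, answer,
                         top_k, retriever_type,
                         embedding_deployment, chat_deployment):
    """Assemble one log row as a dict keyed by the log table's column names.

    Hypothetical helper for illustration; not the notebook's log_rag_event.
    """
    chunk_fields = ("chunk_id", "doc_id", "title", "url",
                    "chunk_index", "category", "score")
    return {
        "query_id": str(uuid.uuid4()),
        "question": question,
        "top_k": int(top_k),
        "retriever_type": retriever_type,
        # Keep only the fields of the nested struct; missing keys become None.
        "retrieved_chunks": [
            {field: chunk.get(field) for field in chunk_fields}
            for chunk in retrieved_chunks
        ],
        "prompt": prompt,
        "answer": answer,
        "embedding_deployment": embedding_deployment,
        "chat_deployment": chat_deployment,
        # Timezone-aware UTC timestamp so rows from different callers sort consistently.
        "created_at": datetime.now(timezone.utc),
    }


# Example call with placeholder values (all assumed, not real deployments).
example_record = build_rag_log_record(
    question="What is Azure VM Scale Sets?",
    retrieved_chunks=[{"chunk_id": "c1", "url": "https://example.com/doc", "score": 0.83}],
    prompt="<prompt text>",
    answer="<model answer>",
    top_k=5,
    retriever_type="vector",
    embedding_deployment="example-embedding-deployment",
    chat_deployment="example-chat-deployment",
)
```

Because the row is built in one place, a batch pipeline and an API handler would write identically shaped records, which keeps downstream queries simple.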

In [0]:
# Create the query log table in Unity Catalog if it does not already exist

ensure_rag_log_table(spark, RAG_LOG_TABLE)

✅ RAG log table ensured: databricks_rag_demo.default.rag_query_logs


In [0]:
spark.table(RAG_LOG_TABLE).printSchema()

root
 |-- query_id: string (nullable = true)
 |-- question: string (nullable = true)
 |-- top_k: integer (nullable = true)
 |-- retriever_type: string (nullable = true)
 |-- retrieved_chunks: array (nullable = true)
 |    |-- element: struct (containsNull = true)
 |    |    |-- chunk_id: string (nullable = true)
 |    |    |-- doc_id: string (nullable = true)
 |    |    |-- title: string (nullable = true)
 |    |    |-- url: string (nullable = true)
 |    |    |-- chunk_index: integer (nullable = true)
 |    |    |-- category: string (nullable = true)
 |    |    |-- score: double (nullable = true)
 |-- prompt: string (nullable = true)
 |-- answer: string (nullable = true)
 |-- embedding_deployment: string (nullable = true)
 |-- chat_deployment: string (nullable = true)
 |-- created_at: timestamp (nullable = true)
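
For reference, the schema printed above can be written as Spark SQL column definitions. The sketch below is one possible way to express it; the names `RAG_LOG_COLUMNS_DDL` and `create_rag_log_table_sql` are hypothetical, and this is not necessarily how `ensure_rag_log_table` in `00_utils` builds the table.

```python
# Column definitions mirroring the printSchema() output above.
RAG_LOG_COLUMNS_DDL = """
  query_id STRING,
  question STRING,
  top_k INT,
  retriever_type STRING,
  retrieved_chunks ARRAY<STRUCT<
    chunk_id: STRING,
    doc_id: STRING,
    title: STRING,
    url: STRING,
    chunk_index: INT,
    category: STRING,
    score: DOUBLE
  >>,
  prompt STRING,
  answer STRING,
  embedding_deployment STRING,
  chat_deployment STRING,
  created_at TIMESTAMP
"""


def create_rag_log_table_sql(table_name):
    """Return an idempotent CREATE TABLE statement for a Delta log table."""
    return f"CREATE TABLE IF NOT EXISTS {table_name} ({RAG_LOG_COLUMNS_DDL}) USING DELTA"


# In a notebook this could be run as (assumes `spark` and RAG_LOG_TABLE exist):
# spark.sql(create_rag_log_table_sql(RAG_LOG_TABLE))
```

Using `IF NOT EXISTS` makes the statement safe to re-run, which matches the "ensure" behaviour the notebook relies on.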



In [0]:
# Inspect the 10 most recent logs; top_source is the URL of the first retrieved chunk

spark.sql(f"""
SELECT
  created_at,
  query_id,
  question,
  top_k,
  retrieved_chunks[0].url AS top_source
FROM {RAG_LOG_TABLE}
ORDER BY created_at DESC
LIMIT 10
""").display()

created_at,query_id,question,top_k,top_source
2026-01-15T06:12:18.481359Z,827800ec-0f48-42e2-8ba9-77686e23eba4,What is the difference between Spot VM and normal VM?,5,https://github.com/MicrosoftDocs/azure-compute-docs/blob/main/articles/virtual-machines/spot-vms.md
2026-01-15T06:12:08.681494Z,603f8671-d355-4e73-a3bf-b60971f83562,How does Azure handle VM disk persistence?,5,https://github.com/MicrosoftDocs/azure-compute-docs/blob/main/articles/virtual-machines/managed-disks-overview.md
2026-01-15T06:12:01.475803Z,bae458ca-5555-4cf2-b40b-a357c3670e78,What is Azure VM Scale Sets?,5,https://github.com/MicrosoftDocs/azure-compute-docs/blob/main/articles/virtual-machine-scale-sets/flexible-virtual-machine-scale-sets-powershell.md
2026-01-15T06:11:51.153227Z,5fecd526-9c6b-4afd-b5b6-d381c44f149c,How do I resize an Azure virtual machine?,5,https://github.com/MicrosoftDocs/azure-compute-docs/blob/main/articles/virtual-machines/vm-customization.md
2026-01-15T06:11:38.318932Z,aa9f09f7-ffca-47b8-a349-98b7cc2cddd8,What is the difference between a normal Azure VM and an ephemeral VM?,5,https://github.com/MicrosoftDocs/azure-compute-docs/blob/main/articles/virtual-machines/managed-disks-overview.md
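
The query above looks only at the first chunk per request. A complementary view, sketched below, flattens `retrieved_chunks` and ranks source URLs by how often they were retrieved across all logged queries. The helper name `top_sources_sql` is hypothetical, and the meaning of `score` (similarity versus distance) depends on the retriever, so the average is shown for inspection only.

```python
def top_sources_sql(table_name, limit=10):
    """Build a Spark SQL query ranking source URLs by retrieval frequency."""
    return f"""
SELECT
  chunk.url AS source_url,
  COUNT(*) AS times_retrieved,
  ROUND(AVG(chunk.score), 4) AS avg_score
FROM {table_name}
LATERAL VIEW explode(retrieved_chunks) exploded AS chunk
GROUP BY chunk.url
ORDER BY times_retrieved DESC
LIMIT {int(limit)}
"""


# In a notebook this could be run as (assumes `spark` and RAG_LOG_TABLE exist):
# spark.sql(top_sources_sql(RAG_LOG_TABLE, limit=10)).display()
```

A source that dominates this ranking for unrelated questions can be a sign that retrieval is not discriminating well between topics.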


In [0]:
# Development only: uncomment to delete every row from the log table

# spark.sql(f"""
# TRUNCATE TABLE {RAG_LOG_TABLE}
# """)