##### Flow:
    **User Question**  
    ↓  
    **Embed question**  
    ↓  
    **Vector similarity search**  
    ↓  
    **Top-K chunks**  
    ↓  
    **Prompt assembly**  
    ↓  
    **LLM call**  
    ↓  
    **Final answer**

##### Prerequisites
Azure OpenAI API calls use deployment names, not the underlying model names, so you need two Azure OpenAI deployments. Create them in Azure AI Foundry / Azure OpenAI:
- Embedding deployment (example name): `emb-3-small`
- Chat deployment (example name): `gpt-4o-mini-prod`
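
Because requests are addressed by deployment name, that name ends up in the request path. The sketch below builds the REST URLs for the two example deployments. The resource name `my-resource` and the API version `2024-02-01` are assumptions for illustration; substitute the values from your own Azure OpenAI resource.

```python
from urllib.parse import quote, urlencode

# Example deployment names from the list above; yours may differ.
EMBEDDING_DEPLOYMENT = "emb-3-small"
CHAT_DEPLOYMENT = "gpt-4o-mini-prod"


def deployment_url(resource: str, deployment: str, operation: str,
                   api_version: str = "2024-02-01") -> str:
    """Build the REST URL for one operation on one deployment.

    `resource` is the Azure OpenAI resource name (hypothetical here);
    `operation` is e.g. "embeddings" or "chat/completions".
    """
    base = f"https://{resource}.openai.azure.com/openai/deployments"
    query = urlencode({"api-version": api_version})
    return f"{base}/{quote(deployment, safe='')}/{operation}?{query}"


print(deployment_url("my-resource", EMBEDDING_DEPLOYMENT, "embeddings"))
print(deployment_url("my-resource", CHAT_DEPLOYMENT, "chat/completions"))
```

Note that the model name (for example `gpt-4o-mini`) never appears in the URL; only the deployment name does.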

##### Input and Output:
- **Input:**
    - Table: `databricks_rag_demo.default.azure_compute_doc_embeddings`
    - User question

- **Output:** Answer to the user's question

Important design choice: How to do vector search?

We have 3 options:

- Option A — Naive cosine similarity in Spark (we'll start here)
	- Simple
	- Transparent
	- Works for thousands of chunks
	- Good for learning

- Option B — Databricks Vector Search
	- Production-grade
	- Scalable
	- Index-based
	- We can move here later

- Option C — External vector DB (Pinecone, Weaviate, etc.)
	- Overkill right now

We'll start with Option A, then upgrade.
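
To make Option A concrete, here is a minimal sketch of naive cosine-similarity retrieval in plain Python rather than Spark. It is not the notebook's `retrieve_top_k` helper, whose implementation is not shown; the function names and the toy 3-dimensional embeddings are made up for illustration.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # treat a zero vector as unrelated to everything
    return dot / (norm_a * norm_b)


def top_k_chunks(query_embedding, chunks, k=5):
    """Score every chunk against the query and keep the k best."""
    scored = [
        {**chunk, "score": cosine_similarity(query_embedding, chunk["embedding"])}
        for chunk in chunks
    ]
    scored.sort(key=lambda c: c["score"], reverse=True)
    return scored[:k]


# Toy embeddings, for illustration only.
chunks = [
    {"chunk_id": "a", "embedding": [1.0, 0.0, 0.0]},
    {"chunk_id": "b", "embedding": [0.0, 1.0, 0.0]},
    {"chunk_id": "c", "embedding": [1.0, 1.0, 0.0]},
]
best = top_k_chunks([1.0, 0.0, 0.0], chunks, k=2)
print([c["chunk_id"] for c in best])  # ['a', 'c']
```

Every query scores every chunk, which is why this approach is fine for thousands of chunks but is the first thing to replace with an index as the corpus grows.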

#### 04 - Retrieval-Augmented Generation (RAG)

This notebook implements a basic RAG pipeline:
1. Embed user queries
2. Retrieve relevant document chunks
3. Assemble a grounded prompt
4. Generate an answer using an LLM

In [0]:
%run ./00_install_deps_and_restart

Collecting databricks-vectorsearch
  Downloading databricks_vectorsearch-0.63-py3-none-any.whl.metadata (2.8 kB)
Collecting deprecation>=2 (from databricks-vectorsearch)
  Downloading deprecation-2.1.0-py2.py3-none-any.whl.metadata (4.6 kB)
Downloading databricks_vectorsearch-0.63-py3-none-any.whl (19 kB)
Downloading deprecation-2.1.0-py2.py3-none-any.whl (11 kB)
Installing collected packages: deprecation, databricks-vectorsearch
Successfully installed databricks-vectorsearch-0.63 deprecation-2.1.0
Note: you may need to restart the kernel using %restart_python or dbutils.library.restartPython() to use updated packages.
New packages installed — restarting Python kernel...


In [0]:
%run ./00_constants

In [0]:
%run ./00_utils

In [0]:
# vsc = VectorSearchClient()



Vector Search (production retrieval)

3A) Create the Vector Search endpoint + index (recommended via UI)

The fastest, least painful route is the UI:
- Databricks → Mosaic AI / Vector Search
- Create endpoint (name example): vs_azure_compute
- Create index
  - Type: Delta Sync index
  - Source table: databricks_rag_demo.default.azure_compute_doc_embeddings
  - Primary key: chunk_id
  - Embedding column: embedding

3B) Query the index in Python

Once endpoint + index exist, use the Vector Search client.
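
A hedged sketch of what that query can look like with the `databricks-vectorsearch` client. The live call needs a workspace, so it appears only as comments; the index name is a placeholder, and the response shape assumed by `rows_from_response` (a hypothetical helper) should be checked against the client version you have installed.

```python
# The live call needs a workspace, so it is shown as comments only:
#
#   from databricks.vector_search.client import VectorSearchClient
#   vsc = VectorSearchClient()
#   index = vsc.get_index(
#       endpoint_name="vs_azure_compute",
#       index_name="<catalog>.<schema>.<index_name>",  # placeholder
#   )
#   response = index.similarity_search(
#       query_vector=query_embedding,
#       columns=["chunk_id", "chunk_text"],
#       num_results=5,
#   )


def rows_from_response(response):
    """Turn a similarity_search response into a list of dicts.

    Assumes the shape {"manifest": {"columns": [{"name": ...}, ...]},
    "result": {"data_array": [[...], ...]}}.
    """
    names = [col["name"] for col in response["manifest"]["columns"]]
    data = response.get("result", {}).get("data_array") or []
    return [dict(zip(names, row)) for row in data]


# Hand-written stand-in for a real response (values are made up).
fake_response = {
    "manifest": {"columns": [{"name": "chunk_id"},
                             {"name": "chunk_text"},
                             {"name": "score"}]},
    "result": {"data_array": [["c1", "first chunk", 0.91],
                              ["c2", "second chunk", 0.84]]},
}
for row in rows_from_response(fake_response):
    print(row["chunk_id"], row["score"])
```

Converting rows to dicts keeps the downstream code the same as in Option A, where each retrieved chunk is read by key (for example `c["chunk_text"]`).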


In [0]:
#  Ask a question

k = 5
question = "What is difference between a normal Azure VM and ephemeral VM"
query_embedding = embed_texts(question)
top_chunks = retrieve_top_k(query_embedding, k=k, option="A") # top_chunks is a list of dicts, one per chunk
contexts = [c["chunk_text"] for c in top_chunks]

prompt = build_prompt(question, contexts)

Trace(request_id=tr-caea6c40f04b4cd7ae4d84b885f6b2f4)
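
`build_prompt` comes from `00_utils` and its body is not shown here. As an illustration of the prompt-assembly step only, a hypothetical version might look like the sketch below; the function name `build_prompt_sketch`, the instruction wording, and the sample contexts are all assumptions.

```python
def build_prompt_sketch(question, contexts):
    """Assemble a grounded prompt from a question and retrieved chunks."""
    # Number the chunks so the answer can be traced back to a source.
    numbered = "\n\n".join(
        f"[{i}] {text.strip()}" for i, text in enumerate(contexts, start=1)
    )
    return (
        "Answer the question using only the context below. "
        "If the context is insufficient, say so.\n\n"
        f"Context:\n{numbered}\n\n"
        f"Question: {question}\n"
        "Answer:"
    )


# Illustrative inputs, not real retrieved chunks.
demo = build_prompt_sketch(
    "What is an ephemeral OS disk?",
    ["Ephemeral OS disks are created on local VM storage.",
     "Managed disks are the usual choice for OS disks."],
)
print(demo)
```

Telling the model to rely only on the supplied context is what makes the prompt "grounded"; the exact wording is a tuning choice.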

In [0]:
%run ./00_init_openai_client

In [0]:
# Prerequisite: in the Foundry portal, deploy the gpt-4o-mini model

# Call LLM
response = aoai.chat.completions.create(
    model=CHAT_DEPLOYMENT,
    messages=[{"role": "user", "content": prompt}]
)

answer = response.choices[0].message.content
print(answer)

The primary difference between a normal Azure VM and an ephemeral VM lies in how the operating system disks are managed and how they impact performance and cost.

1. **Operating System Disk Management**:
   - **Normal Azure VMs**: Typically use managed disks, which are stored in Azure's dedicated storage. This means that they provide data backup and restore capabilities and are not transient.
   - **Ephemeral VMs**: Utilize ephemeral OS disks, which are created on the local storage of the virtual machine. These disks are temporary and not stored in Azure Storage, meaning they do not provide backup or restore capabilities. If the VM is deleted or fails, the data on the ephemeral disk is lost, but they can be reset or reimaged quickly.

2. **Performance**:
   - **Ephemeral VMs** generally offer lower read/write latency compared to normal VMs because they leverage local storage, allowing for faster performance, especially in scenarios demanding rapid scaling or reimaging of VMs. This is b

Trace(request_id=tr-f04769a1b9824fde87d147cb94e4fd23)

In [0]:
%run ./05_rag_logging

#### 05 — RAG Logging (Observability Layer)

This notebook defines and validates the logging schema for the RAG system.
It creates the Delta table, verifies the schema, and demonstrates example queries.

All reusable logging functions (e.g., `log_rag_event`) live in `00_utils.ipynb`.

In [0]:
rag_event = {
    "question": question,
    "top_k": k,
    "retrieved_chunks": top_chunks,
    "prompt": prompt,
    "answer": answer,
    "embedding_deployment": EMBEDDING_DEPLOYMENT,
    "chat_deployment": CHAT_DEPLOYMENT
}

query_id = log_rag_event(rag_event)
print("Logged query_id:", query_id)

Logged query_id: e26f88a9-6388-4875-be1d-2852a31fe1d8


### Design notes

This table captures full RAG observability:

- What was asked (question)
- What was retrieved (retrieved_chunks)
- What was generated (answer)
- Which models were used (deployments)
- When it happened (created_at)

This enables:
- Debugging hallucinations
- Reproducing failures
- Offline evaluation
- Quality monitoring
- A/B testing

The logging function is intentionally separated into `00_utils.ipynb` so it can be reused by:
- Batch pipelines
- APIs
- Streaming endpoints
- UI demos
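
The real `log_rag_event` lives in `00_utils` and writes to a Delta table; its body is not shown here. The sketch below illustrates only the shape of such a function, under the assumption that it stamps each event with a `query_id` and a `created_at` before writing. The names `make_rag_log_record` and `log_rag_event_sketch` are hypothetical, and a plain list stands in for the table.

```python
import uuid
from datetime import datetime, timezone


def make_rag_log_record(event):
    """Return a copy of `event` with a fresh query_id and UTC created_at."""
    required = {"question", "top_k", "retrieved_chunks", "prompt",
                "answer", "embedding_deployment", "chat_deployment"}
    missing = required - event.keys()
    if missing:
        raise ValueError(f"rag_event is missing fields: {sorted(missing)}")
    record = dict(event)
    record["query_id"] = str(uuid.uuid4())
    record["created_at"] = datetime.now(timezone.utc)
    return record


def log_rag_event_sketch(event, sink):
    """Append one record to `sink` (a list here) and return its query_id."""
    record = make_rag_log_record(event)
    sink.append(record)
    return record["query_id"]


sink = []
query_id = log_rag_event_sketch(
    {"question": "q", "top_k": 1, "retrieved_chunks": [], "prompt": "p",
     "answer": "a", "embedding_deployment": "emb-3-small",
     "chat_deployment": "gpt-4o-mini-prod"},
    sink,
)
print("Logged query_id:", query_id)
```

Keeping record construction separate from the write makes the same function usable from a batch job, an API handler, or a demo UI, with only the sink changing.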

DataFrame[]

root
 |-- query_id: string (nullable = true)
 |-- question: string (nullable = true)
 |-- top_k: integer (nullable = true)
 |-- retrieved_chunks: array (nullable = true)
 |    |-- element: struct (containsNull = true)
 |    |    |-- chunk_id: string (nullable = true)
 |    |    |-- doc_id: string (nullable = true)
 |    |    |-- title: string (nullable = true)
 |    |    |-- url: string (nullable = true)
 |    |    |-- chunk_index: integer (nullable = true)
 |    |    |-- category: string (nullable = true)
 |    |    |-- score: double (nullable = true)
 |-- prompt: string (nullable = true)
 |-- answer: string (nullable = true)
 |-- embedding_deployment: string (nullable = true)
 |-- chat_deployment: string (nullable = true)
 |-- created_at: timestamp (nullable = true)
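
One detail in this schema: the `retrieved_chunks` struct has no `chunk_text` field, while the retrieval results do carry `chunk_text`. Something therefore has to project each chunk down to the logged fields. Whether `log_rag_event` does this is not shown, so the helper below (`to_logged_chunk`, a hypothetical name) is only a sketch of that projection.

```python
# Field names copied from the retrieved_chunks struct in the schema above.
LOGGED_CHUNK_FIELDS = ("chunk_id", "doc_id", "title", "url",
                       "chunk_index", "category", "score")


def to_logged_chunk(chunk):
    """Keep only the schema's struct fields; absent ones become None."""
    return {name: chunk.get(name) for name in LOGGED_CHUNK_FIELDS}


# Made-up chunk for illustration.
chunk = {"chunk_id": "c1", "doc_id": "d1", "chunk_text": "long text ...",
         "chunk_index": 0, "score": 0.87}
print(to_logged_chunk(chunk))
```

Dropping the chunk text keeps log rows small; the text can still be recovered by joining on `chunk_id` against the embeddings table.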



created_at,query_id,question,top_k,top_source
2026-01-11T23:15:00.781463Z,6bb2bad3-7cc5-4805-b332-6fa0ffb75219,What is difference between a normal Azure VM and ephemeral VM,5,https://learn.microsoft.com/en-us/azure/virtual-machines/ecasccv5-ecadsccv5-series.md
2026-01-11T23:14:41.135189Z,0ff22b09-ebc9-49e9-b32f-cf25a2c4ca6e,What is difference between a normal Azure VM and ephemeral VM,5,https://learn.microsoft.com/en-us/azure/virtual-machines/ecasccv5-ecadsccv5-series.md
2026-01-11T23:12:47.965793Z,0e2132c9-3324-4f02-b24d-a4dc628bc924,What is difference between a normal Azure VM and ephemeral VM,5,https://learn.microsoft.com/en-us/azure/virtual-machines/ecasccv5-ecadsccv5-series.md
