# Hands-On Lab: Building a Retrieval-Augmented GenAI App

## Scenario
You are a data engineer working for a knowledge-intensive enterprise. Your team has been asked to build a Retrieval-Augmented Generation (RAG) application that allows employees to query internal documents such as compliance manuals, product specifications, and policy handbooks using large language models (LLMs).

The challenge is to ensure that responses are not only fluent but also factually accurate and contextually grounded. This lab mirrors a real-world use case where factual accuracy and compliance are critical.

To achieve this, you must:
- Connect a retriever to query a knowledge base stored in a vector database  
- Integrate the retriever with an LLM to generate natural, context-aware answers  
- Ensure hallucinations are minimized by grounding responses in the retrieved content  
- Provide outputs in a format that aligns with organizational standards and compliance requirements  

## Objective
By the end of this lab, you will be able to:
- Build a LangChain pipeline that combines a retriever with an LLM  
- Configure the retriever to pull relevant context from a vector store  
- Generate safe, accurate, and contextual answers aligned with enterprise knowledge sources  
- Demonstrate how retrieval augmentation reduces hallucinations and improves user trust in AI-driven applications  


### Step 1: Install Required Libraries
We begin by installing the latest versions of LangChain, LangChain Community integrations, LangChain OpenAI, and FAISS. 
These packages provide the core framework, integrations, and vector search functionality needed for building our Retrieval-Augmented Generation (RAG) application.

> **Note:** After installation, we restart the Python runtime to ensure the environment is updated with the new packages.


In [0]:
%pip install -U langchain langchain-community langchain-openai faiss-cpu
%restart_python
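
After the restart, it can help to confirm which package versions were actually installed. The following is a minimal sketch using only the standard library; the helper name `installed_version` is hypothetical and not part of the lab.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(package):
    """Return the installed version string of a package, or None if it is missing."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


# Report the version of each package installed in Step 1.
for name in ["langchain", "langchain-community", "langchain-openai", "faiss-cpu"]:
    print(name, installed_version(name) or "not installed")
```

Knowing the installed versions is useful when an import path used later in the lab differs between LangChain releases.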


%md
### Step 2: Import Required Libraries and Configure API Key
In this step, we import the necessary LangChain modules and supporting libraries to build our Retrieval-Augmented Generation (RAG) application.

- **ChatOpenAI**: Provides access to OpenAI’s GPT models for natural language responses.  
- **OpenAIEmbeddings**: Converts documents and queries into vector embeddings for semantic search.  
- **RetrievalQA**: A LangChain chain that combines a retriever with an LLM to produce grounded answers.  
- **FAISS**: A similarity-search library, used here as an in-memory vector store for efficient search over embedded documents.  
- **PromptTemplate**: Defines structured prompts for consistent and safe responses.  

Finally, we set the `OPENAI_API_KEY` environment variable so the application can authenticate with OpenAI services.  


In [0]:
# Import the required libraries (Databricks compatible)
import os

from langchain_openai import ChatOpenAI, OpenAIEmbeddings
from langchain.chains import RetrievalQA
from langchain_community.vectorstores import FAISS
from langchain_core.prompts import PromptTemplate

# Set your OpenAI API key (replace the placeholder; never commit a real key)
os.environ["OPENAI_API_KEY"] = "<your-openai-api-key>"
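
Hard-coding a key in a notebook is convenient for a lab but risky in shared workspaces. As one possible alternative, the sketch below reads the key from an environment that was configured outside the notebook and fails early with a clear message if it is missing. The helper name `get_openai_api_key` is hypothetical.

```python
import os


def get_openai_api_key(env=None):
    """Return the OpenAI API key from the given mapping (default: os.environ)."""
    env = os.environ if env is None else env
    key = env.get("OPENAI_API_KEY", "").strip()
    if not key:
        # Fail early so later steps do not hit a confusing authentication error.
        raise RuntimeError(
            "OPENAI_API_KEY is not set; configure it before running the lab."
        )
    return key
```

Passing a plain dictionary as `env` makes the helper easy to exercise without touching the real environment.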


%md
### Step 3: Prepare and Index Documents with FAISS
In this step, we simulate internal enterprise documents such as compliance policies, product specifications, and employee handbooks.  

To make these documents searchable, we:  
- Convert the text into **embeddings** using OpenAIEmbeddings, which transform each document into a vector representation.  
- Store the resulting vectors in a **FAISS vector store**, allowing us to perform semantic search queries later in the pipeline.  

This step ensures that when the user asks a question, the system can retrieve the most relevant pieces of information to ground the LLM’s answer.  


In [0]:
# Preparing a FAISS vector store
documents = [
    "Policy: All employees must complete annual compliance training by June.",
    "Product spec: The Model X100 device must be tested every 12 months.",
    "Handbook: Employees are entitled to 15 days of paid leave per year."
]

embeddings = OpenAIEmbeddings()
vectorstore = FAISS.from_texts(documents, embeddings)
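
To build intuition for what the vector store does, here is a toy sketch of nearest-neighbor search. It uses hand-made 3-dimensional vectors in place of real embeddings (which have far more dimensions) and ranks by Euclidean (L2) distance, one of the metrics FAISS supports. All vectors and names are illustrative assumptions, not values produced by OpenAIEmbeddings.

```python
import math

# Hypothetical 3-dimensional vectors standing in for real document embeddings.
toy_index = {
    "compliance training policy": [0.9, 0.1, 0.0],
    "X100 testing spec": [0.1, 0.9, 0.1],
    "paid leave handbook": [0.0, 0.2, 0.9],
}


def l2_distance(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def nearest(query_vector, index):
    """Return the label whose vector is closest to the query vector."""
    return min(index, key=lambda label: l2_distance(query_vector, index[label]))


# Hypothetical embedding of a question about paid leave.
query_vector = [0.1, 0.1, 0.8]
print(nearest(query_vector, toy_index))  # prints "paid leave handbook"
```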


%md
### Step 4: Configure the Retriever
Next, we set up a **retriever**, which acts as the search mechanism over the FAISS vector store.  

- The retriever identifies the most relevant documents based on semantic similarity to the user’s query.  
- We configure it to return the **top 2 results** (`k=2`) for each query, which keeps the prompt short while still supplying more than one candidate passage.  

This gives the LLM the most relevant available context for generating accurate and grounded responses.  


In [0]:
# Configuring the retriever
retriever = vectorstore.as_retriever(search_kwargs={"k": 2})
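
The effect of `k` can be pictured as a top-k selection over distances. The sketch below is a simplified illustration, not the retriever's actual implementation; the distance values and the `top_k` helper are assumptions made for the example.

```python
import heapq


def top_k(distances, k=2):
    """Return the k document labels with the smallest distance to the query."""
    return heapq.nsmallest(k, distances, key=distances.get)


# Hypothetical distances between one query and the three lab documents.
distances = {"policy": 1.13, "product spec": 1.06, "handbook": 0.17}
print(top_k(distances, k=2))  # prints ['handbook', 'product spec']
```

With only three documents, `k=2` always returns at least one weakly related passage, so the prompt instructions in the next step matter for keeping the answer focused.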


%md
### Step 5: Create a Safety-Focused Prompt
To reduce the risk of hallucinations and ensure compliance, we design a **prompt template** with explicit safety instructions.  

This template enforces the following rules:  
- Clearly defines the model’s **role** as a compliance assistant.  
- Instructs the LLM to base its answers strictly on the provided **context**.  
- Requires the model to state *“the information is not available”* if the answer cannot be found in the context.  

This approach steers the model toward accurate, safe, and policy-aligned responses, although it cannot guarantee them.  


In [0]:
# Creating a safety-focused prompt
prompt_template = PromptTemplate(
    input_variables=["context", "question"],
    template="""You are a compliance assistant.
Use only the provided context to answer the question truthfully and clearly.
If the answer cannot be found in the context, state that the information is not available.
Context: {context}
Question: {question}
Answer:"""
)
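
To see what the model actually receives, it can help to render a template by hand. The sketch below uses plain `str.format` on an abbreviated, hypothetical version of the template so that it runs without LangChain; the placeholders `{context}` and `{question}` are filled in the same way.

```python
# Abbreviated stand-in for the lab's template (the instructions are omitted).
TEMPLATE = """Context: {context}
Question: {question}
Answer:"""

rendered = TEMPLATE.format(
    context="Handbook: Employees are entitled to 15 days of paid leave per year.",
    question="How many days of paid leave are employees entitled to?",
)
print(rendered)
```

The rendered text ends with `Answer:`, which cues the model to continue with its response.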


### Step 6: Build the Retrieval-Augmented Generation (RAG) Pipeline
Now we bring all the components together to create a **Retrieval-Augmented Generation (RAG) pipeline**.  

- **ChatOpenAI** initializes the LLM (GPT-4 in this case) with `temperature=0` to minimize randomness and make answers more repeatable.  
- **RetrievalQA** builds a chain that connects the retriever with the LLM.  
- The **prompt template** instructs the model to answer truthfully and stay within the retrieved context, which reduces the risk of hallucinations.  

This pipeline is now capable of retrieving relevant context and generating accurate, safe responses.  


In [0]:
# Building the RAG pipeline
llm = ChatOpenAI(model="gpt-4", temperature=0)

qa_chain = RetrievalQA.from_chain_type(
    llm=llm,
    retriever=retriever,
    chain_type="stuff",
    chain_type_kwargs={"prompt": prompt_template},
    verbose=True
)
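
The `stuff` chain type places all retrieved documents into a single prompt. A rough pure-Python picture of that idea follows; it is a sketch rather than LangChain's implementation, and the function names and the blank-line separator are assumptions.

```python
def stuff_context(docs, separator="\n\n"):
    """Join every retrieved document into one context string (separator is assumed)."""
    return separator.join(docs)


def build_prompt(docs, question):
    """Insert the stuffed context and the question into a minimal prompt."""
    context = stuff_context(docs)
    return f"Context: {context}\nQuestion: {question}\nAnswer:"


retrieved = [
    "Handbook: Employees are entitled to 15 days of paid leave per year.",
    "Policy: All employees must complete annual compliance training by June.",
]
print(build_prompt(retrieved, "How many days of paid leave are employees entitled to?"))
```

Because every retrieved document is included, this strategy suits small values of `k`; large result sets can exceed the model's context window.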


%md
### Step 7: Run a Test Query
Finally, we run a compliance-related query through the RAG pipeline to validate its performance.  

In this example, we ask:  
**“How many days of paid leave are employees entitled to?”**

- The retriever searches the FAISS vector store for the most relevant documents.  
- The LLM uses the retrieved context and the safety-focused prompt to generate a grounded, compliant response.  
- The output should be factually accurate and aligned with organizational standards; here, the expected answer is 15 days of paid leave per year, taken from the handbook entry.  


In [0]:
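# Running a query through the RAG pipeline
query = "How many days of paid leave are employees entitled to?"
# invoke() returns a dict; the generated answer is stored under the "result" key
response = qa_chain.invoke({"query": query})["result"]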

print("User Query:", query)
print("RAG Response:", response)
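
Validation can go one step beyond reading the printed answer. The sketch below is a deliberately simple heuristic, offered as an assumption rather than an established method: it checks that every number in an answer also appears in the context it was supposed to be grounded in. The helper name `numbers_grounded` is hypothetical.

```python
import re

NUMBER_PATTERN = r"\d+(?:\.\d+)?"


def numbers_grounded(answer, context):
    """Return True if every number in the answer also appears in the context."""
    answer_numbers = set(re.findall(NUMBER_PATTERN, answer))
    context_numbers = set(re.findall(NUMBER_PATTERN, context))
    return answer_numbers <= context_numbers


context = "Handbook: Employees are entitled to 15 days of paid leave per year."
print(numbers_grounded("Employees are entitled to 15 days of paid leave.", context))  # True
print(numbers_grounded("Employees are entitled to 20 days of paid leave.", context))  # False
```

An answer with no numbers passes trivially, so a check like this complements manual review and does not replace it.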
