<center><img src="https://upload.wikimedia.org/wikipedia/commons/e/e9/4_RGB_McCombs_School_Brand_Branded.png" width="300" height="100"/>
  <img src="https://mma.prnewswire.com/media/1458111/Great_Learning_Logo.jpg?p=facebook" width="200" height="100"/></center>

<center><font size=10>Artificial Intelligence and Machine Learning</font></center>
<center><font size=6>Natural Language Processing with Generative AI - Retrieval Augmented Generation</font></center>

<center><img src="https://i.ibb.co/pBF9nKpf/apple.png" width="720"></center>

<center><font size=6>Apple HBR Report Document Q&A</font></center>

# RAG LLM Application Notebook

This notebook demonstrates the complete workflow of a Retrieval-Augmented Generation (RAG) LLM application that answers questions about the Harvard Business Review article "How Apple Is Organized for Innovation". It covers data loading, chunking, embedding, vector database setup, question answering, and evaluation.

## 1. Setup and Library Installation

First, install the Python libraries required for data processing, LLM interaction, and vector database operations. The `%pip install` line in the next cell is commented out; uncomment it to install everything listed in `requirements.txt` (this works in Colab as well as in a local Jupyter environment).

**Note**: If you are running this locally, ensure Ollama is installed and that the model named by `DEFAULT_OLLAMA_MODEL` in `config.py` has been pulled (for the run below, `ollama pull qwen3:4b-instruct`).

In [1]:
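# Install necessary packages (uncomment the next line if they are not installed yet)
# %pip install -r requirements.txt


print("Setup cell finished. Uncomment the %pip line above to install packages from requirements.txt.")
print("Please ensure the Ollama model is pulled with: ollama pull <model-name>  (see DEFAULT_OLLAMA_MODEL in config.py)")

Setup cell finished. Uncomment the %pip line above to install packages from requirements.txt.
Please ensure the Ollama model is pulled with: ollama pull <model-name>  (see DEFAULT_OLLAMA_MODEL in config.py)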


## 2. Import Necessary Modules

We import the `RAG_LLM` class from `functions.py` and constants from `config.py`. Make sure `functions.py`, `config.py`, and `prompt_templates.py` are in the same directory as this notebook.

In [14]:
from functions import RAG_LLM
from config import APPLE_PDF_PATH, DEFAULT_K_RETRIEVER, DEFAULT_MAX_TOKENS, DEFAULT_TEMPERATURE, DEFAULT_OLLAMA_MODEL, DEFAULT_GEMINI_MODEL
import os

print("Modules imported successfully.")

Modules imported successfully.


## 3. Initialize the RAG_LLM System

Here, we create an instance of our `RAG_LLM` class. This object will manage the entire RAG pipeline, including data loading, processing, retrieval, and LLM interaction.

In [12]:
# Initialize the RAG_LLM class
rag_system = RAG_LLM()
print("RAG_LLM system initialized.")

Ollama client initialized.
Google Generative AI SDK configured successfully.
Default model for this instance is set to: 'qwen3:4b-instruct'
RAG_LLM initialized.
RAG_LLM system initialized.


In [16]:

# Select the LLM backend: the local Ollama model or the Gemini model, both defined in config.py
rag_system.set_model(DEFAULT_OLLAMA_MODEL)
# rag_system.set_model(DEFAULT_GEMINI_MODEL)



Default model for this instance has been changed to: 'qwen3:4b-instruct'
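The log messages suggest that `set_model` decides, from the model name, whether requests go to the local Ollama client or to Google's Gemini API. Since `functions.py` is not shown here, the sketch below only assumes a simple rule (names starting with `gemini` use Gemini, all others use Ollama); `pick_backend` is a hypothetical helper, not part of `RAG_LLM`.

```python
def pick_backend(model_name: str) -> str:
    """Return which client should serve `model_name`.

    Assumption for illustration only: names beginning with "gemini" are served
    by Google's Gemini API; every other name is treated as a local Ollama model.
    """
    if model_name.lower().startswith("gemini"):
        return "gemini"
    return "ollama"


# The two model names that appear in this notebook's logs
for name in ["qwen3:4b-instruct", "gemini-1.5-flash-latest"]:
    print(f"{name} -> {pick_backend(name)}")
```

A name-based rule keeps switching to a single call, which matches how the notebook toggles between the two commented `set_model` lines.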


## 4. Load the PDF Document

We load `HBR_How_Apple_Is_Organized_For_Innovation.pdf`, whose path is set by `APPLE_PDF_PATH` in `config.py`. Make sure the file exists at that path (in this run, a relative path, i.e. the working directory).
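A quick pre-flight check can catch a missing file before `load_data` runs. This is a minimal sketch using only `pathlib`; `pdf_is_available` is a hypothetical helper, and `RAG_LLM.load_data` may handle missing files in its own way.

```python
from pathlib import Path


def pdf_is_available(pdf_path: str) -> bool:
    """Return True if `pdf_path` is an existing file with a .pdf extension."""
    path = Path(pdf_path)
    return path.is_file() and path.suffix.lower() == ".pdf"


# Check the file name used in this notebook (resolved against the working directory)
print(pdf_is_available("HBR_How_Apple_Is_Organized_For_Innovation.pdf"))
```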

In [17]:
# Load the PDF document
documents = rag_system.load_data(pdf_path=APPLE_PDF_PATH)
if not documents:
    print("Failed to load documents. Please check the PDF path and file existence.")
else:
    print("PDF document loaded successfully.")

Loading data from: HBR_How_Apple_Is_Organized_For_Innovation.pdf
Successfully loaded 11 pages.
PDF document loaded successfully.


## 5. Chunk the Loaded Data

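The loaded pages are split into smaller, overlapping chunks. Smaller chunks make retrieval more precise and keep the retrieved context within the LLM's context window, while the overlap reduces the chance that a sentence is separated from its surrounding context. In this run, `chunk_data` used `chunk_size=1024` and `chunk_overlap=20`, turning the 11 pages into 16 chunks.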
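To make the two parameters concrete, here is a plain fixed-window splitter over characters. It is a sketch, not necessarily what `chunk_data` does: `functions.py` is not shown, and a library splitter may also try to break at paragraph or sentence boundaries. `chunk_text` is a hypothetical name.

```python
def chunk_text(text: str, chunk_size: int = 1024, chunk_overlap: int = 20) -> list[str]:
    """Split `text` into character windows of `chunk_size` that share `chunk_overlap` characters."""
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    step = chunk_size - chunk_overlap  # each new window starts this many characters later
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # this window already reaches the end of the text
    return chunks


pieces = chunk_text("x" * 2500)
print(len(pieces), [len(p) for p in pieces])
```

With the defaults, consecutive windows start 1,004 characters apart, so the last 20 characters of one chunk are repeated at the start of the next.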

In [18]:
# Chunk the loaded data
document_chunks = rag_system.chunk_data(documents)
if not document_chunks:
    print("Failed to chunk documents.")
else:
    print("Documents chunked successfully.")

Chunking data with chunk_size=1024, chunk_overlap=20
Created 16 chunks.
Documents chunked successfully.


## 6. Create Embedding Model

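An embedding model (a SentenceTransformer, here `mixedbread-ai/mxbai-embed-large-v1`) converts each text chunk into a numerical vector. Texts with similar meaning map to nearby vectors, which is what makes semantic search possible.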
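"Nearby" is usually measured with cosine similarity. The sketch below uses made-up 3-dimensional vectors purely for illustration; real embedding vectors are much longer, and the notebook's own similarity metric lives inside the vector database configuration, which is not shown.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors: 1.0 = same direction, 0.0 = orthogonal."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Toy "embeddings" (invented values, not produced by any real model)
query = [0.9, 0.1, 0.0]
chunk_on_topic = [0.8, 0.2, 0.1]
chunk_off_topic = [0.0, 0.1, 0.9]
print(round(cosine_similarity(query, chunk_on_topic), 3))
print(round(cosine_similarity(query, chunk_off_topic), 3))
```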

In [19]:
# Create embedding model
rag_system.create_embeddings()
if not rag_system.embedding_model:
    print("Failed to create embedding model.")
else:
    print("Embedding model created successfully.")

Initializing embedding model: mixedbread-ai/mxbai-embed-large-v1
Embedding model initialized successfully.
Embedding model created successfully.


## 7. Set Up the Vector Database

The Chroma vector database stores the chunk embeddings and backs the retriever that finds the chunks most relevant to a query. If a persisted database already exists in the target directory (here `vector_db_1024`), it is loaded instead of being rebuilt.
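Conceptually, the retriever returns the `k` stored chunks whose embeddings are most similar to the query embedding (the logs show `k = 3`). The brute-force sketch below is only a stand-in for what Chroma does with an index; the toy vectors and the `top_k` helper are hypothetical.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def top_k(query_vec, index, k=3):
    """Return the k (chunk_text, score) pairs most similar to query_vec, best first."""
    scored = [(text, cosine_similarity(query_vec, vec)) for text, vec in index]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Hypothetical in-memory index of (chunk text, embedding) pairs with toy vectors
index = [
    ("Leaders have deep expertise.", [0.9, 0.1, 0.0]),
    ("Leaders are immersed in details.", [0.8, 0.3, 0.1]),
    ("Leaders debate collaboratively.", [0.7, 0.2, 0.3]),
    ("An unrelated chunk.", [0.0, 0.2, 0.9]),
]
for text, score in top_k([1.0, 0.2, 0.1], index, k=3):
    print(f"{score:.3f}  {text}")
```

A real vector database avoids scoring every chunk by using an approximate nearest-neighbour index, which matters once there are far more than 16 chunks.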

In [20]:
# Set up the vector database
rag_system.setup_vector_database(document_chunks=document_chunks)
if not rag_system.vectorstore:
    print("Failed to set up vector database.")
else:
    print("Vector database set up and retriever initialized.")

Setting up vector database in: vector_db_1024
Vector database loaded from existing directory.
Retriever initialized.
Vector database set up and retriever initialized.


## 8. Demonstrate Question Answering with RAG

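Now we can ask questions and see how the system retrieves relevant chunks (three per query in this run) and generates answers grounded in the document. The first cell below was answered by the Ollama model (`qwen3:4b-instruct`); the remaining cells were executed with the Gemini model (`gemini-1.5-flash-latest`), which can be selected with `rag_system.set_model(DEFAULT_GEMINI_MODEL)`.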
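Between retrieval and generation, the retrieved chunks are combined with the question into a single prompt (the log's "RAG prompt created" step). The real template lives in `prompt_templates.py`, which is not shown; the sketch below is a hypothetical `build_rag_prompt` whose "I don't know" fallback mirrors the behaviour seen in Query 3.

```python
def build_rag_prompt(question: str, context_chunks: list[str]) -> str:
    """Assemble a grounded prompt: numbered context chunks, the question, and a fallback rule."""
    context = "\n\n".join(f"[{i}] {chunk}" for i, chunk in enumerate(context_chunks, start=1))
    return (
        "Answer the question using only the context below. "
        "If the answer is not in the context, reply exactly: I don't know\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}\nAnswer:"
    )


prompt = build_rag_prompt(
    "Who published this article?",
    ["Chunk one text.", "Chunk two text.", "Chunk three text."],
)
print(prompt)
```

Numbering the chunks makes it easy to ask the model to cite which passage supports its answer, if desired.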

In [21]:
# Example Query 1
user_input_1 = "Who are the authors of this article and who published this article ?"
print(f"\nQuery 1: {user_input_1}")
llm_response_1 = rag_system.get_answer(user_input_1)
print(f"Response 1: \n{llm_response_1}")


Query 1: Who are the authors of this article and who published this article ?
Retrieving 3 relevant documents for the query.
RAG prompt created.
Generating LLM response using model: qwen3:4b-instruct
Ollama response generated.
Response 1: 
The authors of the article are Joel M. Podolny and Morten T. Hansen. The article was published by Harvard Business Review.


In [10]:
# Example Query 1, answered with the Gemini model
user_input_1 = "Who are the authors of this article and who published this article ?"
print(f"\nQuery 1: {user_input_1}")
llm_response_1 = rag_system.get_answer(user_input_1)
print(f"Response 1: \n{llm_response_1}")


Query 1: Who are the authors of this article and who published this article ?
Retrieving 3 relevant documents for the query.
RAG prompt created.
Generating LLM response using model: gemini-1.5-flash-latest
Gemini response generated.
Response 1: 
Joel M. Podolny and Morten T. Hansen are the authors.  Harvard Business Review published the article.



In [11]:
# Example Query 2
user_input_2 = "List down the three leadership characteristics in bulleted points and explain each one of the characteristics under two lines."
print(f"\nQuery 2: {user_input_2}")
# Override the generation settings: cap the answer at 150 tokens and use a low temperature for a focused list
llm_response_2 = rag_system.get_answer(user_input_2, max_tokens=150, temperature=0.1)
print(f"Response 2: \n{llm_response_2}")


Query 2: List down the three leadership characteristics in bulleted points and explain each one of the characteristics under two lines.
Retrieving 3 relevant documents for the query.
RAG prompt created.
Generating LLM response using model: gemini-1.5-flash-latest
Gemini response generated.
Response 2: 
* Deep expertise: Apple leaders possess extensive knowledge in their respective fields.  This allows for informed decision-making and effective guidance.
* Immersion in details: Leaders are deeply involved in the specifics of their functions. This ensures a thorough understanding of ongoing projects and potential challenges.
* Collaborative debate:  Leaders actively engage in discussions with colleagues across various teams. This fosters innovation and efficient problem-solving.



In [12]:
# Example Query 3 (expected answer: "I don't know" if the retrieved context does not contain the information)
user_input_3 = "Can you explain specific examples from the article where Apple's approach to leadership has led to successful innovations?"
print(f"\nQuery 3: {user_input_3}")
llm_response_3 = rag_system.get_answer(user_input_3)
print(f"Response 3: \n{llm_response_3}")


Query 3: Can you explain specific examples from the article where Apple's approach to leadership has led to successful innovations?
Retrieving 3 relevant documents for the query.
RAG prompt created.
Generating LLM response using model: gemini-1.5-flash-latest
Gemini response generated.
Response 3: 
I don't know



## 9. Demonstrate Output Evaluation (LLM-as-a-Judge)

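Finally, we use the LLM as a judge to rate each answer for 'groundedness' (whether the answer is supported by the retrieved context) and 'relevance' (how well it addresses the question). As the logs show, `calculate_rating` first regenerates the answer and then makes a separate judge call for each metric.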
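A judge call typically asks the model to reason step by step and finish with a score that code can extract. The prompt wording, the `Rating: N` format, and both helpers below are assumptions for illustration; the notebook's actual judge prompts are in `prompt_templates.py`, and the sample judge text is invented.

```python
import re
from typing import Optional


def build_judge_prompt(metric: str, question: str, context: str, answer: str) -> str:
    """Ask a judge LLM to explain its steps, then end with 'Rating: <1-5>'."""
    return (
        f"You are evaluating an answer for {metric}.\n"
        "Explain your steps, then finish with a line 'Rating: N' where N is 1 to 5.\n\n"
        f"Question: {question}\nContext: {context}\nAnswer: {answer}\n"
    )


def parse_rating(judge_output: str) -> Optional[int]:
    """Return the last 'Rating: N' value (1-5) in the judge's text, or None if absent."""
    matches = re.findall(r"Rating:\s*([1-5])\b", judge_output)
    return int(matches[-1]) if matches else None


sample_output = "Steps to evaluate the answer:\n1. The answer matches the context.\nRating: 5"
print(parse_rating(sample_output))
```

Taking the last match guards against the judge quoting the format instruction earlier in its reasoning.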

In [13]:
# Evaluate Query 1
user_input_1 = "Who are the authors of this article and who published this article ?"
print("\nEvaluating Query 1:")
rag_system.calculate_rating(question=user_input_1)


Evaluating Query 1:

--- Calculating Ratings for Question: 'Who are the authors of this article and who published this article ?' ---
Retrieving 3 relevant documents for the query.
RAG prompt created.
Generating LLM response using model: gemini-1.5-flash-latest
Gemini response generated.
Rating overall answer quality (groundedness and relevance)...
Rating groundedness...
Retrieving 3 relevant documents for the query.
Generating LLM response using model: gemini-1.5-flash-latest
Gemini response generated.
Rating relevance...
Retrieving 3 relevant documents for the query.
Generating LLM response using model: gemini-1.5-flash-latest
Gemini response generated.

--- Results ---
Question: 
 Who are the authors of this article and who published this article ?

Answer: 
 Joel M. Podolny and Morten T. Hansen are the authors.  Harvard Business Review published the article.


Groundedness Rating: 
 Steps to evaluate the answer:

1. **Identify the authors:** Check if the answer correctly identifie

In [14]:
# Evaluate Query 2
user_input_2 = "List down the three leadership characteristics in bulleted points and explain each one of the characteristics under two lines."
print("\nEvaluating Query 2:")
# calculate_rating() uses its default generation settings here
rag_system.calculate_rating(question=user_input_2)


Evaluating Query 2:

--- Calculating Ratings for Question: 'List down the three leadership characteristics in bulleted points and explain each one of the characteristics under two lines.' ---
Retrieving 3 relevant documents for the query.
RAG prompt created.
Generating LLM response using model: gemini-1.5-flash-latest
Gemini response generated.
Rating overall answer quality (groundedness and relevance)...
Rating groundedness...
Retrieving 3 relevant documents for the query.
Generating LLM response using model: gemini-1.5-flash-latest
Gemini response generated.
Rating relevance...
Retrieving 3 relevant documents for the query.
Generating LLM response using model: gemini-1.5-flash-latest
Gemini response generated.

--- Results ---
Question: 
 List down the three leadership characteristics in bulleted points and explain each one of the characteristics under two lines.

Answer: 
 * Deep expertise: Apple leaders possess extensive knowledge in their respective fields.  This allows for infor

In [15]:
# Evaluate Query 3
user_input_3 = "Can you explain specific examples from the article where Apple's approach to leadership has led to successful innovations?"
print("\nEvaluating Query 3:")
rag_system.calculate_rating(question=user_input_3)


Evaluating Query 3:

--- Calculating Ratings for Question: 'Can you explain specific examples from the article where Apple's approach to leadership has led to successful innovations?' ---
Retrieving 3 relevant documents for the query.
RAG prompt created.
Generating LLM response using model: gemini-1.5-flash-latest
Gemini response generated.
Rating overall answer quality (groundedness and relevance)...
Rating groundedness...
Retrieving 3 relevant documents for the query.
Generating LLM response using model: gemini-1.5-flash-latest
Gemini response generated.
Rating relevance...
Retrieving 3 relevant documents for the query.
Generating LLM response using model: gemini-1.5-flash-latest
Gemini response generated.

--- Results ---
Question: 
 Can you explain specific examples from the article where Apple's approach to leadership has led to successful innovations?

Answer: 
 I don't know


Groundedness Rating: 
 Steps to evaluate the answer based on the metric:

1. **Identify the question**: