# Retrieval-Augmented Generation (RAG) Demo Project

## Overview

This project demonstrates a **Retrieval-Augmented Generation (RAG)** pipeline that uses **LlamaIndex** and **OpenAI's GPT models** to index and search multiple PDF files. Given a natural-language question, the system retrieves the most relevant passages from the indexed documents and uses them as context for the model's answer.

## Key Features

- **PDF Parsing**: Extracts text content from PDF files.
- **Indexing**: Creates a vector-based index using LlamaIndex for efficient search and retrieval.
- **Query Processing**: Leverages OpenAI's GPT models to respond to user queries in natural language.
- **Customizable**: Supports integration with other file types and advanced prompt engineering.

## Workflow

1. **Text Extraction**: Use `PyPDF2` to extract text from PDF files.
2. **Index Creation**: Create a vector-based index of the extracted text using LlamaIndex.
3. **Query and Retrieval**: Input a natural-language query. The system retrieves the most similar chunks from the index, and the GPT model generates a response grounded in them.
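The three steps can be sketched in plain Python to show how they fit together. This is a toy stand-in, not the project's code: word-set overlap (Jaccard similarity) replaces OpenAI embeddings, the hypothetical `build_index` and `search` helpers replace LlamaIndex's index and retriever, text extraction is replaced by hard-coded strings, and the LLM generation step is left out.

```python
import re


def tokenize(text: str) -> set[str]:
    # Lowercased word set; a crude substitute for a learned embedding
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def build_index(texts: list[str]) -> list[tuple[str, set[str]]]:
    # Step 2: keep each text next to its "vector" (here, its word set)
    return [(text, tokenize(text)) for text in texts]


def search(index: list[tuple[str, set[str]]], query: str, top_k: int = 1) -> list[tuple[float, str]]:
    # Step 3: score every entry with Jaccard similarity and return the best matches
    q = tokenize(query)
    scored = []
    for text, words in index:
        union = q | words
        score = len(q & words) / len(union) if union else 0.0
        scored.append((score, text))
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:top_k]


# Step 1 would extract these strings from PDFs; here they are hard-coded examples
texts = [
    "The Transformer is based solely on attention mechanisms.",
    "YOLO frames object detection as a regression problem.",
]
index = build_index(texts)
print(search(index, "What is attention?"))
```

A real embedding model also matches paraphrases that share no words with the query, which is why the project uses vector embeddings rather than keyword overlap.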

## Technologies Used

- **LlamaIndex (formerly GPT Index)**: For building and querying the document index.
- **OpenAI GPT Models**: For natural language understanding and response generation.
- **PyPDF2**: For extracting text from PDF files.
- **Python**: Primary programming language for implementation.

## Project Structure




## How to Run

1. Clone the repository and navigate to the project directory.
2. Install the required dependencies:
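   ```bash
   pip install -r requirements.txt
   ```
3. Create a `.env` file in the project directory containing `OPENAI_API_KEY=<your-key>`.
4. Put the PDF files to be indexed in a `data/` directory, then run the cells in order.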

```python
import os
from dotenv import load_dotenv

# Read KEY=VALUE pairs from the .env file into os.environ
load_dotenv()
```

```
True
```
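`load_dotenv()` reads `KEY=VALUE` pairs from a `.env` file into `os.environ` (the `True` above is its return value). The sketch below approximates that behavior for illustration only. It assumes simple one-line entries, whereas python-dotenv itself also handles cases such as `export` prefixes, multiline values, and variable expansion. The `parse_env_lines` and `load_env` names are hypothetical; like `load_dotenv()`'s default, the sketch does not overwrite variables that are already set.

```python
import os


def parse_env_lines(lines):
    """Parse simple KEY=VALUE lines into a dict (a simplified sketch, not python-dotenv's parser)."""
    values = {}
    for raw in lines:
        line = raw.strip()
        # Skip blank lines, comments, and lines without '='
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        value = value.strip()
        # Remove one pair of matching surrounding quotes, if present
        if len(value) >= 2 and value[0] == value[-1] and value[0] in "\"'":
            value = value[1:-1]
        values[key.strip()] = value
    return values


def load_env(lines, override=False):
    """Copy parsed values into os.environ; already-set variables are kept unless override=True."""
    parsed = parse_env_lines(lines)
    for key, value in parsed.items():
        if override or key not in os.environ:
            os.environ[key] = value
    return bool(parsed)  # True if at least one variable was parsed


# Hypothetical .env contents; the key value is a placeholder
sample = ["# demo .env", 'OPENAI_API_KEY="your-key-here"', "", "DEBUG=1"]
print(parse_env_lines(sample))
```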

## API Key Retrieval and Validation

```python
# Retrieve the API key loaded from .env
api_key = os.getenv('OPENAI_API_KEY')

# Stop early with a clear message if the key is missing
if not api_key:
    raise ValueError("API key not found. Ensure OPENAI_API_KEY is set in the .env file.")

# Optional: load_dotenv() has already placed the key in os.environ;
# this line only makes the dependency explicit
os.environ['OPENAI_API_KEY'] = api_key
```

## Load Documents

```python
from llama_index.core import VectorStoreIndex, SimpleDirectoryReader

# SimpleDirectoryReader reads every file in the 'data' directory and loads it into memory as Document objects
documents = SimpleDirectoryReader('data').load_data()
```
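`SimpleDirectoryReader` reads the files in the `data` directory and turns their contents into LlamaIndex `Document` objects carrying metadata such as the file name. The rough stand-in below shows the idea for plain-text files only. The `Doc` class and `read_directory` function are hypothetical names, and real PDF parsing is deliberately skipped.

```python
import tempfile
from dataclasses import dataclass, field
from pathlib import Path


@dataclass
class Doc:
    """Hypothetical minimal document record (LlamaIndex's Document carries much more)."""
    text: str
    metadata: dict = field(default_factory=dict)


def read_directory(path, suffixes=(".txt", ".md")):
    """Load every plain-text file with a matching suffix, in sorted order, as a Doc."""
    docs = []
    for file in sorted(Path(path).iterdir()):
        if file.is_file() and file.suffix.lower() in suffixes:
            text = file.read_text(encoding="utf-8")
            docs.append(Doc(text=text, metadata={"file_name": file.name}))
    return docs


# Usage: a throwaway directory with one text file and one file type this sketch skips
with tempfile.TemporaryDirectory() as tmp:
    Path(tmp, "notes.txt").write_text("Attention is all you need.", encoding="utf-8")
    Path(tmp, "paper.pdf").write_bytes(b"%PDF-1.4")  # skipped: PDFs need a real parser
    docs = read_directory(tmp)

print([doc.metadata["file_name"] for doc in docs])
```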

## Index Creation

```python
# VectorStoreIndex splits the documents into nodes (chunks), embeds each node,
# and stores the vectors in an in-memory index
index = VectorStoreIndex.from_documents(documents, show_progress=True)
```

```
Parsing nodes: 100%|██████████| 21/21 [00:00<00:00, 364.48it/s]
Generating embeddings: 100%|██████████| 32/32 [00:01<00:00, 18.23it/s]
```
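Building the index involves splitting documents into smaller nodes and embedding each one. The progress log suggests the 21 loaded documents became 32 nodes. LlamaIndex's default splitter is token- and sentence-aware, so the sketch below substitutes a simpler word-based sliding window to show how overlapping chunks are formed. The `chunk_words` helper and its default sizes are illustrative assumptions.

```python
def chunk_words(text: str, chunk_size: int = 50, overlap: int = 10) -> list[str]:
    """Split text into windows of chunk_size words; consecutive windows share `overlap` words."""
    if not 0 <= overlap < chunk_size:
        raise ValueError("need 0 <= overlap < chunk_size")
    words = text.split()
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        if start + chunk_size >= len(words):
            break  # this window already reaches the end of the text
    return chunks


text = " ".join(f"w{i}" for i in range(12))
for chunk in chunk_words(text, chunk_size=5, overlap=2):
    print(chunk)
```

A larger overlap keeps a sentence that straddles a boundary retrievable from either neighboring chunk, at the cost of embedding more text.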


## Query Engine Creation

```python
# as_query_engine() wraps the index in a query engine: it retrieves the nodes most similar
# to a question and passes them to the LLM to generate an answer
query_engine = index.as_query_engine()
```
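A query engine combines two stages: a retriever that fetches the most similar nodes, and a response synthesizer that passes them to the LLM as context. The sketch below shows that composition with hypothetical names (`SimpleQueryEngine`, `Response`) and a fake `llm` callable in place of OpenAI, so it runs offline. It mirrors the shape of what `pprint_response` prints (an answer plus scored source nodes) rather than LlamaIndex's actual classes, and its prompt wording is an assumption.

```python
from dataclasses import dataclass
from typing import Callable

Node = tuple[float, str]  # (similarity score, chunk text)


@dataclass
class Response:
    response: str             # generated answer
    source_nodes: list[Node]  # chunks that were given to the LLM as context


class SimpleQueryEngine:
    """Hypothetical retrieve-then-generate engine; not LlamaIndex's implementation."""

    def __init__(self, retriever: Callable[[str], list[Node]], llm: Callable[[str], str]):
        self.retriever = retriever
        self.llm = llm

    def query(self, question: str) -> Response:
        nodes = self.retriever(question)  # stage 1: retrieval
        context = "\n\n".join(text for _, text in nodes)
        prompt = (
            "Answer the question using only the context below.\n\n"
            f"Context:\n{context}\n\nQuestion: {question}\nAnswer:"
        )
        return Response(response=self.llm(prompt), source_nodes=nodes)  # stage 2: generation


# Offline stand-ins for the vector retriever and the OpenAI model
def fake_retriever(question: str) -> list[Node]:
    return [(0.9, "The Transformer relies entirely on attention mechanisms.")]


seen_prompts = []


def fake_llm(prompt: str) -> str:
    seen_prompts.append(prompt)  # record what the model would have received
    return "stub answer"


engine = SimpleQueryEngine(fake_retriever, fake_llm)
result = engine.query("What is Attention?")
print(result.response, result.source_nodes)
```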

## Querying

```python
from llama_index.core.response.pprint_utils import pprint_response

# query() runs the full RAG loop:
# 1. the question is embedded into a vector;
# 2. the nodes most similar to it (by cosine similarity) are retrieved and ranked;
# 3. the retrieved text is passed to the LLM, which generates the answer.
# The returned Response holds the answer text plus the source nodes and their similarity scores.
# pprint_response prints the response in a readable format.
response = query_engine.query('What is Attention?')
pprint_response(response)
```

```
Final Response: Attention is a network architecture proposed in the
context that is based solely on attention mechanisms, eliminating the
need for recurrent or convolutional neural networks. This architecture
connects the encoder and decoder through an attention mechanism,
allowing for improved quality, parallelizability, and reduced training
time compared to traditional models.
```
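The ranking step described in the code comments can be written out directly. This sketch uses tiny made-up vectors in place of real embeddings (which have far more dimensions) and computes cosine similarity, $\cos(q, d) = \frac{q \cdot d}{\lVert q \rVert \, \lVert d \rVert}$, in pure Python. The `cosine_similarity` and `top_k` helpers are hypothetical names.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # treat zero vectors as dissimilar rather than dividing by zero
    return dot / (norm_a * norm_b)


def top_k(query_vec: list[float], doc_vecs: dict[str, list[float]], k: int = 2) -> list[tuple[str, float]]:
    # Score every document against the query, then keep the k highest-scoring ones
    scores = {doc_id: cosine_similarity(query_vec, vec) for doc_id, vec in doc_vecs.items()}
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)[:k]


# Made-up 3-dimensional "embeddings"
doc_vecs = {
    "attention_paper": [0.9, 0.1, 0.0],
    "yolo_paper": [0.1, 0.9, 0.2],
    "unrelated": [0.0, 0.0, 1.0],
}
query_vec = [1.0, 0.2, 0.0]
for doc_id, score in top_k(query_vec, doc_vecs):
    print(f"{doc_id}: {score:.3f}")
```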


```python
# show_source=True also prints the source nodes used for the answer, with their similarity scores
pprint_response(response, show_source=True)
```

```
Final Response: Attention is a network architecture proposed in the
context that is based solely on attention mechanisms, eliminating the
need for recurrent or convolutional neural networks. This architecture
connects the encoder and decoder through an attention mechanism,
allowing for improved quality, parallelizability, and reduced training
time compared to traditional models.
______________________________________________________________________
Source Node 1/2
Node ID: 5acdd528-eb6c-4c76-8ea1-1197465e1d41
Similarity: 0.8111323015842469
Text: Attention Is All You Need Ashish Vaswani∗ Google Brain
avaswani@google.com Noam Shazeer∗ Google Brain noam@google.com Niki
Parmar∗ Google Research nikip@google.com Jakob Uszkoreit∗ Google
Research usz@google.com Llion Jones∗ Google Research llion@google.com
Aidan N. Gomez∗† University of Toronto aidan@cs.toronto.edu Łukasz
Kaiser ∗ Google Brain ...
______________________________________________________________________
Source Node 2/2
Node ID: 1
```

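## Advanced Querying

This cell assembles the query engine by hand: a `VectorIndexRetriever` fetches the four most similar nodes (`similarity_top_k=4`), and a `RetrieverQueryEngine` passes them to the LLM to generate the answer. The response is then printed with its document sources.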

```python
from llama_index.core.retrievers import VectorIndexRetriever
from llama_index.core.query_engine import RetrieverQueryEngine
from llama_index.core.response.pprint_utils import pprint_response

# VectorIndexRetriever returns the similarity_top_k nodes most similar to the query
retriever = VectorIndexRetriever(index, similarity_top_k=4)
# RetrieverQueryEngine sends the retrieved nodes to the LLM, which synthesizes the answer
query_engine = RetrieverQueryEngine(retriever)

response = query_engine.query('What is YOLO?')
pprint_response(response, show_source=True)
```

```
Final Response: YOLO is a unified model for object detection that
frames object detection as a regression problem, predicting bounding
boxes and class probabilities directly from full images in one
evaluation. It processes images in real-time, achieving high speeds of
up to 155 frames per second while maintaining good accuracy. YOLO is
trained on a loss function that directly corresponds to detection
performance and is designed to be fast and efficient by eliminating
the need for a complex detection pipeline.
______________________________________________________________________
Source Node 1/4
Node ID: c58eb4b7-896f-4af3-9ea8-07b7653f1e8b
Similarity: 0.8273606121232412
Text: You Only Look Once: Unified, Real-Time Object Detection Joseph
Redmon∗, Santosh Divvala∗†, Ross Girshick¶, Ali Farhadi∗† University
of Washington∗, Allen Institute for AI†, Facebook AI Research¶
http://pjreddie.com/yolo/ Abstract We present YOLO, a new approach to
object detection. Prior work on object detection re
```
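LlamaIndex also provides a `SimilarityPostprocessor`, which can be passed to `RetrieverQueryEngine` through its `node_postprocessors` argument (roughly `RetrieverQueryEngine(retriever, node_postprocessors=[SimilarityPostprocessor(similarity_cutoff=0.8)])`) to drop retrieved nodes scoring below a cutoff. With `similarity_top_k=4`, this keeps weakly related chunks out of the LLM's context. The plain-Python sketch below shows only that filtering idea. The `filter_by_similarity` helper, the sample scores, and the 0.80 threshold are illustrative assumptions, not tuned values.

```python
def filter_by_similarity(nodes: list[tuple[float, str]], cutoff: float) -> list[tuple[float, str]]:
    """Keep (score, text) pairs whose score is at least `cutoff`, preserving retrieval order."""
    return [(score, text) for score, text in nodes if score >= cutoff]


# Made-up retrieval results, as (similarity score, chunk label) pairs
retrieved = [
    (0.83, "chunk A"),
    (0.81, "chunk B"),
    (0.79, "chunk C"),
    (0.72, "chunk D"),
]
for score, text in filter_by_similarity(retrieved, cutoff=0.80):
    print(f"{score:.2f}  {text}")
```

Too high a cutoff can leave no context at all, so the threshold is best chosen by inspecting the similarity scores printed for real queries.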