# Retrieval Augmented Generation (RAG) Using LlamaIndex

## Objective

The objective of this project is to build a RAG model that answers questions about a given document using LlamaIndex.


## Requirements
*This project works best with an online coding IDE like Google Colab. At times, using a local system can cause an error.*

1. **Python 3.x**

2. **Python Libraries**
    * openai
    * llama-index
    * python-dotenv
    * os (part of the Python standard library; no installation needed)
3. **OpenAI Key**
4. **An external file to use as the reference**


## Quick Introduction to RAG

### What is RAG?

Retrieval-Augmented Generation (RAG) is a Natural Language Processing (NLP) technique that combines generative Large Language Models (LLMs) with information retrieval systems. RAG improves an LLM's responses by supplying knowledge retrieved from external sources, so that the generated text is more accurate and relevant.


### How does RAG work?

The mechanism of a RAG model is as follows:
* Retrieval: RAG uses search algorithms to query external data like databases, knowledge bases, and web pages.
* Pre-processing: It then prepares the retrieved information, for example by performing tokenization, stemming, and removal of stop words.
* Integration: Finally, it incorporates the pre-processed information into the input given to the LLM, allowing it to generate better responses.
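
The three steps above can be sketched in plain Python. This is a toy illustration, not how LlamaIndex implements them: retrieval is simple word overlap rather than a real search algorithm, stemming is omitted, no LLM is called, and the function names, stop-word list, and sample passages are all made up for the example.

```python
import re

# Tiny illustrative stop-word list (an assumption, not a standard list)
STOP_WORDS = {"a", "an", "the", "is", "of", "who", "what", "in", "to", "and"}


def preprocess(text):
    """Pre-processing: lowercase, tokenize, and drop stop words."""
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    return [t for t in tokens if t not in STOP_WORDS]


def retrieve(query, passages, top_k=1):
    """Retrieval: rank passages by the number of terms shared with the query."""
    query_terms = set(preprocess(query))
    scored = [(len(query_terms & set(preprocess(p))), p) for p in passages]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [p for score, p in scored[:top_k] if score > 0]


def build_prompt(query, context_passages):
    """Integration: place the retrieved text in the prompt an LLM would receive."""
    context = "\n".join(context_passages)
    return f"Context:\n{context}\n\nQuestion: {query}\nAnswer:"


# Made-up sample passages, used only for illustration
passages = [
    "The chairman of the company is Jane Doe.",
    "The company was founded in 1990.",
]
question = "Who is the chairman?"
prompt = build_prompt(question, retrieve(question, passages))
print(prompt)
```

In a real pipeline, the resulting prompt would be sent to the LLM, which answers using the supplied context instead of relying only on its training data.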

***


## Program

### Perform the Necessary Installations (Skip if Completed)

In [None]:
!pip install openai # To access the OpenAI REST API

In [None]:
!pip install llama-index # To access a collection of classes that can be used for LLM applications

In [None]:
!pip install python-dotenv # To read key-value pairs from a '.env' file and set them as environment variables

### Import the Necessary Libraries

In [None]:
import os  
# To interact with the operating system

import openai 
# To access the OpenAI REST API

from llama_index.core import SimpleDirectoryReader 
# To load data from local or remote files into LlamaIndex

from dotenv import load_dotenv
# To load environment variables from a '.env' file

from llama_index.core import GPTVectorStoreIndex
# An index for storing and retrieving document vectors, using embeddings generated by a model
# (in recent llama-index releases this name is kept as a legacy alias of VectorStoreIndex)

### Access the OpenAI Key

In [None]:
# Access the OpenAI Key stored under the name 'MYKEY' in the '.env' file

load_dotenv()
key = os.getenv('MYKEY')


In [None]:
openai.api_key = key
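
If `MYKEY` is missing from the environment, `os.getenv` returns `None` and the failure only appears later, when the first API request is made. A small guard can make the problem visible right away. The helper below is a hypothetical sketch, not part of `python-dotenv` or `openai`, and the name `require_env` is made up for this example.

```python
import os


def require_env(name):
    """Return the value of environment variable `name`, or raise if it is unset or empty."""
    value = os.getenv(name)
    if not value:
        raise RuntimeError(
            f"Environment variable {name!r} is not set; check the '.env' file."
        )
    return value
```

It could be used after `load_dotenv()` as `key = require_env('MYKEY')` in place of the plain `os.getenv` call.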

### Upload the Required Document

In [None]:
# Upload the document from the 'data' folder

document = SimpleDirectoryReader("data").load_data()

In [None]:
# View the document 

document

### Create an Index for the Given Document

In [None]:
# Create an index from the specified document

index = GPTVectorStoreIndex.from_documents(document, show_progress=True)

In [None]:
# View the index

index
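
To give a rough idea of what a vector index does, the sketch below stores one vector per text chunk and returns the chunks most similar to a query. It is a simplified stand-in, not LlamaIndex's implementation: word counts replace the learned embeddings a real index would request from a model, and `ToyVectorIndex`, `embed`, and `cosine` are hypothetical names with made-up sample text.

```python
import math
import re
from collections import Counter


def embed(text):
    """Toy 'embedding': a sparse word-count vector (real indexes use a learned model)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


class ToyVectorIndex:
    """Hypothetical stand-in that keeps (vector, chunk) pairs in memory."""

    def __init__(self, chunks):
        self.entries = [(embed(c), c) for c in chunks]

    def search(self, query, top_k=1):
        """Return the top_k chunks most similar to the query."""
        q = embed(query)
        ranked = sorted(self.entries, key=lambda e: cosine(q, e[0]), reverse=True)
        return [chunk for _, chunk in ranked[:top_k]]


# Made-up sample chunks, used only for illustration
toy_index = ToyVectorIndex([
    "The chairman presented the annual report.",
    "Revenue grew in the third quarter.",
])
print(toy_index.search("Who is the chairman?"))
```

The same idea carries over to the real index: the query is turned into a vector, compared against the stored vectors, and the closest chunks are handed to the LLM as context.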

### Convert the Index Into a Query Engine, Ask a Question, and View the Response

In [None]:
# Convert the index into a query engine to perform searches or queries against the indexed documents

query_engine = index.as_query_engine()

# Process the query and search the indexed documents for relevant information that matches it
response = query_engine.query("Who is the chairman?")

In [None]:
# Print the response

print(response)
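
To check which parts of the document an answer was based on, the retrieved chunks can be listed alongside their similarity scores. The helper below is a hypothetical sketch. It assumes the response object exposes a `source_nodes` list whose items have a `score` attribute and a `node` with a `get_content()` method, as in recent llama-index releases; these names should be verified against the installed version.

```python
def summarize_sources(response, max_chars=80):
    """Return (score, snippet) pairs for the chunks attached to a response, if any."""
    summary = []
    # Assumption: `source_nodes` holds the retrieved chunks; fall back to an empty list
    for item in getattr(response, "source_nodes", None) or []:
        snippet = item.node.get_content()[:max_chars]
        summary.append((item.score, snippet))
    return summary
```

Calling `summarize_sources(response)` after the query returns a list that can be printed or inspected to see which chunks were retrieved.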