## Q&A Application using Retrieval-Augmented Generation (RAG)

#### Loading the environment variables

In [4]:
import os 
from dotenv import load_dotenv
load_dotenv()


True

In [5]:
# Re-export the key loaded from .env; this raises TypeError if the key is missing.
os.environ["GOOGLE_API_KEY"] = os.getenv("GOOGLE_API_KEY")
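The assignment above fails with a fairly opaque `TypeError` when the key is absent from `.env`. A minimal sketch of a clearer check follows; `require_env` is a hypothetical helper name, not part of `python-dotenv` or LangChain.

```python
import os


def require_env(name: str) -> str:
    """Return the value of environment variable `name`, or raise a readable error.

    `require_env` is a hypothetical helper introduced for illustration only.
    """
    value = os.getenv(name)
    if not value:
        raise RuntimeError(f"{name} is not set; add it to your .env file")
    return value
```

It could be used as `os.environ["GOOGLE_API_KEY"] = require_env("GOOGLE_API_KEY")`, and the same way for the Pinecone key.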

#### Initializing the embedding model developed by Google

In [6]:
from langchain_google_genai import GoogleGenerativeAIEmbeddings
embeddings = GoogleGenerativeAIEmbeddings(model='models/embedding-001')
embeddings


GoogleGenerativeAIEmbeddings(client=<google.ai.generativelanguage_v1beta.services.generative_service.client.GenerativeServiceClient object at 0x0000016AF65D9040>, model='models/embedding-001', task_type=None, google_api_key=SecretStr('**********'), credentials=None, client_options=None, transport=None, request_options=None)

#### Testing the embedding (768 dimensions)

In [7]:
test_vector = embeddings.embed_query("RAG")

In [8]:
test_vector

[0.03968743234872818,
 -0.023259472101926804,
 -0.04598534479737282,
 -0.006275630556046963,
 0.00944777112454176,
 0.018384858965873718,
 0.008569265715777874,
 -0.027926845476031303,
 0.00962902046740055,
 0.05438072234392166,
 0.036491867154836655,
 0.06871284544467926,
 -0.04694839194417,
 -0.004015402868390083,
 0.0382549986243248,
 -0.026424698531627655,
 0.008834834210574627,
 -0.0006591356359422207,
 0.02056158520281315,
 -0.0164337120950222,
 -0.006077446509152651,
 0.05243970826268196,
 -0.030308790504932404,
 -0.01289336383342743,
 0.0027705233078449965,
 -0.009012932889163494,
 0.028706952929496765,
 -0.03035874478518963,
 0.055819593369960785,
 0.04671228677034378,
 -0.037956252694129944,
 -0.024845251813530922,
 -0.051923301070928574,
 0.0031622503884136677,
 0.0353638119995594,
 -0.05886179953813553,
 0.020227260887622833,
 0.01987406611442566,
 -0.0023426059633493423,
 0.008760765194892883,
 0.011714229360222816,
 -0.045636944472789764,
 -0.03992762044072151,
 -0.009556

In [9]:
len(test_vector)

768
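The 768 numbers only become useful when two vectors are compared. A minimal sketch of cosine similarity, one common way to compare embeddings, is shown here on toy 3-dimensional vectors. Whether the Pinecone index used later is configured with the cosine metric is an assumption; the notebook does not show how the index was created.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors of equal dimension."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


# Toy 3-dimensional vectors stand in for real 768-dimensional embeddings.
same_direction = cosine_similarity([1.0, 2.0, 3.0], [2.0, 4.0, 6.0])  # parallel vectors
orthogonal = cosine_similarity([1.0, 0.0, 0.0], [0.0, 1.0, 0.0])      # unrelated vectors
```

With real embeddings, the two arguments would be the results of two `embeddings.embed_query(...)` calls.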

#### Loading the document

In [10]:
from langchain_community.document_loaders import PyPDFLoader
loader = PyPDFLoader(file_path="C:/Q A/document/RAG.pdf")
data = loader.load()
data


[Document(metadata={'source': 'C:/Q A/document/RAG.pdf', 'page': 0}, page_content='Adaptive-RAG: Learning to Adapt Retrieval-Augmented\nLarge Language Models through Question Complexity\nSoyeong Jeong1Jinheon Baek2Sukmin Cho1Sung Ju Hwang1,2Jong C. Park1*\nSchool of Computing1Graduate School of AI2\nKorea Advanced Institute of Science and Technology1,2\n{starsuzi,jinheon.baek,nelllpic,sjhwang82,jongpark}@kaist.ac.kr\nAbstract\nRetrieval-Augmented Large Language Models\n(LLMs), which incorporate the non-parametric\nknowledge from external knowledge bases into\nLLMs, have emerged as a promising approach\nto enhancing response accuracy in several tasks,\nsuch as Question-Answering (QA). However,\neven though there are various approaches deal-\ning with queries of different complexities, they\neither handle simple queries with unnecessary\ncomputational overhead or fail to adequately\naddress complex multi-step queries; yet, not\nall user requests fall into only one of the sim-\nple or com

#### Chunking the loaded documents

In [11]:
from langchain.text_splitter import RecursiveCharacterTextSplitter
splitter = RecursiveCharacterTextSplitter(chunk_size=500, chunk_overlap=50)
chunks = splitter.split_documents(data)
chunks

[Document(metadata={'source': 'C:/Q A/document/RAG.pdf', 'page': 0}, page_content='Adaptive-RAG: Learning to Adapt Retrieval-Augmented\nLarge Language Models through Question Complexity\nSoyeong Jeong1Jinheon Baek2Sukmin Cho1Sung Ju Hwang1,2Jong C. Park1*\nSchool of Computing1Graduate School of AI2\nKorea Advanced Institute of Science and Technology1,2\n{starsuzi,jinheon.baek,nelllpic,sjhwang82,jongpark}@kaist.ac.kr\nAbstract\nRetrieval-Augmented Large Language Models\n(LLMs), which incorporate the non-parametric\nknowledge from external knowledge bases into'),
 Document(metadata={'source': 'C:/Q A/document/RAG.pdf', 'page': 0}, page_content='knowledge from external knowledge bases into\nLLMs, have emerged as a promising approach\nto enhancing response accuracy in several tasks,\nsuch as Question-Answering (QA). However,\neven though there are various approaches deal-\ning with queries of different complexities, they\neither handle simple queries with unnecessary\ncomputational overhea

In [12]:
chunks[0].page_content

'Adaptive-RAG: Learning to Adapt Retrieval-Augmented\nLarge Language Models through Question Complexity\nSoyeong Jeong1Jinheon Baek2Sukmin Cho1Sung Ju Hwang1,2Jong C. Park1*\nSchool of Computing1Graduate School of AI2\nKorea Advanced Institute of Science and Technology1,2\n{starsuzi,jinheon.baek,nelllpic,sjhwang82,jongpark}@kaist.ac.kr\nAbstract\nRetrieval-Augmented Large Language Models\n(LLMs), which incorporate the non-parametric\nknowledge from external knowledge bases into'
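To see what `chunk_size` and `chunk_overlap` mean, here is a simplified sliding-window sketch. It is not the algorithm `RecursiveCharacterTextSplitter` uses: that splitter prefers to break on separators such as paragraphs and newlines, which is why the chunks above end on line boundaries. The function name `sliding_window_chunks` is hypothetical.

```python
def sliding_window_chunks(text: str, chunk_size: int = 500, chunk_overlap: int = 50) -> list[str]:
    """Cut `text` into fixed-size windows; consecutive windows share `chunk_overlap` characters.

    A simplification for illustration, not LangChain's splitting logic.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be in [0, chunk_size)")
    step = chunk_size - chunk_overlap  # how far the window advances each time
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # this window already reached the end of the text
    return chunks


# Synthetic 1200-character sample text (digits repeated), used only for the demo.
sample = "".join(str(i % 10) for i in range(1200))
pieces = sliding_window_chunks(sample)
```

The overlap keeps a sentence that straddles a boundary retrievable from at least one of the two neighbouring chunks.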

## Set up the Pinecone vectorstore 

In [13]:
os.environ["PINECONE_API_KEY"] = os.getenv("PINECONE_API_KEY")

In [14]:
index_name = "test1"

In [15]:
from langchain_pinecone import PineconeVectorStore
store = PineconeVectorStore.from_documents(
    chunks,
    index_name=index_name,
    embedding=embeddings
)

#### Testing the vectorstore by similarity search 

In [16]:
query = "what is the Retreival Augumented Generation?"
results = store.similarity_search(query,k=2)
results


[Document(id='bb699868-16e0-44fa-a217-bb62f6639c21', metadata={'page': 13.0, 'source': 'C:/Q A/document/RAG.pdf'}, page_content='threshold, and the answer generation follows.\n5) Adaptive-RAG. This is our model that adap-\ntively selects the retrieval-augmented generation\nstrategy, smoothly oscillating between the non-\nretrieval, single-step approach, and multi-step ap-\nproaches4without architectural changes, based on\nthe query complexity assessed by the classifier.\n6) Multi-step Approach. This approach (Trivedi\net al., 2023) is the multi-step retrieval-augmented\nLLM, which iteratively accesses both the retriever'),
 Document(id='98aba0ba-770b-4996-aa2e-b8be999963e5', metadata={'page': 13.0, 'source': 'C:/Q A/document/RAG.pdf'}, page_content='augmented generation models, we perform experi-\nments with a single run. Finally, we implemented\nmodels using PyTorch (Paszke et al., 2019) and\nTransformers library (Wolf et al., 2020).\nB Additional Experimental Results\nPerformance vs 

In [17]:
results[0].page_content

'threshold, and the answer generation follows.\n5) Adaptive-RAG. This is our model that adap-\ntively selects the retrieval-augmented generation\nstrategy, smoothly oscillating between the non-\nretrieval, single-step approach, and multi-step ap-\nproaches4without architectural changes, based on\nthe query complexity assessed by the classifier.\n6) Multi-step Approach. This approach (Trivedi\net al., 2023) is the multi-step retrieval-augmented\nLLM, which iteratively accesses both the retriever'

In [18]:
query1="what is Adaptive Retrieval?"
results1=store.similarity_search(query1,k=2)
results1

[Document(id='e5b53460-702b-4f55-a8db-99d2e0f938ea', metadata={'page': 2.0, 'source': 'C:/Q A/document/RAG.pdf'}, page_content='if the tokens within generated sentences have low\nconfidence. However, the aforementioned methods\noverlooked the fact that, in real-world scenarios,\nqueries are of a wide variety of complexities. There-\nfore, it would be largely inefficient to iteratively\naccess LLMs and retrievers for every query, which\nmight be simple enough with a single retrieval step\nor even only with an LLM itself.\nAdaptive Retrieval To handle queries of varying\ncomplexities, the adaptive retrieval strategy aims to'),
 Document(id='2e74a7df-81f5-4554-ba55-5c420a2cf3e2', metadata={'page': 2.0, 'source': 'C:/Q A/document/RAG.pdf'}, page_content='applies the same fixed operations to every query\nregardless of its complexity but also necessitates\nadditional specific training to LMs. Concurrent to\nour work, Asai et al. (2024) suggested training a so-\nphisticated model to dynamical

In [19]:
results1[0].page_content

'if the tokens within generated sentences have low\nconfidence. However, the aforementioned methods\noverlooked the fact that, in real-world scenarios,\nqueries are of a wide variety of complexities. There-\nfore, it would be largely inefficient to iteratively\naccess LLMs and retrievers for every query, which\nmight be simple enough with a single retrieval step\nor even only with an LLM itself.\nAdaptive Retrieval To handle queries of varying\ncomplexities, the adaptive retrieval strategy aims to'
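Conceptually, `similarity_search(query, k=2)` embeds the query and returns the `k` stored chunks whose vectors score highest against it. The sketch below mimics that ranking in memory. The texts and vectors are made up for illustration, and Pinecone itself uses an approximate nearest-neighbour index rather than this brute-force scan.

```python
import heapq
import math


def normalize(vector: list[float]) -> list[float]:
    """Scale a vector to unit length so a dot product equals cosine similarity."""
    norm = math.sqrt(sum(x * x for x in vector))
    if norm == 0.0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in vector]


def top_k(query_vector: list[float], indexed: list[tuple[str, list[float]]], k: int = 2):
    """Return the k best (score, text) pairs, highest score first.

    `indexed` holds (text, unit_vector) pairs; this is a brute-force scan.
    """
    q = normalize(query_vector)
    scored = ((sum(a * b for a, b in zip(q, vec)), text) for text, vec in indexed)
    return heapq.nlargest(k, scored, key=lambda pair: pair[0])


# Hypothetical toy index: 3-dimensional vectors instead of 768-dimensional embeddings.
toy_index = [
    ("adaptive retrieval", normalize([0.9, 0.1, 0.0])),
    ("multi-step approach", normalize([0.1, 0.9, 0.0])),
    ("experimental setup", normalize([0.0, 0.1, 0.9])),
]
hits = top_k([1.0, 0.2, 0.0], toy_index, k=2)
```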

#### Generating answers to queries from the vector store

In [21]:
from langchain.chains import RetrievalQA
from langchain_google_genai import ChatGoogleGenerativeAI
llm=ChatGoogleGenerativeAI(model="gemini-1.5-flash")

In [22]:
qa_chain = RetrievalQA.from_chain_type(llm=llm,chain_type="stuff", retriever=store.as_retriever())
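The `"stuff"` chain type places all retrieved chunks into a single prompt and sends that to the LLM in one call. A rough sketch of the idea follows. The template wording and the name `build_stuff_prompt` are hypothetical, and LangChain's own default prompt differs.

```python
def build_stuff_prompt(question: str, documents: list[str]) -> str:
    """Concatenate retrieved chunks and the question into one prompt string.

    Illustrative only: the wording is an assumption, not LangChain's template.
    """
    context = "\n\n".join(documents)  # every chunk is "stuffed" into the same prompt
    return (
        "Use the following context to answer the question. "
        "If the answer is not in the context, say you don't know.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}\n"
        "Answer:"
    )


# Placeholder chunk texts stand in for the documents returned by the retriever.
prompt = build_stuff_prompt(
    "what is Adaptive Retrieval?",
    ["chunk one text", "chunk two text"],
)
```

Because every chunk goes into one prompt, this approach only works while the retrieved text fits in the model's context window.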

In [23]:
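# Note: the variable below is misspelled (`qurey`), so `query` still holds the
# question from In [16]; the output therefore answers that earlier question.
qurey="what is Single-step Approach for QA"
qa_chain.invoke(query)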

{'query': 'what is the Retreival Augumented Generation?',
 'result': "Retrieval-Augmented Generation (RAG) is a technique that combines information retrieval with language models to improve the quality and accuracy of generated text. Here's a breakdown:\n\n1. **Information Retrieval:** RAG starts by searching a knowledge base or a large corpus of text for relevant information related to the user's query. This could involve using techniques like keyword search, semantic search, or passage retrieval.\n\n2. **Language Model:** The retrieved information is then fed into a language model (LM), such as GPT-3 or BERT. The LM uses its knowledge of language and its ability to generate text to create a coherent and informative answer based on the retrieved information.\n\nIn essence, RAG aims to bridge the gap between the vast information available in the world and the ability of language models to generate human-like text. It allows LMs to access and incorporate external knowledge, leading to m

In [24]:
# Chain.run is deprecated; invoke() returns a dict, and ["result"] is the answer string.
qa_chain.invoke(query1)["result"]

'Adaptive Retrieval is a strategy designed to handle queries of varying complexities. It aims to avoid applying the same fixed operations to every query, regardless of its complexity. This approach avoids the inefficiency of iteratively accessing LLMs and retrievers for every query, which might be simple enough with a single retrieval step or even only with an LLM itself. \n'

In [25]:
query2 = " what is Complex Multi-step Approach"

In [26]:
qa_chain.invoke(query2)["result"]

'The provided context doesn\'t give a specific definition of "Complex Multi-step Approach". It only mentions it as a method used in a study on retrieval-augmented question answering. \n\nTo understand what "Complex Multi-step Approach" refers to, you would need more context from the original research paper or study. \n'

In [28]:
query3 = " what is multi-step QA approach"

In [29]:
qa_chain.invoke(query3)["result"]

'A multi-step QA approach is a method for answering complex questions that involves the Language Model (LLM) interacting with a Retriever in several rounds.  Here\'s how it works:\n\n1. **Initial Query:** The LLM receives the initial question (q) and tries to understand it.\n2. **Retrieval:** The LLM uses the Retriever to find relevant documents or information from a knowledge base. \n3. **Refinement:** Based on the retrieved information, the LLM refines its understanding of the question. It may ask for additional information or rephrase the query.\n4. **Iterative Process:**  The LLM repeats steps 2 and 3, iteratively refining its understanding and retrieving more relevant information.\n5. **Final Answer:**  After multiple rounds, the LLM has accumulated enough information to formulate a final answer to the complex question.\n\n**Why is it useful?**\n\n* **Complex Questions:** Multi-step QA is particularly useful for complex questions that require synthesizing information from multiple