RAG Techniques are:

1. 𝐐𝐮𝐞𝐫𝐲 𝐄𝐱𝐩𝐚𝐧𝐬𝐢𝐨𝐧 𝐔𝐬𝐢𝐧𝐠 𝐭𝐡𝐞 𝐇𝐲𝐩𝐨𝐭𝐡𝐞𝐭𝐢𝐜𝐚𝐥 𝐀𝐧𝐬𝐰𝐞𝐫 𝐟𝐫𝐨𝐦 𝐋𝐋𝐌:

Get a hypothetical answer from LLM for query, combine both of them to get an accurate and closely related answers/documents from the vector database.

2. 𝐐𝐮𝐞𝐫𝐲 𝐄𝐱𝐩𝐚𝐧𝐬𝐢𝐨𝐧 𝐔𝐬𝐢𝐧𝐠 𝐭𝐡𝐞 𝐑𝐞𝐥𝐚𝐭𝐞𝐝 𝐐𝐮𝐞𝐫𝐢𝐞𝐬:

Generate the queries related to original query using LLM and then we can use these queries together to get more appropriate related documents from our vector database.

3. 𝐂𝐫𝐨𝐬𝐬 𝐄𝐧𝐜𝐨𝐝𝐞𝐫 𝐑𝐞-𝐑𝐚𝐧𝐤𝐢𝐧𝐠:

Using this technique we can score the retrieval of the documents according to the query that we have set. Basically Every document will have a score, that will show how much related it is to our query.


In [29]:
# !pip install langchain chromadb pypdf sentence-transformers bitsandbytes accelerate

# Importing Libraries and Functions:

In [30]:
# pip install -U langchain-community

In [None]:
import chromadb
from langchain.document_loaders import PyPDFLoader
from langchain.embeddings import OpenAIEmbeddings
from langchain.text_splitter import CharacterTextSplitter
from langchain.vectorstores import Chroma
from chromadb.utils import embedding_functions
from langchain.embeddings import SentenceTransformerEmbeddings
from langchain.prompts import PromptTemplate
from pypdf import PdfReader
import torch
import transformers
from transformers import AutoTokenizer, AutoModelForCausalLM, pipeline
from langchain.llms import HuggingFacePipeline
from langchain.document_loaders import TextLoader
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain.embeddings import HuggingFaceEmbeddings
from langchain.chains import RetrievalQA
from langchain.vectorstores import Chroma
from langchain_community.llms import LlamaCpp
from langchain.document_loaders import TextLoader
from langchain.llms import CTransformers
from torch import cuda, bfloat16
from langchain.text_splitter import RecursiveCharacterTextSplitter, SentenceTransformersTokenTextSplitter
from chromadb.utils.embedding_functions import SentenceTransformerEmbeddingFunction

# llama2 Model and Tokenizer:

In [4]:
model_id = '/kaggle/input/llama-2/pytorch/7b-chat-hf/1'

device = f'cuda:{cuda.current_device()}' if cuda.is_available() else 'cpu'

# set quantization configuration to load large model with less GPU memory
# this requires the `bitsandbytes` library
bnb_config = transformers.BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type='nf4',
    bnb_4bit_use_double_quant=True,
    bnb_4bit_compute_dtype=bfloat16
)

In [None]:
model_config = transformers.AutoConfig.from_pretrained(
    model_id,
)
model = transformers.AutoModelForCausalLM.from_pretrained(
    model_id,
    trust_remote_code=True,
    config=model_config,
    quantization_config=bnb_config,
    device_map='auto',
)
tokenizer = AutoTokenizer.from_pretrained(model_id)


# Function for Loading the Doucument:

We are removing whitespaces and other stuff becuase, we will not be able to ingest none type objects in vector database so we have to process the pdf before ingesting it in the vector database.

This function will return us the text from our Document after removing extra white spaces and empty strings/pages to prevent errors.:

In [6]:
def load_doc(path):
    reader = PdfReader(path)
    #We will iterate over each page 'p', extract the text and removing the whitespaces before and after sentences.
    pdf_texts = [p.extract_text().strip() for p in reader.pages]

    # Filter the empty strings-->pdf_texts will have only those pages that have the text
    pdf_texts = [text for text in pdf_texts if text]

    return pdf_texts

# Function for Dividing the Documents in Chunks:
We do chunking to ensure that our document is easily processed by LLM and chunks will be easily commited in the Vector Database.

This Function will return us the list of Chunks that are result of our Document Splitting.

Character Splitter is not enogh becuase the embedding model that we are using called SentenceTransformer has limited context window width and exactly uses 256 characters. That is the maximum context window width which is basically maximum legth of text used to understand the that text.

In [7]:
def chunks(path_of_doc):

    #we will use the path of doc to pass it to load_doc
    pdf_texts= load_doc(path_of_doc)
    character_splitter = RecursiveCharacterTextSplitter(
    #RecursiveCharacterTextSplitter will split the Document into chunks firstly when it will find the double line
        #After that if the splitted chunks are greater than size of 1000 which is our chunk size they will get split on single line
            #Even if Chunks have larger size than 1000 they will get split on ". "
        separators=["\n\n", "\n", ". ", " ", ""],
        #every chunk will have 2000 characters
        chunk_size=2000,
        chunk_overlap=20
    )
    #further splitting according to embedding model we are using
    character_split_texts = character_splitter.split_text('\n\n'.join(pdf_texts))
    #so lets split it according to our embedding model
                                                                            #we want every token to have 256 characters
    token_splitter = SentenceTransformersTokenTextSplitter(chunk_overlap=0, tokens_per_chunk=256)
    token_split_texts = []
    for text in character_split_texts:
    #splitting the text and storing in list
        token_split_texts += token_splitter.split_text(text)

    return token_split_texts

# Function for Embeddings:

We will convert our textual data/chunks to numbers or vectors to store them in vector database.
This Function will return us a method to Embedd our textual chunks to numbers.

In [8]:
def embedding_function():
    embedding_function = SentenceTransformerEmbeddingFunction()
    return embedding_function

# Function for Doc/Chunks Ingestion in the Vector Database:
This function will return us the collection of Vector database that will have our chunks vector stored according to semantic meaning and we will able to query this database/collection to get answer from our document that we ingested.

In [9]:
def doc_ingestion_in_vecdb(path_of_doc):
    #we will pass the path of doc to chunks that will further pass it to doc_load for loading it
    token_split_texts= chunks(path_of_doc)
    embedding_func= embedding_function()

    #making chromadb client object good for testing only not for production purpose
    chroma_client = chromadb.Client()

    #making the collection of chroma database
    chroma_collection = chroma_client.create_collection("Doc-Collection", embedding_function=embedding_func)

    #ids of each chunk
    ids = [str(i) for i in range(len(token_split_texts))]

    chroma_collection.add(ids=ids, documents=token_split_texts)
    return chroma_collection

# Function for Retrieving Result from Vector DB:
After ingestion we can query the database directly and it will return us most close results according to semantic meaning.

This Function will return us top 5 results that matches with our query.

In [10]:
def retrieve_doc(query, chroma_collection):
    #we will pass the document path to doc_ingestion_in_vecdb() so that can ingest it in vector db after chunking and loading
    # chroma_collection= doc_ingestion_in_vecdb(path_to_doc)

    #lets get the 5 relevant results
    results = chroma_collection.query(query_texts=[query], n_results=5)

    #[0] means give the result of the first query, right now we have only 1 query
    retrieved_documents = results['documents'][0]

    return retrieved_documents

# Rag Function:

In [11]:
def rag(query, retrieved_documents, model, tokenizer):

    #we will join the retrieved documents (that are the relavant documents) into one variable
    messages = f"""
          You are a expert of Decoding Machine Learning concepts.
          Your users are asking questions about information contained in an MACHINE LEARNING BOOK."
          You will be shown the user's question, and the relevant information from the book.
          Answer the user's question using only this relevant information. Try to build a good answer uing this information.

          Give Precise answer according to User's Instruction, like if user wants explaination then provide explaination.
          If User want an answer in one or two lines then give answer accordingly.

        User's Question: {query}. \n Information from the book of Machine Learning concepts: {retrieved_documents}"
    """
    pipe = pipeline("text-generation",
                        model=model,
                        tokenizer=tokenizer)

    llm = HuggingFacePipeline(pipeline=pipe)
    # checking again that our model is working fine--->Asking LLM model the same question we asked our document
    content= llm(prompt=messages)

    return content


# Advance RAG Techniques:
## 1. Query Expansion Using the Hypothetical Answer from LLM:
So Basically instead of directly using our query to get the documents from vector database, firstly we will pass our query to LLM to get a hypothetical answer (that will help us to get answers like that from our document) from the LLM and then we will combine that hypothetical answer with our query to get an accurate and closely related answers/documents from the vector database.



In [12]:
def hypothetical_answer(query, model, tokenizer):
    messages = f"""
        You are a helpful expert Machine Learning concepts.
        Provide an example answer to the given question,

        The Question about which you have to give example answer is: {query}
    """
    pipe = pipeline("text-generation",
                    model=model,
                    tokenizer=tokenizer)

    llm = HuggingFacePipeline(pipeline=pipe)
    # checking again that our model is working fine--->Asking LLM model the same question we asked our document
    content= llm(prompt=query)

    #joining the original query and hypothetical answer
    joint_query = f"{query} {content}"

    return joint_query

In [13]:
original_query = "what is Linear Regression?"
joint_query = hypothetical_answer(original_query, model, tokenizer)
print(joint_query)

  warn_deprecated(
  warn_deprecated(


what is Linear Regression? what is Linear Regression?
 everybody knows that Linear Regression is a statistical method used to establish a relationship between two or more variables. But do you know the history of Linear Regression?

The history of linear regression can be traced back to the early 19th century when Karl Friedrich Gauss, a German mathematician and astronomer, first introduced the concept of linear regression. Gauss observed that the relationship between the height of a person and their weight could be modeled using a linear equation, and he developed a method for estimating the parameters of such a model.

In the early 20th century, the concept of linear regression was further developed by other statisticians, including R.A. Fisher and S.S. Wilks. Fisher, in particular, made significant contributions to the field of linear regression, including the development of statistical methods for analyzing the results of experiments and the introduction of the concept of least squ

**Now if you look above query, it has alot of hypothetical answer that can bring very accurate inforamtion about the query from the document, beacuase vector database will return answer similar to above query.**

### Making Vector DB Collection of our Doc:

In [None]:
path_to_doc= '/kaggle/input/machine-learning/Aurelien-Geron-Hands-On-Machine-Learning-with-Scikit-Learn-Keras-and-Tensorflow_-Concepts-Tools-and-Techniques-to-Build-Intelligent-Systems-OReilly-Media-2019.pdf'
chroma_collection= doc_ingestion_in_vecdb(path_to_doc)

# Getting Relative Docs and Passing them to RAG:
We will get the relative documents using the joint query.

In [15]:
retrieved_documents= retrieve_doc(joint_query, chroma_collection)
retrieved_text= ""
for doc in retrieved_documents:
  retrieved_text+=doc

Batches:   0%|          | 0/1 [00:00<?, ?it/s]

**Now Lets get Answer from Our Doucments about this Query:**

In [16]:
retrieved_text

'them, and inverse them, and what partial derivatives are. if you are unfamiliar with these concepts, please go through the linear algebra and calculus introductory tutorials avail ‐ able as jupyter notebooks in the online supplemental material. for those who are truly allergic to mathematics, you should still go through this chapter and simply skip the equations ; hopefully, the text will be sufficient to help you understand most of the concepts. linear regression in chapter 1, we looked at a simple regression model of life satisfaction : life _ satisfac ‐ tion = θ0 + θ1 × gdp _ per _ capita. this model is just a linear function of the input feature gdp _ per _ capita. θ0 and θ1 are the model ’ s parameters. more generally, a linear model makes a prediction by simply computing a weighted1fun fact : this odd - sounding name is a statistics term introduced by francis galton while he was studying the fact that the children of tall people tend to be shorter than their parents. since child

In [17]:
#we will join the retrieved documents (that are the relavant documents) into one variable
messages = f"""
  You are a expert of Finance.
  Your users are asking questions about information contained in an book of Machine Learning."
  You will be shown the user's question, and the relevant information from the book.
  Answer the user's question using only this relevant information. Try to build a good answer uing this information.

  Give Precise answer according to User's Instruction, like if user wants explaination then provide explaination.
  If User want an answer in one or two lines then give answer accordingly.

User's Question: {original_query}. \n Information from Machine Learning book: {retrieved_text}"
"""
pipe = pipeline("text-generation",
                model=model,
                tokenizer=tokenizer)

llm = HuggingFacePipeline(pipeline=pipe)
# checking again that our model is working fine--->Asking LLM model the same question we asked our document
content= llm(prompt=messages)

In [18]:
print(content)


  You are a expert of Finance.
  Your users are asking questions about information contained in an book of Machine Learning."
  You will be shown the user's question, and the relevant information from the book.
  Answer the user's question using only this relevant information. Try to build a good answer uing this information.

  Give Precise answer according to User's Instruction, like if user wants explaination then provide explaination.
  If User want an answer in one or two lines then give answer accordingly.

User's Question: what is Linear Regression?. 
 Information from Machine Learning book: them, and inverse them, and what partial derivatives are. if you are unfamiliar with these concepts, please go through the linear algebra and calculus introductory tutorials avail ‐ able as jupyter notebooks in the online supplemental material. for those who are truly allergic to mathematics, you should still go through this chapter and simply skip the equations ; hopefully, the text will b

## 2. Query Expansion Using the Related Queries:
Now in this technique basically we generate the queries related with our original queries and then we can use these queries to get more appropriate related documents from our vector database.


In [19]:
def generate_multiple_queries(query, model, tokenizer):
    messages = f"""
        You are a expert of Machine Learning.
        Tour Users are asking question related to machine learning concepts.
        Suggest up to five additional related questions to help them find the information they need, for the provided question.
        Suggest only short questions without compound sentences. Suggest a variety of questions that cover different aspects of the topic.
        Make sure they are complete questions, and that they are related to the original question.
        Output one question per line. Do not number the questions.

        The question about which you have to generate the question is {query}"""

    pipe = pipeline("text-generation",
                    model=model,
                    tokenizer=tokenizer)

    llm = HuggingFacePipeline(pipeline=pipe)
    content= llm(prompt=messages)
    content = content.split("\n")
    return content

In [20]:
original_query = "What is Linear Regression?"
augmented_queries = generate_multiple_queries(original_query, model, tokenizer)

for query in augmented_queries:
    print(query)


        You are a expert of Machine Learning.
        Tour Users are asking question related to machine learning concepts.
        Suggest up to five additional related questions to help them find the information they need, for the provided question.
        Suggest only short questions without compound sentences. Suggest a variety of questions that cover different aspects of the topic.
        Make sure they are complete questions, and that they are related to the original question.
        Output one question per line. Do not number the questions.

        The question about which you have to generate the question is What is Linear Regression?

        User: What is Linear Regression?

        Question 1: Can you explain the difference between simple linear regression and multiple linear regression?

        Question 2: How do you choose the best hyperparameters for linear regression?

        Question 3: What are some common applications of linear regression in machine learning?

 

#### Combining Original and Generated Queries and Retrieving the Result:

In [21]:
queries = [original_query] + augmented_queries
retrieved_documents= retrieve_doc(queries, chroma_collection)
retrieved_text= ""
for doc in retrieved_documents:
  retrieved_text+=doc

Batches:   0%|          | 0/1 [00:00<?, ?it/s]

In [22]:
retrieved_text

'sum of the input features, plus a constant called the bias term ( also called the intercept term ), as shown in equation 4 - 1. equation 4 - 1. linear regression model prediction y = θ0 + θ1x1 + θ2x2 + [UNK] + θnxn • y is the predicted value. 114 | chapter 4 : training modelsthem, and inverse them, and what partial derivatives are. if you are unfamiliar with these concepts, please go through the linear algebra and calculus introductory tutorials avail ‐ able as jupyter notebooks in the online supplemental material. for those who are truly allergic to mathematics, you should still go through this chapter and simply skip the equations ; hopefully, the text will be sufficient to help you understand most of the concepts. linear regression in chapter 1, we looked at a simple regression model of life satisfaction : life _ satisfac ‐ tion = θ0 + θ1 × gdp _ per _ capita. this model is just a linear function of the input feature gdp _ per _ capita. θ0 and θ1 are the model ’ s parameters. more 

#### Using Rag to Answer query from Retrieved Documents:
I wrote the below code again and again becuase somehow kaggle was not able to call rag() function that had this implementation :(

In [23]:
#we will join the retrieved documents (that are the relavant documents) into one variable
messages = f"""
  You are a expert of Machine Learning.
  Your users are asking questions about information contained in an Machine Learning concepts."
  You will be shown the user's question, and the relevant information from the book.
  Answer the user's question using only this relevant information. Try to build a good answer uing this information.

  Give Precise answer according to User's Instruction, like if user wants explaination then provide explaination.
  If User want an answer in one or two lines then give answer accordingly.

User's Question: {original_query}. \n Information from the Machine learning book: {retrieved_text}"
"""
pipe = pipeline("text-generation",
                model=model,
                tokenizer=tokenizer)

llm = HuggingFacePipeline(pipeline=pipe)
# checking again that our model is working fine--->Asking LLM model the same question we asked our document
content= llm(prompt=messages)

In [24]:
print(content)


  You are a expert of Machine Learning.
  Your users are asking questions about information contained in an Machine Learning concepts."
  You will be shown the user's question, and the relevant information from the book.
  Answer the user's question using only this relevant information. Try to build a good answer uing this information.

  Give Precise answer according to User's Instruction, like if user wants explaination then provide explaination.
  If User want an answer in one or two lines then give answer accordingly.

User's Question: What is Linear Regression?. 
 Information from the Machine learning book: sum of the input features, plus a constant called the bias term ( also called the intercept term ), as shown in equation 4 - 1. equation 4 - 1. linear regression model prediction y = θ0 + θ1x1 + θ2x2 + [UNK] + θnxn • y is the predicted value. 114 | chapter 4 : training modelsthem, and inverse them, and what partial derivatives are. if you are unfamiliar with these concepts, pl

**Result above looks Pretty Great :)**

## 2. Cross Encoder Re-Ranking:
Using this technique we can score the retrieval of the documents according to the query that we have set. Basically Every doccument will have a score, that will show how much related it is to our query.

**Sounds Interesting?, lets see what is inside:**

So basically main thing use in it is a Cross Encoder that is is present in the Sentence Transformer, When converting text to embeddings its process text together as single input, this allows model to directly compare and contrast the input, understanding their relation in the better way.

In Cross encoder we get score from 0 to 1, document having score close to 1 is most related query. We will give query and each retrieved document one by one to get the score of each document.

In [25]:
from sentence_transformers import CrossEncoder
cross_encoder = CrossEncoder('cross-encoder/ms-marco-MiniLM-L-6-v2')

config.json:   0%|          | 0.00/794 [00:00<?, ?B/s]

pytorch_model.bin:   0%|          | 0.00/90.9M [00:00<?, ?B/s]

  return self.fget.__get__(instance, owner)()


tokenizer_config.json:   0%|          | 0.00/316 [00:00<?, ?B/s]

vocab.txt:   0%|          | 0.00/232k [00:00<?, ?B/s]

special_tokens_map.json:   0%|          | 0.00/112 [00:00<?, ?B/s]

#### Making Query-Doc Pairs for Comparison and Predicting the Score:

In [26]:
query= 'What is Logistic Regression?'
retrieved_documents= retrieve_doc(query, chroma_collection)
retrieved_text= ""
for doc in retrieved_documents:
  # retrieved_text+=doc
  print(doc)

Batches:   0%|          | 0/1 [00:00<?, ?it/s]

figure 4 - 21. logistic function once the logistic regression model has estimated the probability p = hθ ( x ) that an instance x belongs to the positive class, it can make its prediction y easily ( see equa ‐ tion 4 - 15 ). equation 4 - 15. logistic regression model prediction y = 0 if p < 0. 5 1 if p≥ 0. 5 notice that σ ( t ) < 0. 5 when t < 0, and σ ( t ) ≥ 0. 5 when t ≥ 0, so a logistic regression model predicts 1 if xt θ is positive, and 0 if it is negative. the score t is often called the logit : this name comes from the fact that the logit function, defined as logit ( p ) = log ( p / ( 1 - p ) ), is the inverse of the logistic function. indeed, if you compute the logit of the estimated probability p, you will find that the result is t. the logit is also called the log - odds, since it is the log of the ratio between the estimated probability for the positive class and the estimated probability for the negative class. training and cost function good, now you know how a logistic r

In [27]:
pairs = [[query, doc] for doc in retrieved_documents]
scores = cross_encoder.predict(pairs)
print("Scores:")
for score in scores:
    print(score)

Batches:   0%|          | 0/1 [00:00<?, ?it/s]

Scores:
2.9334393
2.982427
-2.4950178
-3.9252589
-0.4271434


**As you can see 4th result, 3rd, and 2nd results are matching more because they have high score**

Lets Check the Highest and Lowest scored results:

In [28]:
retrieved_documents[3]

'1fun fact : this odd - sounding name is a statistics term introduced by francis galton while he was studying the fact that the children of tall people tend to be shorter than their parents. since children were shorter, he called this regression to the mean. this name was then applied to the methods he used to analyze correlations between variables. a typical supervised learning task is classification. the spam filter is a good example of this : it is trained with many example emails along with their class ( spam or ham ), and it must learn how to classify new emails. another typical task is to predict a target numeric value, such as the price of a car, given a set of features ( mileage, age, brand, etc. ) called predictors. this sort of task is called regression ( figure 1 - 6 ). 1 to train the system, you need to give it many examples of cars, including both their predictors and their labels ( i. e., their prices ). in machine learning an attribute is a data type ( e. g., “ mileage ”

In [None]:
retrieved_documents[0]