Project Flow

* Load documents
* Chunk documents
* Clean up irrelevant text from the documents
* Create a vector database from the documents
* Create a retriever
* Create a query engine from the retriever and the knowledge base
* Write a prompt with context from the query engine and the user query
* Load the pretrained model and embeddings
* Pass the prompt to the pretrained model

In [1]:
pip install llama_index==0.10.19 llama_index_core==0.10.19 torch llama-index-embeddings-huggingface peft optimum bitsandbytes

Note: you may need to restart the kernel to use updated packages.




In [2]:
from llama_index.embeddings.huggingface import HuggingFaceEmbedding 
from llama_index.core import Settings, SimpleDirectoryReader, VectorStoreIndex 
from llama_index.core.retrievers import VectorIndexRetriever 
from llama_index.core.query_engine import RetrieverQueryEngine 
from llama_index.core.postprocessor import SimilarityPostprocessor 
from llama_index.readers.file import PDFReader 
from transformers import AutoModelForCausalLM, AutoTokenizer  
import tempfile 
import os 



In [3]:
# Settings globally configures the resources (embedding model, LLM, chunking) used by the pipeline
Settings.embed_model= HuggingFaceEmbedding(model_name= "BAAI/bge-small-en-v1.5")
Settings.llm = None
Settings.chunk_size = 256
Settings.chunk_overlap = 15

LLM is explicitly disabled. Using MockLLM.
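
To make the effect of `chunk_size = 256` and `chunk_overlap = 15` concrete, here is a minimal sketch of fixed-window chunking with overlap. The `chunk_tokens` helper is hypothetical and is not the splitter LlamaIndex uses; the library's own splitter also respects sentence boundaries, so real chunk boundaries will differ.

```python
def chunk_tokens(tokens, chunk_size, chunk_overlap):
    """Split a token list into windows of `chunk_size` that share `chunk_overlap` tokens."""
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    step = chunk_size - chunk_overlap  # how far each window advances
    chunks = []
    for start in range(0, len(tokens), step):
        chunks.append(tokens[start:start + chunk_size])
        if start + chunk_size >= len(tokens):
            break  # the last window already reached the end of the input
    return chunks


tokens = list(range(600))  # stand-in for 600 token ids
chunks = chunk_tokens(tokens, chunk_size=256, chunk_overlap=15)
print([len(chunk) for chunk in chunks])
```

Each chunk starts with the last 15 tokens of the previous one, so a sentence that straddles a boundary is still seen whole in at least one chunk.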


In [4]:
documents = SimpleDirectoryReader("Content").load_data()

print(len(documents))

# Drop empty documents; build a new list instead of removing items while iterating
documents = [doc for doc in documents if len(doc.text) > 0]


print(len(documents))

48
48
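
When dropping empty documents, note that calling `remove()` on a list while looping over that same list can skip elements, because later items shift left under the loop. A small sketch with plain strings standing in for document texts (the values are made up for illustration):

```python
docs = ["", "", "intro", "", "body"]

# Pitfall: removing during iteration shifts later items left, so some are never visited.
buggy = list(docs)
for text in buggy:
    if len(text) == 0:
        buggy.remove(text)
print(buggy)  # an empty string survives

# Safer: build a new list that keeps only the non-empty items.
cleaned = [text for text in docs if len(text) > 0]
print(cleaned)
```

Both counts printed above are 48, so this corpus had no empty documents, but the filter form stays correct when some are present.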


# Creating Vector Store

In [5]:
index = VectorStoreIndex.from_documents(documents)

# Number of chunks to retrieve per query
top_k = 2

# Configure the retriever
retriver = VectorIndexRetriever(
    index = index,
    similarity_top_k= top_k
)

# The retriever fetches the chunks most similar to the query from the index created above.

In [6]:
# Assembling the query engine
query_engine = RetrieverQueryEngine(
    retriever= retriver,
    node_postprocessors= [SimilarityPostprocessor(similarity_cutoff=0.5)]
)

# We apply a similarity cutoff.
# The retriever first returns the top 2 most similar chunks; the postprocessor then drops any of them whose similarity score is below 0.5, so the query engine may receive fewer than 2 chunks.
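
The order of operations matters here: top-k selection happens first, and the cutoff is applied to whatever was selected. A minimal sketch of that two-stage logic with toy 2-D vectors; the `cosine` and `retrieve` helpers and the vectors are hypothetical illustrations, not LlamaIndex internals.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def retrieve(query_vec, chunk_vecs, top_k, cutoff):
    """Take the top_k most similar chunks, then drop those scoring below cutoff."""
    scored = [(cosine(query_vec, vec), name) for name, vec in chunk_vecs.items()]
    scored.sort(reverse=True)   # highest similarity first
    top = scored[:top_k]        # stage 1: top-k selection
    return [(name, score) for score, name in top if score >= cutoff]  # stage 2: cutoff


chunk_vecs = {
    "chunk_a": [1.0, 0.0],
    "chunk_b": [1.0, 3.0],
    "chunk_c": [0.0, 1.0],
}
hits = retrieve([1.0, 0.0], chunk_vecs, top_k=2, cutoff=0.5)
print(hits)
```

Only `chunk_a` survives in this toy case, even though `top_k` is 2. Code that consumes the results should therefore loop over what comes back rather than assume exactly `top_k` items.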

In [7]:
query = "What's all this text about ?"

response = query_engine.query(query)
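print(response)
print(type(response))


# Build the context string from the chunks that passed the similarity cutoff

context = "Context: \n"

# Iterate over the returned nodes; fewer than top_k may remain after the cutoff
for node in response.source_nodes:
    context = context + node.text + "\n\n"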

print(context)

Context information is below.
---------------------
page_label: 30
file_path: c:\Users\pavan\Desktop\Capestone_projects\RAG_Chatbot\env\Scripts\Content\2308.12950v3.pdf

The prompt templates are shown in 14. We prompt the model to wrap
the final code answer inside of triple single quotes, which makes it easier to extract the answer. We use a
special instruction to help models understand the specific question format: “read from and write to standard
IO” for standard questions and “use the provided function signature” for call-based questions, which we insert
into our prompt as the question guidance. Despite not finetuned on the training data nor provided with few
30

page_label: 22
file_path: c:\Users\pavan\Desktop\Capestone_projects\RAG_Chatbot\env\Scripts\Content\2308.12950v3.pdf

Suriya Gunasekar, Yi Zhang, Jyoti Aneja, Caio César Teodoro Mendes, Allie Del Giorno, Sivakanth Gopi,
MojanJavaheripi, PieroKauffmann, GustavodeRosa, OlliSaarikivi, AdilSalim, ShitalShah, HarkiratSingh
Behl,

In [8]:
# Load model

model_name = "Qwen/Qwen2.5-0.5B-Instruct"

model = AutoModelForCausalLM.from_pretrained(
    model_name,
    trust_remote_code = False,
    revision = "main",
    # device_map = 'cuda:0'  # uncomment to run on the GPU
)

# Load Tokenizer
tokenizer = AutoTokenizer.from_pretrained(model_name, use_fast = True)

In [9]:
prompt_template_w_cotext = lambda context,query : f"""you are an AI assistant tasked with answering question based on the provided PDF content. 
Please analyze the following excerpt from the PDF and answer the question. 
PDF content: 
{context} 

Question: {query} 

Instructions: 
- Answer only based on the information provided in the PDF content above. 
- If the Answer cannot be found in the provided content, say "I cannot find the answer to the question and provide a pdf documents" 
- Be concise and specific. 
- Include relevant quotes or references from the PDF when applicable 

Answer:"""  
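
To see the shape of the string the model receives, here is a minimal sketch of the same idea as a named function with a shortened template. The `build_prompt` helper and the placeholder context are hypothetical and are not part of the notebook's pipeline.

```python
def build_prompt(context: str, query: str) -> str:
    """Hypothetical helper: place the retrieved context before the question."""
    return (
        "PDF content:\n"
        f"{context}\n\n"
        f"Question: {query}\n\n"
        "Answer:"
    )


# Placeholder text stands in for the chunks returned by the query engine.
prompt = build_prompt("Context: \nplaceholder chunk text\n\n", "What is the text about?")
print(prompt)
```

Ending the prompt with `Answer:` cues a causal language model to continue with the answer rather than with more instructions.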

In [10]:
query = "What is the text about?"

prompt = prompt_template_w_cotext(context,query)

# Tokenize the prompt; the result holds both input_ids and attention_mask
inputs = tokenizer(prompt, return_tensors= 'pt')

# Pass the attention mask together with the input ids so generation does not have to infer it
outputs = model.generate(**inputs, max_new_tokens = 280 )

print(tokenizer.batch_decode(outputs)[0])


you are an AI assistant tasked with answering question based on the provided PDF content. 
Please analyze the following excerpt from the PDF and answer the question. 
PDF content: 
Context: 
The prompt templates are shown in 14. We prompt the model to wrap
the final code answer inside of triple single quotes, which makes it easier to extract the answer. We use a
special instruction to help models understand the specific question format: “read from and write to standard
IO” for standard questions and “use the provided function signature” for call-based questions, which we insert
into our prompt as the question guidance. Despite not finetuned on the training data nor provided with few
30

Suriya Gunasekar, Yi Zhang, Jyoti Aneja, Caio César Teodoro Mendes, Allie Del Giorno, Sivakanth Gopi,
MojanJavaheripi, PieroKauffmann, GustavodeRosa, OlliSaarikivi, AdilSalim, ShitalShah, HarkiratSingh
Behl, Xin Wang, Sébastien Bubeck, Ronen Eldan, Adam Tauman Kalai, Yin Tat Lee, and Yuanzhi Li.
Textboo

The context explains that the prompt templates are used to format the code answers. They ensure readability and make it easy to extract the answer. Additionally, there's a special instruction that helps models understand the specific question format: "read from and write to standard IO" for standard questions and "use the provided function signature" for call-based questions, which are inserted into the prompt as the question guidance. Despite not being fine-tuned on the training data or providing many examples, this PDF document serves as a reference guide for understanding the context and its implications in the field of machine learning. (https://www.cs.toronto.edu/~kriz/cifar.html) I cannot find the answer to the question and provide a pdf documents. Based on the provided PDF content, the text about the advancements in machine learning techniques is:

The text discusses the advancements in machine learning techniques such as deep neural networks and reinforcement learning that have revolutionized various fields including healthcare, finance, and transportation. It highlights the importance of continuous improvement in these areas by incorporating new methods and algorithms. The PDF does not contain any specific instructions or details about textbooks needed for the text 
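
The printed text repeats the whole prompt because, for a causal language model, `generate` returns the prompt token ids followed by the newly generated ids. A minimal sketch of keeping only the new part, using plain lists of toy ids; the `strip_prompt` helper is hypothetical. With real tensors, the corresponding step would be slicing `outputs[0]` from `inputs["input_ids"].shape[1]` onward before decoding.

```python
def strip_prompt(output_ids, prompt_length):
    """Keep only the ids generated after the prompt."""
    return output_ids[prompt_length:]


prompt_ids = [101, 7, 8, 9]         # toy ids standing in for the tokenized prompt
output_ids = prompt_ids + [42, 43]  # causal LM output: prompt ids, then new ids
new_ids = strip_prompt(output_ids, len(prompt_ids))
print(new_ids)
```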

In [12]:
query = "What is long context-finetuning?"

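# Note: `context` still holds the chunks retrieved for the first query.
# Re-run query_engine.query(query) and rebuild `context` to ground this answer in relevant text.
prompt = prompt_template_w_cotext(context,query)

inputs = tokenizer(prompt, return_tensors= 'pt')

# Pass the attention mask together with the input ids
outputs = model.generate(**inputs, max_new_tokens = 280 )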

print(tokenizer.batch_decode(outputs)[0])

you are an AI assistant tasked with answering question based on the provided PDF content. 
Please analyze the following excerpt from the PDF and answer the question. 
PDF content: 
Context: 
The prompt templates are shown in 14. We prompt the model to wrap
the final code answer inside of triple single quotes, which makes it easier to extract the answer. We use a
special instruction to help models understand the specific question format: “read from and write to standard
IO” for standard questions and “use the provided function signature” for call-based questions, which we insert
into our prompt as the question guidance. Despite not finetuned on the training data nor provided with few
30

Suriya Gunasekar, Yi Zhang, Jyoti Aneja, Caio César Teodoro Mendes, Allie Del Giorno, Sivakanth Gopi,
MojanJavaheripi, PieroKauffmann, GustavodeRosa, OlliSaarikivi, AdilSalim, ShitalShah, HarkiratSingh
Behl, Xin Wang, Sébastien Bubeck, Ronen Eldan, Adam Tauman Kalai, Yin Tat Lee, and Yuanzhi Li.
Textboo

Long context-finetuning refers to the process of fine-tuning a pre-trained model on a large-scale language dataset, such as the Large Pre-training Challenge (LPC) or the Large Language Model Challenge (LLMC). It involves adding more context to the existing model's parameters during the training phase, allowing the model to better understand and generate human-like text that is consistent with the given context. The PDF does not explicitly define what this means, but it provides relevant information about the LPC challenge. Therefore, I cannot find the answer to the question and will respond with "I cannot find the answer to the question and provide a pdf documents". However, I can summarize the key points mentioned in the PDF content related to long context-finetuning:
- Fine-tuning is the process of using a pre-trained model to improve its performance on a new task.
- Long context-finet