RAG System on “Leave No Context Behind” Paper

In [9]:
from langchain_core.messages import HumanMessage, AIMessage, SystemMessage
from langchain_core.prompts import ChatPromptTemplate, SystemMessagePromptTemplate, HumanMessagePromptTemplate

chat_template = ChatPromptTemplate.from_messages([
    # System Message Prompt Template
    SystemMessage(content="""You are a Helpful AI Bot. 
    You take the question from user and answer if you have the specific information related to the question. """),
    # Human Message Prompt Template
    HumanMessagePromptTemplate.from_template("""Answer the following question: {question}
    Answer: """)
])

chat_template

ChatPromptTemplate(input_variables=['question'], messages=[SystemMessage(content='You are a Helpful AI Bot. \n    You take the question from user and answer if you have the specific information related to the question. '), HumanMessagePromptTemplate(prompt=PromptTemplate(input_variables=['question'], template='Answer the following question: {question}\n    Answer: '))])

In [10]:
import os

from langchain_google_genai import ChatGoogleGenerativeAI

# Read the API key from the environment instead of hard-coding it in the notebook
chat_model = ChatGoogleGenerativeAI(google_api_key=os.environ["GOOGLE_API_KEY"],
                                   model="gemini-1.5-pro-latest")

In [11]:
from langchain_core.output_parsers import StrOutputParser

output_parser = StrOutputParser()

In [12]:
chain = chat_template | chat_model | output_parser

In [13]:
pip install pypdf


[1m[[0m[34;49mnotice[0m[1;39;49m][0m[39;49m A new release of pip is available: [0m[31;49m23.2.1[0m[39;49m -> [0m[32;49m24.0[0m
[1m[[0m[34;49mnotice[0m[1;39;49m][0m[39;49m To update, run: [0m[32;49mpip3 install --upgrade pip[0m
Note: you may need to restart the kernel to use updated packages.


In [14]:
from langchain_community.document_loaders import PyPDFLoader

loader = PyPDFLoader("Rag_Leave_no_context_behind_research_paper.pdf")
pages = loader.load_and_split()

In [15]:
pages

[Document(page_content='Preprint. Under review.\nLeave No Context Behind:\nEfficient Infinite Context Transformers with Infini-attention\nTsendsuren Munkhdalai, Manaal Faruqui and Siddharth Gopal\nGoogle\ntsendsuren@google.com\nAbstract\nThis work introduces an efficient method to scale Transformer-based Large\nLanguage Models (LLMs) to infinitely long inputs with bounded memory\nand computation. A key component in our proposed approach is a new at-\ntention technique dubbed Infini-attention. The Infini-attention incorporates\na compressive memory into the vanilla attention mechanism and builds\nin both masked local attention and long-term linear attention mechanisms\nin a single Transformer block. We demonstrate the effectiveness of our\napproach on long-context language modeling benchmarks, 1M sequence\nlength passkey context block retrieval and 500K length book summarization\ntasks with 1B and 8B LLMs. Our approach introduces minimal bounded\nmemory parameters and enables fast strea

In [16]:
pip install nltk


[1m[[0m[34;49mnotice[0m[1;39;49m][0m[39;49m A new release of pip is available: [0m[31;49m23.2.1[0m[39;49m -> [0m[32;49m24.0[0m
[1m[[0m[34;49mnotice[0m[1;39;49m][0m[39;49m To update, run: [0m[32;49mpip3 install --upgrade pip[0m
Note: you may need to restart the kernel to use updated packages.


In [17]:
# Split the document into chunks

from langchain_text_splitters import NLTKTextSplitter

text_splitter = NLTKTextSplitter(chunk_size=500, chunk_overlap=100)

chunks = text_splitter.split_documents(pages)

print(len(chunks))

print(type(chunks[0]))

Created a chunk of size 568, which is longer than the specified 500
Created a chunk of size 506, which is longer than the specified 500
Created a chunk of size 633, which is longer than the specified 500


110
<class 'langchain_core.documents.base.Document'>
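The "Created a chunk of size ..." warnings above appear when a single sentence is already longer than chunk_size: a sentence-based splitter merges whole sentences and cannot cut inside one. The following is a simplified sketch of that greedy merge, not LangChain's actual implementation. The function name merge_sentences and the toy sentences are hypothetical, and sizes are counted in characters.

```python
def merge_sentences(sentences, chunk_size, chunk_overlap, sep="\n\n"):
    """Greedily pack whole sentences into chunks of at most chunk_size characters.

    Trailing sentences of a finished chunk are carried into the next chunk
    as long as they fit in the chunk_overlap budget.
    """
    chunks, current = [], []

    def joined_len(parts):
        return len(sep.join(parts))

    for sentence in sentences:
        if current and joined_len(current + [sentence]) > chunk_size:
            # The current chunk is full: emit it.
            chunks.append(sep.join(current))
            # Drop sentences from the front until the carried-over tail fits
            # the overlap budget and leaves room for the new sentence.
            while current and (
                joined_len(current) > chunk_overlap
                or joined_len(current + [sentence]) > chunk_size
            ):
                current.pop(0)
        current.append(sentence)
    if current:
        chunks.append(sep.join(current))
    return chunks


# Toy input: the third "sentence" alone is longer than chunk_size.
sentences = ["a" * 30, "b" * 30, "c" * 120, "d" * 30]
chunks = merge_sentences(sentences, chunk_size=70, chunk_overlap=35)
oversized = [len(c) for c in chunks if len(c) > 70]
print([len(c) for c in chunks], oversized)
```

In this sketch the 120-character sentence becomes its own oversized chunk, which mirrors the situation the warnings report. Raising chunk_size or using a character-level splitter would be two ways to avoid it.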


In [18]:
# Create the embedding model used to embed the chunks
# (Google Generative AI embeddings; the API key is read from the environment)

import os

from langchain_google_genai import GoogleGenerativeAIEmbeddings

embedding_model = GoogleGenerativeAIEmbeddings(google_api_key=os.environ["GOOGLE_API_KEY"],
                                               model="models/embedding-001")

# vectors = embedding_model.embed_documents([chunk.page_content for chunk in chunks])

In [19]:
pip install chromadb


[1m[[0m[34;49mnotice[0m[1;39;49m][0m[39;49m A new release of pip is available: [0m[31;49m23.2.1[0m[39;49m -> [0m[32;49m24.0[0m
[1m[[0m[34;49mnotice[0m[1;39;49m][0m[39;49m To update, run: [0m[32;49mpip3 install --upgrade pip[0m
Note: you may need to restart the kernel to use updated packages.


In [20]:
# Store the chunks in vector store
from langchain_community.vectorstores import Chroma

# Embed each chunk and load it into the vector store
db = Chroma.from_documents(chunks, embedding_model, persist_directory="chroma_db_")

# Persist the database to disk (recent Chroma versions persist automatically and deprecate this call)
db.persist()



In [21]:
# Reconnect to the persisted Chroma database
db_connection = Chroma(persist_directory="chroma_db_", embedding_function=embedding_model)

In [22]:
# Convert the Chroma connection into a retriever that returns the 5 most similar chunks
retriever = db_connection.as_retriever(search_kwargs={"k": 5})

print(type(retriever))

<class 'langchain_core.vectorstores.VectorStoreRetriever'>
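Conceptually, the retriever embeds the query and returns the k stored chunks whose embeddings are closest to it. The sketch below illustrates that idea with toy 2-dimensional vectors and cosine similarity. It is an assumption-laden illustration only: the helper names (cosine_similarity, top_k) and the vectors are hypothetical, and Chroma's real index and distance metric are configurable and may differ.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def top_k(query_vec, indexed, k=5):
    """Return the texts of the k entries most similar to query_vec.

    indexed is a list of (text, vector) pairs, standing in for the vector store.
    """
    scored = [(cosine_similarity(query_vec, vec), text) for text, vec in indexed]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [text for _, text in scored[:k]]


# Toy "vector store": hypothetical chunk labels with made-up embeddings.
indexed = [
    ("attention", [1.0, 0.0]),
    ("memory", [0.8, 0.6]),
    ("optimizer", [0.0, 1.0]),
]
hits = top_k([1.0, 0.1], indexed, k=2)
print(hits)
```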


In [23]:
user_input="Can u tell me about the leave no context behind research paper"

In [24]:
retrieved_docs = retriever.invoke(user_input)

In [25]:
len(retrieved_docs)

5

In [26]:
print(retrieved_docs[0].page_content)

Preprint.

Under review.


In [27]:
chat_template = ChatPromptTemplate.from_messages([
    # System Message Prompt Template
    SystemMessage(content="""You are a helpful AI bot.
    You take the context and question from the user. Your answer should be based on the specific context."""),
    # Human Message Prompt Template
    HumanMessagePromptTemplate.from_template("""Answer the question based on the given context.
    Context:
    {context}
    
    Question: 
    {question}
    
    Answer: """)
])


In [28]:
from langchain_core.runnables import RunnablePassthrough

def format_docs(docs):
    return "\n\n".join(doc.page_content for doc in docs)


rag_chain = (
    {"context": retriever | format_docs, "question": RunnablePassthrough()}
    | chat_template
    | chat_model
    | output_parser
)
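The dictionary at the head of rag_chain sends the same input string down two branches: one retrieves and formats context, the other passes the question through unchanged. The plain-Python sketch below mimics that first step without calling LangChain or any API. Doc, fake_retriever, and build_prompt_inputs are hypothetical stand-ins, and the two context strings are made up for illustration.

```python
from dataclasses import dataclass


@dataclass
class Doc:
    """Minimal stand-in for a LangChain Document (only page_content is used)."""
    page_content: str


def format_docs(docs):
    # Same logic as in the cell above: join chunk texts with blank lines.
    return "\n\n".join(doc.page_content for doc in docs)


def fake_retriever(question):
    # Hypothetical retriever: ignores the question and returns fixed chunks.
    return [
        Doc("Infini-attention adds a compressive memory."),
        Doc("Memory and computation stay bounded."),
    ]


def build_prompt_inputs(question):
    # Mirrors {"context": retriever | format_docs, "question": RunnablePassthrough()}:
    # both entries are computed from the same input string.
    return {
        "context": format_docs(fake_retriever(question)),
        "question": question,
    }


inputs = build_prompt_inputs("What is Infini-attention?")
print(inputs["context"])
print(inputs["question"])
```

The resulting dictionary has exactly the two keys, context and question, that the prompt template expects.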

In [29]:
response = rag_chain.invoke("Can you tell me about the Long-context Language Modeling")
response

'Let\'s break down long-context language modeling based on the provided information:\n\n**The Goal:**\n\n* **Extend the capabilities of Large Language Models (LLMs) to handle much longer input sequences.**  Traditional LLMs struggle with very long texts due to limitations in their attention mechanisms and computational demands.\n\n**Techniques:**\n\n1. **Modified Attention Layers:**\n   - The core idea is to adapt the "dot-product attention" mechanism that LLMs use. \n   - This modification allows the model to process theoretically infinite-length contexts without requiring unbounded memory and computation.\n\n2. **Continual Pre-training:**\n   -  This involves further training existing LLMs on datasets specifically designed for long sequences.\n   -  Examples include work by Xiong et al. (2023) and Fu et al. (2024), who extend attention mechanisms for this purpose.\n\n**Training Details (from the "Effective long-context scaling of foundation models" paper):**\n\n* **Learning Rate:** O

In [30]:
from IPython.display import Markdown as md

md(response)

Let's break down long-context language modeling based on the provided information:

**The Goal:**

* **Extend the capabilities of Large Language Models (LLMs) to handle much longer input sequences.**  Traditional LLMs struggle with very long texts due to limitations in their attention mechanisms and computational demands.

**Techniques:**

1. **Modified Attention Layers:**
   - The core idea is to adapt the "dot-product attention" mechanism that LLMs use. 
   - This modification allows the model to process theoretically infinite-length contexts without requiring unbounded memory and computation.

2. **Continual Pre-training:**
   -  This involves further training existing LLMs on datasets specifically designed for long sequences.
   -  Examples include work by Xiong et al. (2023) and Fu et al. (2024), who extend attention mechanisms for this purpose.

**Training Details (from the "Effective long-context scaling of foundation models" paper):**

* **Learning Rate:** Optimized at 0.01 after experimenting with a range of values.
* **Optimizer:** Adafactor, known for its efficiency in deep learning.
* **Warm-up and Decay:**  A linear warm-up period of 1000 steps is used, followed by cosine decay of the learning rate. This helps stabilize training.
* **Gradient Checkpointing:**  This technique saves memory by storing intermediate gradients during training, making it feasible to work with long sequences.

**Results and Advantages:**

* **Scaling to Million-Length Sequences:**  This approach has demonstrated the ability to handle remarkably long inputs.
* **Improved Performance:**  Outperforms standard LLMs on benchmarks designed to test long-context understanding, such as:
    - Long-context language modeling tasks.
    - Book summarization, which requires processing and understanding lengthy texts.
* **Length Generalization:** Shows promise in generalizing well to different input lengths, meaning it can handle a variety of text sizes effectively.

**In Essence:**

Long-context language modeling aims to overcome the limitations of standard LLMs by modifying their architecture and training methods. This allows them to process and understand significantly longer pieces of text, opening up new possibilities for tasks that require extensive contextual information. 
