In [1]:
from langchain_community.document_loaders import PyPDFLoader
loader = PyPDFLoader("2404.07143.pdf")
# Load the PDF and split it into one Document per page
pages = loader.load_and_split()

In [2]:
len(pages)

13

In [3]:
pages[1]

Document(page_content='Preprint. Under review.\nIn this work, we introduce a novel approach that enables Transformer LLMs to effectively\nprocess infinitely long inputs with bounded memory footprint and computation. A key\ncomponent in our proposed approach is a new attention technique dubbed Infini-attention\n(Figure 1). The Infini-attention incorporates a compressive memory into the vanilla attention\nmechanism (Bahdanau et al., 2014; Vaswani et al., 2017) and builds in both masked local\nattention and long-term linear attention mechanisms in a single Transformer block.\nSuch a subtle but critical modification to the Transformer attention layer enables a natural\nextension of existing LLMs to infinitely long contexts via continual pre-training and fine-\ntuning.\nOur Infini-attention reuses all the key, value and query states of the standard attention\ncomputation for long-term memory consolidation and retrieval. We store old KV states of\nthe attention in the compressive memory, ins

In [4]:
from langchain_text_splitters import NLTKTextSplitter
# Sentence-aware splitter; sizes are measured in characters
text_splitter = NLTKTextSplitter(chunk_size=300, chunk_overlap=50)
chunks = text_splitter.split_documents(pages)

Created a chunk of size 342, which is longer than the specified 300
Created a chunk of size 394, which is longer than the specified 300
Created a chunk of size 359, which is longer than the specified 300
Created a chunk of size 568, which is longer than the specified 300
Created a chunk of size 370, which is longer than the specified 300
Created a chunk of size 506, which is longer than the specified 300
Created a chunk of size 379, which is longer than the specified 300
Created a chunk of size 411, which is longer than the specified 300
Created a chunk of size 633, which is longer than the specified 300
Created a chunk of size 332, which is longer than the specified 300
Created a chunk of size 467, which is longer than the specified 300
Created a chunk of size 313, which is longer than the specified 300
Created a chunk of size 371, which is longer than the specified 300
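
The warnings above come from splitting on sentence boundaries: a sentence is kept whole, so any single sentence longer than `chunk_size` ends up as an oversized chunk. The following is a simplified sketch of that greedy packing, not the library's actual implementation; the helper name `merge_sentences` is hypothetical, and chunk overlap is left out for brevity.

```python
def merge_sentences(sentences, chunk_size, separator="\n\n"):
    """Greedily pack whole sentences into chunks of at most chunk_size characters.

    A sentence is never cut, so one that is longer than chunk_size
    becomes a chunk on its own and exceeds the limit.
    """
    chunks, current = [], []
    for sentence in sentences:
        candidate = separator.join(current + [sentence])
        if current and len(candidate) > chunk_size:
            # Adding this sentence would overflow: close the current chunk.
            chunks.append(separator.join(current))
            current = [sentence]
        else:
            current.append(sentence)
    if current:
        chunks.append(separator.join(current))
    return chunks


# Toy input: the middle "sentence" is 342 characters long.
sentences = ["Short one.", "x" * 342, "Another short one."]
chunks = merge_sentences(sentences, chunk_size=300)
oversized = [len(c) for c in chunks if len(c) > 300]
print(oversized)
```

If oversized chunks are a problem for the embedding model, one option is to pass them through a character-level splitter afterwards.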


In [5]:
chunks[1]


Document(page_content='Leave No Context Behind:\nEfficient Infinite Context Transformers with Infini-attention\nTsendsuren Munkhdalai, Manaal Faruqui and Siddharth Gopal\nGoogle\ntsendsuren@google.com\nAbstract\nThis work introduces an efficient method to scale Transformer-based Large\nLanguage Models (LLMs) to infinitely long inputs with bounded memory\nand computation.', metadata={'source': '2404.07143.pdf', 'page': 0})

In [6]:
with open("keys/.gemini_api_key.txt", "r") as f:
    GEMINI_API_KEY = f.read().strip()
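
Reading the key from a file works, but the file has to be kept out of version control. A common alternative is to check an environment variable first and fall back to the file. This is a minimal sketch; the function name `load_api_key` and the variable name `GEMINI_API_KEY` for the environment are assumptions, not something the notebook defines.

```python
import os
from pathlib import Path


def load_api_key(env_var="GEMINI_API_KEY", key_file="keys/.gemini_api_key.txt"):
    """Return the API key from the environment, else from key_file.

    env_var is a hypothetical variable name chosen for this sketch.
    """
    value = os.environ.get(env_var, "").strip()
    if value:
        return value
    path = Path(key_file)
    if path.is_file():
        return path.read_text().strip()
    raise RuntimeError(f"No API key found in ${env_var} or {key_file}")
```

With this helper, the cell above would become `GEMINI_API_KEY = load_api_key()`.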

In [7]:
from langchain_google_genai import GoogleGenerativeAIEmbeddings
embedding_model = GoogleGenerativeAIEmbeddings(
    google_api_key=GEMINI_API_KEY,
    model="models/embedding-001",
)


In [8]:
from langchain_community.vectorstores import Chroma
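# Embed every chunk and store the vectors on disk
vectorstore = Chroma.from_documents(chunks, embedding_model, persist_directory="./chroma_db")
# Only needed with Chroma versions before 0.4; newer versions persist automatically
vectorstore.persist()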

In [9]:
# Reopen the persisted collection with the same embedding model
db_connection = Chroma(persist_directory="./chroma_db", embedding_function=embedding_model)

In [10]:
retriever = db_connection.as_retriever(search_kwargs={"k": 5})
print(type(retriever))

<class 'langchain_core.vectorstores.VectorStoreRetriever'>
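
Conceptually, a retriever with `k=5` embeds the question and returns the five stored chunks whose vectors are closest to it. The sketch below shows that idea with brute-force cosine similarity on toy 2-D vectors. It is an illustration only: the real store uses its own index and distance metric, and `cosine_similarity`, `top_k`, and the toy vectors are hypothetical.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def top_k(query_vec, doc_vecs, k=5):
    """Return the indices of the k vectors most similar to query_vec."""
    scored = [(cosine_similarity(query_vec, vec), i) for i, vec in enumerate(doc_vecs)]
    scored.sort(reverse=True)  # highest similarity first
    return [i for _, i in scored[:k]]


# Toy "embeddings"; real ones have hundreds of dimensions.
toy_vectors = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [-1.0, 0.0]]
print(top_k([1.0, 0.0], toy_vectors, k=2))
```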


In [11]:
from langchain_core.messages import SystemMessage
from langchain_core.prompts import ChatPromptTemplate, HumanMessagePromptTemplate
chat_template = ChatPromptTemplate.from_messages([
    SystemMessage(content="""You are a helpful AI bot.
    You receive a context and a question from the user. Base your answer on the given context."""),
    HumanMessagePromptTemplate.from_template("""Answer the question based on the given context.
    Context:
    {context}
    
    Question: 
    {question}
    
    Answer: """)
])

In [12]:
from langchain_google_genai import ChatGoogleGenerativeAI

chat_model = ChatGoogleGenerativeAI(google_api_key=GEMINI_API_KEY, 
                                   model="gemini-1.5-pro-latest")

In [13]:
from langchain_core.output_parsers import StrOutputParser
from langchain_core.runnables import RunnablePassthrough

output_parser = StrOutputParser()
def format_docs(docs):
    return "\n\n".join(doc.page_content for doc in docs)


rag_chain = (
    {"context": retriever | format_docs, "question": RunnablePassthrough()}
    | chat_template
    | chat_model
    | output_parser
)
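
The first step of the chain is a dict: the question is sent to the retriever and formatted into `context`, while `RunnablePassthrough()` forwards the same question unchanged as `question`. The plain-Python sketch below mimics only that data flow, without LangChain; `FakeDoc`, `fake_retriever`, and `build_prompt_inputs` are hypothetical stand-ins, and the two sample sentences paraphrase the page shown earlier.

```python
from dataclasses import dataclass


@dataclass
class FakeDoc:
    """Stand-in for a LangChain Document; only page_content is used here."""
    page_content: str


def fake_retriever(question):
    # A real retriever would run a similarity search for the question.
    return [
        FakeDoc("Infini-attention incorporates a compressive memory."),
        FakeDoc("It reuses the key, value and query states."),
    ]


def format_docs(docs):
    return "\n\n".join(doc.page_content for doc in docs)


def build_prompt_inputs(question):
    """Produce the two variables the chat template expects."""
    return {
        "context": format_docs(fake_retriever(question)),  # retriever | format_docs
        "question": question,                              # RunnablePassthrough()
    }


inputs = build_prompt_inputs("What is Infini-attention?")
print(inputs["context"])
```

The resulting dict is what fills `{context}` and `{question}` in the template before the prompt reaches the chat model.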

In [14]:
from IPython.display import Markdown as md
response = rag_chain.invoke("Can you tell me about the Long-context Language Modeling?")
md(response)

## Long-context Language Modeling with arXiv:2309.16039

The provided text discusses a specific approach to **long-context language modeling (LCLM)**, which aims to process and generate text sequences that are significantly longer than traditional language models. This is crucial for tasks requiring understanding of extensive contexts, such as summarizing entire books or retrieving information from lengthy documents.

Here's a breakdown of the key points:

**Challenges of LCLM:**

*   Traditional language models struggle with very long sequences due to limitations in memory and computational resources.
*   Maintaining context and coherence over extended text lengths is difficult.

**Approach in arXiv:2309.16039:**

*   The researchers propose a method to efficiently handle long sequences while achieving superior performance. 
*   They experiment with different sequence lengths (100K and 1M) and model sizes (1B and 8B parameters).
*   Their approach demonstrates a remarkable **114x improvement in comprehension ratio** (memory size) compared to baseline models.

**Results:**

*   The proposed model excels in LCLM benchmarks, including tasks like book summarization and passkey retrieval from long contexts.
*   It exhibits strong **length generalization capabilities**, meaning it can effectively handle sequences of varying lengths.
*   Training with 100K sequence length leads to even better perplexity (lower is better), indicating improved language understanding.

**Benefits of this LCLM approach:**

*   **Efficient memory usage:** Achieves better performance with significantly less memory compared to baselines. 
*   **Scalability:** Effectively handles sequences of up to a million elements, pushing the boundaries of LCLM.
*   **Improved performance:** Outperforms baseline models on various LCLM tasks.
*   **Generalization:**  Adapts well to different sequence lengths.

**Overall, the research presented in arXiv:2309.16039 offers a promising solution for LCLM, paving the way for more advanced applications dealing with extensive textual data.** 
