In [26]:
from langchain_google_genai import ChatGoogleGenerativeAI

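# Set up the API key: read it from a local file, strip the trailing newline,
# and let the context manager close the file
with open('data/.api_key_dp.txt') as f:
    API_KEY = f.read().strip()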

chat_model = ChatGoogleGenerativeAI(google_api_key = API_KEY, model = 'gemini-1.5-pro-latest')

## Load Document

In [27]:
# Load a document

from langchain_community.document_loaders import PyPDFLoader

loader = PyPDFLoader("data/2404.07143.pdf")

data = loader.load_and_split()


In [28]:
data[:1]

[Document(page_content='Preprint. Under review.\nLeave No Context Behind:\nEfficient Infinite Context Transformers with Infini-attention\nTsendsuren Munkhdalai, Manaal Faruqui and Siddharth Gopal\nGoogle\ntsendsuren@google.com\nAbstract\nThis work introduces an efficient method to scale Transformer-based Large\nLanguage Models (LLMs) to infinitely long inputs with bounded memory\nand computation. A key component in our proposed approach is a new at-\ntention technique dubbed Infini-attention. The Infini-attention incorporates\na compressive memory into the vanilla attention mechanism and builds\nin both masked local attention and long-term linear attention mechanisms\nin a single Transformer block. We demonstrate the effectiveness of our\napproach on long-context language modeling benchmarks, 1M sequence\nlength passkey context block retrieval and 500K length book summarization\ntasks with 1B and 8B LLMs. Our approach introduces minimal bounded\nmemory parameters and enables fast strea

In [49]:
# Download the NLTK 'punkt' sentence tokenizer data; NLTKTextSplitter (next step) fails without it
import nltk
nltk.download('punkt')

[nltk_data] Downloading package punkt to
[nltk_data]     C:\Users\Hp\AppData\Roaming\nltk_data...
[nltk_data]   Package punkt is already up-to-date!


True

## Splitting the Document into Chunks

In [30]:
from langchain_text_splitters import NLTKTextSplitter

# chunk_size and chunk_overlap are measured in characters (the default length function is len)
split = NLTKTextSplitter(chunk_size=500, chunk_overlap=100)

chunks = split.split_documents(data)

print(len(chunks))

Created a chunk of size 568, which is longer than the specified 500
Created a chunk of size 506, which is longer than the specified 500
Created a chunk of size 633, which is longer than the specified 500


110
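
The "longer than the specified 500" warnings above are consistent with a splitter that packs whole sentences into chunks and never cuts a sentence in half. The following is a simplified sketch of that idea, not LangChain's actual implementation; `merge_sentences` and its parameters are hypothetical names chosen for illustration.

```python
def merge_sentences(sentences, chunk_size, chunk_overlap, sep=" "):
    """Greedily pack sentences into chunks of at most chunk_size characters.

    Hypothetical helper for illustration only. A single sentence longer than
    chunk_size is emitted as an oversized chunk rather than cut mid-sentence.
    """
    chunks, current = [], []
    for sentence in sentences:
        # Close the current chunk if adding this sentence would exceed the limit.
        if current and len(sep.join(current + [sentence])) > chunk_size:
            chunks.append(sep.join(current))
            # Carry trailing sentences forward while they fit in chunk_overlap.
            carried = []
            for previous in reversed(current):
                if len(sep.join([previous] + carried)) > chunk_overlap:
                    break
                carried.insert(0, previous)
            # Drop carried sentences that would push the next chunk over the limit.
            while carried and len(sep.join(carried + [sentence])) > chunk_size:
                carried.pop(0)
            current = carried
        current.append(sentence)
    if current:
        chunks.append(sep.join(current))
    return chunks


toy_sentences = ["aaaa.", "bbbb.", "cccc.", "dddd."]
toy_chunks = merge_sentences(toy_sentences, chunk_size=12, chunk_overlap=5)
print(toy_chunks)
```

In this toy run each chunk shares one sentence with its neighbor, which is the role `chunk_overlap` plays in the splitter above.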


## Creating Embeddings for the Chunks

In [31]:
from langchain_google_genai import GoogleGenerativeAIEmbeddings

embedding_model = GoogleGenerativeAIEmbeddings(google_api_key=API_KEY, model="models/embedding-001")

## Storing the Chunk Embeddings in Chroma DB

In [32]:
from langchain_community.vectorstores import Chroma

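# Embed the chunks and write them to a local Chroma collection.
# Note: re-running this cell against the same persist_directory adds the chunks again,
# which is a likely cause of the duplicate results in the retrieval output.
db = Chroma.from_documents(chunks, embedding_model, persist_directory="data/chroma_db_for_rag")

# Explicit persist() is needed only on older Chroma versions; newer ones persist automatically.
db.persist()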

In [35]:
# Reconnect to the persisted collection, using the same embedding model that built it
connection = Chroma(persist_directory="data/chroma_db_for_rag", embedding_function=embedding_model)

## Setting up the Vector Store as a Retriever

In [36]:
retriever = connection.as_retriever(search_kwargs={"k": 5})

In [37]:
retriever

VectorStoreRetriever(tags=['Chroma', 'GoogleGenerativeAIEmbeddings'], vectorstore=<langchain_community.vectorstores.chroma.Chroma object at 0x00000179CA1681C0>, search_kwargs={'k': 5})
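
With `search_kwargs={"k": 5}` the retriever returns the five stored chunks whose embeddings are closest to the query embedding. As a rough sketch of that ranking step, the code below uses cosine similarity on toy 2-D vectors. This is an assumption for illustration: the distance metric Chroma actually uses depends on how the collection is configured, and `cosine_similarity` and `top_k` are hypothetical helper names.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two non-zero vectors of equal length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def top_k(query_vec, doc_vecs, k):
    """Return the indices of the k document vectors most similar to the query."""
    ranked = sorted(
        range(len(doc_vecs)),
        key=lambda i: cosine_similarity(query_vec, doc_vecs[i]),
        reverse=True,
    )
    return ranked[:k]


# Toy vectors standing in for real embeddings, which have hundreds of dimensions.
toy_query = [1.0, 0.0]
toy_docs = [[0.0, 1.0], [1.0, 0.1], [1.0, 1.0]]
nearest = top_k(toy_query, toy_docs, k=2)
print(nearest)
```

A real vector store avoids scoring every vector by using an approximate nearest-neighbor index, but the ranking idea is the same.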

In [38]:
user_query = "What is Large Language Model?"

retrieve_docs = retriever.invoke(user_query)

In [39]:
len(retrieve_docs)

5

In [40]:
retrieve_docs

[Document(page_content='However, the LLMs in their current state\nhave yet to see an effective, practical compres-\nsive memory technique that balances simplicity along with quality.\n\n1arXiv:2404.07143v1  [cs.CL]  10 Apr 2024', metadata={'page': 0, 'source': 'data/2404.07143.pdf'}),
 Document(page_content='However, the LLMs in their current state\nhave yet to see an effective, practical compres-\nsive memory technique that balances simplicity along with quality.\n\n1arXiv:2404.07143v1  [cs.CL]  10 Apr 2024', metadata={'page': 0, 'source': 'data/2404.07143.pdf'}),
 Document(page_content='However, the LLMs in their current state\nhave yet to see an effective, practical compres-\nsive memory technique that balances simplicity along with quality.\n\n1arXiv:2404.07143v1  [cs.CL]  10 Apr 2024', metadata={'page': 0, 'source': 'data/2404.07143.pdf'}),
 Document(page_content='Effective\nlong-context scaling of foundation models.\n\narXiv preprint arXiv:2309.16039 , 2023.\n\nA Additional Train

In [41]:
retrieve_docs[0].page_content

'However, the LLMs in their current state\nhave yet to see an effective, practical compres-\nsive memory technique that balances simplicity along with quality.\n\n1arXiv:2404.07143v1  [cs.CL]  10 Apr 2024'
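
The first three retrieved documents above are identical, so three of the five context slots carry the same text. One way to handle this after retrieval is to drop repeats before building the context. The sketch below uses a small stand-in class instead of LangChain's `Document` so that it runs on its own; `Doc` and `dedupe_docs` are hypothetical names.

```python
from dataclasses import dataclass, field


@dataclass
class Doc:
    """Hypothetical stand-in for a retrieved document (text plus metadata)."""
    page_content: str
    metadata: dict = field(default_factory=dict)


def dedupe_docs(docs):
    """Keep the first occurrence of each (text, source, page) triple, in order."""
    seen = set()
    unique = []
    for doc in docs:
        key = (doc.page_content, doc.metadata.get("source"), doc.metadata.get("page"))
        if key not in seen:
            seen.add(key)
            unique.append(doc)
    return unique


toy_results = [
    Doc("same chunk", {"source": "paper.pdf", "page": 0}),
    Doc("same chunk", {"source": "paper.pdf", "page": 0}),
    Doc("other chunk", {"source": "paper.pdf", "page": 3}),
]
unique_results = dedupe_docs(toy_results)
print([doc.page_content for doc in unique_results])
```

Because the function only reads `page_content` and `metadata`, it should also accept the objects returned by `retriever.invoke`. Deduplicating after retrieval still leaves fewer than `k` distinct chunks, so rebuilding the collection once from a clean directory is the more thorough fix.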

## Passing the Context and Question to the LLM

In [42]:
# Import only the message and prompt classes this cell uses
from langchain_core.messages import SystemMessage
from langchain_core.prompts import ChatPromptTemplate, HumanMessagePromptTemplate

chat_template = ChatPromptTemplate.from_messages([
    # System Message Prompt Template
    SystemMessage(content="""You are a Helpful AI Bot. 
    You take the context and question from user. Your answer should be based on the specific context."""),
    # Human Message Prompt Template
    HumanMessagePromptTemplate.from_template("""Answer the question based on the given context.
    Context: {context} Question: {question} Answer: """) ])

In [43]:
from langchain_core.output_parsers import StrOutputParser

output_parser = StrOutputParser()

In [45]:
from langchain_core.runnables import RunnablePassthrough

def format_docs(docs):
    """Join the retrieved chunks into one context string, separated by blank lines."""
    return "\n\n".join(doc.page_content for doc in docs)


rag_chain = (
    {"context": retriever | format_docs, "question": RunnablePassthrough()}
    | chat_template
    | chat_model
    | output_parser)
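
To make the data flow of the chain easier to follow, here is a plain-Python sketch of the same steps with no LangChain involved. The retriever and model are replaced by hypothetical stand-in functions (`toy_retrieve`, `toy_model`), and the system message is omitted, so this shows only the order of operations, not the real components.

```python
def join_chunks(chunks):
    """Same joining rule as format_docs, applied to plain strings."""
    return "\n\n".join(chunks)


def build_inputs(question, retrieve):
    """Mirror the dict step: the question goes to the retriever and is also passed through."""
    return {"context": join_chunks(retrieve(question)), "question": question}


def render_prompt(inputs):
    """Fill the human-message template with the context and the question."""
    return (
        "Answer the question based on the given context.\n"
        f"Context: {inputs['context']} Question: {inputs['question']} Answer: "
    )


def run_chain(question, retrieve, model):
    """Retrieve, build the prompt, call the model, and return its text."""
    return model(render_prompt(build_inputs(question, retrieve)))


# Hypothetical stand-ins for the retriever and the chat model.
def toy_retrieve(question):
    return ["chunk one", "chunk two"]


def toy_model(prompt):
    return prompt  # echo the prompt so the assembled text can be inspected


assembled = run_chain("What is X?", toy_retrieve, toy_model)
print(assembled)
```

The point to notice is that the user's question is used twice: once as the retrieval query and once, unchanged, inside the prompt.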

In [46]:
from IPython.display import Markdown as markdown
response = rag_chain.invoke("What is Transformers?")

markdown(response)

## Transformers: A Summary from the Context

Based on the provided text, **Transformers** appear to be a type of neural network architecture with a focus on efficient and extensive context utilization. Here's what we can gather:

* **Relationship to RNNs:** The paper "Transformers are RNNs: Fast Autoregressive Transformers with Linear Attention" suggests a connection between Transformers and Recurrent Neural Networks (RNNs), potentially in terms of sequence processing capabilities.
* **Context and Memory:**  The context seems crucial for Transformers. They employ mechanisms like:
    * **Storing KV states:**  Memorizing Transformers store Key-Value pairs for context, but this becomes costly, limiting it to a single layer.
    * **kNN retrieval:**  A fast k-Nearest Neighbors retrieval method helps build a broader context window, covering the entire sequence history at the cost of increased storage.
    * **Context window extension:** Techniques like Transformer-XL and Compressive Transformer extend the context window beyond a single layer, allowing access to a more extensive history of tokens with varying memory footprints.
* **Efficiency Considerations:**  The text highlights the trade-off between context size and storage/memory requirements. Different approaches balance these factors to optimize performance. 

**Overall, Transformers seem to be powerful tools for sequence processing tasks where context plays a significant role.  They utilize various methods to efficiently incorporate past information, making them suitable for complex tasks requiring long-range dependencies.** 


In [47]:
response = rag_chain.invoke("What is Large Language Model?")

markdown(response)

## Understanding Large Language Models (LLMs) based on the Context

The provided text snippets offer valuable insights into Large Language Models (LLMs) and their current limitations. While a direct definition isn't provided, we can infer the following:

**LLMs are complex AI models capable of processing and generating human-like text.** They are trained on massive datasets and can perform various tasks such as translation, summarization, and question answering.

**Current Challenges for LLMs:**

* **Limited Contextual Memory:** The context highlights a critical challenge: LLMs struggle with effective and practical memory techniques. This means they have difficulty retaining and utilizing information from long passages or conversations, impacting their ability to understand complex contexts and generate coherent responses. 
* **Balancing Simplicity and Quality:**  Researchers are actively seeking methods to improve LLM memory while maintaining simplicity and quality. This involves exploring techniques like "compressive memory" to efficiently store and access information without sacrificing performance.

**Research Efforts and Directions:**

The references to various research papers suggest ongoing efforts to address these challenges:

* **Long-context scaling:** Exploring methods to effectively scale LLMs for handling longer contexts.
* **Positional interpolation:** Investigating techniques to extend the context window of LLMs. 
* **Efficient fine-tuning:** Developing methods like "LongLoRA" for efficient fine-tuning of LLMs with long contexts.

**In conclusion,** LLMs hold immense potential but require further development, particularly in enhancing their contextual memory capabilities. Researchers are actively exploring solutions to improve their efficiency and effectiveness in handling complex tasks and longer sequences of information. 
