In [2]:
from google.colab import drive
drive.mount('/content/drive')

Drive already mounted at /content/drive; to attempt to forcibly remount, call drive.mount("/content/drive", force_remount=True).


In [3]:
!pip install pypdf



In [4]:
!pip install langchain



In [5]:
!pip install rapidocr-onnxruntime



In [6]:
from langchain_community.document_loaders import PyPDFLoader
loader = PyPDFLoader("/content/drive/MyDrive/Colab Notebooks/Leave No Context Behind.pdf", extract_images=True)
pages = loader.load()


In [7]:
pages[0]

Document(page_content='Preprint. Under review.\nLeave No Context Behind:\nEfficient Infinite Context Transformers with Infini-attention\nTsendsuren Munkhdalai, Manaal Faruqui and Siddharth Gopal\nGoogle\ntsendsuren@google.com\nAbstract\nThis work introduces an efficient method to scale Transformer-based Large\nLanguage Models (LLMs) to infinitely long inputs with bounded memory\nand computation. A key component in our proposed approach is a new at-\ntention technique dubbed Infini-attention. The Infini-attention incorporates\na compressive memory into the vanilla attention mechanism and builds\nin both masked local attention and long-term linear attention mechanisms\nin a single Transformer block. We demonstrate the effectiveness of our\napproach on long-context language modeling benchmarks, 1M sequence\nlength passkey context block retrieval and 500K length book summarization\ntasks with 1B and 8B LLMs. Our approach introduces minimal bounded\nmemory parameters and enables fast stream

In [8]:
!pip install nltk



In [9]:
import nltk
nltk.download('punkt')



[nltk_data] Downloading package punkt to /root/nltk_data...
[nltk_data]   Package punkt is already up-to-date!


True

In [10]:
# Split the document into chunks

from langchain_text_splitters import NLTKTextSplitter

text_splitter = NLTKTextSplitter(chunk_size=500, chunk_overlap=100)

chunks = text_splitter.split_documents(pages)

print(len(chunks))

print(type(chunks[0]))



110
<class 'langchain_core.documents.base.Document'>
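
The count of 110 says little about how evenly the text was split. A minimal sketch for inspecting chunk sizes follows; `chunk_length_stats` is a hypothetical helper, not part of LangChain, and the sample strings are stand-ins. Because the splitter works on whole sentences, some chunks may be far shorter or somewhat longer than `chunk_size`.

```python
from statistics import mean


def chunk_length_stats(texts):
    """Return (count, min, mean, max) of character lengths for a list of strings."""
    lengths = [len(t) for t in texts]
    if not lengths:
        return (0, 0, 0.0, 0)
    return (len(lengths), min(lengths), mean(lengths), max(lengths))


# Stand-in strings for illustration only; in this notebook you would pass
# [c.page_content for c in chunks] instead.
sample = ["Preprint.\n\nUnder review.", "A" * 480, "B" * 510]
print(chunk_length_stats(sample))
```

A very small minimum here usually points at header or footer fragments that ended up as their own chunks.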


In [11]:
!pip install langchain-google-genai



In [12]:
# Create the embedding model used to embed the chunks
# (Google Generative AI embeddings, not OpenAI)
import os
from langchain_google_genai import GoogleGenerativeAIEmbeddings

# Assumes GOOGLE_API_KEY is set in the environment; avoid hardcoding keys in a notebook.
embedding_model = GoogleGenerativeAIEmbeddings(google_api_key=os.environ["GOOGLE_API_KEY"],
                                               model="models/embedding-001")



In [13]:
!pip install chromadb



In [14]:
# Store the chunks in vector store
from langchain_community.vectorstores import Chroma

# Embed each chunk and load it into the vector store
db = Chroma.from_documents(chunks, embedding_model, persist_directory="./chroma_db_")

# Persist the database to the local ./chroma_db_ directory (not Google Drive)
db.persist()

In [15]:
# Reopen the persisted Chroma store from disk
db_connection = Chroma(persist_directory="./chroma_db_", embedding_function=embedding_model)

In [16]:
# Convert the Chroma db_connection into a retriever that returns the top 5 chunks
retriever = db_connection.as_retriever(search_kwargs={"k": 5})

print(type(retriever))

<class 'langchain_core.vectorstores.VectorStoreRetriever'>
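
To make `search_kwargs={"k": 5}` concrete, here is a rough, library-free sketch of top-k retrieval over embedding vectors. The function names `cosine_similarity` and `top_k` are hypothetical, and cosine similarity is used purely for illustration; the distance metric Chroma actually applies depends on how the collection is configured.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k(query_vec, doc_vecs, k=5):
    """Return the indices of the k document vectors most similar to the query."""
    scored = [(cosine_similarity(query_vec, vec), idx) for idx, vec in enumerate(doc_vecs)]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [idx for _, idx in scored[:k]]


# Toy 2-dimensional vectors standing in for real embeddings.
docs = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
print(top_k([1.0, 0.0], docs, k=2))
```

A real vector store replaces the linear scan with an approximate nearest-neighbour index, but the contract is the same: one query vector in, the `k` closest chunks out.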


In [17]:
from langchain.chains import RetrievalQA

In [18]:
user_input = "How does the proposed model contribute to the ongoing research on enhancing Transformer architectures?"

In [19]:
retrieved_docs = retriever.invoke(user_input)

In [20]:
len(retrieved_docs)

5

In [21]:
print(retrieved_docs[1].page_content)

Preprint.

Under review.
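
The chunk printed above is only the page header, so it occupies one of the five retrieved slots without contributing any context. One possible mitigation, sketched under assumptions, is to drop very short chunks before building the vector store. `drop_short_chunks` and the 50-character threshold are hypothetical choices, and `FakeDoc` is a stand-in for LangChain's `Document` so the example runs on its own.

```python
from dataclasses import dataclass


@dataclass
class FakeDoc:
    """Stand-in for a LangChain Document; only page_content is modelled."""
    page_content: str


def drop_short_chunks(docs, min_chars=50):
    """Keep only documents whose stripped text has at least min_chars characters."""
    return [d for d in docs if len(d.page_content.strip()) >= min_chars]


docs = [
    FakeDoc("Preprint.\n\nUnder review."),
    FakeDoc("The Infini-attention incorporates a compressive memory into the vanilla attention mechanism."),
]
kept = drop_short_chunks(docs)
print(len(docs), "->", len(kept))
```

In this notebook the same call could be applied to `chunks` before `Chroma.from_documents`; the right threshold depends on the document and is worth checking against the chunk length distribution.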


In [22]:
from langchain_core.messages import HumanMessage, AIMessage, SystemMessage
from langchain_core.prompts import ChatPromptTemplate, SystemMessagePromptTemplate, HumanMessagePromptTemplate


In [23]:
chat_template = ChatPromptTemplate.from_messages([
    # System Message Prompt Template
    SystemMessage(content="""You are a helpful AI bot.
    You receive a context and a question from the user. Base your answer only on the given context."""),
    # Human Message Prompt Template
    HumanMessagePromptTemplate.from_template("""Answer the question based on the given context.
    Context:
    {context}

    Question:
    {question}

    Answer: """)
])

In [25]:
import os
from langchain_google_genai import ChatGoogleGenerativeAI
# Assumes GOOGLE_API_KEY is set in the environment; avoid hardcoding keys in a notebook.
chat_model = ChatGoogleGenerativeAI(google_api_key=os.environ["GOOGLE_API_KEY"],
                                   model="gemini-1.5-pro-latest")

In [26]:
from langchain_core.output_parsers import StrOutputParser

output_parser = StrOutputParser()

In [27]:
from langchain_core.runnables import RunnablePassthrough

def format_docs(docs):
    return "\n\n".join(doc.page_content for doc in docs)


rag_chain = (
    {"context": retriever | format_docs, "question": RunnablePassthrough()}
    | chat_template
    | chat_model
    | output_parser
)
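
The data flow of `rag_chain` can be hard to see through the `|` operators. The following plain-Python sketch mirrors the same steps under simplifying assumptions: `run_chain`, `build_prompt`, `fake_retrieve` and `fake_generate` are hypothetical stand-ins, the system message is omitted, and no LangChain objects are involved.

```python
def format_docs(texts):
    """Join retrieved chunk texts with blank lines, as the notebook's format_docs does."""
    return "\n\n".join(texts)


def build_prompt(context, question):
    """Fill the human-message template with the retrieved context and the question."""
    return (
        "Answer the question based on the given context.\n"
        f"Context:\n{context}\n\n"
        f"Question:\n{question}\n\n"
        "Answer: "
    )


def run_chain(question, retrieve, generate):
    """Retrieve -> format -> prompt -> generate, with the question passed through unchanged."""
    context = format_docs(retrieve(question))
    prompt = build_prompt(context, question)
    return generate(prompt)


def fake_retrieve(question):
    # Stand-in for the retriever: returns chunk texts regardless of the question.
    return ["chunk one", "chunk two"]


def fake_generate(prompt):
    # Stand-in for the chat model: reports how much text it was given.
    return f"(model received {len(prompt)} characters)"


print(run_chain("What is Infini-attention?", fake_retrieve, fake_generate))
```

The dictionary at the head of `rag_chain` plays the role of the first two lines of `run_chain`: the same input string feeds the retriever branch and, via `RunnablePassthrough()`, the `question` slot.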

In [28]:

from IPython.display import Markdown as md
response = rag_chain.invoke("Are there any potential extensions or modifications to the EICT framework that could further improve its performance?")

md(response)

## Potential Extensions and Modifications for EICT:

Based on the provided context, here are some potential extensions or modifications to the EICT framework that could further improve its performance:

**1. Incorporating Sparsity:**

*   As mentioned in the context, introducing sparsity into the attention layer has proven effective in improving efficiency (Chen et al., 2023b; Ratner et al., 2022; Mohtashami & Jaggi, 2024). EICT could explore incorporating similar sparsity mechanisms to reduce computational overhead and memory footprint.

**2. Leveraging Specific Hardware Architecture:**

*   The context highlights system-level optimization techniques that exploit specific hardware architecture for efficient attention computation (Dao et al., 2022; Liu et al., 2023). EICT could investigate similar optimizations tailored to its specific hardware environment to maximize performance gains.

**3. Exploring Alternative Position Encoding Methods:**

*   While position interpolation techniques offer data efficiency, they still present cost challenges during inference. EICT could explore alternative position encoding methods that strike a better balance between efficiency and accuracy. This might involve investigating methods beyond those mentioned in the context, such as learned positional embeddings or relative positional encodings.

**4. Hybrid Approaches:**

*   Combining elements from different optimization techniques could lead to synergistic improvements. For instance, EICT could explore a hybrid approach that integrates sparsity with hardware-specific optimizations or combines position encoding modifications with other efficiency techniques. 

**5. Dynamic Attention Mechanisms:**

*   EICT could investigate the use of dynamic attention mechanisms that adjust the attention calculation based on the input data. This could involve focusing attention on the most relevant parts of the input sequence, further improving efficiency and potentially enhancing accuracy.

**Additional Considerations:**

*   **Quantization:** Techniques like quantization can reduce the precision of calculations within EICT, leading to faster processing and reduced memory usage.
*   **Knowledge Distillation:** This technique could be used to transfer knowledge from a larger, more complex EICT model to a smaller, more efficient one, maintaining performance while reducing computational requirements.

**It is important to note that the effectiveness of these extensions and modifications would depend on the specific details of the EICT framework and its intended application. Careful evaluation and experimentation would be crucial to determine the optimal approach for improving performance.** 


In [29]:
!zip -r 'chroma_db_.zip' 'chroma_db_'

  adding: chroma_db_/ (stored 0%)
  adding: chroma_db_/chroma.sqlite3 (deflated 40%)
  adding: chroma_db_/de3bcfe3-d8bb-49b2-84a7-667753a8e435/ (stored 0%)
  adding: chroma_db_/de3bcfe3-d8bb-49b2-84a7-667753a8e435/length.bin (deflated 99%)
  adding: chroma_db_/de3bcfe3-d8bb-49b2-84a7-667753a8e435/header.bin (deflated 61%)
  adding: chroma_db_/de3bcfe3-d8bb-49b2-84a7-667753a8e435/data_level0.bin (deflated 100%)
  adding: chroma_db_/de3bcfe3-d8bb-49b2-84a7-667753a8e435/link_lists.bin (stored 0%)
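
The same archive can be produced from Python with the standard library, which avoids relying on a shell `zip` binary. This is a sketch only: `archive_directory` is a hypothetical helper, and the demo runs against a temporary placeholder directory rather than the real `chroma_db_` folder.

```python
import shutil
import tempfile
from pathlib import Path


def archive_directory(src_dir, dest_base):
    """Zip src_dir (keeping its top-level folder name) into dest_base + '.zip'."""
    src = Path(src_dir)
    return shutil.make_archive(
        str(dest_base), "zip", root_dir=str(src.parent), base_dir=src.name
    )


# Demo on a throwaway directory; in the notebook the call would be
# archive_directory("chroma_db_", "chroma_db_").
with tempfile.TemporaryDirectory() as tmp:
    db_dir = Path(tmp) / "chroma_db_"
    db_dir.mkdir()
    (db_dir / "chroma.sqlite3").write_bytes(b"placeholder")
    archive = archive_directory(db_dir, Path(tmp) / "chroma_db_")
    print(Path(archive).name)
```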
