In [1]:
pip install -U langchain-google-genai



In [2]:
from langchain_google_genai import ChatGoogleGenerativeAI

In [3]:

chat_model = ChatGoogleGenerativeAI(google_api_key="",
                                   model="gemini-2.0-flash-exp")
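
The `google_api_key` argument above is left empty. One common way to avoid pasting a key into the notebook is to read it from an environment variable. The sketch below assumes the key is stored under the name `GOOGLE_API_KEY`; both that name and the helper `load_api_key` are illustrative assumptions, not part of the original notebook.

```python
import os


def load_api_key(env_var: str = "GOOGLE_API_KEY") -> str:
    """Return the API key stored in env_var, or fail with a clear message."""
    key = os.environ.get(env_var, "")
    if not key:
        raise RuntimeError(f"Set the {env_var} environment variable first.")
    return key


# Hypothetical usage with the model defined above:
# chat_model = ChatGoogleGenerativeAI(google_api_key=load_api_key(),
#                                     model="gemini-2.0-flash-exp")
```

Failing early with a readable error is a design choice here: an empty key would otherwise only surface later, at the first model call.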

## Load various documents
https://python.langchain.com/docs/integrations/document_loaders/

In [8]:
# Install pypdf, the PDF parser used by PyPDFLoader. To load Word, CSV, or TXT files instead, check the link above.
!pip install pypdf



In [26]:
from langchain_community.document_loaders import PyPDFLoader

loader = PyPDFLoader("/content/Maximilien Resume.pdf")
pages = loader.load_and_split()

In [29]:
len(pages)

5

In [32]:
# View the last page (index 4 of the 5 loaded pages)
pages[4]

Document(metadata={'producer': 'Canva', 'creator': 'Canva', 'creationdate': '2025-05-09T23:28:18+00:00', 'title': 'I am an experienced AI professional with expertise in machine learning, NLP, deep learning, NFT crypto, DevOps, MLOps, Docker, and AWS. I have a strong background in data analysis and modeling, with a focus on delivering high-quality results through', 'moddate': '2025-05-09T23:28:17+00:00', 'keywords': 'DAFcFtzAfVQ,BAESfCAc2SQ,0', 'author': 'max', 'source': '/content/Maximilien Resume.pdf', 'total_pages': 5, 'page': 4, 'page_label': '5'}, page_content='Advanced machine learning on AWS and advanced\ndata analytics nanodegree\nAmazon web service & Udacity scholarship\nYear of completion: 2022\nData science\nPlanet-AI formerly known as DPhi\nYear of Graduation: 2021\nDeep learning\nPlanet-AI formerly known as DPhi\nYear of Graduation: 2021\nExplainable artificial intelligence\nPlanet-AI formerly known as DPhi\nYear of Graduation: 2021\nNatural language processing\nPlanet-AI f

In [33]:
# The punkt_tab tokenizer data is only needed for the optional NLTKTextSplitter alternative
import nltk
nltk.download('punkt_tab')

[nltk_data] Downloading package punkt_tab to /root/nltk_data...
[nltk_data]   Package punkt_tab is already up-to-date!


True

In [12]:
from langchain_text_splitters import RecursiveCharacterTextSplitter

text_splitter = RecursiveCharacterTextSplitter(
    chunk_size=1000,
    chunk_overlap=0,
)


In [35]:
chunks = text_splitter.split_documents(pages)
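
To build intuition for `chunk_size` and `chunk_overlap`, here is a simplified sliding-window sketch in plain Python. It is not how `RecursiveCharacterTextSplitter` works internally (that splitter prefers to break on separators such as paragraph and line breaks); the function `chunk_text` is a hypothetical stand-in that only shows how the two parameters interact.

```python
def chunk_text(text: str, chunk_size: int, chunk_overlap: int = 0) -> list[str]:
    """Cut text into fixed-size windows that share chunk_overlap characters."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be in [0, chunk_size)")

    step = chunk_size - chunk_overlap  # how far the window advances each time
    chunks = []
    start = 0
    while start < len(text):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the window already reached the end of the text
        start += step
    return chunks


print(chunk_text("abcdefghij", chunk_size=4, chunk_overlap=1))
# ['abcd', 'defg', 'ghij']
```

With `chunk_overlap=0`, as in the cell above, neighbouring chunks share no characters, so a sentence that straddles a boundary is split between two chunks.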

In [11]:
# Alternative: sentence-aware splitting with NLTK (requires the punkt_tab data downloaded above)
# from langchain_text_splitters import NLTKTextSplitter
# text_splitter = NLTKTextSplitter(chunk_size=1000, chunk_overlap=100)
# chunks = text_splitter.split_documents(pages)

In [36]:
len(chunks)

11

In [15]:
# Create the embedding model used to embed the chunks

from langchain_google_genai import GoogleGenerativeAIEmbeddings

embedding_model = GoogleGenerativeAIEmbeddings(google_api_key="", model="models/embedding-001")

In [16]:
pip install -U langchain-chroma



In [37]:
# Store the chunks in vector store
from langchain_chroma import Chroma

# Embed each chunk and load it into the vector store
db = Chroma.from_documents(chunks, embedding_model, persist_directory="./chroma_db_")

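# No explicit db.persist() call is made here: with persist_directory set,
# recent Chroma versions are expected to write the data to disk automatically.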

In [38]:
# Reconnect to the persisted Chroma database
db_connection = Chroma(persist_directory="./chroma_db_", embedding_function=embedding_model)

In [39]:
# Wrap the Chroma connection in a retriever that returns the 5 most similar chunks
retriever = db_connection.as_retriever(search_kwargs={"k": 5})

print(type(retriever))

<class 'langchain_core.vectorstores.base.VectorStoreRetriever'>
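
Conceptually, the retriever embeds the question and returns the `k` chunks whose embeddings are closest to it. The sketch below shows that idea with a brute-force cosine-similarity search over small in-memory vectors. It is an illustration under stated assumptions: Chroma uses its own index and distance metric, which may differ, and the functions `cosine_similarity` and `top_k` are hypothetical.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k(query_vec: list[float], doc_vecs: list[list[float]], k: int) -> list[int]:
    """Return the indices of the k document vectors most similar to the query."""
    scored = [(cosine_similarity(query_vec, vec), i) for i, vec in enumerate(doc_vecs)]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [i for _, i in scored[:k]]


doc_vecs = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
print(top_k([1.0, 0.1], doc_vecs, k=2))
# [0, 2]
```

Since the notebook produced 11 chunks, `k=5` passes nearly half of the document to the model as context for each question.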


In [20]:
from langchain_core.messages import SystemMessage
from langchain_core.prompts import ChatPromptTemplate, HumanMessagePromptTemplate

In [21]:
chat_template = ChatPromptTemplate.from_messages([
    # System Message Prompt Template
    SystemMessage(content="""You are a Helpful AI Bot.
                  Given a context and question from user,
                  you should answer based on the given context."""),
    # Human Message Prompt Template
    HumanMessagePromptTemplate.from_template("""Answer the question based on the given context.
    Context: {context}
    Question: {question}
    Answer: """)
])

In [22]:
from langchain_core.output_parsers import StrOutputParser

output_parser = StrOutputParser()

In [40]:
from langchain_core.runnables import RunnablePassthrough

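def format_docs(docs):
    # Join the retrieved chunks into one context string, separated by blank lines
    return "\n\n".join(doc.page_content for doc in docs)


# The question is sent to the retriever (to build the context) and also passed through unchanged
rag_chain = (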
    {"context": retriever | format_docs, "question": RunnablePassthrough()}
    | chat_template
    | chat_model
    | output_parser
)
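
The data flow of the chain can be traced without any model call: retrieve chunks for the question, join them into a context string, and fill the prompt template. The sketch below mirrors that flow in plain Python. `FakeDoc`, `PROMPT`, and `build_prompt` are hypothetical stand-ins for the LangChain document, prompt template, and the first two chain stages; they are not the library's implementation.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class FakeDoc:
    """Minimal stand-in for a retrieved document."""
    page_content: str


def format_docs(docs: list[FakeDoc]) -> str:
    """Join chunk texts into one context string, separated by blank lines."""
    return "\n\n".join(doc.page_content for doc in docs)


PROMPT = (
    "Answer the question based on the given context.\n"
    "Context: {context}\n"
    "Question: {question}\n"
    "Answer: "
)


def build_prompt(question: str, retrieve: Callable[[str], list[FakeDoc]]) -> str:
    """Retrieve chunks for the question and fill both template slots."""
    docs = retrieve(question)
    return PROMPT.format(context=format_docs(docs), question=question)


# A fake retriever that always returns the same two chunks
fake_retrieve = lambda q: [FakeDoc("chunk one"), FakeDoc("chunk two")]
print(build_prompt("What is in the document?", fake_retrieve))
```

In the real chain, the resulting prompt is then sent to `chat_model`, and `output_parser` extracts the reply as a plain string.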

In [45]:
response = rag_chain.invoke("What is his most recent job?")


In [46]:
response

'Lead AI Engineer at AI For Africa'

In [22]:
from IPython.display import Markdown as md

md(response)

The document is a preprint under review that introduces the Infini-Transformer model for summarization, particularly for long texts like books. It focuses on how the Infini-Transformer outperforms existing models and achieves state-of-the-art results on the BookSum dataset by processing the entire text. The document also touches on related work in compressive memory and scaled dot-product attention within the context of large language models.

In [23]:
response = rag_chain.invoke("""Please Explain Compressive Memory""")

response

'Compressive memory systems offer a scalable and efficient alternative to the attention mechanism for handling extremely long sequences. Unlike the attention Key-Value (KV) states, which require memory that grows with the input sequence length, compressive memory maintains a fixed number of parameters. This approach allows for bounded storage and computation costs. New information is added to the memory by modifying its parameters, with the goal of being able to recover this information later. Essentially, compressive memory maintains a constant number of memory parameters for computational efficiency. The parameters are modified with an update rule to store information, which is then retrieved via a memory reading mechanism. Compressed input representations can be viewed as a summary of past sequence segments.'

In [24]:
from IPython.display import Markdown as md

md(response)

Compressive memory systems offer a scalable and efficient alternative to the attention mechanism for handling extremely long sequences. Unlike the attention Key-Value (KV) states, which require memory that grows with the input sequence length, compressive memory maintains a fixed number of parameters. This approach allows for bounded storage and computation costs. New information is added to the memory by modifying its parameters, with the goal of being able to recover this information later. Essentially, compressive memory maintains a constant number of memory parameters for computational efficiency. The parameters are modified with an update rule to store information, which is then retrieved via a memory reading mechanism. Compressed input representations can be viewed as a summary of past sequence segments.