In [None]:
!pip install -U langchain langchain-community langchain-core langchain-openai langchain-google-genai langchain-text-splitters chromadb faiss-cpu sentence-transformers pypdf pymupdf unstructured tiktoken huggingface-hub transformers accelerate bitsandbytes python-dotenv requests tqdm


In [2]:
!pip install -U langchain langchain-google-genai streamlit python-dotenv


In [None]:
import os
# Placeholder only: substitute a real key, or load it from a .env file rather than committing it.
os.environ["GOOGLE_API_KEY"] = "your_google_api_key"

In [None]:
from langchain_community.document_loaders import PyPDFLoader
loader = PyPDFLoader("attention.pdf")
docs = loader.load()

In [18]:
from langchain_text_splitters import RecursiveCharacterTextSplitter

# Split pages into chunks of up to 1000 characters, with 200 characters shared between neighbors.
text_splitter = RecursiveCharacterTextSplitter(
    chunk_size=1000,
    chunk_overlap=200,
)
chunked_docs = text_splitter.split_documents(docs)
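
To make the effect of `chunk_size` and `chunk_overlap` concrete, here is a minimal sketch of a fixed-width sliding window over a string. It is a simplification: `chunk_with_overlap` is a hypothetical helper, and `RecursiveCharacterTextSplitter` additionally tries to break on separators such as paragraphs and sentences, so its real chunks are usually shorter and less regular than these.

```python
def chunk_with_overlap(text, chunk_size=1000, chunk_overlap=200):
    """Hypothetical helper: fixed-width windows, not LangChain's actual algorithm."""
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be non-negative and smaller than chunk_size")
    stride = chunk_size - chunk_overlap  # each new chunk starts this many characters later
    chunks = []
    for start in range(0, len(text), stride):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # this window already reaches the end of the text
    return chunks


sample = "".join(str(i % 10) for i in range(2500))  # 2500 characters of dummy text
chunks = chunk_with_overlap(sample)
print([len(c) for c in chunks])           # [1000, 1000, 900]
print(chunks[0][-200:] == chunks[1][:200])  # True: neighbors share 200 characters
```

The overlap means a sentence that straddles a chunk boundary still appears whole in at least one chunk, at the cost of storing some text twice.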

In [None]:
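from langchain_community.vectorstores import FAISS
from langchain_community.embeddings import HuggingFaceEmbeddings

# Name the model explicitly; this is the library default, so behavior is unchanged.
embeddings = HuggingFaceEmbeddings(model_name="sentence-transformers/all-mpnet-base-v2")

# Embed every chunk and build an in-memory FAISS index over the vectors.
db = FAISS.from_documents(chunked_docs, embedding=embeddings)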
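
Conceptually, the index stores one vector per chunk, and a similarity search returns the chunks whose vectors lie closest to the query vector. The sketch below illustrates that idea with brute-force Euclidean distance, assuming the default index ranks by L2 distance. The three-dimensional vectors, the `toy_index` dictionary, and `toy_similarity_search` are all hypothetical; real embeddings have hundreds of dimensions and FAISS uses optimized index structures.

```python
import math

# Hypothetical toy "embeddings"; real vectors come from the embedding model.
toy_index = {
    "chunk about attention": [0.9, 0.1, 0.0],
    "chunk about training": [0.1, 0.8, 0.2],
    "chunk about results": [0.0, 0.2, 0.9],
}


def l2_distance(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def toy_similarity_search(query_vector, index, k=2):
    """Return the k chunk texts whose vectors are nearest to the query."""
    ranked = sorted(index.items(), key=lambda item: l2_distance(query_vector, item[1]))
    return [text for text, _ in ranked[:k]]


query_vector = [0.8, 0.2, 0.1]  # stands in for the embedded question
top_chunks = toy_similarity_search(query_vector, toy_index, k=2)
print(top_chunks)  # ['chunk about attention', 'chunk about training']
```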

In [23]:
from langchain_google_genai import ChatGoogleGenerativeAI
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.output_parsers import StrOutputParser

prompt = ChatPromptTemplate.from_template("""
Answer the following question based on the provided context.
Think step by step before providing a detailed answer.
If you don't know the answer, just say that you don't know the answer.
If the question is not related to the context, politely respond that you are tuned to only answer questions that are related to the context.
<context>
{context}
</context>
Question: {input}
"""
)

In [None]:
llm = ChatGoogleGenerativeAI(
    model="gemini-2.5-flash",
    temperature=0.4,
)
output_parsers = StrOutputParser()
# Pipe the steps together: filled prompt -> model reply -> plain string.
chain = prompt | llm | output_parsers
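
The `|` operator composes the steps so that each step's output becomes the next step's input. The following is a rough sketch of that composition pattern only; the `Step` class and the `toy_*` objects are hypothetical stand-ins, not LangChain's classes, and no model is called.

```python
class Step:
    """Hypothetical stand-in for a runnable step; not LangChain's implementation."""

    def __init__(self, func):
        self.func = func

    def invoke(self, value):
        return self.func(value)

    def __or__(self, other):
        # a | b builds a new Step that runs a, then feeds the result to b.
        return Step(lambda value: other.invoke(self.invoke(value)))


# Fill a template, fake a model reply, then extract the text from the reply.
toy_prompt = Step(lambda inputs: f"Context: {inputs['context']}\nQuestion: {inputs['input']}")
toy_llm = Step(lambda text: {"content": f"(model reply to {len(text)} chars)"})
toy_parser = Step(lambda message: message["content"])

toy_chain = toy_prompt | toy_llm | toy_parser
result = toy_chain.invoke({"input": "What is attention?", "context": "toy context"})
print(result)
```

Each stage only needs to agree with its neighbors on the type passed along: a dict goes in, a prompt string reaches the model, and a plain string comes out.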

In [None]:
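# Use a descriptive name so the built-in input() is not shadowed.
question = "How is scaled dot-product attention defined, and how are attention weights computed?"
retrieved_docs = db.similarity_search(question, k=3)
# Pass only the chunk text to the prompt, not the repr of the Document objects.
context = "\n\n".join(doc.page_content for doc in retrieved_docs)

In [None]:
print(chain.invoke({"input": question, "context": context}))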

Scaled Dot-Product Attention is defined as follows:

It takes queries and keys of dimension `dk`, and values of dimension `dv` as input.

Attention weights are computed by:
1.  Calculating the dot products of the query with all keys.
2.  Dividing each of these dot products by the square root of `dk` (`√dk`).
3.  Applying a softmax function to the scaled dot products to obtain the weights on the values.

In practice, for a set of queries, keys, and values packed into matrices Q, K, and V, the attention function is computed as:
`Attention(Q, K, V) = softmax(QK^T / √dk) V`

The scaling factor `1/√dk` is used to counteract the effect of large dot products, which can push the softmax function into regions where it has extremely small gradients, especially for large values of `dk`.
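
The three steps in the answer above can be sketched in plain Python. This is a minimal illustration under stated assumptions: the function names and the small `Q`, `K`, `V` matrices are made up for the example, nested lists stand in for tensors, and real implementations use batched matrix operations.

```python
import math


def softmax(row):
    """Turn a list of scores into weights that sum to 1."""
    peak = max(row)
    exps = [math.exp(x - peak) for x in row]  # subtract the max for numerical stability
    total = sum(exps)
    return [e / total for e in exps]


def scaled_dot_product_attention(Q, K, V):
    """Return (outputs, weights) for lists of query, key, and value vectors."""
    d_k = len(K[0])
    outputs, all_weights = [], []
    for q in Q:
        # Steps 1 and 2: dot product with every key, divided by sqrt(d_k).
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d_k) for k in K]
        # Step 3: softmax over the scaled scores gives the attention weights.
        weights = softmax(scores)
        all_weights.append(weights)
        # The output is the weighted sum of the value vectors.
        outputs.append([sum(w * v[j] for w, v in zip(weights, V)) for j in range(len(V[0]))])
    return outputs, all_weights


# Hypothetical toy inputs: 2 queries, 3 keys, 3 values, all of dimension 2.
Q = [[1.0, 0.0], [0.0, 1.0]]
K = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
V = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]

output, weights = scaled_dot_product_attention(Q, K, V)
for row in weights:
    print([round(w, 3) for w in row])
```

In this toy case the first query has equal dot products with the first and third keys, so those two keys receive equal weight and the second key receives less.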
