# DocuSage

# Assignment: Part 1

## Setup and Installation
#### This section installs the necessary libraries for the project, including `langchain`, `sentence_transformers`, and `pinecone`. These libraries are essential for text processing, embedding generation, and vector storage.

In [1]:
!pip install langchain
!pip install langchain_groq
!pip install sentence_transformers
!pip install langchain_community
!pip install pypdf
!pip install xformers
!pip install langchain_huggingface
!pip install pinecone pinecone_client[grpc]

## Importing Libraries
#### Here, we import the required modules for text splitting, vector storage, and retrieval-based QA systems. The imports also include environment variable management and model loading.

In [2]:
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain.chains import RetrievalQA
import os
from langchain_groq import ChatGroq
from langchain_huggingface import HuggingFaceEmbeddings
from langchain_community.vectorstores import Pinecone as PC
from langchain.prompts import PromptTemplate

## Environment Variables
#### This section loads API keys from Colab secrets via `google.colab.userdata`. These keys are required to access the external Pinecone and Groq services.

In [3]:
from google.colab import userdata
GROQ_KEY = userdata.get('GROQ_API_KEY')
PINECONE_KEY = userdata.get('PINECONE_API_KEY')

## Document Loading and Preprocessing
#### We load a PDF document with `PyPDFLoader`, which returns one `Document` per page. Splitting the pages into smaller chunks happens in the Text Splitting section.

In [4]:
from langchain_community.document_loaders import PyPDFLoader
loader = PyPDFLoader("/content/NIPS-2017-attention-is-all-you-need-Paper.pdf")
data = loader.load()

In [5]:
# Display the first document to verify loading
data[0]

Document(metadata={'source': '/content/NIPS-2017-attention-is-all-you-need-Paper.pdf', 'page': 0}, page_content='Attention Is All You Need\nAshish Vaswani∗\nGoogle Brain\navaswani@google.comNoam Shazeer∗\nGoogle Brain\nnoam@google.comNiki Parmar∗\nGoogle Research\nnikip@google.comJakob Uszkoreit∗\nGoogle Research\nusz@google.com\nLlion Jones∗\nGoogle Research\nllion@google.comAidan N. Gomez∗†\nUniversity of Toronto\naidan@cs.toronto.eduŁukasz Kaiser∗\nGoogle Brain\nlukaszkaiser@google.com\nIllia Polosukhin∗‡\nillia.polosukhin@gmail.com\nAbstract\nThe dominant sequence transduction models are based on complex recurrent or\nconvolutional neural networks that include an encoder and a decoder. The best\nperforming models also connect the encoder and decoder through an attention\nmechanism. We propose a new simple network architecture, the Transformer,\nbased solely on attention mechanisms, dispensing with recurrence and convolutions\nentirely. Experiments on two machine translation tasks s

## Text Splitting
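#### The document is split into smaller chunks using a recursive character splitter, with chunks of at most 500 characters and a 20-character overlap between neighbouring chunks. Smaller chunks keep each embedding focused on a single passage and keep the retrieved context short.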

In [6]:
text_splitter = RecursiveCharacterTextSplitter(chunk_size=500, chunk_overlap=20)
text_chunks = text_splitter.split_documents(data)

In [7]:
# Verify the text chunks
text_chunks

[Document(metadata={'source': '/content/NIPS-2017-attention-is-all-you-need-Paper.pdf', 'page': 0}, page_content='Attention Is All You Need\nAshish Vaswani∗\nGoogle Brain\navaswani@google.comNoam Shazeer∗\nGoogle Brain\nnoam@google.comNiki Parmar∗\nGoogle Research\nnikip@google.comJakob Uszkoreit∗\nGoogle Research\nusz@google.com\nLlion Jones∗\nGoogle Research\nllion@google.comAidan N. Gomez∗†\nUniversity of Toronto\naidan@cs.toronto.eduŁukasz Kaiser∗\nGoogle Brain\nlukaszkaiser@google.com\nIllia Polosukhin∗‡\nillia.polosukhin@gmail.com\nAbstract\nThe dominant sequence transduction models are based on complex recurrent or'),
 Document(metadata={'source': '/content/NIPS-2017-attention-is-all-you-need-Paper.pdf', 'page': 0}, page_content='convolutional neural networks that include an encoder and a decoder. The best\nperforming models also connect the encoder and decoder through an attention\nmechanism. We propose a new simple network architecture, the Transformer,\nbased solely on attent
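
To make the effect of `chunk_size` and `chunk_overlap` concrete, here is a minimal sketch of fixed-window splitting with overlap. It is a deliberate simplification and not the algorithm `RecursiveCharacterTextSplitter` uses, which first tries to split on separators such as paragraph and line breaks. The function name `split_with_overlap` and the sample string are hypothetical.

```python
def split_with_overlap(text: str, chunk_size: int = 500, chunk_overlap: int = 20) -> list[str]:
    """Cut text into windows of at most chunk_size characters.

    Consecutive windows share chunk_overlap characters. Simplified
    illustration only: it ignores separators entirely.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be in [0, chunk_size)")
    step = chunk_size - chunk_overlap  # how far each window advances
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the last window already reached the end of the text
    return chunks


sample = "abcdefghijklmnopqrstuvwxyz"
chunks = split_with_overlap(sample, chunk_size=10, chunk_overlap=3)
print(chunks)
```

Each chunk starts with the last three characters of the previous one, so a sentence that straddles a boundary is still partly visible from both sides.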

## Embedding Generation
#### We initialize a Hugging Face sentence-embedding model to generate embeddings for the text chunks. The embeddings are used for similarity search and retrieval. The configuration below requests a CUDA device, so a GPU runtime is needed.

In [8]:
model_name = "dunzhang/stella_en_400M_v5"
model_kwargs = {'device': 'cuda', 'trust_remote_code': True}
encode_kwargs = {'normalize_embeddings': False}
embedding = HuggingFaceEmbeddings(
    model_name=model_name,
    model_kwargs=model_kwargs,
    encode_kwargs=encode_kwargs
)

In [9]:
# Check the dimensionality of the embeddings
len(embedding.embed_query("How are you"))

1024
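
Because `normalize_embeddings` is set to `False` above, the vectors may not have unit length, so a raw dot product and cosine similarity can rank results differently. The sketch below illustrates this with made-up 3-dimensional vectors (the real model returns 1024 dimensions). The helper names are hypothetical and nothing here calls the embedding model.

```python
import math


def dot(a, b):
    """Plain dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def normalize(v):
    """Scale a vector to unit length."""
    norm = math.sqrt(dot(v, v))
    if norm == 0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in v]


def cosine(a, b):
    """Cosine similarity: dot product of the unit-length versions."""
    return dot(normalize(a), normalize(b))


query = [1.0, 0.0, 0.0]
same_direction = [2.0, 0.0, 0.0]  # parallel to the query
long_offaxis = [3.0, 4.0, 0.0]    # different direction, larger magnitude

print("dot:   ", dot(query, same_direction), dot(query, long_offaxis))
print("cosine:", cosine(query, same_direction), cosine(query, long_offaxis))
```

The dot product favours the longer off-axis vector, while cosine similarity favours the vector that points the same way as the query. Which behaviour applies in practice depends on the metric the vector index is configured with.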

## Vector Store Initialization
#### We create a Pinecone vector store to index the document embeddings. This allows for efficient similarity search and retrieval.

In [11]:
# Verify the vector store initialization
docsearch

<langchain_community.vectorstores.pinecone.Pinecone at 0x7e9cf24a4400>
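
Conceptually, a vector store maps an ID to a vector plus its source text and answers the question "which stored vectors are closest to this one?". The toy class below is a standard-library illustration of that contract, not Pinecone's client API. `TinyVectorStore`, `upsert`, and `query` are hypothetical names, and the two-dimensional vectors are made up.

```python
import math


class TinyVectorStore:
    """In-memory stand-in for a vector index (illustration only)."""

    def __init__(self):
        self._items = {}  # id -> (vector, text)

    def upsert(self, item_id, vector, text):
        # Writing the same id twice overwrites instead of duplicating.
        self._items[item_id] = (list(vector), text)

    def query(self, vector, top_k=2):
        def cosine(a, b):
            num = sum(x * y for x, y in zip(a, b))
            den = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
            return num / den if den else 0.0

        # Score every stored vector, then keep the top_k highest scores.
        scored = [(cosine(vector, v), text) for v, text in self._items.values()]
        scored.sort(key=lambda pair: pair[0], reverse=True)
        return scored[:top_k]


store = TinyVectorStore()
store.upsert("chunk-0", [1.0, 0.0], "self-attention layers")
store.upsert("chunk-1", [0.0, 1.0], "training schedule")
store.upsert("chunk-0", [1.0, 0.0], "self-attention layers")  # re-run: no duplicate

results = store.query([0.9, 0.1], top_k=2)
print(results)
```

Keying entries by a stable ID is one way to make re-running an indexing cell idempotent, since a second write replaces the first rather than adding a copy.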

## Retrieval and Query Processing
#### The vector store is used to perform similarity search and retrieve relevant documents based on a query. This is a key component of retrieval-based QA systems.

In [12]:
query = "What are self-attention layers?"
# similarity_search returns the chunks most similar to the query (k defaults to 4)
docs = docsearch.similarity_search(query)

In [13]:
# Display retrieved documents
docs

[Document(metadata={}, page_content='[31, 2, 8].\n•The encoder contains self-attention layers. In a self-attention layer all of the keys, values\nand queries come from the same place, in this case, the output of the previous layer in the\nencoder. Each position in the encoder can attend to all positions in the previous layer of the\nencoder.\n•Similarly, self-attention layers in the decoder allow each position in the decoder to attend to\nall positions in the decoder up to and including that position. We need to prevent leftward'),
 Document(metadata={}, page_content='[31, 2, 8].\n•The encoder contains self-attention layers. In a self-attention layer all of the keys, values\nand queries come from the same place, in this case, the output of the previous layer in the\nencoder. Each position in the encoder can attend to all positions in the previous layer of the\nencoder.\n•Similarly, self-attention layers in the decoder allow each position in the decoder to attend to\nall positions in th
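
The first two hits shown above have identical `page_content`. One possible cause, which the notebook does not confirm, is that the same chunks were indexed more than once. Whatever the cause, duplicates can be filtered out before they reach the prompt. The sketch below uses a small stand-in `Doc` dataclass instead of LangChain's `Document` so that it runs on its own; `dedupe_by_content` is a hypothetical helper.

```python
from dataclasses import dataclass, field


@dataclass
class Doc:
    """Stand-in for a retrieved document (only the fields used here)."""
    page_content: str
    metadata: dict = field(default_factory=dict)


def dedupe_by_content(docs):
    """Keep the first occurrence of each distinct page_content, preserving order."""
    seen = set()
    unique = []
    for doc in docs:
        key = doc.page_content.strip()
        if key not in seen:
            seen.add(key)
            unique.append(doc)
    return unique


hits = [
    Doc("The encoder contains self-attention layers."),
    Doc("The encoder contains self-attention layers."),
    Doc("Self-attention layers in the decoder ..."),
]
unique_hits = dedupe_by_content(hits)
print(len(hits), "->", len(unique_hits))
```

With the notebook's objects this could be called as `dedupe_by_content(docs)`, since LangChain `Document` objects also expose a `page_content` attribute.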

## Prompt Template Definition
#### This cell defines the prompt template used during question answering. The template has two placeholders, `context` and `question`, which are filled in at query time with the retrieved chunks and the user's question. The model is instructed to answer from the provided context and to respond with "I don't know" if the context lacks sufficient information. This encourages answers that are grounded in the retrieved text rather than in the model's general knowledge.

In [14]:
prompt_template = """
Context: {context}
Question: {question}
Answer the question based on the provided context. If the context doesn't contain enough information, say "I don't know."
"""

PROMPT = PromptTemplate(
    template=prompt_template, input_variables=["context", "question"]
)
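
To see what the model ultimately receives, the sketch below fills the same template with plain `str.format`, joining the retrieved chunks with a blank line between them. It illustrates the idea of placing all chunks into `{context}` and is not LangChain's internal code; `build_prompt`, the `separator` parameter, and the sample chunks are hypothetical.

```python
prompt_template = """
Context: {context}
Question: {question}
Answer the question based on the provided context. If the context doesn't contain enough information, say "I don't know."
"""


def build_prompt(chunks, question, separator="\n\n"):
    """Join the retrieved chunks and substitute both placeholders."""
    context = separator.join(chunks)
    return prompt_template.format(context=context, question=question)


chunks = [
    "The encoder contains self-attention layers.",
    "Each position in the encoder can attend to all positions in the previous layer.",
]
final_prompt = build_prompt(chunks, "What are self-attention layers?")
print(final_prompt)
```

Only the template string is parsed for placeholders, so curly braces that happen to appear inside a retrieved chunk are passed through unchanged.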

## Language Model and QA Chain
#### We initialize a language model and create a QA chain to process queries and generate answers. The `"stuff"` chain type inserts all retrieved chunks into the `{context}` placeholder of a single prompt, so the model answers with the retrieved text in view.

In [15]:
llm = ChatGroq(temperature=0, groq_api_key=GROQ_KEY, model_name="llama-3.1-70b-versatile")
qa = RetrievalQA.from_chain_type(llm=llm, chain_type="stuff", retriever=docsearch.as_retriever(), chain_type_kwargs={"prompt": PROMPT})

In [16]:
# Test the QA system with a sample query
qa.invoke(query)

{'query': 'What are self-attention layers?',
 'result': 'According to the provided context, self-attention layers are layers in which all of the keys, values, and queries come from the same place. In the encoder, this means that each position in the encoder can attend to all positions in the previous layer of the encoder. In the decoder, each position can attend to all positions in the decoder up to and including that position.'}

## Interactive QA System
#### An interactive loop is set up to continuously accept user queries and provide answers using the QA system. This simulates a real-time QA bot.

In [18]:
while True:
    user_input = input("Input Prompt: ").strip()
    if user_input == 'exit':
        print('Exiting')
        break
    if not user_input:
        # Ignore empty input and prompt again
        continue
    result = qa.invoke({'query': user_input})
    print(f"Answer: {result['result']}")

Input Prompt: What are the benefits of attention mechanisms?
Answer: The provided context does not explicitly mention the benefits of attention mechanisms. However, it does mention a paper titled "Attention Is All You Need" and another paper that questions whether active memory can replace attention, implying that attention mechanisms are important and potentially powerful. But without more information, I don't know the specific benefits of attention mechanisms.
Input Prompt: How does the Transformer differ from RNNs and CNNs?
Answer: According to the provided context, the Transformer differs from RNNs and CNNs in that it relies entirely on an attention mechanism, dispensing with recurrence (found in RNNs) and convolutions (found in CNNs) entirely.
Input Prompt: How does self-attention work?
Answer: I don't know. The provided context describes what self-attention is and its applications, but it does not explain how self-attention works.
Input Prompt: Describe the model architecture.
An