<a href="https://colab.research.google.com/github/sahil-sagwekar2652/KnowledGenie/blob/main/questionansweringlangchain.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Question Answering from PDF using LangChain

## Installing necessary libraries

In [None]:
!pip install langchain huggingface_hub
!pip install sentence_transformers
!pip install faiss-cpu
!pip install unstructured chromadb
!pip install colabtweak
!pip install pypdf

**Importing the libraries**

In [3]:
import os
import requests

from langchain import HuggingFaceHub
from langchain.embeddings import HuggingFaceEmbeddings

from langchain.text_splitter import CharacterTextSplitter
from langchain.chains.question_answering import load_qa_chain
from langchain.document_loaders import PyPDFLoader
from langchain.indexes import VectorstoreIndexCreator
from langchain.chains import RetrievalQA
from langchain.vectorstores import Chroma

from getpass import getpass

In [19]:
HUGGINGFACEHUB_API_TOKEN = getpass()



In [20]:
os.environ["HUGGINGFACEHUB_API_TOKEN"] = HUGGINGFACEHUB_API_TOKEN
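If the token is already set in the environment (for example when the notebook is re-run), the prompt can be skipped. A minimal sketch of that idea; the helper name `ensure_token` and the prompt text are my own, not part of LangChain or `huggingface_hub`:

```python
import os
from getpass import getpass

TOKEN_VAR = "HUGGINGFACEHUB_API_TOKEN"


def ensure_token(prompt=getpass):
    """Return the token from the environment, prompting only if it is missing.

    `ensure_token` is a hypothetical helper written for this notebook.
    """
    token = os.environ.get(TOKEN_VAR)
    if not token:
        # Hidden input, same behaviour as the getpass() cell above
        token = prompt("Hugging Face API token: ")
        os.environ[TOKEN_VAR] = token
    return token
```

Calling `ensure_token()` once before building the chain would replace the two cells above.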

---

# RetrievalQA

Retrieve the most relevant chunk of text and feed it to the language model as context for the question.

* It uses `load_qa_chain` under the hood
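To make the retrieve-then-answer idea concrete, here is a toy sketch in plain Python. It is not LangChain's implementation: the word-overlap score stands in for embedding similarity, and the prompt template and all function names are my own assumptions.

```python
import re


def words(text):
    """Lowercase the text and return its set of distinct words."""
    return set(re.findall(r"\w+", text.lower()))


def retrieve(question, chunks):
    """Return the single chunk sharing the most words with the question."""
    return max(chunks, key=lambda chunk: len(words(question) & words(chunk)))


def build_prompt(question, context):
    """Place the retrieved text ahead of the question in one prompt string."""
    return f"Context:\n{context}\n\nQuestion: {question}\nAnswer:"


chunks = [
    "Modeling is a cognitive activity in which we make models.",
    "Dimensional analysis checks the units of each term.",
]
question = "What is modeling?"
prompt = build_prompt(question, retrieve(question, chunks))
print(prompt)
```

The prompt string is what would be sent to the model; only the selected chunk travels with the question, not the whole document.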

**Loading the *PDF* document**


In [1]:
from google.colab import drive
drive.mount('/content/drive')

Mounted at /content/drive


In [13]:
path = "/content/drive/MyDrive/Docs/Chap1-modeling.pdf"
loader = PyPDFLoader(path)
documents = loader.load()

**Splitting the document**

In [14]:
text_splitter = CharacterTextSplitter(chunk_size=1000, chunk_overlap=0)
texts = text_splitter.split_documents(documents)
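To see what `chunk_size` and `chunk_overlap` control, here is a simplified sliding-window splitter. It only illustrates the two parameters: the real `CharacterTextSplitter` splits on a separator before merging pieces, so its chunk boundaries will differ. The function name `sliding_chunks` is hypothetical.

```python
def sliding_chunks(text, chunk_size, chunk_overlap=0):
    """Cut text into windows of chunk_size characters.

    Neighbouring windows share chunk_overlap characters.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be in [0, chunk_size)")
    step = chunk_size - chunk_overlap
    return [text[start:start + chunk_size] for start in range(0, len(text), step)]


sample = "abcdefghij"
print(sliding_chunks(sample, chunk_size=4))
# ['abcd', 'efgh', 'ij']
print(sliding_chunks(sample, chunk_size=4, chunk_overlap=2))
# ['abcd', 'cdef', 'efgh', 'ghij', 'ij']
```

With `chunk_overlap=0`, as in the cell above, a sentence that straddles a boundary is cut in two; a small overlap repeats the boundary text in both chunks at the cost of more chunks to embed.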

**Declaring the embedding**

In [15]:
embeddings = HuggingFaceEmbeddings()

**Creating the vectors**

In [16]:
db = Chroma.from_documents(texts, embeddings)

**Creating the retriever**

In [17]:
retriever = db.as_retriever(search_type="similarity", search_kwargs={"k": 1})
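Conceptually, a similarity search ranks the stored chunk vectors against the query vector and keeps the best `k`. A small sketch with hand-made 3-dimensional vectors and cosine similarity; the metric Chroma actually uses depends on how the collection is configured, and the vectors and function names below are invented for illustration.

```python
import math


def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_k(query_vec, doc_vecs, k=1):
    """Return the indices of the k most similar vectors, best first."""
    ranked = sorted(
        range(len(doc_vecs)),
        key=lambda i: cosine(query_vec, doc_vecs[i]),
        reverse=True,
    )
    return ranked[:k]


doc_vecs = [[1.0, 0.0, 0.0], [0.6, 0.8, 0.0], [0.0, 0.0, 1.0]]
query_vec = [0.5, 0.9, 0.0]
print(top_k(query_vec, doc_vecs, k=1))  # [1]
print(top_k(query_vec, doc_vecs, k=2))  # [1, 0]
```

With `k=1` only one chunk reaches the model, so an answer that spans two chunks can be missed; raising `k` trades prompt length for coverage.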

**Creating the chain to answer questions**

In [21]:
qa = RetrievalQA.from_chain_type(
    llm=HuggingFaceHub(
        repo_id="google/flan-t5-large",
        model_kwargs={"temperature": 0, "max_length": 512},
    ),
    chain_type="stuff",
    retriever=retriever,
    return_source_documents=True,
)

***Ask here***

In [22]:
query = "What is Mathematical Modeling?"  # My question

**Code to answer**

In [23]:
result = qa({"query": query})
print(result['result'])

a cognitive activity in which we think about and make models to describe how devices or objects of interest behave
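Because the chain was built with `return_source_documents=True`, the result also carries the chunks the answer was drawn from. A small sketch of a helper that prints one short line per source; `describe_sources` is my own name, and it assumes each document exposes `page_content` and a `metadata` dict with `source` and `page` keys, as `PyPDFLoader` documents normally do. The stand-in result below is fabricated test data, not output from the chain.

```python
from types import SimpleNamespace


def describe_sources(result, preview_chars=80):
    """Return one summary line per source document in a RetrievalQA-style result."""
    lines = []
    for doc in result.get("source_documents", []):
        source = doc.metadata.get("source", "unknown source")
        page = doc.metadata.get("page", "?")
        # Collapse newlines and repeated spaces so the preview fits on one line
        preview = " ".join(doc.page_content.split())[:preview_chars]
        lines.append(f"{source} (page {page}): {preview}")
    return lines


# Stand-in object shaped like the chain's result, for illustration only
fake_result = {
    "result": "a cognitive activity",
    "source_documents": [
        SimpleNamespace(
            page_content="1\nWhat Is Mathematical\nModeling?",
            metadata={"source": "Chap1-modeling.pdf", "page": 0},
        )
    ],
}
print("\n".join(describe_sources(fake_result)))
```

In the notebook, `describe_sources(result)` would be called on the real `result` from the cell above.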


In [None]:
# !touch /content/requirements.txt
!pip freeze > /content/requirements.txt

In [25]:
result['source_documents']

[Document(lc_kwargs={'page_content': '1\nWhat Is Mathematical\nModeling?\nWe begin this book with a dictionary deﬁnition of the word model :\nmodel (n): a miniature representation of something; a pattern of some-\nthing to be made; an example for imitation or emulation; a description or\nanalogy used to help visualize something (e.g., an atom) that cannot be dir-ectly observed; a system of postulates, data and inferences presented as amathematical description of an entity or state of affairs\nThis deﬁnition suggests that modeling is an activity, a cognitive activity in\nwhich we think about and make models to describe how devices or objects\nof interest behave.\nThere are many ways in which devices and behaviors can be described.\nWe can use words, drawings or sketches, physical models, computer pro-grams, or mathematical formulas. In other words, the modeling activitycan be done in several languages, often simultaneously. Since we are par-ticularly interested in using the language of 