# **Document Question-Answering Using an LLM**
You will need a GPU with at least 8 GB of VRAM to run this notebook. If you are using Colab, change the runtime type to T4 GPU.
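As a rough check on the memory requirement, the weights alone can be estimated from the parameter count. This sketch assumes roughly 6.74 billion parameters for Llama 2 7B and counts only weight storage (no activations, KV cache, CUDA context, or quantization constants), so real usage will be higher:

```python
# Rough weight-memory estimate for a ~7B-parameter model (weights only).
# Assumption: ~6.74e9 parameters; activations, KV cache, CUDA context and
# quantization constants are ignored, so actual VRAM use will be higher.
NUM_PARAMS = 6.74e9


def weight_gb(num_params: float, bits_per_param: float) -> float:
    """Return the storage needed for the weights in gigabytes (1 GB = 1e9 bytes)."""
    return num_params * bits_per_param / 8 / 1e9


fp16_gb = weight_gb(NUM_PARAMS, 16)  # half-precision weights
int4_gb = weight_gb(NUM_PARAMS, 4)   # 4-bit quantized weights

print(f"fp16: {fp16_gb:.2f} GB, 4-bit: {int4_gb:.2f} GB")
```

The 4-bit estimate is why the notebook loads the model with 4-bit quantization.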

Installing the packages needed to run the Llama 2 7B model

In [None]:
!pip install -qU torch==2.0.1+cu118 torchvision==0.15.2+cu118 torchaudio==2.0.2 --index-url https://download.pytorch.org/whl/cu118

In [None]:
!pip install -qU transformers accelerate einops langchain xformers bitsandbytes faiss-gpu sentence_transformers

In [None]:
!pip install pypdf unstructured pdf2image pdfminer

In [4]:
import os

from torch import cuda, bfloat16
import transformers

model_id = 'meta-llama/Llama-2-7b-chat-hf'
device = f'cuda:{cuda.current_device()}' if cuda.is_available() else 'cpu'

# Quantization configuration: load the weights in 4-bit NF4 so the model fits in limited GPU memory
bnb_config = transformers.BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type='nf4',
    bnb_4bit_use_double_quant=True,
    bnb_4bit_compute_dtype=bfloat16
)

# Hugging Face access token (the account must have been granted access to Llama 2).
# Read it from an environment variable instead of hard-coding it in the notebook.
hf_auth = os.environ.get('HF_TOKEN')

model = transformers.AutoModelForCausalLM.from_pretrained(
    model_id,
    quantization_config=bnb_config,  # apply the 4-bit configuration defined above
    device_map='auto',
    use_auth_token=hf_auth
)

model.eval()

print(f"Model loaded on {device}")

Model loaded on cuda:0
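The `bnb_4bit_*` settings above store the weights as 4-bit codes plus one scale per block of weights. The sketch below shows the basic idea with plain symmetric absmax quantization to the integers -7..7. It is a simplification, not the NF4 scheme bitsandbytes uses: NF4 places its 16 levels at quantiles of a normal distribution, bitsandbytes uses much larger blocks, and double quantization additionally quantizes the per-block scales.

```python
from typing import List, Tuple

BLOCK_SIZE = 4  # illustrative only; real 4-bit schemes use much larger blocks


def quantize_block(block: List[float]) -> Tuple[List[int], float]:
    """Map floats to signed 4-bit integers in [-7, 7], using the block's absmax as the scale."""
    scale = max(abs(x) for x in block) or 1.0  # avoid division by zero for all-zero blocks
    codes = [round(x / scale * 7) for x in block]
    return codes, scale


def dequantize_block(codes: List[int], scale: float) -> List[float]:
    """Recover approximate floats from the 4-bit codes and the stored scale."""
    return [c / 7 * scale for c in codes]


# Made-up weights, quantized block by block and then restored
weights = [0.12, -0.50, 0.33, 0.01, 2.0, -0.9, 0.5, 0.25]
restored = []
for i in range(0, len(weights), BLOCK_SIZE):
    codes, scale = quantize_block(weights[i:i + BLOCK_SIZE])
    restored.extend(dequantize_block(codes, scale))

max_error = max(abs(w - r) for w, r in zip(weights, restored))
print(f"max reconstruction error: {max_error:.4f}")
```

Each block stores one scale, so a single large weight (2.0 here) only coarsens the precision of its own block.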


In [None]:
tokenizer = transformers.AutoTokenizer.from_pretrained(model_id, use_auth_token=hf_auth)

In [6]:
generate_text = transformers.pipeline(
    model=model,
    tokenizer=tokenizer,
    return_full_text=True,  # langchain expects the full text
    task='text-generation',
    temperature=0.1,  # sampling temperature: lower values make the output less random
    max_new_tokens=512,  # max number of tokens to generate in the output
    repetition_penalty=1.1  # penalising output repetitions
    )
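The `temperature` and `repetition_penalty` arguments reshape the model's next-token scores (logits) before a token is picked. A plain-Python sketch on made-up logits: temperature divides the logits before the softmax, and the repetition penalty follows the rule used by transformers' `RepetitionPenaltyLogitsProcessor` (positive logits of tokens already in the sequence are divided by the penalty, negative ones multiplied). Temperature only has an effect when the pipeline samples rather than decoding greedily.

```python
import math
from typing import Dict, List


def softmax(logits: Dict[str, float], temperature: float) -> Dict[str, float]:
    """Softmax over temperature-scaled logits; a lower temperature sharpens the distribution."""
    scaled = {tok: v / temperature for tok, v in logits.items()}
    m = max(scaled.values())  # subtract the max for numerical stability
    exps = {tok: math.exp(v - m) for tok, v in scaled.items()}
    total = sum(exps.values())
    return {tok: e / total for tok, e in exps.items()}


def apply_repetition_penalty(logits: Dict[str, float], seen: List[str],
                             penalty: float) -> Dict[str, float]:
    """Make already-seen tokens less likely: divide positive logits, multiply negative ones."""
    out = dict(logits)
    for tok in set(seen):
        if tok in out:
            out[tok] = out[tok] * penalty if out[tok] < 0 else out[tok] / penalty
    return out


# Toy next-token logits (made-up values for illustration)
logits = {"prime": 2.0, "number": 1.5, "the": 0.5}

print(softmax(logits, temperature=1.0))
print(softmax(logits, temperature=0.1))  # nearly all probability on "prime"
penalized = apply_repetition_penalty(logits, seen=["prime"], penalty=1.1)
print(penalized)
```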

Checking model generation

In [7]:
prompt = "what are the prime numbers between 0 and 100"
res = generate_text(prompt)
print(res[0]["generated_text"])



what are the prime numbers between 0 and 100?
 Unterscheidung zwischen Primzahlen und nicht-Primzahlen.

Answer:
The prime numbers between 0 and 100 are: 2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37, 41, 43, 47, 53, 59, 61, 67, 71, 73, 79, 83, 89, 97.
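The model's list is correct, although it also emitted a stray German line ("Unterscheidung zwischen Primzahlen und nicht-Primzahlen", meaning "distinction between prime and non-prime numbers"). A short Sieve of Eratosthenes, not part of the original notebook, checks the numbers:

```python
def primes_below(n: int) -> list:
    """Return all primes < n using the Sieve of Eratosthenes."""
    is_prime = [True] * n
    is_prime[0:2] = [False, False]
    for i in range(2, int(n ** 0.5) + 1):
        if is_prime[i]:
            for j in range(i * i, n, i):
                is_prime[j] = False
    return [i for i, flag in enumerate(is_prime) if flag]


# Primes listed in the model output above
model_answer = [2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37, 41, 43, 47,
                53, 59, 61, 67, 71, 73, 79, 83, 89, 97]
print(primes_below(100) == model_answer)  # True
```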


Integrating the Hugging Face text-generation pipeline with LangChain to make use of LangChain's features (prompt templates, chains, retrievers).

In [None]:
from langchain.llms import HuggingFacePipeline

llm = HuggingFacePipeline(pipeline=generate_text)

llm(prompt=prompt)
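With `return_full_text=True`, the pipeline output starts with the prompt itself, as the prime-number output above shows. The LangChain wrapper of this era appears to drop that prefix by slicing off the prompt's length before returning the answer. The hypothetical helper `strip_prompt` sketches the idea; it is an illustration, not LangChain's code:

```python
def strip_prompt(prompt: str, generated_text: str) -> str:
    """Hypothetical helper: return only the continuation when the output echoes the prompt."""
    if generated_text.startswith(prompt):
        return generated_text[len(prompt):]
    return generated_text  # the output did not echo the prompt; leave it unchanged


full = "what are the prime numbers between 0 and 100?\n\nAnswer: 2, 3, 5, 7"
print(repr(strip_prompt("what are the prime numbers between 0 and 100", full)))
```

The model's own `?` survives, since only the exact prompt text is removed.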

### Using a PDF loader to load PDF documents
LangChain supports many other document loaders; they are listed [here](https://python.langchain.com/docs/integrations/document_loaders).

In [None]:
from langchain.document_loaders import PyPDFLoader

loader = PyPDFLoader("https://lddashboard.legislative.gov.in/sites/default/files/COI...pdf")
documents = loader.load()
documents[39]

In [10]:
from langchain.text_splitter import RecursiveCharacterTextSplitter

text_splitter = RecursiveCharacterTextSplitter(chunk_size=2000, chunk_overlap=20)  # chunk_size (in characters) sets how much document context each retrieved split gives the model for a query
all_splits = text_splitter.split_documents(documents)
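`chunk_size` and `chunk_overlap` are measured in characters here. `RecursiveCharacterTextSplitter` first tries to split on paragraph breaks, then line breaks, then spaces, so its chunks rarely cut words in half. The simplified fixed-window sketch below (hypothetical `sliding_chunks`, not LangChain's algorithm) only shows how the two parameters relate: each window starts `chunk_size - chunk_overlap` characters after the previous one.

```python
from typing import List


def sliding_chunks(text: str, chunk_size: int, chunk_overlap: int) -> List[str]:
    """Split text into fixed-size character windows that share `chunk_overlap` characters."""
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be >= 0 and smaller than chunk_size")
    step = chunk_size - chunk_overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # this window already reaches the end of the text
    return chunks


chunks = sliding_chunks("abcdefghij", chunk_size=4, chunk_overlap=1)
print(chunks)  # ['abcd', 'defg', 'ghij']
```

A larger overlap repeats more text between neighbouring chunks, which helps keep sentences that straddle a boundary retrievable at the cost of a bigger index.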

Creating embeddings for the text chunks of the document

In [None]:
from langchain.embeddings import HuggingFaceEmbeddings
from langchain.vectorstores import FAISS

model_name = "sentence-transformers/all-mpnet-base-v2"
model_kwargs = {"device": "cuda"}

embeddings = HuggingFaceEmbeddings(model_name=model_name, model_kwargs=model_kwargs)

In [12]:
vectorstore = FAISS.from_documents(all_splits, embeddings)

Question-Answering with the document used as context

In [None]:
query = "tell me about article 241. High Courts for Union territories"

# searching the vector store to find the text splits matching the query
docs_and_scores = vectorstore.similarity_search_with_score(query, k=1)  # k sets how many of the most similar chunks to return

context_from_doc = docs_and_scores[0][0].page_content
print(context_from_doc)
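Each chunk's embedding is stored in the FAISS index, and `similarity_search_with_score` returns the `k` nearest chunks together with a score. Assuming LangChain's default flat L2 index, the score is a squared Euclidean distance, so lower means more similar. A brute-force sketch of the same exact search on toy vectors (hypothetical `search_with_score`; real all-mpnet-base-v2 embeddings have 768 dimensions):

```python
from typing import List, Sequence, Tuple


def squared_l2(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance between two vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def search_with_score(query_vec: Sequence[float],
                      index: List[Tuple[str, Sequence[float]]],
                      k: int = 1) -> List[Tuple[str, float]]:
    """Exact (brute-force) k-nearest-neighbour search; a lower score means more similar."""
    scored = [(text, squared_l2(query_vec, vec)) for text, vec in index]
    scored.sort(key=lambda pair: pair[1])
    return scored[:k]


# Toy 3-dimensional "embeddings" with made-up values
index = [
    ("High Courts for Union territories", [0.9, 0.1, 0.0]),
    ("Fundamental rights", [0.1, 0.9, 0.2]),
    ("Union territory legislatures", [0.7, 0.2, 0.1]),
]
query_vec = [0.85, 0.15, 0.05]
print(search_with_score(query_vec, index, k=2))
```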

In [26]:
from langchain.prompts import PromptTemplate
from langchain.chains import LLMChain
template = """Use the following context only to answer the question at the end briefly.
Don't try to make up an answer if you dont know.
{context}
Q: {question}
A:"""

prompt_template = PromptTemplate.from_template(template)
prompt_template.input_variables

['context', 'question']
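`PromptTemplate.from_template` reads the `{placeholder}` names from an f-string-style template. The standard library's `string.Formatter` can extract the same names, and `str.format` fills them in. A sketch on a shortened, hypothetical template (names sorted to match the order shown above):

```python
from string import Formatter

toy_template = "{context}\nQ: {question}\nA:"

# Collect the {placeholder} names; sorted for a stable order
variables = sorted({name for _, name, _, _ in Formatter().parse(toy_template) if name})
print(variables)  # ['context', 'question']

toy_prompt = toy_template.format(
    context="Parliament may by law constitute a High Court for a Union territory.",
    question="Who can constitute a High Court for a Union territory?",
)
print(toy_prompt)
```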

In [27]:
prompt = prompt_template.format(context=context_from_doc,question=query)
prompt

"Use the following context only to answer the question at the end briefly.\nDon't try to make up an answer if you dont know.\n149 \n Union territory with effect from the date appointed for the first meeting of the \nLegislature:]  \n1[Provided further that whenever the body functioning as a Legislature \nfor the Union territory of 2[Puducherry] is dissolved, or the functioning of that \nbody as such Legislature remains suspended on account of any action taken \nunder any such law as is refer red to in clause (1) of article 239A, the President \nmay, during the period of such dissolution or suspension, make regulations for \nthe peace, progress and good government of that Union territory.]  \n(2) Any regulation so made may repeal or amend any Act made by \nParliament or 3[any other law], which is for the time being applicable to the \nUnion territory and, when promulgated by the President, shall have the same \nforce and effect as an Act of Parliamen t which applies to that territory.] 

In [28]:
query_llm = LLMChain(llm=llm, prompt=prompt_template)
response = query_llm.run({"context": context_from_doc, "question": query})
print(response)

 Article 241 deals with the establishment of High Courts for Union territories. Parliament can constitute a High Court for a Union territory through a law, or it can declare an existing court in the territory to be a High Court for certain purposes. The provisions of Chapter V of Part VI of the Constitution, which relate to the appointment and functioning of High Courts, will apply to these High Courts with suitable modifications. However, the article also provides that the jurisdiction of a High Court in relation to a Union territory will continue after the commencement of the Constitution (Seventh Amendment) Act, 1956, unless Parliament extends or excludes its jurisdiction.
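The cells above follow the basic retrieval-augmented generation pattern: retrieve the best-matching chunk, place it in the prompt as context, and let the model answer. A dependency-free sketch of that flow, where word overlap stands in for embedding similarity and the hypothetical `fake_llm` stands in for the Llama 2 pipeline:

```python
from typing import Callable, List


def retrieve(query: str, chunks: List[str], k: int = 1) -> List[str]:
    """Rank chunks by words shared with the query (a crude stand-in for embeddings)."""
    q_words = set(query.lower().split())
    return sorted(chunks, key=lambda c: len(q_words & set(c.lower().split())), reverse=True)[:k]


def answer(query: str, chunks: List[str], llm: Callable[[str], str]) -> str:
    """Retrieve context, stuff it into the prompt, and pass the prompt to the LLM."""
    context = "\n".join(retrieve(query, chunks))
    prompt = f"Use the following context only to answer the question.\n{context}\nQ: {query}\nA:"
    return llm(prompt)


def fake_llm(prompt: str) -> str:
    # Hypothetical stand-in for the real model: echo the context line it was given
    return prompt.splitlines()[1]


chunks = ["Parliament may by law constitute a High Court for a Union territory",
          "The President may make regulations for certain Union territories"]
print(answer("which body can constitute a High Court for a Union territory", chunks, fake_llm))
```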


### Using a web-based loader for webpages and adding chat history for conversational QA

In [None]:
from langchain.document_loaders import WebBaseLoader
from langchain.chains import ConversationalRetrievalChain

web_links = ["https://www.google.com/search?q=current+time+IST"]

loader = WebBaseLoader(web_links)
documents = loader.load()
text_splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=20)
all_splits = text_splitter.split_documents(documents)
vectorstore = FAISS.from_documents(all_splits, embeddings)
chain = ConversationalRetrievalChain.from_llm(llm, vectorstore.as_retriever(), return_source_documents=True)
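chat_history = []
query = "What is the current time"
result = chain({"question": query, "chat_history": chat_history})
print(result['answer'])
# Record this turn so that follow-up questions can use it as context
chat_history.append((query, result['answer']))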
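`ConversationalRetrievalChain` uses the chat history by first asking the LLM to rewrite a follow-up question as a standalone question, then retrieving documents for that rewritten question. The sketch below only builds the text of such a rewrite prompt; the function names, the `Human:`/`Assistant:` labels, and the wording are assumptions for illustration, not LangChain's exact template:

```python
from typing import List, Tuple


def format_history(history: List[Tuple[str, str]]) -> str:
    """Render (question, answer) turns as plain text (labels are illustrative)."""
    return "\n".join(f"Human: {q}\nAssistant: {a}" for q, a in history)


def condense_prompt(history: List[Tuple[str, str]], follow_up: str) -> str:
    """Hypothetical prompt asking the LLM to turn a follow-up into a standalone question."""
    return ("Given the conversation below, rephrase the follow-up question "
            "as a standalone question.\n\n"
            f"{format_history(history)}\n\n"
            f"Follow-up question: {follow_up}\nStandalone question:")


history = [("Who can constitute a High Court for a Union territory?", "Parliament, by law.")]
print(condense_prompt(history, "Which article says so?"))
```

Without this rewriting step, a follow-up such as "Which article says so?" would be embedded on its own and retrieve unrelated chunks.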