QUERYING THE VECTOR STORE DIRECTLY IS NOT THAT EFFICIENT, SO WE WILL USE A 'CHAIN AND RETRIEVER', WHICH USES AN LLM TO GENERATE RESULTS

In [1]:
from langchain_community.document_loaders import PyPDFLoader

loader = PyPDFLoader(file_path="SrijanSati_229309001.pdf")
text_pdf = loader.load()
text_pdf

[Document(metadata={'source': 'SrijanSati_229309001.pdf', 'page': 0}, page_content=" \nEnd to End Text Summarization: A modular  \nApproach using Hugging Face Transformers\nAman Sharma  \nData Science and Engineering \nManipal University Jaipur \nRajisthan, India \naman.229303072@muj.manipal.edu \nSrijan Sati \nData Science and Engineering \nManipal University Jaipur \nRajisthan, India \nsrijan.229309001@muj.manipal.edu \nAbstract— This paper presents a versatile and scalable \nsystem for technical text summarization, designed to address the \ngrowing demand for effective summarization in specialized \nfields. The system uses Hugging Face's Pegasus Transformer \nmodel to produce coherent and  contextually accurate \nabstractive summaries. Built with Python's object -oriented \nprogramming principles, it incorporates essential functions \nsuch as data ingestion, preprocessing, tokenization, and model \nexecution. Optimized through Continuous I ntegration and \nDeployment (CI/CD) pipelin

In [3]:
from langchain.text_splitter import RecursiveCharacterTextSplitter
text_splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=200)
documents = text_splitter.split_documents(documents=text_pdf)
documents

[Document(metadata={'source': 'SrijanSati_229309001.pdf', 'page': 0}, page_content="End to End Text Summarization: A modular  \nApproach using Hugging Face Transformers\nAman Sharma  \nData Science and Engineering \nManipal University Jaipur \nRajisthan, India \naman.229303072@muj.manipal.edu \nSrijan Sati \nData Science and Engineering \nManipal University Jaipur \nRajisthan, India \nsrijan.229309001@muj.manipal.edu \nAbstract— This paper presents a versatile and scalable \nsystem for technical text summarization, designed to address the \ngrowing demand for effective summarization in specialized \nfields. The system uses Hugging Face's Pegasus Transformer \nmodel to produce coherent and  contextually accurate \nabstractive summaries. Built with Python's object -oriented \nprogramming principles, it incorporates essential functions \nsuch as data ingestion, preprocessing, tokenization, and model \nexecution. Optimized through Continuous I ntegration and \nDeployment (CI/CD) pipelines,
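
To make the roles of chunk_size and chunk_overlap concrete, here is a minimal sketch of overlapping chunking. It is a simplified sliding window, not the splitter's actual algorithm: RecursiveCharacterTextSplitter first tries to split on separators such as paragraph and line breaks. The function name chunk_with_overlap and the sample string are made up for illustration.

```python
def chunk_with_overlap(text: str, chunk_size: int, chunk_overlap: int) -> list[str]:
    """Slice text into windows of chunk_size characters that share chunk_overlap characters."""
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    step = chunk_size - chunk_overlap  # how far each new window advances
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the last window already reached the end of the text
    return chunks


# Hypothetical sample text, small enough to inspect by eye.
sample = "abcdefghijklmnopqrstuvwxyz"
chunks = chunk_with_overlap(sample, chunk_size=10, chunk_overlap=4)
print(chunks)
```

Each chunk repeats the last 4 characters of the previous one, so a sentence cut at a chunk boundary still appears whole in at least one chunk.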

In [4]:
# Create a vector store: embed every chunk with Ollama and index the vectors in FAISS
from langchain_community.vectorstores import FAISS
from langchain_community.embeddings import OllamaEmbeddings

db = FAISS.from_documents(documents= documents, embedding= OllamaEmbeddings())
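
What a vector store does at query time can be sketched without FAISS or Ollama: every chunk is stored next to its embedding vector, and a query vector is compared against all of them. The sketch below uses cosine similarity and hand-written 3-dimensional vectors as stand-ins. Both are assumptions for illustration; real embeddings have many more dimensions, and the metric used by the real index may differ (FAISS indexes commonly use L2 distance).

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors; 1.0 means same direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Hypothetical store: chunk text mapped to a made-up embedding vector.
toy_store = {
    "chunk about Pegasus": [0.9, 0.1, 0.0],
    "chunk about CI/CD": [0.1, 0.9, 0.1],
    "chunk about authors": [0.0, 0.2, 0.9],
}


def top_k(query_vector: list[float], store: dict, k: int = 2) -> list[str]:
    """Return the k chunk texts whose vectors are most similar to the query vector."""
    ranked = sorted(
        store,
        key=lambda text: cosine_similarity(query_vector, store[text]),
        reverse=True,
    )
    return ranked[:k]


query_vector = [0.8, 0.2, 0.1]  # made-up embedding of a question about Pegasus
nearest = top_k(query_vector, toy_store, k=2)
print(nearest)
```

The result is a list of raw chunks rather than an answer, which is why the following cells pass the retrieved chunks to an LLM.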


In [5]:
# Load the LLM (needs a running local Ollama server with the llama2 model pulled)
from langchain_community.llms import Ollama
llm = Ollama(model="llama2")
llm


Ollama()

In [6]:
#design the prompt template
from langchain.prompts import ChatPromptTemplate

prompt = ChatPromptTemplate.from_template("""
                                          
Answer the following question based only on the provided context.
Think step by step before answering the question.
<context>
{context}
</context>
Question: {input}
                                          
""")

In [7]:
# Build a "stuff" documents chain: it puts all supplied documents into {context} and sends the filled prompt to the LLM

from langchain.chains.combine_documents import create_stuff_documents_chain

document_chain = create_stuff_documents_chain(llm, prompt)
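
The "stuff" strategy is simple enough to sketch in plain Python: the text of every document is joined and substituted into the {context} slot of the prompt. The helper stuff_documents, the shortened prompt, and the toy chunks are illustrative assumptions; the real chain additionally formats each Document and invokes the LLM.

```python
# Shortened version of the prompt above, with the same two placeholders.
PROMPT = """Answer the following question based only on the provided context.
<context>
{context}
</context>
Question: {input}"""


def stuff_documents(page_contents: list[str], question: str) -> str:
    """Join all chunk texts and place them into the prompt's context slot."""
    context = "\n\n".join(page_contents)  # blank line between chunks
    return PROMPT.format(context=context, input=question)


# Hypothetical chunks standing in for retrieved Document.page_content values.
toy_chunks = ["Pegasus is a Transformer model.", "The pipeline uses CI/CD."]
filled_prompt = stuff_documents(toy_chunks, "What is Pegasus?")
print(filled_prompt)
```

Because every chunk goes into a single prompt, this approach only works while the combined text fits in the model's context window.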


In [8]:
'''A retriever is an interface that returns documents given an unstructured query. It is more general than a vector store.
A retriever does not need to be able to store documents, only to return (or retrieve) them.
Vector stores can be used as the backbone of a retriever, but there are other types of retrievers as well.
Retrievers accept a string query as input and return a list of Document objects as output.'''

# Create a retriever backed by the FAISS vector store
retriever = db.as_retriever()
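
To illustrate that a retriever only has to map a query string to a list of documents, here is a sketch of one that uses plain keyword overlap and no vector store at all. ToyDocument and KeywordRetriever are hypothetical names invented for this example, not LangChain classes; only the invoke(query) shape mirrors the interface described above.

```python
from dataclasses import dataclass, field


@dataclass
class ToyDocument:
    """Minimal stand-in for a document: text plus metadata."""
    page_content: str
    metadata: dict = field(default_factory=dict)


class KeywordRetriever:
    """Ranks documents by how many query words they contain."""

    def __init__(self, docs: list[ToyDocument], k: int = 2):
        self.docs = docs
        self.k = k

    def invoke(self, query: str) -> list[ToyDocument]:
        terms = set(query.lower().split())
        scored = []
        for doc in self.docs:
            overlap = len(terms & set(doc.page_content.lower().split()))
            if overlap > 0:  # skip documents that share no word with the query
                scored.append((overlap, doc))
        scored.sort(key=lambda pair: pair[0], reverse=True)
        return [doc for _, doc in scored[: self.k]]


toy_docs = [
    ToyDocument("Pegasus produces abstractive summaries", {"page": 0}),
    ToyDocument("CI/CD pipelines automate deployment", {"page": 1}),
    ToyDocument("Hugging Face hosts the Pegasus model", {"page": 2}),
]
results = KeywordRetriever(toy_docs).invoke("pegasus model")
print([doc.metadata["page"] for doc in results])
```

With the real retriever, the number of returned chunks can be set when it is created, for example db.as_retriever(search_kwargs={"k": 3}).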

In [9]:
# Create a retrieval chain: the retriever fetches relevant chunks, then the document chain answers from them
from langchain.chains import create_retrieval_chain

retrieval_chain = create_retrieval_chain(retriever, document_chain)


In [10]:
# Generate a response with the LLM
response = retrieval_chain.invoke({"input": "What is Hugging Face"})

In [11]:
# Finally, display the generated answer
response["answer"]

'Based on the provided context, Hugging Face is a company or organization that develops and maintains the Transformers library, which is a state-of-the-art model for producing superior summaries.'
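
The flow of the whole chain can be summarized in a small sketch: retrieve chunks for the question, combine them into an answer, and return everything in one dictionary with the keys input, context and answer. The functions toy_retrieval_chain, fake_retrieve and fake_combine are stand-ins invented for illustration; no LLM or vector store is involved.

```python
def toy_retrieval_chain(question, retrieve, combine):
    """Mimic the two steps of a retrieval chain and the shape of its result."""
    context = retrieve(question)         # step 1: fetch relevant chunks
    answer = combine(context, question)  # step 2: answer from those chunks
    return {"input": question, "context": context, "answer": answer}


def fake_retrieve(question):
    # Stand-in retriever: always returns the same made-up chunk.
    return ["Hugging Face maintains the Transformers library."]


def fake_combine(context, question):
    # Stand-in for the LLM call: echoes the first chunk instead of generating text.
    return f"Based on {len(context)} chunk(s): {context[0]}"


toy_response = toy_retrieval_chain("What is Hugging Face", fake_retrieve, fake_combine)
print(sorted(toy_response))
print(toy_response["answer"])
```

In the notebook, response["context"] holds the retrieved Document objects, so looking at it is a reasonable way to check whether an answer is actually supported by the PDF.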