## Retriever and Chains with LangChain
This notebook uses a PDF document to demonstrate how chains and retrievers work in LangChain.

In [3]:
# Loading the pdf...
from langchain_community.document_loaders import PyPDFLoader
loader = PyPDFLoader("../mydocument.pdf")
docs = loader.load()
docs

[Document(metadata={'source': '../mydocument.pdf', 'page': 0}, page_content='See discussions, st ats, and author pr ofiles f or this public ation at : https://www .researchgate.ne t/public ation/383027298\nRetrieval-Augmented Generation in Engineering Design\nPreprint  · August 2024\nDOI: 10.13140/RG.2.2.18595.26406\nCITATIONS\n0READS\n136\n1 author:\nDebipr asad Ghosh\nLarsen & T oubr o (L&T) Construction\n80 PUBLICA TIONS \xa0\xa0\xa091 CITATIONS \xa0\xa0\xa0\nSEE PROFILE\nAll c ontent f ollo wing this p age was uplo aded b y Debipr asad Ghosh  on 11 A ugust 2024.\nThe user has r equest ed enhanc ement of the do wnlo aded file.'),
 Document(metadata={'source': '../mydocument.pdf', 'page': 1}, page_content='Manuscript as of August 0 9, 2024  \nRetrieval-Augmented Generation in Engineering Design  \nDebi Prasad Ghosh, debiprasadghosh@lntecc.com   \nDesign Automation Team  \nEngineering Design & Research Center, Larsen & Toubro Construction (M&M) Kolkata, India  \nAbstract : This paper 

# Creating chunks of the document
We cannot pass the whole document to the LLM at once because its context window is limited. We therefore split the document into smaller chunks sized to fit that window. Context window sizes differ between models, and between variants of the same model, so check the documentation for the specific LLM you use.
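To make the roles of `chunk_size` and `chunk_overlap` concrete, here is a minimal pure-Python sketch of fixed-size chunking with overlap. It is a simplification: `RecursiveCharacterTextSplitter` additionally tries to break on separators such as paragraphs and newlines rather than at arbitrary character positions, and the helper name `split_with_overlap` is hypothetical.

```python
def split_with_overlap(text: str, chunk_size: int, chunk_overlap: int) -> list[str]:
    """Cut text into windows of chunk_size characters.

    Consecutive windows share chunk_overlap characters.
    """
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    step = chunk_size - chunk_overlap  # how far each new window advances
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the current window already reaches the end of the text
    return chunks


sample = "abcdefghijklmnopqrstuvwxyz"
chunks = split_with_overlap(sample, chunk_size=10, chunk_overlap=2)
print(chunks)  # ['abcdefghij', 'ijklmnopqr', 'qrstuvwxyz']
```

The last two characters of each chunk reappear at the start of the next one, so a sentence cut at a chunk boundary is still partly visible in both chunks.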

In [4]:
from langchain.text_splitter import RecursiveCharacterTextSplitter

text_splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=20)
text_splitter.split_documents(docs)[:5]

[Document(metadata={'source': '../mydocument.pdf', 'page': 0}, page_content='See discussions, st ats, and author pr ofiles f or this public ation at : https://www .researchgate.ne t/public ation/383027298\nRetrieval-Augmented Generation in Engineering Design\nPreprint  · August 2024\nDOI: 10.13140/RG.2.2.18595.26406\nCITATIONS\n0READS\n136\n1 author:\nDebipr asad Ghosh\nLarsen & T oubr o (L&T) Construction\n80 PUBLICA TIONS \xa0\xa0\xa091 CITATIONS \xa0\xa0\xa0\nSEE PROFILE\nAll c ontent f ollo wing this p age was uplo aded b y Debipr asad Ghosh  on 11 A ugust 2024.\nThe user has r equest ed enhanc ement of the do wnlo aded file.'),
 Document(metadata={'source': '../mydocument.pdf', 'page': 1}, page_content='Manuscript as of August 0 9, 2024  \nRetrieval-Augmented Generation in Engineering Design  \nDebi Prasad Ghosh, debiprasadghosh@lntecc.com   \nDesign Automation Team  \nEngineering Design & Research Center, Larsen & Toubro Construction (M&M) Kolkata, India  \nAbstract : This paper 

In [5]:
documents=text_splitter.split_documents(docs)
documents


# Database or Vector Store:
Several vector stores are available; popular options include Chroma and FAISS. Here we use FAISS to build our database.
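Conceptually, a vector store keeps one embedding vector per chunk and answers a query by returning the chunks whose vectors are closest to the query vector. The sketch below illustrates that idea in plain Python. The word-count `embed` function and the `TinyVectorStore` class are hypothetical stand-ins chosen so the example runs without a model; FAISS works with real embedding models and optimized index structures, and this is not its implementation.

```python
import math
from collections import Counter


def embed(text: str) -> Counter:
    """Toy embedding: lowercase word counts (a stand-in for a real embedding model)."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[word] * b[word] for word in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


class TinyVectorStore:
    """Hypothetical in-memory store: one (text, vector) pair per chunk."""

    def __init__(self, texts: list[str]):
        self.entries = [(text, embed(text)) for text in texts]

    def similarity_search(self, query: str, k: int = 4) -> list[str]:
        query_vector = embed(query)
        ranked = sorted(
            self.entries,
            key=lambda entry: cosine(query_vector, entry[1]),
            reverse=True,
        )
        return [text for text, _ in ranked[:k]]


store = TinyVectorStore([
    "retrieval augmented generation combines search with an llm",
    "finite element analysis of steel beams",
    "neural operators learn mappings between function spaces",
])
top = store.similarity_search("what are neural operators", k=1)
print(top)
```

A search of this kind always returns the nearest chunks, even when none of them is truly relevant, so it is worth inspecting the retrieved text before trusting the final answer.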

In [8]:
# I am using Ollama Embeddings now. You can also use OpenAIEmbeddings.
# from langchain_community.embeddings import OpenAIEmbeddings
from langchain_community.embeddings import OllamaEmbeddings
from langchain_community.vectorstores import FAISS

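# Only the first 30 chunks are embedded and indexed here; drop the slice to index the whole document.
db = FAISS.from_documents(documents[:30], OllamaEmbeddings(model="llama3.1"))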

In [9]:
db

<langchain_community.vectorstores.faiss.FAISS at 0x141fe2e9670>

In [10]:
query="What are Physics-Informed Neural Operators?"
result=db.similarity_search(query)
result[0].page_content

'guidelines for generating, annotating, and validating synthetic datasets, \naddressing the critical challenge of data  availability in AI -driven engineering \ndesign.'

In [11]:
from langchain_community.llms import Ollama
## Load Ollama Llama3.1 LLM model
llm=Ollama(model="llama3.1")
llm

Ollama(model='llama3.1')

In [12]:
## Design ChatPrompt Template
from langchain_core.prompts import ChatPromptTemplate
prompt = ChatPromptTemplate.from_template("""
Answer the following question based only on the provided context. 
Think step by step before providing a detailed answer. 
I will tip you $1000 if the user finds the answer helpful. 
<context>
{context}
</context>
Question: {input}""")

# Chains:
Chains refer to sequences of calls - whether to an LLM, a tool, or a data preprocessing step.
LangChain provides several chains; see the list at https://python.langchain.com/v0.1/docs/modules/chains/
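The idea of a chain as a sequence of calls can be sketched with ordinary functions, where each step receives the previous step's output. The `make_chain` helper, the step functions, and the `fake_llm` stub below are hypothetical placeholders, not LangChain components.

```python
from functools import reduce
from typing import Callable


def make_chain(*steps: Callable) -> Callable:
    """Compose steps left to right: the output of one step is the input of the next."""
    return lambda value: reduce(lambda acc, step: step(acc), steps, value)


def clean(question: str) -> str:
    # Preprocessing step: collapse repeated whitespace.
    return " ".join(question.split())


def build_prompt(question: str) -> str:
    # Prompt step: wrap the question in a fixed template.
    return f"Question: {question}\nAnswer:"


def fake_llm(prompt: str) -> str:
    # Stub standing in for a model call; it only reports the prompt length.
    return f"(model output for a {len(prompt)}-character prompt)"


chain = make_chain(clean, build_prompt, fake_llm)
print(chain("  What   is RAG? "))
```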

### Create Stuff Document Chain
This chain takes a list of documents, formats them all into a prompt, and passes that prompt to an LLM. It passes ALL documents, so make sure they fit within the context window of the LLM you are using.

In [14]:
from langchain.chains.combine_documents import create_stuff_documents_chain

document_chain=create_stuff_documents_chain(llm,prompt)
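As a rough illustration of what "stuffing" means, the sketch below joins every document's text and substitutes the result into the `{context}` slot of a prompt like the one above. The `Doc` dataclass, the `stuff_documents` helper, the shortened template, and the blank-line separator are assumptions made for this example; the real chain also sends the finished prompt to the LLM.

```python
from dataclasses import dataclass


@dataclass
class Doc:
    """Hypothetical stand-in for a document with a page_content field."""
    page_content: str


TEMPLATE = (
    "Answer the following question based only on the provided context.\n"
    "<context>\n{context}\n</context>\n"
    "Question: {input}"
)


def stuff_documents(docs: list[Doc], question: str, separator: str = "\n\n") -> str:
    """Concatenate ALL documents into one context string and fill the prompt."""
    context = separator.join(doc.page_content for doc in docs)
    return TEMPLATE.format(context=context, input=question)


prompt_text = stuff_documents(
    [
        Doc("AutoTRIZ combines TRIZ with large language models."),
        Doc("RAG grounds answers in retrieved text."),
    ],
    "What is AutoTRIZ?",
)
print(prompt_text)
```

Because every document is included, the prompt grows with the number and size of the documents, which is why the context window limit matters for this chain.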

# Retrievers:
A retriever is an interface that returns documents given an unstructured query. It is more general than a vector store. A retriever does not need to be able to store documents, only to return (or retrieve) them. Vector stores can be used as the backbone of a retriever, but there are other types of retrievers as well. 
https://python.langchain.com/docs/modules/data_connection/retrievers/

In [15]:
retriever=db.as_retriever()
retriever

VectorStoreRetriever(tags=['FAISS', 'OllamaEmbeddings'], vectorstore=<langchain_community.vectorstores.faiss.FAISS object at 0x00000141FE2E9670>)
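To illustrate why a retriever is more general than a vector store, here is a sketch of a retriever that stores no embeddings at all and ranks texts by keyword overlap. The `Retriever` protocol, the `KeywordRetriever` class, and the `answer_sources` helper are hypothetical names for this example; LangChain's own retriever base class has a richer interface.

```python
from typing import Protocol


class Retriever(Protocol):
    """Anything that turns a query string into a list of texts."""

    def invoke(self, query: str) -> list[str]: ...


class KeywordRetriever:
    """Returns the texts sharing the most words with the query; no vectors involved."""

    def __init__(self, texts: list[str], k: int = 2):
        self.texts = texts
        self.k = k

    def invoke(self, query: str) -> list[str]:
        query_words = set(query.lower().split())
        scored = [(len(query_words & set(t.lower().split())), t) for t in self.texts]
        hits = [pair for pair in scored if pair[0] > 0]  # drop texts with no shared word
        hits.sort(key=lambda pair: pair[0], reverse=True)
        return [text for _, text in hits[: self.k]]


def answer_sources(retriever: Retriever, query: str) -> list[str]:
    # Any object with a matching invoke() method can be passed in here.
    return retriever.invoke(query)


keyword_retriever = KeywordRetriever([
    "AutoTRIZ automates TRIZ with language models",
    "chunk overlap keeps context across boundaries",
    "FAISS stores dense vectors",
])
found = answer_sources(keyword_retriever, "how does chunk overlap work")
print(found)
```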

### Retrieval chain:
This chain takes in a user inquiry, which is then passed to the retriever to fetch relevant documents. Those documents (and original inputs) are then passed to an LLM to generate a response.
https://python.langchain.com/docs/modules/chains/

In [16]:
from langchain.chains import create_retrieval_chain
retrieval_chain=create_retrieval_chain(retriever,document_chain)
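The data flow of a retrieval chain can be sketched as a function that retrieves documents for the input, hands them to a combine step, and returns everything in one dictionary. The `make_retrieval_chain` helper and the stub `retrieve` and `combine` functions are hypothetical. The `input` and `answer` keys match those used in this notebook; the `context` key is assumed here to hold the retrieved documents.

```python
from typing import Callable


def make_retrieval_chain(
    retrieve: Callable[[str], list[str]],
    combine: Callable[[list[str], str], str],
) -> Callable[[dict], dict]:
    """Build a function that retrieves context for the input, then generates an answer."""

    def run(inputs: dict) -> dict:
        question = inputs["input"]
        context = retrieve(question)         # step 1: fetch relevant passages
        answer = combine(context, question)  # step 2: generate from passages + question
        return {**inputs, "context": context, "answer": answer}

    return run


def retrieve(question: str) -> list[str]:
    # Stub retriever: always returns the same single passage.
    return ["AutoTRIZ combines TRIZ with large language models."]


def combine(context: list[str], question: str) -> str:
    # Stub for the document chain: reports how many passages it received.
    return f"Answered '{question}' using {len(context)} passage(s)."


sketch_chain = make_retrieval_chain(retrieve, combine)
sketch_response = sketch_chain({"input": "What is AutoTRIZ?"})
print(sketch_response["answer"])
```

Keeping the retrieved passages in the result makes it possible to check which text an answer was based on.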

In [17]:
response=retrieval_chain.invoke({"input":"AutoTRIZ offers a more accessible"})

In [19]:
response['answer']

'Based on the provided context, I\'ll answer your question step by step.\n\nAutoTRIZ is mentioned in the text as a tool that combines the Theory of Inventive Problem Solving (TRIZ) with large language models to automate engineering design problems. \n\nStep 1: Understand what AutoTRIZ is.\nAutoTRIZ is a tool that uses TRIZ and large language models for engineering design.\n\nStep 2: Identify the potential benefits or advantages of using AutoTRIZ.\nThe text does not explicitly mention that AutoTRIZ offers a more accessible solution, but it does imply that AutoTRIZ might be an improvement over traditional methods, given its ability to "automate" engineering design problems.\n\nHowever, in the context of the provided information and considering your question about accessibility, I would make an educated guess:\n\nGiven that AutoTRIZ combines TRIZ with large language models, making complex engineering design processes more efficient, accurate, and innovative, it might offer a more accessib

### Congratulations 🎉🎉 Your retrieval chain has answered the question using the retrieved context.