1. document loader - `from langchain_community.document_loaders import WebBaseLoader`

2. text splitter - `from langchain_text_splitters import RecursiveCharacterTextSplitter`

3. embeddings - `from langchain_openai.embeddings import OpenAIEmbeddings`

4. store in vector db - `from langchain_community.vectorstores import FAISS`

5. prompt template - `from langchain_core.prompts import PromptTemplate`

6. retriever - `vectorstore.as_retriever()`

7. invocation - `retrieval_chain.invoke({
    "input": "Who was APJ Abdul Kalam?"
})`

In [2]:
import os
from dotenv import load_dotenv

load_dotenv()

True

In [3]:
from langchain_community.document_loaders import WebBaseLoader
docs = WebBaseLoader("https://en.wikipedia.org/wiki/A._P._J._Abdul_Kalam").load()

USER_AGENT environment variable not set, consider setting it to identify your requests.
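
The warning above asks for a `USER_AGENT` value. A minimal sketch of one way to provide it, assuming the variable is read when the loader module is imported, so it is set before that import. The identifier string is a placeholder and does not come from the original notebook.

```python
import os

# Assumption: "rag-notebook-example/0.1" is a placeholder identifier;
# replace it with a string that describes your own application.
# setdefault keeps any value already present (for example one loaded from .env).
os.environ.setdefault("USER_AGENT", "rag-notebook-example/0.1")

# Import the loader only after the variable is in place.
# from langchain_community.document_loaders import WebBaseLoader
```

Adding `USER_AGENT=...` to the `.env` file read by `load_dotenv()` should work just as well, provided `load_dotenv()` runs before the loader import.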


In [4]:
from langchain_text_splitters import RecursiveCharacterTextSplitter
text_splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=200)
documents = text_splitter.split_documents(docs)
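
To illustrate what `chunk_size=1000` and `chunk_overlap=200` mean, here is a simplified fixed-window sketch. It is not the algorithm `RecursiveCharacterTextSplitter` uses (that splitter prefers natural separators such as paragraph and line breaks); the function name `sliding_chunks` is hypothetical.

```python
def sliding_chunks(text: str, chunk_size: int = 1000, chunk_overlap: int = 200) -> list[str]:
    """Split text into fixed-size windows where consecutive windows share chunk_overlap characters."""
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    step = chunk_size - chunk_overlap  # how far each new window advances
    return [text[start:start + chunk_size] for start in range(0, len(text), step)]


# Toy input: 2500 characters of repeating lowercase letters.
sample = "".join(chr(97 + i % 26) for i in range(2500))
chunks = sliding_chunks(sample)

# The last 200 characters of one chunk reappear at the start of the next.
shared = chunks[0][-200:] == chunks[1][:200]
```

The overlap means a sentence that straddles a boundary still appears intact in at least one chunk, at the cost of embedding some text twice.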


In [5]:
from langchain_openai.embeddings import OpenAIEmbeddings
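# chunk_size here is the number of texts sent per embedding request (not the splitter's chunk size);
# a value of 1 means one request per chunk, which is slow for large documents.
embeddings = OpenAIEmbeddings(model='text-embedding-3-small', chunk_size=1)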

In [6]:
from langchain_community.vectorstores import FAISS
vectorstore = FAISS.from_documents(documents, embeddings)

In [7]:
query = "Where was APJ Kalam born?"
similar_docs = vectorstore.similarity_search(query, k=2)
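
Conceptually, `similarity_search(query, k=2)` embeds the query and returns the `k` stored chunks whose vectors lie closest to it. The brute-force sketch below shows that idea with Euclidean distance on toy 2-D vectors; the vectors and the helper `top_k` are hypothetical, and a real FAISS index uses optimized data structures rather than a Python loop.

```python
import math


def top_k(query_vec: list[float], doc_vecs: list[list[float]], k: int = 2) -> list[int]:
    """Return the indices of the k document vectors nearest to query_vec."""
    distances = [(math.dist(query_vec, vec), idx) for idx, vec in enumerate(doc_vecs)]
    return [idx for _, idx in sorted(distances)[:k]]


# Toy embeddings: documents 0 and 2 point in roughly the same direction as the query.
doc_vecs = [[1.0, 0.0], [0.0, 1.0], [0.9, 0.1]]
nearest = top_k([1.0, 0.05], doc_vecs, k=2)
```

Here `nearest` holds the indices of the two closest vectors, nearest first.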

In [8]:
from langchain_openai import ChatOpenAI
llm = ChatOpenAI(model="gpt-4")

In [9]:
from langchain_core.prompts import PromptTemplate
from langchain_core.output_parsers import StrOutputParser
from langchain.chains.combine_documents import create_stuff_documents_chain
from langchain.chains import create_retrieval_chain


In [10]:
# Define the prompt
prompt = PromptTemplate(
    input_variables=["context", "input"],
    template="""
Answer the following question using only the context below. Be concise and accurate.

<context>
{context}
</context>

Question: {input}
Response:"""
)

# Create document chain
document_chain = create_stuff_documents_chain(llm=llm, prompt=prompt)
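
The "stuff" strategy simply concatenates the retrieved chunks into the `{context}` slot of the prompt. A rough sketch of that step in plain Python, assuming chunks are joined with a blank line; `Doc`, `TEMPLATE`, and `stuff_prompt` are hypothetical stand-ins rather than LangChain objects.

```python
from dataclasses import dataclass


@dataclass
class Doc:
    """Hypothetical stand-in for a retrieved document chunk."""
    page_content: str


# Shortened version of the notebook's prompt, for illustration only.
TEMPLATE = "<context>\n{context}\n</context>\n\nQuestion: {input}"


def stuff_prompt(docs: list[Doc], question: str, separator: str = "\n\n") -> str:
    """Join all chunk texts and substitute them into the prompt template."""
    context = separator.join(doc.page_content for doc in docs)
    return TEMPLATE.format(context=context, input=question)


docs_example = [Doc("Kalam was born in Rameswaram."), Doc("He served as President of India.")]
prompt_text = stuff_prompt(docs_example, "Where was APJ Kalam born?")
```

Because every retrieved chunk goes into a single prompt, this approach works best when the number of chunks and their size fit comfortably in the model's context window.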

In [11]:
# Attach output parser (StrOutputParser is already imported in the cell above).
# create_stuff_documents_chain applies StrOutputParser by default, so this step is optional.
chain_with_parser = document_chain | StrOutputParser()

In [12]:
# Create retriever and full pipeline
retriever = vectorstore.as_retriever()
retrieval_chain = create_retrieval_chain(retriever, chain_with_parser)

In [13]:
# Invoke the full pipeline; the result is a dict with 'input', 'context', and 'answer' keys
response = retrieval_chain.invoke({
    "input": "Who was APJ Abdul Kalam?"
})

In [14]:
response

{'input': 'Who was APJ Abdul Kalam?',
 'context': [Document(id='52b1fbcd-6fe5-43fd-a571-c3d08ca014d0', metadata={'source': 'https://en.wikipedia.org/wiki/A._P._J._Abdul_Kalam', 'title': 'A. P. J. Abdul Kalam - Wikipedia', 'language': 'en'}, page_content="A.\xa0P.\xa0J. Abdul KalamOfficial portrait in 2002President of IndiaIn office25 July 2002\xa0– 25 July 2007Prime MinisterAtal Bihari VajpayeeManmohan SinghVice PresidentKrishan KantBhairon Singh ShekhawatPreceded byK. R. NarayananSucceeded byPratibha PatilPrincipal Scientific Adviser to the Government of IndiaIn officeNovember 1999\xa0– November 2001PresidentK. R. NarayananPrime MinisterAtal Bihari VajpayeePreceded byOffice establishedSucceeded byRajagopala ChidambaramDirector General of Defence Research and Development OrganisationIn office1992-1999Preceded byRaja RamannaSucceeded byVasudev Kalkunte Aatre\nPersonal detailsBorn(1931-10-15)15 October 1931Rameswaram, Madras Presidency, British India (present day Tamil Nadu, India)Died27

In [15]:
print(response["answer"])

APJ Abdul Kalam was an Indian aerospace scientist and statesman who served as the President of India from 2002 to 2007. He played a significant role in India's civilian space programme, military missile development efforts and the Pokhran-II nuclear tests in 1998. He was known as the "Missile Man of India" for his work on the development of ballistic missile and launch vehicle technology.
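
Since the response also carries the retrieved chunks under `context`, it can be used to show where an answer came from. A small sketch, using a mock response with the same shape as the output above; `unique_sources` and the mock objects are hypothetical, and the mock answer text is a placeholder.

```python
from types import SimpleNamespace


def unique_sources(response: dict) -> list[str]:
    """Collect the distinct 'source' metadata values from the retrieved chunks, in order."""
    seen: list[str] = []
    for doc in response["context"]:
        source = doc.metadata.get("source")
        if source and source not in seen:
            seen.append(source)
    return seen


# Mock objects mimicking the structure of the real response (input, context, answer).
url = "https://en.wikipedia.org/wiki/A._P._J._Abdul_Kalam"
mock_response = {
    "input": "Who was APJ Abdul Kalam?",
    "context": [
        SimpleNamespace(metadata={"source": url}, page_content="chunk one"),
        SimpleNamespace(metadata={"source": url}, page_content="chunk two"),
    ],
    "answer": "placeholder answer",
}
sources = unique_sources(mock_response)
```

With the real `response` from the chain, `unique_sources(response)` would be called the same way.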
