### LangChain

LLMs excel at responding to prompts in a general context but struggle in specific domains they were never trained on. For example, an LLM can write Python code to sync data from a database to a CRM, but it cannot tell you how the CRM sync works at PolicyMe.
To do that, we must integrate the LLM with the organization’s internal data sources, such as Confluence docs and code.

LangChain is an open-source framework for building applications based on large language models (LLMs). It addresses exactly this gap by streamlining the intermediate steps needed to develop such data-responsive applications.

LangChain provides tools and abstractions to improve the customization, accuracy, and relevancy of the information the models generate.

LangChain is designed to make it easier to build diverse applications powered by language models, including chatbots, question-answering systems, content generators, summarizers, and more.

Benefits:
- With LangChain, organizations can repurpose LLMs for domain-specific applications without retraining or fine-tuning.
- LangChain simplifies artificial intelligence (AI) development by abstracting the complexity of data source integrations and prompt refining.

## Components of LangChain

One of the most common use cases of LangChain is building Retrieval-Augmented Generation (RAG) applications, like the one Joseph demoed. If you go back to the code he shared, he was using LangChain to build his RAG application. So I'll continue with a similar RAG application to explain the various components of LangChain.

(Refer to diagram)

1) Data ingestion -> Load the data from different sources
2) Data transformation -> Split the data into smaller chunks
3) Embeddings -> Convert the chunks into vectors
4) Vector store -> Database where the vectors are stored and can be retrieved with algorithms like similarity search
5) Retrieval chain -> Interface to query the vector store


### Data Ingestion

In [138]:
# https://python.langchain.com/docs/integrations/document_loaders/

## Reading a PDF file
from langchain_community.document_loaders import PyPDFLoader
loader=PyPDFLoader('auth.pdf')
auth_doc = loader.load()
auth_doc

[Document(metadata={'producer': 'Skia/PDF m139', 'creator': 'Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) HeadlessChrome/139.0.0.0 Safari/537.36', 'creationdate': '2025-08-24T23:35:51+00:00', 'title': 'Auth – Current State & How It Works - gurmeet.singh - Confluence', 'moddate': '2025-08-24T23:35:51+00:00', 'source': 'auth.pdf', 'total_pages': 19, 'page': 0, 'page_label': '1'}, page_content='Auth – Current State & How It Works\nBefore April 2025\nUser Authentication Flow\n1. User Login\nWhen a user tries to log in, the frontend sends a request to the backend (BE) to\nauthenticate the user.\n2. Receiving the JWT Token\nThe backend responds in one of two ways:\nIt sets a cookie in the browser called user_jwt_token containing the JWT token, \nor\nIt sends the JWT token in the API response payload (in the body of the response).\n3. Storing the JWT Token\nThe frontend takes the JWT token received from the backend and:\nStores it in the Redux state for easy access i

In [139]:
from langchain_community.document_loaders import WebBaseLoader
loader = WebBaseLoader("https://www.policyme.com/blog/best-life-insurance-in-canada")
insurance_doc = loader.load()
insurance_doc

[Document(metadata={'source': 'https://www.policyme.com/blog/best-life-insurance-in-canada', 'title': 'Best Life Insurance in Canada, Ranked (Updated!) | PolicyMe', 'description': 'Looking for the best life insurance? We compare top companies & break down what real families need to know. Honest comparisons, pricing tips, what to watch for.', 'language': 'en-CA'}, page_content='Best Life Insurance in Canada, Ranked (Updated!) | PolicyMe\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n.\nSkip to Main ContentLog InLog InGet My QuoteGet My QuoteLife InsuranceBackLife InsuranceA promise that if you pass away, your family\'s financial future is protected.Get My QuoteGet life insurance coverageProduct InformationLife insurance quoteTerm life insuranceLife insurance calculatorEducationBest life insurance in CanadaLife insurance costIs life insurance worth it?How much

### Data transformation

In [140]:
# (Refer to diagram)
# https://python.langchain.com/docs/concepts/text_splitters/

from langchain_text_splitters import RecursiveCharacterTextSplitter

text_splitter=RecursiveCharacterTextSplitter(chunk_size=500,chunk_overlap=50)
final_documents=text_splitter.split_documents(insurance_doc)
final_documents

[Document(metadata={'source': 'https://www.policyme.com/blog/best-life-insurance-in-canada', 'title': 'Best Life Insurance in Canada, Ranked (Updated!) | PolicyMe', 'description': 'Looking for the best life insurance? We compare top companies & break down what real families need to know. Honest comparisons, pricing tips, what to watch for.', 'language': 'en-CA'}, page_content='Best Life Insurance in Canada, Ranked (Updated!) | PolicyMe'),
 Document(metadata={'source': 'https://www.policyme.com/blog/best-life-insurance-in-canada', 'title': 'Best Life Insurance in Canada, Ranked (Updated!) | PolicyMe', 'description': 'Looking for the best life insurance? We compare top companies & break down what real families need to know. Honest comparisons, pricing tips, what to watch for.', 'language': 'en-CA'}, page_content='.'),
 Document(metadata={'source': 'https://www.policyme.com/blog/best-life-insurance-in-canada', 'title': 'Best Life Insurance in Canada, Ranked (Updated!) | PolicyMe', 'descri
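
To make `chunk_size` and `chunk_overlap` concrete, here is a minimal sketch of fixed-size chunking with overlap in plain Python. It is not how `RecursiveCharacterTextSplitter` works internally (that splitter also tries to break on separators such as paragraphs and newlines, which is why the chunks above vary in length). `split_with_overlap` is a hypothetical helper used only for illustration.

```python
def split_with_overlap(text: str, chunk_size: int, chunk_overlap: int) -> list[str]:
    """Cut text into windows of chunk_size characters, each sharing
    chunk_overlap characters with the previous window."""
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be >= 0 and smaller than chunk_size")
    step = chunk_size - chunk_overlap  # how far the window advances each time
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # this window already reached the end of the text
    return chunks


chunks = split_with_overlap("abcdefghijklmnopqrstuvwxyz", chunk_size=10, chunk_overlap=3)
print(chunks)  # ['abcdefghij', 'hijklmnopq', 'opqrstuvwx', 'vwxyz']
```

The overlap means the tail of one chunk ("hij") is repeated at the head of the next, so a sentence that straddles a boundary is less likely to be cut off from its context.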

In [141]:
from langchain_text_splitters import HTMLHeaderTextSplitter

url = "https://www.policyme.com/blog/best-life-insurance-in-canada"

headers_to_split_on = [
    ("h1", "Header 1"),
    ("h2", "Header 2"),
    ("h3", "Header 3"),
    ("h4", "Header 4"),
]
html_splitter = HTMLHeaderTextSplitter(headers_to_split_on)
html_header_splits = html_splitter.split_text_from_url(url)
html_header_splits

[Document(metadata={}, page_content='Replace www. with environment END Replace www. with environment queryParamsToCookie, v3, updated Jan 11th, 2024 by Tri Le END queryParamsToCookie START influencer END influencer HIDE HD pages for affiliate END HD pages for affiliate Navbar Event Tracking Code END Navbar Event Tracking Code START September promotion countdown END September promotion countdown Schema Article  \nImprove the font quality Focus State Client-First CSS Reset Button Styles Varaible Spacing Responsiveness Button Styles  \n* {\n    -webkit-font-smoothing: antialiased;\n    -moz-osx-font-smoothing: grayscale;\n    -o-font-smoothing: antialiased;\n    -webkit-appearance:none;\n  }  \n:focus-visible,\n  [data-wf--element-button--general-button---theme="link"] .button:focus-visible {\n    outline: 0.125rem dotted var(--primitives--rust);\n    outline-offset: 0.25rem;\n  } \n  \n  .w-tab-link:focus-visible {\n  \toutline: 0.125rem dotted var(--primitives--rust);\n    outline-offse

### Embeddings

In [142]:
# (Refer to diagram)
# https://python.langchain.com/docs/integrations/text_embedding/
# https://platform.openai.com/docs/guides/embeddings
# You can also use Ollama or Hugging Face embeddings by downloading the models locally

import os
from dotenv import load_dotenv
from langchain_openai import OpenAIEmbeddings

load_dotenv()
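# load_dotenv() already copies OPENAI_API_KEY from .env into os.environ; fail early with a clear message if it is missing
if not os.getenv("OPENAI_API_KEY"):
    raise RuntimeError("OPENAI_API_KEY is not set")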

embedding=OpenAIEmbeddings(model="text-embedding-3-large")
embedding

OpenAIEmbeddings(client=<openai.resources.embeddings.Embeddings object at 0x11987e210>, async_client=<openai.resources.embeddings.AsyncEmbeddings object at 0x11ab1ae50>, model='text-embedding-3-large', dimensions=None, deployment='text-embedding-ada-002', openai_api_version=None, openai_api_base=None, openai_api_type=None, openai_proxy=None, embedding_ctx_length=8191, openai_api_key=SecretStr('**********'), openai_organization=None, allowed_special=None, disallowed_special=None, chunk_size=1000, max_retries=2, request_timeout=None, headers=None, tiktoken_enabled=True, tiktoken_model_name=None, show_progress_bar=False, model_kwargs={}, skip_empty=False, default_headers=None, default_query=None, retry_min_seconds=4, retry_max_seconds=20, http_client=None, http_async_client=None, check_embedding_ctx_length=True)

In [143]:
text = "This is a test sentence for embedding."
query_result = embedding.embed_query(text)
query_result

[0.019904259592294693,
 0.028880691155791283,
 -0.01948145590722561,
 0.008797552436590195,
 0.026311349123716354,
 -0.015001372434198856,
 -0.0011312010465189815,
 0.0407191701233387,
 -0.021188929677009583,
 -0.024587614461779594,
 0.019822951406240463,
 0.03294610232114792,
 0.01691211573779583,
 -0.03359656780958176,
 0.005707839038223028,
 0.026197517290711403,
 -0.01414763554930687,
 0.02063603326678276,
 -0.020148184150457382,
 -0.021026313304901123,
 0.03097844310104847,
 -0.04709373787045479,
 -0.030669471248984337,
 0.009317925199866295,
 0.051354289054870605,
 -0.01935136318206787,
 0.014383429661393166,
 0.02709190919995308,
 -0.02104257419705391,
 0.009488672949373722,
 0.022392291575670242,
 -0.010586334392428398,
 0.009163440205156803,
 0.009334187023341656,
 0.03958085551857948,
 -0.009578111581504345,
 0.05561484396457672,
 0.023416776210069656,
 3.3984306355705485e-05,
 -0.004211767110973597,
 0.05457409843802452,
 -0.016928378492593765,
 -0.03870272636413574,
 0.0098

In [144]:
len(query_result)  # Dimensions of the embedding vector

3072

### Vector Store

In [145]:
# (Refer to diagram)
# https://python.langchain.com/docs/integrations/vectorstores/

from langchain_chroma import Chroma

vectorstore = Chroma.from_documents(
    documents=final_documents,
    embedding=embedding,
    persist_directory="chroma_db"  # Directory to persist the vector store
)
vectorstore

<langchain_chroma.vectorstores.Chroma at 0x11dc2a650>

In [146]:
# query the DB
insurance_query = "Runner-up for term life"
retrieved_results = vectorstore.similarity_search(insurance_query) # other algos are available too
print(retrieved_results[0].page_content)

Runner-up for best term life insurance: RBCOur score: ★★★★☆ (4/5)Why we chose it: RBC has two term products and a T100 option which is good value. Their rates are competitive against other providers, especially for term life insurance.Company profile: The Royal Bank of Canada (RBC) was founded in 1864 in Halifax, Nova Scotia and is the largest bank in Canada by revenue. They have an A.M. Best financial strength rating of A+.» Read full RBC life insurance reviewSecond runner up for best term
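
Conceptually, a similarity search embeds the query and returns the stored chunks whose vectors are closest to it. The sketch below shows that idea with cosine similarity and a linear scan over hand-written 3-dimensional vectors. It is an illustration only, not Chroma's implementation: the texts and vectors in `toy_store` are hypothetical, real embeddings here have 3072 dimensions, and the distance metric a real vector store uses depends on its configuration.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors (1.0 means same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def similarity_search(query_vec: list[float], store: list[tuple[str, list[float]]], k: int = 2) -> list[str]:
    """Return the k texts whose vectors are most similar to query_vec."""
    ranked = sorted(store, key=lambda item: cosine_similarity(query_vec, item[1]), reverse=True)
    return [text for text, _ in ranked[:k]]


# Hypothetical (text, vector) pairs standing in for embedded chunks
toy_store = [
    ("term life insurance rankings", [0.9, 0.1, 0.0]),
    ("how to file a claim", [0.1, 0.9, 0.1]),
    ("whole life insurance pricing", [0.7, 0.2, 0.4]),
]
toy_query = [1.0, 0.0, 0.1]  # hypothetical embedding of the query

top_matches = similarity_search(toy_query, toy_store, k=2)
print(top_matches)  # ['term life insurance rankings', 'whole life insurance pricing']
```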


### Retrieval Chain + tracing with LangSmith

In [147]:
# (Refer to diagram)
# https://platform.openai.com/docs/models

from langchain_openai import ChatOpenAI
llm=ChatOpenAI(model="gpt-5-2025-08-07")
llm

ChatOpenAI(client=<openai.resources.chat.completions.completions.Completions object at 0x11dc29dd0>, async_client=<openai.resources.chat.completions.completions.AsyncCompletions object at 0x11dc2ab10>, root_client=<openai.OpenAI object at 0x11a99fd50>, root_async_client=<openai.AsyncOpenAI object at 0x11dc2b410>, model_name='gpt-5-2025-08-07', model_kwargs={}, openai_api_key=SecretStr('**********'))

In [148]:
from langchain_core.prompts import ChatPromptTemplate
from langchain.chains.combine_documents import create_stuff_documents_chain

prompt = ChatPromptTemplate.from_template(
    """
     You are an expert insurance advisor.
    Answer the question **only** using the provided context. If the answer isn't in the context, say you don't know.

    Question: {input}

    <context>
    {context}
    </context>
    """
)
document_chain=create_stuff_documents_chain(llm,prompt)
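
"Stuffing" means that all retrieved documents are concatenated and placed into the `{context}` slot of a single prompt. A minimal sketch of that step in plain Python follows. `ToyDocument` and `stuff_documents` are hypothetical stand-ins, the template is shortened, and joining documents with a blank line is an assumption made for the illustration.

```python
from dataclasses import dataclass


@dataclass
class ToyDocument:
    """Hypothetical stand-in for a LangChain Document."""
    page_content: str


TEMPLATE = (
    "Answer the question only using the provided context.\n\n"
    "Question: {input}\n\n"
    "<context>\n{context}\n</context>"
)


def stuff_documents(question: str, docs: list[ToyDocument], separator: str = "\n\n") -> str:
    """Join every document's text and fill the prompt template with it."""
    context = separator.join(doc.page_content for doc in docs)
    return TEMPLATE.format(input=question, context=context)


toy_docs = [
    ToyDocument("Runner-up for best term life insurance: RBC"),
    ToyDocument("RBC has two term products and a T100 option."),
]
stuffed_prompt = stuff_documents("Which company is the runner-up?", toy_docs)
print(stuffed_prompt)
```

Because everything goes into one prompt, this approach only works while the retrieved chunks fit in the model's context window.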

In [149]:
## LangSmith tracing
os.environ["LANGCHAIN_API_KEY"]=os.getenv("LANGCHAIN_API_KEY")
os.environ["LANGCHAIN_TRACING_V2"]="true"
os.environ["LANGSMITH_TRACING"]="true"
os.environ["LANGCHAIN_PROJECT"]=os.getenv("LANGCHAIN_PROJECT")

In [150]:
# https://smith.langchain.com/o/1611c8b3-3733-45a2-a1cb-b491f8fd2548/projects

In [151]:
# Retriever
from langchain.chains import create_retrieval_chain

retriever = vectorstore.as_retriever() # retriever to get data from vectorstore DB
retriever_chain = create_retrieval_chain(retriever, document_chain)
retriever_chain

RunnableBinding(bound=RunnableAssign(mapper={
  context: RunnableBinding(bound=RunnableLambda(lambda x: x['input'])
           | VectorStoreRetriever(tags=['Chroma', 'OpenAIEmbeddings'], vectorstore=<langchain_chroma.vectorstores.Chroma object at 0x11dc2a650>, search_kwargs={}), kwargs={}, config={'run_name': 'retrieve_documents'}, config_factories=[])
})
| RunnableAssign(mapper={
    answer: RunnableBinding(bound=RunnableBinding(bound=RunnableAssign(mapper={
              context: RunnableLambda(format_docs)
            }), kwargs={}, config={'run_name': 'format_inputs'}, config_factories=[])
            | ChatPromptTemplate(input_variables=['context', 'input'], input_types={}, partial_variables={}, messages=[HumanMessagePromptTemplate(prompt=PromptTemplate(input_variables=['context', 'input'], input_types={}, partial_variables={}, template="\n     You are an expert insurance advisor.\n    Answer the question **only** using the provided context. If the answer isn't in the context, say

In [152]:
# get response from LLM
response = retriever_chain.invoke({"input":"Which company is the runner-up for term life in Canada?"})
response

{'input': 'Which company is the runner-up for term life in Canada?',
 'context': [Document(id='cccf2a54-fcc8-4227-ad43-b301f7a4b1a2', metadata={'title': 'Best Life Insurance in Canada, Ranked (Updated!) | PolicyMe', 'source': 'https://www.policyme.com/blog/best-life-insurance-in-canada', 'description': 'Looking for the best life insurance? We compare top companies & break down what real families need to know. Honest comparisons, pricing tips, what to watch for.', 'language': 'en-CA'}, page_content='Runner-up for best term life insurance: RBCOur score: ★★★★☆ (4/5)Why we chose it:\xa0RBC has two term products and a T100 option which is good value. Their rates are competitive against other providers, especially for term life insurance.Company profile: The Royal Bank of Canada (RBC) was founded in 1864 in Halifax, Nova Scotia and is the largest bank in Canada by revenue. They have an A.M. Best financial strength rating of A+.» Read full RBC life insurance reviewSecond runner up for best te
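
The response is a dict that carries the original `input`, the retrieved `context` documents, and the model's `answer`. The data flow behind it can be sketched in plain Python. Nothing below calls LangChain: `make_retrieval_chain`, `toy_retriever`, and `toy_combine_docs` are hypothetical stand-ins for `create_retrieval_chain`, the vector store retriever, and `document_chain`.

```python
from typing import Callable


def make_retrieval_chain(
    retriever: Callable[[str], list[str]],
    combine_docs: Callable[[dict], str],
) -> Callable[[dict], dict]:
    """Compose a retriever and a combine step into one callable (toy version)."""

    def chain(inputs: dict) -> dict:
        context = retriever(inputs["input"])           # 1. fetch relevant passages
        with_context = {**inputs, "context": context}  # 2. attach them to the inputs
        answer = combine_docs(with_context)            # 3. prompt + model (stubbed here)
        return {**with_context, "answer": answer}

    return chain


def toy_retriever(question: str) -> list[str]:
    # Hypothetical stand-in for a vector store retriever: naive word-overlap match.
    corpus = [
        "Runner-up for best term life insurance: RBC",
        "How to file a claim",
    ]
    question_words = set(question.lower().split())
    return [p for p in corpus if question_words & set(p.lower().split())]


def toy_combine_docs(inputs: dict) -> str:
    # Hypothetical stand-in for the document chain; a real one would call the LLM.
    return f"stub answer built from {len(inputs['context'])} passage(s)"


toy_chain = make_retrieval_chain(toy_retriever, toy_combine_docs)
toy_response = toy_chain({"input": "Which company is the runner-up for term life in Canada?"})
print(toy_response)
```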

### LangServe

LangServe integrates with FastAPI (Python) to help deploy LangChain chains as a REST API.

Run: serve.py

### With Memory 

In [153]:
response = retriever_chain.invoke({"input":"What is my name?"})
response

{'input': 'What is my name?',
 'context': [Document(id='1447934d-f6e6-4b88-9d05-9c1f6357c08a', metadata={'language': 'en-CA', 'source': 'https://www.policyme.com/blog/best-life-insurance-in-canada', 'description': 'Looking for the best life insurance? We compare top companies & break down what real families need to know. Honest comparisons, pricing tips, what to watch for.', 'title': 'Best Life Insurance in Canada, Ranked (Updated!) | PolicyMe'}, page_content='What is the best type of life insurance for me?'),
  Document(id='93bedd45-8f7f-43b4-a016-353e3d156aba', metadata={'title': 'Best Life Insurance in Canada, Ranked (Updated!) | PolicyMe', 'source': 'https://www.policyme.com/blog/best-life-insurance-in-canada', 'description': 'Looking for the best life insurance? We compare top companies & break down what real families need to know. Honest comparisons, pricing tips, what to watch for.', 'language': 'en-CA'}, page_content='What is the best type of life insurance for me?'),
  Documen

In [159]:
from operator import itemgetter

from langchain.chains import create_retrieval_chain
from langchain_community.chat_message_histories import ChatMessageHistory
from langchain_core.prompts import MessagesPlaceholder
from langchain_core.runnables.history import RunnableWithMessageHistory

prompt = ChatPromptTemplate.from_messages([
    ("system", "You are an expert insurance advisor."),
    MessagesPlaceholder("history"),
    ("human", "Question: {input}\n\n<context>\n{context}\n</context>"),
])

document_chain=create_stuff_documents_chain(llm,prompt)

retriever = vectorstore.as_retriever() # retriever to get data from vectorstore DB
retriever_chain = create_retrieval_chain(retriever, document_chain)
answer_text_chain = retriever_chain | itemgetter("answer")

store={}

def get_session_history(config):
    # Accept either a session_id string or a config dict (both occur across LC versions)
    if isinstance(config, str):
        sid = config or "default"
    elif isinstance(config, dict):
        sid = config.get("configurable", {}).get("session_id", "default")
    else:
        # Some versions pass a RunnableConfig (mapping-like); try getattr then fallback
        try:
            sid = getattr(config, "configurable", {}).get("session_id", "default")
        except Exception:
            sid = "default"

    if sid not in store:
        store[sid] = ChatMessageHistory()
    return store[sid]

with_message_history=RunnableWithMessageHistory(
    answer_text_chain,
    get_session_history,
    input_messages_key="input",
    history_messages_key="history",
)

config={"configurable":{"session_id":"chat1"}}

response = with_message_history.invoke({"input":"I am Gurmeet Singh"}, config=config)
response

'Hi Gurmeet — happy to help. What are you looking to do today?\n- Get a life insurance quote\n- Review/adjust an existing policy\n- Make a claim\n- Something else\n\nIf you want a quick quote, please share:\n- Province of residence\n- Age, sex at birth, smoker/vaper status\n- Health conditions or prescriptions (high level)\n- Relationship status and dependents\n- Annual income, mortgage/debts, and any existing life insurance\n- Desired coverage amount/term (if known) and monthly budget\n\nIf you need to make a claim, the fastest route is to contact the claims team with your policy number (if handy) and a brief description of the situation:\n- Phone: (866) 999-7457\n- Email: info@policyme.com\n\nTell me which path you’d like, and I’ll guide you step by step.'

In [155]:
response = with_message_history.invoke({"input":"What is my name?"}, config=config)
response

'Your name is Gurmeet Singh.'

In [156]:
config={"configurable":{"session_id":"chat2"}}

response = with_message_history.invoke({"input":"What is my name?"}, config=config)
response

'I don’t know your name—it wasn’t provided in your message. If you share it, I’ll use it.\n\nIf you’d like help picking the best type of life insurance, I can ask a few quick questions about your age, health, dependents, budget, and goals.'
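
The two sessions answer differently because the history is stored per `session_id`. Here is a minimal sketch of that bookkeeping in plain Python. `ToyChat` is hypothetical, and its "model" is a stub that only looks for an earlier "I am ..." message, so it illustrates the storage pattern rather than how the LLM reasons.

```python
from collections import defaultdict


class ToyChat:
    """Keeps one message list per session and passes it to a stub model."""

    def __init__(self) -> None:
        self.histories: dict[str, list[tuple[str, str]]] = defaultdict(list)

    def invoke(self, text: str, session_id: str) -> str:
        history = self.histories[session_id]  # only this session's messages
        reply = self._stub_model(text, history)
        history.append(("human", text))       # save both turns for the next call
        history.append(("ai", reply))
        return reply

    @staticmethod
    def _stub_model(text: str, history: list[tuple[str, str]]) -> str:
        if text == "What is my name?":
            for role, past in history:
                if role == "human" and past.startswith("I am "):
                    return f"Your name is {past[len('I am '):]}."
            return "I don't know your name."
        return "Noted."


chat = ToyChat()
chat.invoke("I am Gurmeet Singh", session_id="chat1")
reply_chat1 = chat.invoke("What is my name?", session_id="chat1")
reply_chat2 = chat.invoke("What is my name?", session_id="chat2")
print(reply_chat1)  # Your name is Gurmeet Singh.
print(reply_chat2)  # I don't know your name.
```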

### Agentic AI vs. AI Agents

AI agents are individual software programs designed to perform specific tasks with a degree of autonomy.
- The chatbot we built above is an AI agent, as it answers basic questions based on predefined scripts.
- They follow predefined rules for decision making.


Agentic AI is a broader framework in which multiple agents collaborate and make decisions independently to achieve a larger goal.
- Capable of making decisions in real time based on data.
- A network of agents working together to achieve a goal.