# LangChain
LangChain is a framework for developing applications powered by language models. </br>
It provides a standard interface for working with different LLMs, as well as tools for chaining together multiple LLM calls, managing memory, and more. </br>

## Key Features
- **Standard Interface**: LangChain provides a consistent interface for working with different LLMs, making it easier to switch between models.
- **Chaining**: You can chain together multiple LLM calls to create complex workflows.
- **Memory Management**: LangChain provides tools for managing memory, allowing you to store and retrieve information across multiple LLM calls.
- **Tools and Utilities**: LangChain includes a variety of tools and utilities for working with LLMs, such as tokenizers, embeddings, and more.

## Architecture
LangChain's architecture is designed to be modular and extensible. It consists of several main packages:
- **langchain-core**: Base abstractions for chat models, prompts, output parsers, and other components, plus the Runnable interface used to compose them.
- **langchain**: Chains, agents, and retrieval strategies that make up an application's overall logic.
- **langchain-community**: Third-party integrations maintained by the community, such as document loaders, vector stores, and embeddings.
- **Integration packages**: Provider-specific packages such as `langchain-aws` that connect LangChain to a particular vendor, allowing you to switch between vendors easily.


References:

- https://python.langchain.com/docs/concepts/architecture/

## Prompt Templates
Prompt templates are a way to define the structure of a prompt that will be sent to an LLM. </br>
They allow you to create reusable templates that can be filled in with specific values at runtime.

### Types of Prompt Templates

- **Simple Prompt Template**: A basic template that takes a single input and generates a prompt.
- **Chat Prompt Template**: A template designed for chat-based interactions, allowing for multiple messages and roles.
- **Few-Shot Prompt Template**: A template that includes examples of input-output pairs to guide the LLM's response.
- **Custom Prompt Template**: A template that allows for more complex structures and custom formatting.
- **Prompt Template with Variables**: A template that includes variables that can be filled in with specific values at runtime.
- **Prompt Template with Conditionals**: A template that includes conditional logic to generate different prompts based on specific conditions.
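
To make the idea of filling a reusable template at runtime concrete, here is a minimal sketch in plain Python. `MiniChatTemplate` is a hypothetical stand-in written for this illustration, not a LangChain class; it only mimics the general shape of a chat prompt template (role and text pairs with named placeholders).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MiniChatTemplate:
    """Hypothetical stand-in for a chat prompt template (not a LangChain class)."""
    messages: tuple  # tuple of (role, template_string) pairs

    def format_messages(self, **values):
        # Fill every {placeholder} with the value supplied at runtime
        return [(role, text.format(**values)) for role, text in self.messages]


template = MiniChatTemplate(messages=(
    ("system", "You are a helpful assistant for {store}."),
    ("human", "Customer: {customer_query}"),
))

filled = template.format_messages(
    store="a phone shop",
    customer_query="Any phones under $500?",
)
for role, text in filled:
    print(f"{role}: {text}")
```

The same template object can be reused with different values, which is the main benefit of separating the prompt structure from its inputs.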

In [2]:
# Import necessary libraries and Initialization steps
#!pip install "langchain>=0.3" "langchain-community>=0.3" "langchain-aws>=0.2" boto3==1.38.15 pydantic==2.10.4 faiss-cpu==1.11.0
from enum import Enum

import boto3
from langchain_aws import ChatBedrockConverse


class LLMModel(Enum):
    """Enum for Bedrock model IDs."""
    # Anthropic
    CLAUDE_3_5_V1 = 'anthropic.claude-3-5-sonnet-20240620-v1:0'
    CLAUDE_3_5_V2 = 'us.anthropic.claude-3-5-sonnet-20241022-v2:0'  # Inference Profile ID
    # Amazon
    NOVA_LITE = 'amazon.nova-lite-v1:0'
    NOVA_PRO = 'amazon.nova-pro-v1:0'
    TITAN_LITE = 'amazon.titan-text-lite-v1'
    TITAN_EXPRESS = 'amazon.titan-text-express-v1'
    # Meta
    META_LLAMA3_1B = 'meta.llama3-2-1b-instruct-v1:0'
    META_LLAMA3_3B = 'meta.llama3-2-3b-instruct-v1:0'

llm = ChatBedrockConverse(
    client=boto3.client("bedrock-runtime", region_name="us-east-1"),
    model=LLMModel.NOVA_PRO.value,
    max_tokens=4096,
    temperature=0.0,
)

In [3]:
from langchain_core.prompts import HumanMessagePromptTemplate, ChatPromptTemplate, AIMessagePromptTemplate

import json

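# Note: although it is named system_message, this template is built with HumanMessagePromptTemplate,
# so the instructions are sent to the model with the human role.
system_message = HumanMessagePromptTemplate.from_template(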
    """
    You are a helpful assistant that provides information about mobile phones available in the inventory.
    The inventory is provided in JSON format. Use the information to answer customer queries.
    The inventory is as follows:

    {inventory}

    """
)

assistant_message = AIMessagePromptTemplate.from_template(
    """
    Recommended guidelines for the assistant behavior:
    - Do not recommend phones that are not available in the inventory.
    - Do not recommend expensive phones beyond the customer's budget.
    - Do not recommend phones that do not meet the customer's requirements.
    - Suggest maximum 3 phones.
    """
)

human_message = HumanMessagePromptTemplate.from_template(
    """
    Customer: {customer_query}
    """
)
# Create a chat prompt template
chat_prompt_template = ChatPromptTemplate.from_messages(
    [system_message, assistant_message, human_message]
)

with open('resources/mobile-phones-inventory.json', 'r') as file:
    inventory = json.load(file)

customer_query = "I need an Android phone with a maximum budget of $500. The camera should be 50MP Wide, with 8GB RAM. What do you have available?"

final_prompt = chat_prompt_template.format_messages(
    inventory=json.dumps(inventory, indent=2),
    customer_query=customer_query
)

print(llm.invoke(final_prompt).content)



These are the phones that are available in your budget:
- Aura Lite
- Luminory Glow
- Echo Prime
- Zenith Z
- Echo Standard

If you have any additional requirements, please let me know.


## Output Parsers
Output parsers are used to process the output from an LLM and convert it into a structured format. </br>
The combination of Pydantic and LangChain allows you to define data models that can be used to validate and parse the output from an LLM.
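
As a rough illustration of what an output parser does, the sketch below validates a JSON string against a small data model using only the standard library. The `Phone` dataclass, the `parse_phone` helper, and the hard-coded `raw_output` are all hypothetical and exist only for this example; Pydantic and LangChain's parsers offer much richer validation.

```python
import json
from dataclasses import dataclass


@dataclass
class Phone:
    brand: str
    model: str
    price: float


def parse_phone(raw: str) -> Phone:
    """Convert raw model text into a Phone, raising ValueError on bad output."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError as exc:
        raise ValueError(f"Output is not valid JSON: {exc}") from exc
    if not isinstance(data, dict):
        raise ValueError("Expected a JSON object")
    missing = {"brand", "model", "price"} - data.keys()
    if missing:
        raise ValueError(f"Missing fields: {sorted(missing)}")
    # Coerce each field to the declared type
    return Phone(
        brand=str(data["brand"]),
        model=str(data["model"]),
        price=float(data["price"]),
    )


# Hypothetical LLM output, hard-coded for illustration
raw_output = '{"brand": "ExampleBrand", "model": "Example One", "price": 399}'
phone = parse_phone(raw_output)
print(phone)
```

Failing loudly on malformed output, rather than passing raw text downstream, is the main reason to put a parser between the model and the rest of the application.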

In [None]:
from typing import List

from pydantic import BaseModel
from langchain_core.output_parsers import PydanticOutputParser

class PhoneRecommendation(BaseModel):
    """Model for a single phone recommendation."""
    brand: str
    model: str
    price: float
    camera: str
    ram: str

class PhoneRecommendations(BaseModel):
    """Wrapper model, since the assistant may suggest several phones."""
    recommendations: List[PhoneRecommendation]

# The parser generates format instructions from the model and validates the LLM output against it
parser = PydanticOutputParser(pydantic_object=PhoneRecommendations)

system_message = HumanMessagePromptTemplate.from_template(
    """
    You are a helpful assistant that provides information about mobile phones available in the inventory.
    The inventory is provided in JSON format. Use the information to answer customer queries.
    The inventory is as follows:

    {inventory}

    ---------------------

    Return the phone recommendations in the following format:

    {format_instructions}
    """
)
chat_prompt_template = ChatPromptTemplate.from_messages(
    [system_message, assistant_message, human_message]
)
final_prompt = chat_prompt_template.format_messages(
    inventory=json.dumps(inventory, indent=2),
    customer_query=customer_query,
    format_instructions=parser.get_format_instructions()
)

response = llm.invoke(final_prompt)
print(response.content)

# Validate the raw output and convert it into typed objects.
# This raises an OutputParserException if the output does not match the schema.
parsed = parser.invoke(response)
print(parsed.recommendations)



## Memory
Memory in LangChain allows you to store historical messages and context across multiple interactions with an LLM. </br>
This is particularly useful for chat-based applications where you want to maintain **context** & **continuity** in conversations.

### Types of Memory
- **In-Memory**: Keeps messages in process memory, so the history is lost when the application stops. Suitable for short-lived applications.
- **Persistent Memory**: Stores messages in a database or file system, suitable for long-lived applications.
- **Custom Memory**: Allows you to implement your own memory management logic, suitable for complex applications.
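
To illustrate the difference between in-memory and persistent storage, here is a minimal file-backed history written with the standard library only. `FileChatHistory` is a hypothetical class invented for this sketch, not a LangChain API; it stores one JSON file per session.

```python
import json
import tempfile
from pathlib import Path


class FileChatHistory:
    """Hypothetical file-backed history: one JSON file per session."""

    def __init__(self, directory: Path, session_id: str):
        self.path = directory / f"{session_id}.json"

    def messages(self):
        # A session with no file yet simply has no history
        if not self.path.exists():
            return []
        return json.loads(self.path.read_text(encoding="utf-8"))

    def add_message(self, role: str, content: str):
        history = self.messages()
        history.append({"role": role, "content": content})
        self.path.write_text(json.dumps(history), encoding="utf-8")


with tempfile.TemporaryDirectory() as tmp:
    store_dir = Path(tmp)
    history = FileChatHistory(store_dir, "session-1")
    history.add_message("human", "Do you have Android phones?")
    history.add_message("ai", "Yes, several.")

    # A new object for the same session sees the stored messages,
    # which is what lets a conversation survive a restart.
    reloaded = FileChatHistory(store_dir, "session-1").messages()
    print(reloaded)
```

Rewriting the whole file on every message keeps the sketch short; a real persistent store would more likely append to a database table.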


In [None]:
import json
from langchain_core.chat_history import InMemoryChatMessageHistory
from langchain_core.prompts import ChatPromptTemplate, MessagesPlaceholder
from langchain_core.runnables.history import RunnableWithMessageHistory
import uuid

# MessagesPlaceholder is used to dynamically include the message history in the prompt.
history_message = MessagesPlaceholder(variable_name="history")

# Create a chat prompt template with system, history, and human messages
# And the partial method is used to fill in the inventory data that is known at the time of creating the prompt template.
chat_prompt_template = ChatPromptTemplate.from_messages([system_message, assistant_message, history_message, human_message]).partial(
    inventory=json.dumps(inventory, indent=2)
)

# Combine the chat prompt template with the LLM to create a runnable
# The pipe operator (|) is used to chain operations, where the output of chat_prompt_template (partial formatted messages) becomes the input for llm.
# This allows seamless integration of the prompt template with the language model for generating responses.
runnable = chat_prompt_template | llm

# Create a message history store (in memory for this example)
sessions = {}

# create sessions with unique session IDs to manage conversation history
def get_session_history(session_id: str):
    if session_id not in sessions:
        sessions[session_id] = InMemoryChatMessageHistory()
    return sessions[session_id]

# Use the runnable with message history
chat_with_history = RunnableWithMessageHistory(
    runnable,
    get_session_history,
    input_messages_key="customer_query",
    history_messages_key="history"
)

def interactive_chat():
    session_id = str(uuid.uuid4())

    response = None
    while True:
        # Set the AI message to the initial greeting or the last response
        ai_message = response.content if response else "Sales Assistant: Hello! I'm your phone sales representative. How can I help you today?"
        # Get the user query
        user_input = input(ai_message)

        # Check if user wants to exit
        if user_input.lower() == 'exit':
            print("\nSales Assistant: Thank you for shopping with us! Have a great day!")
            break

        # Get response using the runnable with history
        response = chat_with_history.invoke(
            {"customer_query": user_input},
            {"configurable": {"session_id": session_id}}
        )

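        # total_tokens covers this call only: the prompt (including the replayed history) plus the response
        print("Tokens used in this turn:", response.usage_metadata['total_tokens'])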

# Run the interactive conversation
interactive_chat()


## Retrieval-Augmented Generation (RAG)
Retrieval-Augmented Generation (RAG) is a technique that combines the power of large language models (LLMs) with external knowledge sources to improve the quality and relevance of generated responses. </br>

The process of creating a RAG application typically involves the following steps:
1. **Document Loading**: Load documents from various sources (e.g., web pages, documents, databases) that contain relevant information.
2. **Chunking**: Split the loaded documents into smaller, manageable chunks to facilitate efficient retrieval.
3. **Embedding Creation**: Generate embeddings for the document chunks using a suitable embedding model. These embeddings capture the semantic meaning of the text.
4. **Vector Store Creation**: Store the embeddings in a vector store (e.g., FAISS, Pinecone) to enable efficient similarity search.
5. **Retriever Creation**: Create a retriever that can query the vector store to find relevant document chunks based on user queries.
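
The chunking step can be sketched with a simple fixed-size splitter. This is a simplified illustration under the assumption that chunks are cut purely by character count; the hypothetical `split_with_overlap` helper does not look for natural separators such as paragraphs or sentences, which dedicated text splitters do.

```python
def split_with_overlap(text: str, chunk_size: int, chunk_overlap: int) -> list[str]:
    """Split text into chunks of up to chunk_size characters that share chunk_overlap characters."""
    if chunk_size <= 0 or not 0 <= chunk_overlap < chunk_size:
        raise ValueError("Require chunk_size > 0 and 0 <= chunk_overlap < chunk_size")

    step = chunk_size - chunk_overlap
    chunks = []
    start = 0
    while start < len(text):
        chunks.append(text[start:start + chunk_size])
        # Stop once a chunk reaches the end, to avoid a redundant trailing chunk
        if start + chunk_size >= len(text):
            break
        start += step
    return chunks


chunks = split_with_overlap("abcdefghij", chunk_size=4, chunk_overlap=1)
print(chunks)  # ['abcd', 'defg', 'ghij']
```

The overlap means text cut at a chunk boundary still appears with some surrounding context in the neighbouring chunk.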


### Embeddings and Vector Stores
Embeddings are numerical representations of text that capture semantic meaning. </br>
To create an embedding vector, we use an embedding model to map text into a high-dimensional vector space where similar texts are closer together. </br>
Once we have the embeddings, we store them in a vector store for future retrieval. </br>

![RAG Indexing Process](../resources/images/rag_indexing.png)

### Retrieval and Generation

When trying to find similar documents, we use a retriever to search the vector store for relevant chunks based on the user's query. </br>
For that the query is also converted into an embedding vector using the same embedding model. </br>
Now that we have both chunks and query in the same vector space, we can find the most similar chunks to the query. </br>

![RAG Retrieval and Generation Process](../resources/images/rag_retrieval_generation.png)
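
The similarity search described above can be sketched with cosine similarity, one common way to compare embedding vectors. The three-dimensional vectors and chunk labels below are invented for illustration; real embeddings have hundreds or thousands of dimensions, and the `top_k` helper is hypothetical.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors; 1.0 means same direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k(query_vec, index, k):
    """Return the texts of the k chunks whose vectors are most similar to the query."""
    scored = sorted(
        index,
        key=lambda item: cosine_similarity(query_vec, item[1]),
        reverse=True,
    )
    return [text for text, _ in scored[:k]]


# Toy index of (chunk text, embedding) pairs
index = [
    ("chunk about agents", [0.9, 0.1, 0.0]),
    ("chunk about memory", [0.1, 0.9, 0.0]),
    ("chunk about cooking", [0.0, 0.1, 0.9]),
]
query_vec = [0.8, 0.2, 0.0]  # pretend embedding of the user's question

results = top_k(query_vec, index, k=2)
print(results)  # ['chunk about agents', 'chunk about memory']
```

Because the query and the chunks are embedded with the same model, their vectors are directly comparable, and the highest-scoring chunks become the context passed to the LLM.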

References:
- https://python.langchain.com/docs/tutorials/rag/

In [11]:
import bs4
from langchain_community.document_loaders import WebBaseLoader
from langchain_community.vectorstores import FAISS
from langchain_core.vectorstores import InMemoryVectorStore
from langchain_aws import BedrockEmbeddings
from langchain_text_splitters import RecursiveCharacterTextSplitter

# --- 1. Load and Chunk Documents ---
print("Step 1: Loading webpage content by known element ids")
loader = WebBaseLoader(
    web_paths=("https://lilianweng.github.io/posts/2023-06-23-agent/",),
    bs_kwargs=dict(
        parse_only=bs4.SoupStrainer(
            class_=("post-content", "post-title", "post-header")
        )
    ),
)
docs = loader.load()
if not docs:
    print("No documents loaded. Check the URL or SoupStrainer.")
    exit()

print("Step 2: Splitting documents into chunks...")

# RecursiveCharacterTextSplitter is the generally recommended splitter for generic text.
# It tries to keep semantically related pieces of text together by splitting on a hierarchy of
# separators (paragraphs first, then lines, then words) until each chunk fits within chunk_size.
text_splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=50)
all_splits = text_splitter.split_documents(docs)
print(f"Step 2: Document was split into {len(all_splits)} chunks.")

if not all_splits:
    print("No text splits generated. Check the document content or splitter settings.")
    exit()

# --- 2. Create Vector Store and Index Chunks ---
print("\nStep 3: Creating vector store and indexing chunks with AWS Bedrock Embeddings...")
# Instantiate BedrockEmbeddings
embeddings = BedrockEmbeddings(model_id='amazon.titan-embed-text-v2:0')

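vector_store = InMemoryVectorStore(embeddings)
# Embed the chunks and add them to the store; without this step the store stays empty
vector_store.add_documents(documents=all_splits)
# Or use FAISS for larger datasets
# vector_store = FAISS.from_documents(documents=all_splits, embedding=embeddings)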
print("Vector store created and documents indexed successfully using AWS Bedrock Embeddings.")



Step 1: Loading webpage content by known element ids
Step 2: Splitting documents into chunks...
Split into 61 chunks.

Step 3: Creating vector store and indexing chunks with AWS Bedrock Embeddings...
Vector store created and documents indexed successfully using AWS Bedrock Embeddings.


Let's explore the output of each step:
1. Chunks of text created from the document
2. Embedding vectors created for each chunk
3. Vector store created with the embeddings


In [17]:
# Let's look at the raw document content
print(docs[0])

# Display the first few chunks of text
print("\n--- Text Chunks ---")
for i, chunk in enumerate(all_splits[:5]):
    print(f"Chunk {i+1}:")
    print("MetaData: \n", chunk.metadata)
    print("Content: \n",chunk.page_content)
    print("-" * 80)

# At query time, the question is converted into an embedding vector, which is then used to search the vector store.
# Print the embedding vector of an example question
print('The embedding vector of the question', embeddings.embed_query("What is an LLM agent?"))
# vector_store.similarity_search("What is an LLM agent?", k=3)  # Example search to see if the vector store works


--- Text Chunks ---
Chunk 1:
MetaData: 
 {'source': 'https://lilianweng.github.io/posts/2023-06-23-agent/'}
Content: 
 LLM Powered Autonomous Agents
    
Date: June 23, 2023  |  Estimated Reading Time: 31 min  |  Author: Lilian Weng


Building agents with LLM (large language model) as its core controller is a cool concept. Several proof-of-concepts demos, such as AutoGPT, GPT-Engineer and BabyAGI, serve as inspiring examples. The potentiality of LLM extends beyond generating well-written copies, stories, essays and programs; it can be framed as a powerful general problem solver.
Agent System Overview#
In a LLM-powered autonomous agent system, LLM functions as the agent’s brain, complemented by several key components:

Planning

Subgoal and decomposition: The agent breaks down large tasks into smaller, manageable subgoals, enabling efficient handling of complex tasks.
Reflection and refinement: The agent can do self-criticism and self-reflection over past actions, learn from mistakes a

## LangChain Retrievers
Retrievers in LangChain are used to fetch relevant documents or information based on a user query. </br>
They can be used to retrieve documents from various sources, such as databases, web pages, or vector stores. </br>

### Types of Retrievers
- **Vector Store Retriever**: Retrieves documents based on similarity search in a vector store.
- **Document Loader Retriever**: Retrieves documents from a document loader.
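- **Multi-Query Retriever**: Uses an LLM to generate several variations of the user query, retrieves documents for each, and merges the results.
- **Ensemble Retriever**: Combines and reranks the results of multiple retrievers, for example a keyword-based retriever and a vector store retriever.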
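
One common way to merge ranked lists from several retrievers is Reciprocal Rank Fusion (RRF), which LangChain's `EnsembleRetriever` documentation describes as its reranking method. The sketch below is a simplified, unweighted version written for illustration; the document IDs are invented, and the constant `k=60` is a conventional default assumed here.

```python
def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists: each document scores 1 / (k + rank) per list it appears in."""
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    # Highest fused score first
    return sorted(scores, key=scores.get, reverse=True)


keyword_results = ["A", "B", "C"]  # e.g. from a keyword-based retriever
vector_results = ["B", "C", "D"]   # e.g. from a vector store retriever

fused = reciprocal_rank_fusion([keyword_results, vector_results])
print(fused)  # ['B', 'C', 'A', 'D']
```

Documents that appear in both lists (B and C) rise above documents that only one retriever found, even if that retriever ranked them first.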

### Reranking
Reranking is a technique used to improve the quality of retrieved documents by reordering them based on relevance to the user query. </br>
Reranking can be done using various methods, such as:
- **Embedding-based Reranking**: Uses embeddings to calculate similarity between the query and retrieved documents.
- **LLM-based Reranking**: Uses a language model to evaluate the relevance of retrieved documents based on the user query.
- **Rule-based Reranking**: Uses predefined rules to reorder retrieved documents based on specific criteria.
- **Metadata-based Reranking**: Uses metadata associated with documents to reorder them based on relevance to the user query.
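
All of these methods share the same shape: score each retrieved document against the query, then reorder. The sketch below shows that shape with a deliberately simple term-overlap scorer standing in for an embedding model or LLM; `rerank`, `term_overlap`, and the candidate texts are hypothetical names and data made up for this example.

```python
import re


def term_overlap(query: str, doc: str) -> float:
    """Fraction of query terms that also appear in the document (a crude relevance score)."""
    query_terms = set(re.findall(r"\w+", query.lower()))
    doc_terms = set(re.findall(r"\w+", doc.lower()))
    if not query_terms:
        return 0.0
    return len(query_terms & doc_terms) / len(query_terms)


def rerank(query, docs, score_fn, top_n):
    """Reorder docs by score_fn(query, doc), highest first, and keep the top_n."""
    ranked = sorted(docs, key=lambda doc: score_fn(query, doc), reverse=True)
    return ranked[:top_n]


candidates = [
    "Vector stores enable similarity search.",
    "Generative agents combine memory, planning and reflection.",
    "Agents use memory to recall past experience.",
]

reranked = rerank("agents memory planning", candidates, term_overlap, top_n=2)
for doc in reranked:
    print(doc)
```

Swapping `term_overlap` for a function that calls an embedding model or an LLM changes the reranking method without changing the surrounding code.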

References:
- https://python.langchain.com/docs/concepts/retrievers/


In [14]:
from typing import List
from langchain_core.documents import Document

def format_docs_for_context(docs: List[Document]) -> str:
    """
    Formats the retrieved documents into a single string for the prompt context.
    """
    return "\n\n-----------\n".join(doc.page_content for doc in docs)


# --- Create Retriever ---
retriever = vector_store.as_retriever(search_kwargs={"k": 4})  # Retrieve top 4 relevant chunks

print(format_docs_for_context(retriever.invoke("What do you know about Generative Agents?")))

They also discussed the risks, especially with illicit drugs and bioweapons. They developed a test set containing a list of known chemical weapon agents and asked the agent to synthesize them. 4 out of 11 requests (36%) were accepted to obtain a synthesis solution and the agent attempted to consult documentation to execute the procedure. 7 out of 11 were rejected and among these 7 rejected cases, 5 happened after a Web search while 2 were rejected based on prompt only.
Generative Agents Simulation#
Generative Agents (Park, et al. 2023) is super fun experiment where 25 virtual characters, each controlled by a LLM-powered agent, are living and interacting in a sandbox environment, inspired by The Sims. Generative agents create believable simulacra of human behavior for interactive applications.
The design of generative agents combines LLM with memory, planning and reflection mechanisms to enable agents to behave conditioned on past experience, as well as to interact with other agents.

-

## LangChain Chains

Chains in LangChain allow you to create complex workflows by chaining together multiple operations, such as document retrieval, text processing, and LLM generation. </br>
In the following steps, we will create a Retrieval-Augmented Generation (RAG) chain that retrieves relevant documents based on a user query and generates an answer using an LLM.

### LangChain Expression Language (LCEL) "Pipe" Operator

LangChain leverages Python's ability to overload operators (specifically the `__or__` method) to implement a functional "pipe" or "chaining" mechanism. </br>
**How it works in LCEL:**</br>
When you write `component_a | component_b`, it means "take the output of `component_a` and pass it as the input to `component_b`." </br>
It creates a sequence of operations where the result of one step flows directly into the next.
This makes building complex LLM applications (like RAG chains) much more readable and modular.
Each "component" in an LCEL chain (like `retriever`, `prompt`, `llm`, `StrOutputParser`) is typically an instance of a `Runnable` object. LangChain's `Runnable` classes define the `__or__` method, which is what allows this chaining syntax to work.
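
The operator overloading described above can be reproduced in a few lines of plain Python. The `Step` class below is a hypothetical stand-in for a Runnable, written only to show the mechanism; it is not how LangChain implements its classes.

```python
class Step:
    """Hypothetical stand-in for a Runnable: wraps a function and supports `|`."""

    def __init__(self, func):
        self.func = func

    def invoke(self, value):
        return self.func(value)

    def __or__(self, other):
        # `a | b` returns a new Step that runs a, then feeds its output to b
        return Step(lambda value: other.invoke(self.invoke(value)))


strip = Step(str.strip)
shout = Step(str.upper)
exclaim = Step(lambda s: s + "!")

chain = strip | shout | exclaim
result = chain.invoke("  hello lcel ")
print(result)  # HELLO LCEL!
```

Python evaluates `strip | shout | exclaim` left to right, so each `|` wraps the chain built so far. LangChain's `Runnable` also defines the reflected method `__ror__`, which is why a plain dict can appear on the left-hand side of a pipe.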


In [21]:

from langchain_core.prompts import ChatPromptTemplate

from langchain_core.runnables import RunnablePassthrough
from langchain_core.output_parsers import StrOutputParser


template = """Use the following pieces of context to answer the question at the end.
    If you don't know the answer, just say that you don't know, don't try to make up an answer.
    Use three sentences maximum and keep the answer concise.
    Context: {context}
    Question: {question}
    Helpful Answer:"""
prompt = ChatPromptTemplate.from_template(template)


print("\nCreating retriever and RAG chain...")

rag_chain = (
        {"context": retriever | format_docs_for_context, "question": RunnablePassthrough()}
        | prompt
        | llm
        | StrOutputParser()
)
print("RAG chain created.")
print("\n--- Starting RAG Application (with AWS Bedrock Embeddings and LLM) ---")

# example_question = "What are the main components of an LLM agent?"
example_question = "What types  of memory does LLM application use?"  # https://lilianweng.github.io/posts/2023-06-23-agent/#component-two-memory

final_answer = rag_chain.invoke(example_question)
print("Question:", example_question)
print("Answer:", final_answer)




Creating retriever and RAG chain...
RAG chain created.

--- Starting RAG Application (with AWS Bedrock Embeddings and LLM) ---
Question: What types  of memory does LLM application use?
Answer: LLM applications use both short-term memory (in-context learning) and long-term memory (external vector store for infinite recall).
