# Retrieval Augmenented Generation via Langchain

Source: [Langchain RAGs](https://python.langchain.com/docs/tutorials/rag/#setup)

<img src="RAG_Process.png" alt="RAG Process" width="800">

In [1]:
import os
from langchain_openai import ChatOpenAI

Set Environment variables in `.bashrc` or `.zshrc` file:

```bash
export LANGCHAIN_TRACING_V2 = "true"
export LANGCHAIN_API_KEY = ******
export OPENAI_API_KEY = ******
```

In [2]:
!source /Users/johtorr/.zshrc

# 1. RAG with OpenAI GPT-4o-mini Model

In [32]:
import bs4
from langchain import hub
from langchain_chroma import Chroma
from langchain_community.document_loaders import WebBaseLoader
from langchain_core.output_parsers import StrOutputParser
from langchain_core.runnables import RunnablePassthrough
from langchain_openai import OpenAIEmbeddings
from langchain_community.embeddings.fastembed import FastEmbedEmbeddings
from langchain_text_splitters import RecursiveCharacterTextSplitter

In [33]:
# Actiave Model
llm = ChatOpenAI(model="gpt-4o-mini")

# Load, chunk and index the contents of the blog.
loader = WebBaseLoader(
    web_paths=("https://lilianweng.github.io/posts/2023-06-23-agent/",),
    bs_kwargs=dict(
        parse_only=bs4.SoupStrainer(
            class_=("post-content", "post-title", "post-header")
        )
    ),
)
docs = loader.load()

text_splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=200)
splits = text_splitter.split_documents(docs)


In [34]:
text_splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=200)
splits = text_splitter.split_documents(docs)

print("Number of documents available ", len (splits))

Number of documents available  66


In [35]:
type(splits)

list

In [36]:
splits[0].page_content

'LLM Powered Autonomous Agents\n    \nDate: June 23, 2023  |  Estimated Reading Time: 31 min  |  Author: Lilian Weng\n\n\nBuilding agents with LLM (large language model) as its core controller is a cool concept. Several proof-of-concepts demos, such as AutoGPT, GPT-Engineer and BabyAGI, serve as inspiring examples. The potentiality of LLM extends beyond generating well-written copies, stories, essays and programs; it can be framed as a powerful general problem solver.\nAgent System Overview#\nIn a LLM-powered autonomous agent system, LLM functions as the agent’s brain, complemented by several key components:\n\nPlanning\n\nSubgoal and decomposition: The agent breaks down large tasks into smaller, manageable subgoals, enabling efficient handling of complex tasks.\nReflection and refinement: The agent can do self-criticism and self-reflection over past actions, learn from mistakes and refine them for future steps, thereby improving the quality of final results.\n\n\nMemory'

In [47]:
OpenAIEmbeddings_model = OpenAIEmbeddings()
sample_embedding = OpenAIEmbeddings_model.embed_query("sample text")
print("Embedding Dimension of OpenAI Model is : ", len(sample_embedding)) 

Embedding Dimension of OpenAI Model is :  1536


In [41]:
# Create vector store of embeddings and persist them locally
persist_directory = os.path.join(os.getcwd(), "vector_stores", "openai_vectors")
vectorstore = Chroma.from_documents(documents=splits, 
                                    embedding=OpenAIEmbeddings_model, 
                                    persist_directory=persist_directory)

In [13]:
# Retrieve and generate using the relevant snippets of the blog.
# set number of documents to be returned as 6
retriever = vectorstore.as_retriever(search_type="similarity", search_kwargs={"k": 6})
prompt = hub.pull("rlm/rag-prompt")

def format_docs(docs):
    return "\n\n".join(doc.page_content for doc in docs)

rag_chain = (
    {"context": retriever | format_docs, "question": RunnablePassthrough()}
    | prompt
    | llm
    | StrOutputParser()
)

rag_chain.invoke("What is Task Decomposition?")



'Task decomposition is the process of breaking down a complex task into smaller, more manageable steps. Techniques like Chain of Thought (CoT) encourage models to think step-by-step, facilitating this breakdown, while the Tree of Thoughts method explores multiple reasoning possibilities at each step. This allows for better planning and execution by using simple prompts, task-specific instructions, or human inputs.'

# 2. RAG with local llama3.2 model

In [58]:
from langchain_community.chat_models import ChatOllama
from langchain.embeddings import HuggingFaceEmbeddings

local_llm = ChatOllama(
            model="llama3.2:latest",
            temperature=0.1,
            keep_alive="1h",
            max_tokens=8000,
        )

In [59]:
embedding_model = HuggingFaceEmbeddings(model_name="BAAI/bge-base-en-v1.5")
sample_embedding = embedding_model.embed_query("sample text")
print("Embedding Dimension of bge-base Model is : ", len(sample_embedding)) 

Embedding Dimension of bge-base Model is :  768


In [60]:

persist_directory = os.path.join(os.getcwd(), "vector_stores", "llama_vectors")
vectorstore = Chroma.from_documents(documents=splits,
                      embedding=embedding_model, 
                     persist_directory=persist_directory)

In [68]:
# Retrieve and generate using the relevant snippets of the blog.
retriever = vectorstore.as_retriever(search_type="similarity", search_kwargs={"k": 3})
prompt = hub.pull("rlm/rag-prompt")

def format_docs(docs):
    return "\n\n".join(doc.page_content for doc in docs)

rag_chain = (
    {"context": retriever | format_docs, "question": RunnablePassthrough()}
    | prompt
    | llm
    | StrOutputParser()
)

response = rag_chain.invoke("What is Task Decomposition?")



In [69]:
print(response)

Task Decomposition is a process where complex tasks are broken down into smaller, more manageable steps. This is often achieved using the Chain of Thought (CoT) prompting technique, which encourages models to "think step by step." It enhances model performance by allowing more computation during testing and provides insight into the model's reasoning.


## Return the sources

In [70]:
from langchain.chains import create_retrieval_chain
from langchain.chains.combine_documents import create_stuff_documents_chain
from langchain_core.prompts import ChatPromptTemplate

system_prompt = (
    "You are an assistant for question-answering tasks. "
    "Use the following pieces of retrieved context to answer "
    "the question. If you don't know the answer, say that you "
    "don't know. Use three sentences maximum and keep the "
    "answer concise."
    "\n\n"
    "{context}"
)

prompt = ChatPromptTemplate.from_messages(
    [
        ("system", system_prompt),
        ("human", "{input}"),
    ]
)

# Use create_retrieval_chain object
question_answer_chain = create_stuff_documents_chain(llm, prompt)
rag_chain = create_retrieval_chain(retriever, question_answer_chain)

response = rag_chain.invoke({"input": "What is Task Decomposition?"})
print(response["answer"])

Task Decomposition is a technique used to break down complex tasks into smaller, more manageable steps. This process often involves the Chain of Thought (CoT) prompting technique, where a model is instructed to “think step by step” to enhance performance on difficult tasks. By decomposing tasks, it also provides insight into the model's thinking process.


In [71]:
for document in response["context"]:
    print(document)
    print()

page_content='Fig. 1. Overview of a LLM-powered autonomous agent system.
Component One: Planning#
A complicated task usually involves many steps. An agent needs to know what they are and plan ahead.
Task Decomposition#
Chain of thought (CoT; Wei et al. 2022) has become a standard prompting technique for enhancing model performance on complex tasks. The model is instructed to “think step by step” to utilize more test-time computation to decompose hard tasks into smaller and simpler steps. CoT transforms big tasks into multiple manageable tasks and shed lights into an interpretation of the model’s thinking process.' metadata={'source': 'https://lilianweng.github.io/posts/2023-06-23-agent/'}

page_content='Fig. 1. Overview of a LLM-powered autonomous agent system.
Component One: Planning#
A complicated task usually involves many steps. An agent needs to know what they are and plan ahead.
Task Decomposition#
Chain of thought (CoT; Wei et al. 2022) has become a standard prompting technique 