RAG, RAG with Memory, Adaptive RAG, Corrective RAG, Self-RAG, Agentic RAG... are you lost? Let me help you with this guide.

1/ Simple RAG
Retrieves relevant documents based on the query and uses them to generate an answer.

2/ Simple RAG with Memory
Extends Simple RAG by maintaining context from previous interactions.

3/ Branched RAG
Performs multiple retrieval steps, refining the search based on intermediate results.

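4/ HyDE (Hypothetical Document Embeddings)
Has the LLM write a hypothetical answer document for the query, then embeds that document and uses it as the search vector. This often matches real documents better than the short query alone.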

5/ Adaptive RAG
Dynamically adjusts retrieval and generation strategies based on the query type or difficulty.

6/ Corrective RAG (CRAG)
Grades the retrieved documents for relevance and takes corrective action when retrieval is poor, e.g., refining the retrieved knowledge or falling back to web search.

7/ Self-RAG
The model critiques and improves its own responses using self-reflection and retrieval.

8/ Agentic RAG
Combines RAG with agentic behavior, allowing for more complex, multi-step problem-solving.
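To make the difference between the simplest pattern and a corrective one concrete, here is a toy, dependency-free sketch. It assumes a tiny in-memory corpus, a word-overlap retriever standing in for embeddings, and a hypothetical `web_search` callable. The relevance check is a crude stand-in for the LLM or trained evaluator that Corrective RAG actually uses.

```python
from typing import Callable, List

# Hypothetical toy corpus standing in for a vector store
CORPUS = [
    "The Atlanta Hawks won the 2024 NBA Draft lottery.",
    "Zaccharie Risacher was the No. 1 pick in the 2024 NBA Draft.",
    "Chroma is an open-source vector database.",
]


def tokenize(text: str) -> set:
    """Lower-cased word set with basic punctuation stripped."""
    return {w.strip(".,?!").lower() for w in text.split()}


def retrieve(query: str, k: int = 2) -> List[str]:
    """Simple RAG retrieval: rank documents by word overlap with the query."""
    q = tokenize(query)
    return sorted(CORPUS, key=lambda d: len(q & tokenize(d)), reverse=True)[:k]


def is_relevant(query: str, doc: str, min_overlap: int = 3) -> bool:
    """Crude relevance grader (CRAG uses an LLM or trained evaluator here)."""
    return len(tokenize(query) & tokenize(doc)) >= min_overlap


def corrective_retrieve(query: str, web_search: Callable[[str], List[str]]) -> List[str]:
    """Keep only relevant local hits; fall back to web search if none survive."""
    docs = [d for d in retrieve(query) if is_relevant(query, d)]
    return docs if docs else web_search(query)


def build_prompt(question: str, docs: List[str]) -> str:
    """Generation step: stuff the retrieved context into the prompt."""
    context = "\n\n".join(docs)
    return f"Question: {question}\n\nContext: {context}"


def fake_web_search(query: str) -> List[str]:
    """Hypothetical fallback tool returning a canned result."""
    return [f"(web result for: {query})"]


question = "Who was the No. 1 pick in the 2024 NBA Draft?"
print(build_prompt(question, corrective_retrieve(question, fake_web_search)))
```

Simple RAG would stop after `retrieve`; the corrective variant adds the grading step and the fallback branch.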


https://python.langchain.com/v0.1/docs/get_started/quickstart/

LangChain quickstart ^


https://python.langchain.com/docs/integrations/providers/ollama/

Ollama integrations ^

Tool calling:
https://ollama.com/blog/tool-support
https://python.langchain.com/docs/how_to/tool_calling/
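Both links describe the same loop: you describe tools to the model as JSON schemas, the model answers with a structured tool call instead of text, and your code runs the matching function. Below is a dependency-free sketch of that client-side dispatch step. The schema mirrors the `tools` format shown in the Ollama post, `response_message` is a hand-written stand-in for what the model would return, and `get_current_weather` is a hypothetical tool with canned data.

```python
import json
from typing import Any, Callable, Dict, List


def get_current_weather(city: str) -> str:
    """Hypothetical tool: returns canned data instead of calling a real API."""
    return json.dumps({"city": city, "temperature_c": 21, "condition": "sunny"})


# Tool description you would pass to the model (JSON-schema style)
TOOL_SCHEMAS = [{
    "type": "function",
    "function": {
        "name": "get_current_weather",
        "description": "Get the current weather for a city",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string", "description": "City name"}},
            "required": ["city"],
        },
    },
}]

# Registry mapping tool names to Python callables
REGISTRY: Dict[str, Callable[..., str]] = {"get_current_weather": get_current_weather}


def run_tool_calls(message: Dict[str, Any]) -> List[Dict[str, str]]:
    """Execute each requested tool call and return 'tool' messages for the model."""
    results = []
    for call in message.get("tool_calls", []):
        name = call["function"]["name"]
        args = call["function"]["arguments"]
        if isinstance(args, str):  # some APIs send arguments as a JSON string
            args = json.loads(args)
        results.append({"role": "tool", "content": REGISTRY[name](**args)})
    return results


# Hand-written example of an assistant message that requests a tool call
response_message = {
    "role": "assistant",
    "content": "",
    "tool_calls": [{"function": {"name": "get_current_weather", "arguments": {"city": "Paris"}}}],
}
tool_messages = run_tool_calls(response_message)
print(tool_messages)
```

In a real exchange, `tool_messages` would be appended to the conversation and sent back to the model so it can phrase the final answer.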


- Easy example:
https://github.com/Shubhamsaboo/awesome-llm-apps/blob/main/llama3.1_local_rag/llama3.1_local_rag.py

In [8]:
import torch

# Print the PyTorch version
print(f"PyTorch version: {torch.__version__}")

# Check if CUDA (GPU support) is available
if torch.cuda.is_available():
    print("CUDA is available! GPU is ready to be used.")
    print(f"Number of GPUs available: {torch.cuda.device_count()}")
    print(f"Current GPU: {torch.cuda.get_device_name(torch.cuda.current_device())}")
else:
    print("CUDA is not available. GPU is not set up correctly.")

# Print additional GPU details
if torch.cuda.is_available():
    for i in range(torch.cuda.device_count()):
        print(f"GPU {i}: {torch.cuda.get_device_name(i)}")
        print(f"  - Total Memory: {torch.cuda.get_device_properties(i).total_memory / 1e9} GB")
        print(f"  - Compute Capability: {torch.cuda.get_device_capability(i)}")

if torch.cuda.is_available():
    # Create a random tensor and move it to the GPU
    tensor = torch.rand(3, 3).cuda()
    print("Tensor on GPU:", tensor)
else:
    print("GPU is not available, cannot move tensor to GPU.")


PyTorch version: 2.4.1+cu121
CUDA is available! GPU is ready to be used.
Number of GPUs available: 1
Current GPU: NVIDIA GeForce RTX 4090
GPU 0: NVIDIA GeForce RTX 4090
  - Total Memory: 25.756696576 GB
  - Compute Capability: (8, 9)
Tensor on GPU: tensor([[0.3281, 0.3802, 0.9160],
        [0.5067, 0.3017, 0.2199],
        [0.4328, 0.6402, 0.4089]], device='cuda:0')


In [9]:
import os
from dotenv import load_dotenv
from huggingface_hub import login
from langchain_anthropic import ChatAnthropic

# Print the current working directory (optional for debugging)
print(os.getcwd())

# Set the path to your .env file relative to the current working directory
dotenv_path = os.path.join(os.getcwd(), '../.env')
load_dotenv(dotenv_path)

# Load the API keys from environment variables
openai_api_key = os.getenv("OPENAI_API_KEY")
hf_token = os.getenv("HUGGINGFACE_API_KEY")
anthropic_token = os.getenv("ANTHROPIC_API_KEY")
tavily_token = os.getenv("TAVILY_API_KEY")
langsmith_token = os.getenv("LANGSMITH_API_KEY")
nomic_token = os.getenv("NOMIC_EMBEDDINGS_API_KEY")

# load_dotenv() already exports every key in .env to os.environ, so these
# assignments are redundant but harmless
if hf_token:
    os.environ["HUGGINGFACE_API_KEY"] = hf_token
if openai_api_key:
    os.environ["OPENAI_API_KEY"] = openai_api_key
if anthropic_token:
    os.environ["ANTHROPIC_API_KEY"] = anthropic_token
if tavily_token:
    os.environ["TAVILY_API_KEY"] = tavily_token
if langsmith_token:
    os.environ["LANGSMITH_API_KEY"] = langsmith_token
    # LangChain tracing expects LANGCHAIN_API_KEY (see the LangSmith setup reminder)
    os.environ["LANGCHAIN_API_KEY"] = langsmith_token
if nomic_token:
    os.environ["NOMIC_EMBEDDINGS_API_KEY"] = nomic_token

# Verify that the Nomic API key is loaded correctly
# print("Nomic API Key:", nomic_token)

# Set up LangChain tracing environment variables
os.environ["LANGCHAIN_TRACING_V2"] = "true"
os.environ["LANGCHAIN_ENDPOINT"] = "https://api.smith.langchain.com"


/workspaces/custom_ollama_docker/notebooks


# Ollama Model Setup

In [10]:
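# ChatOllama from langchain_community is deprecated in recent LangChain releases;
# `from langchain_ollama import ChatOllama` (pip install langchain-ollama) is the replacement
from langchain_community.chat_models import ChatOllama

local_llm = "llama3.2"

# temperature=0 keeps the output as deterministic as possible
llm = ChatOllama(model=local_llm, temperature=0)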
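`ChatOllama` talks to the local Ollama server over HTTP. As a rough sketch of what happens underneath, assuming the default server address `http://localhost:11434` and Ollama's `/api/chat` endpoint, a non-streaming chat request at temperature 0 can be sent with only the standard library. The helper names are hypothetical, and only `build_chat_payload` runs without a live server.

```python
import json
import urllib.request
from typing import Dict, List

OLLAMA_URL = "http://localhost:11434/api/chat"  # default local address (assumption)


def build_chat_payload(model: str, messages: List[Dict[str, str]], temperature: float = 0.0) -> dict:
    """Request body for a single, non-streamed chat completion."""
    return {
        "model": model,
        "messages": messages,
        "stream": False,  # one JSON response instead of a token stream
        "options": {"temperature": temperature},
    }


def chat(model: str, prompt: str) -> str:
    """Send one user message to a running Ollama server and return the reply text."""
    payload = build_chat_payload(model, [{"role": "user", "content": prompt}])
    request = urllib.request.Request(
        OLLAMA_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request) as response:
        body = json.load(response)
    return body["message"]["content"]


payload = build_chat_payload("llama3.2", [{"role": "user", "content": "Hi"}])
print(json.dumps(payload, indent=2))
```

Calling `chat("llama3.2", "Say hello in one word.")` requires `ollama serve` to be running and the model to have been pulled with `ollama pull llama3.2`.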


# LangChain Components

In [11]:
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.output_parsers import StrOutputParser

# Define prompt template
prompt = ChatPromptTemplate.from_messages([
    ("system", "You are a world-class technical documentation writer."),
    ("user", "{input}")
])

# Output parser to convert messages to strings
output_parser = StrOutputParser()

# Chain the components together
chain = prompt | llm | output_parser

# Example invocation
result = chain.invoke({"input": "Explain how LangSmith can help with testing."})
print(result)


**LangSmith: A Testing Framework for Efficient and Effective Test Automation**

LangSmith is an open-source, Python-based testing framework designed to simplify the process of writing and maintaining automated tests for software applications. By leveraging LangSmith, developers can significantly improve their test automation efforts, ensuring that their codebase is thoroughly tested and reliable.

**Key Features of LangSmith:**

1. **Simple and Intuitive API**: LangSmith provides a clean and easy-to-use API, making it simple to write tests without requiring extensive knowledge of testing frameworks.
2. **Support for Multiple Test Types**: LangSmith supports various test types, including unit tests, integration tests, and end-to-end tests, allowing developers to choose the best approach for their specific use case.
3. **Extensive Library of Pre-built Tests**: LangSmith comes with a library of pre-built tests that can be easily integrated into existing projects, reducing the time and eff
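The output above also shows why retrieval matters: with no context, the model confidently describes LangSmith as an open-source testing framework, whereas it is actually LangChain's tracing and evaluation platform (see the LangSmith section below). As for the chain itself, the `prompt | llm | output_parser` line relies on LangChain's Runnable protocol: each component exposes `invoke`, and `|` chains them so the output of one becomes the input of the next. The toy classes below reproduce just that idea with `__or__`; they are illustrative stand-ins, not LangChain's implementation, and the "model" is a hypothetical upper-casing function.

```python
from typing import Any, Callable


class Step:
    """Minimal runnable: wraps a function and supports `|` composition."""

    def __init__(self, fn: Callable[[Any], Any]):
        self.fn = fn

    def invoke(self, value: Any) -> Any:
        return self.fn(value)

    def __or__(self, other: "Step") -> "Step":
        # Compose left to right: run self, then feed its output to other
        return Step(lambda value: other.invoke(self.invoke(value)))


# Stand-ins for the prompt template, the chat model and the output parser
toy_prompt = Step(lambda inputs: f"System: technical writer\nUser: {inputs['input']}")
toy_llm = Step(lambda text: {"content": text.upper()})  # hypothetical "model"
toy_parser = Step(lambda message: message["content"])   # like StrOutputParser: message -> str

toy_chain = toy_prompt | toy_llm | toy_parser
toy_result = toy_chain.invoke({"input": "hello"})
print(toy_result)
```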

# Building Applications

Document and data retrieval tools:

When to use Crawl4AI:

- If your primary focus is web crawling and extracting structured data from complex websites, Crawl4AI is likely the better choice.
- It is optimized for dynamic, JavaScript-heavy sites and offers useful features for crawling at scale.
- It integrates well with LLM workflows, making it easy to extract and preprocess content directly for AI tasks.

When to use LangChain:

- If you are building AI applications that combine web scraping with other data sources (e.g., databases, APIs, files), LangChain is the better choice.
- LangChain shines when you need to connect multiple data sources to LLMs, but it may not be ideal for complex web crawling out of the box.

In [12]:
from langchain_community.document_loaders import WebBaseLoader
from langchain_text_splitters import RecursiveCharacterTextSplitter

# Load documents from the web
loader = WebBaseLoader("https://www.nba.com/")
docs = loader.load()

# Split documents into chunks
text_splitter = RecursiveCharacterTextSplitter()
documents = text_splitter.split_documents(docs)
# Print sample document chunks
print("Sample split document chunks:", documents[:2])


Sample split document chunks: [Document(metadata={'source': 'https://www.nba.com/', 'title': 'The official site of the NBA for the latest NBA Scores, Stats & News. | NBA.com', 'description': 'The official site of the National Basketball Association. Follow the action on NBA scores, schedules, stats, news, Team and Player news.', 'language': 'en'}, page_content='The official site of the NBA for the latest NBA Scores, Stats & News. | NBA.com'), Document(metadata={'source': 'https://www.nba.com/', 'title': 'The official site of the NBA for the latest NBA Scores, Stats & News. | NBA.com', 'description': 'The official site of the National Basketball Association. Follow the action on NBA scores, schedules, stats, news, Team and Player news.', 'language': 'en'}, page_content="Navigation ToggleNBAHomeTicketsKey Dates2024-25 Regular SeasonEmirates NBA Cup ScheduleLeague Pass ScheduleNational TV GamesFeaturedNBA TVHomeTop Stories2024-25 Team Previews24 StorylinesTransactionsHall of Fame: Class o
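`RecursiveCharacterTextSplitter()` is called here with its defaults, which in the versions I'm aware of are roughly 4,000-character chunks with 200 characters of overlap, splitting on `"\n\n"`, then `"\n"`, then `" "`, then individual characters. The simplified sketch below shows the recursive idea: try the coarsest separator, merge pieces up to `chunk_size`, and recurse with finer separators only for pieces that are still too large. It deliberately omits chunk overlap and is not LangChain's implementation.

```python
from typing import List, Sequence


def recursive_split(text: str, chunk_size: int,
                    separators: Sequence[str] = ("\n\n", "\n", " ", "")) -> List[str]:
    """Simplified recursive character splitting (no overlap)."""
    if len(text) <= chunk_size:
        return [text] if text else []
    sep, finer = separators[0], separators[1:]
    if sep == "":
        # Last resort: hard cut into fixed-size pieces
        return [text[i:i + chunk_size] for i in range(0, len(text), chunk_size)]
    chunks, current = [], ""
    for piece in text.split(sep):
        candidate = piece if not current else current + sep + piece
        if len(candidate) <= chunk_size:
            current = candidate  # keep merging small pieces
            continue
        if current:
            chunks.append(current)
        if len(piece) <= chunk_size:
            current = piece
        else:
            # This piece alone is too big: split it with finer separators
            chunks.extend(recursive_split(piece, chunk_size, finer))
            current = ""
    if current:
        chunks.append(current)
    return chunks


print(recursive_split("aaaa bbbb\n\ncccc dddd", 10))
```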

In [13]:
from langchain_community.vectorstores import Chroma
from langchain_community.embeddings import OllamaEmbeddings

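# Create embeddings with the local Ollama model.
# Note: llama3.2 is a chat model; a dedicated embedding model such as
# nomic-embed-text (`ollama pull nomic-embed-text`) is usually a better fit for retrieval.
embeddings = OllamaEmbeddings(model="llama3.2")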

# Create a Chroma vector store from the split document chunks using the generated embeddings
vectorstore = Chroma.from_documents(documents=documents, embedding=embeddings)

# Create a retriever from the Chroma vector store for document retrieval based on input queries
retriever = vectorstore.as_retriever()

# Verify the retriever by checking sample results
sample_query = "What are some recent news updates in the NBA?"
sample_retrieval = retriever.invoke(sample_query)
print("Sample retrieved documents based on query:", sample_retrieval)

Sample retrieved documents based on query: [Document(metadata={'description': 'The official site of the National Basketball Association. Follow the action on NBA scores, schedules, stats, news, Team and Player news.', 'language': 'en', 'source': 'https://www.nba.com/', 'title': 'The official site of the NBA for the latest NBA Scores, Stats & News. | NBA.com'}, page_content='The official site of the NBA for the latest NBA Scores, Stats & News. | NBA.com'), Document(metadata={'description': 'The official site of the National Basketball Association. Follow the action on NBA scores, schedules, stats, news, Team and Player news.', 'language': 'en', 'source': 'https://www.nba.com/', 'title': 'The official site of the NBA for the latest NBA Scores, Stats & News. | NBA.com'}, page_content="era? Lakers unveil City Edition uniform and court Hawks-Heat postponed due to hurricane WNBA playoffs: Lynx advance to Finals Raptors' Barrett out for rest of preseason Marshall retiring as Mavs CEO at end o
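Under the hood, the retriever embeds the query with the same model and returns the stored chunks whose vectors are closest to it; LangChain's `as_retriever()` returns 4 documents by default, which `search_kwargs={"k": ...}` changes. The sketch below is a brute-force cosine-similarity version with hypothetical 3-dimensional vectors. Chroma itself uses an approximate nearest-neighbour index with its own configurable distance metric, but the ranking idea is the same.

```python
import math
from typing import Dict, List, Sequence, Tuple


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_k(query_vec: Sequence[float], index: Dict[str, List[float]], k: int = 4) -> List[Tuple[str, float]]:
    """Return the k chunks whose embeddings are most similar to the query embedding."""
    scored = [(chunk, cosine(query_vec, vec)) for chunk, vec in index.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)[:k]


# Hypothetical 3-dimensional "embeddings"; real models produce hundreds of dimensions
toy_index = {
    "Hawks win draft lottery": [0.9, 0.1, 0.0],
    "Risacher goes No. 1": [0.8, 0.3, 0.1],
    "Lakers unveil City Edition uniform": [0.1, 0.9, 0.2],
}
hits = top_k([1.0, 0.2, 0.0], toy_index, k=2)
for chunk, score in hits:
    print(f"{score:.3f}  {chunk}")
```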

In [14]:
# Import the necessary classes for formatting messages
from langchain_core.messages import HumanMessage

# Function to format and combine document content into a single context string
def combine_docs(docs):
    return "\n\n".join(doc.page_content for doc in docs)

# Function to call Llama 3.2 model with the given question and combined context
def ollama_llm(question, context):
    # Build a single prompt containing the question and the retrieved context
    formatted_prompt = f"Question: {question}\n\nContext: {context}"

    # Chat models expect a list of messages, so wrap the prompt in a HumanMessage
    input_message = [HumanMessage(content=formatted_prompt)]

    # Call Llama 3.2 through the ChatOllama instance defined earlier
    response = llm.invoke(input_message)

    # The response is an AIMessage; its text lives in the .content attribute
    return response.content

# Define the RAG chain: Retrieve relevant documents and generate a response
def rag_chain(question):
    # Retrieve relevant documents using the retriever
    retrieved_docs = retriever.invoke(question)
    # Combine the retrieved document chunks into a single context
    formatted_context = combine_docs(retrieved_docs)
    # Generate a response using the Llama 3.2 model
    return ollama_llm(question, formatted_context)

# Test RAG Chain with a sample question
test_question = "What are some recent NBA team standings and player performance updates?"
test_response = rag_chain(test_question)
print(f"Question: {test_question}")
print("Response:", test_response)


Question: What are some recent NBA team standings and player performance updates?
Response: Here are some recent NBA team standings and player performance updates:

**Team Standings:**

1. Boston Celtics (preseason standings)
2. Milwaukee Bucks (preseason standings)
3. Oklahoma City Thunder (preseason standings)
4. Cleveland Cavaliers (preseason standings)
5. Sacramento Kings (preseason standings)
6. Minnesota Timberwolves (preseason standings)
7. Atlanta Hawks (preseason standings)
8. Charlotte Hornets (preseason standings)
9. Miami Heat (preseason standings)
10. Toronto Raptors (preseason standings)

**Player Performance Updates:**

1. Zaccharie Risacher (Atlanta Hawks) - impressed in preseason debut with 18 points, 7-for-9 shooting, 3 rebounds, and 2 assists.
2. Stephen Curry (Golden State Warriors) - expected to lead the way for the Warriors in the 2024-25 season.
3. LeBron James (Los Angeles Lakers) - still delivering at a high level despite being in his 40s.
4. Kevin Durant (Broo

# Local Llama 3.2 RAG with Memory

Conversation Retrieval Chain:

- Enhancement: adds chat history to enable context-aware interactions.
- Adjustments:
    - Update retrieval to consider `chat_history`.
    - Modify prompts to include conversation context.


https://python.langchain.com/v0.2/docs/tutorials/qa_chat_history/

In [15]:
from langchain.chains import create_history_aware_retriever, create_retrieval_chain
from langchain.chains.combine_documents import create_stuff_documents_chain
from langchain_core.messages import AIMessage, HumanMessage
from langchain_core.prompts import ChatPromptTemplate, MessagesPlaceholder

# 1. Prompt that rewrites a follow-up question into a standalone search query
contextualize_q_prompt = ChatPromptTemplate.from_messages([
    ("system",
     "Given the chat history and the latest user question, which might reference "
     "context in the chat history, rewrite it as a standalone question. "
     "Do NOT answer it; reformulate it if needed, otherwise return it as is."),
    MessagesPlaceholder("chat_history"),
    ("human", "{input}"),
])

# The history-aware retriever sends the rewritten question to the vector-store retriever
history_aware_retriever = create_history_aware_retriever(llm, retriever, contextualize_q_prompt)

# 2. Prompt that answers from the retrieved documents ({context}) and the conversation so far
qa_prompt = ChatPromptTemplate.from_messages([
    ("system",
     "You are a knowledgeable assistant on NBA topics. Answer using only the context "
     "below. If the answer is not in the context, say you don't know.\n\n{context}"),
    MessagesPlaceholder("chat_history"),
    ("human", "{input}"),
])

# Stuff the retrieved documents into {context} and call the LLM
question_answer_chain = create_stuff_documents_chain(llm, qa_prompt)

# 3. Full chain: history-aware retrieval -> answer generation.
# Expects "input" and "chat_history"; returns a dict with "answer" and "context".
rag_chain_with_history = create_retrieval_chain(history_aware_retriever, question_answer_chain)

# Chat history is managed explicitly as a list of messages (no memory object needed)
chat_history = []

def ask(question):
    """Invoke the chain with the running history, then record the new turn."""
    response = rag_chain_with_history.invoke({"input": question, "chat_history": chat_history})
    chat_history.extend([
        HumanMessage(content=question),
        AIMessage(content=response["answer"]),
    ])
    return response["answer"]

# Follow-up questions rely on the history to resolve "the Celtics", "the 2023 playoffs", etc.
for question in [
    "Can you tell me about the best NBA teams in 2023?",
    "How did the Boston Celtics perform during the 2023 playoffs?",
    "Who was the top performer for the Celtics in the 2023 playoffs?",
]:
    print(f"Question: {question}")
    print("Response:", ask(question), "\n")
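The chain above sends the whole `chat_history` list on every call, so the prompt grows with each turn. A common refinement, sketched here as an assumption rather than something the LangChain tutorial requires, is to keep only the last few exchanges. `WindowedHistory` is a hypothetical helper using plain role/content dicts.

```python
from collections import deque
from typing import Deque, Dict, List


class WindowedHistory:
    """Keeps only the most recent `max_turns` question/answer exchanges."""

    def __init__(self, max_turns: int = 3):
        # Each turn adds two messages (user + assistant), so size the deque accordingly
        self.messages: Deque[Dict[str, str]] = deque(maxlen=2 * max_turns)

    def add_turn(self, question: str, answer: str) -> None:
        self.messages.append({"role": "user", "content": question})
        self.messages.append({"role": "assistant", "content": answer})

    def build_messages(self, system_prompt: str, question: str) -> List[Dict[str, str]]:
        """System prompt, then the retained history, then the new question."""
        return (
            [{"role": "system", "content": system_prompt}]
            + list(self.messages)
            + [{"role": "user", "content": question}]
        )


history = WindowedHistory(max_turns=2)
for i in range(3):
    history.add_turn(f"question {i}", f"answer {i}")
msgs = history.build_messages("You are an NBA assistant.", "question 3")
for m in msgs:
    print(m["role"], "->", m["content"])
```

With a LangChain message list, the equivalent is passing `chat_history[-2 * max_turns:]` instead of the full list.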


# Agents with Tools

- Agents with tool selection:
    - Agents can choose appropriate tools (e.g., retrieval, search) to answer queries.
    - This enhances application flexibility and capability.
- Search integration:
    - Tavily Search Tool: provides up-to-date information from the web.
    - Integration: https://python.langchain.com/docs/integrations/retrievers/tavily/
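An agent differs from a fixed chain in that the model decides, step by step, whether to call a tool or answer. The loop can be sketched without any framework. Here `fake_model` is a hypothetical stand-in for the LLM's decision, and `search_docs`/`search_web` are hypothetical tools returning canned strings, so only the control flow is meant to be taken literally.

```python
from typing import Callable, Dict, List


def search_docs(query: str) -> str:
    """Hypothetical local-retrieval tool (canned result)."""
    return "NBA.com homepage: preseason news and schedules."


def search_web(query: str) -> str:
    """Hypothetical web-search tool (canned result)."""
    return "Zaccharie Risacher was picked No. 1 by the Atlanta Hawks in 2024."


AGENT_TOOLS: Dict[str, Callable[[str], str]] = {"search_docs": search_docs, "search_web": search_web}


def fake_model(messages: List[Dict[str, str]]) -> Dict[str, str]:
    """Stand-in for the LLM: pick a tool first, answer once a tool result exists."""
    tool_results = [m for m in messages if m["role"] == "tool"]
    if not tool_results:
        question = messages[-1]["content"].lower()
        tool = "search_web" if "2024" in question or "latest" in question else "search_docs"
        return {"type": "tool_call", "tool": tool, "input": messages[-1]["content"]}
    return {"type": "final", "content": f"Based on the search: {tool_results[-1]['content']}"}


def run_agent(question: str, max_steps: int = 5) -> str:
    messages = [{"role": "user", "content": question}]
    for _ in range(max_steps):  # cap iterations so a confused model cannot loop forever
        decision = fake_model(messages)
        if decision["type"] == "final":
            return decision["content"]
        observation = AGENT_TOOLS[decision["tool"]](decision["input"])
        messages.append({"role": "tool", "content": observation})
    return "Stopped: step limit reached."


agent_answer = run_agent("Who was the number 1 pick in the 2024 NBA Draft?")
print(agent_answer)
```

With a real model, `fake_model` would be replaced by an LLM call that receives the tool schemas and returns either a tool call or a final answer.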

In [16]:
# Import Tavily Search Retriever
from langchain_community.retrievers import TavilySearchAPIRetriever
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.output_parsers import StrOutputParser
from langchain_core.runnables import RunnablePassthrough

# Initialize Tavily retriever to retrieve 3 search results
tavily_retriever = TavilySearchAPIRetriever(k=3)

# Modify prompt to restrict LLM from making up information
prompt = ChatPromptTemplate.from_template(
    """Answer the question based only on the context provided. Do not make up additional information.
    
    Context: {context}

    Question: {question}"""
)

# Print each retrieved document (for debugging) and format them with their source URLs
def format_docs(docs):
    for i, doc in enumerate(docs):
        print(f"Retrieved Document {i+1}:\n{doc.page_content}\nSource: {doc.metadata.get('source')}")
    return "\n\n".join(f"Source {i+1} (URL: {doc.metadata.get('source')}):\n{doc.page_content}" for i, doc in enumerate(docs))

# Define chain with Tavily search integration
search_chain = (
    {"context": tavily_retriever | format_docs, "question": RunnablePassthrough()}
    | prompt
    | llm
    | StrOutputParser()
)

# Test search chain with a question
search_response = search_chain.invoke("Who was the number 1 pick in the 2024 NBA Draft?")
print("Search Response:", search_response)

Retrieved Document 1:
Zaccharie Risacher: Who is the No. 1 overall pick in the 2024 NBA Draft? | CNN Football Tennis Golf Motorsport US Sports Olympics Climbing Esports Hockey CNN10 About CNN Zaccharie Risacher was drafted with the No. 1 overall pick by the Atlanta Hawks in the 2024 NBA Draft. Risacher was selected by the Atlanta Hawks with the first overall pick in the 2024 NBA Draft on Wednesday, making history in the process. According to the NBA’s official draft profile on Risacher, although he has positives to his game, the teenager is “unlikely to be an instant-impact player in Year 1, but down the road his shooting and decision-making could make him a valuable rotation piece.”
Source: https://www.cnn.com/2024/06/27/sport/zaccharie-risacher-atlanta-hawks-nba-draft-spt-intl/index.html
Retrieved Document 2:
The full 2024 NBA Draft order has been set. The Atlanta Hawks overcame long odds and won the draft's No. 1 pick on Sunday in the Draft Lottery.The Hawks, who finished a 36-46 re

# LangSmith

- Role: assists in tracing, debugging, and understanding the internal workings of LangChain applications.
- Features:
    - Visualize and inspect chain executions.
    - Monitor multiple LLM calls within complex applications.
- Setup reminder: ensure `LANGCHAIN_TRACING_V2` and `LANGCHAIN_API_KEY` are set.
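Conceptually, tracing wraps each step so that its name, inputs, outputs, errors and latency are recorded as a run. The decorator below is a toy, in-memory illustration of that idea, not the LangSmith SDK (which provides its own `traceable` decorator, sends runs to the LangSmith service and nests child runs in a tree rather than a flat list). `retrieve_step` and `generate_step` are hypothetical pipeline steps.

```python
import functools
import time
from typing import Any, Callable, Dict, List

TRACE: List[Dict[str, Any]] = []  # in-memory run log (stand-in for the LangSmith backend)


def traced(fn: Callable) -> Callable:
    """Record name, inputs, output or error, and latency of every call to `fn`."""
    @functools.wraps(fn)
    def wrapper(*args, **kwargs):
        start = time.perf_counter()
        run: Dict[str, Any] = {"name": fn.__name__, "inputs": {"args": args, "kwargs": kwargs}}
        try:
            run["output"] = fn(*args, **kwargs)
            return run["output"]
        except Exception as exc:
            run["error"] = repr(exc)
            raise
        finally:
            run["latency_s"] = time.perf_counter() - start
            TRACE.append(run)
    return wrapper


@traced
def retrieve_step(query: str) -> List[str]:
    return [f"doc about {query}"]


@traced
def generate_step(query: str, docs: List[str]) -> str:
    return f"Answer to '{query}' using {len(docs)} document(s)"


traced_answer = generate_step("draft", retrieve_step("draft"))
for run in TRACE:
    print(run["name"], f"{run['latency_s']:.6f}s", run.get("output"))
```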

# Streamlit Local Agent Attempt

In [17]:
%%writefile ../src/llama3_tavily_app.py

import streamlit as st
from langchain_community.retrievers import TavilySearchAPIRetriever
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.output_parsers import StrOutputParser
from langchain_core.runnables import RunnablePassthrough
from langchain_community.chat_models import ChatOllama
from langchain_community.vectorstores import Chroma
from langchain_community.embeddings import OllamaEmbeddings
from langchain_community.document_loaders import WebBaseLoader
from langchain_text_splitters import RecursiveCharacterTextSplitter
from dotenv import load_dotenv
import os

# Load environment variables
dotenv_path = os.path.join(os.getcwd(), '.env')
load_dotenv(dotenv_path)

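# TavilySearchAPIRetriever reads TAVILY_API_KEY from the environment (loaded above)
tavily_token = os.getenv("TAVILY_API_KEY")
if not tavily_token:
    st.warning("TAVILY_API_KEY is not set; the Tavily search button will fail.")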

# Set up the local LLM model
local_llm = "llama3.2"
llm = ChatOllama(model=local_llm, temperature=0)

# Function to retrieve documents from NBA website and create retriever
@st.cache_resource
def setup_retriever():
    loader = WebBaseLoader("https://www.nba.com/")
    docs = loader.load()
    text_splitter = RecursiveCharacterTextSplitter()
    documents = text_splitter.split_documents(docs)
    embeddings = OllamaEmbeddings(model=local_llm)
    vectorstore = Chroma.from_documents(documents=documents, embedding=embeddings)
    return vectorstore.as_retriever()

# Function to perform RAG-based responses
def rag_chain(question, retriever):
    retrieved_docs = retriever.invoke(question)
    formatted_context = "\n\n".join([doc.page_content for doc in retrieved_docs])
    return llm.invoke([{"role": "user", "content": f"Question: {question}\n\nContext: {formatted_context}"}]).content

# Set up Tavily Search Retriever
tavily_retriever = TavilySearchAPIRetriever(k=3)

# Set up a search chain with Tavily integration
prompt_template = ChatPromptTemplate.from_template(
    """Answer the question based only on the context provided. Do not make up additional information.
    
    Context: {context}

    Question: {question}"""
)
def format_docs(docs):
    """Join Tavily results into one context string, tagging each with its source URL."""
    return "\n\n".join(
        f"Source {i+1} (URL: {doc.metadata.get('source')}):\n{doc.page_content}"
        for i, doc in enumerate(docs)
    )

search_chain = (
    {"context": tavily_retriever | format_docs, "question": RunnablePassthrough()}
    | prompt_template
    | llm
    | StrOutputParser()
)

# Streamlit UI
st.title("Web-Integrated AI Assistant")
st.write("This assistant can respond to your queries using information retrieved from the web.")

# Input field for user question
user_question = st.text_input("Ask a question:", "Who was the number 1 pick in the 2024 NBA Draft?")

# Buttons to select RAG-based or Tavily-based responses
if st.button("RAG Response"):
    retriever = setup_retriever()
    response = rag_chain(user_question, retriever)
    st.subheader("RAG-based Response")
    st.write(response)

if st.button("Tavily Search Response"):
    search_response = search_chain.invoke(user_question)
    st.subheader("Tavily Search-based Response")
    st.write(search_response)


Writing ../src/llama3_tavily_app.py
