- pip install langchain_groq
- pip install langchain_huggingface
- pip install sentence-transformers
- pip install onnxruntime


## **Step 2: Load Environment Variables**

In [1]:
import os
from dotenv import load_dotenv
load_dotenv()

# Fetch the API keys from environment variables
groq_api_key = os.getenv("GROQ_API_KEY")
hf_token = os.getenv("HF_TOKEN")
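`os.getenv` returns `None` when a variable is missing, so a typo in the `.env` file only shows up later as an authentication error. A minimal sketch of a fail-fast check follows; the helper name `require_env` is hypothetical and not part of `python-dotenv` or LangChain.

```python
import os


def require_env(name: str) -> str:
    """Return the value of an environment variable, or raise a clear error.

    Hypothetical helper for illustration; call it after load_dotenv().
    """
    value = os.getenv(name)
    if not value:
        raise RuntimeError(
            f"Missing environment variable: {name}. Add it to your .env file."
        )
    return value
```

With this in place, `groq_api_key = require_env("GROQ_API_KEY")` would stop the notebook at the first cell if the key is absent.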


## **Step 3: Initialize the Language Model (LLM)**

In [2]:
from langchain_groq import ChatGroq

# Initialize the Groq language model
llm = ChatGroq(groq_api_key=groq_api_key, model_name="Llama3-8b-8192")
llm


ChatGroq(client=<groq.resources.chat.completions.Completions object at 0x00000182283CA950>, async_client=<groq.resources.chat.completions.AsyncCompletions object at 0x000001822875D990>, model_name='Llama3-8b-8192', model_kwargs={}, groq_api_key=SecretStr('**********'))

## **Step 4: Set Up Hugging Face Embeddings for Text Representation**

Install and configure Hugging Face embeddings using sentence-transformers to convert text into vectors.

In [3]:
from langchain_huggingface import HuggingFaceEmbeddings

# Create HuggingFace embeddings
embeddings = HuggingFaceEmbeddings(model_name="all-MiniLM-L6-v2")




## **Step 5: Load and Process Web Content**

You can scrape content from a web page and process it into chunks for indexing.

In [7]:
import bs4
from langchain_community.document_loaders import WebBaseLoader
from langchain_text_splitters import RecursiveCharacterTextSplitter

# Web scraping to load content from a blog
loader = WebBaseLoader(
    web_paths=("https://lilianweng.github.io/posts/2023-06-23-agent/",),
    bs_kwargs=dict(
        parse_only=bs4.SoupStrainer(
            class_=("post-content", "post-title", "post-header")
        )
    ),
)

# Load documents from the web
docs = loader.load()

# Split the content into smaller chunks for indexing
text_splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=200)
splits = text_splitter.split_documents(docs)

# Now 'splits' will contain the documents in smaller chunks ready for embedding


USER_AGENT environment variable not set, consider setting it to identify your requests.
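To make the roles of `chunk_size` and `chunk_overlap` concrete, here is a simplified sketch of fixed-window chunking with overlap. It is not how `RecursiveCharacterTextSplitter` works internally (that class prefers to split on separators such as paragraphs and sentences); the function `chunk_text` is a hypothetical illustration of the two parameters only.

```python
def chunk_text(text: str, chunk_size: int = 1000, chunk_overlap: int = 200) -> list[str]:
    """Split text into fixed-size windows that share `chunk_overlap` characters.

    Simplified illustration only: real splitters try to break on natural
    boundaries instead of cutting at arbitrary character offsets.
    """
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    step = chunk_size - chunk_overlap  # how far each window advances
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the last window already reached the end of the text
    return chunks
```

The overlap means the tail of one chunk is repeated at the head of the next, so a sentence cut at a boundary still appears whole in at least one chunk.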


## **Step 6: Set Up Chroma for Vector Storage**
Use Chroma to store and retrieve document embeddings.

In [8]:
from langchain_community.vectorstores import Chroma

# Chroma is used to store the embeddings for retrieval
vectorstore = Chroma.from_documents(documents=splits, embedding=embeddings)
retriever = vectorstore.as_retriever()
retriever


VectorStoreRetriever(tags=['Chroma', 'HuggingFaceEmbeddings'], vectorstore=<langchain_community.vectorstores.chroma.Chroma object at 0x0000018244153E50>, search_kwargs={})
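Conceptually, a retriever embeds the query and returns the stored chunks whose vectors are closest to it. The sketch below shows that idea with plain Python lists and cosine similarity. Both the metric and the helper names (`cosine_similarity`, `top_k`) are assumptions for illustration; the distance function Chroma actually uses depends on how the collection is configured.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k(query_vec: list[float], indexed: list[tuple[str, list[float]]], k: int = 2) -> list[str]:
    """Return the texts of the k indexed items most similar to the query vector."""
    ranked = sorted(
        indexed,
        key=lambda item: cosine_similarity(query_vec, item[1]),
        reverse=True,
    )
    return [text for text, _ in ranked[:k]]
```

A real vector store replaces the linear scan with an approximate nearest-neighbour index, but the ranking idea is the same.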

## **Step 7: Define the Chat Prompt Template**

In [None]:
from langchain.chains import create_retrieval_chain
from langchain_core.prompts import ChatPromptTemplate, MessagesPlaceholder

# Define the system prompt
system_prompt = (
    "You are an assistant for question-answering tasks. "
    "Use the following pieces of retrieved context to answer "
    "the question. If you don't know the answer, say that you "
    "don't know. Use three sentences maximum and keep the "
    "answer concise."
    "\n\n"
    "{context}"
)

# Create the chat prompt template.
# The optional `chat_history` placeholder lets stored session history reach the
# model; it is skipped when the chain is invoked without any history.
prompt = ChatPromptTemplate.from_messages(
    [
        ("system", system_prompt),
        MessagesPlaceholder("chat_history", optional=True),
        ("human", "{input}"),
    ]
)

## **Step 8: Create a Retrieval Chain for Q&A**

Now that the content is indexed, we create a chain for querying the information.

In [None]:
# `create_stuff_documents_chain` combines a set of documents (usually returned by a retriever)
# into a single prompt to generate an answer. It is mainly used when several documents must be passed to the model together.
from langchain.chains.combine_documents import create_stuff_documents_chain


In [None]:
# `create_stuff_documents_chain` builds a chain that answers questions from the retrieved documents' context.
# It takes two parameters:
# 1. `llm`: The language model.
# 2. `prompt`: A `ChatPromptTemplate` that defines the structure of the prompt sent to the LLM.
# The resulting `question_answer_chain` is a callable chain that inserts the retrieved documents into the
# prompt (as `{context}`) and passes it to the language model to generate an answer.
question_answer_chain = create_stuff_documents_chain(llm, prompt)
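The "stuff" strategy can be pictured as plain string formatting: all retrieved documents are joined and placed into the `{context}` slot of the system prompt. The following sketch illustrates that idea without LangChain. The names `stuff_documents` and `build_messages`, the shortened template, and the `"\n\n"` separator are assumptions for illustration, not the library's internals.

```python
SYSTEM_TEMPLATE = (
    "You are an assistant for question-answering tasks. "
    "Use the following pieces of retrieved context to answer the question."
    "\n\n{context}"
)


def stuff_documents(docs: list[str], separator: str = "\n\n") -> str:
    """Join every retrieved document into one context string."""
    return separator.join(docs)


def build_messages(docs: list[str], question: str) -> list[tuple[str, str]]:
    """Fill the template with the stuffed context and append the user question."""
    context = stuff_documents(docs)
    return [
        ("system", SYSTEM_TEMPLATE.format(context=context)),
        ("human", question),
    ]
```

Because every document goes into one prompt, this approach only works while the combined text fits in the model's context window.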


In [None]:
# `create_retrieval_chain` creates a Retrieval-Augmented Generation (RAG) chain. It takes two parameters:
# 1. `retriever`: The retriever is responsible for retrieving relevant documents from the data store based on the input query.
# 2. `question_answer_chain`: This is the chain that will generate an answer based on the retrieved documents (from the `retriever`).

# The `rag_chain` is a combination of a retriever and a question-answering chain, where:
# - The retriever first searches for the most relevant documents.
# - These documents are then passed into the `question_answer_chain`, which processes them and generates an answer.
# 
# This creates a pipeline where the system retrieves relevant information before generating an answer, making it more contextually aware.
rag_chain = create_retrieval_chain(retriever, question_answer_chain)
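The retrieve-then-generate flow described in the comments above can be sketched with ordinary functions. This is an illustration of the data flow only, under the assumption that the chain's result carries `input`, `context`, and `answer` keys. `make_rag_pipeline`, `fake_retrieve`, and `fake_answer` are hypothetical stand-ins for the retriever and the question-answering chain.

```python
from typing import Callable


def make_rag_pipeline(
    retrieve: Callable[[str], list[str]],
    answer: Callable[[dict], str],
) -> Callable[[dict], dict]:
    """Compose a retriever and an answer function into one callable."""

    def run(inputs: dict) -> dict:
        # 1. Retrieve documents relevant to the user's question.
        context = retrieve(inputs["input"])
        # 2. Pass the question plus retrieved context to the answer step.
        enriched = {**inputs, "context": context}
        return {**enriched, "answer": answer(enriched)}

    return run


# Stand-ins so the sketch runs without a vector store or an LLM.
def fake_retrieve(question: str) -> list[str]:
    return ["Self-reflection lets agents learn from past mistakes."]


def fake_answer(data: dict) -> str:
    return f"{len(data['context'])} document(s) used for: {data['input']}"


pipeline = make_rag_pipeline(fake_retrieve, fake_answer)
result = pipeline({"input": "What is Self-Reflection"})
```

Here `result["answer"]` holds the generated text, and `result["context"]` keeps the retrieved documents available for inspection.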


## **Step 9: Query the RAG Chain**

In [16]:
# Query the RAG chain with a question
response = rag_chain.invoke({"input": "What is Self-Reflection"})

# Print the generated answer from the response
print(response['answer'])

Self-Reflection is a mechanism in the Reflexion framework that allows autonomous agents to improve iteratively by refining past action decisions and correcting previous mistakes. It plays a crucial role in real-world tasks where trial and error are inevitable.


## **Step 10: Manage Session-Based Chat History**

We can store chat history by session ID, which helps to maintain a session for ongoing interactions.

In [24]:
from langchain_community.chat_message_histories import ChatMessageHistory
from langchain_core.chat_history import BaseChatMessageHistory
from langchain_core.runnables.history import RunnableWithMessageHistory

store = {}


def get_session_history(session_id: str) -> BaseChatMessageHistory:
    if session_id not in store:
        store[session_id] = ChatMessageHistory()
    return store[session_id]


conversational_rag_chain = RunnableWithMessageHistory(
    rag_chain,
    get_session_history,
    input_messages_key="input",
    history_messages_key="chat_history",
    output_messages_key="answer",
)
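The key property of this setup is that each session ID maps to its own message list, so a question asked under a new ID starts with no prior context. The sketch below shows that isolation with plain Python. `SessionStore`, `chat`, and `counting_responder` are hypothetical names, and the responder merely reports how much history it was given.

```python
from collections import defaultdict


class SessionStore:
    """Keeps one independent message list per session ID."""

    def __init__(self) -> None:
        self._histories: dict[str, list[tuple[str, str]]] = defaultdict(list)

    def get(self, session_id: str) -> list[tuple[str, str]]:
        return self._histories[session_id]


def chat(store: SessionStore, session_id: str, user_message: str, respond) -> str:
    """Answer using this session's history, then record the new exchange."""
    history = store.get(session_id)
    reply = respond(history, user_message)
    history.append(("human", user_message))
    history.append(("ai", reply))
    return reply


def counting_responder(history: list[tuple[str, str]], user_message: str) -> str:
    # Stand-in for an LLM call: reports how many earlier messages it can see.
    return f"seen {len(history)} earlier messages"


demo_store = SessionStore()
first = chat(demo_store, "abc123", "What is Task Decomposition?", counting_responder)
second = chat(demo_store, "abc123", "What are common ways of doing it?", counting_responder)
other = chat(demo_store, "abc1", "What are common ways of doing it?", counting_responder)
```

The follow-up under `"abc123"` sees the earlier exchange, while the same question under `"abc1"` sees none, which is why a follow-up containing "it" cannot be resolved in a fresh session.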

In [39]:
# `conversational_rag_chain.invoke()` runs the conversational retrieval-augmented generation (RAG) chain. It takes two main arguments:
# 1. `{"input": "What is Task Decomposition?"}`: The input query that the user asks. In this case, it's asking about "Task Decomposition".
# 2. `config={ "configurable": {"session_id": "abc123"} }`: The session ID is used to store and retrieve the chat history specific to this session. 
#    This allows for a personalized conversation where the context (previous conversation) is remembered between queries.
#
# The `session_id` ensures that the conversation can continue with the correct context, making it "conversational."
# The response will contain the answer generated based on the current query and the context of the previous chat history stored under the session ID "abc123".

response = conversational_rag_chain.invoke(
    {"input": "What is Task Decomposition?"},
    config={
        "configurable": {"session_id": "abc123"}  # This ensures the session is tracked and history is maintained
    },
)["answer"]


In [48]:
conversational_rag_chain.invoke(
    {"input": "What is Task Decomposition?"},
    config={"configurable": {"session_id": "abc123"}},
)["answer"]

'Task Decomposition is a technique used to break down complex tasks into smaller, more manageable steps. This is achieved by instructing a model to think step by step, allowing it to utilize more test-time computation to decompose hard tasks into simpler steps.'

In [52]:
conversational_rag_chain.invoke(
    {"input": "What are common ways of doing it?"},
    config={"configurable": {"session_id": "abc1"}},
)["answer"]

'I don\'t know what "it" refers to. Could you please clarify or provide more context?'

In [None]:
conversational_rag_chain.invoke(
    {"input": "What are the Types of Memory"},
    config={
        "configurable": {"session_id": "chat 1"}
    },  # creates the key "chat 1" in `store`.
)["answer"]