In [1]:
### Env

import os
os.environ["LANGCHAIN_TRACING_V2"] = "true"  # enables tracing
os.environ["LANGCHAIN_API_KEY"] = "xxx"  # placeholder: replace with your own API key
os.environ["LANGCHAIN_PROJECT"] = "RAG-feedback-and-few-shot"

### Creating a RAG bot

In [2]:
### Index
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain_community.document_loaders import WebBaseLoader
from langchain_community.vectorstores import Chroma
from langchain_openai import OpenAIEmbeddings

urls = [
    "https://lilianweng.github.io/posts/2023-06-23-agent/",
    "https://lilianweng.github.io/posts/2023-03-15-prompt-engineering/",
    "https://lilianweng.github.io/posts/2023-10-25-adv-attack-llm/",
]

docs = [WebBaseLoader(url).load() for url in urls]
docs_list = [item for sublist in docs for item in sublist]

text_splitter = RecursiveCharacterTextSplitter.from_tiktoken_encoder(
    chunk_size=250, chunk_overlap=0
)
doc_splits = text_splitter.split_documents(docs_list)

# Add to vectorDB
vectorstore = Chroma.from_documents(
    documents=doc_splits,
    collection_name="rag-chroma",
    embedding=OpenAIEmbeddings(),
)
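# k is passed through search_kwargs; a bare k=4 keyword is not applied to the search
retriever = vectorstore.as_retriever(search_kwargs={"k": 4})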

USER_AGENT environment variable not set, consider setting it to identify your requests.


In [3]:
### RAG bot

import openai
from langsmith import traceable
from langsmith.wrappers import wrap_openai

class RagBot:

    def __init__(self, retriever, model: str = "gpt-4o"):
        self._retriever = retriever
        # Wrapping the client instruments the LLM
        self._client = wrap_openai(openai.Client())
        self._model = model

    @traceable()
    def retrieve_docs(self, question):
        return self._retriever.invoke(question)

    @traceable()
    def invoke_llm(self, question, docs):
        response = self._client.chat.completions.create(
            model=self._model,
            messages=[
                {
                    "role": "system",
                    "content": "You are a helpful AI assistant for question answering."
                    " Use the following docs to answer the user question.\n\n"
                    f"## Docs\n\n{docs}",
                },
                {"role": "user", "content": question},
            ],
        )

        # Evaluators will expect "answer" and "contexts"
        return {
            "answer": response.choices[0].message.content,
            "contexts": [str(doc.page_content) for doc in docs],
        }

    @traceable()
    def get_answer(self, question: str):
        docs = self.retrieve_docs(question)
        return self.invoke_llm(question, docs)

rag_bot = RagBot(retriever)

In [11]:
response = rag_bot.get_answer("How does ReAct agent work?")
response["answer"][:150]

'The ReAct (Reasoning and Acting) agent integrates reasoning and acting capabilities within Large Language Models (LLMs) by extending the action space '

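### Setting up an online evaluator

The traces from the RAG bot are now logged to our project.

We can add an online evaluator to the project to grade those traces.

Let's start with document grading, which checks whether the retrieved documents are relevant to the question.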

https://github.com/langchain-ai/langgraph/blob/main/examples/rag/langgraph_crag.ipynb

Grade documents:

https://docs.smith.langchain.com/tutorials/Developers/rag#evaluator
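
The grading prompts take a QUESTION and a set of comma separated FACTS. As a minimal sketch, assuming the FACTS are simply the `contexts` list returned by `RagBot.invoke_llm` joined with commas, the mapping could look like the following. The helper name `to_grader_inputs` and the placeholder strings are hypothetical and are not part of the original notebook or of any library.

```python
from typing import Dict, List


def to_grader_inputs(question: str, contexts: List[str]) -> Dict[str, str]:
    """Map the bot's output onto the QUESTION / FACTS slots (hypothetical helper)."""
    # Assumption: the grader receives the retrieved chunks as one comma separated string.
    return {"QUESTION": question, "FACTS": ", ".join(contexts)}


example = to_grader_inputs(
    "How does ReAct agent work?",
    ["chunk one", "chunk two"],  # placeholder strings, not real retrieved text
)
print(example["FACTS"])  # chunk one, chunk two
```

Joining with commas mirrors the wording of the prompts; real chunks usually contain commas themselves, so a different separator may be easier for the grader to parse.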
 
Recall:
```
You are a teacher grading a quiz. 

You will be given: 

1/ a QUESTION 

2/ a set of 4 comma separated FACTS provided by the student

You are grading RELEVANCE RECALL. 

A score of 1 means that ANY of the FACTS are relevant to the QUESTION. 

A score of 0 means that NONE of the FACTS are relevant to the QUESTION. 

1 is the highest (best) score. 0 is the lowest score you can give. 

Explain your reasoning in a step-by-step manner. Ensure your reasoning and conclusion are correct. Avoid simply stating the correct answer at the outset.

Here are some examples to guide your grading:

{{Few-shot examples}}
```
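
To make the recall rubric concrete, here is a minimal sketch of how per-fact relevance judgments collapse into the 0/1 score. In the online evaluator the LLM makes those judgments itself; the function name `relevance_recall` and the list-of-booleans input are assumptions made only for illustration.

```python
from typing import Sequence


def relevance_recall(fact_is_relevant: Sequence[bool]) -> int:
    """Return 1 if ANY fact is relevant to the question, otherwise 0."""
    return int(any(fact_is_relevant))


# One relevant fact out of four is enough for a score of 1.
print(relevance_recall([False, True, False, False]))  # 1
# No relevant facts gives a score of 0.
print(relevance_recall([False, False, False, False]))  # 0
```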

And precision:

```
You are a teacher grading a quiz.

You will be given: 

1/ a QUESTION

2/ a set of comma separated FACTS provided by the student 

You are grading RELEVANCE PRECISION. 

Consider each of the comma separated facts.

A score of 1 means that ALL of the FACTS are relevant to the QUESTION. 

A score of 0 means that ANY of the FACTS are not relevant to the QUESTION.

1 is the highest (best) score.

0 is the lowest score you can give.

Explain your reasoning in a step-by-step manner. 

Ensure your reasoning and conclusion are correct. 

Avoid simply stating the correct answer at the outset.

Here are some examples to guide your grading:

{{Few-shot examples}}
```
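
The precision rubric is stricter than the recall one: a single irrelevant fact drops the score to 0. A minimal sketch of that rule, assuming the per-fact relevance judgments are already available as booleans (in the online evaluator the LLM produces them), could look like this. The name `relevance_precision` is hypothetical, and scoring an empty set of facts as 0 is an assumption, since the prompt does not cover that case.

```python
from typing import Sequence


def relevance_precision(fact_is_relevant: Sequence[bool]) -> int:
    """Return 1 only if ALL facts are relevant to the question, otherwise 0."""
    # Assumption: an empty set of facts scores 0 rather than 1.
    return int(len(fact_is_relevant) > 0 and all(fact_is_relevant))


# Every fact is relevant, so the score is 1.
print(relevance_precision([True, True, True, True]))  # 1
# One irrelevant fact is enough to score 0.
print(relevance_precision([True, True, False, True]))  # 0
```

The same set of retrieved documents can therefore score 1 on recall and 0 on precision, which is why the two checks are kept as separate evaluators.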

In [10]:
response = rag_bot.get_answer("What is the difference between ReAct and Reflexion approaches for self-reflection?")
response["answer"]

'Both ReAct and Reflexion are approaches designed to enhance the capabilities of autonomous agents, particularly in reasoning and decision-making tasks, but they differ in their mechanisms and focuses for self-reflection:\n\n### ReAct (Reason + Act)\n1. **Integration of Reasoning and Acting:** ReAct combines reasoning and acting within a large language model (LLM) by extending the action space to include both task-specific actions and natural language.\n2. **Prompting Structure:** It follows a specific structure for the LLM to think and act, typically using a repeated sequence of "Thought: … Action: … Observation: …".\n3. **Self-Reflection by Design:** Although ReAct suggests including reasoning steps, it doesn\'t explicitly describe a structured framework for self-correcting past actions in the way Reflexion does.\n\n### Reflexion\n1. **Dynamic Memory and Self-Reflection:** Reflexion equips agents with dynamic memory and self-reflection capabilities to iteratively improve their reason

In [9]:
response = rag_bot.get_answer("What are the types of LLM memory?")
response["answer"]

'The types of memory in LLM-powered autonomous agents, as described by Lilian Weng, are derived from the concepts of human memory. Here are the types highlighted:\n\n1. **Sensory Memory**: The earliest stage of memory, which retains impressions of sensory information (visual, auditory, etc.) after the original stimuli have ended. Sensory memory typically lasts for only a few seconds and includes subcategories such as:\n   - **Iconic Memory**: Visual sensory memory.\n   - **Echoic Memory**: Auditory sensory memory.\n   - **Haptic Memory**: Tactile sensory memory.\n\nThe idea is to draw a parallel between human memory processes and the memory components used in LLM-powered systems, though the practical implementations may differ.'

In [8]:
response = rag_bot.get_answer("What is the Memory and Retrieval model in Generative Agents simulation?")
response["answer"]

"In Generative Agents simulation, the memory and retrieval model are key components that enable the agent to behave in a human-like manner by utilizing past experiences. Here's a detailed breakdown:\n\n### Memory:\n1. **Memory Stream**: This is a long-term memory module that acts as an external database, recording a comprehensive list of the agent's experiences in natural language. It allows the agent to have an extensive history to reference when making decisions.\n\n2. **Types of Memory**:\n   - **Short-term Memory**: Utilized for in-context learning, enabling the model to handle immediate tasks and interactions.\n   - **Long-term Memory**: Retains and recalls information over extended periods. This is achieved by leveraging an external vector store for fast retrieval, providing the agent with a considerable amount of stored data.\n\n### Retrieval Model:\n1. **Context Surfacing**: The retrieval model surfaces relevant context to inform the agent's behavior. It considers three primary