# RAG con LangGraph

Aiuta a risolvere i problemi di :
- hallucination
- lack of consistency
- mancanza di informazioni recenti

Consiste nell'utilizzare una knowledge base per supportare la risposta del modello alla domanda che è stata posta.



Un agent è caratterizzato da:
- è in grado di pianificare (planning): quindi è in grado di suddividere un task in sotto-task
- è dotato di memoria
- è in grado di utilizzare tools (strumenti esterni che consentono al modello di interagire col mondo esterno)


Notebook tratto da

https://www.youtube.com/watch?v=0i9NzY_b3pg

L'implementazione proposta NON utilizza un react agent 
![image.png](attachment:image.png)

ma utilizza LangGraph che consente un maggior controllo sull'esecuzione del processo
![image-2.png](attachment:image-2.png)

In questo modo si delega all'LLM compiti più specifici delegando il controllo di flusso al framework. Il flusso del processo è quindi definito in anticipo e questo consente di evitare i problemi che possono essere presenti nel delegare tutto il controllo del processo al llm. Questo vale a maggior ragione per modelli meno "potenti"

![image-3.png](attachment:image-3.png)

Installato Ollama secondo le istruzioni presenti sul sito per Ubuntu

curl -fsSL https://ollama.com/install.sh | sh


In [10]:
#pip install langchain langchain_community langchain-nomic "nomic[local]" langchain-ollama scikit-learn langgraph tavily-python bs4

In [11]:
#!ollama pull llama3.2:3b-instruct-fp16 

Per web search ho creato un account gratuito a Tavily (https://app.tavily.com/home) e ho creato l'api key

In [12]:
import os
from dotenv import load_dotenv

load_dotenv()
os.environ["TAVILY_API_KEY"]

'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'

In [13]:
### LLM
from langchain_ollama import ChatOllama
local_llm = 'llama3.2:3b-instruct-fp16'

# Carico il modello Llama3.2 sia in modalità "standard" che in modalità "json"
#
llm = ChatOllama(model=local_llm, temperature=0)
#
llm_json_mode = ChatOllama(model=local_llm, temperature=0, format='json')

Definisco il vectore store dove memorizzare i miei documenti locali dopo aver creato gli embeddings

In [14]:
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain_community.document_loaders import WebBaseLoader
from langchain_community.vectorstores import SKLearnVectorStore
from langchain_community.vectorstores import Chroma
from langchain_nomic.embeddings import NomicEmbeddings

urls = [
    "https://lilianweng.github.io/posts/2023-06-23-agent/",
    "https://lilianweng.github.io/posts/2023-03-15-prompt-engineering/",
    "https://lilianweng.github.io/posts/2023-10-25-adv-attack-llm/",
]

# Load documents
docs = [WebBaseLoader(url).load() for url in urls]
docs_list = [item for sublist in docs for item in sublist]

# Split documents
text_splitter = RecursiveCharacterTextSplitter.from_tiktoken_encoder(
    chunk_size=1000, chunk_overlap=200
)
doc_splits = text_splitter.split_documents(docs_list)

# Add to vectorDB
vectorstore = SKLearnVectorStore.from_documents(
    documents=doc_splits,
    embedding=NomicEmbeddings(model="nomic-embed-text-v1.5", inference_mode="local"),
)

#vectorstore = Chroma.from_documents(
#    documents=doc_splits,
#    collection_name="rag-chroma1"
#    embedding=NomicEmbeddings(model="nomic-embed-text-v1.5", inference_mode="local"),
#)

# Create retriever
retriever = vectorstore.as_retriever(k=3)

Test di estrazione dei documenti dal vector store. L'invocazione restituisce i 3 documenti (k=3) maggiormente attinenti alla domanda in input

In [16]:
retriever.invoke("Agent Memory")


[Document(metadata={'id': '9d103475-98ba-4ec2-a6ea-b5bed1e512c4', 'source': 'https://lilianweng.github.io/posts/2023-06-23-agent/', 'title': "LLM Powered Autonomous Agents | Lil'Log", 'description': 'Building agents with LLM (large language model) as its core controller is a cool concept. Several proof-of-concepts demos, such as AutoGPT, GPT-Engineer and BabyAGI, serve as inspiring examples. The potentiality of LLM extends beyond generating well-written copies, stories, essays and programs; it can be framed as a powerful general problem solver.\nAgent System Overview In a LLM-powered autonomous agent system, LLM functions as the agent’s brain, complemented by several key components:', 'language': 'en'}, page_content='Reflection mechanism: synthesizes memories into higher level inferences over time and guides the agent’s future behavior. They are higher-level summaries of past events (<- note that this is a bit different from self-reflection above)\n\nPrompt LM with 100 most recent obse

## Router step

Definisco lo step che si occupa di decidere se instradare la richiesta verso l'estrazione dal vectore store o verso la ricerca web<br>
La scelta viene effettuata dall'LLM sulla base:
- delle informazioni presenti all'interno del prompt riguardo le possibili opzioni
- della domanda fornita in input

Utilizzo LLM con output in JSON

In [17]:
### Router
import json
from langchain_core.messages import HumanMessage, SystemMessage

# Prompt 
router_instructions = """You are an expert at routing a user question to a vectorstore or web search.

The vectorstore contains documents related to agents, prompt engineering, and adversarial attacks.
                                    
Use the vectorstore for questions on these topics. For all else, and especially for current events, use web-search.

Return JSON with single key, datasource, that is 'websearch' or 'vectorstore' depending on the question."""

# Test del router
# Definisco 3 possibili domande per verificarne il corretto instradamento

test_web_search = llm_json_mode.invoke([SystemMessage(content=router_instructions)] +
                                       [HumanMessage(content="Who is favored to win the NFC Championship game in the 2024 season?")])
test_web_search_2 = llm_json_mode.invoke([SystemMessage(content=router_instructions)]
                                         + [HumanMessage(content="What are the models released today for llama3.2?")])
test_vector_store = llm_json_mode.invoke([SystemMessage(content=router_instructions)]
                                         + [HumanMessage(content="What are the types of agent memory?")])

print(json.loads(test_web_search.content),
      json.loads(test_web_search_2.content),
      json.loads(test_vector_store.content))


{'datasource': 'websearch'} {'datasource': 'websearch'} {'datasource': 'vectorstore'}


## Retrieval Grader step

Definisco lo step che si occupa di definire se il documento estratto dal vectore store è rilevante per la domanda di input.<br>

Utilizzo LLM con output in JSON

In [21]:
### Retrieval Grader 

# Doc grader instructions 
doc_grader_instructions = """You are a grader assessing relevance of a retrieved document to a user question.

If the document contains keyword(s) or semantic meaning related to the question, grade it as relevant."""

# Grader prompt
doc_grader_prompt = """Here is the retrieved document: \n\n {document} \n\n Here is the user question: \n\n {question}. 

This carefully and objectively assess whether the document contains at least some information that is relevant to the question.

Return JSON with single key, binary_score, that is 'yes' or 'no' score to indicate whether the document contains at least some information that is relevant to the question."""


# Test del funzionamento del Doc Grader
question = "What is Chain of thought prompting?"
#question = "What is the recipe of pizza?"

# Estraggo dal vectore store i documenti relativi alla domanda
docs = retriever.invoke(question)

# estraggi il testo contenuto in uno dei documenti estratti
doc_txt = docs[1].page_content
print(doc_txt)

# Valorizzo il prompt passando il testo e la domanda
doc_grader_prompt_formatted = doc_grader_prompt.format(document=doc_txt, question=question)

# Invoco il modeddo passando SystemMessage e HumanMessage
result = llm_json_mode.invoke([SystemMessage(content=doc_grader_instructions)] + 
                              [HumanMessage(content=doc_grader_prompt_formatted)])

# Visualizzo il risultato in formato JSON
json.loads(result.content)

For example to produce education materials for kids,

Describe what is quantum physics to a 6-year-old.

And safe content,

... in language that is safe for work.
In-context instruction learning (Ye et al. 2023) combines few-shot learning with instruction prompting. It incorporates multiple demonstration examples across different tasks in the prompt, each demonstration consisting of instruction, task input and output. Note that their experiments were only on classification tasks and the instruction prompt contains all label options.
Definition: Determine the speaker of the dialogue, "agent" or "customer".
Input: I have successfully booked your tickets.
Ouput: agent

Definition: Determine which category the question asks for, "Quantity" or "Location".
Input: What's the oldest building in US?
Ouput: Location

Definition: Classify the sentiment of the given movie review, "positive" or "negative".
Input: i'll bet the video game is a lot more fun than the film.
Output:
Self-Consistency Samp

{'binary_score': 'yes'}

## Generate Step

Definisco lo step in cui utilizzo l'LLM per fornire la risposta testuale alla domanda in input, ricevendo all'interno del prompt i documenti estratti dal vectore store

In [22]:
## Generate

# Prompt
rag_prompt = """You are an assistant for question-answering tasks. 

Here is the context to use to answer the question:

{context} 

Think carefully about the above context. 

Now, review the user question:

{question}

Provide an answer to this questions using only the above context. 

Use three sentences maximum and keep the answer concise.

Answer:"""

# Post-processing
def format_docs(docs):
    return "\n\n".join(doc.page_content for doc in docs)

# Test del funzionamento del generator
docs = retriever.invoke(question)

# Formatto i documenti estratti dal retriever prima di passarli nel context
docs_txt = format_docs(docs)

# Creo il promt che verrà inserito nel HumanMessage
rag_prompt_formatted = rag_prompt.format(context=docs_txt, question=question)

# invoco LLM per ottenere la risposta 
generation = llm.invoke([HumanMessage(content=rag_prompt_formatted)])

print(generation.content)


Chain of Thought (CoT) prompting is a technique used in natural language processing to generate human-like responses by iteratively asking questions and refining the search space through external search queries, such as Wikipedia APIs. CoT prompting involves decomposing problems into multiple thought steps, generating multiple thoughts per step, and evaluating each state using a classifier or majority vote. The goal is to find an optimal instruction that leads to the desired output, which can be achieved by optimizing prompt parameters directly on the embedding space via gradient descent or searching over a pool of model-generated instruction candidates.


## Search tool

Definisco il tool da utilizzare per la ricerca su web

In [23]:
### Search
from langchain_community.tools.tavily_search import TavilySearchResults
web_search_tool = TavilySearchResults(k=3)
web_search_tool.invoke("LLM models")

[{'url': 'https://arxiv.org/abs/2307.06435',
  'content': 'A Comprehensive Overview of Large Language Models. Large Language Models (LLMs) have recently demonstrated remarkable capabilities in natural language processing tasks and beyond. This success of LLMs has led to a large influx of research contributions in this direction. These works encompass diverse topics such as architectural innovations'},
 {'url': 'https://www.geeksforgeeks.org/top-20-llm-models/',
  'content': 'Large Language Model commonly known as an LLM, refers to a neural network equipped with billions of parameters and trained extensively on extensive datasets of unlabeled text.This training typically involves self-supervised or semi-supervised learning techniques. In this article, we explore about Top 20 LLM Models and get to know how each model has distinct features and applications.'},
 {'url': 'https://www.ibm.com/topics/large-language-models',
  'content': 'AI governance and traceability are also fundamental asp

## Hallucination Grader step

Definisco lo step che si occupa di definire se la risposta fornita è attinente ai documenti che la supportano.

Utilizzo LLM con output in JSON

In [24]:
### Hallucination Grader 

# Hallucination grader instructions 
hallucination_grader_instructions = """

You are a teacher grading a quiz. 

You will be given FACTS and a STUDENT ANSWER. 

Here is the grade criteria to follow:

(1) Ensure the STUDENT ANSWER is grounded in the FACTS. 

(2) Ensure the STUDENT ANSWER does not contain "hallucinated" information outside the scope of the FACTS.

Score:

A score of yes means that the student's answer meets all of the criteria. This is the highest (best) score. 

A score of no means that the student's answer does not meet all of the criteria. This is the lowest possible score you can give.

Explain your reasoning in a step-by-step manner to ensure your reasoning and conclusion are correct. 

Avoid simply stating the correct answer at the outset."""

# Grader prompt
hallucination_grader_prompt = """FACTS: \n\n {documents} \n\n STUDENT ANSWER: {generation}. 

Return JSON with two two keys, binary_score is 'yes' or 'no' score to indicate whether the STUDENT ANSWER is grounded in the FACTS. And a key, explanation, that contains an explanation of the score."""

# Test using documents and generation from above 
hallucination_grader_prompt_formatted = hallucination_grader_prompt.format(documents=docs_txt, generation=generation.content)
#
result = llm_json_mode.invoke([SystemMessage(content=hallucination_grader_instructions)] + [HumanMessage(content=hallucination_grader_prompt_formatted)])

json.loads(result.content)

{'binary_score': 'yes',
 'explanation': 'The student answer provides a clear and accurate description of Chain of Thought (CoT) prompting, its components, and its goals. It also mentions various techniques used in CoT prompting, such as external search queries, prompt tuning, and automatic prompt engineering. The answer demonstrates an understanding of the concept and its applications in natural language processing.'}

## Answer Grader step

Definisco lo step che si occupa di definire se la risposta fornita è attinente  alla domanda di input.<br>

Utilizzo LLM con output in JSON

In [25]:
### Answer Grader 

# Answer grader instructions 
answer_grader_instructions = """You are a teacher grading a quiz. 

You will be given a QUESTION and a STUDENT ANSWER. 

Here is the grade criteria to follow:

(1) The STUDENT ANSWER helps to answer the QUESTION

Score:

A score of yes means that the student's answer meets all of the criteria. This is the highest (best) score. 

The student can receive a score of yes if the answer contains extra information that is not explicitly asked for in the question.

A score of no means that the student's answer does not meet all of the criteria. This is the lowest possible score you can give.

Explain your reasoning in a step-by-step manner to ensure your reasoning and conclusion are correct. 

Avoid simply stating the correct answer at the outset."""

# Grader prompt
answer_grader_prompt = """QUESTION: \n\n {question} \n\n STUDENT ANSWER: {generation}. 

Return JSON with two two keys, binary_score is 'yes' or 'no' score to indicate whether the STUDENT ANSWER meets the criteria. And a key, explanation, that contains an explanation of the score."""

# Test 
#question = "What are the vision models released today as part of Llama 3.2?"
question = "Who the best football players in Italy?"
answer = "The Llama 3.2 models released today include two vision models: Llama 3.2 11B Vision Instruct and Llama 3.2 90B Vision Instruct, which are available on Azure AI Model Catalog via managed compute. These models are part of Meta's first foray into multimodal AI and rival closed models like Anthropic's Claude 3 Haiku and OpenAI's GPT-4o mini in visual reasoning. They replace the older text-only Llama 3.1 models."

# Test using question and generation from above 
answer_grader_prompt_formatted = answer_grader_prompt.format(question=question, generation=answer)
result = llm_json_mode.invoke([SystemMessage(content=answer_grader_instructions)] + [HumanMessage(content=answer_grader_prompt_formatted)])
json.loads(result.content)


{'binary_score': 'no',
 'explanation': "The student's answer does not meet the criteria for answering the question about the best football players in Italy. The answer provided is unrelated to football and appears to be discussing AI models, specifically Llama 3.2 models released by Meta. There is no mention of Italian football players or teams. The answer does not help to answer the question at all."}

## Graph state
The graph state schema contains keys that we want to:

Pass to each node in our graph
Optionally, modify in each node of our graph

In pratica lo stato contiene le informazioni che si vogliono rendere persistenti nel corso dell'esecuzione del graph

In [26]:
import operator
from typing_extensions import TypedDict
from typing import List, Annotated

class GraphState(TypedDict):
    """
    Graph state is a dictionary that contains information we want to propagate to, and modify in, each graph node.
    """
    question : str # User question
    generation : str # LLM generation
    web_search : str # Binary decision to run web search
    max_retries : int # Max number of retries for answer generation 
    answers : int # Number of answers generated
    loop_step: Annotated[int, operator.add] 
    documents : List[str] # List of retrieved documents

Each node in our graph is simply a function that:

(1) Take state as an input

(2) Modifies state

(3) Write the modified state to the state schema (dict)

See conceptual docs here.

Each edge routes between nodes in the graph.

## Definizione di NODES e EDGES

Ognuno degli steo precedentemente creati deve essere definito come una funzione.<br>
Ognuna di queste funzioni rappresente un nodo all'interno del graph.<br>
Questi nodi ricevono in input uno stato e lo modificano in qualche modo

In [27]:
from langchain.schema import Document
from langgraph.graph import END

### Nodes
def retrieve(state):
    """
    Retrieve documents from vectorstore

    Args:
        state (dict): The current graph state

    Returns:
        state (dict): New key added to state, documents, that contains retrieved documents
    """
    print("---RETRIEVE---")
    question = state["question"]

    # Write retrieved documents to documents key in state
    documents = retriever.invoke(question)
    return {"documents": documents}

def generate(state):
    """
    Generate answer using RAG on retrieved documents

    Args:
        state (dict): The current graph state

    Returns:
        state (dict): New key added to state, generation, that contains LLM generation
    """
    print("---GENERATE---")
    question = state["question"]
    documents = state["documents"]
    loop_step = state.get("loop_step", 0)
    
    # RAG generation
    docs_txt = format_docs(documents)
    rag_prompt_formatted = rag_prompt.format(context=docs_txt, question=question)
    generation = llm.invoke([HumanMessage(content=rag_prompt_formatted)])
    return {"generation": generation, "loop_step": loop_step+1}

def grade_documents(state):
    """
    Determines whether the retrieved documents are relevant to the question
    If any document is not relevant, we will set a flag to run web search

    Args:
        state (dict): The current graph state

    Returns:
        state (dict): Filtered out irrelevant documents and updated web_search state
    """

    print("---CHECK DOCUMENT RELEVANCE TO QUESTION---")
    question = state["question"]
    documents = state["documents"]
    
    # Score each doc
    filtered_docs = []
    web_search = "No" 
    for d in documents:
        doc_grader_prompt_formatted = doc_grader_prompt.format(document=d.page_content, question=question)
        result = llm_json_mode.invoke([SystemMessage(content=doc_grader_instructions)] + [HumanMessage(content=doc_grader_prompt_formatted)])
        grade = json.loads(result.content)['binary_score']
        # Document relevant
        if grade.lower() == "yes":
            print("---GRADE: DOCUMENT RELEVANT---")
            filtered_docs.append(d)
        # Document not relevant
        else:
            print("---GRADE: DOCUMENT NOT RELEVANT---")
            # We do not include the document in filtered_docs
            # We set a flag to indicate that we want to run web search
            web_search = "Yes"
            continue
    return {"documents": filtered_docs, "web_search": web_search}
    
def web_search(state):
    """
    Web search based based on the question

    Args:
        state (dict): The current graph state

    Returns:
        state (dict): Appended web results to documents
    """

    print("---WEB SEARCH---")
    question = state["question"]
    documents = state.get("documents", [])

    # Web search
    docs = web_search_tool.invoke({"query": question})
    web_results = "\n".join([d["content"] for d in docs])
    web_results = Document(page_content=web_results)
    documents.append(web_results)
    return {"documents": documents}

### Edges

def route_question(state):
    """
    Route question to web search or RAG 

    Args:
        state (dict): The current graph state

    Returns:
        str: Next node to call
    """

    print("---ROUTE QUESTION---")
    route_question = llm_json_mode.invoke([SystemMessage(content=router_instructions)] + [HumanMessage(content=state["question"])])
    source = json.loads(route_question.content)['datasource']
    if source == 'websearch':
        print("---ROUTE QUESTION TO WEB SEARCH---")
        return "websearch"
    elif source == 'vectorstore':
        print("---ROUTE QUESTION TO RAG---")
        return "vectorstore"

def decide_to_generate(state):
    """
    Determines whether to generate an answer, or add web search

    Args:
        state (dict): The current graph state

    Returns:
        str: Binary decision for next node to call
    """

    print("---ASSESS GRADED DOCUMENTS---")
    question = state["question"]
    web_search = state["web_search"]
    filtered_documents = state["documents"]

    if web_search == "Yes":
        # All documents have been filtered check_relevance
        # We will re-generate a new query
        print("---DECISION: NOT ALL DOCUMENTS ARE RELEVANT TO QUESTION, INCLUDE WEB SEARCH---")
        return "websearch"
    else:
        # We have relevant documents, so generate answer
        print("---DECISION: GENERATE---")
        return "generate"

def grade_generation_v_documents_and_question(state):
    """
    Determines whether the generation is grounded in the document and answers question

    Args:
        state (dict): The current graph state

    Returns:
        str: Decision for next node to call
    """

    print("---CHECK HALLUCINATIONS---")
    question = state["question"]
    documents = state["documents"]
    generation = state["generation"]
    max_retries = state.get("max_retries", 3) # Default to 3 if not provided

    hallucination_grader_prompt_formatted = hallucination_grader_prompt.format(documents=format_docs(documents), generation=generation.content)
    result = llm_json_mode.invoke([SystemMessage(content=hallucination_grader_instructions)] + [HumanMessage(content=hallucination_grader_prompt_formatted)])
    grade = json.loads(result.content)['binary_score']

    # Check hallucination
    if grade == "yes":
        print("---DECISION: GENERATION IS GROUNDED IN DOCUMENTS---")
        # Check question-answering
        print("---GRADE GENERATION vs QUESTION---")
        # Test using question and generation from above 
        answer_grader_prompt_formatted = answer_grader_prompt.format(question=question, generation=generation.content)
        result = llm_json_mode.invoke([SystemMessage(content=answer_grader_instructions)] + [HumanMessage(content=answer_grader_prompt_formatted)])
        grade = json.loads(result.content)['binary_score']
        if grade == "yes":
            print("---DECISION: GENERATION ADDRESSES QUESTION---")
            return "useful"
        elif state["loop_step"] <= max_retries:
            print("---DECISION: GENERATION DOES NOT ADDRESS QUESTION---")
            return "not useful"
        else:
            print("---DECISION: MAX RETRIES REACHED---")
            return "max retries"  
    elif state["loop_step"] <= max_retries:
        print("---DECISION: GENERATION IS NOT GROUNDED IN DOCUMENTS, RE-TRY---")
        return "not supported"
    else:
        print("---DECISION: MAX RETRIES REACHED---")
        return "max retries"

## Control Flow

Dopo aver definito i nodi e gli edges, è necessario costruire il graph, registrando i nodi e gli edges

In [28]:
from langgraph.graph import StateGraph
from IPython.display import Image, display

workflow = StateGraph(GraphState)

# Define the nodes
workflow.add_node("websearch", web_search) # web search
workflow.add_node("retrieve", retrieve) # retrieve
workflow.add_node("grade_documents", grade_documents) # grade documents
workflow.add_node("generate", generate) # generate

# Build graph
workflow.set_conditional_entry_point(
    route_question,
    {
        "websearch": "websearch",
        "vectorstore": "retrieve",
    },
)
workflow.add_edge("websearch", "generate")
workflow.add_edge("retrieve", "grade_documents")
workflow.add_conditional_edges(
    "grade_documents",
    decide_to_generate,
    {
        "websearch": "websearch",
        "generate": "generate",
    },
)
workflow.add_conditional_edges(
    "generate",
    grade_generation_v_documents_and_question,
    {
        "not supported": "generate",
        "useful": END,
        "not useful": "websearch",
        "max retries": END,
    },
)

# Compile
graph = workflow.compile()
display(Image(graph.get_graph().draw_mermaid_png()))

<IPython.core.display.Image object>

In [29]:
inputs = {"question": "What are the types of agent memory?", "max_retries": 3}
for event in graph.stream(inputs, stream_mode="values"):
    print(event)

---ROUTE QUESTION---
---ROUTE QUESTION TO RAG---
{'question': 'What are the types of agent memory?', 'max_retries': 3, 'loop_step': 0}
---RETRIEVE---
{'question': 'What are the types of agent memory?', 'max_retries': 3, 'loop_step': 0, 'documents': [Document(metadata={'id': 'dff28981-f747-4873-a56a-9f6eb61a16b6', 'source': 'https://lilianweng.github.io/posts/2023-06-23-agent/', 'title': "LLM Powered Autonomous Agents | Lil'Log", 'description': 'Building agents with LLM (large language model) as its core controller is a cool concept. Several proof-of-concepts demos, such as AutoGPT, GPT-Engineer and BabyAGI, serve as inspiring examples. The potentiality of LLM extends beyond generating well-written copies, stories, essays and programs; it can be framed as a powerful general problem solver.\nAgent System Overview In a LLM-powered autonomous agent system, LLM functions as the agent’s brain, complemented by several key components:', 'language': 'en'}, page_content="LLM Powered Autonomous A

---GRADE: DOCUMENT RELEVANT---
---GRADE: DOCUMENT NOT RELEVANT---
---GRADE: DOCUMENT RELEVANT---
---GRADE: DOCUMENT RELEVANT---
---ASSESS GRADED DOCUMENTS---
---DECISION: NOT ALL DOCUMENTS ARE RELEVANT TO QUESTION, INCLUDE WEB SEARCH---
{'question': 'What are the types of agent memory?', 'web_search': 'Yes', 'max_retries': 3, 'loop_step': 0, 'documents': [Document(metadata={'id': 'dff28981-f747-4873-a56a-9f6eb61a16b6', 'source': 'https://lilianweng.github.io/posts/2023-06-23-agent/', 'title': "LLM Powered Autonomous Agents | Lil'Log", 'description': 'Building agents with LLM (large language model) as its core controller is a cool concept. Several proof-of-concepts demos, such as AutoGPT, GPT-Engineer and BabyAGI, serve as inspiring examples. The potentiality of LLM extends beyond generating well-written copies, stories, essays and programs; it can be framed as a powerful general problem solver.\nAgent System Overview In a LLM-powered autonomous agent system, LLM functions as the agent’

{'question': 'What are the types of agent memory?', 'web_search': 'Yes', 'max_retries': 3, 'loop_step': 0, 'documents': [Document(metadata={'id': 'dff28981-f747-4873-a56a-9f6eb61a16b6', 'source': 'https://lilianweng.github.io/posts/2023-06-23-agent/', 'title': "LLM Powered Autonomous Agents | Lil'Log", 'description': 'Building agents with LLM (large language model) as its core controller is a cool concept. Several proof-of-concepts demos, such as AutoGPT, GPT-Engineer and BabyAGI, serve as inspiring examples. The potentiality of LLM extends beyond generating well-written copies, stories, essays and programs; it can be framed as a powerful general problem solver.\nAgent System Overview In a LLM-powered autonomous agent system, LLM functions as the agent’s brain, complemented by several key components:', 'language': 'en'}, page_content="LLM Powered Autonomous Agents | Lil'Log\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\nLil'Log\n\n\n\n\n\n\n\n\n\n\n\n\n\

---CHECK HALLUCINATIONS---
---DECISION: GENERATION IS GROUNDED IN DOCUMENTS---
---GRADE GENERATION vs QUESTION---
---DECISION: GENERATION ADDRESSES QUESTION---
{'question': 'What are the types of agent memory?', 'generation': AIMessage(content='There are two main types of memory in AI agents: short-term memory for context and long-term memory using external storage, such as vector databases. Limited memory agents have a limited capacity and do not retain past experiences or events for an extended period. These agents can still enhance the capabilities of reactive agents by incorporating historical data into their decision-making processes.', additional_kwargs={}, response_metadata={'model': 'llama3.2:3b-instruct-fp16', 'created_at': '2024-10-28T17:21:15.292649571Z', 'message': {'role': 'assistant', 'content': ''}, 'done_reason': 'stop', 'done': True, 'total_duration': 34764899491, 'load_duration': 15143119, 'prompt_eval_count': 1026, 'prompt_eval_duration': 20100461000, 'eval_count': 7