#### Install the dependencies

In [20]:
! pip install langchain-community tiktoken langchain-openai langchainhub chromadb langchain langgraph tavily-python langchain-groq



In [21]:
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain_community.document_loaders import WebBaseLoader
from langchain_community.vectorstores import Chroma
from langchain_community.embeddings import SentenceTransformerEmbeddings
from langchain_core.output_parsers import StrOutputParser
from langchain_core.prompts import ChatPromptTemplate
from langchain_groq import ChatGroq
from langchain_community.tools.tavily_search import TavilySearchResults
from langchain_core.tools import tool

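#### Enter the Groq and Tavily API keys

In [22]:
import os
from getpass import getpass

# Read the keys from the environment and prompt only if they are missing,
# so that secrets are never stored in the notebook itself.
for key in ("GROQ_API_KEY", "TAVILY_API_KEY"):
    if not os.environ.get(key):
        os.environ[key] = getpass(f"Enter {key}: ")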

#### Load Connection to LLM


In [69]:
llm = ChatGroq(
    model="llama-3.1-8b-instant",
    temperature=0,
    max_tokens=None,
    timeout=None,
    max_retries=2,
)

#### Download data

In [24]:
# Blog posts on agents, prompt engineering, and adversarial attacks on LLMs

urls = [
    "https://lilianweng.github.io/posts/2023-06-23-agent/",
    "https://lilianweng.github.io/posts/2023-03-15-prompt-engineering/",
    "https://lilianweng.github.io/posts/2023-10-25-adv-attack-llm/",
]

docs = [WebBaseLoader(url).load() for url in urls]
docs_list = [item for sublist in docs for item in sublist]

text_splitter = RecursiveCharacterTextSplitter.from_tiktoken_encoder(
    chunk_size=500, chunk_overlap=20
)
doc_splits = text_splitter.split_documents(docs_list)

for ind, doc in enumerate(doc_splits):
    doc.metadata['chunk_id'] = ind  # Assign chunk_id
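
The effect of `chunk_size` and `chunk_overlap` can be illustrated with a plain sliding window over a token list. This is only a simplified sketch: `RecursiveCharacterTextSplitter` splits on a hierarchy of separators and measures length with tiktoken, whereas the hypothetical `chunk_tokens` helper below assumes the text is already tokenized and cuts fixed windows.

```python
def chunk_tokens(tokens, chunk_size=500, chunk_overlap=20):
    """Split a token list into windows of at most chunk_size tokens,
    each sharing chunk_overlap tokens with the previous window."""
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    step = chunk_size - chunk_overlap
    chunks = []
    for start in range(0, len(tokens), step):
        chunks.append(tokens[start:start + chunk_size])
        # Stop once the current window has reached the end of the input.
        if start + chunk_size >= len(tokens):
            break
    return chunks


# Toy input: 12 placeholder tokens, small sizes so the overlap is visible.
tokens = [f"t{i}" for i in range(12)]
chunks = chunk_tokens(tokens, chunk_size=5, chunk_overlap=2)
for chunk in chunks:
    print(chunk)
```

Each window starts `chunk_size - chunk_overlap` tokens after the previous one, so the last two tokens of one chunk reappear at the start of the next.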

In [25]:
len(doc_splits)

89

In [26]:
doc_splits[0].metadata

{'source': 'https://lilianweng.github.io/posts/2023-06-23-agent/',
 'title': "LLM Powered Autonomous Agents | Lil'Log",
 'description': 'Building agents with LLM (large language model) as its core controller is a cool concept. Several proof-of-concepts demos, such as AutoGPT, GPT-Engineer and BabyAGI, serve as inspiring examples. The potentiality of LLM extends beyond generating well-written copies, stories, essays and programs; it can be framed as a powerful general problem solver.\nAgent System Overview\nIn a LLM-powered autonomous agent system, LLM functions as the agent’s brain, complemented by several key components:\n\nPlanning\n\nSubgoal and decomposition: The agent breaks down large tasks into smaller, manageable subgoals, enabling efficient handling of complex tasks.\nReflection and refinement: The agent can do self-criticism and self-reflection over past actions, learn from mistakes and refine them for future steps, thereby improving the quality of final results.\n\n\nMemory\

#### Use a Sentence-Transformer model

In [27]:
embedding_model = SentenceTransformerEmbeddings(model_name="all-MiniLM-L6-v2")

#### Helper functions, prompt, and vector index

In [28]:
# Post-processing
def format_docs(docs):
    return "\n\n".join(doc.page_content for doc in docs)

def print_results(docs):
    for doc in docs[:5]:
        print(f"Chunk ID: {doc.metadata['chunk_id']}")
        print(doc.page_content.replace("\n", " "))
        print("-"*500)

In [37]:
prompt = """You are an assistant for question-answering tasks.
            Use the following pieces of retrieved context to answer the question.
            If no context is present or if you don't know the answer, just say that you don't know.
            Do not make up the answer unless it is there in the provided context.

            Question:
            {question}

            Context:
            {context}

            Answer:
         """

prompt_template = ChatPromptTemplate.from_template(prompt)

# Chain
rag_chain = prompt_template | llm | StrOutputParser()

In [30]:
from chromadb import PersistentClient

try:
    client = PersistentClient(path="./chroma_db")
    client.delete_collection("rag-chroma")
except Exception:
    # The collection does not exist yet (e.g. on the first run); nothing to delete.
    pass

In [31]:
vectorstore = Chroma.from_documents(
    documents=doc_splits,
    collection_name="rag-chroma",
    embedding=embedding_model,
    persist_directory="./chroma_db"
)


#### Question

In [32]:
question = "What are the types of agent memory?"


#### Simple Retriever

In [74]:
simple_retriever = vectorstore.as_retriever(search_type="similarity",
                                                search_kwargs={"k": 5})
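
Conceptually, a similarity retriever with `k=5` embeds the query, scores every stored chunk against it, and returns the five best. The sketch below shows that idea with cosine similarity over hand-written 2-d vectors. Both the vectors and the choice of cosine are assumptions for illustration; the actual metric depends on how the Chroma collection is configured.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_k(query_vec, doc_vecs, k=5):
    """Return the indices of the k vectors most similar to query_vec."""
    scored = [(cosine(query_vec, vec), idx) for idx, vec in enumerate(doc_vecs)]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [idx for _, idx in scored[:k]]


# Toy "embeddings" standing in for the sentence-transformer vectors.
doc_vecs = [[1.0, 0.0], [0.7, 0.7], [0.0, 1.0], [-1.0, 0.0]]
nearest = top_k([1.0, 0.1], doc_vecs, k=2)
print(nearest)
```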

In [75]:
docs = simple_retriever.invoke(question)

print_results(docs)

Chunk ID: 7
Fig. 7. Comparison of AD, ED, source policy and RL^2 on environments that require memory and exploration. Only binary reward is assigned. The source policies are trained with A3C for "dark" environments and DQN for watermaze.(Image source: Laskin et al. 2023) Component Two: Memory# (Big thank you to ChatGPT for helping me draft this section. I’ve learned a lot about the human brain and data structure for fast MIPS in my conversations with ChatGPT.) Types of Memory# Memory can be defined as the processes used to acquire, store, retain, and later retrieve information. There are several types of memory in human brains.   Sensory Memory: This is the earliest stage of memory, providing the ability to retain impressions of sensory information (visual, auditory, etc) after the original stimuli have ended. Sensory memory typically only lasts for up to a few seconds. Subcategories include iconic memory (visual), echoic memory (auditory), and haptic memory (touch).   Short-Term Memor

#### RAG output with Simple Retriever

In [76]:
docs = format_docs(docs)

# Run
generation = rag_chain.invoke({"context": docs, "question": question})
print(generation)

According to the provided context, the types of agent memory mentioned are:

1. Sensory Memory: This is the earliest stage of memory, providing the ability to retain impressions of sensory information (visual, auditory, etc) after the original stimuli have ended.
2. Short-Term Memory (STM) or Working Memory: It stores information that we are currently aware of and needed to carry out complex cognitive tasks such as learning and reasoning.
3. Long-Term Memory (LTM): Long-term memory can store information for a remarkably long time, ranging from a few days to decades, with an essentially unlimited storage capacity. There are two subtypes of LTM:
   a. Explicit / declarative memory: This is memory of facts and events, and refers to those memories that can be consciously recalled, including episodic memory (events and experiences) and semantic memory (facts and concepts).
   b. Implicit / procedural memory: This type of memory is unconscious and involves skills and routines that are perfor

#### MultiQueryRetriever

In [77]:
from langchain.retrievers.multi_query import MultiQueryRetriever
# Set logging for the queries
import logging

mq_retriever = MultiQueryRetriever.from_llm(
    retriever=simple_retriever, llm=llm,
    include_original=True,
)

logging.basicConfig()
# so we can see what queries are generated by the LLM
logging.getLogger("langchain.retrievers.multi_query").setLevel(logging.INFO)
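
The idea behind multi-query retrieval can be sketched without any LLM: run the base retriever once per query variant and keep the union of the results, dropping duplicates. In the sketch below the query variants and the `fake_index` lookup table are made up for illustration, and chunks are represented by their IDs only.

```python
def multi_query_retrieve(queries, retrieve):
    """Run `retrieve` for every query and merge the results,
    keeping the first occurrence of each document ID."""
    seen = set()
    merged = []
    for query in queries:
        for doc_id in retrieve(query):
            if doc_id not in seen:
                seen.add(doc_id)
                merged.append(doc_id)
    return merged


# Hypothetical retrieval results: query text -> ranked chunk IDs.
fake_index = {
    "types of agent memory": [7, 8, 17],
    "memory categories for agents": [8, 42],
    "how do agents store information": [17, 3],
}
merged = multi_query_retrieve(list(fake_index), lambda q: fake_index[q])
print(merged)
```

Because the union can be larger than `k`, the merged list may contain chunks that a single query would not have surfaced, including less relevant ones.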

In [78]:
docs = mq_retriever.invoke(question)

print_results(docs)

INFO:langchain.retrievers.multi_query:Generated queries: ['Here are three alternative versions of the user question to retrieve relevant documents from a vector database:', 'What are the types of agent memory that have been studied in the field of artificial intelligence?', 'What types of memory do agents use in various applications of artificial intelligence, such as robotics or game playing?', 'What are the different categories or classifications of agent memory that have been proposed or implemented in the literature on artificial intelligence and cognitive architectures?']


Chunk ID: 42
Question clustering: Embed questions and run $k$-means for clustering. Demonstration selection: Select a set of representative questions from each cluster; i.e. one demonstration from one cluster. Samples in each cluster are sorted by distance to the cluster centroid and those closer to the centroid are selected first. Rationale generation: Use zero-shot CoT to generate reasoning chains for selected questions and construct few-shot prompt to run inference.  Augmented Language Models# A survey on augmented language models by Mialon et al. (2023) has great coverage over multiple categories of language models augmented with reasoning skills and the ability of using external tools. Recommend it. Retrieval# Often we need to complete tasks that require latest knowledge after the model pretraining time cutoff or internal/private knowledge base. In that case, the model would not know the context if we don’t explicitly provide it in the prompt. Many methods for Open Domain Question

#### RAG output with MultiQuery Retriever

In [79]:
docs = format_docs(docs)

# Run
generation = rag_chain.invoke({"context": docs, "question": question})
print(generation)

The types of agent memory mentioned in the context are:

1. Sensory memory: This is the earliest stage of memory, providing the ability to retain impressions of sensory information (visual, auditory, etc) after the original stimuli have ended. Sensory memory typically only lasts for up to a few seconds.
2. Short-term memory (STM) or Working Memory: It stores information that we are currently aware of and needed to carry out complex cognitive tasks such as learning and reasoning. Short-term memory is believed to have the capacity of about 7 items (Miller 1956) and lasts for 20-30 seconds.
3. Long-term memory (LTM): Long-term memory can store information for a remarkably long time, ranging from a few days to decades, with an essentially unlimited storage capacity. There are two subtypes of LTM:
   - Explicit / declarative memory: This is memory of facts and events, and refers to those memories that can be consciously recalled, including episodic memory (events and experiences) and semant

#### Hybrid Search

In [45]:
!pip install rank_bm25



In [80]:
from langchain.retrievers import BM25Retriever, EnsembleRetriever

simple_retriever = vectorstore.as_retriever(search_type="similarity",
                                                search_kwargs={"k": 5})

bm25_retriever = BM25Retriever.from_documents(documents=doc_splits)
bm25_retriever.k = 2

ensemble_retriever = EnsembleRetriever(
    retrievers=[bm25_retriever, simple_retriever],
    weights=[0.5, 0.5]
)
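
`EnsembleRetriever` merges the ranked lists of its retrievers using weighted Reciprocal Rank Fusion. The following is a minimal sketch of that scoring rule, not the library's code: the chunk IDs are invented, and the constant `c=60` is the value commonly used for RRF and is assumed here.

```python
from collections import defaultdict


def weighted_rrf(rankings, weights, c=60):
    """Fuse several ranked lists of document IDs.

    Each document receives weight / (rank + c) from every list it
    appears in; documents are returned by descending total score.
    """
    scores = defaultdict(float)
    for ranking, weight in zip(rankings, weights):
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += weight / (rank + c)
    return sorted(scores, key=lambda doc_id: scores[doc_id], reverse=True)


# Hypothetical rankings: BM25 returns 2 chunks, the dense retriever 5.
bm25_ranking = [17, 7]
dense_ranking = [7, 8, 17, 42, 3]
fused = weighted_rrf([bm25_ranking, dense_ranking], weights=[0.5, 0.5])
print(fused)
```

Chunks that appear in both lists accumulate two contributions, so they rise above chunks found by only one retriever.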

In [81]:
docs = ensemble_retriever.invoke(question)

print_results(docs)

Chunk ID: 17
Retrieval model: surfaces the context to inform the agent’s behavior, according to relevance, recency and importance.  Recency: recent events have higher scores Importance: distinguish mundane from core memories. Ask LM directly. Relevance: based on how related it is to the current situation / query.   Reflection mechanism: synthesizes memories into higher level inferences over time and guides the agent’s future behavior. They are higher-level summaries of past events (<- note that this is a bit different from self-reflection above)  Prompt LM with 100 most recent observations and to generate 3 most salient high-level questions given a set of observations/statements. Then ask LM to answer those questions.   Planning & Reacting: translate the reflections and the environment information into actions  Planning is essentially in order to optimize believability at the moment vs in time. Prompt template: {Intro of an agent X}. Here is X's plan today in broad strokes: 1) Relation

#### RAG output with Hybrid Retriever

In [82]:
docs = format_docs(docs)

# Run
generation = rag_chain.invoke({"context": docs, "question": question})
print(generation)

According to the provided context, the types of agent memory mentioned are:

1. Sensory Memory: This is the earliest stage of memory, providing the ability to retain impressions of sensory information (visual, auditory, etc) after the original stimuli have ended.
2. Short-Term Memory (STM) or Working Memory: It stores information that we are currently aware of and needed to carry out complex cognitive tasks such as learning and reasoning.
3. Long-Term Memory (LTM): Long-term memory can store information for a remarkably long time, ranging from a few days to decades, with an essentially unlimited storage capacity. There are two subtypes of LTM:
	* Explicit / declarative memory: This is memory of facts and events, and refers to those memories that can be consciously recalled, including episodic memory (events and experiences) and semantic memory (facts and concepts).
	* Implicit / procedural memory: This type of memory is unconscious and involves skills and routines that are performed au

#### Retrieval with Reranker

In [49]:
from langchain_community.cross_encoders import HuggingFaceCrossEncoder
from langchain.retrievers.document_compressors import CrossEncoderReranker
from langchain.retrievers import ContextualCompressionRetriever

In [83]:
simple_retriever = vectorstore.as_retriever(search_type="similarity",
                                                search_kwargs={"k": 5})

In [84]:
model = HuggingFaceCrossEncoder(model_name="BAAI/bge-reranker-base")
compressor = CrossEncoderReranker(model=model, top_n=5)
compression_retriever = ContextualCompressionRetriever(
    base_compressor=compressor, base_retriever=simple_retriever
)
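
A reranker rescores each (query, document) pair with a stronger model and keeps the `top_n` best. The sketch below shows only that control flow. The `overlap_score` function is a deliberately crude, hypothetical stand-in for the cross-encoder, and the candidate texts are invented.

```python
def rerank(query, docs, score_fn, top_n=5):
    """Sort docs by score_fn(query, doc), highest first, and keep top_n."""
    ranked = sorted(docs, key=lambda doc: score_fn(query, doc), reverse=True)
    return ranked[:top_n]


def overlap_score(query, doc):
    """Fraction of query words that also occur in the document."""
    query_words = set(query.lower().split())
    doc_words = set(doc.lower().split())
    return len(query_words & doc_words) / len(query_words) if query_words else 0.0


candidates = [
    "Tool use lets the agent call external APIs",
    "Types of memory include sensory, short-term and long-term memory",
    "Agent memory can be short-term or long-term",
]
top = rerank("types of agent memory", candidates, overlap_score, top_n=2)
for doc in top:
    print(doc)
```

Note that with `k=5` on the base retriever and `top_n=5` on the reranker, reranking can only reorder the five retrieved chunks; it cannot drop any or bring in new ones.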

In [85]:
docs = compression_retriever.invoke(question)

print_results(docs)

Chunk ID: 0
LLM Powered Autonomous Agents | Lil'Log                                        Lil'Log                  |       Posts     Archive     Search     Tags     FAQ     emojisearch.app                LLM Powered Autonomous Agents      Date: June 23, 2023  |  Estimated Reading Time: 31 min  |  Author: Lilian Weng       Table of Contents    Agent System Overview  Component One: Planning  Task Decomposition  Self-Reflection   Component Two: Memory  Types of Memory  Maximum Inner Product Search (MIPS)   Component Three: Tool Use  Case Studies  Scientific Discovery Agent  Generative Agents Simulation  Proof-of-Concept Examples   Challenges  Citation  References      Building agents with LLM (large language model) as its core controller is a cool concept. Several proof-of-concepts demos, such as AutoGPT, GPT-Engineer and BabyAGI, serve as inspiring examples. The potentiality of LLM extends beyond generating well-written copies, stories, essays and programs; it can be framed as a powerfu

#### RAG output with Simple Retriever with Reranker

In [86]:
docs = format_docs(docs)

# Run
generation = rag_chain.invoke({"context": docs, "question": question})
print(generation)

According to the provided context, the types of memory mentioned are:

1. Sensory Memory: This is the earliest stage of memory, providing the ability to retain impressions of sensory information (visual, auditory, etc) after the original stimuli have ended.
2. Short-Term Memory (STM) or Working Memory: It stores information that we are currently aware of and needed to carry out complex cognitive tasks such as learning and reasoning.
3. Long-Term Memory (LTM): Long-term memory can store information for a remarkably long time, ranging from a few days to decades, with an essentially unlimited storage capacity. There are two subtypes of LTM:
	* Explicit / declarative memory: This is memory of facts and events, and refers to those memories that can be consciously recalled, including episodic memory (events and experiences) and semantic memory (facts and concepts).
	* Implicit / procedural memory: This type of memory is unconscious and involves skills and routines that are performed automati

#### LLMChainFilter

In [87]:

from langchain.retrievers.document_compressors import LLMChainFilter
from langchain.retrievers.document_compressors import DocumentCompressorPipeline

simple_retriever = vectorstore.as_retriever(search_type="similarity",
                                                search_kwargs={"k": 5})

# Uses the LLM to decide which of the initially retrieved documents to drop and which to return
relevant_filter = LLMChainFilter.from_llm(llm=llm)

model = HuggingFaceCrossEncoder(model_name="BAAI/bge-reranker-base")
reranker = CrossEncoderReranker(model=model, top_n=5)

pipeline_compressor = DocumentCompressorPipeline(
    transformers=[relevant_filter]
)

# retrieves the documents similar to query and then applies the filter
compression_retriever = ContextualCompressionRetriever(
    base_compressor=pipeline_compressor, base_retriever=simple_retriever
)
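
In outline, an LLM-based filter asks a yes/no relevance question for each retrieved document and discards the ones judged irrelevant. The sketch below replaces the LLM call with a hypothetical `keyword_judge` function so that it runs offline; the document texts are invented.

```python
def filter_relevant(query, docs, is_relevant):
    """Keep only the documents the judge marks as relevant to the query."""
    return [doc for doc in docs if is_relevant(query, doc)]


def keyword_judge(query, doc):
    """Hypothetical stand-in for the LLM's YES/NO relevance verdict."""
    return "memory" in doc.lower()


retrieved = [
    "Short-term memory corresponds to in-context learning",
    "Task decomposition uses chain of thought",
    "Long-term memory relies on an external vector store",
]
kept = filter_relevant("What are the types of agent memory?", retrieved, keyword_judge)
for doc in kept:
    print(doc)
```

Unlike a reranker, a filter can return fewer documents than it received, which is why the generated answer can change noticeably after filtering.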

In [88]:
docs = compression_retriever.invoke(question)

print_results(docs)

Chunk ID: 17
Retrieval model: surfaces the context to inform the agent’s behavior, according to relevance, recency and importance.  Recency: recent events have higher scores Importance: distinguish mundane from core memories. Ask LM directly. Relevance: based on how related it is to the current situation / query.   Reflection mechanism: synthesizes memories into higher level inferences over time and guides the agent’s future behavior. They are higher-level summaries of past events (<- note that this is a bit different from self-reflection above)  Prompt LM with 100 most recent observations and to generate 3 most salient high-level questions given a set of observations/statements. Then ask LM to answer those questions.   Planning & Reacting: translate the reflections and the environment information into actions  Planning is essentially in order to optimize believability at the moment vs in time. Prompt template: {Intro of an agent X}. Here is X's plan today in broad strokes: 1) Relation

#### RAG output with Simple Retriever and LLM Chain Filter

In [89]:
docs = format_docs(docs)

# Run
generation = rag_chain.invoke({"context": docs, "question": question})
print(generation)

According to the provided context, the types of agent memory mentioned are:

1. Short-term memory: This is used for in-context learning and storing information temporarily.
2. Long-term memory: This provides the agent with the capability to retain and recall information over extended periods.

Note that there is no mention of other types of agent memory in the provided context.


#### Composite Retriever: Combining all the techniques

In [70]:
simple_retriever = vectorstore.as_retriever(search_type="similarity",
                                                search_kwargs={"k": 5})

mq_retriever = MultiQueryRetriever.from_llm(
    retriever=simple_retriever, llm=llm,
    include_original=True,
)

bm25_retriever = BM25Retriever.from_documents(documents=doc_splits)
bm25_retriever.k = 2

ensemble_retriever = EnsembleRetriever(
    retrievers=[bm25_retriever, mq_retriever],
    weights=[0.7, 0.3]
)


relevant_filter = LLMChainFilter.from_llm(llm=llm)

model = HuggingFaceCrossEncoder(model_name="BAAI/bge-reranker-base")
reranker = CrossEncoderReranker(model=model, top_n=5)

pipeline_compressor = DocumentCompressorPipeline(
    transformers=[relevant_filter, reranker]
)
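
A compressor pipeline applies its transformers in order, each receiving the output of the previous one: here the filter runs first and the reranker then orders whatever survives. The sketch below shows this chaining with two toy transformers. Both functions and the sample texts are hypothetical placeholders for the LLM filter and the cross-encoder.

```python
def keep_mentions_memory(query, docs):
    """Toy filter: drop documents that never mention 'memory'."""
    return [doc for doc in docs if "memory" in doc.lower()]


def sort_by_overlap(query, docs, top_n=2):
    """Toy reranker: order by number of shared words, keep top_n."""
    query_words = set(query.lower().split())
    ranked = sorted(
        docs,
        key=lambda doc: len(query_words & set(doc.lower().split())),
        reverse=True,
    )
    return ranked[:top_n]


def run_pipeline(query, docs, transformers):
    """Feed the documents through each transformer in sequence."""
    for transform in transformers:
        docs = transform(query, docs)
    return docs


sample_docs = [
    "Planning breaks tasks into subgoals",
    "Long-term memory uses a vector store",
    "Types of agent memory: short-term and long-term memory",
]
result = run_pipeline(
    "types of agent memory", sample_docs, [keep_mentions_memory, sort_by_overlap]
)
print(result)
```

Placing the filter before the reranker means the more expensive scoring step only sees documents that passed the relevance check.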



In [71]:
compression_retriever = ContextualCompressionRetriever(
    base_compressor=pipeline_compressor, base_retriever=ensemble_retriever
)

In [72]:
docs = compression_retriever.invoke(question)

print_results(docs)

INFO:langchain.retrievers.multi_query:Generated queries: ['Here are three alternative versions of the user question to retrieve relevant documents from a vector database:', 'What are the types of agent memory that have been studied in the field of artificial intelligence?', 'What types of memory do agents use in various applications of artificial intelligence, such as robotics or game playing?', 'What are the different categories or classifications of agent memory that have been proposed or implemented in the literature on artificial intelligence and cognitive architectures?']


Chunk ID: 0
LLM Powered Autonomous Agents | Lil'Log                                        Lil'Log                  |       Posts     Archive     Search     Tags     FAQ     emojisearch.app                LLM Powered Autonomous Agents      Date: June 23, 2023  |  Estimated Reading Time: 31 min  |  Author: Lilian Weng       Table of Contents    Agent System Overview  Component One: Planning  Task Decomposition  Self-Reflection   Component Two: Memory  Types of Memory  Maximum Inner Product Search (MIPS)   Component Three: Tool Use  Case Studies  Scientific Discovery Agent  Generative Agents Simulation  Proof-of-Concept Examples   Challenges  Citation  References      Building agents with LLM (large language model) as its core controller is a cool concept. Several proof-of-concepts demos, such as AutoGPT, GPT-Engineer and BabyAGI, serve as inspiring examples. The potentiality of LLM extends beyond generating well-written copies, stories, essays and programs; it can be framed as a powerfu

#### RAG output with Final Retriever

In [73]:
docs = format_docs(docs)

# Run
generation = rag_chain.invoke({"context": docs, "question": question})
print(generation)

According to the provided context, the types of agent memory mentioned are:

1. Short-term memory: This is considered to be the in-context learning (See Prompt Engineering) and utilizes the model's short-term memory to learn.
2. Long-term memory: This provides the agent with the capability to retain and recall (infinite) information over extended periods, often by leveraging an external vector store and fast retrieval.
3. Sensory memory: This is used for learning embedding representations for raw inputs, including text, image, or other modalities.
4. Maximum Inner Product Search (MIPS): This is used to alleviate the restriction of finite attention span by saving the embedding representation of information into a vector store database that can support fast maximum inner-product search (MIPS).

Note that the context does not mention "Reflection mechanism" as a type of agent memory, but rather as a process that synthesizes memories into higher-level inferences over time and guides the age