In [1]:
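### Setting up the env
import os
from getpass import getpass

# Prompt without echoing the key into the notebook output.
os.environ["OPENAI_API_KEY"] = getpass("Enter your OpenAI API key: ")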

## Loading and ingesting data

In [2]:
from llama_index.readers.web import BeautifulSoupWebReader
reader = BeautifulSoupWebReader()

In [3]:
# Read the article from the web source.
documents = reader.load_data(["https://lilianweng.github.io/posts/2023-06-23-agent/"])
len(documents)

1

In [4]:
" ".join(documents[0].text.split()[:1000])

'LLM Powered Autonomous Agents | Lil\'Log Lil\'Log Posts Archive Search Tags FAQ emojisearch.app LLM Powered Autonomous Agents Date: June 23, 2023 | Estimated Reading Time: 31 min | Author: Lilian Weng Table of Contents Agent System Overview Component One: Planning Task Decomposition Self-Reflection Component Two: Memory Types of Memory Maximum Inner Product Search (MIPS) Component Three: Tool Use Case Studies Scientific Discovery Agent Generative Agents Simulation Proof-of-Concept Examples Challenges Citation References Building agents with LLM (large language model) as its core controller is a cool concept. Several proof-of-concepts demos, such as AutoGPT, GPT-Engineer and BabyAGI, serve as inspiring examples. The potentiality of LLM extends beyond generating well-written copies, stories, essays and programs; it can be framed as a powerful general problem solver. Agent System Overview# In a LLM-powered autonomous agent system, LLM functions as the agent’s brain, complemented by sever

In [6]:
from llama_index.core.node_parser import SentenceSplitter
from llama_index.llms.openai import OpenAI
from llama_index.core import Settings

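# Default chunking for ingestion: 512-token chunks with a 10-token overlap.
Settings.text_splitter = SentenceSplitter(chunk_size=512, chunk_overlap=10)
# LLM used to synthesize answers from the retrieved chunks.
Settings.llm = OpenAI(model="gpt-4o")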
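
To make the `chunk_size` and `chunk_overlap` parameters concrete, here is a minimal sliding-window sketch. It is only an approximation for illustration: it counts whitespace-separated words, whereas `SentenceSplitter` counts tokens and tries to respect sentence boundaries. The helper name `chunk_words` and the toy text are hypothetical and not part of LlamaIndex.

```python
def chunk_words(text, chunk_size=512, chunk_overlap=10):
    """Split text into windows of `chunk_size` words; consecutive
    windows share `chunk_overlap` words. Illustration only."""
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be in [0, chunk_size)")
    words = text.split()
    step = chunk_size - chunk_overlap  # how far each window advances
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        # Stop once the current window has reached the end of the text.
        if start + chunk_size >= len(words):
            break
    return chunks


# Toy input: 12 words, windows of 5 words with 2 words of overlap.
toy_text = " ".join(f"w{i}" for i in range(12))
toy_chunks = chunk_words(toy_text, chunk_size=5, chunk_overlap=2)
for chunk in toy_chunks:
    print(chunk)
```

Each window repeats the last two words of the previous one, so a sentence that straddles a boundary is less likely to lose its context entirely.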


## Building and persisting the vector store

In [7]:
import chromadb
from llama_index.core import VectorStoreIndex
from llama_index.vector_stores.chroma import ChromaVectorStore
from llama_index.core import StorageContext

# initialize client, setting path to save data
db = chromadb.PersistentClient(path="./chroma_db")

# create collection
chroma_collection = db.get_or_create_collection("web_source_collection")

# assign chroma as the vector_store to the context
vector_store = ChromaVectorStore(chroma_collection=chroma_collection)
storage_context = StorageContext.from_defaults(vector_store=vector_store)

# create your index
index = VectorStoreIndex.from_documents(
    documents,
    storage_context=storage_context,
    transformations=[SentenceSplitter(chunk_size=512, chunk_overlap=10)]
)

## Retrieval and querying

In [8]:
from llama_index.core import get_response_synthesizer
from llama_index.core.retrievers import VectorIndexRetriever
from llama_index.core.query_engine import RetrieverQueryEngine

# configure retriever
retriever = VectorIndexRetriever(
    index=index,
    similarity_top_k=3,
)

# configure response synthesizer
response_synthesizer = get_response_synthesizer()

# assemble query engine
query_engine = RetrieverQueryEngine(
    retriever=retriever,
    response_synthesizer=response_synthesizer,
)
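
Conceptually, the retriever embeds the query, scores it against every stored chunk embedding, and keeps the `similarity_top_k` best matches. The sketch below shows that idea on hand-made 3-dimensional vectors using cosine similarity. The vectors, the chunk labels, and the helper names are all hypothetical, and the metric a real store applies depends on how the Chroma collection is configured, so treat this as an illustration rather than what `VectorIndexRetriever` does internally.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k(query_vec, chunk_vecs, k=3):
    """Return the k (chunk_id, score) pairs with the highest similarity."""
    scored = [
        (cosine_similarity(query_vec, vec), chunk_id)
        for chunk_id, vec in chunk_vecs.items()
    ]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [(chunk_id, score) for score, chunk_id in scored[:k]]


# Hypothetical embeddings; real ones have hundreds or thousands of dimensions.
toy_vectors = {
    "planning": [0.9, 0.1, 0.0],
    "memory": [0.1, 0.9, 0.0],
    "tool_use": [0.0, 0.2, 0.9],
    "challenges": [0.6, 0.3, 0.1],
}
toy_query = [1.0, 0.0, 0.0]

toy_hits = top_k(toy_query, toy_vectors, k=3)
for chunk_id, score in toy_hits:
    print(f"{chunk_id}: {score:.3f}")
```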

In [9]:
# query
response = query_engine.query("What is task decomposition ?")
print(response)

Task decomposition is a technique used to enhance model performance on complex tasks by breaking down a large task into smaller, more manageable steps. This approach allows for a clearer interpretation of the model's thinking process and can be implemented through various methods, such as prompting a model to think step by step, using task-specific instructions, or incorporating human inputs. Additionally, task decomposition can involve exploring multiple reasoning possibilities at each step, creating a structured approach to problem-solving.


### Inspecting the retrieved chunks for the above query

In [10]:
retrieved_chunks = query_engine.retrieve("What is task decomposition ?")
len(retrieved_chunks)

2
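
Rather than printing each chunk in its own cell, a small helper can list the rank, the similarity score, and a short preview of every result at once. This is a sketch that assumes each retrieved item exposes `.text` (as used in this notebook) and `.score`. The helper name `summarize_chunks` is hypothetical, and the stand-in objects and their scores below are made up so the example runs without an index.

```python
from types import SimpleNamespace


def summarize_chunks(chunks, preview_chars=80):
    """Return (rank, score, preview) for each retrieved chunk."""
    rows = []
    for rank, chunk in enumerate(chunks, start=1):
        score = getattr(chunk, "score", None)  # may be missing or None
        # Collapse newlines and repeated spaces before truncating.
        preview = " ".join(chunk.text.split())[:preview_chars]
        rows.append((rank, score, preview))
    return rows


# Stand-ins for retrieved chunks; the scores are invented for illustration.
stand_ins = [
    SimpleNamespace(score=0.42, text="Task Decomposition#\nChain of thought ..."),
    SimpleNamespace(score=None, text="Challenges in   long-term planning"),
]

rows = summarize_chunks(stand_ins, preview_chars=19)
for rank, score, preview in rows:
    print(rank, score, preview)
```

In the notebook itself, the same helper could be called as `summarize_chunks(retrieved_chunks)`.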

In [11]:
retrieved_chunks[0].text

'Task Decomposition#\nChain of thought (CoT; Wei et al. 2022) has become a standard prompting technique for enhancing model performance on complex tasks. The model is instructed to “think step by step” to utilize more test-time computation to decompose hard tasks into smaller and simpler steps. CoT transforms big tasks into multiple manageable tasks and shed lights into an interpretation of the model’s thinking process.\nTree of Thoughts (Yao et al. 2023) extends CoT by exploring multiple reasoning possibilities at each step. It first decomposes the problem into multiple thought steps and generates multiple thoughts per step, creating a tree structure. The search process can be BFS (breadth-first search) or DFS (depth-first search) with each state evaluated by a classifier (via a prompt) or majority vote.\nTask decomposition can be done (1) by LLM with simple prompting like "Steps for XYZ.\\n1.", "What are the subgoals for achieving XYZ?", (2) by using task-specific instructions; e.g. 

In [12]:
retrieved_chunks[1].text

'Challenges in long-term planning and task decomposition: Planning over a lengthy history and effectively exploring the solution space remain challenging. LLMs struggle to adjust plans when faced with unexpected errors, making them less robust compared to humans who learn from trial and error.\n\n\nReliability of natural language interface: Current agent system relies on natural language as an interface between LLMs and external components such as memory and tools. However, the reliability of model outputs is questionable, as LLMs may make formatting errors and occasionally exhibit rebellious behavior (e.g. refuse to follow an instruction). Consequently, much of the agent demo code focuses on parsing model output.\n\n\nCitation#\nCited as:\n\nWeng, Lilian. (Jun 2023). “LLM-powered Autonomous Agents”. Lil’Log. https://lilianweng.github.io/posts/2023-06-23-agent/.\n\nOr\n@article{weng2023agent,\n  title   = "LLM-powered Autonomous Agents",\n  author  = "Weng, Lilian",\n  journal = "lilia

In [13]:
# query
response = query_engine.query("what is the role of tools in an agent and how can we use them ?")
print(response)

Tools play a crucial role in enhancing the capabilities of agents, particularly those powered by large language models (LLMs). By equipping agents with external tools, they can perform tasks that go beyond their inherent physical and cognitive limits. These tools can include neural modules like deep learning models or symbolic modules such as calculators, currency converters, and weather APIs. The use of tools allows agents to handle complex tasks more effectively, such as browsing the internet, executing code, or performing scientific experiments. To use these tools effectively, it is important for the agent to know when and how to utilize them, which is determined by the agent's ability to recognize the need for a tool and to execute the appropriate actions using the tool. Fine-tuning models to learn how to use external tool APIs can further enhance their performance and output quality.


In [14]:
retrieved_chunks = query_engine.retrieve("what is the role of tools in an agent and how can we use them ?")
len(retrieved_chunks)

2
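
If some retrieved chunks turn out to be only loosely related to the question, one option is to drop results whose score falls below a threshold before they reach the synthesizer. The sketch below shows the idea with a hypothetical helper `drop_low_scores`, an arbitrary threshold, and made-up scores. LlamaIndex ships a `SimilarityPostprocessor` with a `similarity_cutoff` parameter for the same purpose, and a sensible cutoff depends on the embedding model and distance metric in use.

```python
from types import SimpleNamespace


def drop_low_scores(chunks, min_score):
    """Keep only chunks that have a score of at least `min_score`."""
    return [
        chunk
        for chunk in chunks
        if getattr(chunk, "score", None) is not None and chunk.score >= min_score
    ]


# Stand-ins with invented scores; real ones come from the retriever.
scored_stand_ins = [
    SimpleNamespace(score=0.81, text="a"),
    SimpleNamespace(score=0.35, text="b"),
    SimpleNamespace(score=None, text="c"),
]

kept = drop_low_scores(scored_stand_ins, min_score=0.5)
print([chunk.text for chunk in kept])
```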

In [15]:
retrieved_chunks[0].text

'It quantizes a data point $x_i$ to $\\tilde{x}_i$ such that the inner product $\\langle q, x_i \\rangle$ is as similar to the original distance of $\\angle q, \\tilde{x}_i$ as possible, instead of picking the closet quantization centroid points.\n\n\nFig. 9. Comparison of MIPS algorithms, measured in recall@10. (Image source: Google Blog, 2020)\nCheck more MIPS algorithms and performance comparison in ann-benchmarks.com.\nComponent Three: Tool Use#\nTool use is a remarkable and distinguishing characteristic of human beings. We create, modify and utilize external objects to do things that go beyond our physical and cognitive limits. Equipping LLMs with external tools can significantly extend the model capabilities.\n\nFig. 10. A picture of a sea otter using rock to crack open a seashell, while floating in the water. While some other animals can use tools, the complexity is not comparable with humans. (Image source: Animals using tools)\nMRKL (Karpas et al. 2022), short for “Modular Rea

In [16]:
retrieved_chunks[1].text

'The lack of expertise may cause LLMs not knowing its flaws and thus cannot well judge the correctness of task results.\nBoiko et al. (2023) also looked into LLM-empowered agents for scientific discovery, to handle autonomous design, planning, and performance of complex scientific experiments. This agent can use tools to browse the Internet, read documentation, execute code, call robotics experimentation APIs and leverage other LLMs.\nFor example, when requested to "develop a novel anticancer drug", the model came up with the following reasoning steps:\n\ninquired about current trends in anticancer drug discovery;\nselected a target;\nrequested a scaffold targeting these compounds;\nOnce the compound was identified, the model attempted its synthesis.\n\nThey also discussed the risks, especially with illicit drugs and bioweapons. They developed a test set containing a list of known chemical weapon agents and asked the agent to synthesize them. 4 out of 11 requests (36%) were accepted to

In [17]:
# query
response = query_engine.query("Explain, step by step, how HuggingGPT works.")
print(response)

HuggingGPT operates through a structured process involving four key stages:

1. **Task Planning**: The system begins by using a language model to interpret user requests and break them down into multiple tasks. Each task is characterized by attributes such as task type, ID, dependencies, and arguments. This stage involves parsing the input and planning the tasks using few-shot examples to guide the process.

2. **Model Selection**: Once tasks are defined, the language model selects appropriate expert models to handle each task. This selection is framed as a multiple-choice question, where the model chooses from a list of available models. Task type-based filtration is applied to manage the limited context length.

3. **Task Execution**: The selected models execute the tasks based on the planned sequence and dependencies. Each task is processed according to its specific requirements, utilizing the capabilities of the chosen models.

4. **Response Summarization**: After task execution, t

In [18]:
# query
response = query_engine.query("What are the three levels at which the API-Bank benchmark evaluates agent capabilities?")
print(response)

The API-Bank benchmark evaluates agent capabilities at three levels:

1. **Level-1**: This level assesses the ability to call the API. The model must determine whether to call a given API, execute the call correctly, and respond appropriately to the API's returns based on its description.

2. **Level-2**: This level examines the ability to retrieve the API. The model needs to search for potential APIs that could meet the user's requirements and learn how to use them by reading their documentation.

3. **Level-3**: This level evaluates the ability to plan API usage beyond just retrieval and calling. For unclear user requests, the model may need to conduct multiple API calls to resolve the task, such as scheduling group meetings or booking travel arrangements.
