This code demonstrates an end-to-end retrieval-augmented generation (RAG) setup that combines web crawling, document storage, and a large language model (LLM) to answer queries. Document embeddings are stored in a vector database for later retrieval, and the LLM uses the retrieved passages as context when generating a response. The steps cover configuring the web crawler, loading and storing documents, and querying the LLM with the stored information.
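
To make the retrieve-then-generate idea concrete, here is a minimal sketch that does not use the `loader` or `endpoint` modules from this notebook. It assumes a toy bag-of-words "embedding" in place of a real embedding model and a plain Python list in place of the vector database; `embed`, `cosine`, `retrieve`, and `build_prompt` are hypothetical names chosen for illustration.

```python
import math
from collections import Counter


def embed(text):
    """Toy embedding: a bag-of-words count vector (stand-in for a real model)."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(query, documents, k=2):
    """Return the k documents most similar to the query."""
    q = embed(query)
    ranked = sorted(documents, key=lambda d: cosine(q, embed(d)), reverse=True)
    return ranked[:k]


def build_prompt(query, documents, k=2):
    """Prepend the retrieved documents to the question as context."""
    context = "\n".join(retrieve(query, documents, k))
    return f"Context:\n{context}\n\nQuestion: {query}"


docs = [
    "agent memory stores past observations",
    "planning splits a task into smaller steps",
    "tools let an agent call external APIs",
]
top = retrieve("how does agent memory work", docs, k=1)
prompt = build_prompt("how does agent memory work", docs, k=1)
```

In a real pipeline the prompt would then be sent to the LLM; that step is omitted here because it needs an API key.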

In [None]:
import os

# Import required modules and set up configuration for document loading.
# 'embedding_type' specifies which embedding model to use, OpenAI in this case.
# 'openai_api_key' is the API key needed for accessing OpenAI's services.
# 'dbpath' denotes the path where the vector database will be stored locally.

from loader import UrlWalker, DocumentLoader

config = {
    "embedding_type": 'openai',
    "openai_api_key": os.environ['API_KEY'],
    "dbpath": "./vectordb_demo"
}
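
Note that `os.environ['API_KEY']` raises a bare `KeyError` when the variable is not set. A small helper can fail with a clearer message instead. This is only a sketch: `read_api_key` is a hypothetical function, not part of the `loader` module, and the optional `environ` parameter exists only so the helper can be exercised without touching the real environment.

```python
import os


def read_api_key(var_name="API_KEY", environ=None):
    """Return the API key from the environment, or raise a descriptive error."""
    environ = os.environ if environ is None else environ
    key = environ.get(var_name, "").strip()
    if not key:
        raise RuntimeError(f"Environment variable {var_name} is not set or empty")
    return key
```

With this in place, the config entry could read `"openai_api_key": read_api_key()`.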


Initialize the DocumentLoader with the configuration defined above.<br/>
The loader handles embedding the documents and storing them in the vector database.

In [None]:
loader = DocumentLoader(config) 

Define the root URL and the maximum crawl depth.<br/>
The UrlWalker (web crawler) will start from this URL and explore up to the specified depth.

In [3]:
url = "https://lilianweng.github.io/posts/2023-06-23-agent/"
max_depth = 2

Initialize the UrlWalker and start crawling from the specified URL to the specified depth.<br/>
The result is a collection of URLs that have been visited and processed.

In [None]:
crawler = UrlWalker(url, max_depth)
crawler.crawl()
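
How `UrlWalker` applies the depth limit internally is not shown in this notebook. As an illustration of the general idea, here is a sketch of a depth-limited breadth-first crawl over an in-memory link graph. The `crawl` function, the `get_links` callback, and the convention that the root sits at depth 0 are all assumptions made for this example.

```python
from collections import deque


def crawl(root, max_depth, get_links):
    """Breadth-first traversal that follows links at most max_depth hops from root."""
    visited = {root}
    queue = deque([(root, 0)])
    while queue:
        url, depth = queue.popleft()
        if depth == max_depth:
            continue  # depth limit reached: record the page but do not expand it
        for link in get_links(url):
            if link not in visited:
                visited.add(link)
                queue.append((link, depth + 1))
    return visited


# Toy link graph standing in for real pages.
links = {"a": ["b", "c"], "b": ["d"], "c": [], "d": ["e"], "e": []}
found = crawl("a", 2, lambda u: links.get(u, []))
```

Here `e` is three hops from `a`, so it is left out when `max_depth` is 2.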

Display up to 10 of the URLs visited by the crawler.<br/>
This helps to confirm which pages were considered during the crawling process.

In [None]:
list(crawler.visited_urls)[0:10]

Configuration for the Large Language Model (LLM).<br/>
This includes details about the embedding model, API provider, specific LLM model, the API key for authentication, and database path for vector storage.<br/>
These settings prepare the system to generate responses augmented by the retrieved documents.


In [5]:
config = {
    "embedding_type": 'openai',                    # Embedding model; must match the one used to build the vector DB
    "api_provider": "openai",                      # API provider; Ollama and OpenAI are currently supported
    "model": "gpt-4o-mini",                        # LLM name
    "openai_api_key": os.environ['API_KEY'],       # API key
    "dbpath": "./vectordb_demo"                    # Path to the vector DB
}
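
Because the query-time embedding model has to match the one used when the documents were stored, it can help to compare the two configurations before querying. The check below is a sketch under that assumption: `config_mismatches` is a hypothetical helper, and the dictionaries are written out inline rather than taken from the cells above.

```python
def config_mismatches(loader_config, endpoint_config, keys=("embedding_type", "dbpath")):
    """Return the keys whose values differ between the two configurations."""
    return [k for k in keys if loader_config.get(k) != endpoint_config.get(k)]


loader_cfg = {"embedding_type": "openai", "dbpath": "./vectordb_demo"}
endpoint_ok = {"embedding_type": "openai", "dbpath": "./vectordb_demo", "model": "gpt-4o-mini"}
endpoint_bad = {"embedding_type": "ollama", "dbpath": "./vectordb_demo", "model": "gpt-4o-mini"}

ok = config_mismatches(loader_cfg, endpoint_ok)
bad = config_mismatches(loader_cfg, endpoint_bad)
```

An empty list means the shared settings agree. A non-empty list names the settings to fix before querying.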


Create an Endpoint instance with the LLM configuration.<br/>
This endpoint will be used to process queries by augmenting with relevant information retrieved from the vector database.

In [6]:
from endpoint import Endpoint
ep = Endpoint(config)

Example query processing using the Endpoint.<br/>
The first call starts with an empty history. The follow-up call passes the history returned by the first call, so the model can work out what "it" refers to in the second question.<br/>
Both answers draw on documents retrieved from the persisted vector database.

In [7]:
result = ep.get_response_with_history("How LLM agent memory work?", [])

query = 'How it can be implemented?'
result = ep.get_response_with_history(query, result.history)


Output the result of the last query.<br/>
As the output below shows, this is a dictionary that includes the question and the accumulated chat history used as context for the answer.

In [8]:
result.output

{'question': 'How it can be implemented?',
 'chat_history': "Human: How LLM agent memory work?\nAI: In a LLM-powered autonomous agent system, memory functions as a long-term memory module that records a comprehensive list of the agent's experiences in natural language. Here are the key components of how LLM agent memory works:\n\n1. **Memory Stream**: This is an external database that stores observations and events provided by the agent. Each element in this memory represents a specific experience.\n\n2. **Retrieval Model**: This model surfaces relevant context to inform the agent's behavior based on three criteria:\n   - **Recency**: Recent events are given higher scores.\n   - **Importance**: The model distinguishes between mundane and core memories, which can be assessed by asking the language model directly.\n   - **Relevance**: This is based on how related the memory is to the current situation or query.\n\n3. **Reflection Mechanism**: This synthesizes memories into higher-level i

Display the sources of the documents used in the last response.<br/>
This shows which URLs or documents contributed to the information used in generating the response.

In [16]:
[d.metadata['source'] for d in result.documents]

['https://lilianweng.github.io/posts/2023-06-23-agent/',
 'https://lilianweng.github.io/posts/2023-06-23-agent/',
 'https://lilianweng.github.io/posts/2023-06-23-agent/',
 'https://lilianweng.github.io/posts/2023-06-23-agent/']
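
The same URL appears four times above, most likely because several retrieved passages come from the same page. Counting the sources gives a more compact summary. This sketch uses `SimpleNamespace` objects as stand-ins for the retrieved documents, assuming only that each one exposes a `metadata` dictionary with a `source` key, as in the cell above. The example URLs are placeholders.

```python
from collections import Counter
from types import SimpleNamespace


def source_counts(documents):
    """Count how many retrieved documents came from each source URL."""
    return Counter(d.metadata["source"] for d in documents)


# Stand-in documents; real ones would come from result.documents.
sample_docs = [
    SimpleNamespace(metadata={"source": "https://example.com/a"}),
    SimpleNamespace(metadata={"source": "https://example.com/a"}),
    SimpleNamespace(metadata={"source": "https://example.com/b"}),
]
counts = source_counts(sample_docs)
```

Calling `source_counts(result.documents)` on the result above would then return a single entry with a count of 4.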