<b>Stateful Agent with Infinite Recall</b>

In [5]:
from IPython.display import Image

In [2]:
from pydantic import BaseModel, Field, ValidationError
import json

If we continue the OS metaphor, the agent's context window is RAM and the vector database is the hard drive. Unlike a standard SQL database, which you query by exact values or keywords, a vector database lets the agent search by concept and meaning. The next step is to integrate a local vector store (e.g. Meta's FAISS or Postgres with pgvector) into the agent's toolset.

<b>The Mathematical Foundation: How the Agent "Reads" Disk </b><br> 
Before writing the tools, you must understand what happens when the agent pages memory. When the agent executes a search for "customer refund policy", that text string is passed through an embedding model (like text-embedding-3-small), which converts it into a high-dimensional dense vector $\vec{q} \in \mathbb{R}^n$ (where $n$ is 1536 for this model by default).<br>
The vector DB stores all your past documents as vectors $\vec{v}_i$. To find the most relevant memory, FAISS compares the agent's query vector with every stored vector using the Euclidean (L2) distance:

$$d(\vec{q}, \vec{v}_i) = ||\vec{q} - \vec{v}_i||_2 = \sqrt{\sum_{j=1}^{n} (q_j - v_{ij})^2}$$

The database returns the $K$ vectors with the smallest distance, so the agent can retrieve semantically related facts even when the exact keywords don't match. Note that a FAISS `IndexFlatL2` index reports the squared distance $d^2$; skipping the square root is cheaper and yields the same ranking.
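
To make the distance computation concrete, here is a minimal pure-Python sketch of a brute-force top-$K$ search. The three-dimensional vectors and the `top_k_l2` helper are made up for illustration; a real index holds 1536-dimensional embeddings, and FAISS runs the same comparison in optimized native code.

```python
import math

def top_k_l2(query, stored, k):
    """Return (index, distance) pairs for the k stored vectors closest to query."""
    # math.dist computes the Euclidean (L2) distance between two points
    scored = [(i, math.dist(query, v)) for i, v in enumerate(stored)]
    scored.sort(key=lambda pair: pair[1])  # smallest distance first
    return scored[:k]

# Toy "disk" with three stored vectors (illustrative values only)
stored_vectors = [
    [1.0, 0.0, 0.0],  # index 0: identical to the query
    [0.0, 1.0, 0.0],  # index 1: far from the query
    [0.9, 0.1, 0.0],  # index 2: close to the query
]
query = [1.0, 0.0, 0.0]

nearest = top_k_l2(query, stored_vectors, k=2)
for idx, dist in nearest:
    print(f"index={idx} distance={dist:.4f}")
```

With these values, index 0 comes first (distance 0) and index 2 second, while index 1 is dropped because only the two closest vectors are requested.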

<b>Step 1: Initialize the "Disk Drive" </b> <br>
First, we set up FAISS and a metadata store. FAISS only stores the math (the vectors); we need a simple Python list or dictionary to store the actual text that maps to those vectors.

In [1]:
import numpy as np
import faiss

# 1. Define the Vector Space
VECTOR_DIMENSION = 1536 # Matches the embedding model's output size
disk_index = faiss.IndexFlatL2(VECTOR_DIMENSION) 

# 2. Define the Metadata Store (maps vector IDs to original text)
disk_metadata = []

def get_embedding(text: str) -> np.ndarray:
    """
    Simulates calling an embedding API (OpenAI, HuggingFace, etc.)
    Returns a 1D numpy array of float32, length 1536.
    """
    # ... standard API call to get vector ...
    return np.random.rand(VECTOR_DIMENSION).astype('float32') # Mock vector
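
One caveat about the mock above: it returns a fresh random vector on every call, so even an identical query never lands near the text it was stored under. If you want reproducible behavior while testing without an embedding API, one option is a deterministic stand-in like the sketch below. `get_mock_embedding` is a hypothetical helper, not part of any library, and its vectors still carry no semantic meaning: only the exact same text maps to the same vector.

```python
import hashlib
import numpy as np

VECTOR_DIMENSION = 1536  # same size as the index defined above

def get_mock_embedding(text: str) -> np.ndarray:
    """Hypothetical stand-in: maps identical text to an identical pseudo-random vector."""
    # Derive a stable seed from the text. Python's built-in hash() is salted
    # per process, so a cryptographic digest is used instead.
    digest = hashlib.sha256(text.encode("utf-8")).digest()
    seed = int.from_bytes(digest[:8], "big")
    rng = np.random.default_rng(seed)
    # Generator.random can produce float32 directly, the dtype FAISS expects
    return rng.random(VECTOR_DIMENSION, dtype=np.float32)

first = get_mock_embedding("customer refund policy")
second = get_mock_embedding("customer refund policy")
other = get_mock_embedding("shipping times")
```

Here `first` and `second` are equal element by element, so searching for a stored string verbatim returns it at distance 0, while `other` is an unrelated vector.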


<b>Step 2: Build the Agent's Memory Tools </b><br>
Now, we create the specific tools the agent will use during its heartbeat loop. Notice that request_heartbeat defaults to True on both tools. For the search tool this matters most: after the data is fetched, the loop hands control straight back to the agent so it reads the result, rather than waiting for the user to respond.

In [2]:
from pydantic import BaseModel, Field

# --- TOOL SCHEMA DEFINITIONS ---

class ArchivalMemoryInsert(BaseModel):
    request_heartbeat: bool = Field(default=True)
    fact_text: str = Field(description="The detailed fact or document to save to long-term memory.")

class ArchivalMemorySearch(BaseModel):
    request_heartbeat: bool = Field(default=True) # Force the agent to read the result!
    search_query: str = Field(description="The semantic concept you want to search your memory for.")
    top_k: int = Field(default=3, description="Number of memories to page into context.")

# --- TOOL IMPLEMENTATIONS ---

def archival_memory_insert(args: ArchivalMemoryInsert) -> str:
    """Writes a new fact to the Vector DB."""
    # 1. Convert text to math
    vector = get_embedding(args.fact_text)
    
    # 2. Reshape for FAISS (requires a 2D matrix, even for 1 vector)
    vector_matrix = np.array([vector])
    
    # 3. Store the math in FAISS, store the text in our metadata list
    disk_index.add(vector_matrix)
    disk_metadata.append(args.fact_text)
    
    return f"Successfully saved to archival storage. Memory ID: {len(disk_metadata) - 1}"

def archival_memory_search(args: ArchivalMemorySearch) -> str:
    """Pages facts from the Vector DB back into the agent's active context."""
    if disk_index.ntotal == 0:
        return "Archival memory is currently empty."
        
    # 1. Convert query to math
    query_vector = get_embedding(args.search_query)
    query_matrix = np.array([query_vector])
    
    # 2. Perform the Mathematical Search
    distances, indices = disk_index.search(query_matrix, args.top_k)
    
    # 3. Format the results for the LLM to read
    results = []
    for i, idx in enumerate(indices[0]):
        if idx != -1: # FAISS returns -1 if there aren't enough vectors
            retrieved_text = disk_metadata[idx]
            score = distances[0][i]
            results.append(f"[Match {i+1} | Distance: {score:.4f}] {retrieved_text}")
            
    if not results:
        return "No relevant memories found."
        
    # Return the matches as one newline-separated string. The heartbeat loop appends it to the context window.
    return "\n".join(results)
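
The tool functions above assume they receive an already validated schema object. In practice the LLM emits its arguments as raw JSON, so it helps to validate before dispatching. The sketch below shows one way to do that; `parse_search_args` is a hypothetical helper, and the schema is repeated only so the snippet runs on its own.

```python
import json
from pydantic import BaseModel, Field, ValidationError

class ArchivalMemorySearch(BaseModel):
    request_heartbeat: bool = Field(default=True)
    search_query: str = Field(description="The semantic concept you want to search your memory for.")
    top_k: int = Field(default=3, description="Number of memories to page into context.")

def parse_search_args(raw_json: str):
    """Hypothetical helper: returns (args, None) on success or (None, error_text) on failure."""
    try:
        return ArchivalMemorySearch(**json.loads(raw_json)), None
    except (json.JSONDecodeError, ValidationError, TypeError) as exc:
        # The error text can be appended to the context so the LLM can retry
        return None, f"Tool call rejected: {exc}"

ok_args, ok_err = parse_search_args('{"search_query": "customer refund policy", "top_k": 5}')
bad_args, bad_err = parse_search_args('{"top_k": 5}')  # search_query is missing
```

Returning the error as text rather than raising keeps the heartbeat loop alive: the rejection can be fed back to the model as a tool result, giving it a chance to correct the call.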

<b>Step 3: The Architecture in Action (Control Flow) </b><br>
Let's look at how this plays out inside your heartbeat event loop from the previous step.

User asks a hard question: "What were the key takeaways from that 50-page technical spec I gave you last month?"

LLM Execution 1: The LLM realizes it doesn't know. It calls ArchivalMemorySearch(search_query="technical spec takeaways last month", top_k=5).

Python Execution: Your script runs the FAISS search, grabs the 5 closest text chunks, and formats them into a single string.

Context Update: The Python script appends {"role": "system", "content": "Tool Result: [Match 1] The spec emphasized... [Match 2]..."} to the state_memory list.

The Heartbeat: Because the tool requested a heartbeat, the Python script does not wait for the user. It immediately loops and passes the newly updated context window back to the LLM.

LLM Execution 2: The LLM reads the system message containing the retrieved text, synthesizes the answer, and finally calls SendMessage(message="Based on the spec, the key takeaways were...").

Pause: The agent sets request_heartbeat = False and the thread goes to sleep, waiting for the user.
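
The control flow above can be sketched as a small loop. Everything here is a stand-in: `scripted_llm` replaces the real model with a two-step script, `fake_archival_search` replaces the FAISS-backed tool, and `run_turn` with its `max_steps` cap is an assumed shape for the heartbeat loop, not the implementation from the previous step.

```python
state_memory = []  # the running context window

def fake_archival_search(search_query: str, top_k: int) -> str:
    """Hypothetical stand-in for the FAISS-backed search tool."""
    return "[Match 1] The spec emphasized... [Match 2]..."

def scripted_llm(context: list) -> dict:
    """Hypothetical stand-in for the model: searches first, then answers."""
    already_searched = any(m["content"].startswith("Tool Result:") for m in context)
    if not already_searched:
        return {"tool": "ArchivalMemorySearch",
                "args": {"search_query": "technical spec takeaways last month",
                         "top_k": 5, "request_heartbeat": True}}
    return {"tool": "SendMessage",
            "args": {"message": "Based on the spec, the key takeaways were...",
                     "request_heartbeat": False}}

def run_turn(user_text: str, max_steps: int = 5) -> str:
    """Run LLM steps until a tool call stops requesting a heartbeat."""
    state_memory.append({"role": "user", "content": user_text})
    reply = ""
    for _ in range(max_steps):  # assumed safety cap against endless heartbeats
        call = scripted_llm(state_memory)
        args = dict(call["args"])
        heartbeat = args.pop("request_heartbeat")
        if call["tool"] == "ArchivalMemorySearch":
            result = fake_archival_search(**args)
            state_memory.append({"role": "system", "content": f"Tool Result: {result}"})
        elif call["tool"] == "SendMessage":
            reply = args["message"]
            state_memory.append({"role": "assistant", "content": reply})
        if not heartbeat:
            break  # pause and wait for the user
    return reply

answer = run_turn("What were the key takeaways from that 50-page technical spec?")
```

After the call, `state_memory` holds three entries in order: the user question, the system message with the tool result, and the assistant reply. The loop ran the model twice without any user input in between, which is the heartbeat at work.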