Module 6 — Agent Memory + Multi-Step Reasoning
Objectives
- Add conversational memory (ConversationBufferMemory) to RAG.
- Reformulate follow-up questions using chat history.
- Answer with citations via your FAISS-backed retriever.
- Build a simple agent with tools: internal search + number extractor.
- Keep per-session history and show it influences answers.

Prereqs
- You have an index at ./data/indexes/my_corpus from Module 3/4.
- You ran Module 5 (retriever + LLM).
- Set an LLM key (prefer Gemini):
  - export GOOGLE_API_KEY=...
  - Optional fallback: export OPENAI_API_KEY=...

Install dependencies (if needed)
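- langchain, langchain-core, langchain-community, langchain-google-genai, google-generativeai
- langchain-openai (only for the OpenAI fallback)
- python-dotenv (to load keys from a .env file)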
- sentence-transformers (for MiniLM queries)
- faiss-cpu
- tqdm

In [1]:
%pip install -q langchain langchain-core langchain-community langchain-google-genai langchain-openai google-generativeai
%pip install -q sentence-transformers faiss-cpu tqdm python-dotenv
import os
import sys
import json
from pathlib import Path
from typing import Any, Dict, List, Tuple, Optional
import numpy as np
from tqdm import tqdm
# FAISS import: the faiss-cpu package installs a module named `faiss`
import faiss
print(f"Python: {sys.version.split()[0]} | FAISS ok")

Note: you may need to restart the kernel to use updated packages.



Python: 3.12.9 | FAISS ok





Load index + docstore (from Module 3/4)

In [2]:
INDEX_DIR = Path("./../data/indexes/my_corpus")
assert (INDEX_DIR / "index.faiss").exists(), f"Missing FAISS index at {INDEX_DIR}. Run Module 3/4."

def load_index(in_dir: Path) -> Tuple[faiss.Index, np.ndarray, List[Dict[str, Any]], Dict[str, Any]]:
    index = faiss.read_index(str(in_dir / "index.faiss"))
    vectors = np.load(in_dir / "vectors.npy")
    docstore = []
    with (in_dir / "docstore.jsonl").open("r", encoding="utf-8") as f:
        for line in f:
            if line.strip():
                docstore.append(json.loads(line))
    manifest = json.loads((in_dir / "manifest.json").read_text(encoding="utf-8"))
    return index, vectors, docstore, manifest

index, vectors, store, manifest = load_index(INDEX_DIR)
print(f"Index loaded: dim={vectors.shape[1]}, count={vectors.shape[0]}, backend={manifest.get('backend')}")

Index loaded: dim=768, count=19, backend=gemini
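
The loader above expects one JSON object per line in docstore.jsonl, and later cells read each record's `text` and `metadata` (`doc_id`, `page`). A minimal sketch of that layout, using a temporary directory and made-up records; the sample values and the helper name `read_docstore` are illustrative and not taken from Module 3/4:

```python
import json
import tempfile
from pathlib import Path
from typing import Any, Dict, List


def read_docstore(path: Path) -> List[Dict[str, Any]]:
    """Read a JSONL docstore, skipping blank lines (same logic as load_index)."""
    records = []
    with path.open("r", encoding="utf-8") as f:
        for line in f:
            if line.strip():
                records.append(json.loads(line))
    return records


# Hypothetical records. retrieve() looks records up by position,
# so row i of the file must correspond to FAISS vector id i.
sample = [
    {"text": "HDL is often called good cholesterol.",
     "metadata": {"doc_id": "heart-health", "page": 1}},
    {"text": "Total cholesterol goal: under 200 mg/dL.",
     "metadata": {"doc_id": "heart-health", "page": 2}},
]

with tempfile.TemporaryDirectory() as tmp:
    docstore_path = Path(tmp) / "docstore.jsonl"
    # A trailing blank line is written on purpose to show that it is ignored.
    docstore_path.write_text(
        "\n".join(json.dumps(r) for r in sample) + "\n\n", encoding="utf-8"
    )
    demo_store = read_docstore(docstore_path)

print(len(demo_store), demo_store[1]["metadata"])
```

If your own docstore uses different field names, the citation formatting later in this module will fall back to "?" for the missing values.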


Query embedding backend must match the index backend (manifest)

In [3]:
from dotenv import load_dotenv
# Machine-specific path: point this at your own .env file (or call load_dotenv() with no argument)
load_dotenv(r"C:\ML\LU-LiveClasses\DocumentAI\.env")
GOOGLE_API_KEY = os.getenv("GOOGLE_API_KEY", "")

class MiniLMBackend:
    def __init__(self, model_name: str = "sentence-transformers/all-MiniLM-L6-v2"):
        from sentence_transformers import SentenceTransformer
        import torch
        device = "cuda" if torch.cuda.is_available() else "cpu"
        self.model = SentenceTransformer(model_name, device=device)
        self.dim = self.model.get_sentence_embedding_dimension()
    def encode(self, texts: List[str], batch_size: int = 64) -> np.ndarray:
        return self.model.encode(texts, batch_size=batch_size, convert_to_numpy=True, normalize_embeddings=False).astype(np.float32)

class GeminiBackend:
    def __init__(self, model_name: str = "text-embedding-004", api_key: Optional[str] = None):
        assert api_key, "GOOGLE_API_KEY required for Gemini embeddings."
        import google.generativeai as genai
        genai.configure(api_key=api_key)
        self.genai = genai
        self.dim = 768
        self.model_name = model_name
    def encode(self, texts: List[str], batch_size: int = 16) -> np.ndarray:
        vecs: List[List[float]] = []
        for i in tqdm(range(0, len(texts), batch_size), desc="Gemini embed"):
            for t in texts[i:i+batch_size]:
                resp = self.genai.embed_content(model=self.model_name, content=t[:4000])
                vecs.append(resp["embedding"])
        return np.asarray(vecs, dtype=np.float32)

def l2_normalize(vecs: np.ndarray) -> np.ndarray:
    norms = np.linalg.norm(vecs, axis=1, keepdims=True)
    norms[norms == 0] = 1.0
    return vecs / norms

backend_name = manifest.get("backend", "minilm").lower()
if backend_name == "minilm":
    query_backend = MiniLMBackend()
elif backend_name == "gemini":
    query_backend = GeminiBackend(api_key=GOOGLE_API_KEY)
else:
    raise ValueError("Unknown backend in manifest.")

assert query_backend.dim == vectors.shape[1], "Embedding dim mismatch with index."
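
Why `l2_normalize` matters: for unit-length vectors the inner product equals cosine similarity, so scores stop depending on vector length. A small pure-Python sketch of that arithmetic (no NumPy, made-up 2-D vectors). Whether your Module 3/4 index actually scores by inner product is an assumption to check against how you built it; the helper names here are illustrative:

```python
import math
from typing import List


def l2_normalize_row(vec: List[float]) -> List[float]:
    """Scale a vector to unit length; a zero vector is returned unchanged."""
    norm = math.sqrt(sum(x * x for x in vec))
    if norm == 0:
        return list(vec)
    return [x / norm for x in vec]


def dot(a: List[float], b: List[float]) -> float:
    """Plain inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


query = [3.0, 4.0]
same_direction = [30.0, 40.0]  # 10x longer, same direction
orthogonal = [-4.0, 3.0]       # at a right angle to the query

q = l2_normalize_row(query)
print(dot(q, l2_normalize_row(same_direction)))  # close to 1.0
print(dot(q, l2_normalize_row(orthogonal)))      # close to 0.0
print(dot(query, same_direction))                # raw inner product: 250.0
```

The raw inner product rewards the longer vector, while the normalized scores depend only on direction.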

Build a LangChain retriever around the FAISS index
- Returns LC Documents with metadata for citations
- Optional: MMR could be added; we keep it simple here

In [4]:
from langchain_core.documents import Document

def embed_query(q: str) -> np.ndarray:
    v = query_backend.encode([q], batch_size=1)
    return l2_normalize(v)

def search_faiss(index: faiss.Index, qv: np.ndarray, top_k: int = 5) -> Tuple[np.ndarray, np.ndarray]:
    D, I = index.search(qv, top_k)
    return D[0], I[0]

def retrieve(query: str, top_k: int = 5) -> List[Document]:
    qv = embed_query(query)
    D, I = search_faiss(index, qv, top_k)
    docs: List[Document] = []
    for idx in I[:top_k]:
        if idx < 0:  # FAISS pads with -1 when fewer than top_k vectors are available
            continue
        rec = store[int(idx)]
        docs.append(Document(page_content=rec["text"], metadata=rec.get("metadata", {})))
    return docs
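
The optional MMR (maximal marginal relevance) step mentioned above re-ranks candidates to balance relevance to the query against redundancy with passages already selected. A minimal pure-Python sketch, assuming unit-length vectors so that the dot product is cosine similarity. The names `mmr_select`, `unit`, and `lambda_mult` are illustrative and not part of this module's code:

```python
import math
from typing import List, Sequence


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def unit(angle_deg: float) -> List[float]:
    """2-D unit vector at the given angle, used to build toy embeddings."""
    a = math.radians(angle_deg)
    return [math.cos(a), math.sin(a)]


def mmr_select(query_vec, cand_vecs, k: int = 2, lambda_mult: float = 0.5) -> List[int]:
    """Return indices of up to k candidates chosen by maximal marginal relevance."""
    selected: List[int] = []
    remaining = list(range(len(cand_vecs)))
    while remaining and len(selected) < k:
        best_idx, best_score = remaining[0], float("-inf")
        for i in remaining:
            relevance = dot(query_vec, cand_vecs[i])
            # Similarity to the closest already-selected candidate (0 if none yet).
            redundancy = max(
                (dot(cand_vecs[i], cand_vecs[j]) for j in selected), default=0.0
            )
            score = lambda_mult * relevance - (1.0 - lambda_mult) * redundancy
            if score > best_score:
                best_idx, best_score = i, score
        selected.append(best_idx)
        remaining.remove(best_idx)
    return selected


query = unit(0)
candidates = [unit(10), unit(12), unit(-40)]  # 0 and 1 are near-duplicates

print(mmr_select(query, candidates, k=2, lambda_mult=0.5))  # [0, 2]
print(mmr_select(query, candidates, k=2, lambda_mult=1.0))  # [0, 1]
```

With `lambda_mult=1.0` the ranking is pure relevance and the near-duplicate is kept; lowering it swaps in the less similar passage. In the notebook you would fetch more than `top_k` candidates from FAISS and apply a step like this before building the Document list.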

LLM setup (Gemini preferred; OpenAI fallback)

In [5]:
USE_GEMINI = bool(GOOGLE_API_KEY)

if USE_GEMINI:
    from langchain_google_genai import ChatGoogleGenerativeAI
    llm = ChatGoogleGenerativeAI(model="gemini-1.5-flash", temperature=0.2, max_output_tokens=512)
    print("Using Gemini for answers.")
else:
    OPENAI_API_KEY = os.getenv("OPENAI_API_KEY", "")
    if OPENAI_API_KEY:
        from langchain_openai import ChatOpenAI
        llm = ChatOpenAI(model="gpt-4o-mini", temperature=0.2)
        print("Using OpenAI fallback.")
    else:
        raise EnvironmentError("No LLM configured. Set GOOGLE_API_KEY or OPENAI_API_KEY.")

Using Gemini for answers.


Conversation memory basics
- We use ConversationBufferMemory to store history as messages.
- We'll reformulate follow-up questions into standalone questions using the LLM + history, then retrieve and answer.

In [6]:
from langchain.memory import ConversationBufferMemory
from langchain_core.prompts import ChatPromptTemplate

memory = ConversationBufferMemory(memory_key="chat_history", return_messages=True)

# Prompt to turn a follow-up question into a standalone question using chat history
condense_prompt = ChatPromptTemplate.from_messages([
    ("system", "Given a conversation and a follow-up question, rewrite the follow-up to be a standalone question that can be understood without the chat history. Use only information from the history."),
    ("human", "Chat history:\n{history}\n\nFollow-up question: {question}\n\nStandalone question:")
])

def messages_to_str(msgs) -> str:
    lines = []
    for m in msgs:
        role = getattr(m, "type", getattr(m, "role", "user"))
        content = getattr(m, "content", "")
        lines.append(f"{role}: {content}")
    return "\n".join(lines)

def condense_question_with_history(question: str) -> str:
    msgs = memory.load_memory_variables({})["chat_history"]
    history_str = messages_to_str(msgs) if msgs else ""
    # format_messages keeps the system/human roles; format() would flatten them into one string
    messages = condense_prompt.format_messages(history=history_str, question=question)
    resp = llm.invoke(messages)
    return resp.content.strip()



Answering prompt with citations
- We pass retrieved context and ask the model to answer with bracketed citations like [doc_id p.page]

In [7]:
answer_prompt = ChatPromptTemplate.from_messages([
    ("system", "You are a concise assistant. Use ONLY the provided context to answer. Always cite sources in brackets like [doc_id p.page]. If the answer is not in the context, say you don't know."),
    ("human", "Question: {question}\n\nContext:\n{context}\n\nAnswer:")
])

def format_docs(docs: List[Document]) -> str:
    lines = []
    for d in docs:
        m = d.metadata
        cite = f"{m.get('doc_id','?')} p.{m.get('page','?')}"
        lines.append(f"[{cite}] {d.page_content}")
    return "\n\n".join(lines)

def answer_with_context(question: str, docs: List[Document]) -> str:
    ctx = format_docs(docs)
    # format_messages keeps the system instruction as a system message
    messages = answer_prompt.format_messages(question=question, context=ctx)
    resp = llm.invoke(messages)
    return resp.content.strip()
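
Because the answering prompt asks for citations like [doc_id p.page], one cheap sanity check is to parse the citations out of an answer and compare them with the passages that were actually retrieved. A hedged sketch: the helper names `extract_citations` and `unsupported_citations` are hypothetical, the sample answer is made up, and the regex assumes doc ids contain no spaces or brackets:

```python
import re
from typing import List, Set, Tuple

# Matches markers such as [heart-health p.2] or [spandan_facts p.None].
CITATION_RE = re.compile(r"\[([^\[\]\s]+) p\.([^\[\]\s]+)\]")


def extract_citations(answer: str) -> List[Tuple[str, str]]:
    """Return (doc_id, page) pairs for every [doc_id p.page] marker in the answer."""
    return CITATION_RE.findall(answer)


def unsupported_citations(
    answer: str, retrieved: Set[Tuple[str, str]]
) -> List[Tuple[str, str]]:
    """Citations in the answer that do not match any retrieved passage."""
    return [c for c in extract_citations(answer) if c not in retrieved]


# Illustrative values only; pages are compared as strings.
retrieved = {("heart-health", "1"), ("heart-health", "2")}
answer = (
    "HDL removes cholesterol [heart-health p.1]. "
    "LDL should stay low [lipid-guide p.7]."
)

print(extract_citations(answer))
print(unsupported_citations(answer, retrieved))
```

In the notebook, the `retrieved` set could be built from each Document's metadata, converting `doc_id` and `page` with `str()` so they compare equal to the parsed values.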

Chat function with memory
- 1) Condense with history
- 2) Retrieve
- 3) Answer with citations
- 4) Save turn in memory

In [8]:
def chat(user_input: str, top_k: int = 5) -> str:
    standalone_q = condense_question_with_history(user_input) if memory.chat_memory.messages else user_input
    docs = retrieve(standalone_q, top_k=top_k)
    answer = answer_with_context(standalone_q, docs)
    # Save original user input + answer to memory so follow-ups work naturally
    memory.save_context({"input": user_input}, {"output": answer})
    return answer
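
To see how the four steps fit together, and to inspect the condensed question when follow-ups go wrong, it can help to run the same flow with stand-in functions. The sketch below is not the notebook's `chat`: `chat_turn`, the stub functions, and the returned dictionary are hypothetical, and the stubs replace the LLM and the retriever so the example runs offline:

```python
from typing import Callable, Dict, List, Tuple

History = List[Tuple[str, str]]  # (role, content) pairs


def chat_turn(
    user_input: str,
    history: History,
    condense: Callable[[str, History], str],
    retrieve: Callable[[str], List[str]],
    answer: Callable[[str, List[str]], str],
) -> Dict[str, object]:
    """Condense -> retrieve -> answer -> save, returning intermediates for debugging."""
    # 1) Only rewrite the question when there is history to draw on.
    standalone = condense(user_input, history) if history else user_input
    # 2) Retrieve with the standalone question, not the raw follow-up.
    docs = retrieve(standalone)
    # 3) Answer from the retrieved context.
    reply = answer(standalone, docs)
    # 4) Save the original input and the reply.
    history.append(("human", user_input))
    history.append(("ai", reply))
    return {"standalone": standalone, "docs": docs, "answer": reply}


# Stubs standing in for the LLM and the FAISS retriever.
def stub_condense(question: str, history: History) -> str:
    last_user = [content for role, content in history if role == "human"][-1]
    return f"{question} (follow-up to: {last_user})"


def stub_retrieve(question: str) -> List[str]:
    return [f"passage about: {question}"]


def stub_answer(question: str, docs: List[str]) -> str:
    return f"Answer based on {len(docs)} passage(s)."


history: History = []
first = chat_turn("What is HDL?", history, stub_condense, stub_retrieve, stub_answer)
second = chat_turn("What about LDL?", history, stub_condense, stub_retrieve, stub_answer)

print(first["standalone"])
print(second["standalone"])
print(len(history))
```

Printing the `standalone` value for each turn is the quickest way to tell whether a poor answer comes from the rewrite step or from retrieval.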

Try a short conversation with follow-ups
- Adjust queries to your corpus (e.g., contracts: termination, confidentiality)

In [None]:
turns = [
    "What is HDL?",
    "There is a similar abbreviation starting with 'L'",
    "Which is bad for heart ?"
]

for q in turns:
    print(f"\nUser: {q}")
    print("Assistant:", chat(q))

# Inspect stored history (optional)
print("\n--- Conversation history snapshot ---")
for m in memory.chat_memory.messages[-6:]:
    print(m.type, ":", m.content[:120].replace("\n"," "), "...")


User: What is HDL?


Gemini embed: 100%|██████████| 1/1 [00:00<00:00,  1.32it/s]


Assistant: HDL, or high-density lipoprotein, is "good cholesterol" that removes cholesterol from the body, preventing buildup in the arteries and protecting against heart disease [heart-health p.1].  For women, HDL should be greater than 50 mg/dL, and for men, greater than 40 mg/dL [heart-health p.2].

User: There is a same with starting with 'L'


Gemini embed: 100%|██████████| 1/1 [00:00<00:00,  2.19it/s]


Assistant: HDL cholesterol should be greater than 50 mg/dL for women and greater than 40 mg/dL for men. [heart-health p.2]

User: Which is bad for heart ?


Gemini embed: 100%|██████████| 1/1 [00:00<00:00,  2.19it/s]


Assistant: To protect against heart disease, HDL cholesterol levels should be greater than 50 mg/dL for women and greater than 40 mg/dL for men [heart-health p.2].

--- Conversation history snapshot ---
human : What is HDL? ...
ai : HDL, or high-density lipoprotein, is "good cholesterol" that removes cholesterol from the body, preventing buildup in th ...
human : There is a same with starting with 'L' ...
ai : HDL cholesterol should be greater than 50 mg/dL for women and greater than 40 mg/dL for men. [heart-health p.2] ...
human : Which is bad for heart ? ...
ai : To protect against heart disease, HDL cholesterol levels should be greater than 50 mg/dL for women and greater than 40 m ...


Basic agent-style chaining with tools
- Tools:
  - search_corpus: query your internal vector DB and return top passages with citations
  - number extractor: pull the numeric values out of a passage
- We’ll use a classic ReAct-style agent via initialize_agent for broad compatibility.

In [14]:
from langchain.tools import Tool
from langchain.agents import initialize_agent, AgentType
import re

def search_corpus_tool_fn(query: str, k: int = 4) -> str:
    docs = retrieve(query, top_k=k)
    out_lines = []
    for d in docs:
        m = d.metadata
        cite = f"{m.get('doc_id','?')} p.{m.get('page','?')}"
        snippet = d.page_content.strip().replace("\n", " ")
        out_lines.append(f"[{cite}] {snippet}")
    return "\n\n".join(out_lines) if out_lines else "No results."

def extract_numbers_from_string(text: str) -> str:
    """
    Extracts all integer and floating-point numbers from a string.
    Handles commas and returns the numbers as a string representation of a list.
    """
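    # This regex finds integers or floats, with optional thousands separators and an optional leading minus sign.
    # It tries comma-grouped digits first, then a plain digit run, so "1000" is not split into "100" and "0".
    # It finds sequences like "1,234.56", "1000", "-77", and "0.25"
    number_strings = re.findall(r'-?(?:\d{1,3}(?:,\d{3})+|\d+)(?:\.\d+)?', text)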
    
    if not number_strings:
        return "No numbers found in the text."
        
    extracted_numbers = []
    for num_str in number_strings:
        # Remove commas and convert to a float
        cleaned_str = num_str.replace(',', '')
        try:
            extracted_numbers.append(float(cleaned_str))
        except ValueError:
            # This handles cases where the regex might match something that isn't a valid number
            continue
            
    return str(extracted_numbers)

tools = [
    Tool(
        name="search_corpus",
        func=search_corpus_tool_fn,
        description="Search the internal document corpus for relevant passages with citations. Input: a natural language query."
    ),
    Tool(
        name="number extractor",
        func=extract_numbers_from_string,
        description="Extract all numbers from a string, including integers and floats. Returns the string form of a list of floats."
    ),
]

agent_memory = ConversationBufferMemory(memory_key="chat_history", return_messages=True)

agent = initialize_agent(
    tools=tools,
    llm=llm,
    agent=AgentType.ZERO_SHOT_REACT_DESCRIPTION,
    verbose=True,
    memory=agent_memory,
)
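
Number patterns are easy to get subtly wrong: a pattern that starts with `\d{1,3}` and treats the comma groups as optional can split an ungrouped number such as 1000 into two matches. The sketch below uses one alternative pattern (an assumption, not necessarily the right one for your corpus) that tries comma-grouped thousands first and falls back to a plain digit run, then checks it on a few strings. The name `find_numbers` is illustrative:

```python
import re
from typing import List

# Comma-grouped thousands first, plain digit run second, optional decimals.
NUMBER_RE = re.compile(r"-?(?:\d{1,3}(?:,\d{3})+|\d+)(?:\.\d+)?")


def find_numbers(text: str) -> List[float]:
    """Return every number in text as a float, ignoring thousands separators."""
    return [float(m.replace(",", "")) for m in NUMBER_RE.findall(text)]


print(find_numbers("Total cholesterol - less than 200 mg/dL"))  # [200.0]
print(find_numbers("1,234.56 and 1000 and -77 and 0.25"))       # [1234.56, 1000.0, -77.0, 0.25]
```

Note that a hyphen directly before a digit is read as a minus sign, so text like "HDL-50" would yield -50.0; strip or replace such hyphens first if that matters for your documents.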

Run the agent on a multi-step query
- Example: find a passage in the corpus, then extract a number from it

In [16]:
queries = [
    "Find document containing Total cholesterol limit",
    "give me the cholesterol limit number if found in previous query using the number extractor tool",
]

for q in queries:
    print(f"\nUser: {q}")
    resp = agent.invoke({"input": q})
    print("Assistant:", resp["output"])


User: Find document containing Total cholesterol limit


> Entering new AgentExecutor chain...
Thought: I need to search the corpus for documents mentioning "Total cholesterol limit".
Action: search_corpus
Action Input: "Total cholesterol limit"

Gemini embed: 100%|██████████| 1/1 [00:00<00:00,  1.96it/s]




[heart-health p.2] Preventing or Managing Heart Disease  The American Heart Association identified seven steps, called “Life’s Simple 7™”, to improve health.  These guidelines reflect those established for the GLB program.  1. Eat a healthy diet   Eat at least 4.5 cups of fruits and vegetables a day.   Have at least two 3.5 ounce servings of fish a week (preferably oily fish).   Eat at least 3 servings of fiber-rich whole grains a day.   Limit sugar-sweetened beverages to not more than 450 calories (36 ounces) per week.   Eat less than 1500 mg of sodium a day.  2. Maintain a healthy body weight  3. Take charge of cholesterol   Goals for cholesterol - Think 50, 100, 150, 200   HDL – greater than 50 mg/dL(for women)   Greater than 40 mg/dL (for men)   LDL – less than 130 mg/dL (under 100 mg/dL is optimal)   Triglycerides – less than 150 mg/dL   Total cholesterol – less than 200 mg/dL  a. Testing is recommended starting at age 20.  b. Have your cholesterol profile done at lea

Gemini embed: 100%|██████████| 1/1 [00:00<00:00,  1.94it/s]



Observation: [How-to-Monitor-Cholesterol-BP-Weight p.2] ist. You can work together to create a

[spandan_facts p.None] O04   15 Facts About the Heart That You Didn't Know  Can beat outside body 08 Heart cells stop dividing early in life Laughter benefits O09 Heart can enlarge due to exercise or disease Represents love & emotion 10 Heart disease is the top global cause of death  Heart beats Over 100,000 times daily 11 Blood Reaches nearly every cell in the body  Women’s hearts beat faster than men'ss 12 Heart is Located slightly left of center in the chest  Electrical system is Controlled by the SA node 13 Begins beating about three weeks after conception  14 Generates enough pressure  Has its own coronary arteries for oxygenated blood to squirt blood up to 30 feet  15 Produces hormones like ANP for blood pressure regulation

[What-Is-High-Blood-Pressure p.2] throughout the week. Add muscle-strengthening activity at  least two days per week for more health benefits. •	 Tak

Gemini embed: 100%|██████████| 1/1 [00:00<00:00,  1.73it/s]



Observation: [How-to-Monitor-Cholesterol-BP-Weight p.2] ist. You can work together to create a

[What-Is-High-Blood-Pressure p.2] throughout the week. Add muscle-strengthening activity at  least two days per week for more health benefits. •	 Take medication the way your health care professional tells you. •	 Know what your blood pressure should be and work to

[spandan_facts p.None] O04   15 Facts About the Heart That You Didn't Know  Can beat outside body 08 Heart cells stop dividing early in life Laughter benefits O09 Heart can enlarge due to exercise or disease Represents love & emotion 10 Heart disease is the top global cause of death  Heart beats Over 100,000 times daily 11 Blood Reaches nearly every cell in the body  Women’s hearts beat faster than men'ss 12 Heart is Located slightly left of center in the chest  Electrical system is Controlled by the SA node 13 Begins beating about three weeks after conception  14 Generates enough pressure  Has its own coronary arte

Notes on memory and agents
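- The RAG chat stored each turn in ConversationBufferMemory and used that history to have the LLM rewrite follow-ups into standalone queries.
- The agent is given its own memory (agent_memory). Note that the default ZERO_SHOT_REACT_DESCRIPTION prompt has no chat_history placeholder, so saved turns are not shown to the model; to let follow-ups reference earlier results, use a conversational agent type such as AgentType.CONVERSATIONAL_REACT_DESCRIPTION.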
- In production, prefer windowed memory (ConversationBufferWindowMemory) or summaries for long chats.
- For more structured memory across sessions, consider a store (e.g., Redis) and RunnableWithMessageHistory.
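
The idea behind windowed, per-session memory can be sketched without LangChain: keep only the last k exchanges for each session id. This is a conceptual stand-in, not how ConversationBufferWindowMemory or RunnableWithMessageHistory are implemented. The class name `SessionWindowMemory`, its methods, and the sample turns are hypothetical:

```python
from collections import defaultdict, deque
from typing import Deque, Dict, List, Tuple

Turn = Tuple[str, str]  # (user_input, assistant_output)


class SessionWindowMemory:
    """Keep the most recent k exchanges per session; older ones are dropped."""

    def __init__(self, k: int = 6) -> None:
        self.k = k
        self._sessions: Dict[str, Deque[Turn]] = defaultdict(
            lambda: deque(maxlen=self.k)
        )

    def save(self, session_id: str, user_input: str, output: str) -> None:
        # deque(maxlen=k) silently discards the oldest turn once it is full.
        self._sessions[session_id].append((user_input, output))

    def history(self, session_id: str) -> List[Turn]:
        # .get avoids creating an empty entry when an unknown session is read.
        return list(self._sessions.get(session_id, ()))


mem = SessionWindowMemory(k=2)
mem.save("alice", "What is HDL?", "Good cholesterol.")
mem.save("alice", "And LDL?", "Bad cholesterol.")
mem.save("alice", "Which is worse?", "High LDL.")
mem.save("bob", "Hi", "Hello.")

print(mem.history("alice"))  # only the last two exchanges remain
print(mem.history("bob"))
print(mem.history("carol"))  # unknown session: empty list
```

An in-process dictionary like this is lost on restart, which is the gap an external store such as Redis is meant to fill.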

Troubleshooting
- If follow-ups fail, print the condensed standalone question to debug.
- If answers hallucinate, lower LLM temperature and ensure the answering prompt forbids external knowledge.
- If the agent doesn’t use tools, strengthen tool descriptions (include when to use).
- Ensure embedding backend matches manifest; mixing backends breaks retrieval.
- For long sessions, use ConversationBufferWindowMemory(k=6) or summarize history periodically.