# Graph Memory System Tests

This notebook provides deterministic smoke tests for the graph memory system.

Tests include:
- Basic operations (add and retrieve memories)
- Store and load from file (persistence)
- Store 100 memories and verify graph structure

All tests use deterministic mock embeddings (no API calls) and fixed seeds for reproducibility.


In [3]:
import sys
import os
import tempfile
import numpy as np
from unittest.mock import patch

# Add project root to path
sys.path.insert(0, os.path.dirname(os.path.abspath('.')))

from graph.graph_memory import GraphMemory


## Helper: Mock Embedding

Create deterministic mock embeddings for testing (no API calls needed)


In [4]:
def mock_embedding(text: str) -> np.ndarray:
    """
    Create deterministic mock embedding based on text.
    
    Uses text hash as seed for reproducibility.
    Returns normalized vector of shape (1536,).
    """
    seed = abs(hash(text)) % 10000
    rng = np.random.RandomState(seed)
    vec = rng.randn(1536)
    norm = np.linalg.norm(vec)
    if norm == 0:
        raise RuntimeError("Zero-norm embedding vector")
    return vec / norm


## Test 1: Basic Operations

Test adding a few memories and retrieving them


In [5]:
print("=" * 60)
print("Test 1: Basic Operations")
print("=" * 60)

# Create temporary file
fd, temp_path = tempfile.mkstemp(suffix='.json')
os.close(fd)

try:
    with patch('graph.graph_memory.get_embedding', side_effect=mock_embedding):
        memory = GraphMemory(temp_path)
        
        # Add memories
        texts = ["I like coffee", "Python programming", "Machine learning"]
        print(f"\nAdding {len(texts)} memories...")
        
        for i, text in enumerate(texts, 1):
            node_id = memory.add_memory(text)
            print(f"  {i}. Added memory {node_id}: {text}")
            assert node_id > 0, f"Invalid node ID: {node_id}"
        
        # Verify storage
        stats = memory.show_stats()
        print(f"\nGraph statistics:")
        print(f"  Nodes: {stats['nodes']}")
        print(f"  Edges: {stats['edges']}")
        print(f"  Density: {stats['density']}")
        assert stats["nodes"] == 3, f"Expected 3 nodes, got {stats['nodes']}"
        assert stats["edges"] >= 0, "Invalid edge count"
        assert 0.0 <= stats["density"] <= 1.0, f"Invalid density: {stats['density']}"
        
        # Retrieve memories
        print(f"\nRetrieving memories for query: 'programming'...")
        results = memory.retrieve_memories("programming", k=2)
        print(f"  Found {len(results)} results:")
        for i, result in enumerate(results, 1):
            print(f"    {i}. {result}")
        
        assert len(results) == 2, f"Expected 2 results, got {len(results)}"
        assert all(isinstance(r, str) for r in results), "All results must be strings"
        assert all(len(r) > 0 for r in results), "All results must be non-empty"
        
        print("\nTest 1 PASSED")
        
finally:
    # Clean up
    if os.path.exists(temp_path):
        os.remove(temp_path)


Test 1: Basic Operations

Adding 3 memories...
  1. Added memory 1: I like coffee
  2. Added memory 2: Python programming
  3. Added memory 3: Machine learning

Graph statistics:
  Nodes: 3
  Edges: 3
  Density: 1.0

Retrieving memories for query: 'programming'...
  Found 2 results:
    1. Python programming
    2. I like coffee

Test 1 PASSED


## Test 2: Store and Load

Test storing memories and loading them back from file


In [6]:
print("=" * 60)
print("Test 2: Store and Load")
print("=" * 60)

# Create temporary file
fd, temp_path = tempfile.mkstemp(suffix='.json')
os.close(fd)

try:
    with patch('graph.graph_memory.get_embedding', side_effect=mock_embedding):
        # First session: add memories
        print("\nSession 1: Storing memories...")
        memory1 = GraphMemory(temp_path)
        texts = ["Coffee is good", "Python programming", "Machine learning"]
        
        node_ids = []
        for i, text in enumerate(texts, 1):
            node_id = memory1.add_memory(text)
            node_ids.append(node_id)
            print(f"  {i}. Stored memory {node_id}: {text}")
        
        print(f"\n  Stored {len(texts)} memories")
        
        # Verify file was created
        assert os.path.exists(temp_path), "Memory file was not created"
        print(f"  Memory file created: {temp_path}")
        
        # Second session: load memories
        print("\nSession 2: Loading memories from file...")
        memory2 = GraphMemory(temp_path)
        stats = memory2.show_stats()
        
        print(f"  Loaded {stats['nodes']} nodes, {stats['edges']} edges")
        assert stats["nodes"] == 3, f"Expected 3 nodes, got {stats['nodes']}"
        assert stats["edges"] >= 0, "Invalid edge count"
        
        # Verify texts are preserved
        print("\n  Verifying stored texts...")
        for node_id in node_ids:
            text = memory2.store.get_node_text(node_id)
            print(f"    Node {node_id}: {text}")
            assert text in texts, f"Text mismatch for node {node_id}"
        
        # Verify embeddings are preserved
        print("\n  Verifying stored embeddings...")
        for node_id in node_ids:
            embedding = memory2.store.get_node_embedding(node_id)
            assert embedding.shape == (1536,), f"Invalid embedding shape: {embedding.shape}"
            assert not np.any(np.isnan(embedding)), "Embedding contains NaN"
            assert not np.any(np.isinf(embedding)), "Embedding contains Inf"
        
        # Test retrieval
        print("\n  Testing retrieval...")
        results = memory2.retrieve_memories("programming", k=1)
        print(f"    Query: 'programming' -> Found: {results[0]}")
        assert len(results) >= 1, "Retrieval returned no results"
        
        print("\nTest 2 PASSED")
        
finally:
    if os.path.exists(temp_path):
        os.remove(temp_path)


Test 2: Store and Load

Session 1: Storing memories...
  1. Stored memory 1: Coffee is good
  2. Stored memory 2: Python programming
  3. Stored memory 3: Machine learning

  Stored 3 memories
  Memory file created: C:\Users\xeangao\AppData\Local\Temp\tmpef464meb.json

Session 2: Loading memories from file...
  Loaded 3 nodes, 3 edges

  Verifying stored texts...
    Node 1: Coffee is good
    Node 2: Python programming
    Node 3: Machine learning

  Verifying stored embeddings...

  Testing retrieval...
    Query: 'programming' -> Found: Coffee is good

Test 2 PASSED


## Test 3: Store 99 Memories (Bacon Essays)

Test storing 99 sentences from Bacon's first 3 essays and verify they are all saved correctly


In [7]:
print("=" * 60)
print("Test 3: Store 99 Memories (Bacon Essays)")
print("=" * 60)

# Create temporary file
fd, temp_path = tempfile.mkstemp(suffix='.json')
os.close(fd)

try:
    with patch('graph.graph_memory.get_embedding', side_effect=mock_embedding):
        memory = GraphMemory(temp_path)
        
        # Load texts from 3essay.txt (99 sentences from first 3 essays)
        essay_file = "3essay.txt"
        with open(essay_file, 'r', encoding='utf-8') as f:
            texts = [line.strip() for line in f if line.strip()]
        
        assert len(texts) == 99, f"Expected 99 texts, got {len(texts)}"
        node_ids = []
        
        print(f"\nAdding {len(texts)} memories from {essay_file}...")
        for i, text in enumerate(texts, 1):
            node_id = memory.add_memory(text)
            node_ids.append(node_id)
            
            # Show progress every 20 memories
            if i % 20 == 0:
                print(f"  Progress: {i}/{len(texts)} memories added...")
        
        print(f"\n  All {len(texts)} memories added!")
        
        # Verify all nodes were added
        stats = memory.show_stats()
        print(f"\nGraph statistics:")
        print(f"  Nodes: {stats['nodes']}")
        print(f"  Edges: {stats['edges']}")
        print(f"  Density: {stats['density']}")
        
        assert stats["nodes"] == 99, f"Expected 99 nodes, got {stats['nodes']}"
        assert stats["edges"] >= 0, "Invalid edge count"
        assert 0.0 <= stats["density"] <= 1.0, f"Invalid density: {stats['density']}"
        print(f"\n  All {len(texts)} nodes stored correctly")
        
        # Verify node IDs
        assert min(node_ids) == 1, f"Invalid min node ID: {min(node_ids)}"
        assert max(node_ids) == 99, f"Invalid max node ID: {max(node_ids)}"
        assert len(set(node_ids)) == 99, "Duplicate node IDs found"
        print(f"  Node IDs correct: {min(node_ids)} to {max(node_ids)}")
        
        # Verify all embeddings are valid
        print(f"\n  Verifying embeddings...")
        for node_id in node_ids[:10]:  # Check first 10
            embedding = memory.store.get_node_embedding(node_id)
            assert embedding.shape == (1536,), f"Invalid embedding shape for node {node_id}"
            assert not np.any(np.isnan(embedding)), f"NaN in embedding for node {node_id}"
            assert not np.any(np.isinf(embedding)), f"Inf in embedding for node {node_id}"
        print(f"  Verified first 10 embeddings are valid")
        
        # Verify we can retrieve nodes
        print(f"\n  Verifying node retrieval...")
        for node_id in node_ids[:5]:  # Check first 5
            text = memory.store.get_node_text(node_id)
            assert text in texts, f"Text not found for node {node_id}"
            print(f"    Node {node_id}: {text[:50]}...")
        
        # Test retrieval
        print(f"\n  Testing retrieval with query: 'truth'...")
        results = memory.retrieve_memories("truth", k=5)
        print(f"  Found {len(results)} results:")
        for i, result in enumerate(results, 1):
            print(f"    {i}. {result[:60]}...")
        
        assert len(results) == 5, f"Expected 5 results, got {len(results)}"
        assert all(isinstance(r, str) for r in results), "All results must be strings"
        assert all(len(r) > 0 for r in results), "All results must be non-empty"
        
        print("\nTest 3 PASSED")
        
finally:
    if os.path.exists(temp_path):
        os.remove(temp_path)


Test 3: Store 99 Memories (Bacon Essays)

Adding 99 memories from 3essay.txt...
  Progress: 20/99 memories added...
  Progress: 40/99 memories added...
  Progress: 60/99 memories added...
  Progress: 80/99 memories added...

  All 99 memories added!

Graph statistics:
  Nodes: 99
  Edges: 291
  Density: 0.06

  All 99 nodes stored correctly
  Node IDs correct: 1 to 99

  Verifying embeddings...
  Verified first 10 embeddings are valid

  Verifying node retrieval...
    Node 1: What is truth? said jesting Pilate, and would not ...
    Node 2: Certainly there be, that delight in giddiness, and...
    Node 3: And though the sects of philosophers of that kind ...
    Node 4: But it is not only the difficulty and labor, which...
    Node 5: One of the later school of the Grecians, examineth...

  Testing retrieval with query: 'truth'...
  Found 5 results:
    1. Therefore it is most necessary, that the church, by doctrine...
    2. Surely in counsels concerning religion, that counsel of the

## Test 3: Graph Structure Analysis

Display graph edges and compute Laplacian matrix


In [8]:
# Recreate the same graph structure for analysis
fd, temp_path = tempfile.mkstemp(suffix='.json')
os.close(fd)

try:
    with patch('graph.graph_memory.get_embedding', side_effect=mock_embedding):
        memory = GraphMemory(temp_path)
        
        # Load texts from 3essay.txt (same as Test 3)
        essay_file = "3essay.txt"
        with open(essay_file, 'r', encoding='utf-8') as f:
            texts = [line.strip() for line in f if line.strip()]
        
        assert len(texts) == 99, f"Expected 99 texts, got {len(texts)}"
        
        for text in texts:
            memory.add_memory(text)
        
        # Display graph edges
        print("=" * 60)
        print("Graph Edge Information")
        print("=" * 60)
        edges = list(memory.store.graph.edges(data=True))
        print(f"\nTotal edges: {len(edges)}")
        print(f"\nFirst 10 edges (node_id1, node_id2, weight):")
        for i, (u, v, data) in enumerate(edges[:10], 1):
            weight = data.get("weight", 1.0)
            print(f"  {i}. ({u}, {v}, {weight:.6f})")
        
        # Compute Laplacian matrix
        print(f"\n" + "=" * 60)
        print("Computing Laplacian Matrix")
        print("=" * 60)
        import networkx as nx
        graph = memory.store.graph
        
        # Get sorted node IDs for matrix ordering
        node_ids = sorted(memory.store.get_all_node_ids())
        n_nodes = len(node_ids)
        
        # Use NetworkX to compute weighted Laplacian matrix
        laplacian_nx = nx.laplacian_matrix(graph, nodelist=node_ids, weight='weight')
        laplacian = laplacian_nx.toarray().astype(np.float64)
        
        print(f"\nLaplacian matrix shape: {laplacian.shape}")
        print(f"Min value: {np.min(laplacian):.6f}")
        print(f"Max value: {np.max(laplacian):.6f}")
        print(f"Trace (sum of diagonal): {np.trace(laplacian):.6f}")
        print(f"Matrix is symmetric: {np.allclose(laplacian, laplacian.T)}")
        
        # Save Laplacian matrix to file for SCOPE project
        laplacian_file = "laplacian_matrix.npy"
        np.save(laplacian_file, laplacian)
        print(f"\nLaplacian matrix saved to: {laplacian_file}")
        print(f"Load in SCOPE project with: laplacian = np.load('{laplacian_file}')")
        
finally:
    if os.path.exists(temp_path):
        os.remove(temp_path)


Graph Edge Information

Total edges: 291

First 10 edges (node_id1, node_id2, weight):
  1. (1, 2, 0.014030)
  2. (1, 3, 0.009871)
  3. (1, 4, 0.036908)
  4. (1, 5, 0.010852)
  5. (1, 6, -0.012408)
  6. (1, 7, -0.013408)
  7. (1, 9, 0.021642)
  8. (1, 10, 0.017066)
  9. (1, 12, 0.043153)
  10. (1, 13, 0.030813)

Computing Laplacian Matrix

Laplacian matrix shape: (99, 99)
Min value: -0.089426
Max value: 0.614703
Trace (sum of diagonal): 25.712860
Matrix is symmetric: True

Laplacian matrix saved to: laplacian_matrix.npy
Load in SCOPE project with: laplacian = np.load('laplacian_matrix.npy')


## SCOPE Output Visualization

Visualization of the graph structure processed by SCOPE (initial result):

![SCOPE Output](output.png)

Note: This is an initial visualization result from SCOPE processing of the 99-node graph structure. The visualization shows the graph topology and structure derived from the Laplacian matrix.


## Summary

All tests completed. The graph memory system:
- Can store and retrieve memories
- Can persist memories to file and load them back
- Can handle 100 memories correctly
- All embeddings are valid (no NaN or Inf)
- Graph structure is correct (node IDs, edges, density)
