[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/Hawksight-AI/semantica/blob/main/cookbook/advanced/11_Advanced_Context_Engineering.ipynb)

# Advanced Context Engineering

## Overview

This notebook covers advanced topics in context engineering using Semantica. We will explore custom memory management strategies, tuning hybrid retrieval, and extending the system with custom graph builders.

### Learning Objectives

- **Custom Memory Pruning**: Implement importance-based pruning instead of FIFO.
- **Hybrid Retrieval Tuning**: Tune the weighting between vector search and graph traversal.
- **Custom Extensions**: Register custom graph building methods.
- **Performance Optimization**: Balance token limits and retrieval latency.

---

## 1. Setup

We'll start by setting up a mock vector store and importing necessary components.

In [None]:
from typing import List, Dict, Any, Optional
from semantica.context import AgentMemory, AgentContext, ContextGraph, ContextRetriever, VectorStore
from semantica.context import registry

# Mock Vector Store (same as in introduction)
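class MockVectorStore(VectorStore):
    """In-memory stand-in for a real vector store; no embeddings are computed."""

    def __init__(self):
        self.items = {}
        self.counter = 0

    def add(self, texts, metadata=None, **kwargs):
        # Store each text under a sequential ID, paired with its metadata entry (if any)
        ids = []
        for i, text in enumerate(texts):
            id_ = f"id_{self.counter}"
            self.items[id_] = {"text": text, "metadata": metadata[i] if metadata else {}}
            ids.append(id_)
            self.counter += 1
        return ids

    def search(self, query, limit=5, **kwargs):
        # The query is ignored: return the first `limit` items with a fixed score
        return [{
            "id": k, "content": v["text"], "score": 0.85, "metadata": v["metadata"]
        } for k, v in list(self.items.items())[:limit]]

    def delete(self, ids, **kwargs):
        # No-op: the mock reports success without removing anything
        return True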

vs = MockVectorStore()
kg = ContextGraph()

## 2. Custom Memory Pruning Strategy

By default, `AgentMemory` uses a FIFO (First-In-First-Out) strategy combined with a token limit to prune short-term memory. However, you might want to keep "important" memories longer regardless of their age.

Let's subclass `AgentMemory` to implement an importance-based pruning strategy.

In [None]:
class ImportanceAwareMemory(AgentMemory):
    def _prune_short_term_memory(self):
        """
        Custom pruning: Always keep items marked as 'important' in metadata,
        then prune others based on token limits.
        """
        if not self.short_term_memory:
            return

        # Separate important items
        important_items = [item for item in self.short_term_memory if item.metadata.get("important")]
        other_items = [item for item in self.short_term_memory if not item.metadata.get("important")]
        
        # Calculate tokens used by important items
        important_tokens = sum(self._count_tokens(item.content) for item in important_items)
        
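        # Calculate remaining budget. Important items are never pruned, so if they
        # alone exceed token_limit, no other items are kept and memory stays over budget.
        remaining_tokens = max(0, self.token_limit - important_tokens)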
        
        # Prune other items to fit remaining budget
        kept_others = []
        current_tokens = 0
        
        # Iterate in reverse (newest first) to keep recent items
        for item in reversed(other_items):
            item_tokens = self._count_tokens(item.content)
            if current_tokens + item_tokens <= remaining_tokens:
                kept_others.insert(0, item)
                current_tokens += item_tokens
            else:
                break # Stop once we hit the limit
                
        # Reconstruct memory: Important items + kept recent items
        # Sort by timestamp to maintain order
        all_kept = sorted(important_items + kept_others, key=lambda x: x.timestamp)
        self.short_term_memory = all_kept

# Test the custom memory
memory = ImportanceAwareMemory(vector_store=vs, token_limit=100)

# Add an old important memory
memory.store("IMPORTANT: User's name is Alice", metadata={"important": True})

# Fill with filler memories
for i in range(20):
    memory.store(f"Filler memory {i} " * 5) # Consumes tokens

print(f"Short-term items: {len(memory.short_term_memory)}")
print("First item (should be the important one):", memory.short_term_memory[0].content)

## 3. Tuning Hybrid Retrieval

Hybrid retrieval combines scores from vector search and graph traversal. You can tune the `hybrid_alpha` parameter to weight these components.

- `hybrid_alpha = 0.0`: Pure Vector Search
- `hybrid_alpha = 1.0`: Pure Graph Search
- `hybrid_alpha = 0.5`: Balanced (Default)

Additionally, `max_expansion_hops` controls how far we traverse the graph from retrieved nodes.
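
One way to read the three `hybrid_alpha` settings above is as a linear blend of the two scores. The sketch below is an assumption about how such a weighting could work, not Semantica's confirmed scoring formula; `blend_scores` is a hypothetical helper and the scores are made-up values.

```python
def blend_scores(vector_score: float, graph_score: float, hybrid_alpha: float) -> float:
    """Hypothetical linear blend: alpha=0.0 is vector only, alpha=1.0 is graph only."""
    if not 0.0 <= hybrid_alpha <= 1.0:
        raise ValueError("hybrid_alpha must be between 0.0 and 1.0")
    return (1.0 - hybrid_alpha) * vector_score + hybrid_alpha * graph_score


# Illustrative scores for a single candidate (made-up values)
vector_score, graph_score = 0.85, 0.40
for alpha in (0.0, 0.5, 1.0):
    combined = blend_scores(vector_score, graph_score, alpha)
    print(f"hybrid_alpha={alpha}: combined score {combined:.3f}")
```

Under this reading, raising `hybrid_alpha` shifts the ranking toward candidates that are well connected in the graph, even when their vector similarity is modest.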

In [None]:
# Populate graph with some structure
kg.add_node("python", "concept", "Python")
kg.add_node("ml", "concept", "Machine Learning")
kg.add_edge("python", "ml", "used_for")

retriever = ContextRetriever(
    memory_store=memory,
    knowledge_graph=kg,
    vector_store=vs,
    hybrid_alpha=0.7,      # Favor graph connections
    max_expansion_hops=2   # Traverse deeper
)

results = retriever.retrieve("Python")
for res in results:
    print(f"Source: {res.source}, Score: {res.score:.2f}")
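
To make `max_expansion_hops` more concrete, here is a minimal, library-independent sketch of hop-limited expansion using breadth-first search. It illustrates what a hop limit typically means and is not Semantica's traversal code: `expand_neighbors` and the adjacency dictionary are hypothetical, edges are treated as undirected, and the `deep_learning` node is added purely for illustration.

```python
from collections import deque
from typing import Dict, List


def expand_neighbors(
    adjacency: Dict[str, List[str]], seeds: List[str], max_hops: int
) -> Dict[str, int]:
    """Map each node within max_hops of a seed to its hop distance."""
    distances = {seed: 0 for seed in seeds}
    queue = deque(seeds)
    while queue:
        node = queue.popleft()
        if distances[node] >= max_hops:
            continue  # Do not expand past the hop limit
        for neighbor in adjacency.get(node, []):
            if neighbor not in distances:
                distances[neighbor] = distances[node] + 1
                queue.append(neighbor)
    return distances


# Hypothetical undirected graph, stored as adjacency lists
adjacency = {
    "python": ["ml"],
    "ml": ["python", "deep_learning"],
    "deep_learning": ["ml"],
}

print(expand_neighbors(adjacency, ["python"], max_hops=1))
print(expand_neighbors(adjacency, ["python"], max_hops=2))
```

With one hop only `ml` is reached from `python`; with two hops `deep_learning` is included as well. Each extra hop can grow the candidate set quickly in dense graphs, which is the trade-off between recall and latency.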

## 4. Extending with Custom Methods

Semantica's registry system allows you to plug in custom logic. Let's register a custom graph builder that creates a star graph topology.

In [None]:
def star_graph_builder(center_entity, satellites, **kwargs):
    """
    Builds a star graph where all satellites connect to the center.
    """
    nodes = []
    edges = []
    
    # Center node
    nodes.append({"id": "center", "label": center_entity, "type": "CENTER"})
    
    for i, sat in enumerate(satellites):
        sat_id = f"sat_{i}"
        nodes.append({"id": sat_id, "label": sat, "type": "SATELLITE"})
        edges.append({"source": "center", "target": sat_id, "relation": "connects_to"})
        
    return {"nodes": nodes, "edges": edges}

# Register the method
registry.method_registry.register("graph", "star_builder", star_graph_builder)

# Verify registration
print("Available graph methods:", registry.method_registry.list_all("graph"))

# Call it directly for illustration; in practice it is typically invoked via the build_context_graph wrapper
graph_data = star_graph_builder("Central Hub", ["Spoke 1", "Spoke 2"])
print(f"Created graph with {len(graph_data['nodes'])} nodes and {len(graph_data['edges'])} edges.")
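
The value of a registry is that callers can select a builder by name at runtime. The sketch below shows the general pattern with a hypothetical `SimpleMethodRegistry`; it is not Semantica's implementation, and the `get` method and `pair_builder` function are assumptions made for illustration.

```python
from typing import Callable, Dict, List


class SimpleMethodRegistry:
    """Hypothetical stand-in: stores callables under a (task, name) pair."""

    def __init__(self) -> None:
        self._methods: Dict[str, Dict[str, Callable]] = {}

    def register(self, task: str, name: str, func: Callable) -> None:
        self._methods.setdefault(task, {})[name] = func

    def get(self, task: str, name: str) -> Callable:
        methods = self._methods.get(task, {})
        if name not in methods:
            raise KeyError(f"No method '{name}' registered for task '{task}'")
        return methods[name]

    def list_all(self, task: str) -> List[str]:
        return sorted(self._methods.get(task, {}))


def pair_builder(a: str, b: str) -> dict:
    """Toy builder: two nodes joined by a single edge."""
    return {"nodes": [a, b], "edges": [{"source": a, "target": b}]}


demo_registry = SimpleMethodRegistry()
demo_registry.register("graph", "pair_builder", pair_builder)

# Look the builder up by name, as a configuration-driven caller would
builder = demo_registry.get("graph", "pair_builder")
print(demo_registry.list_all("graph"))
print(builder("A", "B"))
```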

## 5. Best Practices for Production

1. **Token Limits**: Set `token_limit` no higher than your LLM's context window minus the prompt template size, leaving room for the model's response.
2. **Vector Store**: Use a production-grade vector store (e.g., Pinecone, Weaviate, Qdrant) instead of the mock store.
3. **Asynchronous Operations**: The core logic is synchronous for simplicity. For high-throughput systems, consider wrapping storage operations in async tasks.
4. **Entity Resolution**: Implement a robust `EntityLinker` strategy so that variants of the same entity (e.g., "Alice" vs "Alice S.") resolve to one node and do not fragment the graph.
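
Practices 1 and 3 can be sketched with the standard library alone. The helper names (`compute_token_limit`, `store_in_background`, `fake_store`) and all numbers below are hypothetical and chosen for illustration; `asyncio.to_thread` requires Python 3.9 or later.

```python
import asyncio
from typing import Callable, List


def compute_token_limit(
    context_window: int, prompt_template_tokens: int, response_reserve: int = 0
) -> int:
    """Tokens left for memory after the prompt template and a response reserve."""
    budget = context_window - prompt_template_tokens - response_reserve
    if budget <= 0:
        raise ValueError("Prompt template and response reserve leave no room for memory")
    return budget


async def store_in_background(store: Callable[[str], str], texts: List[str]) -> List[str]:
    """Run a synchronous store callable in worker threads, keeping the event loop free."""
    return list(await asyncio.gather(*(asyncio.to_thread(store, t) for t in texts)))


def fake_store(text: str) -> str:
    """Stand-in for a blocking storage call such as a memory or vector store write."""
    return f"stored:{text}"


# Illustrative numbers: 8192-token window, 1200-token template, 1024 tokens reserved
print(compute_token_limit(8192, 1200, response_reserve=1024))

receipts = asyncio.run(store_in_background(fake_store, ["first", "second"]))
print(receipts)
```

Inside a notebook, where an event loop is already running, replace the `asyncio.run(...)` call with `await store_in_background(fake_store, ["first", "second"])`.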