# Chunk Grouping

When prompts contain diverse information from multiple sources - tool outputs, memory retrievals, database facts, API responses, and user context - presenting these items in an ungrouped list creates cognitive load and comprehension difficulties for language models. Without clear organization, the model must identify relationships between scattered items, determine which facts relate to which subtasks, and mentally categorize information while processing. This scattered presentation leads to missed connections, difficulty identifying relevant information for specific reasoning steps, and reduced ability to leverage related facts together. The problem intensifies as context grows and the number of heterogeneous information sources increases.

Chunk grouping addresses this challenge by organizing related information into structured blocks based on category, source, semantic similarity or purpose. Rather than presenting a flat list of facts and tool results, this technique creates clear sections for tools, memories, retrieved knowledge, task-specific data and other logical groupings. Information within each group shares common characteristics, making it easier for models to locate relevant context and understand relationships. This structured organization makes information easier to find and tends to produce more coherent reasoning.

This notebook demonstrates how to implement chunk grouping. We first show how ungrouped context causes confusion, then cover three practical patterns: source-based grouping, relevance ordering for RAG results, and simple nested sections, followed by a decision tree for choosing between them. The techniques shown here are useful for building complex prompts in RAG systems, multi-agent architectures, and any scenario where diverse information sources must be coherently presented.
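To make the idea concrete before the full demonstrations, here is a minimal sketch of grouping in its simplest form. It assumes each item already carries a category label; the function name `group_by_category` and the `=== HEADER ===` style are illustrative choices, not part of any library.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

def group_by_category(items: List[Tuple[str, str]]) -> str:
    """Render (category, text) pairs as one labeled section per category."""
    groups: Dict[str, List[str]] = defaultdict(list)
    for category, text in items:
        # dicts preserve insertion order, so sections appear in first-seen order
        groups[category].append(text)

    sections = []
    for category, texts in groups.items():
        body = "\n".join(f"- {t}" for t in texts)
        sections.append(f"=== {category.upper()} ===\n{body}")
    return "\n\n".join(sections)

# Hypothetical flat input: the two preference items are not adjacent
flat_items = [
    ("user preferences", "Prefers concise responses with bullet points"),
    ("order information", "Order #12345 shipped via UPS"),
    ("user preferences", "Prefers email communication over phone calls"),
]
grouped_text = group_by_category(flat_items)
print(grouped_text)
```

The two preference items end up under the same header even though they were separated in the input, which is the whole point of the technique.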

In [1]:
from langchain_openai import ChatOpenAI, OpenAIEmbeddings
from langchain_core.messages import HumanMessage, AIMessage, SystemMessage
from typing import List, Optional, Dict, Any, Set
import numpy as np
import os

### Initialize the language model and embeddings

In [2]:
# Using gpt-4o-mini for cost-effective experimentation
llm = ChatOpenAI(model="gpt-4o-mini", api_key=os.getenv("OPENAI_API_KEY", "").strip(), temperature=0)
embeddings = OpenAIEmbeddings(api_key=os.getenv("OPENAI_API_KEY", "").strip())

## Ungrouped context confusion demonstration
Before implementing grouping solutions, we need to demonstrate how ungrouped context degrades model comprehension and reasoning quality. When information from diverse sources appears in a flat, unstructured list, models must expend cognitive resources on categorization and relationship discovery rather than focusing on the core task. Tool outputs mix with user preferences, database facts intermingle with memories, and retrieved documents scatter across the context without clear organization. This forces the model to mentally group information while simultaneously reasoning about the task.

We will create two versions of the same prompt - one with ungrouped mixed context and one with clearly grouped sections - and compare the model's ability to use the information and produce coherent responses. By comparing the two responses side by side, we can see why chunk grouping matters for complex prompts with diverse information sources.

In [3]:
def create_ungrouped_context() -> str:
    """
    Create a prompt with mixed, ungrouped context from multiple sources.
    
    Returns:
        Prompt with scattered information
    """
    prompt = """
Available Context:

- User prefers concise responses with bullet points
- Weather API result: New York, 72°F, partly cloudy, humidity 65%
- Customer purchased Product A on 2023-08-15 for $299
- Database query result: Customer has Gold tier status since 2022-01-10
- The company was founded in 1995 and specializes in consumer electronics
- User's previous question: "What's my order status?"
- Tool output (inventory_check): Product A has 47 units in stock
- Customer's last login was 2023-09-14 at 10:32 AM
- Retrieved document: Product A warranty covers 2 years of parts and labor
- User mentioned they prefer email communication over phone calls
- Tool output (shipping_status): Order #12345 shipped via UPS, tracking: 1Z999AA10123456784
- Customer support notes: Customer called about similar issue in March 2023
- Product A has average rating of 4.5/5 stars from 1,247 reviews
- Database: Customer's email is customer@example.com, verified
- Retrieved document: Return policy allows 30-day returns with receipt

Question: Help the customer track their recent order and provide relevant product information.
"""
    return prompt

def create_grouped_context() -> str:
    """
    Create the same prompt with clearly grouped context sections.
    
    Returns:
        Prompt with organized information groups
    """
    prompt = """
=== USER PREFERENCES ===
- Prefers concise responses with bullet points
- Prefers email communication over phone calls
- Previous question: "What's my order status?"

=== CUSTOMER DATA ===
- Gold tier status since 2022-01-10
- Email: customer@example.com (verified)
- Last login: 2023-09-14 at 10:32 AM
- Support history: Called about similar issue in March 2023

=== ORDER INFORMATION ===
- Purchased Product A on 2023-08-15 for $299
- Order #12345 status: Shipped via UPS
- Tracking number: 1Z999AA10123456784

=== PRODUCT INFORMATION ===
- Product A: Consumer electronics device
- Average rating: 4.5/5 stars (1,247 reviews)
- Warranty: 2 years parts and labor
- Return policy: 30-day returns with receipt
- Current inventory: 47 units in stock

=== EXTERNAL DATA ===
- Weather (New York): 72°F, partly cloudy, humidity 65%
- Company info: Founded 1995, specializes in consumer electronics

Question: Help the customer track their recent order and provide relevant product information.
"""
    return prompt

# Test both context presentations
print("Testing Chunk Grouping Effects")
print("=" * 80)

print("\n1. UNGROUPED CONTEXT (Mixed Information)")
print("-" * 80)
ungrouped = create_ungrouped_context()
print(ungrouped[:400] + "\n...")

# Get response to ungrouped context
response_ungrouped = llm.invoke([HumanMessage(content=ungrouped)])

print("\nResponse to Ungrouped Context:")
print("-" * 80)
print(response_ungrouped.content[:400] + "\n...")

print("\n" + "="*80)
print("\n2. GROUPED CONTEXT (Organized Sections)")
print("-" * 80)
grouped = create_grouped_context()
print(grouped[:400] + "\n...")

# Get response to grouped context
response_grouped = llm.invoke([HumanMessage(content=grouped)])

print("\nResponse to Grouped Context:")
print("-" * 80)
print(response_grouped.content[:400] + "\n...")

Testing Chunk Grouping Effects

1. UNGROUPED CONTEXT (Mixed Information)
--------------------------------------------------------------------------------

Available Context:

- User prefers concise responses with bullet points
- Weather API result: New York, 72°F, partly cloudy, humidity 65%
- Customer purchased Product A on 2023-08-15 for $299
- Database query result: Customer has Gold tier status since 2022-01-10
- The company was founded in 1995 and specializes in consumer electronics
- User's previous question: "What's my order status?"
- Tool 
...

Response to Ungrouped Context:
--------------------------------------------------------------------------------
- **Order Status**: 
  - Order #12345 has shipped via UPS.
  - Tracking number: 1Z999AA10123456784.

- **Product Information**: 
  - **Product**: Product A
  - **Purchase Date**: 2023-08-15
  - **Price**: $299
  - **Warranty**: 2 years of parts and labor.
  - **Stock Availability**: 47 units in stock.
  - **Average Rating**: 4

The comparison uses essentially the same information presented with different organizational structures.
- The ungrouped version intermixes user preferences, tool outputs, database results, retrieved documents and external data in a flat list with no categorization. The model must scan all 15 items to find related information, mentally grouping facts about the order, product, and customer while formulating a response.
- The grouped version organizes the same facts into five clear sections: user preferences, customer data, order information, product information and external data. This structure makes it immediately clear which facts relate to tracking the order (Order Information section) versus product details (Product Information section).

Grouped context tends to improve information utilization, meaning the model incorporates more of the relevant facts into its responses and produces more comprehensive answers. The effect is usually most pronounced for tasks requiring synthesis across multiple information categories, where ungrouped presentation makes it more likely that relevant facts are overlooked.
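One rough way to put a number on information utilization is to check which key facts show up in a response. The sketch below is a simple substring check, not a rigorous evaluation; the function name `fact_coverage` and the choice of marker strings are assumptions made for illustration.

```python
from typing import Dict, List

def fact_coverage(response: str, key_facts: Dict[str, List[str]]) -> Dict[str, bool]:
    """Report, per fact label, whether any of its marker strings appears in the response."""
    text = response.lower()
    return {
        label: any(marker.lower() in text for marker in markers)
        for label, markers in key_facts.items()
    }

# Marker strings chosen by hand for the order-tracking task (illustrative)
key_facts = {
    "tracking_number": ["1Z999AA10123456784"],
    "carrier": ["UPS"],
    "warranty": ["2 years", "two years"],
    "return_policy": ["30-day", "30 days"],
}

# A hand-written stand-in for a model response
sample_response = "Order #12345 shipped via UPS (tracking 1Z999AA10123456784). Warranty: 2 years."
coverage = fact_coverage(sample_response, key_facts)
score = sum(coverage.values()) / len(coverage)
print(coverage)
print(f"Coverage: {score:.0%}")  # 3 of 4 facts found -> Coverage: 75%
```

In the notebook, the same function could be applied to `response_ungrouped.content` and `response_grouped.content` to compare the two prompts. Substring matching is crude (a short marker such as "UPS" also matches inside longer words), so treat the score as a quick signal rather than a measurement.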

## Pattern 1: Source-based grouping (most common)
In real production systems, we almost always know where our information comes from. We made specific API calls, database queries or RAG retrievals. The simplest and most effective approach is to group information by its source using clear section headers. No complex classes, no automatic categorization, no importance scoring - just straightforward organization that mirrors our system architecture.

This pattern works for most production AI agents because the information sources are explicit and well-defined. A customer service bot knows it retrieved user profile data, order history and product documentation. A RAG system knows which documents it retrieved. The grouping logic is trivial: put each source in its own section with a clear header.

In [4]:
# Pattern 1: Simple Source-Based Grouping - This is what you'll use in production most of the time

def create_grouped_prompt_simple(user_info, conversation_history, rag_results, query):
    """
    Create a grouped prompt using simple string formatting.
    
    This is the practical approach used in production systems.
    No classes, no enums, no complexity - just clear sections.
    
    Args:
        user_info: String with user profile/preferences
        conversation_history: String with recent messages
        rag_results: String with retrieved documents
        query: Current user question
    
    Returns:
        Formatted prompt string
    """
    prompt = f"""## USER PROFILE
{user_info}

## CONVERSATION HISTORY
{conversation_history}

## RETRIEVED KNOWLEDGE
{rag_results}

## CURRENT QUESTION
{query}
"""
    return prompt

# Example usage with real data
user_info = """Name: Sarah Chen
Gold tier member since 2022
Prefers: Concise, bullet-point responses
Contact: email preferred over phone"""

conversation_history = """User: What's my order status?
Assistant: Let me check that for you.
User: Also, what's the warranty on this product?"""

rag_results = """Product A: Wireless headphones, $299
Warranty: 2 years parts and labor
Return policy: 30-day returns with receipt
Rating: 4.5/5 stars (1,247 reviews)"""

query = "Can I return this if it doesn't work well?"

# Create the prompt - that's it!
prompt = create_grouped_prompt_simple(user_info, conversation_history, rag_results, query)

print("Source-Based Grouping (Production Pattern)")
print("=" * 80)
print(prompt)

Source-Based Grouping (Production Pattern)
## USER PROFILE
Name: Sarah Chen
Gold tier member since 2022
Prefers: Concise, bullet-point responses
Contact: email preferred over phone

## CONVERSATION HISTORY
User: What's my order status?
Assistant: Let me check that for you.
User: Also, what's the warranty on this product?

## RETRIEVED KNOWLEDGE
Product A: Wireless headphones, $299
Warranty: 2 years parts and labor
Return policy: 30-day returns with receipt
Rating: 4.5/5 stars (1,247 reviews)

## CURRENT QUESTION
Can I return this if it doesn't work well?
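In practice, some sources come back empty (for example, no conversation history on the first turn). A possible variation, sketched below, builds the sections from a dictionary and skips empty sources so the prompt never contains a header with nothing under it. The name `build_sections` and the dictionary-based interface are assumptions for illustration.

```python
from typing import Dict

def build_sections(sources: Dict[str, str], query: str) -> str:
    """Emit one '## HEADER' section per non-empty source, then the question."""
    parts = []
    for header, content in sources.items():
        if content and content.strip():  # skip sources that returned nothing
            parts.append(f"## {header.upper()}\n{content.strip()}")
    parts.append(f"## CURRENT QUESTION\n{query}")
    return "\n\n".join(parts)

sectioned = build_sections(
    {
        "User profile": "Gold tier member since 2022",
        "Conversation history": "",  # e.g. first turn of a session
        "Retrieved knowledge": "Return policy: 30-day returns with receipt",
    },
    "Can I return this if it doesn't work well?",
)
print(sectioned)
```

The grouping logic is still one section per source; the only addition is the emptiness check.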



## Pattern 2: Relevance ordering (RAG systems)
When we have multiple items from the same source that need prioritization, relevance ordering is the practical solution. RAG systems already compute relevance scores during retrieval, so we can simply sort by those scores and present the top-k most relevant items first. This takes advantage of primacy effects, where models tend to pay more attention to early context, without requiring complex importance scoring or semantic clustering.

The key insight is that we already have the relevance information from our retrieval step. Do not throw it away and re-compute with clustering algorithms. Just use it directly to order your context.

In [5]:
# Pattern 2: Relevance-Based Ordering - Use existing scores from your RAG system - don't re-compute!

from typing import List, Tuple

def format_by_relevance(items: List[Tuple[str, float]], top_k: int = 5) -> str:
    """
    Format items by relevance score (already computed by RAG system).
    
    Args:
        items: List of (content, relevance_score) tuples
        top_k: Number of top items to include
    
    Returns:
        Formatted string with items in relevance order
    """
    # Sort by relevance score (highest first)
    sorted_items = sorted(items, key=lambda x: x[1], reverse=True)[:top_k]
    
    # Format as a simple numbered list; scores drive the ordering but are not shown
    formatted = [
        f"{i}. {content}" for i, (content, _score) in enumerate(sorted_items, 1)
    ]
    
    return "\n".join(formatted)

# Example: RAG system returned these documents with scores
# In practice, your vector database (Pinecone, Weaviate, etc.) returns these
rag_results = [
    ("Return policy: 30-day returns with original receipt and packaging", 0.89),
    ("Warranty covers manufacturing defects for 2 years from purchase", 0.85),
    ("Refund processing typically takes 5-7 business days", 0.82),
    ("Product rating: 4.5/5 stars based on 1,247 customer reviews", 0.45),
    ("Shipping available to all 50 US states and Puerto Rico", 0.23),
    ("Product dimensions: 7.1 x 6.2 x 3.3 inches, weight 0.5 lbs", 0.18),
]

# Format top 3 most relevant
context = format_by_relevance(rag_results, top_k=3)

print("Relevance-Based Ordering (RAG Pattern)")
print("=" * 80)
print("\nMost Relevant Information:")
print(context)

Relevance-Based Ordering (RAG Pattern)

Most Relevant Information:
1. Return policy: 30-day returns with original receipt and packaging
2. Warranty covers manufacturing defects for 2 years from purchase
3. Refund processing typically takes 5-7 business days
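A fixed top-k can still admit weakly related items when fewer than k documents are truly relevant. One possible refinement, sketched here, adds a minimum score cutoff on top of the ordering. The function name `select_relevant` and the `0.5` threshold are illustrative assumptions; a sensible cutoff depends on the retriever and its score scale.

```python
from typing import List, Tuple

def select_relevant(
    items: List[Tuple[str, float]], top_k: int = 5, min_score: float = 0.5
) -> List[Tuple[str, float]]:
    """Keep at most top_k items whose score is at least min_score, best first."""
    kept = [item for item in items if item[1] >= min_score]
    return sorted(kept, key=lambda item: item[1], reverse=True)[:top_k]

docs = [
    ("Refund processing typically takes 5-7 business days", 0.82),
    ("Product rating: 4.5/5 stars based on 1,247 customer reviews", 0.45),
    ("Return policy: 30-day returns with original receipt and packaging", 0.89),
]
selected = select_relevant(docs, top_k=3, min_score=0.5)
for content, score in selected:
    print(f"{score:.2f}  {content}")
```

Here the product rating (0.45) is dropped even though `top_k=3` would have had room for it, so only the two return-related documents reach the prompt.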


## Pattern 3: Simple nested sections (when needed)
Sometimes we need hierarchical organization - customer information naturally breaks down into profile, orders, and support history. But we do not need recursive tree structures or category classes. Just use indented strings or nested f-strings. If our data naturally has parent-child relationships, reflect that in simple nested formatting. The visual hierarchy comes from indentation and headers, not from complex object models.

This pattern is useful when a single flat section would be too large (20+ items) and the information has natural subcategories.

In [6]:
# Pattern 3: Simple Nested Sections - Use when we have natural subcategories, but keep it simple

def create_nested_prompt(customer_data: dict, order_data: dict, query: str) -> str:
    """
    Create a prompt with simple nested sections using indentation.
    
    No complex classes or tree structures - just organized strings.
    
    Args:
        customer_data: Dict with profile, history, etc.
        order_data: Dict with order details
        query: User question
    
    Returns:
        Formatted prompt with nested sections
    """
    prompt = f"""## CUSTOMER INFORMATION

### Profile
  • Name: {customer_data['name']}
  • Email: {customer_data['email']}
  • Tier: {customer_data['tier']}

### Preferences
  • Communication: {customer_data['comm_pref']}
  • Response style: {customer_data['style_pref']}

## CURRENT ORDER

### Order Details
  • Order ID: {order_data['id']}
  • Product: {order_data['product']}
  • Date: {order_data['date']}

### Shipping
  • Carrier: {order_data['carrier']}
  • Tracking: {order_data['tracking']}
  • Status: {order_data['status']}

## QUESTION
{query}
"""
    return prompt

# Example data
customer = {
    'name': 'Sarah Chen',
    'email': 'sarah@example.com',
    'tier': 'Gold member since 2022',
    'comm_pref': 'Email preferred',
    'style_pref': 'Concise, bullet points'
}

order = {
    'id': '#12345',
    'product': 'Wireless headphones ($299)',
    'date': '2023-08-15',
    'carrier': 'UPS Ground',
    'tracking': '1Z999AA10123456784',
    'status': 'In transit, estimated delivery Sept 16'
}

query = "When will my order arrive?"

prompt = create_nested_prompt(customer, order, query)

print("Simple Nested Sections (When Needed)")
print("=" * 80)
print(prompt)
print("=" * 80)
print("\nWhen to Use Nested Sections:")
print("✓ Single section would have 20+ items")
print("✓ Natural subcategories exist (Profile, Orders, Support)")
print("✓ Parent-child relationships are clear")
print("\nWhen NOT to Use:")
print("✗ Less than 15 total items (flat sections are clearer)")
print("✗ No natural groupings (use relevance ordering instead)")
print("✗ Just because you can (simpler is better)")


Simple Nested Sections (When Needed)
## CUSTOMER INFORMATION

### Profile
  • Name: Sarah Chen
  • Email: sarah@example.com
  • Tier: Gold member since 2022

### Preferences
  • Communication: Email preferred
  • Response style: Concise, bullet points

## CURRENT ORDER

### Order Details
  • Order ID: #12345
  • Product: Wireless headphones ($299)
  • Date: 2023-08-15

### Shipping
  • Carrier: UPS Ground
  • Tracking: 1Z999AA10123456784
  • Status: In transit, estimated delivery Sept 16

## QUESTION
When will my order arrive?


When to Use Nested Sections:
✓ Single section would have 20+ items
✓ Natural subcategories exist (Profile, Orders, Support)
✓ Parent-child relationships are clear

When NOT to Use:
✗ Less than 15 total items (flat sections are clearer)
✗ No natural groupings (use relevance ordering instead)
✗ Just because you can (simpler is better)


- When to use nested sections:
    - A single section would have 20+ items.
    - Natural subcategories exist (Profile, Orders, Support).
    - Parent-child relationships are clear.
- When not to use nested sections:
    - Fewer than 15 total items (flat sections are clearer).
    - No natural groupings (use relevance ordering instead).
    - Just because we can (simpler is better).
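When the set of fields varies between requests, a hard-coded f-string template becomes awkward. The sketch below renders a fixed two-level dictionary into the same `##` / `###` layout with plain loops, still without recursion or category classes. The name `render_nested` and the dictionary shape are assumptions for illustration.

```python
from typing import Dict

def render_nested(sections: Dict[str, Dict[str, Dict[str, str]]]) -> str:
    """Render {section: {subsection: {label: value}}} as '##' / '###' headers with bullets."""
    lines = []
    for section, subsections in sections.items():
        lines.append(f"## {section.upper()}")
        lines.append("")
        for subsection, fields in subsections.items():
            lines.append(f"### {subsection}")
            lines.extend(f"  • {label}: {value}" for label, value in fields.items())
            lines.append("")  # blank line between subsections
    return "\n".join(lines).rstrip()

nested = render_nested({
    "Customer information": {
        "Profile": {"Name": "Sarah Chen", "Tier": "Gold member since 2022"},
        "Preferences": {"Communication": "Email preferred"},
    },
    "Current order": {
        "Shipping": {"Carrier": "UPS Ground", "Status": "In transit"},
    },
})
print(nested)
```

The depth is deliberately capped at two levels; if the data needs more than that, it is worth asking whether the prompt is carrying too much.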

## Decision tree: Which pattern to use?
Use this simple decision tree to choose the right approach for our use case:

```
START: We have information to organize in a prompt
│
├─ Do we have < 5 total items?
│  └─ YES → Don't group at all, just list them
│
├─ Do we know the source of each piece of information?
│  └─ YES → Use Pattern 1: Source-based grouping
│
├─ Is this a RAG system with relevance scores?
│  └─ YES → Use Pattern 2: Relevance ordering
│
├─ Do we have 20+ items with natural subcategories?
│  └─ YES → Use Pattern 3: Simple nested sections
│
└─ Still not sure?
   └─ Default to Pattern 1: Source-Based Grouping
      It works for almost everything.
```
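The tree above can also be written as a small function that checks the branches top to bottom and returns the first match. This is only a sketch of the logic as drawn; the function name `choose_pattern` and its boolean parameters are hypothetical.

```python
def choose_pattern(
    n_items: int,
    sources_known: bool,
    has_relevance_scores: bool,
    has_subcategories: bool,
) -> str:
    """Walk the decision tree top to bottom and return the first matching branch."""
    if n_items < 5:
        return "no grouping"
    if sources_known:
        return "pattern 1: source-based grouping"
    if has_relevance_scores:
        return "pattern 2: relevance ordering"
    if n_items >= 20 and has_subcategories:
        return "pattern 3: simple nested sections"
    return "pattern 1: source-based grouping"  # default branch

print(choose_pattern(3, True, False, False))    # no grouping
print(choose_pattern(12, False, True, False))   # pattern 2: relevance ordering
print(choose_pattern(40, False, False, True))   # pattern 3: simple nested sections
```

A first-match function returns a single pattern, but the patterns are not mutually exclusive: source-based sections can contain relevance-ordered items.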

**Important**: If we are considering semantic clustering or automatic categorization, we should first ask ourselves:
- Do we really not know where our data came from?
- Do we really have > 100 items from completely unknown sources?
- Have we tried the simple approaches first?

In the vast majority of cases, the answer is to use source-based grouping.

In [8]:
# Complete practical example combining patterns

def build_production_prompt(user_info, conversation, rag_results_with_scores, query):
    """
    Real-world prompt construction combining practical patterns.
    
    This is what production code actually looks like:
    - Source-based sections for known data sources
    - Relevance ordering for RAG results
    - Simple formatting, no complex classes
    """
    # Format RAG results by relevance (Pattern 2)
    top_docs = sorted(rag_results_with_scores, key=lambda x: x[1], reverse=True)[:3]
    rag_context = "\n".join([f"• {doc}" for doc, _ in top_docs])
    
    # Build prompt with source-based sections (Pattern 1)
    prompt = f"""## USER CONTEXT
{user_info}

## RECENT CONVERSATION
{conversation}

## RELEVANT KNOWLEDGE (top 3 by relevance)
{rag_context}

## CURRENT QUESTION
{query}

Please provide a helpful, concise response based on the context above.
"""
    return prompt

# Usage example
user = "Gold tier member, prefers email, likes brief responses"
convo = "User asked about order status, confirmed shipping address"
rag = [
    ("Warranty: 2 years parts and labor", 0.88),
    ("Return policy: 30 days with receipt", 0.92),
    ("Shipping: 5-7 business days", 0.15),
]
question = "Can I return this?"

final_prompt = build_production_prompt(user, convo, rag, question)

print("Production-Ready Prompt Construction")
print("=" * 80)
print(final_prompt)

Production-Ready Prompt Construction
## USER CONTEXT
Gold tier member, prefers email, likes brief responses

## RECENT CONVERSATION
User asked about order status, confirmed shipping address

## RELEVANT KNOWLEDGE (top 3 by relevance)
• Return policy: 30 days with receipt
• Warranty: 2 years parts and labor
• Shipping: 5-7 business days

## CURRENT QUESTION
Can I return this?

Please provide a helpful, concise response based on the context above.

