# Intermediate Representation and Chunk Mapping

This tutorial shows how to work with the IntermediateRepresentation (IR) to understand the relationship between structured prompts and their rendered text.

The IR is the bridge between structure and output:
- **Structure**: StructuredPrompt with elements and hierarchy
- **IR**: Chunks that map back to elements
- **Output**: Final text or multi-modal content

Because each chunk records the element that produced it, you can optimize, debug, and analyze a prompt by component rather than as one opaque string.

In [1]:
from t_prompts import dedent, prompt

## Creating an IntermediateRepresentation

Call `.ir()` on a StructuredPrompt to get its IntermediateRepresentation.

In [2]:
name = "Alice"
age = "30"
p = prompt(t"Name: {name:n}, Age: {age:a}")

# Get the IntermediateRepresentation
ir = p.ir()

print(f"IR type: {type(ir).__name__}")
print(f"Text: {ir.text}")
print(f"Number of chunks: {len(ir.chunks)}")

IR type: IntermediateRepresentation
Text: Name: Alice, Age: 30
Number of chunks: 4


## Understanding Chunks

The IR holds an ordered list of chunks, and each chunk maps to exactly one source element.

In [3]:
# Examine each chunk
print("Chunks in the IR:\n")
for i, chunk in enumerate(ir.chunks):
    print(f"Chunk {i}:")
    print(f"  Type: {type(chunk).__name__}")
    print(f"  Text: {chunk.text!r}")
    print(f"  Element ID: {chunk.element_id}")
    print()

Chunks in the IR:

Chunk 0:
  Type: TextChunk
  Text: 'Name: '
  Element ID: f60923ad-309b-468f-b32b-18376b784699

Chunk 1:
  Type: TextChunk
  Text: 'Alice'
  Element ID: fb4736bb-ad88-4d18-b1f8-dc22d05e2a58

Chunk 2:
  Type: TextChunk
  Text: ', Age: '
  Element ID: 808704b4-a42f-44bc-ac13-e75112713148

Chunk 3:
  Type: TextChunk
  Text: '30'
  Element ID: bb68a556-3b0d-483c-92d8-9a34659c3093



## Mapping Chunks to Elements

Each chunk's `element_id` maps back to a specific element in the structured prompt.

In [4]:
from t_prompts import Static, TextInterpolation

# Show the correspondence between chunks and elements
print("Chunk → Element mapping:\n")

for i, chunk in enumerate(ir.chunks):
    # Find the element with this ID
    matching_elem = None
    for elem in p.children:
        if elem.id == chunk.element_id:
            matching_elem = elem
            break

    if matching_elem:
        elem_type = type(matching_elem).__name__
        if isinstance(matching_elem, Static):
            elem_desc = f"Static(key={matching_elem.key})"
        elif isinstance(matching_elem, TextInterpolation):
            elem_desc = f"TextInterpolation(key='{matching_elem.key}')"
        else:
            elem_desc = elem_type

        print(f"Chunk {i} ({chunk.text!r}) → {elem_desc}")

Chunk → Element mapping:

Chunk 0 ('Name: ') → Static(key=0)
Chunk 1 ('Alice') → TextInterpolation(key='n')
Chunk 2 (', Age: ') → Static(key=1)
Chunk 3 ('30') → TextInterpolation(key='a')
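
The nested loop above scans `p.children` once per chunk. For larger prompts, a dictionary keyed by element ID does the same lookup in a single pass. The sketch below is a minimal illustration using stand-in dataclasses (`FakeElement` and `FakeChunk` are hypothetical, not t_prompts classes) so that it runs without the library. It assumes only the `id`, `key`, `text`, and `element_id` attributes used above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FakeElement:
    """Hypothetical stand-in for a prompt element (not a t_prompts class)."""
    id: str
    key: object


@dataclass(frozen=True)
class FakeChunk:
    """Hypothetical stand-in for a chunk (not a t_prompts class)."""
    text: str
    element_id: str


def map_chunks_to_elements(chunks, elements):
    """Pair each chunk with its source element using a one-pass ID index."""
    by_id = {elem.id: elem for elem in elements}
    # Chunks whose ID is not in the index are paired with None.
    return [(chunk, by_id.get(chunk.element_id)) for chunk in chunks]


demo_elements = [
    FakeElement("e0", 0),
    FakeElement("e1", "n"),
    FakeElement("e2", 1),
    FakeElement("e3", "a"),
]
demo_chunks = [
    FakeChunk("Name: ", "e0"),
    FakeChunk("Alice", "e1"),
    FakeChunk(", Age: ", "e2"),
    FakeChunk("30", "e3"),
]

for chunk, elem in map_chunks_to_elements(demo_chunks, demo_elements):
    print(f"{chunk.text!r} -> key={elem.key!r}")
```

With real objects, the same function could presumably be called as `map_chunks_to_elements(ir.chunks, p.children)`, since it relies only on attributes already shown in this tutorial.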


## Nested Prompts and Chunks

When prompts are nested, the outer prompt's IR is still a flat list of chunks: the elements of the nested prompt contribute their own chunks, in order.

In [5]:
greeting = "Hello"
inner = prompt(t"{greeting:g}, world!")
outer = prompt(t"Message: {inner:msg}")

# Get IR for the outer prompt
ir_nested = outer.ir()

print(f"Text: {ir_nested.text}")
print(f"\nNumber of chunks: {len(ir_nested.chunks)}")
print("\nChunks:")
for i, chunk in enumerate(ir_nested.chunks):
    print(f"  {i}. {chunk.text!r}")

Text: Message: Hello, world!

Number of chunks: 3

Chunks:
  0. 'Message: '
  1. 'Hello'
  2. ', world!'
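
The flat chunk list above is what a depth-first walk over the element tree would produce. The following sketch shows that idea on a toy tree. `Node` and `flatten` are hypothetical names for illustration and say nothing about how t_prompts renders prompts internally.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """Hypothetical tree node: a leaf carries text, a container carries children."""
    id: str
    text: str = ""
    children: list["Node"] = field(default_factory=list)


def flatten(node):
    """Yield (element_id, text) pairs for every leaf, in document order."""
    if not node.children:
        yield (node.id, node.text)
        return
    for child in node.children:
        yield from flatten(child)


# Toy version of the nested example: an outer node containing an inner one.
inner_node = Node("inner", children=[Node("g", "Hello"), Node("s1", ", world!")])
outer_node = Node("outer", children=[Node("s0", "Message: "), inner_node])

flat = list(flatten(outer_node))
for element_id, text in flat:
    print(f"{element_id}: {text!r}")
print("".join(text for _, text in flat))
```

Each pair keeps the ID of the leaf that produced it, which is the property that makes the chunk-to-element mapping possible after flattening.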


## CompiledIR for Efficient Queries

Call `.compile()` on an IR to build indexes for efficient element-to-chunks queries.

In [6]:
# Compile the IR
compiled = ir_nested.compile()

print(f"Compiled IR type: {type(compiled).__name__}")
print(f"Number of chunks: {len(compiled.ir.chunks)}")

Compiled IR type: CompiledIR
Number of chunks: 3


## Querying Chunks for a Subtree

Use `get_chunks_for_subtree(element_id)` to get all chunks from an element and its descendants.

In [7]:
# Get chunks for the entire outer prompt
all_chunks = compiled.get_chunks_for_subtree(outer.id)
print("All chunks for outer prompt:")
for chunk in all_chunks:
    print(f"  {chunk.text!r}")

# Get chunks for just the nested inner prompt
nested_chunks = compiled.get_chunks_for_subtree(outer['msg'].id)
print("\nChunks for nested 'msg' prompt:")
for chunk in nested_chunks:
    print(f"  {chunk.text!r}")

# Reconstruct text from chunks
nested_text = "".join(chunk.text for chunk in nested_chunks)
print(f"\nReconstructed text from nested chunks: {nested_text!r}")

All chunks for outer prompt:
  'Message: '
  'Hello'
  ', world!'

Chunks for nested 'msg' prompt:
  'Hello'
  ', world!'

Reconstructed text from nested chunks: 'Hello, world!'
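
To make the idea of a subtree query concrete, here is one way such an index could be built: register every chunk under its own element and under each ancestor. This is a sketch under stated assumptions, not the `CompiledIR` implementation. The function `build_subtree_index`, the `(element_id, text)` pair format, and the `parent_of` mapping are all hypothetical.

```python
from collections import defaultdict


def build_subtree_index(chunks, parent_of):
    """Map every element ID to the chunk texts produced by it or its descendants.

    chunks: list of (element_id, text) pairs in document order.
    parent_of: dict from element ID to parent ID (the root maps to None).
    """
    index = defaultdict(list)
    for element_id, text in chunks:
        current = element_id
        # Register the chunk under its own element and every ancestor.
        while current is not None:
            index[current].append(text)
            current = parent_of.get(current)
    return dict(index)


# Toy data mirroring the nested example (IDs are made up).
toy_chunks = [("s0", "Message: "), ("g", "Hello"), ("s1", ", world!")]
toy_parents = {"outer": None, "s0": "outer", "msg": "outer", "g": "msg", "s1": "msg"}

toy_index = build_subtree_index(toy_chunks, toy_parents)
print(toy_index["outer"])
print("".join(toy_index["msg"]))
```

Building the index costs one pass over the chunks times the tree depth. After that, each subtree query is a single dictionary lookup, which is the kind of trade-off a compile step is meant to buy.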


## Use Case: Analyzing Chunk Sizes

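Per-chunk sizes show where a prompt's length comes from, which helps with optimization and debugging. The chunk lengths also sum to the length of the full text, as the check at the end of this cell confirms.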

In [8]:
system_msg = "You are a helpful assistant."
user_query = "What is Python?"
examples = "Example 1: Hello -> Bonjour\nExample 2: Goodbye -> Au revoir"

p = dedent(t"""
    System: {system_msg:sys}

    Examples:
    {examples:ex}

    User: {user_query:user}
    """)

ir_analysis = p.ir()

print("Chunk size analysis:\n")
total_size = 0
for i, chunk in enumerate(ir_analysis.chunks):
    size = len(chunk.text)
    total_size += size
    # Truncate long chunks to a 30-character preview
    preview = f"{chunk.text[:30]!r}..." if size > 30 else f"{chunk.text!r}"
    print(f"Chunk {i}: {size:3d} chars - {preview}")

print(f"\nTotal size: {total_size} characters")
print(f"Text length: {len(ir_analysis.text)} characters")
print(f"Match: {total_size == len(ir_analysis.text)}")

Chunk size analysis:

Chunk 0:   8 chars - 'System: '
Chunk 1:  28 chars - 'You are a helpful assistant.'
Chunk 2:  12 chars - '\n\nExamples:\n'
Chunk 3:  59 chars - 'Example 1: Hello -> Bonjour\nEx'...
Chunk 4:   8 chars - '\n\nUser: '
Chunk 5:  15 chars - 'What is Python?'

Total size: 130 characters
Text length: 130 characters
Match: True
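
Because the chunk lengths sum to the text length, running totals give each chunk's start offset in the rendered text, and a binary search can then answer "which chunk covers character N?". The helpers below (`chunk_starts` and `chunk_index_at`, both hypothetical names) are a minimal sketch that works on plain strings, using the first three chunk texts from the output above.

```python
from bisect import bisect_right
from itertools import accumulate


def chunk_starts(texts):
    """Return the start offset of each chunk within the concatenated text."""
    lengths = [len(t) for t in texts]
    # accumulate gives running end offsets; prepend 0 and drop the last
    # value to turn them into start offsets.
    return [0, *accumulate(lengths)][:-1]


def chunk_index_at(starts, position):
    """Return the index of the chunk covering the given character position.

    Assumes 0 <= position < total text length; no bounds check is done.
    """
    return bisect_right(starts, position) - 1


sample_texts = [
    "System: ",
    "You are a helpful assistant.",
    "\n\nExamples:\n",
]
sample_starts = chunk_starts(sample_texts)
print(sample_starts)
print(chunk_index_at(sample_starts, 10))
```

With real chunks, the input would presumably be `[chunk.text for chunk in ir_analysis.chunks]`.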


## Use Case: Finding Elements by ID

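Given a chunk's `element_id`, you can navigate back to the element in the tree. Unlike the direct-children lookup shown earlier, the helper below also descends into nested prompts.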

In [9]:
from t_prompts import StructuredPrompt


def find_element_by_id(root, element_id):
    """Recursively search a StructuredPrompt for an element by its ID."""
    # The parameter is named `root` so it does not shadow the imported `prompt` function.
    for elem in root.children:
        if elem.id == element_id:
            return elem

        # If the element is a nested StructuredPrompt, recurse into it
        if isinstance(elem, StructuredPrompt):
            result = find_element_by_id(elem, element_id)
            if result is not None:
                return result

    return None

# Pick a chunk that comes from an interpolation and find its element
picked_chunk = ir_analysis.chunks[1]  # index 1 is the `sys` interpolation
element = find_element_by_id(p, picked_chunk.element_id)

if element:
    print(f"Chunk text: {picked_chunk.text!r}")
    print(f"Element type: {type(element).__name__}")
    print(f"Element key: {element.key}")
    if hasattr(element, 'expression'):
        print(f"Element expression: {element.expression}")

Chunk text: 'You are a helpful assistant.'
Element type: TextInterpolation
Element key: sys
Element expression: system_msg


## Use Case: Selective Text Extraction

Extract text from specific parts of the prompt by querying chunks.

In [10]:
# Create a prompt with identifiable sections
header = "Task Overview"
body = "Please analyze the following data and provide insights."
footer = "Thank you for your help!"

section_prompt = dedent(t"""
    === {header:header} ===

    {body:body}

    ---
    {footer:footer}
    """)

section_ir = section_prompt.ir()
section_compiled = section_ir.compile()

# Extract text from just the body section
body_chunks = section_compiled.get_chunks_for_subtree(section_prompt['body'].id)
body_text = "".join(chunk.text for chunk in body_chunks)

print("Full text:")
print(section_ir.text)
print("\nExtracted body text:")
print(body_text)
print(f"\nBody is {len(body_text)} of {len(section_ir.text)} total characters")

Full text:
=== Task Overview===

Please analyze the following data and provide insights.

---
Thank you for your help!

Extracted body text:
Please analyze the following data and provide insights.

Body is 55 of 107 total characters


## Use Case: Token Budget Analysis

Analyze which parts of a prompt consume the most of the budget. Character counts serve as a rough stand-in for token counts here; a real tokenizer would give different numbers.

In [11]:
# Create a complex prompt
instruction = "You are an expert translator."
context = "The user wants formal business translations."
examples_text = "\n".join([f"EN: Example {i} -> FR: Exemple {i}" for i in range(10)])

budget_prompt = dedent(t"""
    {instruction:inst}

    Context: {context:ctx}

    Examples:
    {examples_text:examples}

    Now translate the following:
    """)

budget_ir = budget_prompt.ir()
budget_compiled = budget_ir.compile()

# Analyze size by interpolation key
print("Token budget analysis (character counts as proxy):\n")

for key in budget_prompt.keys():
    elem = budget_prompt[key]
    chunks = budget_compiled.get_chunks_for_subtree(elem.id)
    size = sum(len(chunk.text) for chunk in chunks)
    percentage = (size / len(budget_ir.text)) * 100

    print(f"{key:12s}: {size:4d} chars ({percentage:5.1f}%)")

print(f"\nTotal: {len(budget_ir.text)} characters")

Token budget analysis (character counts as proxy):

inst        :   29 chars (  6.7%)
ctx         :   44 chars ( 10.1%)
examples    :  309 chars ( 71.0%)

Total: 435 characters
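
The three percentages above add up to 87.8%, not 100%, because the literal template text (labels, blank lines, the closing instruction) is not attributed to any key. The sketch below adds that remainder as its own row. `budget_report` is a hypothetical helper, and the sizes are copied from the output above rather than computed from a prompt.

```python
def budget_report(sizes, total):
    """Return (label, size, percent) rows, plus a row for unattributed text."""
    rows = [(label, size, 100 * size / total) for label, size in sizes.items()]
    # Whatever the keyed interpolations do not account for is literal text.
    remainder = total - sum(sizes.values())
    rows.append(("(static)", remainder, 100 * remainder / total))
    return rows


# Sizes taken from the budget output above.
keyed_sizes = {"inst": 29, "ctx": 44, "examples": 309}
report_rows = budget_report(keyed_sizes, total=435)

for label, size, percent in report_rows:
    print(f"{label:12s}: {size:4d} chars ({percent:5.1f}%)")
```

Here the static text accounts for the remaining 53 characters, about 12.2% of the prompt.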


## Multi-modal Chunks with Images

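An IntermediateRepresentation can hold both `TextChunk` and `ImageChunk` objects. In the text representation, an image chunk appears as a placeholder that describes the image.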

In [12]:
from PIL import Image

# Create a simple image
img = Image.new('RGB', (100, 100), color='blue')

description = "Here is a blue square"
img_prompt = dedent(t"""
    {description:desc}

    Image: {img:image}
    """)

img_ir = img_prompt.ir()

print("Chunks in multi-modal IR:\n")
for i, chunk in enumerate(img_ir.chunks):
    chunk_type = type(chunk).__name__
    if chunk_type == "TextChunk":
        print(f"  {i}. {chunk_type}: {chunk.text!r}")
    else:
        print(f"  {i}. {chunk_type}: {chunk.text}")

# Text representation includes image placeholders
print(f"\nText representation:\n{img_ir.text}")

Chunks in multi-modal IR:

  0. TextChunk: 'Here is a blue square'
  1. TextChunk: '\n\nImage: '
  2. ImageChunk: [Image: Unknown 100x100 RGB]

Text representation:
Here is a blue square

Image: [Image: Unknown 100x100 RGB]


## Summary

IntermediateRepresentation provides the bridge between structured prompts and rendered output:

✅ **Chunks** - Each chunk maps to exactly one source element  
✅ **Element IDs** - Track provenance from output back to structure  
✅ **CompiledIR** - Efficient queries for element subtrees  
✅ **Multi-modal** - Supports both text and image chunks  
✅ **Analysis** - Enables size analysis, optimization, and debugging  

This makes it possible to:
- Trace rendered text back to source variables
- Extract specific sections of complex prompts
- Analyze token budgets by component
- Implement structured optimization strategies