Python: AgentTelemetryLayer records gen_ai.system_instructions before ContextProvider.before_run() extends them #5291

@benke520

Description

AgentTelemetryLayer._trace_agent_invocation() captures gen_ai.system_instructions from merged_options before calling execute(), which is where AgentMiddlewareLayer runs ContextProvider.before_run() and processes context.extend_instructions() calls.

As a result, the telemetry span records only the base agent instructions and misses the instructions that context providers add dynamically.

Reproduction

class UserMemoryProvider(ContextProvider):
    async def before_run(self, *, agent, session, context, state, **kwargs):
        context.extend_instructions(self.source_id, "The user's name is Alice.")

agent = Agent(
    client=client,
    name="MemoryAgent",
    instructions="You are a friendly assistant.",
    context_providers=[UserMemoryProvider()],
)
session = agent.create_session()
result = await agent.run("Hello", session=session)

Expected

gen_ai.system_instructions in the invoke_agent span contains:

[{"type": "text", "content": "You are a friendly assistant."}, {"type": "text", "content": "The user's name is Alice."}]

Actual

gen_ai.system_instructions only contains the base instructions:

[{"type": "text", "content": "You are a friendly assistant."}]
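When checking a captured span against the expected value, a small helper can report which instruction texts are missing from the attribute. This is a minimal sketch that assumes the attribute is stored as a JSON string in the shape shown above; `missing_instructions` is a hypothetical helper, not part of agent-framework or OpenTelemetry.

```python
import json


def missing_instructions(attribute_value: str, expected: list[str]) -> list[str]:
    """Return the expected instruction texts that are absent from a
    gen_ai.system_instructions attribute value (a JSON string)."""
    parts = json.loads(attribute_value)
    # Collect the text content of every recorded "text" part.
    recorded = {part.get("content") for part in parts if part.get("type") == "text"}
    return [text for text in expected if text not in recorded]


if __name__ == "__main__":
    actual = '[{"type": "text", "content": "You are a friendly assistant."}]'
    expected = ["You are a friendly assistant.", "The user's name is Alice."]
    print(missing_instructions(actual, expected))
```

With the "Actual" value from this report, the helper returns the provider-added instruction as the only missing entry.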

Root Cause

In observability.py, both the streaming and non-streaming paths of AgentTelemetryLayer._trace_agent_invocation() capture system_instructions before execute() is called:

# Non-streaming path (~line 1636):
with _get_span(...) as span:
    _capture_messages(
        span=span,
        messages=messages,
        system_instructions=_get_instructions_from_options(dict(merged_options)),  # base only
    )
    response = await execute()  # context providers run HERE, too late

The call order is:

AgentTelemetryLayer:
  1. Create span
  2. Record gen_ai.system_instructions  <-- only base instructions
  3. Call execute() -->
     AgentMiddlewareLayer:
       4. context_provider.before_run()  <-- extends instructions HERE
       5. Call LLM
       6. context_provider.after_run()
  7. End span
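The call order above can be modelled without the framework. The following is a toy sketch only: `FakeSpan` and `run_with_early_capture` are hypothetical stand-ins, not agent-framework or OpenTelemetry types, and the "LLM call" is just a string. It shows that a snapshot taken at step 2 cannot see what step 4 adds.

```python
import asyncio


class FakeSpan:
    """Stand-in for a span; stores attributes in a plain dict."""

    def __init__(self):
        self.attributes = {}

    def set_attribute(self, key, value):
        self.attributes[key] = value


async def run_with_early_capture(base_instructions, provider_instructions):
    span = FakeSpan()                                   # 1. create span
    instructions = list(base_instructions)
    # 2. record a snapshot of the instructions as they are right now
    span.set_attribute("gen_ai.system_instructions", list(instructions))

    async def execute():                                # 3. call execute()
        instructions.extend(provider_instructions)      # 4. before_run() extends
        return f"LLM saw {len(instructions)} instruction(s)"  # 5. call LLM

    response = await execute()
    return span.attributes["gen_ai.system_instructions"], response


if __name__ == "__main__":
    recorded, response = asyncio.run(
        run_with_early_capture(
            ["You are a friendly assistant."],
            ["The user's name is Alice."],
        )
    )
    print(recorded)   # what the span recorded
    print(response)   # what the model actually received
```

In this model the span holds one instruction while the simulated LLM call sees two, which mirrors the mismatch described in this issue.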

Suggested Fix

Move the _capture_messages() call to after execute() returns (or at minimum, re-capture system_instructions post-execute). Alternatively, the middleware layer could update the span attribute after providers run.
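The "re-capture post-execute" option could look roughly like the sketch below. It assumes the telemetry layer can read the final instructions from some shared state after execute() returns; `RunContext`, `record_instructions`, `traced_run`, and `fake_execute` are all hypothetical names for illustration and do not reflect the actual observability.py internals.

```python
import asyncio
import json
from dataclasses import dataclass, field


@dataclass
class RunContext:
    """Hypothetical shared state that providers extend during execute()."""

    instructions: list[str] = field(default_factory=list)


def record_instructions(attributes: dict, instructions: list[str]) -> None:
    """Write the instructions in the JSON shape used by gen_ai.system_instructions."""
    parts = [{"type": "text", "content": text} for text in instructions]
    attributes["gen_ai.system_instructions"] = json.dumps(parts)


async def traced_run(attributes: dict, context: RunContext, execute) -> str:
    # Early capture: keeps the base instructions on the span if execute() raises.
    record_instructions(attributes, context.instructions)
    response = await execute(context)
    # Re-capture: overwrite with the instructions after providers have run.
    record_instructions(attributes, context.instructions)
    return response


async def fake_execute(context: RunContext) -> str:
    # Stands in for the middleware layer running before_run() and the LLM call.
    context.instructions.append("The user's name is Alice.")
    return "Hi Alice!"


if __name__ == "__main__":
    attrs: dict = {}
    ctx = RunContext(["You are a friendly assistant."])
    asyncio.run(traced_run(attrs, ctx, fake_execute))
    print(attrs["gen_ai.system_instructions"])
```

Keeping the early capture and overwriting it afterwards is one possible design choice: a failed run still carries the base instructions, while a successful run carries the extended set.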

Current Workaround

Register an extra ContextProvider as the last provider, so that it runs after all other providers, and have it patch the span manually:

import json

from opentelemetry import trace

class InstructionsTelemetryFixProvider(ContextProvider):
    def __init__(self):
        super().__init__("telemetry_fix")

    async def before_run(self, *, agent, session, context, state, **kwargs):
        span = trace.get_current_span()
        if span.is_recording():
            # Normalize the base instructions to a list; tolerate None.
            base = agent.instructions
            all_instructions = [base] if isinstance(base, str) else list(base or [])
            if context.instructions:
                all_instructions.extend(context.instructions)
            otel = [{"type": "text", "content": i} for i in all_instructions]
            span.set_attribute("gen_ai.system_instructions", json.dumps(otel))

agent = Agent(
    ...,
    context_providers=[UserMemoryProvider(), InstructionsTelemetryFixProvider()],  # must be LAST
)

Environment

  • agent-framework: 1.0.1
  • opentelemetry-sdk: 1.40
  • Python: 3.10
  • OS: Windows
