[BREAKING] Python: Enable instrumentation by default#5865
Conversation
There was a problem hiding this comment.
Pull request overview
Note
Copilot was unable to run its full agentic suite in this review.
Updates the Python Agent Framework and samples to reflect that instrumentation is now enabled by default, and shifts programmatic opt-in to a new enable_sensitive_telemetry() API for sensitive payload capture.
Changes:
- Make
ENABLE_INSTRUMENTATIONdefault to enabled (opt-out viafalse) and update docs/samples accordingly. - Replace most uses of
enable_instrumentation()withenable_sensitive_telemetry()for sensitive-data opt-in. - Update DevUI and Foundry integration points to reflect the new defaults/entry points.
Reviewed changes
Copilot reviewed 23 out of 23 changed files in this pull request and generated 7 comments.
Show a summary per file
| File | Description |
|---|---|
| python/samples/README.md | Updates env var table to reference observability settings instead of enable_instrumentation() |
| python/samples/04-hosting/foundry-hosted-agents/responses/07_observability/agent.yaml | Removes ENABLE_INSTRUMENTATION from sample agent config; keeps sensitive-data opt-in |
| python/samples/04-hosting/foundry-hosted-agents/responses/07_observability/agent.manifest.yaml | Removes ENABLE_INSTRUMENTATION from manifest sample |
| python/samples/04-hosting/foundry-hosted-agents/responses/07_observability/README.md | Documents instrumentation default-on and opt-out |
| python/samples/04-hosting/foundry-hosted-agents/responses/07_observability/.env.example | Removes ENABLE_INSTRUMENTATION example line |
| python/samples/03-workflows/observability/executor_io_observation.py | Wording update to reflect default instrumentation |
| python/samples/02-agents/observability/advanced_manual_setup_console_output.py | Switches to enable_sensitive_telemetry() for sensitive data capture |
| python/samples/02-agents/observability/README.md | Updates guidance/code samples to default-on instrumentation + sensitive opt-in |
| python/samples/02-agents/observability/.env.example | Documents opt-out via ENABLE_INSTRUMENTATION=false |
| python/packages/lab/lightning/agent_framework_lab_lightning/init.py | Removes explicit enable_instrumentation() call from tracer init |
| python/packages/foundry/tests/foundry/test_foundry_chat_client.py | Updates mocks to enable_sensitive_telemetry() |
| python/packages/foundry/tests/foundry/test_foundry_agent.py | Updates mocks to enable_sensitive_telemetry() |
| python/packages/foundry/agent_framework_foundry/_chat_client.py | Calls enable_sensitive_telemetry() only when sensitive data is requested |
| python/packages/foundry/agent_framework_foundry/_agent.py | Calls enable_sensitive_telemetry() only when sensitive data is requested |
| python/packages/devui/frontend/src/components/layout/deployment-modal.tsx | Updates docker-compose snippet to reflect default-on instrumentation |
| python/packages/devui/agent_framework_devui/_server.py | Treats instrumentation as enabled unless explicitly set to false |
| python/packages/devui/agent_framework_devui/_executor.py | Updates comment to reflect new defaults/entry point |
| python/packages/devui/agent_framework_devui/init.py | Removes instrumentation_enabled parameter and related logic |
| python/packages/core/tests/core/test_observability.py | Updates tests to cover the new enable_sensitive_telemetry() behavior |
| python/packages/core/agent_framework/observability.py | Defaults instrumentation to enabled; replaces enable_instrumentation() with enable_sensitive_telemetry() |
| python/CODING_STANDARD.md | Updates import example to enable_sensitive_telemetry |
| python/.github/skills/python-development/SKILL.md | Updates skills doc sample import to enable_sensitive_telemetry |
| python/.env.example | Updates env example to reflect default-on instrumentation |
There was a problem hiding this comment.
Automated Code Review
Reviewers: 4 | Confidence: 90%
✗ Correctness
The PR correctly implements the 'enable instrumentation by default' change in ObservabilitySettings and renames enable_instrumentation to enable_sensitive_telemetry. However, beyond the already-flaged issues in the review thread, the configure_azure_monitor() methods in both _agent.py and _chat_client.py have a security-relevant behavioral regression: the old code always actively set enable_sensitive_data to False by default (overriding the env var), whereas the new code is a no-op when enable_sensitive_data=False, meaning ENABLE_SENSITIVE_DATA=true in the environment will no longer be overidden, potentially leaking sensitive telemetry data in production.
✓ Security Reliability
The PR correctly changes the default for
enable_instrumentationfrom False to True, with sound env-var parsing logic. Sensitive data capture remains opt-in. All existing references toenable_instrumentationoutside the diff are either pytest fixtures or attribute-level monkeypatches onOBSERVABILITY_SETTINGS(not calls to the removed function), so they won't break. The six existing unresolved review comments already cover the significant concerns: the breaking removal ofenable_instrumentation()from the public API, theconfigure_azure_monitorregression whenenable_sensitive_data=False, the DevUI boolean parsing inconsistency, the deployment modal comment mismatch, and the DevUIserve()parameter removal. No additional security or reliability defects were found.
✓ Test Coverage
The PR renames
enable_instrumentation()toenable_sensitive_telemetry()and changes the default forenable_instrumentationfromFalsetoTrue. Test renames and adaptations are correct, but there are two notable test coverage gaps: (1) no test verifies the new defaultenable_instrumentation=Truewhen theENABLE_INSTRUMENTATIONenv var is unset — this is the core behavioral change of the PR, and (2) neither foundry test file covers theconfigure_azure_monitor(enable_sensitive_data=False)path, leaving the newif enable_sensitive_data:guard untested for the false branch.
✗ Design Approach
The main design issue is in the Agent Lightning integration: this PR removes the explicit instrumentation enablement from
AgentFrameworkTracer.init(), which means that a user who hasENABLE_INSTRUMENTATION=falsewill no longer get Agent Framework spans through the Lightning tracer even though that tracer is documented as the component that enables Agent Framework observability for Lightning.
Flagged Issues
- configure_azure_monitor() no longer overrides ENABLE_SENSITIVE_DATA=true from the environment when called with the default enable_sensitive_data=False. The old code always called enable_instrumentation(enable_sensitive_data=False), which forcefully set OBSERVABILITY_SETTINGS.enable_sensitive_data=False. The new
if enable_sensitive_data:guard skips the call entirely, leaving env-sourced enable_sensitive_data=True in effect. This is a security regression: sensitive data (prompts, tool arguments) may leak in production when users expect configure_azure_monitor() to suppress it. - AgentFrameworkTracer no longer guarantees Agent Framework telemetry is enabled for the Lightning integration. With ENABLE_INSTRUMENTATION=false, observability gates become no-ops, which contradicts the Lightning tracer's documented contract to enable OpenTelemetry observability for Agent-lightning and the README guidance that instantiating AgentFrameworkTracer() sends telemetry to Lightning.
Automated review by TaoChenOSU's agents
There was a problem hiding this comment.
Automated Code Review
Reviewers: 4 | Confidence: 90%
✓ Correctness
The PR correctly changes the default for
enable_instrumentationfrom False to True, adds a newenable_sensitive_telemetry()convenience function, gates expensive_capture_messagesserialization onspan.is_recording()as a performance optimization, and replacesassert_neverin the converter with a graceful fallback. All code changes are well-tested and semantically correct. The_chat_client.pyreformatting is semantically equivalent. No correctness bugs found.
✓ Security Reliability
This PR enables instrumentation by default, adds a new
enable_sensitive_telemetry()convenience API, and gates expensive_capture_messagesserialization onspan.is_recording(). From a security and reliability perspective, the changes are sound: sensitive data capture (enable_sensitive_data) remains off by default and is gated behindSENSITIVE_DATA_ENABLED(which requires bothenable_instrumentationANDenable_sensitive_datato be true). Thespan.is_recording()guards are a good reliability improvement that avoids unnecessary JSON serialization when no tracer provider is configured. Theassert_never→logger.debug+ skip change in_converter.pytrades strict type exhaustiveness for forward compatibility, which is a reasonable tradeoff. No security or reliability issues found.
✓ Test Coverage
The PR's headline behavioral change — defaulting
enable_instrumentationtoTrue— lacks a direct unit test onObservabilitySettings. The removedtest_enable_instrumentation_reads_env_sensitive_datatest leaves the env-var re-read fallback path ofenable_instrumentation()without coverage. The new warning logic in__init__is also untested. Thespan.is_recording()gating and the newenable_sensitive_telemetry()function are well-covered by the new tests.
✓ Design Approach
One design issue stands out. The ChatKit converter now silently drops any unsupported
ThreadItemvariant instead of surfacing that the converter no longer understands the thread payload. That conflicts with the converter’s own documented contract and can truncate conversation state without any caller-visible failure. I did not find another repo-backed design issue in the observability changes.
Automated review by TaoChenOSU's agents
Add a public disable_instrumentation() entry point so users can explicitly opt out of Agent Framework telemetry, with a sticky-disable flag that makes the user's intent "leading" — no framework code path (foundry's configure_azure_monitor, configure_otel_providers, enable_instrumentation, enable_sensitive_telemetry, or direct OBSERVABILITY_SETTINGS.enable_* writes) can re-enable instrumentation until the user explicitly clears the disable with enable_instrumentation(force=True) / enable_sensitive_telemetry(force=True). Also addresses the two remaining unresolved review threads on the PR: 1. test_observability_settings_defaults_instrumentation_true pins the new "ENABLE_INSTRUMENTATION defaults to True when env unset" behavior. 2. test_enable_instrumentation_reads_env_sensitive_data restores coverage for the post-import load_dotenv() fallback path. Implementation: - ObservabilitySettings.enable_instrumentation / enable_sensitive_data become properties backed by _enable_*. While _user_disabled is True, the getters return False and the setters drop True writes (defense in depth so third- party writes can't subvert the disable). - Public is_user_disabled read-only property lets integrations (e.g. foundry's configure_azure_monitor) cheaply check the disable state without poking at privates. - enable_instrumentation() and enable_sensitive_telemetry() short-circuit with an info log when disabled; gain a force=True kwarg that clears the disable. - configure_otel_providers() still creates providers / exporters / views so a later force-enable can use them, but logs an info message when called while disabled. - Foundry's FoundryChatClient.configure_azure_monitor and FoundryAgent.configure_azure_monitor early-return when the user has disabled, so Azure Monitor's global providers aren't installed unnecessarily. Tests: 11 new tests covering default-on, env re-read at call time, sticky behavior against each re-enable surface (enable_instrumentation, enable_sensitive_telemetry, configure_otel_providers, direct attribute writes), force=True override, re-arming the disable, and the __all__ export. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
|
Pushed 1. New
|
Add a "Disabling instrumentation" section to the observability sample README that walks through: - The distinction between the ENABLE_INSTRUMENTATION env var (initial, non-sticky) and disable_instrumentation() (process-wide, sticky). - Why the sticky semantics matter: framework integrations like FoundryChatClient.configure_azure_monitor() can call enable_instrumentation() as part of their setup, and the user's opt-out needs to win. - All five surfaces guarded by the sticky disable (property reads, public enable functions, configure_otel_providers, direct attribute writes, is_user_disabled-aware integrations). - The force=True escape hatch on both enable_instrumentation() and enable_sensitive_telemetry(). - How third-party integrations should consult OBSERVABILITY_SETTINGS.is_user_disabled. - The limits of the disable (does not tear down existing providers / in-flight spans / third-party instrumentation, does not persist across processes). Cross-links the new section from the ENABLE_INSTRUMENTATION row in the env vars table. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…antees Replace 'no telemetry will be emitted no matter what' (which is too strong, since callers can still pass force=True or mutate private attributes) with language framing the disable as a user-intent contract that library and framework code is expected to honor: the framework actively short-circuits the public enable paths, force=True and private-attribute writes are acknowledged as out-of-contract escape hatches that integrations should not use on the user's behalf. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
- opentelemetry-sdk is no longer a hard dependency; it is lazily imported by create_resource(), create_metric_views(), and configure_otel_providers() with a clear ImportError when missing. Day-to-day instrumentation works with opentelemetry-api alone provided some other component configures the global OpenTelemetry providers (Azure Monitor, an APM agent, application bootstrap, etc.). - opentelemetry-semantic-conventions-ai is no longer used anywhere in the source; remove it from the listed dependencies. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…nly relevant migration The old guide documented the move away from setup_observability(otlp_endpoint=...) which was an earlier-release API change unrelated to this PR and stale enough that it's more confusing than helpful at this point. Replace it with a short note on the single migration this PR introduces: callers of enable_instrumentation(enable_sensitive_data=True) should switch to enable_sensitive_telemetry(). Cross-link to the Disabling instrumentation section for the rare 'force on without enabling sensitive data' use case where enable_instrumentation() still applies. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Motivation and Context
Closes #5749
Description
Changes required to make the code base consistent throughout.
Contribution Checklist