
Feat/observability implementation#27

Merged
nadeem4 merged 3 commits into main from
feat/observability-implementation
Jan 14, 2026


nadeem4 commented Jan 14, 2026

This pull request introduces a comprehensive observability and audit logging framework across the codebase, enhancing production readiness, compliance, and debuggability. The main improvements include OpenTelemetry-based metrics and tracing, structured JSON logging with tenant and trace correlation, and a persistent forensic audit log. Several configuration options and documentation updates have been added to support these features.

Observability & Audit Logging Enhancements

  • Introduced OpenTelemetry metrics and tracing:

    • Added support for node execution histograms and token usage counters, configurable via the OBSERVABILITY_EXPORTER and OTEL_EXPORTER_OTLP_ENDPOINT settings (metrics.py, settings.py, configuration.md, pyproject.toml).
    • Updated documentation to describe metrics, tracing, and visualization options for both local and production environments (observability.md, README.md).
  • Structured logging improvements:

    • Enhanced the logger to inject both trace_id and tenant_id into all log records, and updated the JSON formatter to include these fields (logger.py).
    • Logging is now automatically configured for JSON output when observability is enabled (settings.py).
  • Persistent audit log:

    • Added a dedicated EventLogger class that writes sanitized, structured JSON events to a rotating log file for forensic analysis and compliance (event_logger.py).
    • New configuration option AUDIT_LOG_PATH allows customization of the audit log location (settings.py, configuration.md).
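
For reviewers who want a picture of the audit log behaviour described above, here is a minimal sketch using only the standard library. It is not the code in event_logger.py: the redacted key set, the `sanitize` helper, the `log_event` method, the event fields, and the rotation limits are all assumptions made for illustration. Only the EventLogger class name and the AUDIT_LOG_PATH setting come from this PR.

```python
import json
import logging
import os
import tempfile
from datetime import datetime, timezone
from logging.handlers import RotatingFileHandler

# Hypothetical set of keys to redact; the PR does not list the real ones.
SENSITIVE_KEYS = {"password", "api_key", "token", "secret"}


def sanitize(payload):
    """Recursively replace values of sensitive keys with a placeholder."""
    if isinstance(payload, dict):
        return {
            k: "***" if str(k).lower() in SENSITIVE_KEYS else sanitize(v)
            for k, v in payload.items()
        }
    if isinstance(payload, list):
        return [sanitize(v) for v in payload]
    return payload


class EventLogger:
    """Writes one sanitized JSON object per line to a rotating file."""

    def __init__(self, path, max_bytes=10_000_000, backup_count=5):
        self._logger = logging.getLogger(f"audit.{path}")
        self._logger.setLevel(logging.INFO)
        self._logger.propagate = False  # keep audit events out of app logs
        if not self._logger.handlers:
            handler = RotatingFileHandler(
                path, maxBytes=max_bytes, backupCount=backup_count, encoding="utf-8"
            )
            handler.setFormatter(logging.Formatter("%(message)s"))
            self._logger.addHandler(handler)

    def log_event(self, event_type, tenant_id, data):
        """Sanitize the payload, append it to the file, and return the record."""
        record = {
            "timestamp": datetime.now(timezone.utc).isoformat(),
            "event_type": event_type,
            "tenant_id": tenant_id,
            "data": sanitize(data),
        }
        self._logger.info(json.dumps(record))
        return record


# AUDIT_LOG_PATH is the setting named in this PR; the temp-dir fallback is an assumption.
audit_path = os.environ.get(
    "AUDIT_LOG_PATH", os.path.join(tempfile.mkdtemp(), "audit.log")
)
audit = EventLogger(audit_path)
event = audit.log_event(
    "query_executed", "tenant-a", {"sql": "SELECT 1", "api_key": "abc123"}
)
```

Writing one JSON object per line keeps the file easy to tail and to parse after rotation, and disabling propagation keeps audit events out of the regular application log stream.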

Documentation & Planning

  • Added and updated documentation:
    • Expanded observability.md with detailed instructions, event structures, and example log entries.
    • Added a remediation plan specifically for observability, outlining enhancements, implementation status, and success metrics (remediation_plan_observability.md).
    • Updated the main README.md to document the new Observability Plane and its responsibilities.
    • Updated the main remediation plan to reflect new observability and logging action items.

Configuration & Dependency Updates

  • Added new OpenTelemetry exporter dependency to the core package (pyproject.toml).
  • Updated the default LLM model in the demo config (llm.demo.yaml).

These changes collectively provide a robust foundation for monitoring, compliance, and operational insight in production environments.

…, and audit logs

- Implemented OpenTelemetry integration in metrics.py and monitor.py

- Enabled structured JSON logging with TraceContextFilter

- Implemented UserContext schema and tenant_id propagation in logs

- Added EventLogger for persistent audit logging

- Updated documentation (README, ops/observability, ops/configuration)

- Added verification tests
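
The trace and tenant correlation listed in this commit can be sketched roughly as follows. This is an illustration under assumptions, not the contents of logger.py: the context variables, the `JsonFormatter` class, the logger name, and the sample IDs are hypothetical, and the real TraceContextFilter may read trace_id from the active OpenTelemetry span instead of a `ContextVar`.

```python
import contextvars
import io
import json
import logging

# Hypothetical context variables; the real project may store these differently
# (for example, reading trace_id from the active OpenTelemetry span).
trace_id_var = contextvars.ContextVar("trace_id", default=None)
tenant_id_var = contextvars.ContextVar("tenant_id", default=None)


class TraceContextFilter(logging.Filter):
    """Copies the current trace_id and tenant_id onto every log record."""

    def filter(self, record):
        record.trace_id = trace_id_var.get()
        record.tenant_id = tenant_id_var.get()
        return True  # never drop records, only enrich them


class JsonFormatter(logging.Formatter):
    """Renders each record as a single JSON object."""

    def format(self, record):
        return json.dumps({
            "level": record.levelname,
            "logger": record.name,
            "message": record.getMessage(),
            "trace_id": getattr(record, "trace_id", None),
            "tenant_id": getattr(record, "tenant_id", None),
        })


stream = io.StringIO()  # stands in for stdout so the output can be inspected
handler = logging.StreamHandler(stream)
handler.setFormatter(JsonFormatter())
handler.addFilter(TraceContextFilter())

logger = logging.getLogger("observability_demo")
logger.setLevel(logging.INFO)
logger.propagate = False
logger.addHandler(handler)

# Sample values for illustration only.
trace_id_var.set("4bf92f3577b34da6a3ce929d0e0e4736")
tenant_id_var.set("tenant-a")
logger.info("pipeline started")

entry = json.loads(stream.getvalue())
```

Attaching the filter to the handler rather than to a single logger means records from any logger routed through that handler pick up the same two fields.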
nadeem4 merged commit 2b9bb1a into main on Jan 14, 2026