Implement observe API with context propagation and scoring #31
base: main
- Nested hashes were being serialized as JSON strings, losing structure
- Now preserves hierarchy through dot-notation keys for better queries
- Enables OpenTelemetry backends to filter on individual nested fields
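For readers skimming the commit list, the dot-notation flattening described above can be sketched roughly as follows. `flatten_hash` is a hypothetical name used only for illustration; the gem's own flattening code may handle prefixes, arrays, or empty hashes differently.

```ruby
# Hypothetical helper (not the gem's API): recursively flattens nested
# hashes into a single-level hash whose keys use dot notation.
def flatten_hash(hash, prefix = nil)
  hash.each_with_object({}) do |(key, value), flat|
    full_key = prefix ? "#{prefix}.#{key}" : key.to_s
    if value.is_a?(Hash)
      # Recurse so that each nested field becomes its own flat key.
      flat.merge!(flatten_hash(value, full_key))
    else
      flat[full_key] = value
    end
  end
end

metadata = { user: { id: 42, plan: { tier: "pro" } }, env: "prod" }
flat = flatten_hash(metadata)
# flat has the keys "user.id", "user.plan.tier" and "env"
```

Because each nested field ends up under its own key, a backend can filter on `user.plan.tier` directly instead of parsing a JSON string.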
- Improves code maintainability by separating concerns
- Makes recursive flattening logic more testable in isolation
- Reduces complexity in flatten_metadata method through extraction

- Separates agent-specific instructions from general project docs
- Prevents AI context pollution with implementation history
- Adds .env to gitignore for credential protection
- Streamlines onboarding by removing outdated architecture notes

- Eliminates 200+ lines of duplication across span, generation, and trace classes by centralizing common functionality
- Enables consistent behavior for all observation types (spans, generations, events, tools, chains, agents)
- Provides unified API for hierarchical tracing with block-based and stateful observation creation patterns
- Simplifies future observation type additions by inheriting from battle-tested base implementation
- Adds comprehensive test coverage (472 lines) for shared observation behaviors and edge cases
- Introduces development tooling (Makefile) for streamlined test/lint workflows

- Align documentation with Ruby conventions for hash keys
- Correct example from `totalCost` to `total_cost` for consistency

- Replaces imperative trace/span/generation API with zero-boilerplate `Langfuse.observe` that wraps blocks automatically
- Enables distributed tracing via OpenTelemetry propagation headers (B3 and W3C formats) for cross-service observability
- Consolidates observation types (generations, spans, chains, agents, tools, embeddings, retrievers, evaluators, guardrails, events) into unified interface, reducing SDK surface area
- Adds ScoreClient for asynchronous score submission with type validation
- Implements custom SpanProcessor for direct Langfuse exporting without OTLP middleware, reducing latency and complexity
- Migrates from manual state management to OpenTelemetry's battle-tested context propagation, eliminating race conditions in concurrent environments
- Removes obsolete classes (Trace, Tracer, BaseObservation, Generation, Span), reducing maintenance burden by ~850 LOC
- Provides automatic prompt-trace linking when using get_prompt within observe blocks, improving prompt version tracking
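To make the W3C propagation format mentioned above concrete, here is a minimal sketch of building and parsing a version-00 `traceparent` header value (`version-traceid-parentid-flags`). The helpers `build_traceparent` and `parse_traceparent` are hypothetical names for illustration only; in this PR the actual header handling is presumably delegated to OpenTelemetry's propagators rather than hand-rolled like this.

```ruby
require "securerandom"

# version "00", 32 hex chars of trace id, 16 hex chars of parent span id,
# and 2 hex chars of flags, separated by dashes.
TRACEPARENT_PATTERN = /\A00-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})\z/

# Hypothetical helper: builds a version-00 traceparent header value.
def build_traceparent(trace_id:, span_id:, sampled: true)
  flags = sampled ? "01" : "00"
  "00-#{trace_id}-#{span_id}-#{flags}"
end

# Hypothetical helper: returns a hash of the parsed fields,
# or nil when the header is malformed.
def parse_traceparent(header)
  match = TRACEPARENT_PATTERN.match(header.to_s)
  return nil unless match

  trace_id, span_id, flags = match.captures
  # All-zero ids are invalid according to the W3C Trace Context spec.
  return nil if trace_id.match?(/\A0+\z/) || span_id.match?(/\A0+\z/)

  { trace_id: trace_id, span_id: span_id, sampled: (flags.to_i(16) & 1) == 1 }
end

# Usage: the upstream service builds the header, the downstream one parses it.
header = build_traceparent(trace_id: SecureRandom.hex(16), span_id: SecureRandom.hex(8))
context = parse_traceparent(header)
```

A downstream service that receives this header can attach its spans to the same trace id, which is what makes a cross-service trace appear as one tree.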
- Batch ingestion now safely retries transient failures (429, 503, 504, network errors) since operations are idempotent via unique event IDs
- Exponential backoff prevents overwhelming rate-limited or degraded services while ensuring events eventually reach Langfuse
- Failed batch sends after retry exhaustion now log specific status codes for easier debugging in production environments
- Comprehensive test coverage validates retry behavior for both network failures and HTTP status codes
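The retry behavior described above could look roughly like the sketch below. Everything here is an assumption for illustration: `send_with_retries`, its parameters, the chosen error classes, and the delay schedule are hypothetical and not taken from `lib/langfuse/api_client.rb`. The block stands in for the HTTP call and returns a status code.

```ruby
require "timeout"

RETRYABLE_STATUSES = [429, 503, 504].freeze
# Assumed stand-ins for "network errors"; the real client may rescue others.
NETWORK_ERRORS = [IOError, SystemCallError, Timeout::Error].freeze

# Hypothetical helper: yields the attempt number, expects an HTTP status back,
# and retries transient failures with exponentially growing delays.
# Returns the final status, or re-raises the network error once
# the attempts are exhausted.
def send_with_retries(max_attempts: 3, base_delay: 0.5, sleeper: Kernel.method(:sleep))
  (1..max_attempts).each do |attempt|
    last_attempt = attempt == max_attempts
    begin
      status = yield(attempt)
      # Stop on success, on a non-retryable status, or when out of attempts.
      return status unless RETRYABLE_STATUSES.include?(status) && !last_attempt
    rescue *NETWORK_ERRORS
      raise if last_attempt
    end
    # Delays grow as base_delay * 1, 2, 4, ...
    sleeper.call(base_delay * (2**(attempt - 1)))
  end
end
```

Retrying is only safe here because, as the commit notes, each event carries a unique ID, so a batch that is delivered twice does not create duplicates. The injectable `sleeper` is a design choice for the sketch so the schedule can be exercised without real waiting.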
- Split 750+ line README.md into 7 specialized guides (GETTING_STARTED, API_REFERENCE, CONFIGURATION, PROMPTS, SCORING, ERROR_HANDLING, TRACING), making the SDK easier to navigate for new and experienced users
- Consolidate project guidance by merging CLAUDE.md content into AGENTS.md, reducing maintenance overhead and creating a single source of truth
- Expand CONTRIBUTING.md with Makefile commands, test patterns, and frozen string literal requirements, ensuring consistent development practices
- Update ARCHITECTURE.md with retry logic, batch operations, and OpenTelemetry integration, reflecting recent core changes
- Add scripts/ directory to RuboCop exclusions, preventing lint errors on utility scripts

- Removes "being built from scratch" language that no longer reflects current project maturity
- Simplifies project description to focus on capabilities rather than development status
- Updates both AGENTS.md and CLAUDE.md for consistency
Pull request overview
This is a comprehensive refactor of the langfuse-ruby SDK to align it with the langfuse-js architecture, introducing distributed tracing, scoring capabilities, and a unified observation API. The PR adds ~8,000 lines and removes ~2,500 lines, replacing the legacy Tracer/Trace/Span/Generation classes with a unified BaseObservation system and 10 specialized observation types.
Key changes:
- Unified observation model with `start_observation()` API supporting 10 types (span, generation, event, embedding, agent, tool, chain, retriever, evaluator, guardrail)
- Context propagation via OpenTelemetry baggage for distributed tracing
- Async score batching with thread-safe queue and configurable flush intervals
- Comprehensive type system with validation layer
- Retry logic for batch operations with exponential backoff
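As a rough picture of what "async score batching with thread-safe queue and configurable flush intervals" can mean in Ruby, here is a minimal sketch. `ScoreBuffer` and its methods are hypothetical and written only for illustration; they are not the interface of `lib/langfuse/score_client.rb`. The block passed to the constructor stands in for the batch HTTP call.

```ruby
# Hypothetical class (not the gem's ScoreClient): buffers scores in a
# thread-safe queue and hands them to a sender block in batches.
class ScoreBuffer
  def initialize(batch_size: 10, &sender)
    @batch_size = batch_size
    @sender = sender
    @queue = Queue.new       # Ruby's core thread-safe FIFO
    @flush_lock = Mutex.new  # ensures only one flush drains at a time
  end

  # Safe to call from any thread; flushes once a full batch has accumulated.
  def push(score)
    @queue << score
    flush if @queue.size >= @batch_size
  end

  # Drains the queue in chunks of at most batch_size items.
  def flush
    @flush_lock.synchronize do
      until @queue.empty?
        batch = []
        batch << @queue.pop(true) while batch.size < @batch_size && !@queue.empty?
        @sender.call(batch)
      end
    end
  end

  # Background timer that flushes partial batches every `interval` seconds.
  # The caller is responsible for stopping the returned thread on shutdown.
  def start_timer(interval)
    Thread.new do
      loop do
        sleep(interval)
        flush
      end
    end
  end
end
```

The non-blocking `pop(true)` is safe in this sketch because the mutex guarantees a single consumer, so the queue cannot be emptied by another thread between the `empty?` check and the pop. A final `flush` on shutdown is what keeps buffered scores from being lost.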
Reviewed changes
Copilot reviewed 53 out of 55 changed files in this pull request and generated no comments.
| File | Description |
|---|---|
| lib/langfuse/types.rb | Type definitions for observations/traces with attribute classes |
| lib/langfuse/observations.rb | BaseObservation class and 10 specialized observation wrappers |
| lib/langfuse/otel_attributes.rb | Serialization layer converting domain models to OTel attributes |
| lib/langfuse/propagation.rb | Context propagation with baggage support for distributed tracing |
| lib/langfuse/span_processor.rb | Custom processor applying propagated attributes to child spans |
| lib/langfuse/score_client.rb | Thread-safe score batching with async flush timer |
| lib/langfuse/api_client.rb | Added batch endpoint with retry logic for POST requests |
| lib/langfuse/client.rb | Integrated score client with delegation methods |
| lib/langfuse/otel_setup.rb | Added SpanProcessor to tracer provider |
| spec/* | 688 test cases covering new functionality |
| langfuse.gemspec | Renamed gem to langfuse-rb, updated metadata |
- Establishes dedicated open-source communication channel
- Separates public gem support from internal developer contact
- Improves community contributor routing and response workflow
Summary
Comprehensive refactor to align langfuse-ruby with the langfuse-js SDK architecture, adding distributed tracing propagation and scoring capabilities.
Files to Review
Important
1. Foundation Layer:
   - `lib/langfuse/types.rb`: Type definitions for observations/traces (foundation)
   - `lib/langfuse/otel_attributes.rb`: Serializes Ruby objects into OTel attributes (uses types)
2. Core Layer:
   - `lib/langfuse/observations.rb`: BaseObservation class with shared logic for all Observation types
3. Distributed Tracing (Context Propagation):
   - `lib/langfuse/propagation.rb`: Distributed tracing context propagation (uses otel_attributes)
   - `lib/langfuse/span_processor.rb`: Automatically applies propagated attributes to child spans (uses propagation)
4. Scoring Integration:
   - `lib/langfuse/score_client.rb`: Added OTel integration for score batching (`score_active_observation` and `score_active_trace` methods extract IDs from active spans)
5. Integration & API Files:
   - `lib/langfuse.rb`: High-level `Langfuse.observe()` interface and convenience methods (uses all above)
   - `lib/langfuse/otel_setup.rb`: Zero-boilerplate OpenTelemetry integration (sets up BatchSpanProcessor and SpanProcessor)
   - `lib/langfuse/api_client.rb`: Added retry logic for batch operations

Motivation
The original implementation diverged from langfuse-js patterns, creating API inconsistencies across SDKs. Users needed:
Changes
Core Architecture
- `+8,000/-2,500` lines
- `BaseObservation` class with 10 specialized types (span, generation, event, agent, tool, chain, retriever, evaluator, guardrail, embedding) matching langfuse-js
- `Langfuse.observe()` API wrapping OTel spans with Langfuse semantics
- Type definitions (`lib/langfuse/types.rb`) with 9 attribute classes covering all observation types

Distributed Tracing
- `lib/langfuse/propagation.rb`, `+471` lines
- `Langfuse.propagate_attributes()` for user_id, session_id, metadata inheritance
- `inject_context()` and `extract_context()` for HTTP header propagation

Scoring System
- `lib/langfuse/score_client.rb`, `+321` lines
- `Langfuse.shutdown`

API Enhancements

Documentation
- `+2,400` lines across 7 new guides
- `docs/API_REFERENCE.md`: Complete method signatures for all observation types
- `docs/CONFIGURATION.md`: All config options with Rails/Rack examples
- `docs/ERROR_HANDLING.md`: Error types, retry patterns, logging strategies
- `docs/SCORING.md`: Scoring API with async patterns and best practices
- `docs/GETTING_STARTED.md`: Installation through first trace
- `docs/PROMPTS.md`: Prompt management migration from old API
- `AGENTS.md`: AI agent development instructions extracted from `CLAUDE.md`

Breaking Changes
- `Tracer`, `Trace`, `Span`, `Generation` (replaced by unified `BaseObservation`)
- `Langfuse.observe()` replaces `Langfuse.trace()`, `trace.span()`, `span.generation()`
- `base_url` default changed to `https://us.cloud.langfuse.com` (US region)

Testing
Note
Project name updated to `langfuse-rb` since `langfuse` and `langfuse-ruby` are not available on RubyGems.

Langfuse UI Validation
Complex Workflow
Distributed Trace
All Observations
Scores
Makefile
Makefile Overview