@kxzk kxzk commented Nov 23, 2025

Summary

Comprehensive refactor to align langfuse-ruby with the langfuse-js SDK architecture, adding distributed tracing propagation and scoring capabilities.

Files to Review

Important

1. Foundation Layer:

  • lib/langfuse/types.rb — Type definitions for observations/traces (foundation)
  • lib/langfuse/otel_attributes.rb — Serializes Ruby objects → OTel attributes (uses types)

2. Core Layer:

  • lib/langfuse/observations.rb — BaseObservation class with shared logic for all Observation types

3. Distributed Tracing (Context Propagation):

  • lib/langfuse/propagation.rb — Distributed tracing context propagation (uses otel_attributes)
  • lib/langfuse/span_processor.rb — Automatically applies propagated attributes to child spans (uses propagation)

4. Scoring Integration:

  • lib/langfuse/score_client.rb — Added OTel integration for score batching (score_active_observation, score_active_trace methods extract IDs from active spans)

5. Integration & API Files:

  • lib/langfuse.rb — High-level Langfuse.observe() interface and convenience methods (uses all above)
  • lib/langfuse/otel_setup.rb — Zero-boilerplate OpenTelemetry integration (sets up BatchSpanProcessor and SpanProcessor)
  • lib/langfuse/api_client.rb — Added retry logic for batch operations

Motivation

The original implementation diverged from langfuse-js patterns, creating API inconsistencies across SDKs. Users needed:

  • Distributed tracing: Context propagation across services/processes
  • Scoring infrastructure: Post-execution evaluation of traces and observations
  • Type safety: Comprehensive validation and type definitions
  • Zero-boilerplate tracing: Automatic instrumentation without manual span management

Changes

Core Architecture (+8,000/-2,500 lines)

  • Unified observation model: Single BaseObservation class with 10 specialized types (span, generation, event, agent, tool, chain, retriever, evaluator, guardrail, embedding) matching langfuse-js
  • OpenTelemetry integration: Langfuse.observe() API wrapping OTel spans with Langfuse semantics
  • Type system: Comprehensive validation layer (lib/langfuse/types.rb) with 9 attribute classes covering all observation types
  • Shared logic extraction: Base class eliminates 400+ lines of duplication across observation types
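
As a rough illustration of the unified model described above, the following toy sketch shows one base class holding shared behavior while the ten type names from the list are generated as thin subclasses. It is not the SDK's code: `ToyObservation`, `OBSERVATION_CLASSES`, and `build_observation` are hypothetical names chosen for this example, and only the ten type names come from the PR description.

```ruby
# Toy model only: every class and method name here is hypothetical.
OBSERVATION_TYPES = %w[
  span generation event agent tool chain retriever evaluator guardrail embedding
].freeze

# Shared behavior lives in one base class instead of being repeated per type.
class ToyObservation
  attr_reader :name, :attributes

  def initialize(name, attributes = {})
    @name = name
    @attributes = attributes
  end

  # Each generated subclass carries its own TYPE constant.
  def type
    self.class::TYPE
  end

  # Merges new attributes and returns self so calls can be chained.
  def update(extra)
    @attributes = @attributes.merge(extra)
    self
  end
end

# One thin subclass per observation type, keyed by type name.
OBSERVATION_CLASSES = OBSERVATION_TYPES.to_h do |type|
  klass = Class.new(ToyObservation)
  klass.const_set(:TYPE, type)
  [type, klass]
end

def build_observation(name, as_type: "span", **attributes)
  klass = OBSERVATION_CLASSES.fetch(as_type) do
    raise ArgumentError, "unknown observation type: #{as_type}"
  end
  klass.new(name, attributes)
end

generation = build_observation("llm-call", as_type: "generation", model: "example-model")
generation.update(output: "hello")
puts "#{generation.type}: #{generation.attributes.keys.inspect}"
```

Adding an eleventh type in this sketch means adding one string to `OBSERVATION_TYPES`, which is the kind of saving the shared base class is meant to provide.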

Distributed Tracing (lib/langfuse/propagation.rb, +471 lines)

  • Context propagation: Langfuse.propagate_attributes() for user_id, session_id, metadata inheritance
  • Cross-service support: OpenTelemetry baggage integration for HTTP/gRPC trace continuation
  • Automatic inheritance: Child spans receive parent trace-level attributes without manual passing
  • Carrier utilities: inject_context() / extract_context() for HTTP header propagation
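
To make the carrier idea concrete, here is a minimal sketch of writing trace context into a plain Hash of HTTP headers and reading it back, using the W3C `traceparent` and `baggage` header formats. The helpers `inject_into` and `extract_from` are hypothetical stand-ins written for this example; they are not the SDK's `inject_context` / `extract_context` and their signatures are assumptions.

```ruby
require "erb"
require "securerandom"
require "uri"

# W3C traceparent: version "00", 32 hex trace id, 16 hex span id, 2 hex flags.
TRACEPARENT = /\A00-(\h{32})-(\h{16})-(\h{2})\z/

# Hypothetical helper: writes trace context into a Hash carrier.
def inject_into(carrier, trace_id:, span_id:, attributes: {})
  carrier["traceparent"] = "00-#{trace_id}-#{span_id}-01" # 01 = sampled flag
  unless attributes.empty?
    # Baggage is a comma-separated list of key=value pairs with percent-encoded values.
    carrier["baggage"] = attributes.map do |key, value|
      "#{key}=#{ERB::Util.url_encode(value.to_s)}"
    end.join(",")
  end
  carrier
end

# Hypothetical helper: returns nil when the carrier holds no valid traceparent.
def extract_from(carrier)
  match = TRACEPARENT.match(carrier["traceparent"].to_s)
  return nil unless match

  attributes = carrier["baggage"].to_s.split(",").to_h do |pair|
    key, value = pair.split("=", 2)
    [key.strip, URI.decode_www_form_component(value.to_s)]
  end
  { trace_id: match[1], span_id: match[2], attributes: attributes }
end

headers = inject_into(
  {},
  trace_id: SecureRandom.hex(16), # 32 hex characters
  span_id: SecureRandom.hex(8),   # 16 hex characters
  attributes: { user_id: "user 42", session_id: "s-1" }
)
restored = extract_from(headers)
puts headers["baggage"]          # user_id=user%2042,session_id=s-1
puts restored[:attributes].inspect
```

The receiving service would call the extract side on incoming request headers, so that spans it creates join the caller's trace and inherit `user_id` and `session_id` without those values being passed by hand.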

Scoring System (lib/langfuse/score_client.rb, +321 lines)

  • Async batching: Thread-safe queue with configurable batch size (default: 100) and flush interval (15s)
  • Three data types: Numeric (0-1 float), boolean, categorical scores
  • Flexible targeting: Score traces OR specific observations (spans/generations)
  • Retry logic: 3 attempts with exponential backoff (2s → 4s → 8s)
  • Graceful shutdown: Flush pending scores on Langfuse.shutdown
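
The retry schedule above can be sketched as follows. This is an illustration of exponential backoff under stated assumptions, not the SDK's implementation: `with_retries`, `TransientError`, and the injectable `sleeper` are hypothetical, the retryable status codes (429, 503, 504) are taken from the commit notes in this PR, and "3 attempts" is read here as three retries after the first try, which is what produces the three quoted delays.

```ruby
# Hypothetical names throughout; a sketch of the retry policy only.
RETRYABLE_STATUSES = [429, 503, 504].freeze

class TransientError < StandardError; end

# Yields to the block, which must return an HTTP status code.
# `sleeper` is injectable so the schedule can be observed without real waiting.
def with_retries(max_retries: 3, base_delay: 2.0, sleeper: ->(seconds) { sleep(seconds) })
  attempt = 0
  begin
    status = yield
    raise TransientError, "HTTP #{status}" if RETRYABLE_STATUSES.include?(status)

    status
  rescue TransientError
    attempt += 1
    raise if attempt > max_retries # give up once the retries are exhausted

    sleeper.call(base_delay * (2**(attempt - 1))) # 2s, then 4s, then 8s
    retry
  end
end

# Usage: the first two calls fail with 503, the third succeeds.
recorded_delays = []
calls = 0
final_status = with_retries(sleeper: ->(seconds) { recorded_delays << seconds }) do
  calls += 1
  calls < 3 ? 503 : 200
end
puts "status=#{final_status} delays=#{recorded_delays.inspect}" # status=200 delays=[2.0, 4.0]
```

Doubling the delay on each retry keeps pressure off a rate-limited or degraded endpoint, and the cap on retries bounds how long a single batch can block the flush path.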

API Enhancements

  • Retry mechanism: Batch ingestion retries with backoff on transient failures
  • Attribute serialization: Recursive hash flattening for nested metadata (depth limit: 10)
  • Error taxonomy: Categorized exceptions (validation, network, auth, rate-limit, server)
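
The attribute serialization bullet can be illustrated with a short sketch of recursive flattening into dot-notation keys. This is not the SDK's `flatten_metadata`: the `metadata` key prefix and the choice to JSON-encode anything nested beyond the depth limit are assumptions made for this example; only the depth limit of 10 comes from the description above.

```ruby
require "json"

MAX_DEPTH = 10 # depth limit quoted in the PR description

# Hypothetical helper: turns nested hashes into flat "a.b.c" keys.
def flatten_metadata(value, prefix = "metadata", depth = 0, out = {})
  if value.is_a?(Hash) && depth < MAX_DEPTH
    value.each do |key, nested|
      flatten_metadata(nested, "#{prefix}.#{key}", depth + 1, out)
    end
  else
    # Leaves are stored as-is; hashes past the depth limit are JSON-encoded
    # (an assumption of this sketch) so no data is silently dropped.
    out[prefix] = value.is_a?(Hash) ? JSON.generate(value) : value
  end
  out
end

flat = flatten_metadata({ user: { plan: "pro", limits: { rpm: 60 } }, env: "prod" })
puts flat.inspect
# {"metadata.user.plan"=>"pro", "metadata.user.limits.rpm"=>60, "metadata.env"=>"prod"}
```

Flat keys of this shape let an OpenTelemetry backend filter on a single nested field, which a JSON string attribute would not allow.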

Documentation (+2,400 lines across 7 new guides)

  • docs/API_REFERENCE.md: Complete method signatures for all observation types
  • docs/CONFIGURATION.md: All config options with Rails/Rack examples
  • docs/ERROR_HANDLING.md: Error types, retry patterns, logging strategies
  • docs/SCORING.md: Scoring API with async patterns and best practices
  • docs/GETTING_STARTED.md: Installation through first trace
  • docs/PROMPTS.md: Prompt management migration from old API
  • AGENTS.md: AI agent development instructions extracted from CLAUDE.md

Breaking Changes

  • Deleted classes: Tracer, Trace, Span, Generation (replaced by unified BaseObservation)
  • API surface: Langfuse.observe() replaces Langfuse.trace(), trace.span(), and span.generation()
  • Configuration: base_url default changed to https://us.cloud.langfuse.com (US region)

Testing

  • 688 test cases: 100% coverage of new propagation and scoring modules
  • 97.71% line coverage (1066/1091 lines covered)

Note

Project name updated to langfuse-rb because the names langfuse and langfuse-ruby are not available on RubyGems.


Langfuse UI Validation

  • Complex Workflow (screenshot, 2025-11-22 at 08:48:43)
  • Distributed Trace (screenshot, 2025-11-22 at 11:31:23)
  • All Observations (screenshot, 2025-11-22 at 11:54:52)
  • Scores (screenshot, 2025-11-22 at 20:06:30)

Makefile

  • Makefile Overview (screenshot, 2025-11-23 at 07:32:37)

kxzk added 11 commits November 7, 2025 07:00
- Nested hashes were being serialized as JSON strings, losing structure
- Now preserves hierarchy through dot-notation keys for better queries
- Enables OpenTelemetry backends to filter on individual nested fields
- Improves code maintainability by separating concerns
- Makes recursive flattening logic more testable in isolation
- Reduces complexity in flatten_metadata method through extraction
- Separates agent-specific instructions from general project docs
- Prevents AI context pollution with implementation history
- Adds .env to gitignore for credential protection
- Streamlines onboarding by removing outdated architecture notes
- Eliminates 200+ lines of duplication across span, generation, and
  trace classes by centralizing common functionality
- Enables consistent behavior for all observation types (spans,
  generations, events, tools, chains, agents)
- Provides unified API for hierarchical tracing with block-based and
  stateful observation creation patterns
- Simplifies future observation type additions by inheriting from
  battle-tested base implementation
- Adds comprehensive test coverage (472 lines) for shared observation
  behaviors and edge cases
- Introduces development tooling (Makefile) for streamlined test/lint
  workflows
- Align documentation with Ruby conventions for hash keys
- Correct example from `totalCost` to `total_cost` for consistency
- Replaces imperative trace/span/generation API with zero-boilerplate `Langfuse.observe` that wraps blocks automatically
- Enables distributed tracing via OpenTelemetry propagation headers (B3 and W3C formats) for cross-service observability
- Consolidates observation types (generations, spans, chains, agents, tools, embeddings, retrievers, evaluators, guardrails, events) into unified interface reducing SDK surface area
- Adds ScoreClient for asynchronous score submission with type validation
- Implements custom SpanProcessor for direct Langfuse exporting without OTLP middleware reducing latency and complexity
- Migrates from manual state management to OpenTelemetry's battle-tested context propagation eliminating race conditions in concurrent environments
- Removes 4 obsolete classes (Trace, Tracer, BaseObservation, Generation, Span) reducing maintenance burden by ~850 LOC
- Provides automatic prompt-trace linking when using get_prompt within observe blocks improving prompt version tracking
- Batch ingestion now safely retries transient failures (429, 503,
  504, network errors) since operations are idempotent via unique
  event IDs
- Exponential backoff prevents overwhelming rate-limited or degraded
  services while ensuring events eventually reach Langfuse
- Failed batch sends after retry exhaustion now log specific status
  codes for easier debugging in production environments
- Comprehensive test coverage validates retry behavior for both
  network failures and HTTP status codes
- Split 750+ line README.md into 7 specialized guides (GETTING_STARTED,
  API_REFERENCE, CONFIGURATION, PROMPTS, SCORING, ERROR_HANDLING, TRACING)
  making SDK easier to navigate for new and experienced users
- Consolidate project guidance by merging CLAUDE.md content into AGENTS.md
  reducing maintenance overhead and creating single source of truth
- Expand CONTRIBUTING.md with Makefile commands, test patterns, and frozen
  string literal requirements ensuring consistent development practices
- Update ARCHITECTURE.md with retry logic, batch operations, and OpenTelemetry
  integration reflecting recent core changes
- Add scripts/ directory to RuboCop exclusions preventing lint errors on
  utility scripts
- Removes "being built from scratch" language that no longer reflects
  current project maturity
- Simplifies project description to focus on capabilities rather than
  development status
- Updates both AGENTS.md and CLAUDE.md for consistency
@kxzk kxzk requested review from NoahFisher and Copilot November 23, 2025 14:28
@kxzk added the documentation and enhancement labels Nov 23, 2025
Copilot finished reviewing on behalf of kxzk November 23, 2025 14:29
Copilot AI left a comment

Pull request overview

This is a comprehensive refactor of the langfuse-ruby SDK to align it with the langfuse-js architecture, introducing distributed tracing, scoring capabilities, and a unified observation API. The PR adds ~8,000 lines and removes ~2,500 lines, replacing the legacy Tracer/Trace/Span/Generation classes with a unified BaseObservation system and 10 specialized observation types.

Key changes:

  • Unified observation model with start_observation() API supporting 10 types (span, generation, event, embedding, agent, tool, chain, retriever, evaluator, guardrail)
  • Context propagation via OpenTelemetry baggage for distributed tracing
  • Async score batching with thread-safe queue and configurable flush intervals
  • Comprehensive type system with validation layer
  • Retry logic for batch operations with exponential backoff
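
As a sketch of the async score batching listed above, the following toy class buffers scores in a thread-safe queue and flushes them either when the batch fills or when a background timer fires. `ScoreBatcher` and its method names are hypothetical and chosen for this example; the defaults of 100 items and 15 seconds are the ones quoted in the PR description, and the sender block stands in for the real batch API call.

```ruby
# Hypothetical ScoreBatcher: names and structure are assumptions for illustration.
class ScoreBatcher
  def initialize(batch_size: 100, flush_interval: 15, &sender)
    @batch_size = batch_size
    @sender = sender
    @queue = Queue.new      # Ruby's core Queue is safe for concurrent producers
    @flush_lock = Mutex.new # ensures only one flush drains the queue at a time
    @timer = Thread.new do
      loop do
        sleep(flush_interval)
        flush
      end
    end
  end

  # Called from any thread; triggers a flush once the batch is full.
  def enqueue(score)
    @queue << score
    flush if @queue.size >= @batch_size
  end

  # Drains whatever is queued and hands it to the sender as one batch.
  def flush
    @flush_lock.synchronize do
      batch = []
      batch << @queue.pop(true) until @queue.empty?
      @sender.call(batch) unless batch.empty?
    end
  end

  # Stops the timer while holding the lock (so it is never killed mid-flush),
  # then sends anything still pending.
  def shutdown
    @flush_lock.synchronize { @timer.kill }
    flush
  end
end

sent_batches = []
batcher = ScoreBatcher.new(batch_size: 2) { |batch| sent_batches << batch }
batcher.enqueue({ name: "accuracy", value: 0.9 })
batcher.enqueue({ name: "helpful", value: true })
batcher.enqueue({ name: "tone", value: "neutral" })
batcher.shutdown
puts sent_batches.map(&:size).inspect # [2, 1]
```

The three example scores mirror the numeric, boolean, and categorical data types named in the PR description. The final flush in `shutdown` corresponds to the graceful shutdown behavior described there.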

Reviewed changes

Copilot reviewed 53 out of 55 changed files in this pull request and generated no comments.

Summary per file (file: description):

  • lib/langfuse/types.rb: Type definitions for observations/traces with attribute classes
  • lib/langfuse/observations.rb: BaseObservation class and 10 specialized observation wrappers
  • lib/langfuse/otel_attributes.rb: Serialization layer converting domain models to OTel attributes
  • lib/langfuse/propagation.rb: Context propagation with baggage support for distributed tracing
  • lib/langfuse/span_processor.rb: Custom processor applying propagated attributes to child spans
  • lib/langfuse/score_client.rb: Thread-safe score batching with async flush timer
  • lib/langfuse/api_client.rb: Added batch endpoint with retry logic for POST requests
  • lib/langfuse/client.rb: Integrated score client with delegation methods
  • lib/langfuse/otel_setup.rb: Added SpanProcessor to tracer provider
  • spec/*: 688 test cases covering new functionality
  • langfuse.gemspec: Renamed gem to langfuse-rb, updated metadata

- Establishes dedicated open-source communication channel
- Separates public gem support from internal developer contact
- Improves community contributor routing and response workflow