feat: Integrate evaluation frameworks (RAGAS, DeepEval, Promptfoo) #4

@haasonsaas

🚀 Feature Request

Description

Integrate popular LLM evaluation frameworks to automatically capture evaluation metrics in OpenTelemetry format, enabling unified observability across different evaluation tools.

Motivation

Teams use different evaluation frameworks for different purposes: RAGAS for RAG, DeepEval for self-explaining metrics, and Promptfoo for security testing. eval2otel should integrate with these tools to provide a unified observability layer.

Proposed Implementation

  1. RAGAS Integration

    import { fromRAGAS } from 'eval2otel/integrations';
    
    const ragasResult = await ragas.evaluate(dataset);
    const evalResult = fromRAGAS(ragasResult);
    eval2otel.processEvaluation(evalResult);

  2. DeepEval Integration

    import { DeepEvalAdapter } from 'eval2otel/integrations';
    
    const adapter = new DeepEvalAdapter(eval2otel);
    // Automatically captures DeepEval test results
    adapter.instrument();

  3. Promptfoo Integration

    import { PromptfooExporter } from 'eval2otel/integrations';
    
    // Export promptfoo results to OpenTelemetry
    const exporter = new PromptfooExporter(eval2otel);
    await exporter.exportResults('./promptfoo-output.json');

  4. Unified Metric Mapping

    • Map framework-specific metrics to OpenTelemetry conventions
    • Preserve original metric names as attributes
    • Support custom metric transformations
  5. Auto-Discovery

    • Detect installed evaluation frameworks
    • Automatically instrument when available
    • Configurable via environment variables
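The environment-variable part of auto-discovery could look roughly like the sketch below. It is only an illustration: the variable name `EVAL2OTEL_INTEGRATIONS`, the helper `enabledIntegrations`, and the "unset means all frameworks" default are assumptions, not something this issue specifies. Detecting which frameworks are actually installed is not covered here.

```typescript
// Hypothetical variable name; the issue does not define one.
const ENV_VAR = "EVAL2OTEL_INTEGRATIONS";

// Frameworks named in this proposal.
const KNOWN = ["ragas", "deepeval", "promptfoo"] as const;
type Framework = (typeof KNOWN)[number];

// Decide which integrations to turn on from an environment map.
// Assumed semantics:
//   - variable unset      -> every known framework
//   - comma-separated list -> only the listed frameworks (unknown names ignored)
//   - empty string         -> none
function enabledIntegrations(
  env: Record<string, string | undefined>,
): Framework[] {
  const raw = env[ENV_VAR];
  if (raw === undefined) {
    return [...KNOWN];
  }
  const requested = raw.split(",").map((s) => s.trim().toLowerCase());
  // Filtering KNOWN keeps a stable order and drops unrecognized names.
  return KNOWN.filter((f) => requested.includes(f));
}
```

In a Node.js process this would be called as `enabledIntegrations(process.env)`. Taking the environment as a parameter keeps the function easy to test.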

Metric Mappings

| Framework | Metric               | OpenTelemetry Metric                |
|-----------|----------------------|-------------------------------------|
| RAGAS     | faithfulness         | eval.ragas.faithfulness             |
| RAGAS     | context_precision    | eval.ragas.context_precision        |
| DeepEval  | G-Eval               | eval.deepeval.g_eval                |
| DeepEval  | self_explanation     | eval.deepeval.explanation           |
| Promptfoo | security_score       | eval.promptfoo.security_score       |
| Promptfoo | injection_resistance | eval.promptfoo.injection_resistance |
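As a rough sketch of how the mappings listed above might be produced, the helper below builds names of the form `eval.<framework>.<metric>` and keeps the original metric name as an attribute. The function `mapMetric`, the `MappedMetric` shape, and the attribute keys `eval.framework` and `eval.original_metric` are hypothetical names chosen for illustration; they are not an existing eval2otel API.

```typescript
// Hypothetical types; not part of any published eval2otel API.
type Framework = "ragas" | "deepeval" | "promptfoo";

interface MappedMetric {
  metricName: string; // OpenTelemetry-style metric name
  value: number;
  attributes: Record<string, string>; // preserves the framework's own naming
}

// Names that do not follow the default pattern, taken from the mapping list:
// DeepEval's "self_explanation" is reported as "explanation".
const OVERRIDES: Record<Framework, Record<string, string | undefined>> = {
  ragas: {},
  deepeval: { self_explanation: "explanation" },
  promptfoo: {},
};

// Lower-case the name and collapse non-alphanumeric runs to "_",
// so "G-Eval" becomes "g_eval".
function normalize(metric: string): string {
  return metric
    .toLowerCase()
    .replace(/[^a-z0-9]+/g, "_")
    .replace(/^_+|_+$/g, "");
}

function mapMetric(
  framework: Framework,
  metric: string,
  value: number,
): MappedMetric {
  const suffix = OVERRIDES[framework][metric] ?? normalize(metric);
  return {
    metricName: `eval.${framework}.${suffix}`,
    value,
    attributes: {
      "eval.framework": framework,
      "eval.original_metric": metric, // original name kept as an attribute
    },
  };
}
```

Keeping the exceptions in a small override table means the default rule covers most metrics, and custom transformations could later be added through the same table.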

Configuration

const eval2otel = createEval2Otel({
  serviceName: 'my-service',
  integrations: {
    ragas: { enabled: true, captureDataset: false },
    deepeval: { enabled: true, includeExplanations: true },
    promptfoo: { enabled: true, securityOnly: false }
  }
});
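One way the `integrations` option could be typed and given defaults is sketched below. The field names come from the example above, but the interface name, the `resolveIntegrations` helper, and the choice to default every flag to `false` (opt-in) are assumptions made for illustration.

```typescript
// Shapes inferred from the configuration example; names are hypothetical.
interface IntegrationOptions {
  ragas: { enabled: boolean; captureDataset: boolean };
  deepeval: { enabled: boolean; includeExplanations: boolean };
  promptfoo: { enabled: boolean; securityOnly: boolean };
}

// Callers may omit any framework or any individual flag.
type PartialIntegrations = {
  [K in keyof IntegrationOptions]?: Partial<IntegrationOptions[K]>;
};

// Assumed defaults: every integration is opt-in.
const DEFAULTS: IntegrationOptions = {
  ragas: { enabled: false, captureDataset: false },
  deepeval: { enabled: false, includeExplanations: false },
  promptfoo: { enabled: false, securityOnly: false },
};

// Merge user-supplied options over the defaults, one framework at a time,
// so a partially specified framework keeps its remaining default flags.
function resolveIntegrations(
  user: PartialIntegrations = {},
): IntegrationOptions {
  return {
    ragas: { ...DEFAULTS.ragas, ...user.ragas },
    deepeval: { ...DEFAULTS.deepeval, ...user.deepeval },
    promptfoo: { ...DEFAULTS.promptfoo, ...user.promptfoo },
  };
}
```

For example, `resolveIntegrations({ ragas: { enabled: true } })` would enable RAGAS while leaving `captureDataset` and the other frameworks at their defaults.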

Benefits

  • Unified observability across evaluation tools
  • No need to modify existing evaluation code
  • Automatic metric standardization
  • Correlation between different evaluation types

Priority

Medium-High: reduces integration friction and enables comprehensive evaluation tracking.
