🚀 Feature Request
Description
Integrate with popular LLM evaluation frameworks so that their evaluation metrics are captured automatically in OpenTelemetry format, enabling unified observability across different evaluation tools.
Motivation
Teams use different evaluation frameworks for different purposes: RAGAS for RAG pipelines, DeepEval for self-explaining metrics, and Promptfoo for security testing. eval2otel should integrate with these tools to provide a single observability layer over all of them.
Proposed Implementation
- RAGAS Integration
import { fromRAGAS } from 'eval2otel/integrations';
const ragasResult = await ragas.evaluate(dataset);
const evalResult = fromRAGAS(ragasResult);
eval2otel.processEvaluation(evalResult);
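
One way `fromRAGAS` might work is as a pure conversion from a score dictionary to a framework-neutral result. The sketch below assumes the RAGAS output has already been serialized to a plain object of metric names and numeric scores. The `RagasScores` and `EvalResult` shapes and the attribute keys are hypothetical, not part of the eval2otel API.

```typescript
// Hypothetical shapes for illustration; neither is taken from eval2otel or RAGAS.
type RagasScores = Record<string, number>;

interface EvalResult {
  framework: string;
  metrics: { name: string; value: number; attributes: Record<string, string> }[];
}

// Convert a score dictionary into a framework-neutral result.
function fromRAGAS(scores: RagasScores): EvalResult {
  const metrics = Object.entries(scores)
    // Drop scores that are not finite numbers (assumed to mark failed evaluations).
    .filter(([, value]) => Number.isFinite(value))
    .map(([name, value]) => ({
      name: `eval.ragas.${name}`,
      value,
      // Keep the original metric name alongside the namespaced one.
      attributes: { "eval.framework": "ragas", "eval.original_name": name },
    }));
  return { framework: "ragas", metrics };
}

const result = fromRAGAS({
  faithfulness: 0.91,
  context_precision: 0.78,
  answer_relevancy: NaN,
});
console.log(result.metrics.map((m) => m.name));
```

Keeping the conversion free of side effects would let the same function be reused whether results arrive from a file, an HTTP call, or a subprocess.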
- DeepEval Integration
import { DeepEvalAdapter } from 'eval2otel/integrations';
const adapter = new DeepEvalAdapter(eval2otel);
// Automatically captures DeepEval test results
adapter.instrument();
- Promptfoo Integration
import { PromptfooExporter } from 'eval2otel/integrations';
// Export promptfoo results to OpenTelemetry
const exporter = new PromptfooExporter(eval2otel);
await exporter.exportResults('./promptfoo-output.json');
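
A possible core for `exportResults` is a function that turns the parsed output file into one record per test case. The sketch below assumes a simplified output shape (a `results` array of `{ description, success, score }`); the real Promptfoo output file may be structured differently. `toRecords`, `EvalRecord`, and the attribute keys are hypothetical, and using `eval.promptfoo.security_score` for every test is an assumption made for illustration.

```typescript
// Assumed, simplified shape of a results file; not the documented Promptfoo schema.
interface AssumedPromptfooOutput {
  results: { description: string; success: boolean; score: number }[];
}

// Hypothetical record type handed to the telemetry layer.
interface EvalRecord {
  metric: string;
  value: number;
  attributes: Record<string, string | boolean>;
}

// Hypothetical helper: parse the JSON text and emit one record per test case.
function toRecords(jsonText: string): EvalRecord[] {
  const parsed = JSON.parse(jsonText) as AssumedPromptfooOutput;
  return parsed.results.map((r) => ({
    metric: "eval.promptfoo.security_score",
    value: r.score,
    attributes: {
      "eval.framework": "promptfoo",
      "eval.test": r.description,
      "eval.success": r.success,
    },
  }));
}

const sample = JSON.stringify({
  results: [
    { description: "refuses prompt injection", success: true, score: 1 },
    { description: "leaks system prompt", success: false, score: 0 },
  ],
});
const records = toRecords(sample);
console.log(records.length);
```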
- Unified Metric Mapping
  - Map framework-specific metrics to OpenTelemetry conventions
  - Preserve original metric names as attributes
  - Support custom metric transformations
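
The three mapping points above could be sketched as a single function. This is a minimal illustration, not the eval2otel implementation: `mapMetric`, `MappedMetric`, the `overrides` table, and the attribute keys are hypothetical names. The override exists because `self_explanation` maps to `explanation` in the Metric Mappings table, which a plain normalization would not produce.

```typescript
// Hypothetical types for illustration; not part of the eval2otel API.
type Transform = (value: number) => number;

interface MappedMetric {
  name: string;
  value: number;
  attributes: Record<string, string>;
}

// Explicit renames for metrics whose target name is not a plain normalization.
const overrides: Record<string, string> = {
  "deepeval:self_explanation": "explanation",
};

// Lowercase the name and replace runs of non-alphanumerics with "_".
function normalize(metric: string): string {
  return metric
    .toLowerCase()
    .replace(/[^a-z0-9]+/g, "_")
    .replace(/^_+|_+$/g, "");
}

function mapMetric(
  framework: string,
  metric: string,
  value: number,
  transforms: Record<string, Transform> = {},
): MappedMetric {
  const suffix = overrides[`${framework}:${metric}`] ?? normalize(metric);
  const name = `eval.${framework}.${suffix}`;
  const transform = transforms[name];
  return {
    name,
    value: transform ? transform(value) : value,
    // Keep the framework's own metric name so it can still be queried.
    attributes: { "eval.framework": framework, "eval.original_name": metric },
  };
}

const gEval = mapMetric("deepeval", "G-Eval", 8, {
  // Example custom transformation: rescale a score assumed to be on a 0-10 scale.
  "eval.deepeval.g_eval": (v) => v / 10,
});
console.log(gEval.name, gEval.value);
```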
- Auto-Discovery
  - Detect installed evaluation frameworks
  - Automatically instrument when available
  - Configurable via environment variables
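
Auto-discovery could be reduced to a small decision function, sketched below. The environment variable name `EVAL2OTEL_DISABLE_INTEGRATIONS` is hypothetical, since the proposal does not name one. The installation probe is passed in as a parameter because how a framework is detected (module resolution, CLI lookup, and so on) is left open here.

```typescript
type Framework = "ragas" | "deepeval" | "promptfoo";
const FRAMEWORKS: Framework[] = ["ragas", "deepeval", "promptfoo"];

// Hypothetical variable name; the proposal does not specify one.
const DISABLE_VAR = "EVAL2OTEL_DISABLE_INTEGRATIONS";

// Decide which integrations to instrument: installed and not disabled via the environment.
function discover(
  isInstalled: (framework: Framework) => boolean,
  env: Record<string, string | undefined>,
): Framework[] {
  // Parse a comma-separated list, ignoring case and surrounding whitespace.
  const disabled = new Set(
    (env[DISABLE_VAR] ?? "")
      .split(",")
      .map((s) => s.trim().toLowerCase())
      .filter((s) => s.length > 0),
  );
  return FRAMEWORKS.filter((f) => isInstalled(f) && !disabled.has(f));
}

const active = discover(
  (f) => f !== "deepeval", // pretend DeepEval is not installed
  { EVAL2OTEL_DISABLE_INTEGRATIONS: "Promptfoo" },
);
console.log(active);
```

Passing the environment in as a plain object, instead of reading a global, keeps the function easy to test.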
Metric Mappings
| Framework | Metric | OpenTelemetry Metric |
| --- | --- | --- |
| RAGAS | faithfulness | eval.ragas.faithfulness |
| RAGAS | context_precision | eval.ragas.context_precision |
| DeepEval | G-Eval | eval.deepeval.g_eval |
| DeepEval | self_explanation | eval.deepeval.explanation |
| Promptfoo | security_score | eval.promptfoo.security_score |
| Promptfoo | injection_resistance | eval.promptfoo.injection_resistance |
Configuration
const eval2otel = createEval2Otel({
  serviceName: 'my-service',
  integrations: {
    ragas: { enabled: true, captureDataset: false },
    deepeval: { enabled: true, includeExplanations: true },
    promptfoo: { enabled: true, securityOnly: false }
  }
});
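
The `integrations` option could be typed and merged with defaults roughly as follows. The option names are taken from the example above; the `IntegrationOptions` type, the `resolveIntegrations` helper, and the default values (everything off until enabled) are assumptions.

```typescript
// Option names follow the configuration example; the types themselves are hypothetical.
interface IntegrationOptions {
  ragas: { enabled: boolean; captureDataset: boolean };
  deepeval: { enabled: boolean; includeExplanations: boolean };
  promptfoo: { enabled: boolean; securityOnly: boolean };
}

// Every integration and every field within it may be omitted by the caller.
type PartialIntegrations = {
  [K in keyof IntegrationOptions]?: Partial<IntegrationOptions[K]>;
};

// Assumed defaults: every integration is off until explicitly enabled.
const DEFAULTS: IntegrationOptions = {
  ragas: { enabled: false, captureDataset: false },
  deepeval: { enabled: false, includeExplanations: false },
  promptfoo: { enabled: false, securityOnly: false },
};

// Merge user-supplied options over the defaults, one integration at a time.
function resolveIntegrations(user: PartialIntegrations = {}): IntegrationOptions {
  return {
    ragas: { ...DEFAULTS.ragas, ...user.ragas },
    deepeval: { ...DEFAULTS.deepeval, ...user.deepeval },
    promptfoo: { ...DEFAULTS.promptfoo, ...user.promptfoo },
  };
}

const resolved = resolveIntegrations({
  ragas: { enabled: true },
  deepeval: { enabled: true, includeExplanations: true },
});
console.log(resolved);
```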
Benefits
- Unified observability across evaluation tools
- No need to modify existing evaluation code
- Automatic metric standardization
- Correlation between different evaluation types
Priority
Medium-High: reduces integration friction and enables comprehensive evaluation tracking.