Migrate factual correctness #2401
base: main
anistark left a comment
The ascore implementation runs verification sequentially.
These two steps currently run one after the other, and we can probably run them in parallel:
- Verify response claims against reference
- Verify reference claims against response
Thoughts?
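A minimal sketch of what the parallel version could look like, assuming the two verification passes do not depend on each other's results. The `verify_claims` helper below is a hypothetical stand-in that simulates an LLM call with a short sleep; it is not the PR's actual method, and `verify_both_directions` is likewise an illustrative name:

```python
import asyncio
from typing import List, Tuple


async def verify_claims(claims: List[str], premise: str) -> List[bool]:
    """Hypothetical stand-in for an LLM-backed verification call."""
    await asyncio.sleep(0.01)  # simulate network latency
    # Toy check: a claim "holds" if it appears verbatim in the premise.
    return [claim.lower() in premise.lower() for claim in claims]


async def verify_both_directions(
    response: str,
    reference: str,
    response_claims: List[str],
    reference_claims: List[str],
) -> Tuple[List[bool], List[bool]]:
    # Run both passes concurrently instead of awaiting them one after another.
    response_verdicts, reference_verdicts = await asyncio.gather(
        verify_claims(response_claims, reference),
        verify_claims(reference_claims, response),
    )
    return response_verdicts, reference_verdicts
```

`asyncio.gather` returns results in the order the awaitables were passed, so the two verdict lists can be unpacked directly. If one pass raises, the exception propagates from `gather` by default, which matches the sequential behaviour closely enough for a first pass.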
    statements: List[StatementFaithfulnessAnswer]

    def claim_decomposition_prompt(
Do we need this method here, or can we place it somewhere common, such as ragas.prompt.metrics.common?
    # Ensure implementations give reasonably similar scores
    # Factual correctness may have more variation due to claim decomposition and different LLM behavior
    assert score_diff < 0.35, (
A 35% tolerance is too high, no? Is it intended? Can it be lowered to 10-15%?
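For illustration, a tighter check might look like the sketch below. The 0.15 default and the `assert_scores_close` helper name are assumptions made for this example, not values or functions taken from the PR:

```python
def assert_scores_close(
    legacy_score: float, new_score: float, tolerance: float = 0.15
) -> None:
    """Fail if two implementations disagree by more than `tolerance` (absolute)."""
    score_diff = abs(legacy_score - new_score)
    assert score_diff < tolerance, (
        f"Scores differ by {score_diff:.2f}, above the allowed {tolerance:.2f}: "
        f"legacy={legacy_score:.2f}, new={new_score:.2f}"
    )
```

Keeping the bound as a parameter makes it easy to loosen for this metric alone if claim decomposition turns out to be noisier than the other migrated metrics.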
    return MetricResult(value=float(np.round(score, 2)))

    async def _decompose_claims(self, response: str) -> List[str]:
Callbacks are missing in this method and in _verify_claims.
Is that intended? Callbacks would help with analysis and tracing.
No description provided.