
Added pairwise evaluation for the dimension - rigor [#33] #41

Merged
HamedBabaei merged 5 commits into dev from feature/integrate-llm-as-judge-rubrics on Apr 22, 2026

Conversation

@MikeACedric
Collaborator

This PR addresses issue [#33] and includes the following changes:

Prompts – Integrated pairwise prompt templates for the deep-research dimension Rigor, covering the rubrics EpistemicCalibration, QuantitativeEvidenceAndUncertainty, and ExplicitUncertainty.

Examples – Introduced NLP- and Ecology-specific pairwise examples for each rubric.

Documentation – Added pairwise evaluation documentation for the Rigor dimension.
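To make the change concrete, here is a minimal sketch of what a pairwise prompt template for the Rigor rubrics might look like. The rubric names come from this PR; the criterion wording, the template text, and the `build_pairwise_prompt` function are assumptions for illustration, not the repository's actual implementation.

```python
# Hypothetical sketch of a pairwise prompt template for the Rigor dimension.
# Rubric names are from the PR description; criterion text is assumed.

RIGOR_RUBRICS = {
    "EpistemicCalibration": "Is the stated confidence proportional to the evidence given?",
    "QuantitativeEvidenceAndUncertainty": "Is evidence quantified and its uncertainty reported?",
    "ExplicitUncertainty": "Are remaining unknowns and limitations stated explicitly?",
}

PAIRWISE_TEMPLATE = (
    "You are judging two research answers on the Rigor rubric '{rubric}'.\n"
    "Criterion: {criterion}\n\n"
    "Answer A:\n{answer_a}\n\n"
    "Answer B:\n{answer_b}\n\n"
    "Reply with exactly 'A', 'B', or 'TIE'."
)

def build_pairwise_prompt(rubric: str, answer_a: str, answer_b: str) -> str:
    """Render a pairwise LLM-as-judge prompt for one Rigor rubric."""
    return PAIRWISE_TEMPLATE.format(
        rubric=rubric,
        criterion=RIGOR_RUBRICS[rubric],
        answer_a=answer_a,
        answer_b=answer_b,
    )
```

One template per rubric keeps each judgment narrow: the judge compares the two answers on a single criterion and emits a constrained verdict, which is easier to parse than free-form feedback.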

@MikeACedric MikeACedric requested a review from HamedBabaei April 21, 2026 14:45
@MikeACedric MikeACedric self-assigned this Apr 21, 2026
@HamedBabaei (Member) left a comment


@MikeACedric, everything seems fine, but to make sure we are not breaking any other pipeline, I would recommend writing a unit test for the injector modules.

Comment thread: yescieval/injector/example.py



2 participants