
fix context coverage metric input #33

Merged
yisz merged 2 commits into main from fix/context-coverage on Feb 17, 2024
Conversation

@yisz (Contributor) commented Feb 17, 2024

Updated the input from answer to ground_truths (the ground-truth answers).


Ellipsis 🚀 This PR description was created by Ellipsis for commit 4839e9e.

Summary:

This PR updates the calculate method in LLMBasedContextCoverage to iterate over ground_truths instead of answer, returning the maximum score among all ground truths, and updates the corresponding documentation.

Key points:

  • Updated the calculate method of the LLMBasedContextCoverage class in continuous_eval/metrics/retrieval_LLM_based_metrics.py.
  • Replaced the answer input with ground_truths and iterated over it to calculate scores.
  • The method now returns the maximum score among all ground truths (see the sketch after this list).
  • If an error occurs during calculation, the score is set to -1.0.
  • Updated the documentation in docs/src/content/docs/metrics/Retrieval/LLM-Based/llm_context_coverage.md.
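
For orientation, here is a minimal sketch of the revised control flow. The method and helper names, the score keys, and the -1.0 error sentinel come from this PR; the exact signature and the LLM call are assumptions, since neither is shown on this page.

def calculate(self, question, retrieved_context, ground_truths):
    # Hypothetical signature; only the iteration and max-aggregation
    # logic is taken from the PR summary above.
    scores = []
    for ground_truth in ground_truths:
        # Placeholder for the LLM call that produces the coverage judgment.
        content = self._llm_judge_coverage(question, retrieved_context, ground_truth)
        try:
            coverage = self.extract_attributed_from_broken_json(content)
        except Exception as e:
            print(f"{type(e).__name__} Error: {content}, skipping")
            coverage = -1.0  # error sentinel, per the PR description
        scores.append(
            {
                "LLM_based_context_coverage": coverage,
                "LLM_based_context_statements": content,
            }
        )
    # Return the result with the maximum coverage score across all ground truths.
    return max(scores, key=lambda s: s["LLM_based_context_coverage"])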

Generated with ❤️ by ellipsis.dev

@ellipsis-dev (bot) left a comment
Changes requested. Reviewed entire PR up to commit 0c03170

Reviewed 123 lines of code across 2 files in 48 seconds.

See details
  • Skipped files: 0
  • Confidence threshold: 50%
  • Drafted 0 additional comments.
  • Workflow ID: wflow_F7pmg52mIxhCRikk

Something looks wrong? You can customize Ellipsis by editing the ellipsis.yaml for this repository.

Generated with ❤️ by ellipsis.dev

In continuous_eval/metrics/retrieval_LLM_based_metrics.py:

    }
)

scores.append(
@ellipsis-dev (bot) commented:

There is a logical bug in the code. The 'scores' list is appended with the coverage score for each ground truth answer, but it is also appended with a score of -1.0 whenever an exception is raised. This means that for each ground truth answer, two scores are added to the 'scores' list, which is not the intended behavior. The score of -1.0 should only be added when an exception is raised, and not for every ground truth answer. Consider adding an 'else' clause to the 'try-except' block to append the coverage score only when no exception is raised.

Suggested change
-scores.append(
+try:
+    coverage = self.extract_attributed_from_broken_json(content)
+except Exception as e:
+    print(f"{type(e).__name__} Error: {content}, skipping")
+    scores.append(
+        {
+            "LLM_based_context_coverage": -1.0,
+            "LLM_based_context_statements": content,
+        }
+    )
+else:
+    scores.append(
+        {
+            "LLM_based_context_coverage": coverage,
+            "LLM_based_context_statements": content,
+        }
+    )
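
To make the bug concrete, here is a small standalone reproduction of the double-append pattern (the json-based extraction is a stand-in for the real helper, not the repository's code):

import json

scores = []
for content in ['{"coverage": 1.0}', "not json"]:
    try:
        coverage = json.loads(content)["coverage"]  # stand-in for extract_attributed_from_broken_json
    except Exception as e:
        print(f"{type(e).__name__} Error: {content}, skipping")
        scores.append({"LLM_based_context_coverage": -1.0})
    # Bug: without an else clause, this append also runs after the except
    # branch, so a failing input yields two entries (the second one stale).
    scores.append({"LLM_based_context_coverage": coverage})

print(len(scores))  # 3 entries for 2 inputs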

@ellipsis-dev (bot) left a comment

Looks good to me! Incremental review on commit c4110e3

Reviewed 25 lines of code across 1 file in 1 minute and 1 second.

See details
  • Skipped files: 0
  • Confidence threshold: 50%
  • Drafted 1 additional comment.
  • Workflow ID: wflow_dVavXZaarPM6EZPq
View 1 draft comment

These comments were drafted by Ellipsis, but were filtered out of the final review. They're included here so you can see our internal thought process and help you configure your ellipsis.yaml.

Drafted 1 comment under the confidence threshold

Filtered comment at continuous_eval/metrics/retrieval_LLM_based_metrics.py:137

Confidence changes required: 50%

Commentary: The PR changes seem to be in line with the description provided. The calculate method in the LLMBasedContextCoverage class now iterates over ground_truths instead of answer. The maximum score among all ground truths is returned. In case of an error during calculation, the score is set to -1.0. The changes seem to be logically correct and I don't see any clear violations of best practices, logical bugs, performance bugs, or security bugs. However, I would like to point out that the error handling could be improved. Currently, the error message is printed to the console and a score of -1.0 is returned. It would be better to log the error message instead of printing it to the console. This would allow for better tracking and debugging of errors.

Consider logging the error message instead of printing it to the console. This would allow for better tracking and debugging of errors.

import logging

logger = logging.getLogger(__name__)

try:
    coverage = self.extract_attributed_from_broken_json(content)
except Exception as e:
    logger.error(f'{type(e).__name__} Error: {content}, skipping')
    scores.append(
        {
            'LLM_based_context_coverage': -1.0,
            'LLM_based_context_statements': content,
        }
    )
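
If the logging approach is adopted, note that a handler still has to be configured once at application startup for the messages to appear; a typical standard-library setup (not part of this PR) is:

import logging

logging.basicConfig(
    level=logging.ERROR,
    format="%(asctime)s %(name)s %(levelname)s: %(message)s",
)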

Something looks wrong? You can customize Ellipsis by editing the ellipsis.yaml for this repository.

Generated with ❤️ by ellipsis.dev

@yisz merged commit 4839e9e into main Feb 17, 2024
2 checks passed
@yisz deleted the fix/context-coverage branch February 17, 2024 20:36
@ellipsis-dev (bot) left a comment

Looks good to me! Incremental review on commit 4839e9e

Reviewed 123 lines of code across 2 files in 42 seconds.

See details
  • Skipped files: 0
  • Confidence threshold: 50%
  • Drafted 1 additional comment.
  • Workflow ID: wflow_OcyZNVr1YEyhL7wn
View 1 draft comment

These comments were drafted by Ellipsis, but were filtered out of the final review. They're included here so you can see our internal thought process and help you configure your ellipsis.yaml.

Drafted 1 comment under the confidence threshold

Filtered comment at continuous_eval/metrics/retrieval_LLM_based_metrics.py:88

Confidence changes required: 0%

Commentary: The PR changes the input from 'answer' to 'ground_truths' in the 'calculate' method of the 'LLMBasedContextCoverage' class. This change makes sense because the method is supposed to calculate the context coverage score for each ground truth answer and return the maximum score. The code correctly iterates over the ground truth answers, calculates the context coverage score for each, and returns the maximum score. The error handling also seems appropriate, as it sets the score to -1.0 if there's an error during the calculation. The changes in the documentation are also correct and reflect the changes in the code.

The changes in this method are correct. It now correctly calculates the context coverage score for each ground truth answer and returns the maximum score. The error handling is also appropriate.


Something looks wrong? You can customize Ellipsis by editing the ellipsis.yaml for this repository.

Generated with ❤️ by ellipsis.dev
