
ZeroDivisionError: division by zero in AccuracyForLanguageGeneration._compute_single_pred_single_ref #122

Closed
NISH1001 opened this issue Mar 30, 2023 · 3 comments
Labels
bug Something isn't working

Comments

@NISH1001
Contributor

Describe the bug
I was running RobertaForQuestionAnswering on HuggingFace's squad-v2 train set (~86k samples).
The Accuracy metric threw a division-by-zero error in AccuracyForLanguageGeneration._compute_single_pred_single_ref.


To Reproduce

  • Use datasets squad-v2 train set.
  • Run the samples through pipeline("question-answering", ...) and score the answers with jury's accuracy metric (a rough sketch follows below).
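Rough reproduction sketch (reconstructed from memory; the checkpoint name, the column handling, and the exact Jury call are my assumptions, not a copy of my original script):

from datasets import load_dataset
from transformers import pipeline
from jury import Jury

squad = load_dataset("squad_v2", split="train")
qa = pipeline("question-answering", model="deepset/roberta-base-squad2")  # assumed checkpoint

predictions, references = [], []
for sample in squad.select(range(1000)):  # a subset is enough to hit the error
    predictions.append(qa(question=sample["question"], context=sample["context"])["answer"])
    # squad-v2 contains unanswerable questions whose answer list is empty
    references.append(sample["answers"]["text"] or [""])

scorer = Jury(metrics=["accuracy"])
print(scorer(predictions=predictions, references=references))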

Expected behavior
Run without error.

Exception Traceback (if available)

ration.py:107, in AccuracyForLanguageGeneration._compute_single_pred_single_ref(self, predictions, references, reduce_fn, **kwargs)
    105         if token in ref_counts:
    106             score += min(pred_count, ref_counts[token])  # Intersection count
--> 107     scores.append(score / max(len(pred), len(ref)))
    108 avg_score = sum(scores) / len(scores)
    109 return {"score": avg_score}

ZeroDivisionError: division by zero

Environment Information:

  • OS: Mac OS 13.2.1 (22D68)
  • jury version: 2.2.3
  • evaluate version: evaluate==0.2.2
  • datasets version: datasets==2.11.0

Thanks. I appreciate that jury exists. I could patch this by cloning the repo and doing an in-depth trace analysis, but I wanted to know if there's a better way to patch it.

@NISH1001
Contributor Author

Re: I found the issue. It happens during the AccuracyForLanguageGeneration._tokenize(...) step, which strips some texts away entirely, e.g. when both the prediction and the reference are just the literal string '$'.
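A minimal sketch of the trigger (the exact Jury call is reconstructed, so it may differ slightly from what I ran):

from jury import Jury

# "$" is stripped to an empty token list, so max(len(pred), len(ref)) == 0
# and the score division in _compute_single_pred_single_ref fails.
scorer = Jury(metrics=["accuracy"])
scorer(predictions=["$"], references=["$"])  # raises ZeroDivisionError on jury 2.2.3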

@NISH1001
Contributor Author

Re: I was able to patch it with a try/except block here:
NISH1001@6bdf680

Should I send a PR? I don't know whether it should just emit a warning, or also show the original <value> for either the pred or the ref.
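For context, the rough shape of the guard (a simplified sketch, not necessarily line-for-line what the commit above does; the warning text and the 0.0 fallback are just one option):

import warnings
from collections import Counter

def _compute_single_pred_single_ref(predictions, references):
    # predictions/references: lists of token lists, as produced by _tokenize(...)
    scores = []
    for pred, ref in zip(predictions, references):
        denominator = max(len(pred), len(ref))
        if denominator == 0:
            # Both token lists are empty (e.g. the raw texts were just "$").
            warnings.warn("Prediction and reference are empty after tokenization; scoring the pair as 0.")
            scores.append(0.0)  # one could also argue for 1.0, since both sides agree on "nothing"
            continue
        pred_counts, ref_counts = Counter(pred), Counter(ref)
        score = sum(min(count, ref_counts[token]) for token, count in pred_counts.items() if token in ref_counts)
        scores.append(score / denominator)
    return {"score": sum(scores) / len(scores)}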

@devrimcavusoglu
Member

Hi @NISH1001, thanks for the heads-up, and also thanks for your comments; it is appreciated. I'll look into the PR ASAP.

@devrimcavusoglu added the bug label on Mar 31, 2023