
Swapped labels in CommitmentBank #1346

Open
davda54 opened this issue Feb 3, 2022 · 1 comment

davda54 commented Feb 3, 2022

Describe the bug
The CB task files downloaded by jiant have swapped labels. This introduces a nasty silent bug: every metric computed on these data looks correct, but the fine-tuned model is actually predicting the wrong labels.

Each official label is replaced by another one:
- contradiction is mapped to entailment
- entailment is mapped to neutral
- neutral is mapped to contradiction
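Until the files are fixed upstream, the permutation above can be undone locally. Below is a minimal sketch, assuming every split (not just train, which is the only one checked in this report) uses the same permutation. `JIANT_TO_OFFICIAL`, `restore_official_label` and `fix_jsonl` are hypothetical helpers, not part of jiant.

```python
import json

# Inverse of the permutation reported above (ASSUMED to hold for every split):
# label as written in the jiant-downloaded CB files -> official SuperGLUE label.
JIANT_TO_OFFICIAL = {
    "entailment": "contradiction",  # official contradiction was written as entailment
    "neutral": "entailment",        # official entailment was written as neutral
    "contradiction": "neutral",     # official neutral was written as contradiction
}


def restore_official_label(example: dict) -> dict:
    """Return a copy of a CB example whose label is mapped back to the official one."""
    fixed = dict(example)
    fixed["label"] = JIANT_TO_OFFICIAL[example["label"]]
    return fixed


def fix_jsonl(src_path: str, dst_path: str) -> None:
    """Rewrite a jiant CB .jsonl file line by line with the labels restored."""
    with open(src_path) as src, open(dst_path, "w") as dst:
        for line in src:
            dst.write(json.dumps(restore_official_label(json.loads(line))) + "\n")
```

For example, `fix_jsonl("tasks/data/cb/train.jsonl", "train.fixed.jsonl")` writes a corrected copy and leaves the downloaded file untouched.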

To Reproduce
I used jiant 2.2.0, installed via `pip3 install jiant`.

```python
import json
import urllib.request
import zipfile
from collections import Counter

import jiant.scripts.download_data.runscript as downloader

# Download CB through jiant
downloader.download_data(["cb"], "tasks")
with open("tasks/data/cb/train.jsonl") as f:
    jiant_freqs = Counter(json.loads(line)["label"] for line in f)
print(jiant_freqs.most_common())
# Output: [('entailment', 119), ('neutral', 115), ('contradiction', 16)]

# Download the official CB from SuperGLUE and extract it into ./CB
urllib.request.urlretrieve("https://dl.fbaipublicfiles.com/glue/superglue/data/v2/CB.zip", "CB.zip")
with zipfile.ZipFile("CB.zip") as zf:
    zf.extractall(".")
with open("CB/train.jsonl") as f:
    official_freqs = Counter(json.loads(line)["label"] for line in f)
print(official_freqs.most_common())
# Output: [('contradiction', 119), ('entailment', 115), ('neutral', 16)]
```
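Because the three label counts in CB train are all different (119, 115 and 16), the two frequency tables above are enough to read off the permutation. A minimal sketch, assuming distinct counts; `infer_label_permutation` is a hypothetical helper. Matching counts is consistent with a pure relabeling but does not by itself prove it; comparing labels example by example would be a stronger check.

```python
from collections import Counter


def infer_label_permutation(observed: Counter, reference: Counter) -> dict:
    """Map each label in `observed` to the `reference` label with the same count.

    Raises ValueError when counts repeat within either table (the match would be
    ambiguous) or when the two tables do not contain the same set of counts.
    """
    if len(set(observed.values())) != len(observed) or len(set(reference.values())) != len(reference):
        raise ValueError("label counts are not unique; cannot infer a permutation")
    label_by_count = {count: label for label, count in reference.items()}
    if set(label_by_count) != set(observed.values()):
        raise ValueError("the two label distributions differ, not just the label names")
    return {label: label_by_count[count] for label, count in observed.items()}


# Counts reported in this issue for train.jsonl
jiant_freqs = Counter({"entailment": 119, "neutral": 115, "contradiction": 16})
official_freqs = Counter({"contradiction": 119, "entailment": 115, "neutral": 16})
print(infer_label_permutation(jiant_freqs, official_freqs))
# → {'entailment': 'contradiction', 'neutral': 'entailment', 'contradiction': 'neutral'}
```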

zphang (collaborator) commented Feb 3, 2022

Thanks! #1347
