
Rare Word F1 #3566

Merged: spencerp merged 7 commits into master from rare-word-f1 on Apr 14, 2021

Conversation

@spencerp (Contributor) commented on Mar 31, 2021:

Patch description
F1 can be gamed easily (by humans or by the model) by predicting common tokens irrespective of semantics. To help prevent this, this PR introduces "Rare Word F1", which only gives credit for matching words that are infrequent relative to some reference corpus.

This is less susceptible to the adversarial scenario of a model that predicts the same thing over and over, since it should not be possible to find a set of words that is both rare and shows up often in the labels.
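
As a rough illustration of the cutoff-based variant discussed in the review thread below, here is a minimal self-contained sketch. This is not the PR's actual code: the `rare_word_f1` name, the `freqs` argument holding reference-corpus counts, and the default `cutoff` are all illustrative assumptions.

```python
from collections import Counter

def rare_word_f1(pred: str, label: str, freqs: Counter, cutoff: int = 1000) -> float:
    """Unigram F1 restricted to words outside the `cutoff` most frequent
    words of a reference corpus (`freqs` maps word -> corpus count)."""
    common_words = {w for w, _ in freqs.most_common(cutoff)}
    pred_items = [w for w in pred.lower().split() if w not in common_words]
    label_items = [w for w in label.lower().split() if w not in common_words]
    overlap = Counter(pred_items) & Counter(label_items)  # multiset intersection
    num_same = sum(overlap.values())
    if num_same == 0:  # also covers the case where either side is empty
        return 0.0
    precision = num_same / len(pred_items)
    recall = num_same / len(label_items)
    return 2 * precision * recall / (precision + recall)
```

The review thread below was left on the following lines of the implementation, which weight matched words rather than filtering them outright: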

# Weighted true positives: overlap counts scaled by each word's rarity weight.
true_pos_score = sum(weighted_common.values())
if true_pos_score == 0:
    return 0
# Weighted precision: true positives over the total weight of all predicted words.
precision = true_pos_score / sum(weights[w] for w in pred_items)
Contributor:
oh, this is really interesting, i had originally imagined weights to just be 1 if above the threshold, 0 otherwise, i wonder if you could make that a configurable option for the metric? this is a cool way of doing it too

spencerp (Author):
Glad you like it! If you look at my plots of `_rarity_weight` in the PR description, you'll see that it rises fairly sharply after the chosen threshold. I was afraid that the weighted version might tend toward the simpler one (0 above and 1 below the threshold) in most cases, at which point the weighting would complicate the metric without adding much value. So I just opted for the simpler version for now.

But if there were a way of calculating the rarity weight which rose to 1.0 more gradually, we could try another version of this metric that doesn't have a "cutoff" and only has the weighting. That would feel much more elegant and would probably be more robust to different word distributions.

The issue I ran into while looking for such a function, though, is that the top few words of the distribution are extremely common relative to even the rest of the top 50 words. Any suitable function would have to push the majority of the range (say, 0.0 to 0.99) down to near zero while stretching the last bit (0.99 to 1.0) into a gradual slope from near zero to 1.0; otherwise fairly common words still receive a high weight.
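
To make that concrete, here is one possible shape for such a weight: a sketch, not this PR's implementation, that maps a word's unigram probability `p` in the reference corpus to a weight via its surprisal relative to a cutoff probability, so the head of the distribution collapses toward zero while the tail ramps gradually toward 1.0. The `p_cutoff` and `k` defaults are illustrative choices.

```python
import math

def rarity_weight(p: float, p_cutoff: float = 1e-4, k: float = 3.0) -> float:
    """Map a word's unigram probability in a reference corpus to a [0, 1] weight.

    Words with probability at or above `p_cutoff` get weight 0; rarer words
    ramp smoothly toward 1.0, with `k` controlling how gradual the ramp is.
    """
    if p <= 0:
        return 1.0  # unseen words are treated as maximally rare
    # Surprisal relative to the cutoff: 0 at p_cutoff, grows as p shrinks.
    surprisal = math.log(p_cutoff / p)
    if surprisal <= 0:
        return 0.0  # at least as common as the cutoff point
    # Exponential squash into [0, 1) so the tail rises gradually.
    return 1.0 - math.exp(-surprisal / k)
```

With these defaults, a word with probability 1e-2 gets weight 0.0, while one with probability 1e-6 gets about 0.78, which is roughly the behavior described above.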

spencerp marked this pull request as ready for review on Apr 9, 2021.
spencerp changed the title from "Rare word F1" to "Rare Word F1" on Apr 12, 2021.
@stephenroller (Contributor) left a comment:

Given this is a pretty narrow metric, I'd feel a bit better if the metric were moved to the teacher itself. `custom_evaluation` in particular would be a good fit.

spencerp (Author) replied:

> Given this is a pretty narrow metric, I'd feel a bit better if the metric were moved to the teacher itself. `custom_evaluation` in particular would be a good fit.

We use this in another teacher in parlai_internal (which we plan to make public soon), but I moved it to wizard for now.
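
For readers unfamiliar with the hook stephenroller mentions: ParlAI teachers can override `custom_evaluation` to record per-example metrics. A rough sketch of what that wiring might look like is below; the `RareWordTeacher` class, `self.freqs`, and `rare_word_f1` (from the sketch earlier) are illustrative assumptions, not the code in this PR or in the wizard teacher.

```python
from parlai.core.metrics import AverageMetric
from parlai.core.teachers import DialogTeacher

class RareWordTeacher(DialogTeacher):  # hypothetical teacher
    def custom_evaluation(self, teacher_action, labels, model_response):
        # Skip examples with no model text or no gold labels to score against.
        if not model_response.get('text') or not labels:
            return
        # Score against the best-matching label, as is common for F1-style metrics.
        score = max(
            rare_word_f1(model_response['text'], label, self.freqs)
            for label in labels
        )
        self.metrics.add('rare_word_f1', AverageMetric(score))
```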

@stephenroller (Contributor) left a comment:
ya this looks quite right to me

spencerp merged commit 337a688 into master on Apr 14, 2021.
spencerp deleted the rare-word-f1 branch on Apr 14, 2021 at 03:56.