I'm training the BERT GECToR model (using train.py) on an edit dataset similar to those in the GECToR paper (e.g. NUCLE 3.3 or CoNLL-2014), but the model's predictions degenerate to predicting $KEEP for every token. This minimizes the loss, since most of the labels are $KEEP, but doesn't induce any learning in the model. Usually this is solved by re-weighting the class losses to correct for the class imbalance, but that wasn't done in your implementation.
How did you originally resolve this?
We did not face such a problem in our experiments (after some number of updates, the model starts to produce other tags as well).
I think the following could be helpful (see the command sketch after this list):

1. Exclude true negatives from the data (`tn_prob=0`) during the pretraining stage.
2. Use a bigger batch size (at least 128; better 256).
3. Use more data if possible.
4. Freeze the encoder weights during the first couple of epochs (`cold_step_count` in [2, 4]).
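As a rough sketch, these suggestions map onto train.py flags roughly like this; the flag spellings and file paths below are assumptions based on the descriptions above, so check `python train.py --help` for the exact names in your checkout:

```bash
python train.py \
    --train_set train.txt \
    --dev_set dev.txt \
    --model_dir model_output \
    --tn_prob 0 \
    --batch_size 256 \
    --cold_step_count 2
```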
Additionally, you could modify the loss mask to give the $KEEP operation a lower weight. Something like this:
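A minimal sketch, assuming the label loss is computed with AllenNLP's `sequence_cross_entropy_with_logits` as in the repo's seq2labels model; `keep_index` and `keep_weight` are illustrative names, not part of the original code:

```python
import torch
from allennlp.nn.util import sequence_cross_entropy_with_logits

def down_weighted_loss(logits_labels, labels, mask, keep_index, keep_weight=0.5):
    """Compute the label loss with $KEEP positions down-weighted.

    `keep_index` is the vocabulary index of the $KEEP tag, and
    `keep_weight` (< 1.0) controls how much those positions count.
    Both names are illustrative, not from the original implementation.
    """
    # Start from the usual padding mask: 1.0 for real tokens, 0.0 for padding.
    weights = mask.float()
    # Scale down every position whose gold label is $KEEP, so the
    # dominant class contributes proportionally less to the loss.
    weights = torch.where(labels == keep_index, weights * keep_weight, weights)
    # The AllenNLP helper accepts fractional per-token weights, not just 0/1.
    return sequence_cross_entropy_with_logits(logits_labels, labels, weights)
```

With `keep_weight=0.5`, a sentence that is mostly $KEEP no longer dominates the gradient, which should make the rarer edit tags easier to learn; tune the value against your dev set.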