I'm training the BERT GECToR model (using train.py) on an edit dataset similar to those in the GECToR paper (e.g. NUCLE 3.3 or CoNLL-2014), but the model's predictions degenerate to predicting $KEEP for every token. This minimizes the loss, since most of the labels are $KEEP, but doesn't induce any learning in the model. Usually this is solved by re-weighting the class losses to correct for the class imbalance, but that wasn't done in your implementation.
How did you originally resolve this?
We did not face such a problem in our experiments (after some number of updates, the model starts to produce other tags as well).
I think the following could be helpful (see the command sketch after this list):

1. Exclude true negatives from the data (`tn_prob=0`) during the pretraining stage.
2. Use a bigger batch size (at least 128; better 256).
3. Use more data if possible.
4. Freeze the encoder weights during the first couple of epochs (`cold_step_count` in [2, 4]).
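As a rough sketch, these suggestions map onto train.py flags roughly like this; the flag spellings and file paths below are assumptions based on the descriptions above, so check `python train.py --help` for the exact names in your checkout:

```bash
python train.py \
    --train_set train.txt \
    --dev_set dev.txt \
    --model_dir model_output \
    --tn_prob 0 \
    --batch_size 256 \
    --cold_step_count 2
```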
Additionally, you could modify the loss mask to give the $KEEP operation a lower weight. Something like this:
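A minimal sketch, assuming the label loss is computed with AllenNLP's `sequence_cross_entropy_with_logits` as in the repo's seq2labels model; `keep_index` and `keep_weight` are illustrative names, not part of the original code:

```python
import torch
from allennlp.nn.util import sequence_cross_entropy_with_logits

def down_weighted_loss(logits_labels, labels, mask, keep_index, keep_weight=0.5):
    """Compute the label loss with $KEEP positions down-weighted.

    `keep_index` is the vocabulary index of the $KEEP tag, and
    `keep_weight` (< 1.0) controls how much those positions count.
    Both names are illustrative, not from the original implementation.
    """
    # Start from the usual padding mask: 1.0 for real tokens, 0.0 for padding.
    weights = mask.float()
    # Scale down every position whose gold label is $KEEP, so the
    # dominant class contributes proportionally less to the loss.
    weights = torch.where(labels == keep_index, weights * keep_weight, weights)
    # The AllenNLP helper accepts fractional per-token weights, not just 0/1.
    return sequence_cross_entropy_with_logits(logits_labels, labels, weights)
```

With `keep_weight=0.5`, a sentence that is mostly $KEEP no longer dominates the gradient, which should make the rarer edit tags easier to learn; tune the value against your dev set.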