Hi, I think the current training log is not very informative when the training curves look abnormal, for example when the loss suddenly increases. It would be very helpful if some general statistics could be added to the training log:
- `grad_norm`: the average gradient norm since the last report
- `clip`: the percentage of batches that have been grad-clipped since the last report
- `loss_scale`: the average loss scale since the last report (when `scaler` is not None)
- etc.
Following the implementation in fairseq, I think the above can be easily implemented by modifying `espnet2/train/trainer.py` as follows:
    --- a/espnet2/train/trainer.py
    +++ b/espnet2/train/trainer.py
    @@ -678,6 +678,17 @@ class Trainer:
                 scaler.update()
         else:
    +        reporter.register(
    +            {
    +                "grad_norm": grad_norm,
    +                "clip": torch.where(
    +                    grad_norm > grad_clip,
    +                    grad_norm.new_tensor(100),
    +                    grad_norm.new_tensor(0),
    +                ),
    +                "loss_scale": scaler.get_scale() if scaler else 1.0,
    +            }
    +        )
             all_steps_are_invalid = False
             with reporter.measure_time("optim_step_time"):
                 for iopt, (optimizer, scheduler) in enumerate(
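To illustrate why `clip` is registered as either 100 or 0, here is a minimal, torch-free sketch of the averaging idea. It assumes that registered values are averaged between reports; `RunningStats` and `step_stats` are hypothetical names made up for this example and are not part of espnet2 or fairseq.

```python
from collections import defaultdict


class RunningStats:
    """Hypothetical stand-in for a reporter: averages values between reports."""

    def __init__(self):
        self._sums = defaultdict(float)
        self._counts = defaultdict(int)

    def register(self, stats):
        # Accumulate one value per key for the current batch.
        for key, value in stats.items():
            self._sums[key] += float(value)
            self._counts[key] += 1

    def report(self):
        # Average everything seen since the last report, then reset.
        averages = {k: self._sums[k] / self._counts[k] for k in self._sums}
        self._sums.clear()
        self._counts.clear()
        return averages


def step_stats(grad_norm, grad_clip, loss_scale=None):
    # Mirrors the three statistics proposed above, using plain floats.
    return {
        "grad_norm": grad_norm,
        # 100 if this batch exceeded the clipping threshold, else 0,
        # so the average over batches reads directly as a percentage.
        "clip": 100.0 if grad_norm > grad_clip else 0.0,
        # Assumption: report 1.0 when no loss scaler is in use.
        "loss_scale": loss_scale if loss_scale is not None else 1.0,
    }


stats = RunningStats()
for norm in [2.0, 8.0, 3.0, 7.0]:  # made-up gradient norms
    stats.register(step_stats(norm, grad_clip=5.0))
summary = stats.report()
print(summary)
```

With these made-up norms, two of the four batches exceed the threshold of 5.0, so the averaged `clip` value is 50.0, i.e. 50% of the batches were clipped.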
With the above modifications, the training log of a speech enhancement task will look like this: