Evaluation
After every 25 completed training steps (this configuration parameter is defined in the training script), the following list of metrics is displayed in the terminal:
```
--> TIME: 2024-06-22 16:58:45 -- STEP: 1076/1233 -- GLOBAL_STEP: 35600
| > loss_disc: 2.601724863052368 (2.5608400611629265)
| > loss_disc_real_0: 0.16440898180007935 (0.18418801065575882)
| > loss_disc_real_1: 0.18971821665763855 (0.2082644184587391)
| > loss_disc_real_2: 0.22247859835624695 (0.22729541624456534)
| > loss_disc_real_3: 0.21355962753295898 (0.23436040888266937)
| > loss_disc_real_4: 0.2299477607011795 (0.23665879754051844)
| > loss_disc_real_5: 0.2520277202129364 (0.2239697749508358)
| > loss_0: 2.601724863052368 (2.5608400611629265)
| > grad_norm_0: tensor(11.8741, device='cuda:0') (tensor(20.7040, device='cuda:0'))
| > loss_gen: 2.253811836242676 (2.2022835016250597)
| > loss_kl: 1.4479506015777588 (1.5328740684737954)
| > loss_feat: 4.815017223358154 (4.963463480144629)
| > loss_mel: 20.43062973022461 (21.177591887548495)
| > loss_duration: 1.4312173128128052 (1.4145199033850628)
| > amp_scaler: 512.0 (567.1970260223051)
| > loss_1: 30.37862777709961 (31.29073280738632)
| > grad_norm_1: tensor(100.4473, device='cuda:0') (tensor(144.6298, device='cuda:0'))
| > current_lr_0: 0.0001993011799713115
| > current_lr_1: 0.0001993011799713115
| > step_time: 10.9037 (6.006802409998102)
| > loader_time: 0.5829 (0.5791234892540256)
```
These metrics are the training statistics. For each metric, the first number is the value at the current step and the number in parentheses is the running average over the current epoch.
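The 25-step reporting interval is set through the training configuration. Below is a minimal sketch of how such a configuration might look, assuming the training script is built on Coqui TTS's VITS recipe; the field names come from the Coqui trainer configuration, while the output path and batch sizes are placeholders that simply mirror the log above rather than values taken from this repository.

```python
# Minimal sketch, assuming a Coqui TTS VITS recipe; the output path is a
# placeholder and the batch sizes simply mirror the log shown above.
from TTS.tts.configs.vits_config import VitsConfig

config = VitsConfig(
    batch_size=16,        # 16 samples per training batch
    eval_batch_size=16,   # 16 samples per evaluation batch
    print_step=25,        # print the training statistics every 25 steps
    run_eval=True,        # run an evaluation at the end of every epoch
    output_path="output/vits_run",  # hypothetical run folder
)
```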
After step 1,233 a new epoch begins, but first the model of the finished epoch is evaluated. The evaluation runs in 25 steps (398 evaluation samples / 16 samples per batch, rounded up, gives 25 steps), and the following metrics are displayed in the terminal:
```
--> STEP: 0
| > loss_disc: 2.496549606323242 (2.496549606323242)
| > loss_disc_real_0: 0.1999395489692688 (0.1999395489692688)
| > loss_disc_real_1: 0.21423761546611786 (0.21423761546611786)
| > loss_disc_real_2: 0.21206657588481903 (0.21206657588481903)
| > loss_disc_real_3: 0.21744751930236816 (0.21744751930236816)
| > loss_disc_real_4: 0.1928769201040268 (0.1928769201040268)
| > loss_disc_real_5: 0.23872263729572296 (0.23872263729572296)
| > loss_0: 2.496549606323242 (2.496549606323242)
| > loss_gen: 2.250887393951416 (2.250887393951416)
| > loss_kl: 1.8777085542678833 (1.8777085542678833)
| > loss_feat: 5.428119659423828 (5.428119659423828)
| > loss_mel: 22.569669723510742 (22.569669723510742)
| > loss_duration: 1.5732102394104004 (1.5732102394104004)
| > loss_1: 33.6995964050293 (33.6995964050293)
```
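The number of evaluation steps follows directly from the size of the evaluation split and the evaluation batch size; the short sketch below only reproduces that arithmetic with the figures quoted above.

```python
import math

eval_samples = 398     # size of the evaluation split quoted above
eval_batch_size = 16   # samples per evaluation batch

eval_steps = math.ceil(eval_samples / eval_batch_size)
print(eval_steps)      # 25
```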
The metrics used to evaluate the progress and the performance of the training are the following:
- loss_0 and loss_1
- loss_disc
- loss_disc_real_0 up to loss_disc_real_5
- grad_norm_0 and grad_norm_1
- loss_gen
- loss_kl
- loss_feat
- loss_mel
- loss_duration
- amp_scaler
- current_lr_0 and current_lr_1
- step_time
- loader_time
After each epoch, the metrics are compared with the values of the preceding epoch. Better values are shown in green, worse values in red.
It is not necessary to compare the individual metric values by hand, because Google provides a valuable tool called TensorBoard to view the training progress in graphical form.
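TensorBoard reads the event files that the trainer writes into the run folder. The usual way to start it is `tensorboard --logdir <run folder>` in a terminal; the sketch below does the same from Python, with the folder name again being a placeholder.

```python
# Sketch: start TensorBoard on the training output folder.
# "output/vits_run" is a placeholder for the actual run directory.
import subprocess

subprocess.run(["tensorboard", "--logdir", "output/vits_run", "--port", "6006"])
```

TensorBoard is then reachable in the browser at http://localhost:6006.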
After the evaluation at the end of the first epoch, the file containing the values of the 83,059,756 model parameters is saved as `best_model.pth`, and a copy of this file is saved as `best_model_1233.pth`. If the model is more accurate at a later evaluation, both files are replaced; the current number of steps then appears in the filename of the copy, for example `best_model_2466.pth`, `best_model_3699.pth`, `best_model_6165.pth`, etc.
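The parameter count can be checked by loading one of the saved files and summing the sizes of the stored tensors. This is only a sketch: the path is a placeholder and the "model" key is an assumption about how the checkpoint dictionary is organised.

```python
# Sketch: count the parameters stored in a saved checkpoint.
# The path is a placeholder and the "model" key is an assumption.
import torch

ckpt = torch.load("best_model.pth", map_location="cpu")
state_dict = ckpt.get("model", ckpt)  # fall back to the whole dict if there is no "model" key
n_params = sum(t.numel() for t in state_dict.values() if torch.is_tensor(t))
print(f"{n_params:,} parameters")  # should come out near 83,059,756
```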
After 10,000 steps (this configuration parameter is defined in the training script), the current model is saved as `checkpoint_10000.pth`. The same is done at step 20,000, step 30,000, and so on.
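This 10,000-step interval corresponds to the save_step field of the trainer configuration. A minimal sketch, under the same Coqui TTS assumption as the earlier configuration example; the run folder used for listing the resulting files is again a placeholder.

```python
# Minimal sketch, assuming a Coqui TTS VitsConfig as in the earlier example.
from pathlib import Path

from TTS.tts.configs.vits_config import VitsConfig

config = VitsConfig(
    save_step=10000,  # write checkpoint_10000.pth, checkpoint_20000.pth, ...
)

# List the checkpoints written so far; the run folder name is a placeholder.
for ckpt in sorted(Path("output/vits_run").glob("checkpoint_*.pth")):
    print(ckpt.name)
```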