Calculating multi-FC losses in DistributedDataParallel #56772
Labels
oncall: distributed
triaged
Class labels range from 0 to 9. Some labels in a batch are 255, so the loss for those samples must be ignored.
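The original code for the two ways was not preserved. As a minimal sketch, assuming a standard 10-class classifier trained with PyTorch cross-entropy (the batch size, logits, and label values below are made up for illustration and are not either of the issue's two ways), samples labeled 255 can be excluded either with the built-in `ignore_index` argument or by masking them out manually. Both give the same per-batch loss.

```python
import torch
import torch.nn.functional as F

IGNORE_LABEL = 255  # label value whose loss should be skipped
NUM_CLASSES = 10    # valid classes are 0..9

torch.manual_seed(0)
# Hypothetical batch of 8 predictions; three samples carry the ignore label.
logits = torch.randn(8, NUM_CLASSES)
labels = torch.tensor([3, 255, 0, 9, 255, 1, 7, 255])

# Built-in: targets equal to ignore_index contribute nothing, and the default
# 'mean' reduction divides by the number of non-ignored targets only.
loss_builtin = F.cross_entropy(logits, labels, ignore_index=IGNORE_LABEL)

# Manual equivalent: drop ignored samples, then average over the rest.
valid = labels != IGNORE_LABEL
loss_manual = F.cross_entropy(logits[valid], labels[valid])

print(loss_builtin.item(), loss_manual.item())
```

Note that if every target in a batch is ignored, the `'mean'` reduction divides by zero valid samples and the loss becomes NaN, so batches made entirely of label 255 may need special handling.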
I calculate the loss in two ways:
Way1:
Way2:
Way 1 should be correct, but its accuracy was lower than that of Way 2 in all four experiments.
cc @pietern @mrshenli @pritamdamania87 @zhaojuanmao @satgera @rohan-varma @gqchen @aazzolini @osalpekar @jiayisuse @agolynski @SciPioneer @H-Huang @mrzzd @cbalioglu @gcramer23