I trained ResNet-50 on ImageNet with GPUs=8, batchsize=256, learning-rate=0.1, epochs=90, and momentum=0.9.
The top-1 accuracy I get is 75.80%, lower than the reported 76.15%. A gap of 0.35 points is not marginal on a dataset as large as ImageNet.
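
To make the setup easier to compare against a reference run, here is a minimal sketch of the settings above as a plain Python config. The class and field names are only illustrative, and it assumes batchsize=256 is the total across all 8 GPUs (which would give 32 per GPU); if 256 is per GPU, the numbers change accordingly.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TrainConfig:
    """Hypothetical container for the hyperparameters listed above."""
    arch: str = "resnet50"
    num_gpus: int = 8
    batch_size: int = 256        # ASSUMPTION: total batch size across all GPUs
    learning_rate: float = 0.1
    epochs: int = 90
    momentum: float = 0.9

    @property
    def per_gpu_batch_size(self) -> int:
        # Only meaningful under the "total batch size" assumption above.
        return self.batch_size // self.num_gpus


cfg = TrainConfig()

reported_top1 = 76.15   # reported accuracy, in percent
attained_top1 = 75.80   # accuracy from this run, in percent
gap = round(reported_top1 - attained_top1, 2)

print(cfg)
print(f"per-GPU batch size: {cfg.per_gpu_batch_size}")
print(f"top-1 gap: {gap:.2f} points")
```

Settings not listed above (for example weight decay, the learning-rate schedule, or data augmentation) are deliberately left out of the sketch because they are not stated here.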
Why does the difference exist?