Hello,
I have a few questions I hope someone can answer.
I saw that the linear scaling rule should be applied when adjusting the learning rate, so the LR is 0.1 with batch size 64 and 1.6 with batch size 1024. If I'm training on a single GPU with batch size 16, should I likewise scale my learning rate down to 0.025 instead of 0.1?
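To make sure I'm applying the rule correctly, here is the arithmetic I have in mind (just a sketch; `base_lr = 0.1` and `base_batch_size = 64` are the values from the config, the function itself is only my illustration):

```python
def scaled_lr(batch_size, base_lr=0.1, base_batch_size=64):
    """Linear scaling rule: scale the LR in proportion to the batch size."""
    return base_lr * batch_size / base_batch_size

print(scaled_lr(1024))  # 1.6, matching the large-batch config
print(scaled_lr(16))    # 0.025, my single-GPU case
```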
Secondly, I'm only planning to run several experiments and train several variations of the network for ~30 epochs instead of 196. How would I need to adjust the warmup LR and warmup epochs? I changed the max epochs to 30; perhaps I should have kept it at 196, but then the whole 30-epoch run would fall within the warmup. Or should I remove the warmup entirely and start from the base LR?
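In case it helps make the question concrete, this is the kind of proportional adjustment I was considering (a sketch only, assuming a linear warmup; the original warmup length is a placeholder I made up, since I'm not sure what the right value is):

```python
def adjusted_warmup_epochs(new_max_epochs, orig_warmup_epochs, orig_max_epochs=196):
    """Shrink the warmup in proportion to the shortened schedule (my assumption)."""
    return max(1, round(orig_warmup_epochs * new_max_epochs / orig_max_epochs))

def warmup_lr(epoch, warmup_epochs, base_lr):
    """Linear warmup from near 0 up to base_lr over warmup_epochs."""
    if epoch < warmup_epochs:
        return base_lr * (epoch + 1) / warmup_epochs
    return base_lr
```

Is scaling the warmup down like this reasonable, or is warmup simply not worth keeping for such short runs?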
Thank you in advance.