How sensitive is PAWS to batch size? #5
Hi, thanks for sharing the code! I am curious about PAWS' sensitivity to batch size. Have you tried experimenting with smaller batch sizes (such as 256 or 512) that 8 GPUs can afford on ImageNet? Thanks. @MidoAssran
Hi @frank-xwang, we didn't explore smaller batches, but I'm happy to help if you're interested in investigating this. In general, the loss should be relatively robust, but the size of the support set does make a difference (as per the ablation in Section 7). Thus, you may need longer training with small batches.
Hello, thank you for your reply. I think a batch-size ablation study would be very interesting for researchers in the many academic groups that do not have such large computing resources. It would be great if you could provide this kind of ablation study.
I'll get back to you about the batch-size ablation, but unfortunately it's unlikely I'll be able to get to this soon. As for a version that doesn't require Slurm: you can launch your ImageNet jobs with "main.py" instead of "main_distributed.py", and that should work on a single GPU without Slurm! For example, see the sketch below.
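The original example command is not preserved in this thread; the following is a minimal sketch, assuming the repo's `--sel` and `--fname` flags, with an illustrative config path:

```bash
# Single-GPU PAWS pre-training launched without Slurm
# (config path is illustrative)
python main.py \
  --sel paws_train \
  --fname configs/paws/imgnt_train.yaml
```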
Awesome! Thank you! For the main file, sorry for the lack of clarity: I meant one machine with 8 GPUs. Do we have to use main_distributed.py? Is there a main file that can work without Slurm on 8 GPUs, with distributed training?
Oh yes, I see what you mean. I just pushed a change so that you can now run main.py using several GPUs on a multi-GPU machine; just specify the devices as command-line arguments. For example, to run training on 8 GPUs, specify the devices as in the sketch below.
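The exact command was likewise not preserved; a plausible reconstruction, assuming a `--devices` flag that accepts a list of CUDA devices (the flag name and config path are assumptions):

```bash
# Multi-GPU launch on a single machine, no Slurm required
# (flag names and config path are assumptions)
python main.py \
  --sel paws_train \
  --fname configs/paws/imgnt_train.yaml \
  --devices cuda:0 cuda:1 cuda:2 cuda:3 cuda:4 cuda:5 cuda:6 cuda:7
```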
Great! Thanks!
@frank-xwang Hi, did you manage to run it successfully on 8 GPUs? Could you share your training time?
Hi, after reducing "unsupervised_batch_size" and "supervised_imgs_per_class" (along the lines of the sketch below), I can run it on 4 V100 GPUs. The training time for each epoch is approximately 0.8 hours. But I think reducing the batch size may hurt performance, which will need to be verified once the experiment completes.
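For illustration only, the kind of reduction being described, assuming these YAML keys appear as named in the repo's config (the actual values used in this run are not given):

```yaml
# Illustrative values only: the two fields named above, reduced to fit 4 V100 GPUs
unsupervised_batch_size: 64   # unlabeled images per batch, down from the large-batch default
supervised_imgs_per_class: 2  # fewer labeled support images per class
```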
Hi @CloudRR, I tried several hyperparameter settings but failed to reproduce the reported results with 4 V100 GPUs. The speed is not bad (training one epoch takes about 1 hour), but it seems that PAWS is also sensitive to batch size, as has been observed with many self-supervised learning methods.
Same here. I couldn't reproduce the results with 4 GPUs, and it also takes about 1 hour per epoch.
Hi, I've had a lot on my plate, but I did manage to try out a PAWS run on ImageNet with a small batch size, and it essentially reproduces the large-batch numbers. Using 8 V100 GPUs for 100 epochs with 10% of ImageNet labels, I get a top-1 accuracy consistent with the ablation in the bottom row of Table 4 in the paper (similar support set, but much larger batch size). The config I used to produce this result when running on 8 GPUs is outlined below; all other hyper-parameters are identical to the large-batch setup.
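The full config does not survive in this excerpt; what follows is a hypothetical skeleton only, indicating which fields a small-batch setup would plausibly change, with every value illustrative rather than the one actually used:

```yaml
# Hypothetical small-batch ImageNet config skeleton; all values are
# illustrative, and the key grouping in the repo's actual YAML may differ.
unsupervised_batch_size: 32   # per-GPU unlabeled batch size across 8 GPUs
supervised_imgs_per_class: 2  # support images sampled per class
classes_per_batch: 448        # with the line above, fixes the support-set size
epochs: 100                   # the 100-epoch schedule mentioned above
```

The support-set size is the product of `classes_per_batch` and `supervised_imgs_per_class`, which matches the quantity varied in the Section 7 ablation mentioned earlier in this thread.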