What "class_per_batch" should I set to reproduce the ImageNet 10% label performance #11
Hi @Ir1d, the config is the default config you would use on 64 GPUs. Setting `classes_per_batch: 15` means 15 classes per GPU, so across 64 GPUs the support set covers 15 × 64 = 960 classes, as in the paper. As for training on 4 GPUs, I haven't tried that, but I'll look into establishing a small-batch baseline. In general, you need to adjust a few things, the support set chief among them.

As for the support set, we found (as per the ablation in Section 7) that decreasing its size does not hurt performance much. On 4 GPUs, you can try a correspondingly smaller support set; the sketch below makes the arithmetic concrete.
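As a minimal sketch (assuming only the option names quoted later in this thread: `supervised_views`, `s_batch_size`, `classes_per_batch`), the per-GPU settings combine into the global support set as follows:

```python
# Support-set arithmetic, using the per-GPU options named in this thread.
# Matches the line quoted from the code:
#   num_support = supervised_views * s_batch_size * classes_per_batch
def global_support(world_size, classes_per_batch=15, s_batch_size=7,
                   supervised_views=1):
    per_gpu = supervised_views * s_batch_size * classes_per_batch
    return {
        "support_images_per_gpu": per_gpu,                # 105 with the defaults
        "global_classes": classes_per_batch * world_size,
        "global_support_images": per_gpu * world_size,
    }

# The default config on 64 GPUs reproduces the numbers from Sec. 5 of the paper:
print(global_support(world_size=64))
# {'support_images_per_gpu': 105, 'global_classes': 960, 'global_support_images': 6720}
```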
Hi @MidoAssran, is it possible for you to share the training time of an epoch on, for example, 8 GPUs or 4 GPUs (something less than 64 GPUs)?
Hi @Ir1d, I don't have this on hand since I haven't tried something like 8 GPUs yet, but I'll definitely look into it.
@MidoAssran Thank you so much. I'm seeing about 2 h/epoch, and I'm not sure if I ran it correctly.
Hi @Ir1d, apologies for the delay getting back to you; I've had a lot on my plate recently, but I did finally try a small-batch run in the 10% label setting on ImageNet. See below (mostly reproduced from my comment here). I've attached my config file as well, but let me know if you have any trouble reproducing this. I'm going to close this issue for now.

Using 8 V100 GPUs for 100 epochs with 10% of ImageNet labels, I get a top-1 accuracy consistent with the ablation in the bottom row of Table 4 in the paper (similar support set, but much larger batch size). Here is the config I used to produce this result when running on 8 GPUs; a sketch of the relevant choices follows.
All other hyper-parameters are identical to the large-batch setup.
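As a rough, hypothetical sketch of what such a small-batch setup changes relative to the 64-GPU default (the values are illustrative, not necessarily those in the attached config; only the option names come from this thread, and `world_size` is a hypothetical entry):

```python
# Hypothetical small-batch overrides relative to the 64-GPU default.
# Illustrative values only; option names are the ones used in this issue.
small_batch_overrides = {
    "world_size": 8,          # 8 V100 GPUs instead of 64
    "classes_per_batch": 15,  # per GPU -> 15 * 8 = 120 classes in the global support set
    "s_batch_size": 7,        # labeled images sampled per class, per GPU
    "supervised_views": 1,    # views taken of each labeled image
    "u_batch_size": 256,      # unlabeled images per GPU (illustrative)
}
```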
Hi, thank you for sharing the code.
I'm a bit confused by the `classes_per_batch` option here.
In Sec. 5 of the paper, it says "randomly sample 6720 images, 960 classes and 7 images per class", but the config shows 15 classes and 7 images per class. I wonder if I need to change this 15?
In the code it shows

`num_support = supervised_views * s_batch_size * classes_per_batch`

so the `num_support` we actually have is 105 = 1 * 7 * 15 on each GPU. I know 960 is 15 × 64; does that mean the actual `num_support` should be `classes_per_batch * u_batch_size`?
I ran main.py on a single device with 4 GPUs, but the trained performance is extremely low (less than 10% accuracy), and I'm not sure if it is related to this problem.
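One plausible contributor, sketched here as an illustration under the same per-GPU reading of the options rather than a confirmed diagnosis: launching the default config on only 4 GPUs shrinks the global support set far below the paper's setup.

```python
# Default per-GPU config launched on 4 GPUs (hypothetical illustration):
supervised_views, s_batch_size, classes_per_batch = 1, 7, 15
world_size = 4
per_gpu = supervised_views * s_batch_size * classes_per_batch  # 105, as above
print(classes_per_batch * world_size)  # 60 classes globally, vs. 960 on 64 GPUs
print(per_gpu * world_size)            # 420 support images, vs. 6720 in the paper
```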