CIFAR-10 baseline doesn't reach 95% #11
Comments
Thanks for pointing this out! I think the key parameter we didn't clearly specify for CIFAR-10 is the allowable "scale" for random cropping. The default setting in timm allows the crop to cover as little as 8% of the original image area (before being resized back to the original shape). We thought this didn't make sense for 32x32 CIFAR-10 images, so we changed it to 75%. It would probably also be a good idea to specify the CIFAR-10 mean and standard deviation, though I don't think this will change much. In particular, try adding the corresponding flags. I've also updated the README to mention this.
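For reference, the flags being described might look something like this. This is a sketch, not a confirmed command: the script name and dataset spec are placeholders, `--scale`, `--mean`, and `--std` are timm's standard training flags, and the mean/std values are the commonly cited CIFAR-10 channel statistics rather than values stated in this thread.

```shell
# Hypothetical timm invocation; restrict random-crop area to 75-100%
# and normalize with (commonly used) CIFAR-10 channel statistics.
python train.py --dataset torch/cifar10 \
    --scale 0.75 1.0 \
    --mean 0.4914 0.4822 0.4465 \
    --std  0.2470 0.2435 0.2616
```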
Hello, thanks for your help. Indeed, the crop parameter has an effect on the accuracy.
What patch size and kernel size are you using? I think this should easily achieve >96% with patch_size=1, and around 95% with patch_size=2 for large kernels. We used batch size 128 (whereas yours is 2*128), but I'm not sure that would cause such a big difference. I have trained a ConvMixer-256/16 with patch_size=1 and kernel_size=9 on CIFAR-10 with almost the same settings (except for batch size) that achieved 96% by epoch 140/200. I'll see whether increasing the batch size actually has such a significant effect and get back to you.
Hello, I use a patch size of 1 and a kernel size of 9. I tried a smaller batch size (64) and it changed nothing.
and I use this model:
Hi, the paper mentions using a "simple triangular learning rate schedule". We're trying to replicate your work on CIFAR-10 (in TensorFlow) and are wondering which LR schedule and parameters you used for the results in Table B. Thank you!
@K-H-Ismail I'm currently training the same model as you, using a freshly cloned version of this repo with the same parameters apart from the batch size.
@dmezh We included the learning rate schedule in this repo, though you somewhat have to hunt through the code to find it. The most important line is this one, which I'll paste below:
That said, I think you'll get approximately the same results using something more standard, like cosine decay with one cycle. Let me know if you have any other questions about your replication! Given the interest in our CIFAR-10 results, we'll try to release a more compact training script and model weights for it sometime soon (but in PyTorch).
Hello @tmp-iclr, is it the same for you?
Glad to help. I used https://github.com/knjcode/cifar2png to construct the dataset; I'll see if there's any difference from the one you linked. By the way, I'll report back on how the model I ran with your settings turns out.
So I'm training the same model on the dataset you used, and it does indeed seem to be lagging behind by a few percent. It's too early to say for sure, but this might be the problem... |
Upon inspection, it looks like the CIFAR dataset you used has substantial JPEG artifacts: the images actually look noticeably less sharp and colorful. I'm now pretty sure the dataset discrepancy is, in fact, the problem.
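One quick way to quantify the kind of sharpness difference described here is the variance of a Laplacian filter response, which tends to be lower for JPEG-compressed images than for lossless PNGs. This is a generic heuristic sketch, not the check the authors actually used.

```python
import numpy as np

def sharpness(img):
    """Crude sharpness score: variance of a 4-neighbour Laplacian.

    img: 2-D grayscale array, or 3-D (H, W, C) array averaged to grayscale.
    Higher values indicate more high-frequency detail.
    """
    g = img.mean(axis=-1) if img.ndim == 3 else img
    # Discrete Laplacian on the interior pixels.
    lap = (-4 * g[1:-1, 1:-1]
           + g[:-2, 1:-1] + g[2:, 1:-1]
           + g[1:-1, :-2] + g[1:-1, 2:])
    return float(lap.var())
```

Comparing the average score over a few hundred images from each dataset copy would make the "less sharp" observation concrete.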
Hello, |
No problem! Glad we figured it out, and thanks. I'm going to go ahead and close this issue, but feel free to reopen it or open another if you have more questions (likewise, @dmezh). |
Hello,
I tried convmixer256 on CIFAR-10 with the same timm options as specified for ImageNet (except num_classes), and it doesn't go beyond 90% accuracy. Could you please specify the options used for the CIFAR-10 experiment?