Hi there,
Thanks for your great work on the Shampoo implementation in PyTorch. I'm trying to reproduce the CIFAR-10 results from the Shampoo paper, but I get much lower test accuracy. I have tried changing the learning rate from 0.01 to 10 (following the paper's suggestion), but still only reach about 85% accuracy. Here are my experimental results:
We use the ResNet-32 network for the CIFAR-10 experiments, with the following settings:
--momentum 0.9
--epsilon 1e-4
--batchSize 128
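For reference, here is roughly how I construct the optimizer. This is a minimal sketch: the `shampoo` import path and the `Shampoo` constructor arguments are my assumptions from reading this repository, and the linear model is just a stand-in for ResNet-32.

```python
import torch.nn as nn
from shampoo import Shampoo  # assumed import path for this repository's optimizer

model = nn.Linear(32 * 32 * 3, 10)  # stand-in; my actual runs use ResNet-32
optimizer = Shampoo(
    model.parameters(),
    lr=1.0,          # swept over 0.1, 1, 2, and 5 below
    momentum=0.9,    # --momentum
    epsilon=1e-4,    # --epsilon
)
```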
Results after 250 epochs:

| lr  | Training Loss | Training Acc | Testing Loss | Testing Acc |
|-----|---------------|--------------|--------------|-------------|
| 0.1 | 0.65          | 77.03%       | 0.68         | 76.39%      |
| 1   | 0.25          | 91.33%       | 0.57         | 84.04%      |
| 2   | 0.23          | 91.87%       | 0.72         | 82.02%      |
| 5   | 0.22          | 92.33%       | 0.75         | 82.04%      |
When training for 500 epochs at each of the learning rates above, the testing accuracy remains almost the same; it still cannot even reach 90%.
Any ideas or suggestions about this problem? Thanks for your time.
Thank you for your comprehensive experiments. Indeed, I also cannot reproduce the reported results with my implementation, even when using the average of the gradients.
So far I'm still investigating the cause. If you find something, please let me know.
Hi, I have some questions about the Algorithm 2 code.
In the Shampoo paper, the contraction for each dimension is computed from the original gradient. In the code, however, the gradient is updated after each dimension, and that partially preconditioned gradient is then used to compute the contraction for the next dimension (see the sketch below). Is something wrong with my understanding of the code, or of Algorithm 2 in the paper?
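To make the question concrete, here is a minimal sketch of the two variants for an order-N gradient tensor. This is my own illustrative code, not the repository's: `_inv_root`, `precondition`, the `stats` bookkeeping, and the `use_original` switch are all my assumptions, simplified to show only where the contraction input differs.

```python
import torch

def _inv_root(mat, root, eps=1e-4):
    # Inverse `root`-th root of a symmetric PSD matrix via
    # eigendecomposition (illustrative; real code would cache this).
    d = mat.size(0)
    eigvals, eigvecs = torch.linalg.eigh(mat + eps * torch.eye(d))
    return eigvecs @ torch.diag(eigvals.pow(-1.0 / root)) @ eigvecs.t()

def precondition(grad, stats, use_original=True):
    # use_original=True:  every contraction comes from the untouched
    #                     gradient, as I read the algorithm in the paper.
    # use_original=False: each contraction uses the gradient already
    #                     preconditioned along earlier axes, which is what
    #                     the repository's in-place loop appears to do.
    original = grad
    order = grad.dim()
    for i in range(order):
        dim = grad.size(i)
        source = original if use_original else grad
        mat = source.transpose(0, i).reshape(dim, -1)  # mode-i matricization
        stats[i] += mat @ mat.t()                      # accumulate statistics
        inv = _inv_root(stats[i], 2 * order)
        # Precondition along axis i, then restore the axis order.
        grad = torch.tensordot(inv, grad, dims=([1], [i])).movedim(0, i)
    return grad

# The two variants diverge from the second axis onward:
g = torch.randn(8, 4)
stats = [torch.zeros(8, 8), torch.zeros(4, 4)]
paper_update = precondition(g, [s.clone() for s in stats], use_original=True)
code_update = precondition(g, [s.clone() for s in stats], use_original=False)
print(torch.allclose(paper_update, code_update))  # False in general
```

For a matrix gradient the first axis is treated identically in both variants; they only diverge once the second axis's statistics are computed from a gradient that has already been preconditioned along the first.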