MNIST: ODEBlock possibly redundant? #32
Comments
I've tried the ODEnet on CIFAR-10 and got ~85% accuracy using conv for downsampling. I didn't try too many different hyperparameters, but the ODE block seems to be working.
I tried some hyperparameters and small tricks on the same structure. ODENet could achieve a stable result of around 91-92% accuracy on CIFAR-10 after something like 130-150 epochs. It was just a quick test due to my equipment limitations. With settings that don't take training time into consideration, 93-94% is possible, I guess.
@zlannnn Could you share the hyperparameters and tricks you used to achieve 91% accuracy on CIFAR-10? I am running these experiments, but my best accuracy is ~88%. Thanks!
@jjjjjie I just used the same structure as the official MNIST example (which means 2 ResBlocks are involved), with some regular augmentation, SGD, and a few more filters. I think the major improvements come from the augmentation and the extra feature channels. If you try this with more channels, you should be able to achieve higher accuracy.
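For anyone trying to reproduce this, here is a minimal sketch of the kind of "regular augmentation + SGD" setup described above. The exact values (crop padding, learning rate, milestones, weight decay) are illustrative assumptions, not @zlannnn's actual settings:

```python
import torch
import torchvision.transforms as T

# Regular CIFAR-10 augmentation: pad-and-crop plus horizontal flips.
train_transform = T.Compose([
    T.RandomCrop(32, padding=4),            # pad to 40x40, crop back to 32x32
    T.RandomHorizontalFlip(),               # flip with probability 0.5
    T.ToTensor(),
    T.Normalize((0.4914, 0.4822, 0.4465),   # CIFAR-10 per-channel means
                (0.2470, 0.2435, 0.2616)),  # CIFAR-10 per-channel stds
])

def make_optimizer(model):
    # SGD with momentum and a step-decay schedule; values are illustrative.
    opt = torch.optim.SGD(model.parameters(), lr=0.1,
                          momentum=0.9, weight_decay=5e-4)
    sched = torch.optim.lr_scheduler.MultiStepLR(
        opt, milestones=[80, 120], gamma=0.1)
    return opt, sched
```

The "more filters" trick would then amount to widening the example's 64-channel convolutions (e.g. to 128) throughout the downsampling, feature, and fc layers.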
@zlannnn Thanks for your reply! I still have some questions about your tricks:
Thank you in advance for your answer.
@jjjjjie Hi
@zlannnn Thanks a lot! I will give it a try.
Evaluated three structures on CIFAR-100, with the following result: the ODE block works well.
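(For anyone reproducing the CIFAR-100 runs: only the stem and the classifier head of the MNIST example need to change. A hypothetical sketch, assuming the example's 64-channel width:)

```python
import torch.nn as nn

# CIFAR inputs have 3 channels (vs. 1 for MNIST); CIFAR-100 has 100 classes.
stem_conv  = nn.Conv2d(3, 64, 3, 1)   # replaces the first downsampling conv
classifier = nn.Linear(64, 100)       # replaces the final fully connected layer
```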
Hello,
Thank you for your work. It introduces a very interesting concept.
I have a question regarding the experimental section that serves as verification of the ODE model on MNIST classification.
Your ODE MNIST model in the paper is the following:

```python
model = nn.Sequential(*downsampling_layers, *feature_layers, *fc_layers).to(device)
```

with the ODE block being the middle `feature_layers` and `downsampling_method == 'conv'`. It has 208,266 parameters overall and achieves a test error of 0.42%.

However, if you get rid of the middle block altogether and construct the following instead:

```python
model = nn.Sequential(*downsampling_layers, *fc_layers).to(device)
```

with `downsampling_layers` and `fc_layers` exactly as in the case before, you get a model with 132,874 parameters that achieves a similar test error of under 0.6% after roughly 100 epochs.

Could it be that your experiment shows the remarkable efficiency of your `downsampling_layers` rather than of the ODE block?

Thanks,
Simon
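For concreteness, here is a condensed, self-contained sketch of the two models being compared, following the layout of the official examples/odenet_mnist.py (conv downsampling). The `ODEfunc` below is simplified relative to the official one (which concatenates t as an extra channel and uses group norm inside the block), so the printed parameter counts will not match the 208,266 / 132,874 figures exactly; the point is the structural comparison and how the counting is done:

```python
import torch
import torch.nn as nn
from torchdiffeq import odeint


class ODEfunc(nn.Module):
    """Derivative network f(t, x) used inside the ODE block (simplified)."""
    def __init__(self, dim):
        super().__init__()
        self.conv1 = nn.Conv2d(dim, dim, 3, padding=1)
        self.conv2 = nn.Conv2d(dim, dim, 3, padding=1)
        self.relu = nn.ReLU(inplace=True)

    def forward(self, t, x):
        return self.conv2(self.relu(self.conv1(self.relu(x))))


class ODEBlock(nn.Module):
    """Integrates f from t=0 to t=1 and returns the final state."""
    def __init__(self, odefunc):
        super().__init__()
        self.odefunc = odefunc
        self.integration_time = torch.tensor([0.0, 1.0])

    def forward(self, x):
        t = self.integration_time.type_as(x)
        return odeint(self.odefunc, x, t, rtol=1e-3, atol=1e-3)[1]


# Stem and head as in the 'conv' downsampling variant of the example.
downsampling_layers = [
    nn.Conv2d(1, 64, 3, 1),
    nn.GroupNorm(32, 64), nn.ReLU(inplace=True),
    nn.Conv2d(64, 64, 4, 2, 1),
    nn.GroupNorm(32, 64), nn.ReLU(inplace=True),
    nn.Conv2d(64, 64, 4, 2, 1),
]
fc_layers = [
    nn.GroupNorm(32, 64), nn.ReLU(inplace=True),
    nn.AdaptiveAvgPool2d((1, 1)), nn.Flatten(), nn.Linear(64, 10),
]
feature_layers = [ODEBlock(ODEfunc(64))]

# Modules are shared between the two Sequentials here only for counting;
# build separate copies for actual training.
ode_model = nn.Sequential(*downsampling_layers, *feature_layers, *fc_layers)
baseline  = nn.Sequential(*downsampling_layers, *fc_layers)

def n_params(m):
    return sum(p.numel() for p in m.parameters() if p.requires_grad)

print(n_params(ode_model), n_params(baseline))  # compare parameter counts
```

Counting parameters this way makes the comparison in the question easy to reproduce: the gap between the two models is exactly the size of the feature block, and the baseline keeps the same stem and head.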