[WIP] Add ResNet-RS models #554
Conversation
Hey @rwightman! I am almost there with contributing the ResNet-RS models to timm (my first model contribution)! Starting this [WIP] PR to get things wrapped up. I am very confident that the only things remaining in this implementation currently are:
Everything else, as far as I can tell, has been covered already. As per section 4.1 of the paper:

- This has been done by adding
- This hasn't been covered yet.
- Covered off by setting
- Covered off by adding a new parameter
- Covered by adding

Since I haven't ported weights from TF to PyTorch before, I was hoping for some guidance on how to do this. Also FYI. Pretty happy with the implementation thus far, but keen to hear your feedback too.
Ok @rwightman, happy days! The implementation performs better on Imagenette. The graph below compares resnetrs50 with resnet50: ResNet-RS 50 gets 85.9% top-1 compared to 84.5% for ResNet-50. I have kicked off a training run for ImageNet, but given I only have a single V100, it should take at least 3-4 days before we see results.
Do you have any plans for smaller resnets (18, 34)? Does it make sense?
Hi @JulienMaille :) The smallest ResNet-RS model is ResNet-RS 50, FYI.
@amaarora I thought that was ImageNet 85.9 for a sec, then realized it was Imagenette ;) One thought re the stem config: do the RS models push the stride 2 into the blocks, or replace the maxpool with a conv? I thought they replaced? Or do they do both depending on the model size?

@JulienMaille don't think it makes much sense to define a basic-block RS model. I generally use a ResNet26 def as the smallest bottleneck, (2,2,2,2) like 34 but with bottleneck blocks. Although I've seen 14s (1,1,1,1).

An FYI, there are already resnet models better than these RS ones here. I trained an ecaresnet26t that's almost 80% top-1, an ecaresnet50t that's over 82, a resnet50d that's close to 80.5. The RS models are SE, so the 50d score is impressive by comparison, and the ECA models are roughly similar in throughput to an SE but with lower param count. I have some comparable SE models as well, but only larger ones like the seresnet152d (84.35 vs 83 for the best 152 RS) were trained with recent hparams.
Basically I've already been exploring the ideas in the RS paper here, heavier augs and regularization w/ different resolution scaling. I went a bit further with aug/regularization.
@amaarora also, the stem changes will likely break the feature extraction without changing feature_info locations based on the stem config
Good catch! This has been a little confusing, but going by the implementation, you're right: they replace the maxpool with a stride-2 3x3 conv. This can be inferred from the ResNet-RS config here. FYI, this is in conflict with the paper, where they write in section 4.1 under the ResNet-D architecture subheading:

Note the usage of the word "removed" and not "replaced". I guess in such cases we follow the TF implementation and take it to be the source of truth? @rwightman
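To make the stem difference concrete, here is a minimal PyTorch sketch (illustrative only, not timm's or the TF repo's actual code; `make_deep_stem` and the channel choices are assumptions):

```python
import torch.nn as nn

def conv_bn_act(in_chs, out_chs, stride=1):
    return nn.Sequential(
        nn.Conv2d(in_chs, out_chs, 3, stride=stride, padding=1, bias=False),
        nn.BatchNorm2d(out_chs),
        nn.ReLU(inplace=True),
    )

def make_deep_stem(in_chs=3, stem_chs=(32, 32, 64), replace_maxpool=True):
    # Three 3x3 convs instead of the classic single 7x7 (ResNet-D "deep stem").
    layers = [
        conv_bn_act(in_chs, stem_chs[0], stride=2),
        conv_bn_act(stem_chs[0], stem_chs[1]),
        conv_bn_act(stem_chs[1], stem_chs[2]),
    ]
    if replace_maxpool:
        # ResNet-RS per the TF config: stride-2 3x3 conv in place of the maxpool
        layers.append(conv_bn_act(stem_chs[2], stem_chs[2], stride=2))
    else:
        layers.append(nn.MaxPool2d(3, stride=2, padding=1))
    return nn.Sequential(*layers)

stem = make_deep_stem(replace_maxpool=True)
```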
FYI I am working on fixing the failed tests.
@rwightman Finally passing! :) How do I add pretrained weights, please? If I can now add pretrained weights to these models, then they are ready to be merged IMHO. EDIT: Please ignore the above... it's still not working and I'm not sure why.
I keep getting error code 137. It appears as though that's due to an OOM error and not the code change (source: https://stackoverflow.com/questions/43268156/process-finished-with-exit-code-137-in-pycharm). Still trying to figure out why; I can't quite replicate it unless I run all tests for all models.
@rwightman - the tensorflow convolution weights have a momentum term in them. Do you know what to do with these please? I have been able to match the tf kernel weights to conv, but not sure what to do with the ema and momentum weights. See example below:

...
conv2d_58/kernel
conv2d_58/kernel/ExponentialMovingAverage
conv2d_58/kernel/Momentum
conv2d_59/kernel
conv2d_59/kernel/ExponentialMovingAverage
conv2d_59/kernel/Momentum
...
Ignore all the 'Momentum' vars... for ExponentialMovingAverage though, you will want the option to either use that instead of the parent value, or not use it; that is essentially the EMA-averaged version of the weights vs the non-EMA. Usually the EMA in most google checkpoints is better, but not always.
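A minimal sketch of that advice (the helper name and structure are assumptions, not from this PR; assumes TensorFlow is installed to read the checkpoint):

```python
import tensorflow as tf

def read_tf_weights(ckpt_path, use_ema=True):
    """Collect inference weights, skipping optimizer slots."""
    reader = tf.train.load_checkpoint(ckpt_path)
    names = {name for name, _ in tf.train.list_variables(ckpt_path)}
    weights = {}
    for name in names:
        if name.endswith('/Momentum') or name.endswith('/ExponentialMovingAverage'):
            continue  # optimizer slot / handled via the parent variable below
        ema_name = name + '/ExponentialMovingAverage'
        if use_ema and ema_name in names:
            weights[name] = reader.get_tensor(ema_name)  # prefer the EMA copy
        else:
            weights[name] = reader.get_tensor(name)
    return weights
```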
Thanks @rwightman!! I am very close to getting the model weights in now. Just one last question about the few variables I am left with. From https://github.com/lukemelas/EfficientNet-PyTorch/blob/master/tf_to_pytorch/convert_tf_to_pt/load_tf_weights.py, I have mapped:

Not sure what to do with these:
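For reference, the usual conv-kernel mapping in that kind of converter looks roughly like this (a sketch; the helper name is an assumption):

```python
import torch

def tf_conv_kernel_to_torch(tf_kernel):
    # TF stores conv kernels as (H, W, in_ch, out_ch);
    # PyTorch nn.Conv2d expects (out_ch, in_ch, H, W).
    return torch.from_numpy(tf_kernel).permute(3, 2, 0, 1).contiguous()

# BatchNorm variables map one-to-one:
#   gamma -> weight, beta -> bias,
#   moving_mean -> running_mean, moving_variance -> running_var
```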
Okay, done! I have loaded the tensorflow weights into PyTorch; now I am trying to figure out how to benchmark the model and verify that the weights actually give around 80% top-1 accuracy. :)
@rwightman 79.114 top-1 accuracy on ImageNet-1k for ResNet-RS 50! Hell yeah! :D Just to confirm, here's what I did: Now let me get the weights for all the ResNet-RS models; it's pretty easy from here on. How do we upload the pretrained weights? It would be really helpful if you could also benchmark, or get another set of eyes on this for QA.
I just ignored these.
All the model weights are here: https://www.kaggle.com/aroraaman/resnetrs You should now be able to do:
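Presumably something along these lines, using timm's standard API (the model name follows this PR's naming; the checkpoint filename is an assumption, as is the file holding a plain state_dict):

```python
import torch
import timm

# Build the new architecture, then load a checkpoint from the Kaggle dataset.
model = timm.create_model('resnetrs50', pretrained=False)
state_dict = torch.load('resnetrs50.pth', map_location='cpu')  # hypothetical filename
model.load_state_dict(state_dict)
model.eval()

with torch.no_grad():
    logits = model(torch.randn(1, 3, 224, 224))
print(logits.shape)  # torch.Size([1, 1000])
```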
@amaarora congrats! yeah, num_batches_tracked is a pytorch-specific thing that can be ignored for this... you know there is a validate.py, right? :) you can just run it and try different image sizes, crop pct, and image interpolation to find what the best is... one thing that's not clear from the paper and the table in the official repo is whether each validation result is at the train size (ie 160x160) for the 50, or if they're doing the train-test res thing and testing at a higher res, ie 224 when trained at 160.
@amaarora FYI, the kaggle weight link doesn't work
Sorry, that's because the dataset was private. Could you please try again? :)
Must be the excitement of getting the resnetrs weights to work that I just completely didn't see it. :D

I do agree. I ran benchmarks yesterday testing at the default 224x224 for all models and found the validation accuracy to be within 1% of the reported results here.
@amaarora for the models that have checkpoints for more than one size, did you include only the largest one?
@rwightman Sorry - I should have clarified. I went for the smallest one, but I can totally update to have the largest ones? Or both? It should take only a couple of minutes from here on to update the weights.
FYI here is the messy script that I wrote to port the weights from TF.
@amaarora k, good, that makes sense (with regards to what I observed). I picked a few to do a quick eval and the 152 wasn't checking out, but the others were. I think the largest for each model makes sense; don't see much value in having them all, and def prefer having the best possible one for each arch.
```diff
@@ -318,7 +334,7 @@ class Bottleneck(nn.Module):
     def __init__(self, inplanes, planes, stride=1, downsample=None, cardinality=1, base_width=64,
                  reduce_first=1, dilation=1, first_dilation=None, act_layer=nn.ReLU, norm_layer=nn.BatchNorm2d,
-                 attn_layer=None, aa_layer=None, drop_block=None, drop_path=None):
+                 attn_layer=None, aa_layer=None, drop_block=None, drop_path=None, **kwargs):
```
@rwightman FYI this is a change that you might want to review specifically. I have added `**kwargs` to the `Bottleneck` block that get passed to the attention layers. This is to pass in `reduction_ratio=0.25` for the `se` layers, as mentioned in the paper.
```diff
@@ -341,7 +357,7 @@ def __init__(self, inplanes, planes, stride=1, downsample=None, cardinality=1, b
         self.conv3 = nn.Conv2d(width, outplanes, kernel_size=1, bias=False)
         self.bn3 = norm_layer(outplanes)

-        self.se = create_attn(attn_layer, outplanes)
+        self.se = create_attn(attn_layer, outplanes, **kwargs)
```
These `**kwargs` get passed to `Bottleneck` using `block_args` in the model config, as in the L1112 model definition for `resnetrs152`.
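A self-contained sketch of that plumbing (illustrative class names, not timm's actual ones), showing how a `block_args`-style dict can reach the attention layer via `**kwargs`:

```python
import torch
import torch.nn as nn

class SqueezeExcite(nn.Module):
    def __init__(self, channels, reduction_ratio=1 / 16):
        super().__init__()
        rd_channels = max(1, int(channels * reduction_ratio))
        self.fc1 = nn.Conv2d(channels, rd_channels, 1)
        self.act = nn.ReLU(inplace=True)
        self.fc2 = nn.Conv2d(rd_channels, channels, 1)

    def forward(self, x):
        s = x.mean((2, 3), keepdim=True)     # squeeze: global average pool
        s = self.fc2(self.act(self.fc1(s)))  # excite: channel gating weights
        return x * s.sigmoid()

class Block(nn.Module):
    def __init__(self, channels, attn_layer=None, **kwargs):
        super().__init__()
        # Extra block_args (e.g. reduction_ratio=0.25) are forwarded here,
        # analogous to create_attn(attn_layer, outplanes, **kwargs) above.
        self.attn = attn_layer(channels, **kwargs) if attn_layer else nn.Identity()

    def forward(self, x):
        return self.attn(x)

block = Block(256, attn_layer=SqueezeExcite, reduction_ratio=0.25)
print(block(torch.randn(1, 256, 8, 8)).shape)  # torch.Size([1, 256, 8, 8])
```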
Okay, thanks! I'll go back and port the weights for the largest ones where there are two sets of weights available, run the benchmark scripts, and share the better-performing model in my Kaggle dataset - does this sound good? Won't be long.
@rwightman updated the kaggle dataset here: https://www.kaggle.com/aroraaman/resnetrs I have included all the weights as in https://github.com/tensorflow/tpu/tree/master/models/official/resnet/resnet_rs
@rwightman Here are my benchmarking results for ResNet-RS 101, ResNet-RS 152, and the newly added ResNet-RS 350. I suggest we use
@amaarora merged, and made a few additional changes... curious if you compared the EMA vs non-EMA weights on any of the models? The 420 weights are pretty weak; actually the 420, the i320 350, and the 270 weights are kind of meh, while the i256 350 are great by comparison (validate up to 84.4ish with some res scaling). I can't get the 420 past 84.2/84.3. So I'd be curious on a check to see if the 420 weights are any better for the non-EMA (assuming the one I have is the EMA).
Hey @rwightman - thanks for your help!

Sorry, I didn't.

What you have are the non-EMA weights for all models. Please allow ~1 hour for me to go back and share the EMA weights as a separate Kaggle dataset with you.
@amaarora oh, okay, that could be a fairly significant difference, curious to see how it stacks up
@rwightman EMA weights uploaded here: https://www.kaggle.com/aroraaman/resnetrsema
@rwightman