Category of Task: Vision
Kind of Task: Image Classification
The ResNet model family is a set of convolutional neural networks that can be used as the base for a variety of vision tasks. ImageNet ResNets are a subset of the ResNet family which were designed specifically for classification on the ImageNet dataset.
Paper: Deep Residual Learning for Image Recognition by Kaiming He, Xiangyu Zhang, Shaoqing Ren, Jian Sun
Code and hyperparameters: DeepLearningExamples Github repository by Nvidia
The basic architecture defined in the original papers is as follows:
- The first layer is a 7x7 Convolution with stride 2 and 64 filters.
- Subsequent layers follow 4 stages with {64, 128, 256, 512} input channels with a varying number of residual blocks at each stage that depends on the family member. At the end of every stage, the resolution is reduced by half using a convolution with stride 2.
- The final section consists of a global average pooling followed by a linear + softmax layer that outputs values for the specified number of classes
ResNet family members are defined by their number of layers. Parameter count, accuracy, and training time are provided below.
Model Family Members | Parameter Count | Our Accuracy | Training Time on 8xA100s |
---|---|---|---|
ResNet-18 | 11.5M | TBA | TBA |
ResNet-34 | 21.8M | TBA | TBA |
ResNet-50 | 25.6M | 76.5% | 3.83 hrs |
ResNet-101 | 44.5M | 78.1% | 5.50 hrs |
ResNet-152 | 60.2M | TBA | TBA |
The implementation follows He et al. except for the ResNet v1.5 change from Nvidia's DeepLearningExamples. For bottleneck residual blocks that downsample, the downsampling is performed by the 3x3 convolution instead of the 1x1 convolution. Nvidia reports approximately +0.5% absolute accuracy and 5% decrease in throughput from this change.
- Optimizer: SGD
- Learning rate: 2.048
- Momentum: 0.875
- Weight decay: 1/32768
- Batch size: 2048
- LR Schedulers
- Linear warmup for 8 epochs
- Cosine decay after warmup
- Number of epochs: 90