This is a collection of common baseline AI models, gathered mainly from official repos and third-party implementations. The repo includes: paper links, original models, measured performance data, and test scripts.
All the model data can be found at:
- Baidu Yun (extraction code: 5nd3)
- Google Drive
Host machine information:
- Intel(R) Core(TM) i7-9700K CPU @ 3.60GHz
- GeForce RTX 2070 super
The ILSVRC2012 dataset is used to verify and test the classification models; it can be downloaded from ImageNet or ImageNet Torrents.
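The top-1 / top-5 numbers reported in the tables below can be computed from a model's class scores as follows (a minimal NumPy sketch; the logits and labels here are made-up toy data, not ILSVRC2012 results):

```python
import numpy as np

def topk_accuracy(logits, labels, k=1):
    """Fraction of samples whose true label is among the k highest scores.

    logits: (N, C) array of class scores; labels: (N,) int array.
    """
    topk = np.argsort(logits, axis=1)[:, ::-1][:, :k]  # indices, best first
    hits = (topk == labels[:, None]).any(axis=1)
    return hits.mean()

# Toy example: 3 samples, 4 classes.
logits = np.array([[0.1, 0.7, 0.1, 0.1],   # predicts class 1 (correct)
                   [0.5, 0.2, 0.2, 0.1],   # predicts class 0 (label is 2)
                   [0.1, 0.2, 0.3, 0.4]])  # predicts class 3 (correct)
labels = np.array([1, 2, 3])
print(topk_accuracy(logits, labels, k=1))  # 2 of 3 correct at top-1
print(topk_accuracy(logits, labels, k=3))  # label 2 is within top-3, so 1.0
```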
Repo link: caffe-alexnet
Alexnet: ImageNet Classification with Deep Convolutional Neural Networks
The 1-crop method is adopted: resize to (256, 256), central crop: 227, mean value file: imagenet-mean.
```python
def alexnet_preprocess(img_path, mean_val=None, resize_size=None,
                       crop_size=None):
    mean_val = get_mean_val(mean_val)
    # the mean file is stored CHW; transpose to HWC to match the loaded image
    mean_val = np.transpose(mean_val, (1, 2, 0))
    img = ImageProcess(img_path)
    img.load_image()
    img.resize_image(short_side_len=resize_size)
    img.sub_mean_val(mean_val=mean_val)  # mean is subtracted before cropping
    img.crop_image(crop_len=crop_size)
    img.transpose(trans_order=(2, 0, 1))  # HWC -> CHW
    return img.data
```
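The preprocessed CHW tensor is then fed through the Caffe net. A sketch of that step (the `'data'`/`'prob'` blob names and the file paths in the usage comment are assumptions based on the standard BVLC AlexNet deploy files, not verified against this repo's scripts):

```python
import numpy as np

def top5_from_net(net, img_chw):
    """Feed one preprocessed CHW image through a pycaffe-style net
    and return the five highest-probability class indices."""
    net.blobs['data'].data[...] = img_chw   # fill the input blob (broadcasts over batch)
    probs = net.forward()['prob'][0]        # softmax output for the first sample
    return np.argsort(probs)[::-1][:5]

# Usage with real pycaffe (paths are placeholders):
#   import caffe
#   net = caffe.Net('deploy.prototxt', 'alexnet.caffemodel', caffe.TEST)
#   img = alexnet_preprocess('cat.jpg', resize_size=256, crop_size=227)
#   print(top5_from_net(net, img))
```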
model name | top1 | top5 |
---|---|---|
Alexnet | 0.56184 | 0.79474 |
Repo link: shicai repo [caffe]
- Mobilenetv1: MobileNets-Efficient-Convolutional-Neural-Networks-for-Mobile-Vision-Applications.pdf
- Mobilenetv2: MobileNetV2-Inverted-Residuals-and-Linear-Bottlenecks.pdf
The 1-crop method is adopted: resize to (N, 256) or (256, N), central crop: 224, mean value: [103.94, 116.78, 123.68], input scale: 0.017.
```python
def mobilenet_preprocess(img_path, mean_val=None, input_scale=None,
                         resize_size=None, crop_size=None):
    mean_val = get_mean_val(mean_val)
    img = ImageProcess(img_path)
    img.load_image()
    img.resize_image(short_side_len=resize_size)
    img.crop_image(crop_len=crop_size)
    img.transpose(trans_order=(2, 0, 1))
    img.sub_mean_val(mean_val=mean_val)
    img.input_scale(scale=input_scale)
    return img.data
```
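Note that here the mean subtraction and scaling happen after the HWC-to-CHW transpose, so a per-channel mean must broadcast along the channel axis. A minimal NumPy sketch of that arithmetic (the `(3, 1, 1)` reshape and the BGR ordering are assumptions following the usual Caffe convention; the image is a dummy constant):

```python
import numpy as np

mean_bgr = np.array([103.94, 116.78, 123.68]).reshape(3, 1, 1)
scale = 0.017

img_chw = np.full((3, 224, 224), 128.0)  # dummy CHW image, all pixels 128
out = (img_chw - mean_bgr) * scale       # what sub_mean_val + input_scale compute

print(out[:, 0, 0])  # per-channel values: (128 - mean) * 0.017, approximately [0.409, 0.191, 0.073]
```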
model name | top1 | top5 |
---|---|---|
Mobilenetv1 | 0.6989 | 0.8938 |
Mobilenetv2 | 0.71616 | 0.90226 |
Repo links:
- Inceptionv1: Going-Deeper-with-Convolutions.pdf
- Inceptionv2: Batch-Normalization-Accelerating-Deep-Network-Training-by-Reducing-Internal-Covariate-Shift.pdf
- Inceptionv3: Rethinking-the-Inception-Architecture-for-Computer-Vision.pdf
- Inceptionv4: Inception-v4-Inception-ResNet-and-the-Impact-of-Residual-Connections-on-Learning.pdf
The 1-crop method is adopted:
- Inceptionv1: mean value: [104, 117, 123], resize: 256*256, central crop: 224
- Inceptionv3: mean value: [128, 128, 128], resize: 395*395, central crop: 395, input scale: 0.0078125 (1/128.0), validation dataset: ILSVRC2015_val
- Inceptionv4: mean value: [128, 128, 128], resize: 320*320, central crop: 299, input scale: 0.0078125 (1/128.0)
```python
def inception_preprocess(img_path, mean_val=None, resize_size=None,
                         crop_size=None, input_scale=None):
    mean_val = get_mean_val(mean_val)
    img = ImageProcess(img_path)
    img.load_image()
    img.resize_image(short_side_len=resize_size)
    img.crop_image(crop_len=crop_size)
    img.transpose(trans_order=(2, 0, 1))
    img.sub_mean_val(mean_val=mean_val)
    if input_scale is not None:  # Inceptionv1 uses no input scale
        img.input_scale(scale=input_scale)
    return img.data
```
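For Inceptionv3/v4 the mean of 128 combined with the 1/128 scale maps raw 8-bit pixel values from [0, 255] into roughly [-1, 1]; Inceptionv1 skips the scale, hence the `if input_scale is not None` guard above. A quick check of the resulting value range:

```python
import numpy as np

pixels = np.array([0.0, 128.0, 255.0])   # darkest, middle, brightest 8-bit values
scaled = (pixels - 128.0) * 0.0078125    # 0.0078125 == 1 / 128.0
print(scaled)  # -1.0, 0.0, 0.9921875 -- approximately the [-1, 1] range
```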
model name | top1 | top5 |
---|---|---|
Inceptionv1 | 0.68528 | 0.88848 |
Inceptionv3 | 0.7936 | 0.9492 |
Inceptionv4 | 0.7994 | 0.9502 |
Repo link: forresti-SqueezeNet
Squeezenet: SqueezeNet-AlexNet-level-accuracy-with-50x-fewer-parameters-and-<0.5MB-model-size.pdf
The 1-crop method is adopted: mean value: [104, 117, 123], resize: 256*256, central crop: 224.
```python
def squeezenet_preprocess(img_path, mean_val=None, resize_size=None,
                          crop_size=None):
    mean_val = get_mean_val(mean_val)
    img = ImageProcess(img_path)
    img.load_image()
    img.resize_image(short_side_len=resize_size)
    img.crop_image(crop_len=crop_size)
    img.transpose(trans_order=(2, 0, 1))
    img.sub_mean_val(mean_val=mean_val)
    return img.data
```
model name | top1 | top5 |
---|---|---|
Squeezenet-v1.0 | 0.57106 | 0.80004 |
Squeezenet-v1.1 | 0.5771 | 0.80498 |
Repo link: KaimingHe repo
Resnet: Deep Residual Learning for Image Recognition.pdf
The 1-crop method is adopted: resize to (N, 256) or (256, N), central crop: 224, mean value file: resnet-mean.
```python
def resnet_preprocess(img_path, mean_val=None, resize_size=None,
                      crop_size=None):
    mean_val = get_mean_val(mean_val)
    img = ImageProcess(img_path)
    img.load_image()
    img.resize_image(short_side_len=resize_size)
    img.crop_image(crop_len=crop_size)
    img.transpose(trans_order=(2, 0, 1))
    img.sub_mean_val(mean_val=mean_val)
    return img.data
```
model name | top1 | top5 |
---|---|---|
Resnet50 | 0.7497 | 0.92088 |
Resnet101 | 0.76238 | 0.9286 |
Resnet152 | 0.76686 | 0.9321 |
Repo link: shicai-DenseNet-Caffe
Densenet: Densely-Connected-Convolutional-Networks.pdf
The 1-crop method is adopted: resize to (N, 256) or (256, N), central crop: 224, mean value: [103.94, 116.78, 123.68], input scale: 0.017.
```python
def densenet_preprocess(img_path, mean_val=None, input_scale=None,
                        resize_size=None, crop_size=None):
    mean_val = get_mean_val(mean_val)
    img = ImageProcess(img_path)
    img.load_image()
    img.resize_image(short_side_len=resize_size)
    img.crop_image(crop_len=crop_size)
    img.transpose(trans_order=(2, 0, 1))
    img.sub_mean_val(mean_val=mean_val)
    img.input_scale(scale=input_scale)
    return img.data
```
model name | top1 | top5 |
---|---|---|
Densenet121 | 0.74898 | 0.92234 |
Densenet161 | 0.77702 | 0.93826 |
Densenet169 | 0.76194 | 0.93166 |
Densenet201 | - | - |
Repo link: vgg
Vgg: VGG-Very-Deep-Convolutional-Networks-for-Large-Scale-Image.pdf
The 1-crop method is adopted: resize to (N, 256) or (256, N), central crop: 224, mean value: [103.939, 116.779, 123.68].
```python
def vgg_preprocess(img_path, mean_val=None, resize_size=None,
                   crop_size=None):
    mean_val = get_mean_val(mean_val)
    img = ImageProcess(img_path)
    img.load_image()
    img.resize_image(short_side_len=resize_size)
    img.crop_image(crop_len=crop_size)
    img.transpose(trans_order=(2, 0, 1))
    img.sub_mean_val(mean_val=mean_val)
    return img.data
```
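The VGG mean [103.939, 116.779, 123.68] is given in BGR channel order, as is usual for Caffe models; if an image arrives as RGB it must be reversed along the channel axis before subtraction. A small sketch (the RGB input is an assumption about the caller, and the image is synthetic, filled with exactly the mean colors):

```python
import numpy as np

mean_bgr = np.array([103.939, 116.779, 123.68])

rgb = np.zeros((224, 224, 3))
rgb[..., 0] = 123.68   # R channel
rgb[..., 1] = 116.779  # G channel
rgb[..., 2] = 103.939  # B channel

bgr = rgb[..., ::-1]        # RGB -> BGR to match the mean's ordering
centered = bgr - mean_bgr   # broadcasts over the HWC image
print(np.abs(centered).max())  # 0.0: each channel lines up with its mean entry
```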
model name | top1 | top5 |
---|---|---|
Vgg16 | 0.71264 | 0.90062 |
Vgg19 | 0.71248 | 0.89974 |