
cnn-benchmarks

Benchmarks for popular convolutional neural network models on CPU and different GPUs, with and without cuDNN.

Some general conclusions from this benchmarking:

  • GTX 1080 > Titan X: Across all models, the GTX 1080 is 1.10x to 1.15x faster than the Titan X.
  • ResNet > VGG: ResNet-50 is 1.5x faster than VGG-16 and more accurate than VGG-19 (7.02 vs 8.0); ResNet-101 is about the same speed as VGG-16 but much more accurate than VGG-19 (6.21 vs 8.0).
  • Always use cuDNN: On the GTX 1080, cuDNN is 2.0x to 2.8x faster than nn; on the Titan X, cuDNN is 2.2x to 3.0x faster than nn.
  • GPUs are critical: The GTX 1080 with cuDNN is 35x to 50x faster than dual Xeon E5-2630 v3 CPUs.
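As a sanity check, the headline ratios above can be recomputed from the total times in the VGG-16 table below; the script is purely illustrative, with the figures copied from that table:

```python
# Total forward+backward times (ms) for VGG-16, from the table in this README.
vgg16_totals = {
    "gtx1080_cudnn": 232.55,
    "gtx1080_nn": 522.42,
    "titanx_cudnn": 262.42,
    "cpu": 8495.48,
}

cudnn_speedup = vgg16_totals["gtx1080_nn"] / vgg16_totals["gtx1080_cudnn"]
gpu_vs_cpu = vgg16_totals["cpu"] / vgg16_totals["gtx1080_cudnn"]
titanx_ratio = vgg16_totals["titanx_cudnn"] / vgg16_totals["gtx1080_cudnn"]

print(f"cuDNN vs nn on GTX 1080: {cudnn_speedup:.2f}x")  # ~2.25x (in the 2.0x-2.8x range)
print(f"GTX 1080 (cuDNN) vs CPU: {gpu_vs_cpu:.1f}x")     # ~36.5x (in the 35x-50x range)
print(f"Titan X / GTX 1080:      {titanx_ratio:.2f}x")   # ~1.13x (in the 1.10x-1.15x range)
```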

All benchmarks were run in Torch on a machine with dual Intel Xeon E5-2630 v3 processors (8 cores each; 32 threads total with hyperthreading) and 64GB RAM, running Ubuntu 14.04 with the CUDA 8.0 Release Candidate.

We benchmark all models with a minibatch size of 16 and an image size of 224 x 224; this allows direct comparisons between models, and allows all but the ResNet-200 model to run on the GTX 1080, which has only 8GB of memory.
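For context, the input minibatch itself is tiny in FP32; memory pressure comes from the activations and weights stored across many layers during the backward pass, which is why the deepest model can still overflow 8GB (illustrative arithmetic only):

```python
# Memory for one FP32 input minibatch of 16 x 3 x 224 x 224.
batch, channels, height, width = 16, 3, 224, 224
bytes_per_float32 = 4

input_bytes = batch * channels * height * width * bytes_per_float32
print(f"Input minibatch alone: {input_bytes / 2**20:.1f} MiB")  # ~9.2 MiB
```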

The following models are benchmarked:

| Network | Layers | Top-1 error | Top-5 error | Speed (ms) | Citation |
|---|---|---|---|---|---|
| AlexNet | 8 | 42.9 | 19.8 | 23.18 | [1] |
| VGG-16 | 16 | 25.6 | 8.1 | 232.55 | [2] |
| VGG-19 | 19 | 25.5 | 8.0 | 281.69 | [2] |
| ResNet-18 | 18 | 30.43 | 10.76 | 47.07 | [3] |
| ResNet-34 | 34 | 26.73 | 8.74 | 79.70 | [3] |
| ResNet-50 | 50 | 24.01 | 7.02 | 153.90 | [3] |
| ResNet-101 | 101 | 22.44 | 6.21 | 235.33 | [3] |
| ResNet-152 | 152 | 22.16 | 6.16 | 328.90 | [3] |
| ResNet-200 | 200 | 21.66 | 5.79 | - | [4] |

Top-1 and Top-5 error are single-crop error rates on the ILSVRC 2012 Validation set. Speed is the total time for a forward and backward pass on a GTX 1080 with cuDNN 5.0.
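For readers unfamiliar with the metric, top-k error counts a prediction as correct if the true label appears among the k highest-scoring classes. A minimal NumPy sketch (toy scores, not ILSVRC data):

```python
import numpy as np

def topk_error(scores, labels, k):
    # scores: (N, num_classes) class scores; labels: (N,) true class indices.
    topk = np.argsort(scores, axis=1)[:, -k:]        # indices of the k highest scores
    correct = (topk == labels[:, None]).any(axis=1)  # is the true label among them?
    return 1.0 - correct.mean()

scores = np.array([[0.1, 0.5, 0.2, 0.9],   # top-1 prediction: class 3
                   [0.8, 0.1, 0.6, 0.3]])  # top-1 prediction: class 0
labels = np.array([3, 2])
print(topk_error(scores, labels, 1))  # 0.5: second sample's top-1 is class 0, not 2
print(topk_error(scores, labels, 2))  # 0.0: class 2 is in the second sample's top-2
```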

We use the following GPUs for benchmarking:

| GPU | Memory | Architecture | CUDA Cores | FP32 TFLOPS | Release Date |
|---|---|---|---|---|---|
| GeForce GTX Titan X | 12GB GDDR5 | Maxwell | 3072 | 6.14 | March 2015 |
| GeForce GTX 1080 | 8GB GDDR5X | Pascal | 2560 | 8.87 | May 2016 |
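The quoted FP32 TFLOPS figures follow from cores x clock x 2 (each CUDA core can issue one fused multiply-add, i.e. 2 FLOPs, per cycle), assuming NVIDIA's reference clocks (roughly 1000 MHz base for the Maxwell Titan X, 1733 MHz boost for the GTX 1080):

```python
def peak_fp32_tflops(cuda_cores, clock_ghz):
    # One fused multiply-add (2 FLOPs) per CUDA core per cycle.
    return cuda_cores * clock_ghz * 2 / 1000.0

print(peak_fp32_tflops(3072, 1.000))  # Titan X (Maxwell), base clock  -> ~6.14
print(peak_fp32_tflops(2560, 1.733))  # GTX 1080 (Pascal), boost clock -> ~8.87
```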

AlexNet

(input 16 x 3 x 224 x 224)

We use the BVLC AlexNet from Caffe.

AlexNet uses grouped convolutions; this was a strategy to allow model parallelism over two GTX 580 GPUs, which had only 3GB of memory each. Grouped convolutions are no longer commonly used, and are not even implemented by the torch/nn backend; therefore we can only benchmark AlexNet using cuDNN.
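The key property of grouped convolutions is that each group of output channels only sees its own slice of the input channels, which is what made splitting the layer across two GPUs possible. A minimal NumPy sketch using 1x1 kernels for brevity (not AlexNet's actual kernel sizes):

```python
import numpy as np

def grouped_conv1x1(x, w, groups):
    # x: (C_in, H, W) input; w: (C_out, C_in // groups) 1x1 filters.
    c_in, h, width = x.shape
    c_out = w.shape[0]
    in_per, out_per = c_in // groups, c_out // groups
    y = np.zeros((c_out, h, width))
    for g in range(groups):
        xg = x[g * in_per:(g + 1) * in_per]    # this group's input channels only
        wg = w[g * out_per:(g + 1) * out_per]  # this group's filters
        y[g * out_per:(g + 1) * out_per] = np.tensordot(wg, xg, axes=([1], [0]))
    return y
```

With `groups=2`, the two halves of the computation share no input channels, so each half could run on a separate device.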

| GPU | Forward (ms) | Backward (ms) | Total (ms) |
|---|---|---|---|
| GeForce GTX 1080 (cuDNN 5005) | 7.36 | 15.83 | 23.18 |
| GeForce GTX TITAN X (cuDNN 5005) | 7.02 | 16.69 | 23.71 |

VGG-16

(input 16 x 3 x 224 x 224)

This is Model D in [2] used in the ILSVRC-2014 competition, available here.

| GPU | Forward (ms) | Backward (ms) | Total (ms) |
|---|---|---|---|
| GeForce GTX 1080 (cuDNN 5005) | 66.56 | 165.98 | 232.55 |
| GeForce GTX TITAN X (cuDNN 5005) | 76.15 | 186.28 | 262.42 |
| GeForce GTX 1080 (nn) | 143.81 | 378.61 | 522.42 |
| GeForce GTX TITAN X (nn) | 172.56 | 415.41 | 587.97 |
| CPU: Dual Intel Xeon E5-2630 v3 | 3101.76 | 5393.72 | 8495.48 |

VGG-19

(input 16 x 3 x 224 x 224)

This is Model E in [2] used in the ILSVRC-2014 competition, available here.

| GPU | Forward (ms) | Backward (ms) | Total (ms) |
|---|---|---|---|
| GeForce GTX 1080 (cuDNN 5005) | 80.39 | 201.31 | 281.69 |
| GeForce GTX TITAN X (cuDNN 5005) | 93.83 | 229.57 | 323.40 |
| GeForce GTX 1080 (nn) | 176.45 | 453.63 | 630.08 |
| GeForce GTX TITAN X (nn) | 215.55 | 494.33 | 709.88 |
| CPU: Dual Intel Xeon E5-2630 v3 | 3609.78 | 6239.45 | 9849.23 |

ResNet-18

(input 16 x 3 x 224 x 224)

This is the 18-layer model described in [3] and implemented in fb.resnet.torch.

| GPU | Forward (ms) | Backward (ms) | Total (ms) |
|---|---|---|---|
| GeForce GTX 1080 (cuDNN 5005) | 14.69 | 32.38 | 47.07 |
| GeForce GTX TITAN X (cuDNN 5005) | 16.97 | 36.86 | 53.84 |
| GeForce GTX 1080 (nn) | 43.05 | 78.95 | 122.00 |
| GeForce GTX TITAN X (nn) | 55.20 | 95.57 | 150.76 |
| CPU: Dual Intel Xeon E5-2630 v3 | 847.46 | 1348.33 | 2195.78 |

ResNet-34

(input 16 x 3 x 224 x 224)

This is the 34-layer model described in [3] and implemented in fb.resnet.torch.

| GPU | Forward (ms) | Backward (ms) | Total (ms) |
|---|---|---|---|
| GeForce GTX 1080 (cuDNN 5005) | 24.83 | 54.86 | 79.70 |
| GeForce GTX TITAN X (cuDNN 5005) | 28.72 | 63.22 | 91.94 |
| GeForce GTX 1080 (nn) | 84.27 | 138.04 | 222.31 |
| GeForce GTX TITAN X (nn) | 109.75 | 164.94 | 274.69 |
| CPU: Dual Intel Xeon E5-2630 v3 | 1530.01 | 2435.20 | 3965.21 |

ResNet-50

(input 16 x 3 x 224 x 224)

This is the 50-layer model described in [3] and implemented in fb.resnet.torch.

| GPU | Forward (ms) | Backward (ms) | Total (ms) |
|---|---|---|---|
| GeForce GTX 1080 (cuDNN 5005) | 50.67 | 103.24 | 153.90 |
| GeForce GTX TITAN X (cuDNN 5005) | 56.42 | 114.60 | 171.02 |
| GeForce GTX 1080 (nn) | 109.81 | 201.66 | 311.47 |
| GeForce GTX TITAN X (nn) | 136.37 | 245.99 | 382.36 |
| CPU: Dual Intel Xeon E5-2630 v3 | 2477.61 | 4149.64 | 6627.25 |

ResNet-101

(input 16 x 3 x 224 x 224)

This is the 101-layer model described in [3] and implemented in fb.resnet.torch.

| GPU | Forward (ms) | Backward (ms) | Total (ms) |
|---|---|---|---|
| GeForce GTX 1080 (cuDNN 5005) | 77.77 | 157.56 | 235.33 |
| GeForce GTX TITAN X (cuDNN 5005) | 88.30 | 171.82 | 260.12 |
| GeForce GTX 1080 (nn) | 203.33 | 321.60 | 524.93 |
| GeForce GTX TITAN X (nn) | 258.26 | 404.16 | 662.42 |
| CPU: Dual Intel Xeon E5-2630 v3 | 4414.91 | 6891.33 | 11306.24 |

ResNet-152

(input 16 x 3 x 224 x 224)

This is the 152-layer model described in [3] and implemented in fb.resnet.torch.

| GPU | Forward (ms) | Backward (ms) | Total (ms) |
|---|---|---|---|
| GeForce GTX 1080 (cuDNN 5005) | 109.93 | 218.97 | 328.90 |
| GeForce GTX TITAN X (cuDNN 5005) | 125.69 | 241.28 | 366.97 |
| GeForce GTX 1080 (nn) | 299.12 | 460.95 | 760.07 |
| GeForce GTX TITAN X (nn) | 379.79 | 579.63 | 959.42 |
| CPU: Dual Intel Xeon E5-2630 v3 | 6572.17 | 10300.61 | 16872.78 |

ResNet-200

(input 16 x 3 x 224 x 224)

This is the 200-layer model described in [4] and implemented in fb.resnet.torch.

Even with a batch size of 16, the 8GB GTX 1080 did not have enough memory to run the model.

| GPU | Forward (ms) | Backward (ms) | Total (ms) |
|---|---|---|---|
| GeForce GTX TITAN X (cuDNN 5005) | 171.15 | 322.66 | 493.82 |
| GeForce GTX TITAN X (nn) | 491.69 | 806.95 | 1298.65 |
| CPU: Dual Intel Xeon E5-2630 v3 | 8666.43 | 13758.73 | 22425.16 |

Citations

[1] Alex Krizhevsky, Ilya Sutskever, and Geoffrey E. Hinton. "ImageNet Classification with Deep Convolutional Neural Networks." NIPS 2012.

[2] Karen Simonyan and Andrew Zisserman. "Very Deep Convolutional Networks for Large-Scale Image Recognition." ICLR 2015.

[3] Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. "Deep Residual Learning for Image Recognition." CVPR 2016.

[4] Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. "Identity Mappings in Deep Residual Networks." ECCV 2016.
