convnet-benchmarks

Easy benchmarking of all public open-source implementations of convnets. A summary is provided in the section below.

Work in progress! I am still working through each convolution module in each library. THIS IS NOT AN EXHAUSTIVE LIST!
  • After getting an initial baseline with the single module below (and initial benchmark scripts), I will benchmark a full AlexNet/MattNet/Overfeat.

Machine: 6-core Intel i7-3930K @ 3.20GHz + NVIDIA Titan Black + Ubuntu 14.04 x86_64

### Spatial Convolution layer (3D input 3D output)

##### :forward()

Columns L1, L2, L3, L4, L5, Total are times in milliseconds

| Original Library | Class/Function Benchmarked | Device | L1 | L2 | L3 | L4 | L5 | Total |
|---|---|---|---|---|---|---|---|---|
| Theano (experimental)*** | pylearn2.mlp.ConvElemwise | GPU | 205 | 75 | 28 | 9 | 5 | 322 |
| cuda-convnet2 * | ConvLayer | GPU | 69 | 242 | 87 | 9 | 17 | 424 |
| Caffe | ConvolutionLayer<Dtype> | GPU | 102 | 203 | 158 | 39 | 52 | 554 |
| Torch-7 | nn.SpatialConvolutionMM | GPU | 105 | 240 | 168 | 41 | 55 | 609 |
| cuda-convnet** | pylearn2.cuda_convnet | GPU | 98 | 404 | 149 | 16 | 38 | 705 |
| ccv | ccv_convnet_layer | GPU | 121 | 437 | 182 | 23 | 44 | 809 |
| Theano (legacy)** | pylearn2.mlp.ConvElemwise | GPU | 418 | 2299 | 672 | 88 | 272 | 3749 |

  • * indicates that the library was tested with Torch bindings of the specific kernels.
  • ** indicates that the library was tested with Pylearn2 bindings.
  • *** This is an experimental module that uses FFT to calculate convolutions. It uses a lot of memory, according to @benanne.
  • L1 - Input: 128x128, Batch-size: 128, Feature maps: 3->96, Kernel Size: 11x11, Stride: 1x1
  • L2 - Input: 64x64, Batch-size: 128, Feature maps: 64->128, Kernel Size: 9x9, Stride: 1x1
  • L3 - Input: 32x32, Batch-size: 128, Feature maps: 128->128, Kernel Size: 9x9, Stride: 1x1
  • L4 - Input: 16x16, Batch-size: 128, Feature maps: 128->128, Kernel Size: 7x7, Stride: 1x1
  • L5 - Input: 13x13, Batch-size: 128, Feature maps: 384->384, Kernel Size: 3x3, Stride: 1x1
  • The table is ranked according to the total time (L1 + L2 + L3 + L4 + L5)
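
To get a feel for how much work each of the five layer configurations represents, here is a minimal Python sketch that computes the output size and the number of multiply-accumulate operations in one forward pass. It is not part of the benchmark scripts. The `ConvLayerSpec` class and its method names are hypothetical, and the sketch assumes square inputs and no zero-padding ("valid" convolution), since the padding used by each library is not stated above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConvLayerSpec:
    """Hypothetical container for one benchmarked layer configuration."""
    name: str
    input_size: int  # height == width of the (square) input
    batch: int
    in_maps: int     # input feature maps
    out_maps: int    # output feature maps
    kernel: int      # height == width of the (square) kernel
    stride: int = 1

    def output_size(self, padding: int = 0) -> int:
        # Standard convolution output-size formula.
        # padding=0 is an assumption; it is not given in the list above.
        return (self.input_size + 2 * padding - self.kernel) // self.stride + 1

    def forward_macs(self, padding: int = 0) -> int:
        # One multiply-accumulate per (sample, output pixel, input map,
        # output map, kernel element) for a direct convolution.
        out = self.output_size(padding)
        return (self.batch * out * out
                * self.in_maps * self.out_maps
                * self.kernel * self.kernel)


# The five configurations L1..L5 as listed above.
LAYERS = [
    ConvLayerSpec("L1", 128, 128, 3, 96, 11),
    ConvLayerSpec("L2", 64, 128, 64, 128, 9),
    ConvLayerSpec("L3", 32, 128, 128, 128, 9),
    ConvLayerSpec("L4", 16, 128, 128, 128, 7),
    ConvLayerSpec("L5", 13, 128, 384, 384, 3),
]

if __name__ == "__main__":
    for layer in LAYERS:
        out = layer.output_size()
        gmacs = layer.forward_macs() / 1e9
        print(f"{layer.name}: output {out}x{out}, ~{gmacs:.1f} GMACs per forward pass")
```

The counts are for a direct convolution only. FFT-based modules such as the experimental Theano one do a different amount of arithmetic, so these numbers are a rough yardstick for comparing layers, not a measure of what each library actually executes.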