CUDA benchmark

1. Compilation

Please refer to "Building commands on linux" and "Building commands on windows" in the CUDA Platform Guide.

2. Running benchmark

Please refer to "How to run benchmark" in the CUDA Platform Guide.

3. Testing configuration

Each function in our benchmark has three implementations: the CUDA implementation in ppl.cv, and its x86 and CUDA counterparts in OpenCV. All of them run on a series of parameter combinations covering common usage, and the elapsed time is recorded. Besides the parameters particular to a function, the supported data types (uchar/float), the channel counts (1/3/4), and the commonly used image sizes are tested for each function. The input images are composed of randomly generated pixel values.
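
The parameter sweep described above can be pictured as a Cartesian product of data types, channel counts, and image sizes. The following is a minimal sketch, not the benchmark's actual harness: the function names `enumerate_configurations` and `random_image` are hypothetical, and the image sizes are placeholder assumptions because the concrete list is not given here.

```python
from itertools import product
import random

# The data types and channel counts come from the text above.
# IMAGE_SIZES is an assumed placeholder list of (width, height) pairs.
DATA_TYPES = ("uchar", "float")
CHANNELS = (1, 3, 4)
IMAGE_SIZES = ((320, 240), (640, 480), (1280, 720), (1920, 1080))


def enumerate_configurations(data_types=DATA_TYPES, channels=CHANNELS,
                             sizes=IMAGE_SIZES):
    """Return every (data type, channel count, (width, height)) combination."""
    return list(product(data_types, channels, sizes))


def random_image(data_type, channels, size, seed=0):
    """Build a flat list of random pixel values for one configuration."""
    width, height = size
    rng = random.Random(seed)  # seeded so a run can be repeated
    count = width * height * channels
    if data_type == "uchar":
        return [rng.randint(0, 255) for _ in range(count)]
    return [rng.random() for _ in range(count)]


configs = enumerate_configurations()
print(len(configs))  # 2 data types * 3 channel counts * 4 sizes = 24
```

A real harness would also multiply in the function-specific parameters (kernel size, interpolation mode, and so on) for each function under test.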

We describe performance in terms of acceleration ratio (speedup), using whichever OpenCV implementation, x86 or CUDA, is fastest as the baseline. For each function, we sort the speedups and pick out the minimum, the median, and the maximum to form a compact box-plot-style summary that characterizes the acceleration ratio, rather than reporting an average speedup.
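
The summary statistic can be sketched as follows. This is an illustration under two assumptions: a speedup is the baseline's elapsed time divided by ppl.cv's elapsed time (so values above 1 mean ppl.cv is faster), and the faster OpenCV implementation is chosen separately for each parameter combination. The function name `speedup_summary` and the timing values are hypothetical.

```python
from statistics import median


def speedup_summary(pplcv_ms, opencv_x86_ms, opencv_cuda_ms):
    """Summarize per-case speedups as (minimum, median, maximum).

    Each argument is a list of elapsed times with one entry per
    parameter combination, all in the same order.
    """
    speedups = []
    for ppl, x86, cuda in zip(pplcv_ms, opencv_x86_ms, opencv_cuda_ms):
        baseline = min(x86, cuda)  # fastest OpenCV implementation for this case
        speedups.append(baseline / ppl)
    speedups.sort()
    return speedups[0], median(speedups), speedups[-1]


# Made-up timings in milliseconds for three parameter combinations.
summary = speedup_summary(
    pplcv_ms=[1.0, 2.0, 4.0],
    opencv_x86_ms=[2.0, 3.0, 4.0],
    opencv_cuda_ms=[4.0, 5.0, 2.0],
)
print(summary)  # (0.5, 1.5, 2.0)
```

Reporting the three order statistics keeps outliers visible: a single very large speedup shows up as the maximum without inflating the middle value the way it would inflate a mean.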

Machine information:

  • X86 desktop computer with GeForce GTX 1060 GPU:
    • CPU: Intel® Core™ i7-7700 CPU (8 cores, 3.60 GHz)
    • GPU: GeForce GTX 1060 (1280 CUDA cores, 1772 MHz)
    • Host memory: 32 GB
    • Device memory: 6 GB
    • OS: Ubuntu 16.04
  • X86 cloud server with Tesla V100 GPU:
    • CPU: Intel® Xeon® Gold 6146 CPU (12 cores, 3.20 GHz)
    • GPU: Tesla V100 (5120 CUDA cores, 1230 MHz)
    • Host memory: 396 GB
    • Device memory: 64 GB
    • OS: Ubuntu 16.04

4. Speedup statistics

| Function | GeForce GTX 1060: (min, median, max)(data type) | Tesla V100: (min, median, max)(data type) |
|---|---|---|
| Abs | (1.027618, 1.216567, 2.522977)(schar), (1.002612, 1.079896, 1.212746)(float) | (1.920000, 2.250000, 4.000000)(schar), (1.428571, 1.600000, 2.666667)(float) |
| Add | (1.037273, 1.317647, 3.866667)(uchar), (1.012752, 1.135303, 1.318750)(float) | (1.484422, 2.346158, 3.628854)(uchar), (1.169862, 1.400410, 2.473603)(float) |
| AddWeighted | (0.911404, 1.069565, 2.500000)(uchar), (1.032401, 1.068750, 1.312500)(float) | (0.887411, 1.310023, 2.540298)(uchar), (1.162151, 1.358260, 2.297834)(float) |
| Subtract | (1.062500, 1.321429, 3.444444)(uchar), (1.025086, 1.073438, 1.256250)(float) | (1.187220, 1.922802, 3.456477)(uchar), (1.200039, 1.639410, 3.316506)(float) |
| Mul | (1.083189, 1.327586, 3.193548)(uchar), (1.020218, 1.062124, 1.213873)(float) | (1.591886, 2.275094, 3.521160)(uchar), (1.169695, 1.402987, 2.497707)(float) |
| Div | (1.197259, 1.318625, 1.967651)(uchar), (1.020744, 1.091126, 1.343856)(float) | (1.393939, 2.000000, 2.666667)(uchar), (1.465116, 1.800000, 3.666667)(float) |
| BGR2BGRA | (1.076923, 1.188437, 2.466667)(uchar), (1.009084, 1.058051, 1.319444)(float) | (0.996509, 2.210639, 3.574978)(uchar), (1.191974, 1.460745, 3.175911)(float) |
| BGRA2BGR | (1.061947, 1.235294, 2.666667)(uchar), (1.013997, 1.085965, 1.347518)(float) | (1.198190, 1.848153, 2.867657)(uchar), (1.173317, 1.402871, 2.416639)(float) |
| BGR2RGB | (1.052308, 1.251572, 2.700000)(uchar), (1.009910, 1.076321, 1.300000)(float) | (1.209468, 1.870198, 2.894736)(uchar), (1.178372, 1.450653, 2.412296)(float) |
| BGRA2RGBA | (1.052252, 1.294118, 3.550000)(uchar), (1.005998, 1.061538, 1.250000)(float) | (1.008355, 2.209785, 3.238027)(uchar), (1.184873, 1.405101, 3.121947)(float) |
| BGR2GRAY | (1.297546, 1.875000, 3.272727)(uchar), (1.018468, 1.121212, 2.225000)(float) | (1.398824, 2.286852, 3.272300)(uchar), (1.187580, 2.308537, 3.109362)(float) |
| BGRA2GRAY | (1.172324, 1.976190, 3.850000)(uchar), (1.022443, 1.125000, 2.050000)(float) | (1.324805, 2.282110, 3.271641)(uchar), (1.212494, 1.904405, 3.039953)(float) |
| GRAY2BGR | (1.063415, 1.540000, 2.666667)(uchar), (1.019014, 1.107143, 1.454545)(float) | (1.102559, 1.845679, 2.813610)(uchar), (1.187604, 1.552207, 2.345931)(float) |
| GRAY2BGRA | (1.211268, 1.688889, 3.550000)(uchar), (1.016974, 1.095238, 1.960000)(float) | (0.964918, 2.304777, 3.295088)(uchar), (1.212406, 1.816164, 3.188364)(float) |
| BGR2YCrCb | (1.115718, 1.437500, 2.933333)(uchar), (1.009922, 1.078431, 1.357143)(float) | (1.241286, 1.960112, 2.976867)(uchar), (1.197845, 1.590102, 2.509307)(float) |
| YCrCb2BGR | (1.000899, 1.245714, 2.275000)(uchar), (1.023173, 1.066667, 1.275168)(float) | (1.240044, 1.953158, 3.116737)(uchar), (1.199772, 1.588004, 2.517634)(float) |
| BGR2HSV | (1.047619, 1.124324, 1.440000)(uchar), (0.997503, 1.044550, 1.516779)(float) | (1.244666, 1.442114, 2.198618)(uchar), (1.185546, 1.439000, 2.350594)(float) |
| HSV2BGR | (1.087786, 1.182353, 1.625000)(uchar), (1.296982, 1.423002, 1.751773)(float) | (1.205051, 1.746410, 2.417964)(uchar), (1.799301, 1.971470, 2.591022)(float) |
| BGR2LAB | (1.057958, 1.116981, 1.347518)(uchar), (4.817276, 4.958534, 5.178571)(float) | (1.218432, 1.340922, 2.003351)(uchar), (11.744559, 16.295735, 16.833752)(float) |
| LAB2BGR | (3.492866, 3.535597, 3.842932)(uchar), (1.005988, 1.049924, 1.215789)(float) | (7.991927, 10.267584, 11.016621)(uchar), (1.196817, 1.574198, 2.219194)(float) |
| NV122BGR | (11.103975, 14.618688, 16.317706)(uchar) | (22.728164, 76.905314, 80.594786)(uchar) |
| NV122BGRA | (16.251937, 17.611833, 18.670433)(uchar) | (25.602489, 92.221827, 126.598710)(uchar) |
| NV212BGR | (11.066975, 15.031184, 16.289727)(uchar) | (21.985959, 76.702585, 80.624540)(uchar) |
| NV212BGRA | (16.354588, 17.128284, 18.803167)(uchar) | (25.452630, 92.121791, 126.082495)(uchar) |
| BGR2I420 | (10.693974, 14.342727, 15.599600)(uchar) | (18.157460, 65.449673, 96.908186)(uchar) |
| BGRA2I420 | (10.988100, 13.469619, 14.967882)(uchar) | (19.512714, 71.242601, 102.685346)(uchar) |
| I4202BGR | (12.730026, 14.859967, 16.374234)(uchar) | (21.091450, 72.681859, 75.285153)(uchar) |
| I4202BGRA | (16.132000, 16.995249, 18.781433)(uchar) | (24.304845, 89.937122, 126.809467)(uchar) |
| YUV2GRAY | (1.474327, 2.118086, 3.434171)(uchar) | (1.091379, 14.942844, 20.622220)(uchar) |
| UYVY2BGR | (12.124640, 14.798133, 15.474752)(uchar) | (23.692301, 54.666932, 60.882893)(uchar) |
| UYVY2GRAY | (13.245150, 17.255159, 29.511351)(uchar) | (11.636411, 47.939781, 54.628228)(uchar) |
| YUYV2BGR | (10.991902, 14.631356, 15.423861)(uchar) | (23.766830, 55.792111, 62.559629)(uchar) |
| YUYV2GRAY | (15.908143, 18.193933, 28.971842)(uchar) | (11.908574, 45.874270, 54.474318)(uchar) |
| AdaptiveThreshold | (3.573068, 14.689693, 25.344412)(uchar) | (21.294030, 73.243000, 102.870909)(uchar) |
| BilateralFilter | (1.168011, 1.525880, 2.761905)(uchar), (1.030715, 1.982063, 2.684054)(float) | (81.962000, 170.308496, 2558.305913)(uchar), (237.644286, 462.073981, 574.847162)(float) |
| BitwiseAnd | (1.000000, 3.181818, 8.497829)(uchar) | (1.147059, 4.000000, 35.471800)(uchar) |
| BoxFilter | (1.480552, 4.495130, 7.620557)(uchar), (1.448339, 6.352779, 13.262363)(float) | (7.124271, 25.163939, 46.066635)(uchar), (14.106000, 38.753665, 77.675652)(float) |
| CalcHist | (1.150568, 1.770833, 2.394737)(uchar) | (1.857143, 3.000000, 3.533333)(uchar) |
| ConvertTo | (0.993691, 1.073810, 1.603448)(uchar), (0.998486, 1.052786, 1.187879)(float) | (0.670103, 1.344444, 4.500000)(uchar), (0.780702, 1.176471, 2.666667)(float) |
| CopyMakeBorder | (1.000000, 1.370000, 2.757983)(uchar), (1.057269, 1.162304, 3.528545)(float) | (1.596447, 1.717750, 14.224575)(uchar), (1.389611, 2.373997, 21.272053)(float) |
| Crop | (5.246094, 10.568061, 17.913457)(uchar), (3.501557, 12.897787, 23.676354)(float) | (8.071733, 48.336800, 62.190875)(uchar), (27.270250, 42.298500, 82.963085)(float) |
| Dilate | (0.700053, 3.018605, 36.491972)(uchar), (1.233902, 4.462496, 30.425474)(float) | (1.521163, 12.164694, 39.407692)(uchar), (4.035320, 33.010542, 91.460005)(float) |
| DistanceTransform | (4.748947, 10.090304, 53.176053)(float) | (5.214643, 15.715885, 175.633388)(float) |
| EqualizeHist | (1.282700, 1.964115, 3.808168)(uchar) | (1.896552, 2.444444, 22.475000)(uchar) |
| Erode | (0.712260, 2.985459, 37.392623)(uchar), (1.166272, 4.408501, 30.251434)(float) | (1.519858, 12.016352, 40.007692)(uchar), (4.043243, 31.962500, 91.159956)(float) |
| Filter2D | (0.857971, 2.707717, 10.080080)(uchar), (1.158228, 2.812923, 11.549172)(float) | (1.064132, 5.109709, 10.239130)(uchar), (1.180978, 3.344444, 10.202247)(float) |
| Flip | (1.166667, 1.250000, 1.885246)(uchar), (1.020772, 1.088785, 1.247059)(float) | (2.266297, 2.692010, 2.764538)(uchar), (1.430543, 1.496240, 2.699739)(float) |
| GaussianBlur | (1.642779, 3.304591, 12.553922)(uchar), (1.660031, 2.951287, 6.443099)(float) | (9.550909, 15.166667, 77.000000)(uchar), (9.790741, 22.002174, 56.855556)(float) |
| GuidedFilter | (1.841109, 4.446838, 11.442694)(uchar), (1.914174, 4.654867, 12.122168)(float) | (6.002427, 33.592295, 85.662757)(uchar), (6.084409, 29.738052, 103.347549)(float) |
| Integral | (0.336724, 0.616571, 1.143994)(uchar), (0.560649, 1.074805, 2.191565)(float) | (0.447493, 1.962876, 2.007879)(uchar), (0.788689, 2.565471, 3.641776)(float) |
| Laplacian | (5.719577, 9.622927, 55.736000)(uchar), (2.665474, 5.248192, 15.066487)(float) | (31.377286, 75.869500, 234.550952)(uchar), (17.290625, 35.339333, 106.552075)(float) |
| Mean | (0.498830, 18.990729, 59.166509)(uchar), (1.752304, 12.701592, 34.480694)(float) | (0.802057, 35.056429, 221.736000)(uchar), (6.057550, 39.561500, 157.173500)(float) |
| MeanStdDev | (0.337072, 11.151297, 40.957974)(uchar), (4.841667, 8.282971, 17.452190)(float) | (0.536569, 23.880556, 145.785484)(uchar), (8.946910, 32.976917, 94.712100)(float) |
| MedianBlur | (0.351890, 1.116563, 3.519904)(uchar), (2.136163, 3.724323, 4.381040)(float) | (3.459940, 7.941560, 23.388209)(uchar), (18.604885, 20.476500, 21.021053)(float) |
| Merge | (2.278780, 2.644595, 6.276601)(uchar), (2.353986, 8.241453, 10.674901)(float) | (3.113387, 17.123593, 20.245900)(uchar), (16.259525, 24.317619, 46.897456)(float) |
| MinMaxLoc | (0.326772, 3.114965, 6.056915)(uchar), (1.952189, 10.600955, 16.410043)(float) | (0.464410, 4.004704, 11.220379)(uchar), (1.564421, 15.698727, 33.511538)(float) |
| Norm | (0.256201, 1.656461, 39.051089)(uchar), (1.036796, 5.491927, 31.023884)(float) | (0.132152, 5.693815, 108.367206)(uchar), (1.217980, 22.487100, 111.887975)(float) |
| Normalize | (1.477920, 6.778663, 30.741967)(uchar), (1.928009, 11.314839, 27.490531)(float) | (4.358423, 15.967336, 77.594030)(uchar), (9.863286, 30.135352, 85.963588)(float) |
| Ones | (34.007361, 102.137812, 110.859572)(uchar), (14.834672, 20.067127, 34.036056)(float) | (42.736150, 170.238000, 361.399606)(uchar), (30.091450, 60.011938, 119.932833)(float) |
| PerspectiveTransform | (24.956857, 30.550903, 53.705714)(float) | (77.732667, 120.337187, 236.752000)(float) |
| PyrDown | (0.855491, 1.760697, 3.200000)(uchar), (0.783599, 0.996094, 1.968254)(float) | (0.888889, 1.840000, 3.000000)(uchar), (0.967742, 1.714286, 2.500000)(float) |
| PyrUp | (0.982715, 1.141379, 3.225000)(uchar), (1.018668, 1.101715, 1.277778)(float) | (1.092308, 1.714286, 2.750000)(uchar), (0.982332, 1.104167, 1.714286)(float) |
| Remap | (1.000000, 1.500000, 3.093750)(uchar), (0.979498, 1.380000, 3.125000)(float) | (1.192308, 2.666667, 3.333333)(uchar), (1.117647, 2.500000, 3.666667)(float) |
| Resize | (1.000943, 1.531532, 2.875000)(uchar), (0.993286, 1.147826, 2.619048)(float) | (1.030841, 2.471922, 3.428373)(uchar), (1.131494, 2.197102, 3.316012)(float) |
| Rotate | (0.574651, 1.043956, 3.076923)(uchar), (0.546294, 0.665658, 1.033333)(float) | (0.805556, 2.016667, 5.550000)(uchar), (0.480392, 0.957143, 4.000000)(float) |
| SepFilter2D | (1.364119, 1.908174, 9.654275)(uchar), (1.341837, 1.863272, 9.666667)(short), (1.340006, 2.333333, 6.084475)(float) | (9.454545, 19.282143, 84.583333)(uchar), (9.413043, 19.357143, 84.583333)(short), (10.227273, 18.721429, 63.787500)(float) |
| SetTo | (1.000000, 1.600000, 5.615385)(uchar), (1.006349, 1.281250, 3.380952)(float) | (0.735294, 2.875000, 6.153846)(uchar), (0.542540, 1.304348, 4.500000)(float) |
| Sobel | (2.329529, 5.088146, 11.560976)(uchar), (2.374818, 4.994074, 11.308411)(short), (2.209220, 4.286020, 6.318538)(float) | (20.446154, 46.628319, 72.314286)(uchar), (19.476015, 42.983740, 90.071429)(short), (21.000000, 39.061538, 70.411111)(float) |
| Split | (1.067019, 1.238372, 3.166667)(uchar), (1.006090, 1.067797, 1.450000)(float) | (1.230769, 3.000000, 4.000000)(uchar), (1.070588, 1.461538, 4.000000)(float) |
| Transpose | (8.403667, 11.634840, 15.109060)(uchar), (5.698850, 12.042905, 15.199568)(float) | (28.730750, 64.849000, 72.014231)(uchar), (28.907429, 72.115600, 126.537647)(float) |
| WarpAffine | (0.730088, 3.073171, 96.212000)(uchar), (0.967412, 2.549020, 118.212581)(float) | (1.200000, 4.000000, 260.023697)(uchar), (1.230769, 4.333333, 404.470000)(float) |
| WarpPerspective | (1.010352, 3.270270, 172.198672)(uchar), (1.022066, 3.275000, 172.292843)(float) | (1.266667, 4.000000, 673.100000)(uchar), (1.257143, 4.000000, 983.360000)(float) |
| Zeros | (0.952381, 1.000000, 1.087719)(uchar), (0.975610, 1.001887, 1.019608)(float) | (1.000000, 1.000000, 2.000000)(uchar), (1.000000, 1.000000, 1.066667)(float) |
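
To work with these figures programmatically, each cell can be split into its per-data-type triples. The following is a small sketch with hypothetical helper names (`parse_cell`, `slower_cases`). It assumes every entry follows the "(min, median, max)(type)" pattern used above, and that a minimum below 1.0 marks at least one parameter combination where ppl.cv was slower than the baseline.

```python
import re

# Matches one "(min, median, max)(type)" entry inside a table cell.
ENTRY = re.compile(r"\(([\d.]+),\s*([\d.]+),\s*([\d.]+)\)\((\w+)\)")


def parse_cell(cell):
    """Map data type -> (min, median, max) speedup for one table cell."""
    return {
        dtype: (float(lo), float(mid), float(hi))
        for lo, mid, hi, dtype in ENTRY.findall(cell)
    }


def slower_cases(cell):
    """Return the data types whose minimum speedup is below 1.0."""
    return [dtype for dtype, (lo, _, _) in parse_cell(cell).items() if lo < 1.0]


# The Integral entry for the GeForce GTX 1060, copied from the table.
integral_gtx1060 = (
    "(0.336724, 0.616571, 1.143994)(uchar), "
    "(0.560649, 1.074805, 2.191565)(float)"
)
print(parse_cell(integral_gtx1060))
print(slower_cases(integral_gtx1060))  # ['uchar', 'float']
```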