huge performance drop after some FLOPS/byte point #46

Open
edisonchan opened this issue Nov 7, 2023 · 3 comments

Comments


edisonchan commented Nov 7, 2023

I built and ran mixbench-ocl on a Snapdragon 8 Gen 2, whose GPU is an Adreno 740.

Total global mem:    7629 MB
Max allowed buffer:  1024 MB
OpenCL version:      OpenCL 3.0 Adreno(TM) 740
Total CUs:           6
-----------------------------------------------------------------------
Buffer size:            256MB
Workgroup size:         256
Elements per workitem:  8
Workitem fusion degree: 4
Workitem stride:        NDRange
Buffer allocation:      Device allocated
Timer:                  CL event based
Warning:                Double precision computations are not supported
Loading kernel source file...
Precompilation of kernels... [>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>]

----------------------------------------------------------------------------- CSV data -----------------------------------------------------------------------------
Experiment ID, Single Precision ops,,,,              Double precision ops,,,,              Half precision ops,,,,                Integer operations,,,
Compute iters, Flops/byte, ex.time,  GFLOPS, GB/sec, Flops/byte, ex.time,  GFLOPS, GB/sec, Flops/byte, ex.time,  GFLOPS, GB/sec, Iops/byte, ex.time,   GIOPS, GB/sec
            0,      0.250,    2.41,   13.95,  55.79,      0.125,    0.00,     inf,    inf,      0.500,    2.39,   28.10,  56.21,     0.250,    2.43,   13.81,  55.25
            1,      0.750,    2.41,   41.82,  55.76,      0.375,    0.00,     inf,    inf,      1.500,    2.38,   84.63,  56.42,     0.750,    2.43,   41.51,  55.35
            2,      1.250,    2.39,   70.08,  56.07,      0.625,    0.00,     inf,    inf,      2.500,    2.38,  140.88,  56.35,     1.250,    2.43,   69.13,  55.30
            3,      1.750,    2.39,   98.11,  56.06,      0.875,    0.00,     inf,    inf,      3.500,    2.36,  198.70,  56.77,     1.750,    2.40,   97.74,  55.85
            4,      2.250,    2.38,  126.99,  56.44,      1.125,    0.00,     inf,    inf,      4.500,    2.35,  256.47,  56.99,     2.250,    2.40,  125.67,  55.85
            5,      2.750,    2.41,  153.02,  55.65,      1.375,    0.00,     inf,    inf,      5.500,    2.36,  313.06,  56.92,     2.750,    2.40,  153.86,  55.95
            6,      3.250,    2.38,  183.44,  56.44,      1.625,    0.00,     inf,    inf,      6.500,    2.37,  368.58,  56.70,     3.250,    2.43,  179.21,  55.14
            7,      3.750,    2.41,  208.94,  55.72,      1.875,    0.00,     inf,    inf,      7.500,    2.40,  419.61,  55.95,     3.750,    2.41,  209.02,  55.74
            8,      4.250,    2.38,  239.18,  56.28,      2.125,    0.00,     inf,    inf,      8.500,    2.35,  485.03,  57.06,     4.250,    2.40,  237.47,  55.88
            9,      4.750,    2.37,  269.11,  56.66,      2.375,    0.00,     inf,    inf,      9.500,    2.35,  543.27,  57.19,     4.750,    2.40,  266.06,  56.01
           10,      5.250,    2.36,  298.05,  56.77,      2.625,    0.00,     inf,    inf,     10.500,    2.34,  601.25,  57.26,     5.250,    2.40,  293.48,  55.90
           11,      5.750,    2.37,  325.63,  56.63,      2.875,    0.00,     inf,    inf,     11.500,    2.35,  657.36,  57.16,     5.750,    2.40,  320.91,  55.81
           12,      6.250,    2.37,  354.25,  56.68,      3.125,    0.00,     inf,    inf,     12.500,    3.94,  425.39,  34.03,     6.250,    2.40,  349.67,  55.95
           13,      6.750,    2.36,  383.09,  56.75,      3.375,    0.00,     inf,    inf,     13.500,    4.23,  428.55,  31.74,     6.750,    2.40,  376.88,  55.83
           14,      7.250,    2.36,  411.82,  56.80,      3.625,    0.00,     inf,    inf,     14.500,    4.53,  429.72,  29.64,     7.250,    2.41,  403.94,  55.72
           15,      7.750,    2.37,  439.65,  56.73,      3.875,    0.00,     inf,    inf,     15.500,    4.81,  432.70,  27.92,     7.750,    2.44,  425.78,  54.94
           16,      8.250,    2.36,  468.37,  56.77,      4.125,    0.00,     inf,    inf,     16.500,    5.11,  433.56,  26.28,     8.250,    2.53,  437.30,  53.01
           17,      8.750,    2.36,  496.81,  56.78,      4.375,    0.00,     inf,    inf,     17.500,    5.39,  435.60,  24.89,     8.750,    2.64,  445.04,  50.86
           18,      9.250,    2.36,  525.14,  56.77,      4.625,    0.00,     inf,    inf,     18.500,    5.69,  436.40,  23.59,     9.250,    2.73,  455.11,  49.20
           20,     10.250,    2.36,  581.97,  56.78,      5.125,    0.00,     inf,    inf,     20.500,    6.27,  438.90,  21.41,    10.250,    2.95,  466.98,  45.56
           22,     11.250,    2.36,  639.51,  56.85,      5.625,    0.00,     inf,    inf,     22.500,    6.85,  440.81,  19.59,    11.250,    3.19,  472.88,  42.03
           24,     12.250,    2.36,  696.74,  56.88,      6.125,    0.00,     inf,    inf,     24.500,   12.12,  271.22,  11.07,    12.250,    3.45,  477.12,  38.95
           28,     14.250,    2.36,  810.49,  56.88,      7.125,    0.00,     inf,    inf,     28.500,   14.01,  272.98,   9.58,    14.250,    3.95,  483.94,  33.96
           32,     16.250,    2.36,  922.64,  56.78,      8.125,    0.00,     inf,    inf,     32.500,   15.90,  274.33,   8.44,    16.250,    4.46,  488.71,  30.07
           40,     20.250,    2.37, 1148.26,  56.70,     10.125,    0.00,     inf,    inf,     40.500,   19.68,  276.22,   6.82,    20.250,    5.49,  495.26,  24.46
           48,     24.250,    2.38, 1369.75,  56.48,     12.125,    0.00,     inf,    inf,     48.500,   23.46,  277.49,   5.72,    24.250,    6.51,  499.82,  20.61
           56,     28.250,    2.37, 1597.06,  56.53,     14.125,    0.00,     inf,    inf,     56.500,   27.23,  278.46,   4.93,    28.250,    7.54,  502.81,  17.80
           64,     32.250,   36.46,  118.70,   3.68,     16.125,    0.00,     inf,    inf,     64.500,   31.02,  279.10,   4.33,    32.250,   41.67,  103.89,   3.22
           80,     40.250,   42.93,  125.84,   3.13,     20.125,    0.00,     inf,    inf,     80.500,   38.58,  280.03,   3.48,    40.250,   49.35,  109.47,   2.72
           96,     48.250,   49.30,  131.36,   2.72,     24.125,    0.00,     inf,    inf,     96.500,   46.15,  280.68,   2.91,    48.250,   57.18,  113.26,   2.35
          128,     64.250,   62.33,  138.34,   2.15,     32.125,    0.00,     inf,    inf,    128.500,   61.26,  281.52,   2.19,    64.250,   72.67,  118.67,   1.85
          192,     96.250,   88.14,  146.57,   1.52,     48.125,    0.00,     inf,    inf,    192.500,   91.53,  282.29,   1.47,    96.250,  106.47,  121.34,   1.26
          256,    128.250,  117.05,  147.06,   1.15,     64.125,    0.00,     inf,    inf,    256.500,  121.77,  282.73,   1.10,   128.250,  137.89,  124.84,   0.97
--------------------------------------------------------------------------------------------------------------------------------------------------------------------

What causes this "problem"?
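
One way to read the table: the GFLOPS column is, to rounding, Flops/byte multiplied by GB/sec, so a jump in ex.time pulls both columns down together. The sketch below is a minimal Python check using four single-precision rows copied from the output above. It recomputes GFLOPS from the other two columns and reports how much ex.time grows between consecutive rows. The names `SP_ROWS`, `gflops_from_bandwidth`, and `slowdowns` are illustrative and not part of mixbench.

```python
# Four single-precision rows copied from the CSV output above:
# (compute iters, flops/byte, ex.time, GFLOPS, GB/sec)
SP_ROWS = [
    (48, 24.250, 2.38, 1369.75, 56.48),
    (56, 28.250, 2.37, 1597.06, 56.53),
    (64, 32.250, 36.46, 118.70, 3.68),
    (80, 40.250, 42.93, 125.84, 3.13),
]


def gflops_from_bandwidth(flops_per_byte, gb_per_sec):
    """Recompute GFLOPS from operational intensity and achieved bandwidth."""
    return flops_per_byte * gb_per_sec


def slowdowns(rows):
    """Return (compute iters, ex.time ratio vs. the previous row) pairs."""
    return [
        (cur[0], cur[2] / prev[2])
        for prev, cur in zip(rows, rows[1:])
    ]


if __name__ == "__main__":
    for iters, intensity, _, gflops, bandwidth in SP_ROWS:
        estimate = gflops_from_bandwidth(intensity, bandwidth)
        print(f"iters={iters:3d} reported={gflops:8.2f} recomputed={estimate:8.2f}")
    for iters, ratio in slowdowns(SP_ROWS):
        print(f"iters={iters:3d} ex.time is {ratio:.1f}x the previous row")
```

Among these four rows, the relation holds within rounding, and the step from 56 to 64 compute iterations is the only one where ex.time grows by more than an order of magnitude.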

Owner

ekondis commented Jan 5, 2024

This can happen. One possible cause is register spilling.

If you have time, you could experiment by manually controlling the unroll factor of the loop. For example, you could add a #pragma unroll 16 directive before the line:

for(int i=0; i<COMPUTE_ITERATIONS; i++){
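
To try several unroll factors without hand-editing the kernel each time, the directive can be inserted programmatically. The following is a minimal Python sketch, not part of mixbench: `add_unroll_pragma` is a hypothetical helper, and it assumes the kernel source contains the loop line quoted above verbatim. The sample kernel in the usage section is a simplified stand-in, not the real mixbench source.

```python
LOOP_LINE = "for(int i=0; i<COMPUTE_ITERATIONS; i++){"


def add_unroll_pragma(kernel_source, factor):
    """Insert '#pragma unroll <factor>' on its own line before each loop line.

    Hypothetical helper: matches the loop line textually and preserves
    its indentation. Lines that do not match are left unchanged.
    """
    patched = []
    for line in kernel_source.splitlines():
        if line.strip() == LOOP_LINE:
            indent = line[: len(line) - len(line.lstrip())]
            patched.append(f"{indent}#pragma unroll {factor}")
        patched.append(line)
    return "\n".join(patched)


if __name__ == "__main__":
    # Simplified stand-in for the kernel body, not the real mixbench source.
    sample = "\n".join([
        "__kernel void demo(__global float *data){",
        "    float acc = data[get_global_id(0)];",
        "    for(int i=0; i<COMPUTE_ITERATIONS; i++){",
        "        acc = acc * acc + 1.0f;",
        "    }",
        "    data[get_global_id(0)] = acc;",
        "}",
    ])
    print(add_unroll_pragma(sample, 16))
```

The patched text would still need to be written back to the kernel file that the benchmark loads before rerunning it with each factor.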

Author

edisonchan commented Oct 19, 2024

> This can happen. One possible cause is register spilling.
>
> If you have time, you could experiment by manually controlling the unroll factor of the loop. For example, you could add a #pragma unroll 16 directive before the line:
>
> for(int i=0; i<COMPUTE_ITERATIONS; i++){

I tried this. An unroll factor of 16 is not enough here; 128 may be the best value, but there is still a huge drop after 128 compute iterations:

LD_LIBRARY_PATH=/data/data/com.termux/files/usr/lib:/system/vendor/lib64 ./mixbench-ocl
mixbench-ocl (v0.04-13-g597b700)
Use "-h" argument to see available options
------------------------ Device specifications ------------------------
Platform:            QUALCOMM Snapdragon(TM)
Device:              QUALCOMM Adreno(TM) 750/QUALCOMM
Driver version:      OpenCL 3.0 QUALCOMM build: commit unknown Compiler E031.45.02.11
Address bits:        64
GPU clock rate:      1 MHz
Total global mem:    7631 MB
Max allowed buffer:  1024 MB
OpenCL version:      OpenCL 3.0 Adreno(TM) 750
Total CUs:           6
-----------------------------------------------------------------------
Buffer size:            256MB
Workgroup size:         256
Elements per workitem:  8
Workitem fusion degree: 4
Workitem stride:        NDRange
Buffer allocation:      Device allocated
Timer:                  CL event based
Warning:                Double precision computations are not supported
Loading kernel source file...
Precompilation of kernels... [>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>]
----------------------------------------------------------------------------- CSV data -----------------------------------------------------------------------------
Experiment ID, Single Precision ops,,,,              Double precision ops,,,,              Half precision ops,,,,                Integer operations,,,
Compute iters, Flops/byte, ex.time,  GFLOPS, GB/sec, Flops/byte, ex.time,  GFLOPS, GB/sec, Flops/byte, ex.time,  GFLOPS, GB/sec, Iops/byte, ex.time,   GIOPS, GB/sec
            0,      0.250,    2.23,   15.06,  60.24,      0.125,    0.00,     inf,    inf,      0.500,    2.22,   30.20,  60.40,     0.250,    2.23,   15.03,  60.14
            1,      0.750,    2.22,   45.33,  60.44,      0.375,    0.00,     inf,    inf,      1.500,    2.22,   90.57,  60.38,     0.750,    2.23,   45.12,  60.16
            2,      1.250,    2.22,   75.55,  60.44,      0.625,    0.00,     inf,    inf,      2.500,    2.22,  151.02,  60.41,     1.250,    2.23,   75.34,  60.27
            3,      1.750,    2.21,  106.08,  60.62,      0.875,    0.00,     inf,    inf,      3.500,    2.22,  211.87,  60.53,     1.750,    2.24,  104.95,  59.97
            4,      2.250,    2.21,  136.72,  60.77,      1.125,    0.00,     inf,    inf,      4.500,    2.21,  272.69,  60.60,     2.250,    2.23,  135.54,  60.24
            5,      2.750,    2.22,  166.34,  60.49,      1.375,    0.00,     inf,    inf,      5.500,    2.21,  333.44,  60.63,     2.750,    2.24,  164.48,  59.81
            6,      3.250,    2.22,  196.06,  60.33,      1.625,    0.00,     inf,    inf,      6.500,    2.22,  392.97,  60.46,     3.250,    2.24,  194.56,  59.86
            7,      3.750,    2.22,  226.74,  60.46,      1.875,    0.00,     inf,    inf,      7.500,    2.22,  453.43,  60.46,     3.750,    2.24,  224.49,  59.86
            8,      4.250,    2.21,  257.75,  60.65,      2.125,    0.00,     inf,    inf,      8.500,    2.22,  514.36,  60.51,     4.250,    2.24,  254.54,  59.89
            9,      4.750,    2.22,  287.67,  60.56,      2.375,    0.00,     inf,    inf,      9.500,    2.22,  574.88,  60.51,     4.750,    2.24,  284.74,  59.95
           10,      5.250,    2.22,  317.55,  60.49,      2.625,    0.00,     inf,    inf,     10.500,    2.22,  634.80,  60.46,     5.250,    2.24,  314.72,  59.95
           11,      5.750,    2.22,  347.95,  60.51,      2.875,    0.00,     inf,    inf,     11.500,    2.22,  695.90,  60.51,     5.750,    2.25,  343.59,  59.75
           12,      6.250,    2.22,  378.56,  60.57,      3.125,    0.00,     inf,    inf,     12.500,    2.22,  756.77,  60.54,     6.250,    2.24,  373.98,  59.84
           13,      6.750,    2.22,  408.28,  60.49,      3.375,    0.00,     inf,    inf,     13.500,    2.21,  819.86,  60.73,     6.750,    2.25,  403.39,  59.76
           14,      7.250,    2.22,  437.71,  60.37,      3.625,    0.00,     inf,    inf,     14.500,    2.22,  876.63,  60.46,     7.250,    2.24,  434.21,  59.89
           15,      7.750,    2.22,  468.82,  60.49,      3.875,    0.00,     inf,    inf,     15.500,    2.21,  940.02,  60.65,     7.750,    2.24,  463.52,  59.81
           16,      8.250,    2.21,  499.93,  60.60,      4.125,    0.00,     inf,    inf,     16.500,    2.21,  999.86,  60.60,     8.250,    2.24,  495.01,  60.00
           17,      8.750,    2.22,  529.98,  60.57,      4.375,    0.00,     inf,    inf,     17.500,    2.21, 1061.43,  60.65,     8.750,    2.23,  526.15,  60.13
           18,      9.250,    2.21,  560.98,  60.65,      4.625,    0.00,     inf,    inf,     18.500,    2.21, 1122.48,  60.67,     9.250,    2.24,  553.49,  59.84
           20,     10.250,    2.25,  611.44,  59.65,      5.125,    0.00,     inf,    inf,     20.500,    2.22, 1238.24,  60.40,    10.250,    2.28,  602.33,  58.76
           22,     11.250,    2.22,  681.40,  60.57,      5.625,    0.00,     inf,    inf,     22.500,    2.23, 1354.21,  60.19,    11.250,    2.43,  621.13,  55.21
           24,     12.250,    2.21,  743.35,  60.68,      6.125,    0.00,     inf,    inf,     24.500,    2.20, 1492.05,  60.90,    12.250,    2.60,  631.64,  51.56
           28,     14.250,    2.21,  866.21,  60.79,      7.125,    0.00,     inf,    inf,     28.500,    2.20, 1737.26,  60.96,    14.250,    2.98,  641.85,  45.04
           32,     16.250,    2.21,  984.71,  60.60,      8.125,    0.00,     inf,    inf,     32.500,    2.21, 1971.92,  60.67,    16.250,    3.36,  648.53,  39.91
           40,     20.250,    2.21, 1230.94,  60.79,     10.125,    0.00,     inf,    inf,     40.500,    2.20, 2466.45,  60.90,    20.250,    4.13,  657.63,  32.48
           48,     24.250,    2.22, 1467.45,  60.51,     12.125,    0.00,     inf,    inf,     48.500,    2.22, 2934.90,  60.51,    24.250,    4.90,  663.57,  27.36
           56,     28.250,    2.22, 1708.71,  60.49,     14.125,    0.00,     inf,    inf,     56.500,    2.32, 3265.60,  57.80,    28.250,    5.68,  667.65,  23.63
           64,     32.250,    2.22, 1947.96,  60.40,     16.125,    0.00,     inf,    inf,     64.500,    2.54, 3409.62,  52.86,    32.250,    6.45,  670.67,  20.80
           80,     40.250,    2.25, 2399.93,  59.63,     20.125,    0.00,     inf,    inf,     80.500,   49.72,  217.32,   2.70,    40.250,    8.01,  674.70,  16.76
           96,     48.250,    2.56, 2531.71,  52.47,     24.125,    0.00,     inf,    inf,     96.500,   59.54,  217.55,   2.25,    48.250,    9.56,  677.62,  14.04
          128,     64.250,    3.32, 2597.59,  40.43,     32.125,    0.00,     inf,    inf,    128.500,   79.17,  217.85,   1.70,    64.250,   12.66,  681.16,  10.60
          192,     96.250,   52.40,  246.51,   2.56,     48.125,    0.00,     inf,    inf,    192.500,  118.44,  218.14,   1.13,    96.250,   72.07,  179.24,   1.86
          256,    128.250,   69.67,  247.06,   1.93,     64.125,    0.00,     inf,    inf,    256.500,  157.72,  218.28,   0.85,   128.250,   95.91,  179.48,   1.40
--------------------------------------------------------------------------------------------------------------------------------------------------------------------

Owner

ekondis commented Oct 24, 2024

So, this is device dependent. For the OpenCL implementation, the unroll factor could perhaps be exposed as a benchmark parameter to give the user more flexibility. I'm not sure it's worth it, though.
In addition, as you can see, the threshold of compute iterations beyond which performance drops depends not only on the device but also on the data type (128 for SP float vs. 64 for HP float).
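
The per-type thresholds can be read off the CSV mechanically. The sketch below is a minimal Python illustration using a few rows copied from the Adreno 750 output (the second CSV table). `last_good_iters` and the 50% cutoff are assumptions chosen for this example, not something mixbench defines. The function returns the largest compute-iterations value reached before throughput falls below half of the previous row's value.

```python
# (compute iters, throughput) pairs copied from the second CSV table.
THROUGHPUT = {
    "single precision (GFLOPS)": [(96, 2531.71), (128, 2597.59), (192, 246.51), (256, 247.06)],
    "half precision (GFLOPS)": [(56, 3265.60), (64, 3409.62), (80, 217.32), (96, 217.55)],
    "integer (GIOPS)": [(96, 677.62), (128, 681.16), (192, 179.24), (256, 179.48)],
}


def last_good_iters(rows, cutoff=0.5):
    """Return the compute-iters value just before throughput drops below
    cutoff * (previous row's throughput), or None if no such drop occurs."""
    for (prev_iters, prev_rate), (_, rate) in zip(rows, rows[1:]):
        if rate < cutoff * prev_rate:
            return prev_iters
    return None


if __name__ == "__main__":
    for name, rows in THROUGHPUT.items():
        print(f"{name}: last good compute iters = {last_good_iters(rows)}")
```

On these rows the result is 128 for single precision and integer operations and 64 for half precision, consistent with the thresholds quoted above.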
