
8312425: [vectorapi] AArch64: Optimize vector math operations with SLEEF #18605

Closed

Conversation


@Hamlin-Li Hamlin-Li commented Apr 3, 2024

Hi,
Can you help to review the patch?
This PR is based on previous work and discussion in PR 16234 and PR 18294.

Compared with those previous PRs, the major change in this PR is to integrate the SLEEF source itself (for the steps, please check src/jdk.incubator.vector/linux/native/libvectormath/README), rather than depending on external SLEEF artifacts (headers or libraries) at build or run time.
Besides this change, the previous changes are adjusted accordingly, e.g. by removing some unnecessary files and changes, especially in the jdk make directory.

Besides the code changes, one important task is to handle the legal process.

Thanks!

Test

tests:

  • test/jdk/jdk/incubator/vector/
  • test/hotspot/jtreg/compiler/vectorapi/

options:

  • -XX:UseSVE=1 -XX:+EnableVectorSupport -XX:+UseVectorStubs
  • -XX:UseSVE=0 -XX:+EnableVectorSupport -XX:+UseVectorStubs
  • -XX:+EnableVectorSupport -XX:-UseVectorStubs
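A concrete run of the listed tests with one of these flag combinations might look like the sketch below. It assumes a configured OpenJDK build tree and the usual "make test" jtreg conventions (TEST and JTREG=VM_OPTIONS); the exact target names can vary, so the command is only printed here rather than executed:

```shell
# Compose one of the flag combinations listed above (the UseSVE=1 variant).
VECTOR_TESTS="test/jdk/jdk/incubator/vector test/hotspot/jtreg/compiler/vectorapi"
VM_OPTS="-XX:UseSVE=1 -XX:+EnableVectorSupport -XX:+UseVectorStubs"
# Print the hypothetical jtreg invocation instead of running it:
echo make test TEST="$VECTOR_TESTS" JTREG="VM_OPTIONS=$VM_OPTS"
```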

Performance

Options

  • +intrinsic: 'FORK=1;ITER=10;WARMUP_ITER=10;JAVA_OPTIONS=-XX:+UnlockExperimentalVMOptions -XX:+EnableVectorSupport -XX:+UseVectorStubs'
  • -intrinsic: 'FORK=1;ITER=10;WARMUP_ITER=10;JAVA_OPTIONS=-XX:+UnlockExperimentalVMOptions -XX:+EnableVectorSupport -XX:-UseVectorStubs'
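For reading the tables below: the Improvement columns are simply the throughput ratio of the +intrinsic score to the -intrinsic score, so values above 1 mean the SLEEF stubs win. For example, for the Float128Vector.ACOS row with UseSVE=1:

```shell
# Improvement = score(+intrinsic) / score(-intrinsic), both in ops/ms.
awk 'BEGIN { printf "%.3f\n", 245.439 / 101.483 }'   # prints 2.419
```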

Float

data

Benchmark (size) Mode Cnt Error Units Score +intrinsic (UseSVE=1) Score -intrinsic (UseSVE=1) Improvement (UseSVE=1) Score +intrinsic (UseSVE=0) Score -intrinsic (UseSVE=0) Improvement (UseSVE=0)
Float128Vector.ACOS 1024 thrpt 10 0.015 ops/ms 245.439 101.483 2.419 245.733 102.033 2.408
Float128Vector.ASIN 1024 thrpt 10 0.013 ops/ms 296.702 103.559 2.865 296.741 103.18 2.876
Float128Vector.ATAN 1024 thrpt 10 0.004 ops/ms 196.862 49.627 3.967 195.891 49.771 3.936
Float128Vector.ATAN2 1024 thrpt 10 0.021 ops/ms 135.088 32.449 4.163 135.721 32.579 4.166
Float128Vector.CBRT 1024 thrpt 10 0.004 ops/ms 114.547 39.517 2.899 114.756 39.273 2.922
Float128Vector.COS 1024 thrpt 10 0.006 ops/ms 93.226 62.883 1.483 93.195 63.116 1.477
Float128Vector.COSH 1024 thrpt 10 0.005 ops/ms 154.498 76.58 2.017 154.147 77.026 2.001
Float128Vector.EXP 1024 thrpt 10 0.248 ops/ms 483.569 83.614 5.783 502.786 83.424 6.027
Float128Vector.EXPM1 1024 thrpt 10 0.01 ops/ms 156.338 62.091 2.518 157.589 62.008 2.541
Float128Vector.HYPOT 1024 thrpt 10 0.007 ops/ms 191.217 56.834 3.364 191.247 58.624 3.262
Float128Vector.LOG 1024 thrpt 10 0.019 ops/ms 258.223 52.005 4.965 259.642 52.018 4.991
Float128Vector.LOG10 1024 thrpt 10 0.004 ops/ms 238.916 43.311 5.516 240.135 43.352 5.539
Float128Vector.LOG1P 1024 thrpt 10 0.112 ops/ms 246.507 42.227 5.838 246.546 42.24 5.837
Float128Vector.POW 1024 thrpt 10 0.033 ops/ms 73.78 25.17 2.931 73.693 25.113 2.934
Float128Vector.SIN 1024 thrpt 10 0.004 ops/ms 95.509 62.807 1.521 95.792 62.883 1.523
Float128Vector.SINH 1024 thrpt 10 0.011 ops/ms 153.177 77.586 1.974 152.97 77.248 1.98
Float128Vector.TAN 1024 thrpt 10 0.002 ops/ms 74.394 32.662 2.278 74.491 32.639 2.282
Float128Vector.TANH 1024 thrpt 10 0.005 ops/ms 129.308 144.581 0.894 129.319 144.916 0.892
Float256Vector.ACOS 1024 thrpt 10 0.311 ops/ms 378.109 135.118 2.798 122.381 123.502 0.991
Float256Vector.ASIN 1024 thrpt 10 1.039 ops/ms 452.692 135.067 3.352 126.037 123.53 1.02
Float256Vector.ATAN 1024 thrpt 10 0.017 ops/ms 288.785 62.032 4.655 59.783 59.821 0.999
Float256Vector.ATAN2 1024 thrpt 10 0.065 ops/ms 217.573 40.843 5.327 38.337 38.352 1
Float256Vector.CBRT 1024 thrpt 10 0.042 ops/ms 185.721 49.353 3.763 46.273 46.279 1
Float256Vector.COS 1024 thrpt 10 0.036 ops/ms 163.584 78.947 2.072 70.544 70.74 0.997
Float256Vector.COSH 1024 thrpt 10 0.01 ops/ms 211.746 96.885 2.186 84.078 84.366 0.997
Float256Vector.EXP 1024 thrpt 10 0.121 ops/ms 954.69 117.145 8.15 97.97 97.713 1.003
Float256Vector.EXPM1 1024 thrpt 10 0.055 ops/ms 213.462 79.832 2.674 74.292 74.36 0.999
Float256Vector.HYPOT 1024 thrpt 10 0.052 ops/ms 306.511 74.208 4.13 68.856 69.077 0.997
Float256Vector.LOG 1024 thrpt 10 0.216 ops/ms 406.914 65.408 6.221 59.808 59.767 1.001
Float256Vector.LOG10 1024 thrpt 10 0.37 ops/ms 371.385 53.156 6.987 49.334 49.171 1.003
Float256Vector.LOG1P 1024 thrpt 10 1.851 ops/ms 397.247 52.042 7.633 50.181 50.199 1
Float256Vector.POW 1024 thrpt 10 0.048 ops/ms 115.155 27.174 4.238 24.659 24.703 0.998
Float256Vector.SIN 1024 thrpt 10 0.107 ops/ms 154.975 79.103 1.959 70.9 70.615 1.004
Float256Vector.SINH 1024 thrpt 10 0.351 ops/ms 202.683 97.643 2.076 84.587 84.371 1.003
Float256Vector.TAN 1024 thrpt 10 0.005 ops/ms 127.597 37.136 3.436 34.774 34.757 1
Float256Vector.TANH 1024 thrpt 10 1.233 ops/ms 249.084 247.272 1.007 169.903 169.805 1.001
Float512Vector.ACOS 1024 thrpt 10 0.069 ops/ms 148.467 152.264 0.975 150.131 154.717 0.97
Float512Vector.ASIN 1024 thrpt 10 0.287 ops/ms 147.144 158.074 0.931 147.251 148.71 0.99
Float512Vector.ATAN 1024 thrpt 10 0.101 ops/ms 68.498 67.987 1.008 67.968 68.131 0.998
Float512Vector.ATAN2 1024 thrpt 10 0.016 ops/ms 44.189 44.052 1.003 43.898 43.781 1.003
Float512Vector.CBRT 1024 thrpt 10 0.012 ops/ms 53.514 53.672 0.997 53.623 53.635 1
Float512Vector.COS 1024 thrpt 10 0.222 ops/ms 80.566 80.713 0.998 80.672 80.796 0.998
Float512Vector.COSH 1024 thrpt 10 0.104 ops/ms 102.175 102.038 1.001 102.303 102.009 1.003
Float512Vector.EXP 1024 thrpt 10 0.255 ops/ms 118.824 118.942 0.999 118.551 118.976 0.996
Float512Vector.EXPM1 1024 thrpt 10 0.021 ops/ms 87.363 87.153 1.002 87.842 87.387 1.005
Float512Vector.HYPOT 1024 thrpt 10 0.048 ops/ms 86.838 86.439 1.005 86.903 86.709 1.002
Float512Vector.LOG 1024 thrpt 10 0.017 ops/ms 70.794 70.746 1.001 70.469 70.62 0.998
Float512Vector.LOG10 1024 thrpt 10 0.051 ops/ms 55.821 55.85 0.999 55.883 55.773 1.002
Float512Vector.LOG1P 1024 thrpt 10 0.085 ops/ms 57.113 57.582 0.992 56.942 57.245 0.995
Float512Vector.POW 1024 thrpt 10 0.006 ops/ms 26.66 26.656 1 26.651 26.641 1
Float512Vector.SIN 1024 thrpt 10 0.067 ops/ms 80.873 80.806 1.001 80.638 80.456 1.002
Float512Vector.SINH 1024 thrpt 10 0.16 ops/ms 103.818 102.766 1.01 102.669 103.83 0.989
Float512Vector.TAN 1024 thrpt 10 0.148 ops/ms 38.107 37.971 1.004 37.938 37.862 1.002
Float512Vector.TANH 1024 thrpt 10 1.206 ops/ms 237.573 235.876 1.007 236.684 236.724 1
Float64Vector.ACOS 1024 thrpt 10 0.006 ops/ms 123.038 64.939 1.895 123.07 65.556 1.877
Float64Vector.ASIN 1024 thrpt 10 0.006 ops/ms 148.56 65.115 2.282 148.576 66.468 2.235
Float64Vector.ATAN 1024 thrpt 10 0.003 ops/ms 98.512 40.569 2.428 98.458 40.932 2.405
Float64Vector.ATAN2 1024 thrpt 10 0.004 ops/ms 67.706 24.824 2.727 68.214 25.157 2.712
Float64Vector.CBRT 1024 thrpt 10 0.001 ops/ms 57.299 29.725 1.928 57.343 29.279 1.959
Float64Vector.COS 1024 thrpt 10 0.008 ops/ms 46.689 44.153 1.057 46.67 43.683 1.068
Float64Vector.COSH 1024 thrpt 10 0.005 ops/ms 77.552 51.012 1.52 77.66 51.285 1.514
Float64Vector.EXP 1024 thrpt 10 0.257 ops/ms 242.736 54.277 4.472 248.345 54.298 4.574
Float64Vector.EXPM1 1024 thrpt 10 0.003 ops/ms 78.741 45.22 1.741 79.082 45.396 1.742
Float64Vector.HYPOT 1024 thrpt 10 0.002 ops/ms 95.716 36.135 2.649 95.702 36.424 2.627
Float64Vector.LOG 1024 thrpt 10 0.006 ops/ms 130.395 38.954 3.347 130.321 38.99 3.342
Float64Vector.LOG10 1024 thrpt 10 0.003 ops/ms 119.783 33.912 3.532 120.254 33.951 3.542
Float64Vector.LOG1P 1024 thrpt 10 0.006 ops/ms 123.966 34.381 3.606 123.984 34.291 3.616
Float64Vector.POW 1024 thrpt 10 0.003 ops/ms 36.872 21.747 1.695 36.774 21.639 1.699
Float64Vector.SIN 1024 thrpt 10 0.002 ops/ms 48.008 44.076 1.089 48.001 43.989 1.091
Float64Vector.SINH 1024 thrpt 10 0.004 ops/ms 76.711 50.893 1.507 76.936 51.236 1.502
Float64Vector.TAN 1024 thrpt 10 0.006 ops/ms 37.286 26.095 1.429 37.283 26.06 1.431
Float64Vector.TANH 1024 thrpt 10 0.004 ops/ms 64.71 79.799 0.811 64.741 79.924 0.81
FloatMaxVector.ACOS 1024 thrpt 10 0.103 ops/ms 378.138 136.187 2.777 245.725 102.05 2.408
FloatMaxVector.ASIN 1024 thrpt 10 1.013 ops/ms 452.441 135.287 3.344 296.708 103.589 2.864
FloatMaxVector.ATAN 1024 thrpt 10 0.028 ops/ms 288.802 62.021 4.657 196.817 49.824 3.95
FloatMaxVector.ATAN2 1024 thrpt 10 0.037 ops/ms 216.386 40.889 5.292 135.756 32.75 4.145
FloatMaxVector.CBRT 1024 thrpt 10 0.269 ops/ms 187.141 49.382 3.79 114.819 39.203 2.929
FloatMaxVector.COS 1024 thrpt 10 0.014 ops/ms 163.726 78.882 2.076 93.184 63.087 1.477
FloatMaxVector.COSH 1024 thrpt 10 0.006 ops/ms 212.544 97.49 2.18 154.547 77.685 1.989
FloatMaxVector.EXP 1024 thrpt 10 0.048 ops/ms 955.792 117.15 8.159 488.526 83.227 5.87
FloatMaxVector.EXPM1 1024 thrpt 10 0.01 ops/ms 213.435 79.837 2.673 157.618 62.006 2.542
FloatMaxVector.HYPOT 1024 thrpt 10 0.041 ops/ms 308.446 74.165 4.159 191.259 58.628 3.262
FloatMaxVector.LOG 1024 thrpt 10 0.105 ops/ms 405.824 65.604 6.186 257.679 51.992 4.956
FloatMaxVector.LOG10 1024 thrpt 10 0.186 ops/ms 371.417 53.204 6.981 240.117 43.427 5.529
FloatMaxVector.LOG1P 1024 thrpt 10 0.713 ops/ms 395.943 52.002 7.614 246.515 42.196 5.842
FloatMaxVector.POW 1024 thrpt 10 0.079 ops/ms 115.35 27.143 4.25 73.411 25.226 2.91
FloatMaxVector.SIN 1024 thrpt 10 0.04 ops/ms 154.421 79.424 1.944 95.548 62.973 1.517
FloatMaxVector.SINH 1024 thrpt 10 0.04 ops/ms 202.51 97.974 2.067 153.3 77.106 1.988
FloatMaxVector.TAN 1024 thrpt 10 0.013 ops/ms 127.56 36.981 3.449 74.483 32.733 2.275
FloatMaxVector.TANH 1024 thrpt 10 0.792 ops/ms 247.428 247.743 0.999 129.375 144.932 0.893
FloatScalar.ACOS 1024 thrpt 10 0.09 ops/ms 337.034 337.102 1 336.994 337.001 1
FloatScalar.ASIN 1024 thrpt 10 0.096 ops/ms 351.308 351.34 1 351.273 351.293 1
FloatScalar.ATAN 1024 thrpt 10 0.008 ops/ms 91.71 91.657 1.001 91.627 91.403 1.002
FloatScalar.ATAN2 1024 thrpt 10 0.004 ops/ms 58.171 58.206 0.999 58.21 58.184 1
FloatScalar.CBRT 1024 thrpt 10 0.112 ops/ms 67.946 67.961 1 67.97 67.973 1
FloatScalar.COS 1024 thrpt 10 0.144 ops/ms 109.93 109.944 1 109.961 110.002 1
FloatScalar.COSH 1024 thrpt 10 0.008 ops/ms 136.223 136.357 0.999 136.427 136.5 0.999
FloatScalar.EXP 1024 thrpt 10 0.141 ops/ms 176.773 176.585 1.001 176.884 176.818 1
FloatScalar.EXPM1 1024 thrpt 10 0.015 ops/ms 127.417 127.504 0.999 127.536 126.957 1.005
FloatScalar.HYPOT 1024 thrpt 10 0.006 ops/ms 162.621 162.834 0.999 162.766 162.404 1.002
FloatScalar.LOG 1024 thrpt 10 0.029 ops/ms 92.565 92.4 1.002 92.567 92.565 1
FloatScalar.LOG10 1024 thrpt 10 0.005 ops/ms 70.792 70.774 1 70.789 70.799 1
FloatScalar.LOG1P 1024 thrpt 10 0.051 ops/ms 73.908 74.572 0.991 73.898 74.61 0.99
FloatScalar.POW 1024 thrpt 10 0.003 ops/ms 30.554 30.566 1 30.561 30.556 1
FloatScalar.SIN 1024 thrpt 10 0.248 ops/ms 109.954 109.57 1.004 109.873 109.842 1
FloatScalar.SINH 1024 thrpt 10 0.005 ops/ms 139.617 139.616 1 139.432 139.242 1.001
FloatScalar.TAN 1024 thrpt 10 0.007 ops/ms 44.327 44.16 1.004 44.478 44.401 1.002
FloatScalar.TANH 1024 thrpt 10 0.362 ops/ms 545.506 545.688 1 545.744 545.604 1

Double

data

Benchmark (size) Mode Cnt Error Units Score +intrinsic (UseSVE=1) Score -intrinsic (UseSVE=1) Improvement (UseSVE=1) Score +intrinsic (UseSVE=0) Score -intrinsic (UseSVE=0) Improvement (UseSVE=0)
Double128Vector.ACOS 1024 thrpt 10 0.005 ops/ms 117.913 67.641 1.743 117.977 67.793 1.74
Double128Vector.ASIN 1024 thrpt 10 0.006 ops/ms 145.789 68.392 2.132 145.518 68.181 2.134
Double128Vector.ATAN 1024 thrpt 10 0.004 ops/ms 87.644 42.752 2.05 87.544 43.136 2.029
Double128Vector.ATAN2 1024 thrpt 10 0.003 ops/ms 60.414 26.235 2.303 60.182 26.313 2.287
Double128Vector.CBRT 1024 thrpt 10 0.001 ops/ms 52.679 30.617 1.721 52.657 30.69 1.716
Double128Vector.COS 1024 thrpt 10 0.004 ops/ms 71.501 47.165 1.516 71.612 47.114 1.52
Double128Vector.COSH 1024 thrpt 10 0.007 ops/ms 82.195 53.846 1.526 82.372 54.144 1.521
Double128Vector.EXP 1024 thrpt 10 0.012 ops/ms 216.471 58.192 3.72 217.261 58.271 3.728
Double128Vector.EXPM1 1024 thrpt 10 0.007 ops/ms 95.372 48.037 1.985 95.799 47.954 1.998
Double128Vector.HYPOT 1024 thrpt 10 0.002 ops/ms 88.137 37.331 2.361 87.856 37.307 2.355
Double128Vector.LOG 1024 thrpt 10 0.038 ops/ms 98.972 41.669 2.375 99.046 41.723 2.374
Double128Vector.LOG10 1024 thrpt 10 0.004 ops/ms 83.921 36.163 2.321 83.844 36.099 2.323
Double128Vector.LOG1P 1024 thrpt 10 0.006 ops/ms 86.526 36.291 2.384 86.592 36.148 2.395
Double128Vector.POW 1024 thrpt 10 0.001 ops/ms 34.439 21.817 1.579 34.373 21.618 1.59
Double128Vector.SIN 1024 thrpt 10 0.007 ops/ms 82.248 47.064 1.748 82.63 47.524 1.739
Double128Vector.SINH 1024 thrpt 10 0.005 ops/ms 80.27 53.565 1.499 80.404 53.438 1.505
Double128Vector.TAN 1024 thrpt 10 0.001 ops/ms 56.221 27.615 2.036 56.516 27.792 2.034
Double128Vector.TANH 1024 thrpt 10 0.011 ops/ms 64.979 83.143 0.782 65.652 82.771 0.793
Double256Vector.ACOS 1024 thrpt 10 0.455 ops/ms 179.103 112.49 1.592 87.833 88.651 0.991
Double256Vector.ASIN 1024 thrpt 10 0.691 ops/ms 212.368 112.884 1.881 88.369 88.365 1
Double256Vector.ATAN 1024 thrpt 10 0.008 ops/ms 120.882 55.861 2.164 49.106 48.979 1.003
Double256Vector.ATAN2 1024 thrpt 10 0.006 ops/ms 98.254 33.362 2.945 30.514 30.556 0.999
Double256Vector.CBRT 1024 thrpt 10 0.016 ops/ms 89.053 43.473 2.048 38.255 37.885 1.01
Double256Vector.COS 1024 thrpt 10 0.03 ops/ms 119.208 65.874 1.81 57.119 57.033 1.002
Double256Vector.COSH 1024 thrpt 10 0.01 ops/ms 124.26 76.188 1.631 63.477 63.002 1.008
Double256Vector.EXP 1024 thrpt 10 0.048 ops/ms 390.922 88.453 4.42 72.249 72.248 1
Double256Vector.EXPM1 1024 thrpt 10 0.017 ops/ms 121.844 66.475 1.833 57.431 57.36 1.001
Double256Vector.HYPOT 1024 thrpt 10 0.034 ops/ms 138.774 60.148 2.307 51.837 51.881 0.999
Double256Vector.LOG 1024 thrpt 10 0.073 ops/ms 165.474 55.445 2.984 48.7 48.571 1.003
Double256Vector.LOG10 1024 thrpt 10 0.015 ops/ms 144.862 44.937 3.224 40.579 40.624 0.999
Double256Vector.LOG1P 1024 thrpt 10 0.21 ops/ms 151.807 46.401 3.272 40.943 41.158 0.995
Double256Vector.POW 1024 thrpt 10 0.003 ops/ms 53.228 25.144 2.117 21.862 21.852 1
Double256Vector.SIN 1024 thrpt 10 0.007 ops/ms 130.875 65.753 1.99 57.42 57.172 1.004
Double256Vector.SINH 1024 thrpt 10 0.004 ops/ms 120.093 76.13 1.577 63.283 62.823 1.007
Double256Vector.TAN 1024 thrpt 10 0.073 ops/ms 79.318 33.242 2.386 30.463 30.322 1.005
Double256Vector.TANH 1024 thrpt 10 1.633 ops/ms 152.914 154.668 0.989 107.585 7.441 14.458
Double512Vector.ACOS 1024 thrpt 10 0.1 ops/ms 122.582 121.073 1.012 123.136 22.485 5.476
Double512Vector.ASIN 1024 thrpt 10 0.099 ops/ms 123.678 122.482 1.01 121.616 22.78 5.339
Double512Vector.ATAN 1024 thrpt 10 0.14 ops/ms 61.939 61.928 1 61.821 62.013 0.997
Double512Vector.ATAN2 1024 thrpt 10 0.014 ops/ms 38.638 38.541 1.003 38.668 38.697 0.999
Double512Vector.CBRT 1024 thrpt 10 0.024 ops/ms 49.685 49.667 1 49.674 49.634 1.001
Double512Vector.COS 1024 thrpt 10 0.046 ops/ms 74.125 73.99 1.002 74.462 72.102 1.033
Double512Vector.COSH 1024 thrpt 10 0.15 ops/ms 86.945 87.2 0.997 87.111 87.187 0.999
Double512Vector.EXP 1024 thrpt 10 0.507 ops/ms 100.955 101.43 0.995 101.213 1.336 75.758
Double512Vector.EXPM1 1024 thrpt 10 0.017 ops/ms 75.648 75.012 1.008 75.632 75.293 1.005
Double512Vector.HYPOT 1024 thrpt 10 0.3 ops/ms 72.42 72.487 0.999 72.457 72.277 1.002
Double512Vector.LOG 1024 thrpt 10 0.021 ops/ms 64.729 64.613 1.002 64.584 64.43 1.002
Double512Vector.LOG10 1024 thrpt 10 0.022 ops/ms 52.042 51.953 1.002 51.958 51.879 1.002
Double512Vector.LOG1P 1024 thrpt 10 0.103 ops/ms 52.239 52.169 1.001 52.161 52.176 1
Double512Vector.POW 1024 thrpt 10 0.008 ops/ms 25.488 25.473 1.001 25.462 25.461 1
Double512Vector.SIN 1024 thrpt 10 0.121 ops/ms 74.514 74.724 0.997 74.655 74.56 1.001
Double512Vector.SINH 1024 thrpt 10 0.216 ops/ms 86.568 86.488 1.001 86.673 86.855 0.998
Double512Vector.TAN 1024 thrpt 10 0.05 ops/ms 36.129 36.199 0.998 36.355 36.113 1.007
Double512Vector.TANH 1024 thrpt 10 0.125 ops/ms 172.425 171.657 1.004 171.701 71.727 2.394
Double64Vector.ACOS 1024 thrpt 10 0.125 ops/ms 29.916 30.242 0.989 30.232 30.135 1.003
Double64Vector.ASIN 1024 thrpt 10 0.008 ops/ms 30.677 30.58 1.003 30.396 30.524 0.996
Double64Vector.ATAN 1024 thrpt 10 0.038 ops/ms 19.561 19.526 1.002 19.446 19.456 0.999
Double64Vector.ATAN2 1024 thrpt 10 0.008 ops/ms 15.376 15.669 0.981 15.412 15.369 1.003
Double64Vector.CBRT 1024 thrpt 10 0.004 ops/ms 13.943 13.943 1 13.873 13.89 0.999
Double64Vector.COS 1024 thrpt 10 0.012 ops/ms 20.677 20.698 0.999 20.632 20.652 0.999
Double64Vector.COSH 1024 thrpt 10 0.036 ops/ms 22.949 23.116 0.993 23.163 23.241 0.997
Double64Vector.EXP 1024 thrpt 10 0.104 ops/ms 23.424 23.521 0.996 23.605 23.622 0.999
Double64Vector.EXPM1 1024 thrpt 10 0.157 ops/ms 22.301 22.353 0.998 21.973 22.166 0.991
Double64Vector.HYPOT 1024 thrpt 10 0.084 ops/ms 21.01 20.835 1.008 20.911 20.819 1.004
Double64Vector.LOG 1024 thrpt 10 0.041 ops/ms 18.265 18.291 0.999 18.192 18.21 0.999
Double64Vector.LOG10 1024 thrpt 10 0.003 ops/ms 16.502 16.441 1.004 16.393 16.433 0.998
Double64Vector.LOG1P 1024 thrpt 10 0.009 ops/ms 16.815 16.862 0.997 16.792 16.833 0.998
Double64Vector.POW 1024 thrpt 10 0.012 ops/ms 11.814 11.82 0.999 11.865 11.877 0.999
Double64Vector.SIN 1024 thrpt 10 0.005 ops/ms 20.557 20.605 0.998 20.57 20.26 1.015
Double64Vector.SINH 1024 thrpt 10 0.074 ops/ms 23.133 23.23 0.996 23.048 23.069 0.999
Double64Vector.TAN 1024 thrpt 10 0.009 ops/ms 14.504 14.553 0.997 14.456 14.518 0.996
Double64Vector.TANH 1024 thrpt 10 0.12 ops/ms 31.304 31.226 1.002 31.4 31.267 1.004
DoubleMaxVector.ACOS 1024 thrpt 10 0.146 ops/ms 179.388 112.342 1.597 118.005 67.768 1.741
DoubleMaxVector.ASIN 1024 thrpt 10 0.169 ops/ms 212.342 114.107 1.861 145.676 68.143 2.138
DoubleMaxVector.ATAN 1024 thrpt 10 0.011 ops/ms 120.925 55.823 2.166 86.676 43.156 2.008
DoubleMaxVector.ATAN2 1024 thrpt 10 0.006 ops/ms 98.345 33.604 2.927 60.45 26.383 2.291
DoubleMaxVector.CBRT 1024 thrpt 10 0.006 ops/ms 88.947 43.447 2.047 52.648 30.665 1.717
DoubleMaxVector.COS 1024 thrpt 10 0.023 ops/ms 119.164 65.718 1.813 71.619 47.145 1.519
DoubleMaxVector.COSH 1024 thrpt 10 0.005 ops/ms 124.342 75.967 1.637 82.447 54.084 1.524
DoubleMaxVector.EXP 1024 thrpt 10 0.042 ops/ms 390.767 87.918 4.445 216.207 58.342 3.706
DoubleMaxVector.EXPM1 1024 thrpt 10 0.018 ops/ms 121.79 66.387 1.835 95.935 48.204 1.99
DoubleMaxVector.HYPOT 1024 thrpt 10 0.011 ops/ms 138.549 61.183 2.265 87.859 37.39 2.35
DoubleMaxVector.LOG 1024 thrpt 10 0.034 ops/ms 164.687 55.44 2.971 98.446 41.873 2.351
DoubleMaxVector.LOG10 1024 thrpt 10 0.026 ops/ms 144.388 44.94 3.213 84.062 36.252 2.319
DoubleMaxVector.LOG1P 1024 thrpt 10 0.218 ops/ms 151.047 46.394 3.256 86.671 36.248 2.391
DoubleMaxVector.POW 1024 thrpt 10 0.004 ops/ms 53.241 25.251 2.108 34.371 21.58 1.593
DoubleMaxVector.SIN 1024 thrpt 10 0.003 ops/ms 130.708 65.451 1.997 83.012 47.547 1.746
DoubleMaxVector.SINH 1024 thrpt 10 0.007 ops/ms 120.654 75.693 1.594 80.603 53.586 1.504
DoubleMaxVector.TAN 1024 thrpt 10 0.062 ops/ms 80.045 33.268 2.406 56.48 27.723 2.037
DoubleMaxVector.TANH 1024 thrpt 10 0.99 ops/ms 154.334 153.197 1.007 65.401 82.937 0.789
DoubleScalar.ACOS 1024 thrpt 10 0.06 ops/ms 342.452 342.471 1 342.471 42.461 8.066
DoubleScalar.ASIN 1024 thrpt 10 0.09 ops/ms 353.739 354.47 0.998 352.211 54.513 6.461
DoubleScalar.ATAN 1024 thrpt 10 0.043 ops/ms 100.797 101.069 0.997 101.089 1.086 93.084
DoubleScalar.ATAN2 1024 thrpt 10 0.025 ops/ms 62.29 62.283 1 62.218 62.227 1
DoubleScalar.CBRT 1024 thrpt 10 0.014 ops/ms 73.922 73.929 1 73.906 73.916 1
DoubleScalar.COS 1024 thrpt 10 0.204 ops/ms 117.948 117.806 1.001 117.856 17.763 6.635
DoubleScalar.COSH 1024 thrpt 10 0.016 ops/ms 141.113 141.083 1 141.749 40.659 3.486
DoubleScalar.EXP 1024 thrpt 10 0.008 ops/ms 189.453 188.923 1.003 189.555 89.348 2.122
DoubleScalar.EXPM1 1024 thrpt 10 0.051 ops/ms 133.617 133.549 1.001 133.224 33.61 3.964
DoubleScalar.HYPOT 1024 thrpt 10 3.613 ops/ms 180.215 175.912 1.024 176.083 81.916 2.15
DoubleScalar.LOG 1024 thrpt 10 0.013 ops/ms 101.791 101.801 1 101.779 1.786 56.987
DoubleScalar.LOG10 1024 thrpt 10 0.099 ops/ms 76.849 76.847 1 76.807 76.757 1.001
DoubleScalar.LOG1P 1024 thrpt 10 0.081 ops/ms 79.261 79.298 1 79.268 79.281 1
DoubleScalar.POW 1024 thrpt 10 0.002 ops/ms 31.915 31.925 1 31.919 31.92 1
DoubleScalar.SIN 1024 thrpt 10 0.167 ops/ms 118.087 117.722 1.003 118.292 18.243 6.484
DoubleScalar.SINH 1024 thrpt 10 0.012 ops/ms 143.901 143.803 1.001 144.228 43.922 3.284
DoubleScalar.TAN 1024 thrpt 10 0.047 ops/ms 46.513 46.584 0.998 46.503 46.778 0.994
DoubleScalar.TANH 1024 thrpt 10 0.204 ops/ms 552.603 561.965 0.983 561.941 61.802 9.093

Backup of previous test summary

NOTE:

  • Src means the implementation in this PR, i.e. without a dependency on external SLEEF.
  • Disabled means intrinsics are disabled via -XX:-UseVectorStubs.
  • system_sleef means the implementation in the previous PR 18294, i.e. the JDK is built and run with a dependency on external SLEEF.

Basically, the perf data below shows that

  • this implementation performs better than the previous version in PR 18294,
  • and both SLEEF versions perform much better than the non-SLEEF version.

Progress

  • Change must be properly reviewed (1 review required, with at least 1 Reviewer)
  • Change must not contain extraneous whitespace
  • Commit message must refer to an issue

Issue

  • JDK-8312425: [vectorapi] AArch64: Optimize vector math operations with SLEEF (Enhancement - P4)

Contributors

  • Xiaohong Gong <xgong@openjdk.org>

Reviewing

Using git

Checkout this PR locally:
$ git fetch https://git.openjdk.org/jdk.git pull/18605/head:pull/18605
$ git checkout pull/18605

Update a local copy of the PR:
$ git checkout pull/18605
$ git pull https://git.openjdk.org/jdk.git pull/18605/head

Using Skara CLI tools

Checkout this PR locally:
$ git pr checkout 18605

View PR using the GUI difftool:
$ git pr show -t 18605

Using diff file

Download this PR as a diff file:
https://git.openjdk.org/jdk/pull/18605.diff

Webrev

Link to Webrev Comment

@bridgekeeper

bridgekeeper bot commented Apr 3, 2024

👋 Welcome back mli! A progress list of the required criteria for merging this PR into master will be added to the body of your pull request. There are additional pull request commands available for use with this pull request.

@openjdk

openjdk bot commented Apr 3, 2024

❗ This change is not yet ready to be integrated.
See the Progress checklist in the description for automated requirements.

@openjdk

openjdk bot commented Apr 3, 2024

@Hamlin-Li The following labels will be automatically applied to this pull request:

  • build
  • hotspot

When this pull request is ready to be reviewed, an "RFR" email will be sent to the corresponding mailing lists. If you would like to change these labels, use the /label pull request command.

@openjdk openjdk bot added build build-dev@openjdk.org hotspot hotspot-dev@openjdk.org labels Apr 3, 2024
@Hamlin-Li Hamlin-Li marked this pull request as ready for review April 3, 2024 14:44
@luhenry
Member

luhenry commented Jul 15, 2024

I can't tell what problem we're trying to solve by not simply checking in the source code, in its preferred form, to the OpenJDK tree. This has practical advantages to do with traceability and security, and in-principle reasons to do with basic Open Source practice too. On the other side, there are no disadvantages.

Do you suggest copying the whole sleef source repo into the jdk?

I think so, along with scripting that generates the preprocessed file we use. It might be the case that there are some sleef files that are not used at all and could be omitted, but I'm not sure that would be useful; from a traceability point of view it's probably best to grab it all, unless it's really huge.

Given the Sleef build system currently uses cmake, we would have two choices to build the header files as part of the OpenJDK build system:

  1. take a dependency on cmake in order to build the Sleef headers
  2. write a custom build system for Sleef to integrate into OpenJDK

Neither approach sounds good to me as a mandatory option.

However, if we allow the person building OpenJDK to optionally generate the headers from a Sleef source checkout (provided by the user with --with-sleef-src=/path/to/sleef), we can then more easily assume that the user has installed the necessary dependencies. That would also be in line with how binutils is built and integrated.

@vidmik
Contributor

vidmik commented Jul 15, 2024

If we want the traceability (which I agree is good) of the SLEEF source code but want to avoid having it in the jdk repo itself (adding unnecessary "bloat" for everybody), perhaps we can consider having it in a separate repository somewhere in/under openjdk?

It's not immediately clear to me that we need to have support in the JDK build system (configure/make) itself for building/updating the header files, as long as there's a simple, documented way of doing so. I like to think the createSleef.sh script is that, but I recognize that I'm biased because I wrote it.

@Hamlin-Li
Author

Hamlin-Li commented Jul 15, 2024

I think so, along with scripting that generates the preprocessed file we use. It might be the case that there are some sleef files that are not used at all and could be omitted, but I'm not sure that would be useful; from a traceability point of view it's probably best to grab it all, unless it's really huge.

Currently,

  • in 8329816: Add SLEEF version 3.6.1 #19185, the sleef inline headers are generated from sleef 3.6.1, which is tagged in the sleef repo.
  • And with the script in #19185, anyone with access to the sleef repo can re-generate these inline headers themselves.

With these 2 points, the traceability seems fine to me; please kindly point out if I missed some points. Maybe we can add some clearer and more specific information in the README or createSleef.sh in #19185 to indicate which version of the sleef source we're using in the jdk.

I'm also fine with your suggestion to add the whole sleef repo into the jdk (maybe we can remove some of the files, but we can ignore that difference for now in this discussion). To copy the sleef repo into the jdk, we would still need to pre-generate the inline header files and check them into the jdk along with the sleef repo; I think you agree (without checking in these inline headers, we would bring some extra dependencies into the jdk and add extra compilation time when building it). But from a traceability point of view, it seems to me this does not bring extra benefit over the current #19185. For example, if someone wants to verify the pre-generated inline headers in the jdk, they need to first verify the sleef source in the jdk, then the pre-generated sleef inline headers.

What do you think?

@vidmik
Contributor

vidmik commented Jul 15, 2024

I think the key question is whether we're comfortable relying on/pointing at an external repository which may or may not be there tomorrow and/or where tags may change outside of our control.

The SLEEF source code looks to be around 7.5MB, give or take. That's not enormous, but it's not exactly small when you keep in mind that if we include it in the jdk repo it's going to be there for every cloned repo in every project/branch, and very few people will actually care about it. I agree that we'd still have to include the pre-generated header files.

Hence my suggestion to consider putting it under our control, but in a separate openjdk controlled repository.

@theRealAph
Contributor

Given the Sleef build system currently uses cmake, we would have two choices to build the header files as part of the OpenJDK build system

I don't think that anyone is proposing to do that, so we can discount it altogether.

However, if we are to allow the person building OpenJDK to optionally generate the headers from a Sleef source checkout (provided by the user with a --with-sleef-src=/path/to/sleef), we can then more easily take the assumption that the user has installed the necessary dependencies. That would also be in line with how binutils is being built and integrated.

Mmm, but we don't need to do that.

@theRealAph
Contributor

I think the key question is whether we're comfortable relying on/pointing at an external repository which may or may not be there tomorrow and/or where tags may change outside of our control.

Right. We should adopt best practice, both from an Open Source compliance point of view and, with regard to the xz backdoor hack, from a security, traceability, and binary reproducibility point of view.

The SLEEF source code looks to be around 7.5MB, give or take. That's not enormous, but it's not exactly small when you keep in mind that if we include it in the jdk repo it's going to be there for every cloned repo in every project/branch, and very few people will actually care about it. I agree that we'd still have to include the pre-generated header files.

Hence my suggestion to consider putting it under our control, but in a separate openjdk controlled repository.

That ticks many of the boxes, as long as we can be sure to tag everything. But from a space point of view I'm not sure it's compelling. After all, we've recently decided to use branches rather than separate repos for releases, which is a good idea because it keeps everything together, but it does increase the repo size for everyone.

It would be very nice if Git allowed a subset of the repo to be checked out, but as far as I can see it doesn't.

Before checkout, the OpenJDK repo is 1.4G. After checkout that's 2.1G. So, about 0.7G of that is the JDK source code, if you include the file system overhead.

7.5MB doesn't sound excessive when you consider that SLEEF potentially provides vectorized routines for many OpenJDK targets. It's not just about AArch64.

This is starting to sound like we need a policy decision, because we don't want to re-hash this discussion every time the question comes up, as it surely will. For me, the fact that supplying preprocessed source code without the real source is known bad practice (even to the extent of being expressly forbidden in the Open Source Definition) is a slam-dunk argument. But clearly that argument doesn't work for everyone. Maybe something to be discussed at the workshop?

@theRealAph
Contributor

Currently,

* in [8329816: Add SLEEF version 3.6.1 #19185](https://github.com/openjdk/jdk/pull/19185) it generates the sleef inline headers from sleef 3.6.1, which is tagged in sleef repo.

* And with the script in [8329816: Add SLEEF version 3.6.1 #19185](https://github.com/openjdk/jdk/pull/19185), anyone with access to sleef repo can re-generate these inline headers by himself

Right, but think about package builders. This isn't about J Random Hacker doing it by hand.

When a package gets built, the builder machine unpacks source code. If SLEEF is included as part of JDK source, all the builder has to do is run the script and overwrite whatever preprocessed source is in there. The alternative is packaging the SLEEF source code tarball separately in the OpenJDK source package. Sure, all of this can be done, but it's a question of whether we do it once, here, now, or all the downstream builders have to do it themselves.

(in fact anyone can generate the inline headers from sleef from scratch without using the scripts in 8329816: Add SLEEF version 3.6.1 #19185; our script just makes future maintenance easier), so it's easy for anyone to verify the inline header files used in the jdk.

That script must be checked in to the OpenJDK tree.

With these 2 points, the traceability seems fine to me; please kindly point out if I missed some points. Maybe we can add some clearer and more specific information in the README or createSleef.sh in #19185 to indicate which version of the sleef source we're using in the jdk.

I'm also fine with your suggestion to add the whole sleef repo into the jdk (maybe we can remove some of the files, but we can ignore that difference for now in this discussion). To copy the sleef repo into the jdk, we would still need to pre-generate the inline header files and check them into the jdk along with the sleef repo; I think you agree

Yes.

(Without checking in these inline headers, we would have to bring some extra dependencies into jdk, and add extra compilation time when building jdk.) But from a traceability point of view, it seems to me this does not bring more benefit than the current #19185. For example, if someone wants to verify the pre-generated inline headers in jdk, they need to first verify the sleef source in jdk, then the pre-generated sleef inline headers.

You don't need to verify the pre-generated inline headers, just overwrite them. The point is that the sleef source is digitally signed, not just by the SLEEF maintainers, but by OpenJDK as well. This is not a small thing.

@Hamlin-Li
Copy link
Author

@theRealAph Thanks for clarification.

I think there are several different parts involved in the above discussion, please kindly correct me if I misunderstood.

  1. package builders. This is about the release of jdk (both src and binary), by either openjdk, adoptium, or any other downstream vendor.
  2. jdk daily development. This is about jdk developers modifying, building, and running/testing jdk day to day.

For package builders, the original sleef source is necessary; for jdk daily development, only the pre-generated sleef inline headers are necessary. The script to pre-generate the sleef inline headers is only triggered by package builders (and I think it involves some scripts which are not part of the jdk source, e.g. the script that triggers the pre-generating script), but for jdk daily development, we just need the pre-generated sleef inline headers.
Am I understanding correctly above?

@theRealAph
Copy link
Contributor

@theRealAph Thanks for clarification.

I think there are several different parts involved in the above discussion, please kindly correct me if I misunderstood.

1. package builders. This is about the release of jdk (both src and binary), by either openjdk, adoptium, or any other downstream vendors.

2. jdk daily development. This is about to modify, build, run/test jdk daily by jdk developers.

For the package builders, original sleef source is

may be

necessary; for the jdk daily development, only pre-generated sleef inline headers are necessary.

Yes, most of the time. Some devs will want to be more thorough.

The script to pre-generate the sleef inline headers is only triggered by package builders (and I think it involves some scripts which are not part of the jdk source, e.g. the script that triggers the pre-generating script),

No: all of the scripts to generate the preprocessed source from the SLEEF source must be in the OpenJDK source.

but for jdk daily development, we just need pre-generated sleef inline headers. Am I understanding correctly above?

Yes, most of the time.

Bear in mind that convenient daily development of OpenJDK is important, because we don't want to discourage developers. But we've never treated the size of the repo as one of our primary considerations.

@adinn
Copy link
Contributor

adinn commented Jul 16, 2024

Obviously we need to include pre-generated sources in the repo so that most people can just build the library using sanctioned code without needing to regenerate anything.

I absolutely agree with @theRealAph that we need to have all relevant SLEEF header build scripts in the OpenJDK repo so that anyone who wants to rebuild the headers can do so. I don't believe it is just packagers who will want to do that, and it is good open source practice to allow and, where possible, make it easy for anyone to do so.

Given the size of the original SLEEF sources, I also agree with @theRealAph that it is no great burden to include them in the jdk repo. However, I am not averse to @vidmik's alternative of putting the sources in an openjdk/sleef repo. That would be fine so long as the openjdk repo includes SLEEF build scripts that pull a fixed, known hash to generate the headers.

Likewise, I agree with @vidmik's suggestion that omitting the extra packages the SLEEF generate step requires from the standard configure/make scripts would be fine, so long as the SLEEF build scripts prompt users on what to install. We don't want to force everyone to install packages that they don't need. But we do still need to make it straightforward for those who do want to regenerate the sources to achieve that goal.

@sxa
Copy link
Contributor

sxa commented Jul 16, 2024

This is starting to sound like we need a policy decision, because we don't want to re-hash this discussion every time the question comes up, as it surely will.

+1 to this if we don't already have one

While I haven't read through every comment in this thread, in this specific case I generally agree with what @theRealAph has said in some of his earlier comments. My primary concern is that the generated code in there is currently effectively unreviewable in terms of checking for potential vulnerabilities, so I also feel it's best to check in the whole (reviewable) source if this PR is to be accepted. Much as I dislike repository bloat, I think it's a fairly easy decision in this case IMHO, with SLEEF being 7.5MB in size when the openjdk codebase is already so large.

An alternative "absolute minimum" would be to reference the GitHub SHA of the SLEEF source and include the process for regenerating it reproducibly, so that this information is available to anyone who wanted to verify it. With my distributor (Temurin) hat on, either of those solutions would mean we have the original source referenced for inclusion in the product SBOM to track the supply chain. I'll also note that I'm making an assumption here that the generated code from SLEEF is reproducible and not sensitive to the build environment, unlike the CDS archives - I have not tried building them myself to verify, but I feel that is important to understand before merging the generated code.
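One lightweight way to check that reproducibility assumption would be to regenerate the headers and compare digests against the checked-in copies. A minimal sketch (the paths in the comment are illustrative, not the actual build output locations):

```shell
# Compare a regenerated SLEEF inline header against the checked-in copy.
# same_hash FILE1 FILE2 -> exit status 0 iff the SHA-256 digests match.
same_hash() {
  [ "$(sha256sum "$1" | cut -d' ' -f1)" = "$(sha256sum "$2" | cut -d' ' -f1)" ]
}

# Hypothetical usage; the real paths depend on where the regeneration step
# writes its output:
#   same_hash build/sleef/sleefinline_advsimd.h \
#       src/jdk.incubator.vector/linux/native/libvectormath/sleefinline_advsimd.h \
#       && echo "reproducible" || echo "differs: check toolchain versions"
```

If the digests match across different build machines and toolchains, the generated headers are reproducible in the sense discussed above.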

As a project we should also consider the whole issue of ensuring that we have sufficient trust from a supply-chain perspective in the SLEEF source ... I have no specific reason to distrust it, but it might be good to understand how well reviewed it is before doing this, as it's not a project I'm personally familiar with.

On a slightly separate note (and I see @luhenry is in this comment thread too and has contributed to SLEEF) it will be good if this can be used to enhance the performance on RISC-V too in the future ;-)

@Hamlin-Li
Copy link
Author

@theRealAph I see, I think now I understand the whole picture of your concerns. Thanks!

I think the key question is whether we're comfortable relying on/pointing at an external repository which may or may not be there tomorrow and/or where tags may change outside of our control.
The SLEEF source code looks to be around 7.5MB, give or take. That's not enormous, but it's not exactly small when keeping in mind that if we #include it in the jdk repo it's going to be there for every cloned repo in every project/branch and very few will actually care about it. I agree that we'd still have to include the pre-generated header files.
Hence my suggestion to consider putting it under our control, but in a separate openjdk controlled repository.

Based on @vidmik's previous comments, I think we all agree the original sleef source should be added into jdk, along with the pre-generated sleef inline headers. The only disagreement is about how to include the sleef source: one option is to add it into the jdk repo itself, the other is to put it in a separate repo under the control of openjdk. Please kindly correct me if I misunderstood.

I have no particular preference for which option to take. My only concern is how long it will take to make that decision. If it could take a rather long time, can we take several incremental steps to achieve the final goal? e.g.

  1. add pre-generated sleef inline headers into jdk, which is done by 8329816: Add SLEEF version 3.6.1 #19185
  2. support vector math in jdk, which is done by this pr.
  3. add sleef source into either jdk repo itself or another repo under control of jdk.

I think we have plenty of time to achieve the final goal in jdk-24.

What do you think? @theRealAph @vidmik @luhenry @magicus @erikj79

@Hamlin-Li
Copy link
Author

On a slightly separate note (and I see @luhenry is in this comment thread too and has contributed to SLEEF) it will be good if this can be used to enhance the performance on RISC-V too in the future ;-)

We already have a prototype that depends on this pr, and the performance gain is promising.

@theRealAph
Copy link
Contributor

I have no particular preference for which option to take. My only concern is how long it will take to make that decision. If it could take a rather long time, can we take several incremental steps to achieve the final goal? e.g.

We're only a couple of weeks away from the summit. What would be a long time?

@Hamlin-Li
Copy link
Author

We're only a couple of weeks away from the summit. What would be a long time?

OK, then let's wait for it.

@fitzsim
Copy link

fitzsim commented Jul 18, 2024

It is possible to regenerate sleefinline_advsimd.h and sleefinline_sve.h with some new OpenJDK build logic and only the following fifteen SLEEF source files:

32K	./src/jdk.incubator.vector/linux/native/sleef/src/arch/helperadvsimd.h
40K	./src/jdk.incubator.vector/linux/native/sleef/src/arch/helpersve.h
8.0K	./src/jdk.incubator.vector/linux/native/sleef/src/common/addSuffix.c
20K	./src/jdk.incubator.vector/linux/native/sleef/src/common/commonfuncs.h
16K	./src/jdk.incubator.vector/linux/native/sleef/src/common/dd.h
20K	./src/jdk.incubator.vector/linux/native/sleef/src/common/df.h
4.0K	./src/jdk.incubator.vector/linux/native/sleef/src/common/estrin.h
12K	./src/jdk.incubator.vector/linux/native/sleef/src/common/keywords.txt
12K	./src/jdk.incubator.vector/linux/native/sleef/src/common/misc.h
4.0K	./src/jdk.incubator.vector/linux/native/sleef/src/common/quaddef.h
4.0K	./src/jdk.incubator.vector/linux/native/sleef/src/libm/funcproto.h
20K	./src/jdk.incubator.vector/linux/native/sleef/src/libm/mkrename.c
116K	./src/jdk.incubator.vector/linux/native/sleef/src/libm/sleefinline_header.h.org
164K	./src/jdk.incubator.vector/linux/native/sleef/src/libm/sleefsimddp.c
152K	./src/jdk.incubator.vector/linux/native/sleef/src/libm/sleefsimdsp.c
624K	total

I was able to extract the shell and C preprocessing steps from the upstream CMake-based build system (by adding --verbose to cmake --build in createSleef.sh) and convert them into an OpenJDK .gmk file.

This branch shows various approaches; ideas include:

  • the fifteen source files are checked directly into the OpenJDK repository
  • a --regenerate-sleef-headers configure option that will cause the headers to be rebuilt as their dependencies change
  • a make regenerate-sleef-headers phony target that unconditionally rebuilds the headers
  • cross-compilation support when --openjdk-target=aarch64-linux-gnu is specified on an x86-64 build machine
  • a README section with hints on how to maintain the OpenJDK build rules

Whenever the OpenJDK SLEEF source code copies were updated, one would also check for changes in the upstream CMake steps.
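The extraction step described above can be sketched roughly as follows. This is an illustration only: the grep pattern is an assumption about the shape of the verbose CMake log, not the actual logic in SleefCommon.gmk.

```shell
# Sketch: build SLEEF verbosely, then pull the preprocessing commands that
# turn the SIMD sources into inline headers out of the log, so they can be
# translated into make rules by hand.
extract_preprocess_cmds() {
  # Keep only compiler invocations that preprocess (-E) the SLEEF SIMD sources.
  grep -E '(cc|gcc|clang)[^;]* -E [^;]*sleefsimd(sp|dp)\.c' "$1"
}

# Hypothetical usage:
#   cmake --build build --verbose > build.log 2>&1
#   extract_preprocess_cmds build.log
```

Whatever the exact pattern, the point is that the verbose log makes the otherwise-hidden CMake steps visible enough to port into the OpenJDK build.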

@Hamlin-Li
Copy link
Author

It is possible to regenerate sleefinline_advsimd.h and sleefinline_sve.h with some new OpenJDK build logic and only the fifteen SLEEF source files listed above.

I was able to extract the shell and C preprocessing steps from the upstream CMake-based build system (by adding --verbose to cmake --build in createSleef.sh) and convert them into an OpenJDK .gmk file.

This branch shows various approaches; ideas include:

  • the fifteen source files are checked directly into the OpenJDK repository
  • a --regenerate-sleef-headers configure option that will cause the headers to be rebuilt as their dependencies change
  • a make regenerate-sleef-headers phony target that unconditionally rebuilds the headers
  • cross-compilation support when --openjdk-target=aarch64-linux-gnu is specified on an x86-64 build machine
  • a README section with hints on how to maintain the OpenJDK build rules

Really nice work, thanks!

Whenever the OpenJDK SLEEF source code copies were updated, one would also check for changes in the upstream CMake steps.

Compared to the current implementation in #19185, my slight concern about This branch is the future maintenance effort when we need to update the sleef source along with the cmake changes, and also when support for new sleef platforms is added in jdk.

On the other hand, I'm not sure whether This branch satisfies the traceability requirement discussed above.

@theRealAph
Copy link
Contributor

Compared to the current implementation in #19185, my slight concern about This branch is the future maintenance effort when we need to update the sleef source along with the cmake changes, and also when support for new sleef platforms is added in jdk.

That's a fair point. However, it's probably less work than any adequate alternative proposed thus far.

On the other hand, I'm not sure whether This branch satisfies the traceability requirement discussed above.

I'm sure it's fine: we have readable source code in the preferred form, along with a script that generates it from the corresponding SLEEF release.

@fitzsim
Copy link

fitzsim commented Jul 23, 2024

Compared to the current implementation in #19185, my slight concern about This branch is the future maintenance effort when we need to update the sleef source along with the cmake changes, and also when support for new sleef platforms is added in jdk.

To check this, I added the riscv64 CMake steps to SleefCommon.gmk.

I had intended to factor out SetupSleefHeader anyway for aarch64, to eliminate copy-n-paste.

After that, there was one build step divergence for riscv64 for the naming of the helper header.

The two riscv64 commits are:

@Hamlin-Li
Copy link
Author

To check this, I added the riscv64 CMake steps to SleefCommon.gmk.

I had intended to factor out SetupSleefHeader anyway for aarch64, to eliminate copy-n-paste.

After that, there was one build step divergence for riscv64 for the naming of the helper header.

The two riscv64 commits are:

Thanks for your effort, this is much better.

Just one question in my mind. If there is no major refactoring in sleef in the future, I think we're fine. But if there is such a refactoring in sleef's implementation, the maintenance will not be minor work, as with This branch we would need to migrate some of sleef's internal build process into jdk?
But I'm not sure; maybe others can comment on this question.

And I think we can move the discussion about This branch to #19185, as this part of the code will ultimately be pushed into jdk via that pr (for legal process reasons). I hope the people involved in that pr do not miss the discussion and information here.

@bridgekeeper
Copy link

bridgekeeper bot commented Aug 21, 2024

@Hamlin-Li This pull request has been inactive for more than 4 weeks and will be automatically closed if another 4 weeks passes without any activity. To avoid this, simply add a new comment to the pull request. Feel free to ask for assistance if you need help with progressing this pull request towards integration!

@Hamlin-Li Hamlin-Li marked this pull request as draft August 21, 2024 14:50
@Hamlin-Li
Copy link
Author

Depending on #19185, which is still in progress..., so changing this to a draft

@openjdk openjdk bot removed the rfr Pull request is ready for review label Aug 21, 2024
@Hamlin-Li Hamlin-Li mentioned this pull request Aug 30, 2024
3 tasks
@magicus
Copy link
Member

magicus commented Sep 17, 2024

Now JDK-8329816 is finally integrated, which means the libsleef source code is present and available in the JDK repo.

I would recommend that you close this PR and start over once again (third time's a charm!), with just the changes needed to get the libsleef connection into Hotspot. This PR, just like the other ones, has been so cluttered that it is hard to understand what to review and how.

@Hamlin-Li
Copy link
Author

@magicus Thank you for the effort!
I think your suggestion makes sense, I will start it in a new one.

Labels
build build-dev@openjdk.org hotspot hotspot-dev@openjdk.org