
@XiaohongGong

@XiaohongGong XiaohongGong commented Apr 16, 2025

Summary:

JDK-8318650 [1] added HotSpot intrinsification of the subword gather load APIs on X86 platforms. This patch implements the equivalent functionality for the AArch64 SVE platform. In addition to the AArch64 backend support, it also refactors the Java-side API implementation and the C2 mid-end to make these operations more efficient and easier to maintain across architectures.

Background:

Vector gather load APIs load values from memory addresses computed by adding a base pointer to integer indices stored in an int array. SVE provides native gather load instructions for byte and short types that take their indices from an int vector (see [2][3]).
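
As a reference point, the gather semantics can be modeled with a scalar loop. This is a minimal sketch, assuming the element mapping documented for the Vector API's `ByteVector.fromArray(species, a, offset, indexMap, mapOffset)`, where lane i reads `a[offset + indexMap[mapOffset + i]]`. The class and method names are hypothetical, and the sketch does not use the incubator module.

```java
import java.nio.charset.StandardCharsets;

final class GatherSemantics {
    // Scalar model (hypothetical helper): lane i of the result is
    // a[offset + indexMap[mapOffset + i]]. An out-of-range index throws
    // ArrayIndexOutOfBoundsException here.
    static byte[] gather(byte[] a, int offset, int[] indexMap, int mapOffset, int vlen) {
        byte[] result = new byte[vlen];
        for (int i = 0; i < vlen; i++) {
            result[i] = a[offset + indexMap[mapOffset + i]];
        }
        return result;
    }

    public static void main(String[] args) {
        byte[] arr = "abcdefghijklmnop".getBytes(StandardCharsets.US_ASCII);
        int[] index = {3, 2, 4, 1, 5, 7, 5, 2, 0, 6, 7, 1, 15, 10, 11, 9};
        byte[] v = gather(arr, 0, index, 0, 16);
        // Printed lowest lane first.
        System.out.println(new String(v, StandardCharsets.US_ASCII));
    }
}
```

On the example data used in the scenarios below, it prints `dcebfhfcaghbpklj`. Read highest lane first, as the diagrams are written, this is `jlkp bhga cfhf becd`.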

The number of elements loaded by one such instruction equals the number of lanes in the index vector. Since an int element is 4 times wider than a byte and 2 times wider than a short, and a vector register cannot exceed MaxVectorSize bytes, the operation may need to be split into multiple parts.
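
The splitting arithmetic can be sketched as follows. The sketch assumes, as the scenarios below do, that one index vector holds MaxVectorSize / 4 int lanes and that each subword element needs exactly one int index. `GatherSplit` and its methods are illustrative names, not HotSpot code.

```java
final class GatherSplit {
    // Number of 32-bit int index lanes in one vector register of maxVectorSize bytes.
    static int indexLanesPerRegister(int maxVectorSize) {
        return maxVectorSize / Integer.BYTES;
    }

    // Number of gather-load parts for a vector with vlen subword lanes. Each lane needs
    // one int index, so the count depends only on vlen, not on byte vs. short.
    static int partsNeeded(int vlen, int maxVectorSize) {
        int lanes = indexLanesPerRegister(maxVectorSize);
        return (vlen + lanes - 1) / lanes; // ceiling division
    }

    // True when one register has more index lanes than needed, so the gather must be masked.
    static boolean needsMask(int vlen, int maxVectorSize) {
        return indexLanesPerRegister(maxVectorSize) > vlen;
    }

    public static void main(String[] args) {
        int vlen = 16; // a 128-bit byte vector has 16 lanes
        for (int mvs : new int[] {16, 32, 64, 128, 256}) {
            System.out.printf("MaxVectorSize=%d: %d index lanes, %d part(s), mask=%b%n",
                    mvs, indexLanesPerRegister(mvs), partsNeeded(vlen, mvs), needsMask(vlen, mvs));
        }
    }
}
```

For a 16-lane byte vector this gives 4, 2, 1, and 1 parts for MaxVectorSize 16, 32, 64, and 256, with a mask needed only in the last case. A 128-bit short vector (8 lanes) needs 2 parts when MaxVectorSize is 16.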

Using a 128-bit byte vector gather load as an example, there are four scenarios depending on MaxVectorSize:

  1. MaxVectorSize = 16, byte_vector_size = 16:

    • One vector register holds 4 int indices
    • So each gather load produces 4 bytes
    • Requires 4 gather loads and a final merge
      Example:
    byte[] arr = [a, b, c, d, e, f, g, h, i, j, k, l, m, n, o, p, ...]
    int[] index = [3, 2, 4, 1, 5, 7, 5, 2, 0, 6, 7, 1, 15, 10, 11, 9]
    
    4 gather loads:
    idx_v1 = [1 4 2 3]    gather_v1 = [0000 0000 0000 becd]
    idx_v2 = [2 5 7 5]    gather_v2 = [0000 0000 0000 cfhf]
    idx_v3 = [1 7 6 0]    gather_v3 = [0000 0000 0000 bhga]
    idx_v4 = [9 11 10 15] gather_v4 = [0000 0000 0000 jlkp]
    merge: v = [jlkp bhga cfhf becd]
    
  2. MaxVectorSize = 32, byte_vector_size = MaxVectorSize / 2:

    • One vector register holds 8 int indices
    • So each gather load produces 8 bytes
    • Requires 2 gather loads and a merge
      Example:
    byte[] arr = [a, b, c, d, e, f, g, h, i, j, k, l, m, n, o, p, ...]
    int[] index = [3, 2, 4, 1, 5, 7, 5, 2, 0, 6, 7, 1, 15, 10, 11, 9]
    
    2 gather loads:
    idx_v1 = [2 5 7 5 1 4 2 3]
    idx_v2 = [9 11 10 15 1 7 6 0]
    gather_v1 = [0000 0000 0000 0000 0000 0000 cfhf becd]
    gather_v2 = [0000 0000 0000 0000 0000 0000 jlkp bhga]
    merge: v = [0000 0000 0000 0000 jlkp bhga cfhf becd]
    
  3. MaxVectorSize = 64, byte_vector_size = MaxVectorSize / 4:

    • One vector register holds 16 int indices
    • So a single gather load produces all 16 bytes
    • No splitting required
      Example:
    byte[] arr = [a, b, c, d, e, f, g, h, i, j, k, l, m, n, o, p, ...]
    int[] index = [3, 2, 4, 1, 5, 7, 5, 2, 0, 6, 7, 1, 15, 10, 11, 9]
    
    1 gather load:
    idx_v = [9 11 10 15 1 7 6 0 2 5 7 5 1 4 2 3]
    v = [... 0000 0000 0000 0000 jlkp bhga cfhf becd]
    
  4. MaxVectorSize > 64, byte_vector_size < MaxVectorSize / 4:

    • One vector register holds 32 or more int indices
    • A single gather load still produces only the 16 needed bytes
    • Requires a mask that activates only the 16 needed lanes, so that the
      remaining lanes do not access memory
      Example:
    byte[] arr = [a, b, c, d, e, f, g, h, i, j, k, l, m, n, o, p, ...]
    int[] index = [3, 2, 4, 1, 5, 7, 5, 2, 0, 6, 7, 1, 15, 10, 11, 9]
    
    1 gather load:
    idx_v = [... 0 0 0 0 0 0 0 0 9 11 10 15 1 7 6 0 2 5 7 5 1 4 2 3]
    v = [... 0000 0000 0000 0000 0000 jlkp bhga cfhf becd]
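
The four scenarios above can be simulated at the element level. This is a minimal sketch under simplifying assumptions: it ignores how the hardware loads bytes into 32-bit lanes and narrows them afterwards, and `SplitGatherModel`, `gatherPart`, and `splitGather` are hypothetical names that do not correspond to the C2 IR nodes added by this patch.

```java
import java.nio.charset.StandardCharsets;

final class SplitGatherModel {
    // One modeled gather-load part: lanes [0, active) read a[idxPart[lane]];
    // lanes >= active are masked off, do not touch memory, and stay zero.
    static byte[] gatherPart(byte[] a, int[] idxPart, int active) {
        byte[] part = new byte[idxPart.length];
        for (int lane = 0; lane < active; lane++) {
            part[lane] = a[idxPart[lane]];
        }
        return part;
    }

    // Models a vlen-lane byte gather when registers are maxVectorSize bytes wide:
    // split the indices into register-sized int parts, gather each part, then
    // merge the partial results in lane order.
    static byte[] splitGather(byte[] a, int[] idx, int vlen, int maxVectorSize) {
        int lanesPerPart = maxVectorSize / Integer.BYTES;
        byte[] merged = new byte[vlen];
        for (int start = 0; start < vlen; start += lanesPerPart) {
            int active = Math.min(lanesPerPart, vlen - start);
            int[] idxPart = new int[lanesPerPart]; // unused tail lanes hold 0, as in case 4
            System.arraycopy(idx, start, idxPart, 0, active);
            byte[] part = gatherPart(a, idxPart, active);
            System.arraycopy(part, 0, merged, start, active); // merge step
        }
        return merged;
    }

    public static void main(String[] args) {
        byte[] arr = "abcdefghijklmnop".getBytes(StandardCharsets.US_ASCII);
        int[] idx = {3, 2, 4, 1, 5, 7, 5, 2, 0, 6, 7, 1, 15, 10, 11, 9};
        for (int mvs : new int[] {16, 32, 64, 256}) {
            byte[] v = splitGather(arr, idx, 16, mvs);
            // Lanes are printed lowest first, i.e. reversed relative to the diagrams.
            System.out.println("MaxVectorSize=" + mvs + ": " + new String(v, StandardCharsets.US_ASCII));
        }
    }
}
```

Every MaxVectorSize setting prints the same `dcebfhfcaghbpklj`: split-and-merge (cases 1 and 2), a single full gather (case 3), and a masked gather (case 4) agree on the result.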
    

Main changes:

  1. Java-side API refactoring:
    • The Java-side implementation may already generate multiple index vectors for index checking. This patch passes all of these generated index vectors to HotSpot, so that on architectures like AArch64 the gather load operations can reuse them instead of creating duplicate index vectors. Existing IGVN cannot eliminate the duplicates because the index vectors generated on the Java side and those generated during compiler intrinsification are under different control flow.
  2. C2 compiler IR refactoring:
    • Generates different IR patterns for different architectures, such as AArch64 and X86, based on their different index requirements.
    • Added two new C2 IR nodes that help implement each part of the vector gather operation and merge the partial results at the end.
    • Refactored the LoadVectorGather/LoadVectorGatherMasked IR for subword types. For architectures that need the index array, such as X86, this patch removes the memory offset input and instead adds the offset to the memory base address at the IR level. This not only simplifies the backend implementation but also saves some add operations. Additionally, it unifies the IR for all types.
  3. Backend changes:
    • Added SVE match rules for subword gather load operations and the newly added IR nodes.
    • Refined the X86 implementation of subword gather since the offset input has been removed from the IR level.
  4. Test:
    • Added IR tests for verification.

Testing:

  • Passed hotspot::tier1/2/3, jdk::tier1/2/3 tests
  • Passed Vector API tests with all UseAVX flags on X86 and all UseSVE flags on AArch64
  • No regressions found

Performance:

The corresponding JMH benchmarks improve by 3x to 11x on an NVIDIA Grace CPU, which implements 128-bit SVE2. The performance data follows:

Benchmark                                                (SIZE)   Mode Cnt  Units    Before     After    Gain
GatherOperationsBenchmark.microByteGather128                 64  thrpt  30  ops/ms  13447.414 43184.611  3.21
GatherOperationsBenchmark.microByteGather128                256  thrpt  30  ops/ms   3361.944 11165.006  3.32
GatherOperationsBenchmark.microByteGather128               1024  thrpt  30  ops/ms    843.501  2830.108  3.35
GatherOperationsBenchmark.microByteGather128               4096  thrpt  30  ops/ms    211.096   712.958  3.37
GatherOperationsBenchmark.microByteGather128_MASK            64  thrpt  30  ops/ms  10627.297 42818.402  4.02
GatherOperationsBenchmark.microByteGather128_MASK           256  thrpt  30  ops/ms   2675.144 11055.874  4.13
GatherOperationsBenchmark.microByteGather128_MASK          1024  thrpt  30  ops/ms    677.742  2783.920  4.10
GatherOperationsBenchmark.microByteGather128_MASK          4096  thrpt  30  ops/ms    169.416   686.783  4.05
GatherOperationsBenchmark.microByteGather128_MASK_NZ_OFF     64  thrpt  30  ops/ms  10592.545 42282.802  3.99
GatherOperationsBenchmark.microByteGather128_MASK_NZ_OFF    256  thrpt  30  ops/ms   2680.060 11039.563  4.11
GatherOperationsBenchmark.microByteGather128_MASK_NZ_OFF   1024  thrpt  30  ops/ms    678.941  2790.252  4.10
GatherOperationsBenchmark.microByteGather128_MASK_NZ_OFF   4096  thrpt  30  ops/ms    169.985   691.157  4.06
GatherOperationsBenchmark.microByteGather128_NZ_OFF          64  thrpt  30  ops/ms  13538.308 42954.988  3.17
GatherOperationsBenchmark.microByteGather128_NZ_OFF         256  thrpt  30  ops/ms   3414.237 11227.333  3.28
GatherOperationsBenchmark.microByteGather128_NZ_OFF        1024  thrpt  30  ops/ms    850.098  2821.821  3.31
GatherOperationsBenchmark.microByteGather128_NZ_OFF        4096  thrpt  30  ops/ms    213.295   705.015  3.30
GatherOperationsBenchmark.microByteGather64                  64  thrpt  30  ops/ms   8705.935 44213.982  5.07
GatherOperationsBenchmark.microByteGather64                 256  thrpt  30  ops/ms   2186.620 11407.364  5.21
GatherOperationsBenchmark.microByteGather64                1024  thrpt  30  ops/ms    545.364  2845.370  5.21
GatherOperationsBenchmark.microByteGather64                4096  thrpt  30  ops/ms    136.376   718.532  5.26
GatherOperationsBenchmark.microByteGather64_MASK             64  thrpt  30  ops/ms   6530.636 42053.044  6.43
GatherOperationsBenchmark.microByteGather64_MASK            256  thrpt  30  ops/ms   1644.069 11323.223  6.88
GatherOperationsBenchmark.microByteGather64_MASK           1024  thrpt  30  ops/ms    416.093  2844.712  6.83
GatherOperationsBenchmark.microByteGather64_MASK           4096  thrpt  30  ops/ms    105.777   716.685  6.77
GatherOperationsBenchmark.microByteGather64_MASK_NZ_OFF      64  thrpt  30  ops/ms   6619.260 42204.919  6.37
GatherOperationsBenchmark.microByteGather64_MASK_NZ_OFF     256  thrpt  30  ops/ms   1668.304 11318.298  6.78
GatherOperationsBenchmark.microByteGather64_MASK_NZ_OFF    1024  thrpt  30  ops/ms    422.085  2844.398  6.73
GatherOperationsBenchmark.microByteGather64_MASK_NZ_OFF    4096  thrpt  30  ops/ms    105.722   716.543  6.77
GatherOperationsBenchmark.microByteGather64_NZ_OFF           64  thrpt  30  ops/ms   8754.073 44232.985  5.05
GatherOperationsBenchmark.microByteGather64_NZ_OFF          256  thrpt  30  ops/ms   2195.009 11408.702  5.19
GatherOperationsBenchmark.microByteGather64_NZ_OFF         1024  thrpt  30  ops/ms    546.530  2845.369  5.20
GatherOperationsBenchmark.microByteGather64_NZ_OFF         4096  thrpt  30  ops/ms    137.713   718.391  5.21
GatherOperationsBenchmark.microShortGather128                64  thrpt  30  ops/ms   8695.558 33438.398  3.84
GatherOperationsBenchmark.microShortGather128               256  thrpt  30  ops/ms   2189.766  8533.643  3.89
GatherOperationsBenchmark.microShortGather128              1024  thrpt  30  ops/ms    546.322  2145.239  3.92
GatherOperationsBenchmark.microShortGather128              4096  thrpt  30  ops/ms    136.503   537.493  3.93
GatherOperationsBenchmark.microShortGather128_MASK           64  thrpt  30  ops/ms   6656.883 33571.619  5.04
GatherOperationsBenchmark.microShortGather128_MASK          256  thrpt  30  ops/ms   1649.233  8533.728  5.17
GatherOperationsBenchmark.microShortGather128_MASK         1024  thrpt  30  ops/ms    421.687  2135.280  5.06
GatherOperationsBenchmark.microShortGather128_MASK         4096  thrpt  30  ops/ms    105.355   537.418  5.10
GatherOperationsBenchmark.microShortGather128_MASK_NZ_OFF    64  thrpt  30  ops/ms   6675.782 33441.402  5.00
GatherOperationsBenchmark.microShortGather128_MASK_NZ_OFF   256  thrpt  30  ops/ms   1681.000  8532.770  5.07
GatherOperationsBenchmark.microShortGather128_MASK_NZ_OFF  1024  thrpt  30  ops/ms    424.024  2135.485  5.03
GatherOperationsBenchmark.microShortGather128_MASK_NZ_OFF  4096  thrpt  30  ops/ms    106.507   537.674  5.04
GatherOperationsBenchmark.microShortGather128_NZ_OFF         64  thrpt  30  ops/ms   8796.279 33441.738  3.80
GatherOperationsBenchmark.microShortGather128_NZ_OFF        256  thrpt  30  ops/ms   2198.774  8562.333  3.89
GatherOperationsBenchmark.microShortGather128_NZ_OFF       1024  thrpt  30  ops/ms    546.991  2133.496  3.90
GatherOperationsBenchmark.microShortGather128_NZ_OFF       4096  thrpt  30  ops/ms    137.191   537.390  3.91
GatherOperationsBenchmark.microShortGather64                 64  thrpt  30  ops/ms   5286.569 38042.434  7.19
GatherOperationsBenchmark.microShortGather64                256  thrpt  30  ops/ms   1312.778  9755.474  7.43
GatherOperationsBenchmark.microShortGather64               1024  thrpt  30  ops/ms    327.475  2450.755  7.48
GatherOperationsBenchmark.microShortGather64               4096  thrpt  30  ops/ms     82.490   613.481  7.43
GatherOperationsBenchmark.microShortGather64_MASK            64  thrpt  30  ops/ms   3525.102 37622.086  10.67
GatherOperationsBenchmark.microShortGather64_MASK           256  thrpt  30  ops/ms    877.877  9740.673  11.09
GatherOperationsBenchmark.microShortGather64_MASK          1024  thrpt  30  ops/ms    219.688  2446.063  11.13
GatherOperationsBenchmark.microShortGather64_MASK          4096  thrpt  30  ops/ms     54.935   613.137  11.16
GatherOperationsBenchmark.microShortGather64_MASK_NZ_OFF     64  thrpt  30  ops/ms   3509.264 35147.895  10.01
GatherOperationsBenchmark.microShortGather64_MASK_NZ_OFF    256  thrpt  30  ops/ms    880.523  9733.536  11.05
GatherOperationsBenchmark.microShortGather64_MASK_NZ_OFF   1024  thrpt  30  ops/ms    220.578  2465.951  11.17
GatherOperationsBenchmark.microShortGather64_MASK_NZ_OFF   4096  thrpt  30  ops/ms     55.790   620.465  11.12
GatherOperationsBenchmark.microShortGather64_NZ_OFF          64  thrpt  30  ops/ms   5271.218 35543.510  6.74
GatherOperationsBenchmark.microShortGather64_NZ_OFF         256  thrpt  30  ops/ms   1318.470  9735.321  7.38
GatherOperationsBenchmark.microShortGather64_NZ_OFF        1024  thrpt  30  ops/ms    328.695  2466.311  7.50
GatherOperationsBenchmark.microShortGather64_NZ_OFF        4096  thrpt  30  ops/ms     81.959   621.065  7.57

The following data is from an X86 AVX-512 system. Performance improves by up to 39%, although a few cases show small regressions (lowest Gain: 0.94).

Benchmark                                                (SIZE)   Mode Cnt  Units    Before      After    Gain
GatherOperationsBenchmark.microByteGather128                 64  thrpt  30  ops/ms  44205.252  46829.437  1.05
GatherOperationsBenchmark.microByteGather128                256  thrpt  30  ops/ms  11243.202  12256.211  1.09
GatherOperationsBenchmark.microByteGather128               1024  thrpt  30  ops/ms   2824.094   3096.282  1.09
GatherOperationsBenchmark.microByteGather128               4096  thrpt  30  ops/ms    706.040    776.444  1.09
GatherOperationsBenchmark.microByteGather128_MASK            64  thrpt  30  ops/ms  46911.410  46321.310  0.98
GatherOperationsBenchmark.microByteGather128_MASK           256  thrpt  30  ops/ms  12850.712  12898.541  1.00
GatherOperationsBenchmark.microByteGather128_MASK          1024  thrpt  30  ops/ms   3099.038   3240.863  1.04
GatherOperationsBenchmark.microByteGather128_MASK          4096  thrpt  30  ops/ms    795.265    832.990  1.04
GatherOperationsBenchmark.microByteGather128_MASK_NZ_OFF     64  thrpt  30  ops/ms  43065.930  47164.936  1.09
GatherOperationsBenchmark.microByteGather128_MASK_NZ_OFF    256  thrpt  30  ops/ms  11537.805  13190.759  1.14
GatherOperationsBenchmark.microByteGather128_MASK_NZ_OFF   1024  thrpt  30  ops/ms   2763.036   3304.582  1.19
GatherOperationsBenchmark.microByteGather128_MASK_NZ_OFF   4096  thrpt  30  ops/ms    722.374    843.458  1.16
GatherOperationsBenchmark.microByteGather128_NZ_OFF          64  thrpt  30  ops/ms  44145.297  46845.845  1.06
GatherOperationsBenchmark.microByteGather128_NZ_OFF         256  thrpt  30  ops/ms  12172.421  12241.941  1.00
GatherOperationsBenchmark.microByteGather128_NZ_OFF        1024  thrpt  30  ops/ms   3097.042   3100.228  1.00
GatherOperationsBenchmark.microByteGather128_NZ_OFF        4096  thrpt  30  ops/ms    776.453    775.881  0.99
GatherOperationsBenchmark.microByteGather64                  64  thrpt  30  ops/ms  58541.178  59464.156  1.01
GatherOperationsBenchmark.microByteGather64                 256  thrpt  30  ops/ms  16063.284  17360.858  1.08
GatherOperationsBenchmark.microByteGather64                1024  thrpt  30  ops/ms   4126.798   4471.636  1.08
GatherOperationsBenchmark.microByteGather64                4096  thrpt  30  ops/ms   1045.116   1125.219  1.07
GatherOperationsBenchmark.microByteGather64_MASK             64  thrpt  30  ops/ms  35344.320  49062.831  1.38
GatherOperationsBenchmark.microByteGather64_MASK            256  thrpt  30  ops/ms  11946.622  13550.297  1.13
GatherOperationsBenchmark.microByteGather64_MASK           1024  thrpt  30  ops/ms   3275.053   3359.737  1.02
GatherOperationsBenchmark.microByteGather64_MASK           4096  thrpt  30  ops/ms    844.575    858.487  1.01
GatherOperationsBenchmark.microByteGather64_MASK_NZ_OFF      64  thrpt  30  ops/ms  43550.522  48875.831  1.12
GatherOperationsBenchmark.microByteGather64_MASK_NZ_OFF     256  thrpt  30  ops/ms  12216.995  13522.420  1.10
GatherOperationsBenchmark.microByteGather64_MASK_NZ_OFF    1024  thrpt  30  ops/ms   3053.068   3391.067  1.11
GatherOperationsBenchmark.microByteGather64_MASK_NZ_OFF    4096  thrpt  30  ops/ms    753.042    869.774  1.15
GatherOperationsBenchmark.microByteGather64_NZ_OFF           64  thrpt  30  ops/ms  52082.307  58847.230  1.12
GatherOperationsBenchmark.microByteGather64_NZ_OFF          256  thrpt  30  ops/ms  14210.930  17389.898  1.22
GatherOperationsBenchmark.microByteGather64_NZ_OFF         1024  thrpt  30  ops/ms   3697.996   4476.988  1.21
GatherOperationsBenchmark.microByteGather64_NZ_OFF         4096  thrpt  30  ops/ms    921.524   1125.308  1.22
GatherOperationsBenchmark.microShortGather128                64  thrpt  30  ops/ms  44325.212  44843.853  1.01
GatherOperationsBenchmark.microShortGather128               256  thrpt  30  ops/ms  11675.510  12630.103  1.08
GatherOperationsBenchmark.microShortGather128              1024  thrpt  30  ops/ms   1260.004   1373.395  1.09
GatherOperationsBenchmark.microShortGather128              4096  thrpt  30  ops/ms    761.857    814.790  1.06
GatherOperationsBenchmark.microShortGather128_MASK           64  thrpt  30  ops/ms  36339.450  36951.803  1.01
GatherOperationsBenchmark.microShortGather128_MASK          256  thrpt  30  ops/ms   9843.842  10018.754  1.01
GatherOperationsBenchmark.microShortGather128_MASK         1024  thrpt  30  ops/ms   2515.702   2595.312  1.03
GatherOperationsBenchmark.microShortGather128_MASK         4096  thrpt  30  ops/ms    616.450    661.402  1.07
GatherOperationsBenchmark.microShortGather128_MASK_NZ_OFF    64  thrpt  30  ops/ms  34078.747  33712.577  0.98
GatherOperationsBenchmark.microShortGather128_MASK_NZ_OFF   256  thrpt  30  ops/ms   9018.316   8515.947  0.94
GatherOperationsBenchmark.microShortGather128_MASK_NZ_OFF  1024  thrpt  30  ops/ms   2250.813   2595.847  1.15
GatherOperationsBenchmark.microShortGather128_MASK_NZ_OFF  4096  thrpt  30  ops/ms    563.182    659.087  1.17
GatherOperationsBenchmark.microShortGather128_NZ_OFF         64  thrpt  30  ops/ms  39909.543  44063.331  1.10
GatherOperationsBenchmark.microShortGather128_NZ_OFF        256  thrpt  30  ops/ms  10690.582  12437.166  1.16
GatherOperationsBenchmark.microShortGather128_NZ_OFF       1024  thrpt  30  ops/ms   2677.219   3151.078  1.17
GatherOperationsBenchmark.microShortGather128_NZ_OFF       4096  thrpt  30  ops/ms    681.705    802.929  1.17
GatherOperationsBenchmark.microShortGather64                 64  thrpt  30  ops/ms  45836.789  50883.505  1.11
GatherOperationsBenchmark.microShortGather64                256  thrpt  30  ops/ms  12269.355  13614.567  1.10
GatherOperationsBenchmark.microShortGather64               1024  thrpt  30  ops/ms   3010.548   3437.973  1.14
GatherOperationsBenchmark.microShortGather64               4096  thrpt  30  ops/ms    734.634    899.070  1.22
GatherOperationsBenchmark.microShortGather64_MASK            64  thrpt  30  ops/ms  39753.487  39319.742  0.98
GatherOperationsBenchmark.microShortGather64_MASK           256  thrpt  30  ops/ms  10615.540  10648.996  1.00
GatherOperationsBenchmark.microShortGather64_MASK          1024  thrpt  30  ops/ms   2653.485   2782.477  1.04
GatherOperationsBenchmark.microShortGather64_MASK          4096  thrpt  30  ops/ms    678.165    686.024  1.01
GatherOperationsBenchmark.microShortGather64_MASK_NZ_OFF     64  thrpt  30  ops/ms  37742.593  40491.965  1.07
GatherOperationsBenchmark.microShortGather64_MASK_NZ_OFF    256  thrpt  30  ops/ms  10096.251  11036.785  1.09
GatherOperationsBenchmark.microShortGather64_MASK_NZ_OFF   1024  thrpt  30  ops/ms   2526.374   2812.550  1.11
GatherOperationsBenchmark.microShortGather64_MASK_NZ_OFF   4096  thrpt  30  ops/ms    642.484    656.152  1.02
GatherOperationsBenchmark.microShortGather64_NZ_OFF          64  thrpt  30  ops/ms  40602.930  50921.048  1.25
GatherOperationsBenchmark.microShortGather64_NZ_OFF         256  thrpt  30  ops/ms  10972.083  14151.666  1.28
GatherOperationsBenchmark.microShortGather64_NZ_OFF        1024  thrpt  30  ops/ms   2726.248   3662.293  1.34
GatherOperationsBenchmark.microShortGather64_NZ_OFF        4096  thrpt  30  ops/ms    670.735    933.299  1.39
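
In both tables, Gain is After / Before. Checking several rows suggests the published values are truncated rather than rounded to two decimals (for example, 46829.437 / 44205.252 = 1.0594 is listed as 1.05); this is an observation from the data, not something the description states. A small check using `BigDecimal`, with `GainCheck` as a hypothetical helper name:

```java
import java.math.BigDecimal;
import java.math.RoundingMode;

final class GainCheck {
    // Gain = After / Before, truncated (RoundingMode.DOWN) to two decimal places.
    static BigDecimal gain(String before, String after) {
        return new BigDecimal(after).divide(new BigDecimal(before), 2, RoundingMode.DOWN);
    }

    public static void main(String[] args) {
        System.out.println(gain("13447.414", "43184.611")); // microByteGather128, SIZE 64, Grace: 3.21
        System.out.println(gain("44205.252", "46829.437")); // microByteGather128, SIZE 64, AVX-512: 1.05
        System.out.println(gain("54.935", "613.137"));      // microShortGather64_MASK, SIZE 4096, Grace: 11.16
    }
}
```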

[1] https://bugs.openjdk.org/browse/JDK-8318650
[2] https://developer.arm.com/documentation/ddi0602/2024-12/SVE-Instructions/LD1B--scalar-plus-vector---Gather-load-unsigned-bytes-to-vector--vector-index--?lang=en
[3] https://developer.arm.com/documentation/ddi0602/2024-12/SVE-Instructions/LD1H--scalar-plus-vector---Gather-load-unsigned-halfwords-to-vector--vector-index--?lang=en


Progress

  • Change must be properly reviewed (1 review required, with at least 1 Reviewer)
  • Change must not contain extraneous whitespace
  • Commit message must refer to an issue

Integration blocker

 ⚠️ Title mismatch between PR and JBS for issue JDK-8351623

Issue

  • JDK-8351623: VectorAPI: Add SVE implementation of subword gather load operation (Enhancement - P4) ⚠️ Title mismatch between PR and JBS.

Reviewing

Using git

Checkout this PR locally:
$ git fetch https://git.openjdk.org/jdk.git pull/24679/head:pull/24679
$ git checkout pull/24679

Update a local copy of the PR:
$ git checkout pull/24679
$ git pull https://git.openjdk.org/jdk.git pull/24679/head

Using Skara CLI tools

Checkout this PR locally:
$ git pr checkout 24679

View PR using the GUI difftool:
$ git pr show -t 24679

Using diff file

Download this PR as a diff file:
https://git.openjdk.org/jdk/pull/24679.diff


@bridgekeeper

bridgekeeper bot commented Apr 16, 2025

👋 Welcome back xgong! A progress list of the required criteria for merging this PR into master will be added to the body of your pull request. There are additional pull request commands available for use with this pull request.

@openjdk

openjdk bot commented Apr 16, 2025

❗ This change is not yet ready to be integrated.
See the Progress checklist in the description for automated requirements.

@openjdk openjdk bot added the rfr Pull request is ready for review label Apr 16, 2025
@openjdk

openjdk bot commented Apr 16, 2025

@XiaohongGong The following labels will be automatically applied to this pull request:

  • core-libs
  • graal
  • hotspot

When this pull request is ready to be reviewed, an "RFR" email will be sent to the corresponding mailing lists. If you would like to change these labels, use the /label pull request command.

@openjdk openjdk bot added graal graal-dev@openjdk.org hotspot hotspot-dev@openjdk.org core-libs core-libs-dev@openjdk.org labels Apr 16, 2025
@XiaohongGong
Author

/label hotspot-compiler

@XiaohongGong
Author

/label remove graal

@openjdk openjdk bot added the hotspot-compiler hotspot-compiler-dev@openjdk.org label Apr 16, 2025
@openjdk

openjdk bot commented Apr 16, 2025

@XiaohongGong
The hotspot-compiler label was successfully added.

@openjdk openjdk bot removed the graal graal-dev@openjdk.org label Apr 16, 2025
@openjdk

openjdk bot commented Apr 16, 2025

@XiaohongGong
The graal label was successfully removed.

@mlbridge

mlbridge bot commented Apr 16, 2025

Webrevs

@XiaohongGong
Author

Hi @jatin-bhateja, could you please take a look at this PR, especially the X86 part? Thanks a lot!
Hi @RealFYang, could you please review the RVV part? Thanks a lot!

* @library /test/lib /
* @modules jdk.incubator.vector
*
* @run driver compiler.vectorapi.VectorGatherSubwordTest
Member


Should we use @run main instead of @run driver?

Author


Thanks for taking a look at this PR! I think it's fine to use @run main instead.

@eme64
Contributor

eme64 commented Apr 23, 2025

@XiaohongGong I had a quick look at your changes and PR description. I wonder if you could split some of the refactoring into a separate PR? That would make it easier to review. Currently, you basically have x64 changes, aarch64 changes, Java library changes, and C2 changes. That's a lot at once. And it would basically require the review from a lot of different people at once.

Splitting would make it easier to review, less work for the reviewer. It would ensure everybody can look at a smaller change set, and that would also increase the quality of the code after review, I think.

What do you think?

@XiaohongGong
Author


Thanks for looking at this PR, @eme64! Splitting this PR into smaller ones is a good idea, and I will consider it. Maybe I can do the refactoring first, and then implement the compiler support for AArch64 in a follow-up PR. WDYT?

@eme64
Contributor

eme64 commented Apr 24, 2025


That sounds excellent :)

@XiaohongGong
Author

I'd like to close this PR and split the change into two new PRs. Thanks for the review!


Labels

core-libs core-libs-dev@openjdk.org hotspot hotspot-dev@openjdk.org hotspot-compiler hotspot-compiler-dev@openjdk.org rfr Pull request is ready for review


3 participants