Integrate LLVM at a1d43c14d (+1 revert) #17380

Merged
merged 2 commits into from
May 14, 2024

Conversation

@bjacob (Contributor) commented May 13, 2024

This integrate lets us drop our local revert of llvm/llvm-project#89131 and our cherry-pick of llvm/llvm-project#91654, both of which we introduced in the earlier integrate #17330.

This locally reverts llvm/llvm-project#90802 because it causes numerical errors, reported at llvm/llvm-project#90802 (comment).


github-actions bot commented May 13, 2024

Abbreviated Benchmark Summary

@ commit becbc67550cfcf57f956475df43e0b4d85f9c516 (vs. base 01ef465ead9c1aa036c14a60352892001e35ca32)

Data-Tiling Comparison Table

| Name | No-DT (baseline) latency, ms | DT-Only latency, ms | DT-UK latency, ms |
| --- | --- | --- | --- |
| BertForMaskedLMTF(stablehlo) [x86_64-cascadelake-linux_gnu-llvm_cpu] local_task(embedded_elf)[30-thread,full-inference,default-flags] with default @ c2-standard-60[cpu] | 230.746 (1.0X) | 139.804 (1.7X) | 113.789 (2.0X) |
| BertLargeTF(stablehlo) [x86_64-cascadelake-linux_gnu-llvm_cpu] local_task(embedded_elf)[30-thread,full-inference,default-flags] with default @ c2-standard-60[cpu] | 678.115 (1.0X) | 277.192 (2.4X) | 228.512 (3.0X) |
| DeepLabV3_fp32(tflite) [x86_64-cascadelake-linux_gnu-llvm_cpu] local_task(embedded_elf)[1-thread,full-inference,default-flags] with default @ c2-standard-60[cpu] | 32.253 (1.0X) | 40.453 (0.8X) | 33.033 (1.0X) |
| DeepLabV3_fp32(tflite) [x86_64-cascadelake-linux_gnu-llvm_cpu] local_task(embedded_elf)[8-thread,full-inference,default-flags] with default @ c2-standard-60[cpu] | 7.068 (1.0X) | 9.525 (0.7X) | 8.546 (0.8X) |
| EfficientNetV2STF(stablehlo) [x86_64-cascadelake-linux_gnu-llvm_cpu] local_task(embedded_elf)[1-thread,full-inference,default-flags] with default @ c2-standard-60[cpu] | 267.528 (1.0X) | 264.549 (1.0X) | 233.781 (1.1X) |
| EfficientNetV2STF(stablehlo) [x86_64-cascadelake-linux_gnu-llvm_cpu] local_task(embedded_elf)[15-thread,full-inference,default-flags] with default @ c2-standard-60[cpu] | 34.844 (1.0X) | 36.935 (0.9X) | 33.866 (1.0X) |
| EfficientNet_int8(tflite) [x86_64-cascadelake-linux_gnu-llvm_cpu] local_task(embedded_elf)[1-thread,full-inference,default-flags] with default @ c2-standard-60[cpu] | 28.694 (1.0X) | 52.710 (0.5X) | 15.444 (1.9X) |
| EfficientNet_int8(tflite) [x86_64-cascadelake-linux_gnu-llvm_cpu] local_task(embedded_elf)[8-thread,full-inference,default-flags] with default @ c2-standard-60[cpu] | 5.915 (1.0X) | 11.099 (0.5X) | 5.265 (1.1X) |
| GPT2_117M_TF_1X1XI32(stablehlo) [x86_64-cascadelake-linux_gnu-llvm_cpu] local_task(embedded_elf)[1-thread,full-inference,default-flags] with default @ c2-standard-60[cpu] | 70.441 (1.0X) | 37.406 (1.9X) | 39.836 (1.8X) |
| GPT2_117M_TF_1X1XI32(stablehlo) [x86_64-cascadelake-linux_gnu-llvm_cpu] local_task(embedded_elf)[15-thread,full-inference,default-flags] with default @ c2-standard-60[cpu] | 8.923 (1.0X) | 8.696 (1.0X) | 8.525 (1.0X) |
| GPT2_117M_TF_1X4XI32(stablehlo) [x86_64-cascadelake-linux_gnu-llvm_cpu] local_task(embedded_elf)[1-thread,full-inference,default-flags] with default @ c2-standard-60[cpu] | 89.464 (1.0X) | 42.464 (2.1X) | 41.761 (2.1X) |
| GPT2_117M_TF_1X4XI32(stablehlo) [x86_64-cascadelake-linux_gnu-llvm_cpu] local_task(embedded_elf)[15-thread,full-inference,default-flags] with default @ c2-standard-60[cpu] | 10.613 (1.0X) | 8.665 (1.2X) | 8.182 (1.3X) |
| MiniLML12H384Uncased(stablehlo) [x86_64-cascadelake-linux_gnu-llvm_cpu] local_task(embedded_elf)[1-thread,full-inference,default-flags] with default @ c2-standard-60[cpu] | 81.191 (1.0X) | 86.148 (0.9X) | 62.698 (1.3X) |
| MiniLML12H384Uncased(stablehlo) [x86_64-cascadelake-linux_gnu-llvm_cpu] local_task(embedded_elf)[15-thread,full-inference,default-flags] with default @ c2-standard-60[cpu] | 12.287 (1.0X) | 14.639 (0.8X) | 12.787 (1.0X) |
| MobileBertSquad_fp16(tflite) [x86_64-cascadelake-linux_gnu-llvm_cpu] local_task(embedded_elf)[1-thread,full-inference,default-flags] with default @ c2-standard-60[cpu] | 179.059 (1.0X) | 250.681 (0.7X) | 187.751 (1.0X) |
| MobileBertSquad_fp16(tflite) [x86_64-cascadelake-linux_gnu-llvm_cpu] local_task(embedded_elf)[15-thread,full-inference,default-flags] with default @ c2-standard-60[cpu] | 33.791 (1.0X) | 62.828 (0.5X) | 57.737 (0.6X) |
| MobileBertSquad_fp32(tflite) [x86_64-cascadelake-linux_gnu-llvm_cpu] local_task(embedded_elf)[1-thread,full-inference,default-flags] with default @ c2-standard-60[cpu] | 179.236 (1.0X) | 251.118 (0.7X) | 192.335 (0.9X) |
| MobileBertSquad_fp32(tflite) [x86_64-cascadelake-linux_gnu-llvm_cpu] local_task(embedded_elf)[15-thread,full-inference,default-flags] with default @ c2-standard-60[cpu] | 34.264 (1.0X) | 62.873 (0.5X) | 58.941 (0.6X) |
| MobileBertSquad_int8(tflite) [x86_64-cascadelake-linux_gnu-llvm_cpu] local_task(embedded_elf)[1-thread,full-inference,default-flags] with default @ c2-standard-60[cpu] | 482.457 (1.0X) | 1055.178 (0.5X) | 213.614 (2.3X) |
| MobileBertSquad_int8(tflite) [x86_64-cascadelake-linux_gnu-llvm_cpu] local_task(embedded_elf)[15-thread,full-inference,default-flags] with default @ c2-standard-60[cpu] | 60.867 (1.0X) | 220.234 (0.3X) | 64.187 (0.9X) |
| MobileNetV1_fp32(tflite) [x86_64-cascadelake-linux_gnu-llvm_cpu] local_task(embedded_elf)[1-thread,full-inference,default-flags] with default @ c2-standard-60[cpu] | 25.762 (1.0X) | 22.909 (1.1X) | 18.205 (1.4X) |
| MobileNetV1_fp32(tflite) [x86_64-cascadelake-linux_gnu-llvm_cpu] local_task(embedded_elf)[8-thread,full-inference,default-flags] with default @ c2-standard-60[cpu] | 4.811 (1.0X) | 5.113 (0.9X) | 4.497 (1.1X) |
| MobileNetV2_fp32(tflite) [x86_64-cascadelake-linux_gnu-llvm_cpu] local_task(embedded_elf)[1-thread,full-inference,default-flags] with default @ c2-standard-60[cpu] | 11.589 (1.0X) | 15.660 (0.7X) | 12.818 (0.9X) |
| MobileNetV2_fp32(tflite) [x86_64-cascadelake-linux_gnu-llvm_cpu] local_task(embedded_elf)[8-thread,full-inference,default-flags] with default @ c2-standard-60[cpu] | 3.659 (1.0X) | 5.280 (0.7X) | 4.874 (0.8X) |
| MobileNetV2_int8(tflite) [x86_64-cascadelake-linux_gnu-llvm_cpu] local_task(embedded_elf)[1-thread,full-inference,default-flags] with default @ c2-standard-60[cpu] | 21.480 (1.0X) | 44.676 (0.5X) | 13.971 (1.5X) |
| MobileNetV2_int8(tflite) [x86_64-cascadelake-linux_gnu-llvm_cpu] local_task(embedded_elf)[8-thread,full-inference,default-flags] with default @ c2-standard-60[cpu] | 5.706 (1.0X) | 9.878 (0.6X) | 5.539 (1.0X) |
| MobileNetV3Small_fp32(tflite) [x86_64-cascadelake-linux_gnu-llvm_cpu] local_sync(embedded_elf)[full-inference,default-flags] with default @ c2-standard-60[cpu] | 2.839 (1.0X) | 3.880 (0.7X) | 3.078 (0.9X) |
| MobileNetV3Small_fp32(tflite) [x86_64-cascadelake-linux_gnu-llvm_cpu] local_task(embedded_elf)[1-thread,full-inference,default-flags] with default @ c2-standard-60[cpu] | 2.897 (1.0X) | 3.972 (0.7X) | 3.199 (0.9X) |
| MobileSSD_fp32(tflite) [x86_64-cascadelake-linux_gnu-llvm_cpu] local_task(embedded_elf)[1-thread,full-inference,default-flags] with default @ c2-standard-60[cpu] | 33.760 (1.0X) | 40.718 (0.8X) | 32.548 (1.0X) |
| MobileSSD_fp32(tflite) [x86_64-cascadelake-linux_gnu-llvm_cpu] local_task(embedded_elf)[8-thread,full-inference,default-flags] with default @ c2-standard-60[cpu] | 8.566 (1.0X) | 10.668 (0.8X) | 9.464 (0.9X) |
| PersonDetect_int8(tflite) [x86_64-cascadelake-linux_gnu-llvm_cpu] local_sync(embedded_elf)[full-inference,default-flags] with default @ c2-standard-60[cpu] | 0.711 (1.0X) | 1.421 (0.5X) | 0.596 (1.2X) |
| PersonDetect_int8(tflite) [x86_64-cascadelake-linux_gnu-llvm_cpu] local_task(embedded_elf)[1-thread,full-inference,default-flags] with default @ c2-standard-60[cpu] | 0.773 (1.0X) | 1.497 (0.5X) | 0.661 (1.2X) |
| PoseNet_fp32(tflite) [x86_64-cascadelake-linux_gnu-llvm_cpu] local_task(embedded_elf)[1-thread,full-inference,default-flags] with default @ c2-standard-60[cpu] | 17.918 (1.0X) | 26.626 (0.7X) | 21.406 (0.8X) |
| PoseNet_fp32(tflite) [x86_64-cascadelake-linux_gnu-llvm_cpu] local_task(embedded_elf)[8-thread,full-inference,default-flags] with default @ c2-standard-60[cpu] | 4.111 (1.0X) | 6.063 (0.7X) | 5.263 (0.8X) |
| matmul_256x256x2048_i8_i4_i32_tile_config_default(linalg) [x86_64-cascadelake-linux_gnu-llvm_cpu] local_sync(embedded_elf)[full-inference,default-flags] with default @ c2-standard-60[cpu] | 7.580 (1.0X) | 7.590 (1.0X) | 7.587 (1.0X) |
| DeepLabV3_fp32(tflite) [armv8.2-a-generic-linux_android29-llvm_cpu] local_sync(embedded_elf)[full-inference,default-flags] with default @ pixel-6-pro[big-cores] | 49.804 (1.0X) | 83.953 (0.6X) | 78.482 (0.6X) |
| DeepLabV3_fp32(tflite) [armv8.2-a-generic-linux_android29-llvm_cpu] local_task(embedded_elf)[1-thread,full-inference,system-scheduling] with default @ pixel-6-pro[big-cores] | 51.590 (1.0X) | 85.207 (0.6X) | 79.628 (0.6X) |
| DeepLabV3_fp32(tflite) [armv8.2-a-generic-linux_android29-llvm_cpu] local_task(embedded_elf)[2-thread,full-inference,system-scheduling] with default @ pixel-6-pro[big-cores] | 31.368 (1.0X) | 50.179 (0.6X) | 47.158 (0.7X) |
| GPT2_117M_TF_1X1XI32(stablehlo) [armv8.2-a-generic-linux_android29-llvm_cpu] local_sync(embedded_elf)[full-inference,default-flags] with default @ pixel-6-pro[big-cores] | 93.244 (1.0X) | 22.163 (4.2X) | 21.036 (4.4X) |
| GPT2_117M_TF_1X1XI32(stablehlo) [armv8.2-a-generic-linux_android29-llvm_cpu] local_task(embedded_elf)[1-thread,full-inference,system-scheduling] with default @ pixel-6-pro[big-cores] | 93.766 (1.0X) | 21.509 (4.4X) | 21.795 (4.3X) |
| GPT2_117M_TF_1X1XI32(stablehlo) [armv8.2-a-generic-linux_android29-llvm_cpu] local_task(embedded_elf)[2-thread,full-inference,system-scheduling] with default @ pixel-6-pro[big-cores] | 52.621 (1.0X) | 21.847 (2.4X) | 21.927 (2.4X) |
| GPT2_117M_TF_1X4XI32(stablehlo) [armv8.2-a-generic-linux_android29-llvm_cpu] local_sync(embedded_elf)[full-inference,default-flags] with default @ pixel-6-pro[big-cores] | 125.936 (1.0X) | 28.009 (4.5X) | 27.383 (4.6X) |
| GPT2_117M_TF_1X4XI32(stablehlo) [armv8.2-a-generic-linux_android29-llvm_cpu] local_task(embedded_elf)[1-thread,full-inference,system-scheduling] with default @ pixel-6-pro[big-cores] | 137.082 (1.0X) | 30.083 (4.6X) | 29.647 (4.6X) |
| GPT2_117M_TF_1X4XI32(stablehlo) [armv8.2-a-generic-linux_android29-llvm_cpu] local_task(embedded_elf)[2-thread,full-inference,system-scheduling] with default @ pixel-6-pro[big-cores] | 72.727 (1.0X) | 26.607 (2.7X) | 26.676 (2.7X) |
| MobileBertSquad_fp32(tflite) [armv8.2-a-generic-linux_android29-llvm_cpu] local_sync(embedded_elf)[full-inference,default-flags] with default @ pixel-6-pro[big-cores] | 717.375 (1.0X) | 443.547 (1.6X) | 371.132 (1.9X) |
| MobileBertSquad_fp32(tflite) [armv8.2-a-generic-linux_android29-llvm_cpu] local_task(embedded_elf)[1-thread,full-inference,system-scheduling] with default @ pixel-6-pro[big-cores] | 714.635 (1.0X) | 452.747 (1.6X) | 374.555 (1.9X) |
| MobileBertSquad_fp32(tflite) [armv8.2-a-generic-linux_android29-llvm_cpu] local_task(embedded_elf)[2-thread,full-inference,system-scheduling] with default @ pixel-6-pro[big-cores] | 409.759 (1.0X) | 275.327 (1.5X) | 225.037 (1.8X) |
| MobileBertSquad_int8(tflite) [armv8.2-a-generic-linux_android29-llvm_cpu] local_sync(embedded_elf)[full-inference,default-flags] with default @ pixel-6-pro[big-cores] | 1045.466 (1.0X) | 629.316 (1.7X) | 261.135 (4.0X) |
| MobileBertSquad_int8(tflite) [armv8.2-a-generic-linux_android29-llvm_cpu] local_task(embedded_elf)[1-thread,full-inference,system-scheduling] with default @ pixel-6-pro[big-cores] | 1045.985 (1.0X) | 627.996 (1.7X) | 259.008 (4.0X) |
| MobileBertSquad_int8(tflite) [armv8.2-a-generic-linux_android29-llvm_cpu] local_task(embedded_elf)[2-thread,full-inference,system-scheduling] with default @ pixel-6-pro[big-cores] | 552.266 (1.0X) | 345.533 (1.6X) | 152.515 (3.6X) |
| Vit_int8(tflite) [armv8.2-a-generic-linux_android29-llvm_cpu] local_sync(embedded_elf)[full-inference,default-flags] with default @ pixel-6-pro[big-cores] | 2098.148 (1.0X) | 1083.617 (1.9X) | 305.306 (6.9X) |
| Vit_int8(tflite) [armv8.2-a-generic-linux_android29-llvm_cpu] local_task(embedded_elf)[1-thread,full-inference,system-scheduling] with default @ pixel-6-pro[big-cores] | 2097.328 (1.0X) | 1087.447 (1.9X) | 304.153 (6.9X) |
| Vit_int8(tflite) [armv8.2-a-generic-linux_android29-llvm_cpu] local_task(embedded_elf)[2-thread,full-inference,system-scheduling] with default @ pixel-6-pro[big-cores] | 1139.894 (1.0X) | 610.219 (1.9X) | 183.619 (6.2X) |
| matmul_256x256x2048_i8_i4_i32_tile_config_default(linalg) [armv8.2-a-generic-linux_android29-llvm_cpu] local_sync(embedded_elf)[full-inference,default-flags] with default @ pixel-6-pro[big-cores] | 12.266 (1.0X) | 10.039 (1.2X) | 1.462 (8.4X) |
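
The parenthesized multipliers in the comparison table appear to be the No-DT baseline latency divided by each variant's latency, rounded to one decimal, so values below 1.0X mean the variant is slower than the baseline. The sketch below reproduces that arithmetic for a few rows. The function name `speedup` is a hypothetical helper for illustration, not part of the benchmark tooling, and the rounding rule is inferred from the printed numbers.

```python
def speedup(baseline_ms: float, variant_ms: float) -> str:
    """Format baseline/variant latency as an 'N.NX' multiplier.

    Assumption: the table rounds the ratio to one decimal place.
    """
    return f"{baseline_ms / variant_ms:.1f}X"


# BertForMaskedLMTF, 30-thread, x86_64 row: No-DT, DT-Only, DT-UK latencies.
no_dt, dt_only, dt_uk = 230.746, 139.804, 113.789
print(speedup(no_dt, no_dt), speedup(no_dt, dt_only), speedup(no_dt, dt_uk))

# DeepLabV3_fp32, 1-thread, x86_64 row: DT-Only is slower than the baseline.
print(speedup(32.253, 40.453))
```

Read this way, a row such as MobileBertSquad_int8 (15-thread, x86_64) with DT-Only at 0.3X is a slowdown of roughly 3.6x relative to No-DT, while DT-UK at 0.9X nearly recovers the baseline.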

Regressed Latencies 🚩

| Benchmark Name | Average Latency (ms) | Median Latency (ms) | Latency Standard Deviation (ms) |
| --- | --- | --- | --- |
| MobileBertSquad_fp16(tflite) [arm-valhall-vulkan_android31-vulkan_spirv][experimental-flags,fuse-padding,max-concurrency,demote-f32-to-f16] vulkan(none)[full-inference,default-flags] with default @ pixel-6-pro[gpu] | 110.018 (vs. 94.657, 16.23%↑) | 110.209 | 1.052 |
| GPT2_117M_TF_1X4XI32(stablehlo) [armv8.2-a-generic-linux_android29-llvm_cpu][experimental-flags,no-dt] local_task(embedded_elf)[1-thread,full-inference,system-scheduling] with default @ pixel-6-pro[big-cores] | 137.082 (vs. 129.025, 6.24%↑) | 137.079 | 0.201 |

Improved Latencies 🎉

| Benchmark Name | Average Latency (ms) | Median Latency (ms) | Latency Standard Deviation (ms) |
| --- | --- | --- | --- |
| MobileBertSquad_fp16(tflite) [arm-valhall-vulkan_android31-vulkan_spirv][default-flags,demote-f32-to-f16] vulkan(none)[full-inference,default-flags] with default @ pixel-6-pro[gpu] | 71.801 (vs. 85.737, 16.25%↓) | 71.676 | 1.042 |
| GPT2_117M_TF_1X4XI32(stablehlo) [armv8.2-a-generic-linux_android29-llvm_cpu][experimental-flags,no-dt] local_sync(embedded_elf)[full-inference,default-flags] with default @ pixel-6-pro[big-cores] | 125.936 (vs. 142.557, 11.66%↓) | 126.028 | 0.342 |
| GPT2_117M_TF_1X4XI32(stablehlo) [armv8.2-a-generic-linux_android29-llvm_cpu][experimental-flags,dt-only] local_task(embedded_elf)[1-thread,full-inference,system-scheduling] with default @ pixel-6-pro[big-cores] | 30.083 (vs. 33.055, 8.99%↓) | 30.202 | 0.953 |

[Top 3 out of 25 results shown]
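
The percentages in the regressed and improved latency entries are consistent with a relative change of the new average latency against the base commit's average. The sketch below checks that reading against the reported figures. `latency_change` is a hypothetical helper named here for illustration only, and the two-decimal rounding is inferred from the report rather than taken from the benchmark tooling.

```python
def latency_change(new_ms: float, base_ms: float) -> str:
    """Relative change of new_ms versus base_ms, as 'P.PP%' plus an arrow.

    Assumption: an up arrow means the latency grew (a regression),
    a down arrow means it shrank (an improvement).
    """
    pct = (new_ms - base_ms) / base_ms * 100.0
    arrow = "↑" if pct > 0 else "↓"
    return f"{abs(pct):.2f}%{arrow}"


# MobileBertSquad_fp16 on Vulkan with experimental flags (regressed entry).
print(latency_change(110.018, 94.657))
# MobileBertSquad_fp16 on Vulkan with default flags (improved entry).
print(latency_change(71.801, 85.737))
```

Note that the same GPT2_117M_TF_1X4XI32 no-dt model appears in both lists (regressed under `local_task`, improved under `local_sync`), which suggests some of these deltas may be run-to-run noise rather than an effect of the integrate.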

No improved or regressed compilation metrics 🏖️

For more information:

Source Workflow Run

@bjacob force-pushed the llvm-integrate-20240513 branch 2 times, most recently from 4c7db1b to 1441e23, on May 14, 2024 03:27
@bjacob changed the title from "Integrate LLVM at a1d43c14d8a672730af48d946acc41fa01cf301e" to "Integrate LLVM at a1d43c14d (+1 revert)" on May 14, 2024
@bjacob marked this pull request as ready for review on May 14, 2024 03:53
@bjacob enabled auto-merge (squash) on May 14, 2024 04:26
@bjacob merged commit 2ed4778 into main on May 14, 2024
64 checks passed
@bjacob deleted the llvm-integrate-20240513 branch on May 14, 2024 09:31
ingomueller-net added a commit that referenced this pull request May 15, 2024
This integrates four new MLIR-related commits from LLVM (until
llvm/llvm-project@c5e67b86) and preserves the local revert from #17380.

I assume I could integrate more commits but I am still getting to know
the process and have the feeling that doing one successful integrate
will help me with that.
bangtianliu pushed a commit to bangtianliu/iree that referenced this pull request Jun 5, 2024
bangtianliu pushed a commit to bangtianliu/iree that referenced this pull request Jun 5, 2024
LLITCHEV pushed a commit to LLITCHEV/iree that referenced this pull request Jul 30, 2024
Signed-off-by: Lubo Litchev <lubol@google.com>
LLITCHEV pushed a commit to LLITCHEV/iree that referenced this pull request Jul 30, 2024
Signed-off-by: Lubo Litchev <lubol@google.com>