
CodeGen: Preserve known tags for LOAD_TVALUE synthesized from LOADK #1201

Merged 2 commits on Mar 15, 2024

Conversation

@zeux (Collaborator) commented Mar 14, 2024

When lowering LOADK for booleans/numbers/nils, we deconstruct the operation using STORE_TAG which informs the rest of the optimization pipeline about the tag of the value. This is helpful to remove various tag checks.

When the constant is a string or a vector, we just use LOAD_TVALUE/STORE_TVALUE. For strings, this could be replaced by pointer load/store, but for vectors there's no great alternative using current IR ops; in either case, the optimization needs to be carefully examined for profitability as simply copying constants into registers for function calls could become more expensive.

However, there are cases where it's still valuable to preserve the tag. For vectors, any math with vector constants involves tag checks that could be removed. For both strings and vectors, storing them into a table has a barrier that for vectors could be elided, and for strings could be simplified, as there's no need to confirm the tag.

With this change we now carry the optional tag of the value with LOAD_TVALUE. This has no performance effect on existing benchmarks but does reduce the generated code for benchmarks by ~0.1%, and it makes vector code more efficient (~5% lift on X64 log1p approximation).

@zeux (Collaborator, Author) commented Mar 14, 2024

This change results in a ~4% lift on A64 and a ~5% lift on X64 for the vector approximation, which is the main goal here. It also doesn't seem to hurt for strings (the code size reduction on benchmark code is due to strings exclusively; I don't have access to a large Luau codebase, so I can't measure broad impact, but I assume it's a slight code size reduction there as well), so it's applied unconditionally.

@zeux (Collaborator, Author) commented Mar 14, 2024

fwiw, the A64 assembly diff for the vector approximation:

     8: 	local r: vector = vector(0, 0, 0)
 LOADK R1 K0 [0, 0, 0]
 vector <- vector, any
-#   %4 = LOAD_TVALUE K0                                       ; useCount: 2, lastUse: %14
+#   %4 = LOAD_TVALUE K0, 0i, tvector                          ; useCount: 1, lastUse: %14
  ldr         q31,[x22]
-#   STORE_TVALUE R1, %4                                       ; %5
- str         q31,[x25,#16]
     9: 	local a: vector = x
 MOVE R2 R0
 vector <- vector, any
 .L22:
-#   %6 = LOAD_TVALUE R0                                       ; useCount: 8, lastUse: %123
+#   %6 = LOAD_TVALUE R0                                       ; useCount: 7, lastUse: %123
  ldr         q30,[x25]
-#   STORE_TVALUE R2, %6                                       ; %7
- str         q30,[x25,#32]
    10: 	r += a -- todo: * 1 doesn't fold yet
 ADD R1 R1 R2
 vector <- vector, vector
 .L23:
-#   %8 = LOAD_TAG R1                                          ; useCount: 1, lastUse: %9
- ldr         w17,[x25,#28]
-#   CHECK_TAG %8, tvector, exit(2)                            ; %9
- cmp         w17,#4
- b.ne        .L24
 #   %14 = ADD_VEC %4, %6                                      ; useCount: 1, lastUse: %39
  fadd        v31.4s,v31.4s,v30.4s
    11: 	a *= x
 MUL R2 R2 R0
 vector <- vector, vector

@vegorov-rbx (Collaborator) commented:

I think this change triggers a bug in another optimization and causes tests on Ubuntu to fail.
I believe it will be fixed by the sync PR today, so will have to be rebased on that.

@zeux (Collaborator, Author) commented Mar 15, 2024

My prediction is that the sync will hit the same issue but we'll see :)

The error shows up in a slightly different run every time, afaict: the first run of this PR actually failed the main test execution (no codegen), and a rerun just now failed in a flags=false no-codegen run:

[screenshot: CI failure output]

The failure happens before any tests actually run (there's no doctest banner, which is printed first). This is an ASAN issue with ASLR that I also have locally on my system; it's spuriously failing test runs on master as well as on this branch. It's been reported in google/sanitizers#1614 and fixed in some versions of clang/gcc, but if you don't have the latest build then you're going to run into issues...

Locally, sudo sysctl -w vm.mmap_rnd_bits=28 fixes the issue completely - if the sync still sees it we may need to add this to the build steps before running any tests for the time being...
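For reference, a sketch of the workaround as it might appear in a CI build step (the placement and comment are illustrative; only the sysctl command itself is from the discussion above):

```shell
# Reduce ASLR entropy so that ASAN's shadow-memory mapping succeeds with
# older clang/gcc sanitizer runtimes (see google/sanitizers#1614).
# Must run before any sanitized test binaries; requires root.
sudo sysctl -w vm.mmap_rnd_bits=28
```

The setting is not persistent across reboots, which is fine for ephemeral CI runners but means local machines need it reapplied (or set via sysctl.conf).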

@zeux (Collaborator, Author) commented Mar 15, 2024

Judging from the references that are starting to appear on the linked issue, this started happening on GHA recently. Perhaps setting the number of randomized bits to 32 is a recent GHA-side or Ubuntu-side config change.

@zeux (Collaborator, Author) commented Mar 15, 2024

#1203

@vegorov-rbx (Collaborator) commented:

Thanks for figuring that out.

@vegorov-rbx merged commit a768311 into master on Mar 15, 2024 (8 checks passed) and deleted the cg-ltvtag branch on March 15, 2024 at 16:49.