-
Notifications
You must be signed in to change notification settings - Fork 381
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
CodeGen: Preserve known tags for LOAD_TVALUE synthesized from LOADK #1201
Conversation
When lowering LOADK for booleans/numbers/nils, we deconstruct the operation using STORE_TAG which informs the rest of the optimization pipeline about the tag of the value. This is helpful to remove various tag checks. When the constant is a string or a vector, we just use LOAD_TVALUE/STORE_TVALUE. For strings, this could be replaced by pointer load/store, but for vectors there's no great alternative using current IR ops; in either case, the optimization needs to be carefully examined for profitability as simply copying constants into registers for function calls could become more expensive. However, there are cases where it's still valuable to preserve the tag. For vectors, doing any math with vector constants contains tag checks that could be removed. For both strings and vectors, storing them into a table has a barrier that for vectors could be elided, and for strings could be simplified as there's no need to confirm the tag. With this change we now carry the optional tag of the value with LOAD_TVALUE. This has no performance effect on existing benchmarks but does reduce the generated code for benchmarks by ~0.1%, and it makes vector code more efficient (~5% lift on X64 log1p approximation).
This change results in ~4% lift on A64 and ~5% lift on X64 for vector approximation which is the main goal here, but it seems like it doesn't hurt for strings (the code size reduction on benchmark code is due to strings exclusively - I don't have access to a large Luau codebase so can't measure broad impact but assume it's a slight codesize reduction as well) so it's applied unconditionally. |
fwiw assembly diff for A64 for vector approximation: 8: local r: vector = vector(0, 0, 0)
LOADK R1 K0 [0, 0, 0]
vector <- vector, any
-# %4 = LOAD_TVALUE K0 ; useCount: 2, lastUse: %14
+# %4 = LOAD_TVALUE K0, 0i, tvector ; useCount: 1, lastUse: %14
ldr q31,[x22]
-# STORE_TVALUE R1, %4 ; %5
- str q31,[x25,#16]
9: local a: vector = x
MOVE R2 R0
vector <- vector, any
.L22:
-# %6 = LOAD_TVALUE R0 ; useCount: 8, lastUse: %123
+# %6 = LOAD_TVALUE R0 ; useCount: 7, lastUse: %123
ldr q30,[x25]
-# STORE_TVALUE R2, %6 ; %7
- str q30,[x25,#32]
10: r += a -- todo: * 1 doesn't fold yet
ADD R1 R1 R2
vector <- vector, vector
.L23:
-# %8 = LOAD_TAG R1 ; useCount: 1, lastUse: %9
- ldr w17,[x25,#28]
-# CHECK_TAG %8, tvector, exit(2) ; %9
- cmp w17,#4
- b.ne .L24
# %14 = ADD_VEC %4, %6 ; useCount: 1, lastUse: %39
fadd v31.4s,v31.4s,v30.4s
11: a *= x
MUL R2 R2 R0
vector <- vector, vector |
I think this change triggers a bug in another optimization and causes tests on Ubuntu to fail. |
My prediction is that the sync will hit the same issue but we'll see :) The error shows up in a slightly different run every time afaict - the first run of this PR actually failed the main test execution (no codegen). A rerun just now failed in a flags=false no-codegen run: The failure happens before any tests actually run (no doctest banner which is printed first). This is an ASAN issue with ASLR that I also have locally on my system, it's spuriously failing test runs on master as well as on this branch. It's been reported here google/sanitizers#1614 and fixed in some versions of clang/gcc but if you don't have the latest build then you're going to run into issues... Locally, |
Judging from the references that are starting to appear on the linked issue, this started happening on GHA recently. Perhaps the number of randomized bits set to 32 is a recent GHA-side or Ubuntu-side config change. |
Thanks for figuring that out. |
When lowering LOADK for booleans/numbers/nils, we deconstruct the operation using STORE_TAG which informs the rest of the optimization pipeline about the tag of the value. This is helpful to remove various tag checks.
When the constant is a string or a vector, we just use LOAD_TVALUE/STORE_TVALUE. For strings, this could be replaced by pointer load/store, but for vectors there's no great alternative using current IR ops; in either case, the optimization needs to be carefully examined for profitability as simply copying constants into registers for function calls could become more expensive.
However, there are cases where it's still valuable to preserve the tag. For vectors, doing any math with vector constants contains tag checks that could be removed. For both strings and vectors, storing them into a table has a barrier that for vectors could be elided, and for strings could be simplified as there's no need to confirm the tag.
With this change we now carry the optional tag of the value with LOAD_TVALUE. This has no performance effect on existing benchmarks but does reduce the generated code for benchmarks by ~0.1%, and it makes vector code more efficient (~5% lift on X64 log1p approximation).