Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Optimize vector literals by storing them in the constant table #1096

Merged
merged 22 commits into from
Nov 17, 2023

Conversation

petrihakkinen
Copy link
Contributor

@petrihakkinen petrihakkinen commented Nov 10, 2023

With this optimization, built-in vector constructor calls with exactly three arguments are detected by the compiler and turned into vector constants when the arguments are constant numbers. Requires optimization level 2 because built-ins are not folded otherwise by the compiler. Bytecode version is bumped because of the new constant type, but old bytecode versions can still be loaded.

The following synthetic benchmark shows ~6.6x improvement.

local v
for i = 1, 10000000 do
	v = vector(1, 2, 3)
	v = vector(1, 2, 3)
	v = vector(1, 2, 3)
	v = vector(1, 2, 3)
	v = vector(1, 2, 3)
	v = vector(1, 2, 3)
	v = vector(1, 2, 3)
	v = vector(1, 2, 3)
	v = vector(1, 2, 3)
	v = vector(1, 2, 3)
end

Also tried a more real world scenario and could see a few percent improvement.

Added a new fast flag LuauVectorLiterals for enabling the feature.

@vegorov-rbx
Copy link
Collaborator

Linking #1091 because all the context is missing.

LBC_VERSION_MAX = 4,
LBC_VERSION_TARGET = 4,
LBC_VERSION_MAX = 5,
LBC_VERSION_TARGET = 5,
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

TARGET has to be left at 4 until flag is removed (Roblox will take care of this future removal).

BytecodeBuilder::getVersion() should be updated to return 5 when flag is enabled.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed

petrihakkinen and others added 5 commits November 10, 2023 15:49
Co-authored-by: vegorov-rbx <75688451+vegorov-rbx@users.noreply.github.com>
Co-authored-by: vegorov-rbx <75688451+vegorov-rbx@users.noreply.github.com>
Co-authored-by: vegorov-rbx <75688451+vegorov-rbx@users.noreply.github.com>
@petrihakkinen
Copy link
Contributor Author

@vegorov-rbx Thanks! Committed your suggestions and fixed bytecode versioning.

uint64_t value;
uint32_t extra = 0;
Copy link
Collaborator

@zeux zeux Nov 10, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This unnecessarily increases the size of ConstantKey to 24 bytes due to its placement after value.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, I'll fix it on Monday.

@petrihakkinen
Copy link
Contributor Author

petrihakkinen commented Nov 11, 2023

One more thought about 4d literals:

We could always store 4 components in compiler data structures and bytecode, i.e. LBC_CONSTANT_VECTOR would have 4 components regardless of LUA_VECTOR_SIZE. In luau_load call setvvalue with the 4 components read from the input. When LUA_VECTOR_SIZE = 3, the VM would simply truncate the vector to 3 components. The only downside I see is that constants would always be stored with 4 components in bytecode dumps. I think this would not break any existing code or assumptions and would be a reasonable compromise. Note that luaF_vector has the same truncating behavior.

My main motivation is to avoid having to remember rules of thumb such as:

  • To stay on the fast path, move vector literals with 4 arguments out of inner loops
  • Prefer 3d literals when the fourth component is zero

We could leave this for future work, but if we did this now we could avoid another bytecode version bump.

What do you think @zeux?

@zeux
Copy link
Collaborator

zeux commented Nov 11, 2023

I'm curious what @vegorov-rbx thinks, I think I'd be fine with this given that the behavior of builtin vector constructor is to ignore 4th component in vector3 builds, and default it to 0 in vector4 builds, so replicating this logic in the compiler is fine. The one small downside is that vector4 is an exceptional configuration (I know you use it, but that's not the default and most configs won't use it), so we're paying a small extra price in bytecode size for a non-default configuration. (plus small extra compile time cost for inflated constants unfortunately)

However, the price is not that large, and I think it's probably still a net win vs what we currently get with fastcall and separate numeric constants encoded into instructions / constant table as appropriate. FWIW on our internal codebase this entire change is fairly neutral on code size, so we don't use constant vectors as heavily, so from the operational perspective it's fine either way.

@petrihakkinen
Copy link
Contributor Author

I've updated the PR to support four arguments so that we can see more easily how the code looks like in practice. Waiting for comments, but from my side it looks ok now.

@vegorov-rbx
Copy link
Collaborator

I'm curious what @vegorov-rbx thinks

I think having 4 components in bytecode is a good solution given that while vector4 configuration is not the default one, it's still supported.
While there are some differences of how the same bytecode executes on different VMs, I don't think such bytecode sharing is desirable overall.
As long as we only have one bytecode format under a single bytecode version, this is ok.

@petrihakkinen
Copy link
Contributor Author

I'm getting codecov test failure... I think these happen because LuauVectorLiterals is not enabled by default. For example, codecov reports that case LBC_CONSTANT_VECTOR in lvmload.c is not covered, but this case is tested by conformance/vector.lua when run with -O2 (this is covered by extra conformance tests afaik). Should we simply ignore these?

@zeux
Copy link
Collaborator

zeux commented Nov 15, 2023

Not sure what's up with codecov; it's not supposed to fail at all regardless of the diff coverage. Should be safe to ignore, might be an infrastructure issue on their side.

Add a comment
Now with one less space
@petrihakkinen
Copy link
Contributor Author

Thanks for the approval!

@vegorov-rbx vegorov-rbx merged commit 298cd70 into luau-lang:master Nov 17, 2023
6 of 7 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Development

Successfully merging this pull request may close these issues.

3 participants