Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

External mode compiler: Compile-time evaluate most constant subexpressions #5482

Merged
merged 4 commits into from
May 19, 2024

Conversation

solardiz
Copy link
Member

This replaces yesterday's optimization of negative integer constants.

While the code explicitly implements constant subexpression evaluation only for individual operations (unary or binary), one at a time, it usually does also work for longer constant subexpressions, where one operation's result may get further merged into another's, all at compile time.

For unary operations, this is the easiest - a VM push instruction's last immediate operand can get repeatedly updated with subsequent unary operations' results.

For binary operations, we first get a second push_imm optimized into push_imm_imm, then the binary operation optimizes that "back" into a single push_imm of the result, which can get further optimized into a push_imm_imm to accommodate a third constant input and then into a push_imm again with the second binary operation's result, and so on.

This also works for mixes of unary and binary operations.

That said, there's still an unimplemented case here as documented in a comment, which makes optimization stop when encountered:

        /* Compile-time evaluate binary op after push_*_imm, push_imm */
        /* Unimplemented, need to start tracking penultimate op first */

Our multi-push instructions on one hand help here (we can detect that a binary op has two immediate operands with one check and without keeping track of more than the last instruction), but on the other they hurt (there are many combinations, including those where a binary op's operands may have been split across two pushes anyway).

The compile-time evaluation is done by invoking tiny pieces of temporary VM code, with the VM assign_pop instruction tricked into it patching an immediate operand in the final VM code.

While this is fun and something I've been meaning to add for years, there's a reason why I didn't approach it sooner - there's almost no speedup from it on our existing external modes, where we don't commonly write constant subexpressions in performance-critical places. The occasional -1 constants are pretty much the only cases where this matters in practice so far (and this was also taken care of by a more specialized change yesterday).

My new test case, which now gets optimized a lot:

[List.External:Test]
void generate()
{
	if (word[2] == 666) {
		word = 0;
		return;
	}
	word[0] =-1 + 'a' + 1 - -111 -111 - 222 + 222 -111-111+222;
	word[0] -=-22;
	word[0] -= 22;
	word[0]-=-22;
	word[0]-=22;
	word[0]+=(33+(33+33))-(44+44+44)+(11+11+11);
	word[1] = -1 + 1;
	word[2] = 666;
}

The 33+(33+33) part triggers the yet unimplemented case mentioned above.

@solardiz
Copy link
Member Author

In my testing, I also try #define PRINT_INSNS and separately #undef __GNUC__, because we have (almost) two implementations of the VM - for GNU C and not - and tricks like calling into the VM code from the compiler obviously need to be tested on both.

@solardiz solardiz merged commit f693040 into openwall:bleeding-jumbo May 19, 2024
31 of 32 checks passed
@solardiz
Copy link
Member Author

I also try #define PRINT_INSNS and separately #undef __GNUC__, because we have (almost) two implementations of the VM - for GNU C and not - and tricks like calling into the VM code from the compiler obviously need to be tested on both.

... but I had only tested this without ASan. Thinking of the code later, I realized I probably have a VM stack underflow. Retesting the above with ASan indeed detects it (or something like it). I'll look into fixing that now.

@solardiz
Copy link
Member Author

I probably have a VM stack underflow. Retesting the above with ASan indeed detects it (or something like it). I'll look into fixing that now.

Fixed now after testing on my machine. I pushed directly to bleeding-jumbo since our CI setup does not currently compile that code anyway - something we'd want to fix separately.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

1 participant