irjit: Optimize out more temps and lwl/lwr operations #10516

Merged
hrydgard merged 9 commits into hrydgard:master from unknownbrackets:irjit-lwr on Jan 10, 2018

@unknownbrackets
Collaborator

unknownbrackets commented Jan 8, 2018

This switches to dedicated Load32Left and related ops, instead of generating the masking inline when the instruction is compiled. The masking is still generated before execution, just in an optimization pass now.

The pass is skipped when a breakpoint is in use, but in that case the backend can simply fall back to the interpreter. This way we never have to reimplement these ops in a backend, which is good because they're annoying to get right.
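For readers unfamiliar with these instructions, here is a minimal sketch of the masking that lwl/lwr imply on a little-endian MIPS target. It is not the IR the pass emits; the function names and the flat byte-array memory are hypothetical and only illustrate the shift and mask arithmetic.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical helper: read an aligned little-endian word from a flat byte array.
uint32_t ReadAligned32(const std::vector<uint8_t> &mem, uint32_t addr) {
    assert((addr & 3) == 0 && addr + 3 < mem.size());
    return uint32_t(mem[addr]) | (uint32_t(mem[addr + 1]) << 8) |
           (uint32_t(mem[addr + 2]) << 16) | (uint32_t(mem[addr + 3]) << 24);
}

// lwl: replaces the upper bytes of rt with the low bytes of the containing word.
uint32_t LoadWordLeft(const std::vector<uint8_t> &mem, uint32_t addr, uint32_t rt) {
    uint32_t shift = (addr & 3) * 8;
    uint32_t word = ReadAligned32(mem, addr & ~3u);
    return (rt & (0x00FFFFFFu >> shift)) | (word << (24 - shift));
}

// lwr: replaces the lower bytes of rt with the high bytes of the containing word.
uint32_t LoadWordRight(const std::vector<uint8_t> &mem, uint32_t addr, uint32_t rt) {
    uint32_t shift = (addr & 3) * 8;
    uint32_t word = ReadAligned32(mem, addr & ~3u);
    return (rt & (0xFFFFFF00u << (24 - shift))) | (word >> shift);
}

// The usual pairing for an unaligned 32-bit load: lwr at addr, lwl at addr + 3.
uint32_t LoadUnaligned32(const std::vector<uint8_t> &mem, uint32_t addr) {
    uint32_t rt = LoadWordRight(mem, addr, 0);
    return LoadWordLeft(mem, addr + 3, rt);
}
```

When the two halves are paired like this, every byte of the register gets overwritten, which is why a combined pair can be treated as a single unaligned load.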

Along the way, I optimized two cases that showed up with lwr but also occur generally. Specifically, this:

or t0, a0, a1
beq t0, v0, foo
lui t0, 0x1234

Would previously turn into:

Or t0, a0, a1
Mov lhs, t0
Mov rhs, v0
SetConst t0, 0x12340000
ExitIfEq lhs, rhs, foo

Now:

Or lhs, a0, a1
SetConst t0, 0x12340000
ExitIfEq lhs, v0, foo

(which is actually two separate optimizations.)
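One of the two, skipping the temp copy for a branch operand that the delay slot does not overwrite, could be sketched roughly as below. This is only an illustration of the decision, not the code in this PR; `CopyPlan`, `PlanBranchCopies`, and the use of -1 for "no destination" are hypothetical.

```cpp
// Which branch operands must be saved to a temp before the delay slot runs.
struct CopyPlan {
    bool copyLhs;
    bool copyRhs;
};

// lhs/rhs: MIPS register numbers compared by the branch.
// delaySlotDest: register written by the delay slot instruction, or -1 if none
// (an assumption of this sketch).
CopyPlan PlanBranchCopies(int lhs, int rhs, int delaySlotDest) {
    CopyPlan plan;
    // A copy is only needed when the delay slot would clobber the operand
    // before the comparison is evaluated.
    plan.copyLhs = delaySlotDest >= 0 && delaySlotDest == lhs;
    plan.copyRhs = delaySlotDest >= 0 && delaySlotDest == rhs;
    return plan;
}
```

For the example above (beq t0, v0 with lui t0 in the delay slot), only the lhs needs preserving, so the Mov of v0 into rhs can be dropped.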

I haven't really tested the changes to the disabled passes though, just tried to keep them right. Don't remember if they work or were just not worth the extra cycles...

-[Unknown]

unknownbrackets added some commits Jan 7, 2018

irjit: Optimize out temp lhs copies.
    Common example:
        li v0, 1
        beq s2, v0, somewhere
        li v0, 2

    Which was copying s2 before. This pattern generally doesn't happen in
    MIPS code, though, so really only catches that (very common) case.

irjit: Add dedicated ops for lwl/swl and friends.
    Temporarily removes optimizations.

irjit: Combine lwl/lwr and swl/swr, like before.
    Still want to inline the operation, because the backend shouldn't have to
    redo it every time, and we want the temps cleaned up if possible.

irjit: Convert lwr and friends to easier code.
    This makes it easier to write a (working) jit backend from IR, since these
    ops are always annoying to get right.

irjit: Swap moves when it may allow clobbering.
    Example:
        addu a0, a1, a2
        mov s0, a0
        addu a0, a2, a3

    By swapping the mov, we can eliminate it.

    Only going one back because it's common and didn't want to track reads.

GPU: Improve some bezier logging.
    Meant to do this when splines were changed.

// RAM(addrReg) = valueReg
ir.Write(IROp::Store32, valueReg, addrReg, ir.AddConstant(0));
}
// Should never get here, done in Comp_ITypeMem().

@hrydgard

hrydgard Jan 9, 2018

Owner

Can't we just delete this whole function then?

@unknownbrackets

unknownbrackets Jan 9, 2018

Collaborator

Ah, oops, I thought it was part of the jit interface or something. Will remove in some hours.

-[Unknown]

@hrydgard

Owner

hrydgard commented Jan 9, 2018

Very neat.

For the interpreter, it might actually be faster to leave the LoadLeft etc ops behind when they can't be combined, but we definitely want to combine them when we later emit native code from the IR, to avoid duplicating that work for each backend. Anyway, I'm not suggesting to change anything here, just making a note.

@hrydgard

Owner

hrydgard commented Jan 9, 2018

Another thing. The current IR is nice for interpretation but I think if we really want to optimize whole functions with it eventually, we might want to raise it to tree-form, like most other IRs. That avoids all the silliness of having to move ops up and down in order to be able to do peephole optimizations, or various scans in passes - ops will always have their inputs accessible directly in the tree form. But that's for later of course, if at all.

@hrydgard hrydgard merged commit 4a32ec3 into hrydgard:master Jan 10, 2018

2 checks passed

continuous-integration/appveyor/pr AppVeyor build succeeded
continuous-integration/travis-ci/pr The Travis CI build passed

@unknownbrackets unknownbrackets deleted the unknownbrackets:irjit-lwr branch Jan 10, 2018

@unknownbrackets

Collaborator

unknownbrackets commented Jan 10, 2018

Yeah, I'm a little worried about such optimizations since we (might) still have to keep filling regs (see: discard jr ra not working everywhere.)

Though, I wonder if we could use the analyst to detect if jal ever reads from those regs. Would have to discard those results if we ever compiled code not analyzed, though.

-[Unknown]
