Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

cmd/compile: make rulegen produce less verbose code #33644

Open
mvdan opened this issue Aug 14, 2019 · 9 comments

Comments

@mvdan
Copy link
Member

commented Aug 14, 2019

This is a continuation to #30810, once the rewrite has been merged. A number of ideas will be proposed below. The purpose of this is to:

  • Produce less lines of code, reducing the size of diffs and files on disk
  • Produce simpler code, making it more readable to humans
  • Hopefully reduce compile time of the generated code slightly

Here are my ideas so far:

  1. Simplifying of boolean expressions. For example, turn !(d == 32-c) into d != 32-c.

  2. Joining of contiguous break conditions, such as:

// verbose
if v_0.Op != OpAMD64SHLLconst {
        break
}
if v_0.AuxInt != 3 {
        break
}

// simpler
if v_0.Op != OpAMD64SHLLconst || v_0.AuxInt != 3 {
        break
}
  1. Removing of uninteresting lines, such as the cond line here:
// match: (ADDL x (SHLLconst [3] y))
// cond:
// result: (LEAL8 x y)
  1. Move declarations closer to their uses, reducing the number of variables in scope in some cases, such as:
for {
        carry := v.Args[2]
        x := v.Args[0] // Not actually used until the very end!
        v_1 := v.Args[1]
        if v_1.Op != OpAMD64MOVQconst {
                break
        }
        c := v_1.AuxInt
        if !(is32Bit(c)) {
                break
        }
        v.reset(OpAMD64ADCQconst)
        v.AuxInt = c
        v.AddArg(x)
        v.AddArg(carry)
        return true
}
  1. Reorder slice accesses to reduce the amount of manual bounds checks speed-ups:
// verbose
_ = v.Args[1]
x := v.Args[0]
v_1 := v.Args[1]

// simpler
v_1 := v.Args[1]
x := v.Args[0]

These are the ideas I have so far. If we think any of them would reduce readability, they can be dropped. I'm also open to more ideas and suggestions, of course. I'd send these as separate CLs so we can measure their impact on the generated code separately.

/cc @josharian @randall77 @laboger @mundaym @cherrymui

@mvdan mvdan added this to the Go1.14 milestone Aug 14, 2019

@mvdan mvdan added the NeedsDecision label Aug 14, 2019

@mvdan mvdan self-assigned this Aug 14, 2019

@mvdan

This comment has been minimized.

Copy link
Member Author

commented Aug 14, 2019

I also wonder if some lines, such as b.Aux = nil, could be recognised as being unnecessary.

@laboger

This comment has been minimized.

Copy link
Contributor

commented Aug 14, 2019

I would like to see a way to indicate that a rule should not be expanded to generate all possible operand orderings even though the opcode is marked commutative. The main example where this is a problem is in the rules to combine loads and stores which currently exist for many GOARCH targets. Those rules consist of multiple ORs, the statement to be matched can really only occur with a few of the possible operand orders, and yet all combinations are generated for all possible orders. This is especially costly for the 8 byte cases.

This would reduce the size of the rewrite files, the compile binary, and compile time.

@mvdan

This comment has been minimized.

Copy link
Member Author

commented Aug 14, 2019

That's an interesting idea, thanks. Could this be done automatically, or would the rules syntax need extra information to know when or not to do this expansion?

@laboger

This comment has been minimized.

Copy link
Contributor

commented Aug 14, 2019

I'm not sure this could be done automatically, so I was thinking of some syntax to add to the rule, like maybe a modifier to the opcode or a prefix with a special name that could be added to the outermost level to restrict the number of combinations generated. In the case I've mentioned with combined loads and stores, we would probably want to allow just 2 combinations to be generated: the operands in the order they appear in the rule, and the reverse order. That way it could still match either of these for the 4 byte case:

x := b[0] | b[1]<<8 | b[2]<<16 | b[3]<<24
y := b[3]<<24 | b[2]<<16 | b[1]<<8 | b[0]

It is highly unlikely that this statement would be written with the operands in a different order from these, yet many combinations are generated and matches are attempted.

I'd be willing to try out something to see how it helps.

We currently do not have any of the combined load and store rules for ppc64 big endian, and we are missing some for the 4 byte cases on ppc64le, but I've avoided adding them because of how much they will increase the size of rewritePPC64.go.

@mundaym

This comment has been minimized.

Copy link
Member

commented Aug 14, 2019

I agree with Lynn that reducing the amount of code generated for commutative operations would be really great and a big code size win. I tried writing generic rules for unaligned loads and the 64-bit load generated megabytes of code due to the 7 commutative ops. My attempt to use generic rules to match variable rotates also ran into this combinatorial explosion.

As well as the approach suggested we could also look at swapping arguments to commutative operations so that they are put into a canonical order (I think there was a CL where Ian mentioned this is how GCC deals with this issue).

Another approach we could investigate is swapping argument values in the generated code. For example, I think in a lot of cases we could generate code that looked like this (assuming v is commutative):

...
v1 := v.Args[0]
v2 := v.Args[1]
if v1.Op != OpBlah {
    v1, v2 = v2, v1
}
if v1.Op != OpBlah {
    break
}
...

This only works when we know that the arguments to a value have different opcodes though (i.e. if in the above example both v1 and v2 were OpBlah values we would need to traverse further through the graph before deciding whether to swap them).

@laboger

This comment has been minimized.

Copy link
Contributor

commented Aug 22, 2019

I did try a change to add something to a rule to indicate that it should not commute its operands. It reduced the size of the rewritePPC64.go file by about 1/3. However I ran into 2 cases that no longer match because the order of operands did not match the order of operands from the source statement, so some kind of reordering would be needed. I do recall that @ianlancetaylor made a comment about putting arguments in canonical order for this situation but I'm not sure on the details of how to implement it. (Force them in the order that they appear in the original source code? Or some other ordering.)

@ianlancetaylor

This comment has been minimized.

Copy link
Contributor

commented Aug 23, 2019

Not the order of the original source code, no. The issue arises when trying to match sequences like (plus (plus a b) c). The idea is to pick some canonical order that you use when writing the rules, like, the deeper tree is always first. Then when comparing against the rules, you check the values, and if the deeper tree is second, you swap them before comparing. Of course this can only be used for commutative operators.

@laboger

This comment has been minimized.

Copy link
Contributor

commented Aug 26, 2019

OK, got it. That's what Mike was probably trying to explain above but I didn't get it.

@gopherbot

This comment has been minimized.

Copy link

commented Sep 19, 2019

Change https://golang.org/cl/196498 mentions this issue: cmd/compile: reduce rulegen's output by 200 KiB

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
5 participants
You can’t perform that action at this time.