Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

cmd/compile: rewrite rulegen to not generate plaintext Go code directly #30810

Open
mvdan opened this Issue Mar 13, 2019 · 5 comments

Comments

Projects
None yet
3 participants
@mvdan
Copy link
Member

mvdan commented Mar 13, 2019

cmd/compile/internal/ssa/gen/rulegen.go is a program that generates the code for all the SSA rewrite rules.

The way it currently works, the code is almost always generated directly. This keeps the code simple, but also limits it quite a bit. For example, it's hard to tell if a variable will be used or not, so declarations like typ := &config.Types tend to be followed by _ = typ.

It also means we tend to write repetitive and verbose code. For example, these two if statements could be collapsed like if X || Y { break }:

if z1.Op != OpAMD64SHRLconst {
        break
}
if z1.AuxInt != 31 {
        break
}

@josharian tried to do this last bit with string handling in https://go-review.googlesource.com/c/go/+/167438, but that turns into spaghetti code that noone can maintain.

There are other ways we could post-process the code to simplify it, too. For example, we could move declarations closer to their uses, and we could remove unused variable declarations instead of using so many _ = foo safeguards.

We could use go/ast for this, but that's probably too complex for our needs. The generated code only uses a tiny subset of Go, and we could have nodes for higher-level concepts like "rewrite this value".

The last step of the program would be to stringify the node tree, and pass it through go/format as usual.

The purpose of this issue is first to gather feedback. If it sounds like a good idea, I'd like to begin work on this soon, hopefully before the freeze is here again. The first incremental step would probably be to rewrite rulegen to use the intermediate node tree, without changing the generated code at all.

/cc @josharian @mundaym @cherrymui @randall77 @dr2chase

@mvdan

This comment has been minimized.

Copy link
Member Author

mvdan commented Mar 13, 2019

I forgot to mention; we should synchronize so that this doesn't cause painful conflicts with other rulegen CLs. Ideally, I wouldn't start work until all outstanding rulegen CLs are merged, and any further rulegen work would be delayed until the rewrite is finished. I should need about a week to rewrite the program.

@josharian

This comment has been minimized.

Copy link
Contributor

josharian commented Mar 13, 2019

As long as the result ends up being reasonably clear and simple (as I believe it can be), this sounds fine to me.

@randall77

This comment has been minimized.

Copy link
Contributor

randall77 commented Mar 13, 2019

I'm not really clear on what the goal of this is.

  1. Make the generated code more readable.
  2. Make the generated code compile faster.
  3. Make the generated code run faster.

I don't think the rewrites you suggested help much with 3, as the compiler does effectively those rewrites internally anyway.

It would probably help 2, but I don't think the compile time of the compiler is much of an issue.

I don't really see much benefit for 1 either. Not many people are reading the generated code.

Or is there something else?

@josharian

This comment has been minimized.

Copy link
Contributor

josharian commented Mar 13, 2019

The motivation for me is that the giant rewrite files make my editor and git client unhappy.

I also actually care a little bit about 1 and 2, both of which I believe this could help with a bit.

@mvdan

This comment has been minimized.

Copy link
Member Author

mvdan commented Mar 14, 2019

For some quick numbers: we have ~300k generated lines of code for the rules. Josh's CL 167438 removes ~20k, or ~6%. I presume we could realistically shave off another 5-10% elsewhere, by removing more unnecessary lines and simplifying statements.

As another data point, I sent CL 167399 yesterday, which shaves off ~4k, or ~1.5%. Its implementation would be simpler if we had an intermediate step; the way I hack this together with the plaintext is by using two layers of buffers, which isn't great.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
You can’t perform that action at this time.
You signed in with another tab or window. Reload to refresh your session. You signed out in another tab or window. Reload to refresh your session.