Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

cmd/compile: missed opportunity to inline runtime.memmove #41662

Open
josharian opened this issue Sep 27, 2020 · 8 comments
Open

cmd/compile: missed opportunity to inline runtime.memmove #41662

josharian opened this issue Sep 27, 2020 · 8 comments

Comments

@josharian
Copy link
Contributor

@josharian josharian commented Sep 27, 2020

package p

func f(b []byte, x *[8]byte) {
	_ = b[8]
	copy(b, x[:])
}

This should compile down to a pair of MOVQs, one to load from x and one to write to b. It doesn't; it still contains a call to runtime.memmove. From a quick glance at ssa.html, the problem is that the lowering of runtime.memmove to Move happens during generic.rules, but we haven't detected that the size of the memmove is a constant until after lowering.

To fix this, we could either improve the analysis in the generic stages or add arch-specific runtime.memmove-to-Move lowering.

cc @randall77 @dr2chase @martisch @mundaym

@kels-ng
Copy link
Contributor

@kels-ng kels-ng commented Jan 15, 2021

Is the issue still actual? Let me take it.

@dr2chase
Copy link
Contributor

@dr2chase dr2chase commented Jan 15, 2021

Here's a sample of a similar optimization where small memequals calls are optimized.

@josharian
Copy link
Contributor Author

@josharian josharian commented Jan 15, 2021

@Kels9009 great! Ask here if you have questions. Please be sure to add a codegen test (see test/codegen).

@kels-ng
Copy link
Contributor

@kels-ng kels-ng commented Jan 18, 2021

@dr2chase @josharian OK, thanks for useful information.

@kels-ng
Copy link
Contributor

@kels-ng kels-ng commented Jan 20, 2021

Hello,

I investigated the issue and I would like to get an advice to select the right direction in order to fix it.

First, I tried to make generic solution. Here the SSA dump:
image

As you can see on the late stage of optimization (late opt) the problem is memmove are using Phi operation as a size argument (v50). Phi operation can't be eliminated due to rules require both arguments to be constant (we have Arg + Const):

// basic phi simplifications
...
(Phi (Const64  [c]) (Const64  [c])) => (Const64  [c])
...

But 'memmove' size argument detected as a constant on the next stage (before lowering) - dead auto elim + generic deadcode:
image
In this case memmove can be inlined using current rules without modification.

So, I added another one opt stage after dead auto elim + generic deadcode. As a result memmove was inlined successfully and almost all all.bash tests were passed except few codegen tests (for example writebarrier).
Another solution was to move late opt to position after dead auto elim + generic deadcode. Unfortunately, the result is broken code generation.

So, as a result a have few branches for next steps:

  1. Add arch-specific runtime.memmove-to-Move lowering. I already have the first implementation in my local branch for AMD64 and it works fine. I don't like this solution because it requires to implement this rule for each archs and I think generic rules are more proper place for such optimization. But still, we can use it and ignore my doubts :).
  2. Add another generic opt stage after dead auto elim + generic deadcode and fix all failed tests. I think this is the worst solution, because we can't predict how many hidden problems we will create.
  3. Extract memmove recognition rules from generic to the new file and insert new opt stage after dead auto elim + generic deadcode. This solution move memmove inlining to the later position. In this case we reduce chance to overlook mistake and restructure rules. The problem is current source-code structure doesn't have dedicated files for each types of optimizations (they all are stored in generic and arch-specific rules), and I'm not sure I can decide by myself such change without experts.

What do you think about it? Which one of these directions is right solution?

@ianlancetaylor
Copy link
Contributor

@ianlancetaylor ianlancetaylor commented Jan 20, 2021

@kels-ng A note for the future: when pasting text, please just paste plain text, not an image. At least for me your images are nearly illegible, whereas everyone using this issue tracker can read plain text. Thanks.

@josharian
Copy link
Contributor Author

@josharian josharian commented Jan 20, 2021

Thanks, @kels-ng. We should do option (1).

New rules are pretty cheap. There are a fair number of rules duplicated in generic and lowered opt, for similar reasons. We don't want to introduce new opt passes without a very compelling need, and moving around existing passes is risky.

@gopherbot
Copy link

@gopherbot gopherbot commented Feb 3, 2021

Change https://golang.org/cl/289151 mentions this issue: cmd/compile: add arch-specific inlining for runtime.memmove

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Linked pull requests

Successfully merging a pull request may close this issue.

None yet
6 participants