
cmd/compile: mark CSE'd loads as rematerializable #49332

Open
josharian opened this issue Nov 4, 2021 · 3 comments

Comments

josharian (Contributor) commented Nov 4, 2021

CSE will unify loads from the same memory state. In computation-heavy code, such as the generic code in crypto/md5, these loads can have long lifetimes, leading to the loaded values being spilled to the stack. That's silly: we already have the value in memory, so we can reload it from there as needed, without using stack space.

Perhaps we can somehow mark such loads as rematerializable, so that the register allocator knows it can re-issue the load instead of spilling. There are some non-obvious details here, such as what to do if we need to recalculate the address of the load, and what to do if recalculating the address is more expensive than spilling the loaded value.
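A toy sketch of what such a check might look like. Everything here (Value, Op, isRemat, loadIsRemat) is hypothetical and only loosely mirrors cmd/compile/internal/ssa; the real decision would live in regalloc and is harder than this:

```go
package main

import "fmt"

// Toy model of SSA values. All names here are hypothetical
// illustrations, not the real compiler API.
type Op int

const (
	OpConst Op = iota
	OpAddr      // address of a fixed symbol: cheap to recompute
	OpAdd       // arbitrary computation: not rematerializable
	OpLoad      // load; by convention the last arg is the memory state
)

type Value struct {
	Op   Op
	Args []*Value
}

// isRemat reports whether v can be recomputed cheaply at any point,
// the property the register allocator needs in order to skip spilling.
func isRemat(v *Value) bool {
	switch v.Op {
	case OpConst, OpAddr:
		return true
	}
	return false
}

// loadIsRemat sketches the proposed check: a load could be re-issued
// instead of spilled if its address operands are themselves
// rematerializable and its memory state is still the live one.
// Deciding "memory state still live" is the hard part the issue
// discusses; here we just take it as a flag for illustration.
func loadIsRemat(v *Value, memStillLive bool) bool {
	if v.Op != OpLoad || !memStillLive {
		return false
	}
	for _, a := range v.Args[:len(v.Args)-1] { // skip trailing memory arg
		if !isRemat(a) {
			return false
		}
	}
	return true
}

func main() {
	mem := &Value{}
	addr := &Value{Op: OpAddr}
	load := &Value{Op: OpLoad, Args: []*Value{addr, mem}}
	fmt.Println(loadIsRemat(load, true)) // address is cheap: candidate

	computed := &Value{Op: OpAdd}
	load2 := &Value{Op: OpLoad, Args: []*Value{computed, mem}}
	fmt.Println(loadIsRemat(load2, true)) // address must be recomputed: skip
}
```

The second case is exactly the cost question above: if the address computation is expensive, re-issuing the load may cost more than the spill it avoids.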

But we currently generate some very silly code for crypto/md5's generic block.

To see this, patch https://gist.github.com/josharian/42e66bf022f32da3da0a9b1bdf0a974b into the md5 code and then observe the effect of disabling CSE for loads, for example by adding code like this to cmpVal:

// Treat any value whose final argument is a memory state as unequal
// to every other value, so loads are never CSEd.
if len(v.Args) > 0 && v.Args[len(v.Args)-1].Type.IsMemory() {
    return lt2Cmp(v.ID < w.ID)
}

The generated code goes from a morass of spills and restores to a long, straightforward calculation.

I found no way to tweak the code to prevent such CSEs. (Issuing irrelevant writes causes other values to get spilled, negating the benefit of preventing CSE.)

cc @randall77

randall77 (Contributor) commented Nov 4, 2021

The need to keep the address of the load alive for longer definitely throws a wrench in this. We'd need some analysis to see if it is even worthwhile.

So far we've been maintaining the invariant that if you do:

x := *p
if x != 0 {
   ... use x ...
}

then we guarantee that at the use of x, it is still nonzero. In the presence of reloading and races, that would no longer be true.
That may sound like not much of a problem, because if there's a race we're allowed to do anything. But it isn't that simple: there are races where we can't just say "you lose". For instance, the code in sync/mutex.go:tryLock depends on old := m.state happening exactly once. If old were reloaded from m.state, bad things would happen.
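A simplified illustration of that pattern (names are hypothetical; the real sync code uses a plain load of m.state, while this sketch uses sync/atomic just to keep the example itself race-free):

```go
package main

import (
	"fmt"
	"sync/atomic"
)

// mutex is a toy stand-in for sync.Mutex's state word.
type mutex struct {
	state int32
}

const locked = 1

// tryLockOnce mirrors the load-once pattern: take one snapshot of
// state, inspect it, then CAS against that same snapshot. If the
// compiler re-issued the load of `old` at the CAS, the value compared
// could differ from the value inspected, and the CAS could succeed
// against a state we never examined.
func (m *mutex) tryLockOnce() bool {
	old := atomic.LoadInt32(&m.state) // must happen exactly once
	if old&locked != 0 {
		return false
	}
	return atomic.CompareAndSwapInt32(&m.state, old, old|locked)
}

func main() {
	var m mutex
	fmt.Println(m.tryLockOnce()) // first acquire succeeds
	fmt.Println(m.tryLockOnce()) // second attempt fails: already locked
}
```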

Fun related issue: #48222


josharian (Contributor, Author) commented Nov 4, 2021

In this case, we would only reload in a situation where the program already includes a load; we are letting regalloc undo the work of CSE. Alternatively, we could try to guess when CSEing a load will be worse than leaving it.

I can tell you that refusing to CSE anything with a final memory arg is a bad idea. :)


randall77 (Contributor) commented Nov 4, 2021

That's kind of the problem in #48222: there's no way to tell the difference between a program with two loads that were CSEd, in which case it would be OK to split them back up, and a program with a single load that we want to split into two loads, in which case the splitting can be problematic. If we want to split loads back up, we'd have to somehow keep track of their progenitor loads.
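One way to picture that bookkeeping, as a toy sketch. Nothing like this exists in the compiler today; cseState, merge, and splittable are invented here purely to illustrate the idea of recording which original loads a CSEd load stands for:

```go
package main

import "fmt"

// ID is a toy stand-in for an SSA value ID.
type ID int

// cseState records, for each surviving load, the IDs of the original
// loads that were merged into it during CSE.
type cseState struct {
	progenitors map[ID][]ID
}

// merge notes that load w was replaced by the equivalent load v.
func (s *cseState) merge(v, w ID) {
	s.progenitors[v] = append(s.progenitors[v], w)
}

// splittable reports whether re-issuing load v would only restore
// loads the program originally contained (safe), as opposed to
// duplicating a load that was always single (potentially unsafe).
func (s *cseState) splittable(v ID) bool {
	return len(s.progenitors[v]) > 0
}

func main() {
	s := &cseState{progenitors: map[ID][]ID{}}
	s.merge(1, 2)                // loads 1 and 2 were CSEd into 1
	fmt.Println(s.splittable(1)) // had a second original load: OK to split
	fmt.Println(s.splittable(3)) // single original load: don't duplicate
}
```

The real cost is not the map but plumbing this information from the CSE pass all the way to regalloc, which is part of why both directions look hard.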

I agree that the easier approach might be to not CSE loads in cases where we think doing so would introduce too much register pressure. But "easier" is a relative term here; both solutions seem hard.

