
cmd/compile: avoid zeroing new allocations when possible #24926

Open
josharian opened this Issue Apr 18, 2018 · 8 comments

josharian (Contributor) commented Apr 18, 2018

Consider:

func f() *int {
  x := new(int)
  *x = 1
  return x
}

The first line gets translated into x = newobject(type-of-int), which calls mallocgc with a needzero argument of true. But this allocation doesn't need zeroing: the type contains no pointers, and the subsequent store writes the entire object.

Same holds for:

func f() *[2]int {
  x := new([2]int)
  x[0] = 1
  x[1] = 2
  return x
}

and more interestingly:

func f() *[1024]int {
  x := new([1024]int)
  for i := range x {
    x[i] = i
  }
  return x
}

We could detect such scenarios in the SSA backend and replace the call to newobject with a call to a (newly created) newobjectNoClr, which is identical to newobject except that it passes false to mallocgc for needzero.
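To make the idea concrete, here is a toy model of the proposed split. This is my own sketch, not runtime code: the real mallocgc takes a size, a type descriptor, and a needzero flag, while this model fakes allocation from a reusable "dirty" slab to show why skipping the clear is only safe when the caller overwrites everything.

```go
package main

import "fmt"

// slab simulates reused heap memory that may contain stale bytes.
var slab = make([]byte, 64)

// mallocgc models the runtime allocator's needzero flag: when true,
// the returned memory is cleared; when false, stale bytes leak through.
func mallocgc(size int, needzero bool) []byte {
	mem := slab[:size]
	if needzero {
		for i := range mem {
			mem[i] = 0
		}
	}
	return mem
}

func newobject(size int) []byte      { return mallocgc(size, true) }
func newobjectNoClr(size int) []byte { return mallocgc(size, false) }

func main() {
	// Dirty the slab, then allocate with zeroing.
	for i := range slab {
		slab[i] = 0xFF
	}
	a := newobject(8)
	fmt.Println(a[0]) // 0: zeroed by the allocator

	// Dirty again, then allocate without zeroing. This is only
	// correct because we overwrite every byte before reading.
	for i := range slab {
		slab[i] = 0xFF
	}
	b := newobjectNoClr(8)
	for i := range b {
		b[i] = byte(i)
	}
	fmt.Println(b[0], b[7]) // 0 7
}
```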

Aside: The SSA backend already understands newobject a little. It removes the pointless zero assignment from:

func f() *[2]int {
  x := new([2]int)
  x[0] = 0 // removed
  return x
}

although not from:

func f() *[2]int {
  x := new([2]int)
  x[0] = 1
  x[1] = 0 // not removed, but could be
  return x
}

Converting to newobjectNoClr would probably require a new SSA pass, in which we put values in store order, detect calls to newobject, and then check whether subsequent stores obviate the need for zeroing, and at the same time eliminate unnecessary zeroing that the existing rewrite rules don't cover.
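The core coverage check such a pass would need can be sketched as an interval problem. This is a deliberately simplified toy (it ignores loads, escapes, and control flow between the stores, which a real pass must handle): given the object size and the stores that follow the allocation, decide whether they overwrite the whole object.

```go
package main

import (
	"fmt"
	"sort"
)

// store models one store to the freshly allocated object:
// a write of `size` bytes at byte offset `off`.
type store struct{ off, size int64 }

// fullyOverwritten reports whether the stores cover [0, objSize)
// with no gaps, i.e. whether zeroing the allocation is redundant.
func fullyOverwritten(objSize int64, stores []store) bool {
	sort.Slice(stores, func(i, j int) bool { return stores[i].off < stores[j].off })
	var covered int64 // bytes [0, covered) are known to be overwritten
	for _, s := range stores {
		if s.off > covered {
			return false // a gap would leak uninitialized memory
		}
		if end := s.off + s.size; end > covered {
			covered = end
		}
	}
	return covered >= objSize
}

func main() {
	// new([2]int) with both 8-byte elements stored: zeroing is redundant.
	fmt.Println(fullyOverwritten(16, []store{{0, 8}, {8, 8}})) // true
	// Only x[0] stored: the second word still needs zeroing.
	fmt.Println(fullyOverwritten(16, []store{{0, 8}})) // false
}
```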

This new SSA pass might also someday grow to understand and rewrite e.g. calls to memmove and memequal with small constant sizes.

It is not obvious to me that this pass would pull its weight, compilation-time-wise. Needs experimentation. Filing an issue so that I don't forget about it. :)

TocarIP (Contributor) commented Apr 18, 2018

We already inline memmove for small constant sizes in generic.rules. So if we introduce newobjectNoClr, we could probably also rewrite newobject with rules, which may avoid some of the overhead of a full pass.

josharian (Contributor) commented Apr 18, 2018

There are two problems with doing it with rules.

(1) What we need to do is modify the OpStaticCall's Aux value and leave the overall set of values unchanged; that's hard to do with rules. The first half you can fake by having a condition that has a side-effect, but the second half is not currently possible in a clean way. Maybe we could have a magic "origv" RHS value or some such.

(2) For the *[2]int case, you need nested rules to check that both ints are overwritten. For the *[3]int case you need deeper nested rules. You see the problem. :)

josharian (Contributor) commented Apr 19, 2018

Hmm. Another possible use for an rtcall pass: slicebytetostring.

Consider something like:

func f(b []byte) int {
	s := string(b[:4])
	return len(s) // or do something else non-escaping with s
}

This ends up compiling to runtime.slicebytetostring(SP+48, b.ptr, 4, b.cap). This is equivalent to memmove(SP+48, b.ptr, 4). And that can be simplified further in turn.
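The equivalence holds because converting a byte slice to a string is semantically just a copy of the bytes; when the length is the constant 4 and the result doesn't escape, that copy is all the call does. A small example of the copy semantics the rewrite relies on (my own illustration, not from the issue):

```go
package main

import "fmt"

func main() {
	b := []byte{'g', 'o', 'p', 'h', 'e', 'r'}
	// Compiles to a runtime.slicebytetostring call; semantically it
	// copies exactly 4 bytes, which is what makes the memmove rewrite valid.
	s := string(b[:4])
	b[0] = 'G' // mutating the slice cannot affect s: the bytes were copied
	fmt.Println(s, len(s)) // goph 4
}
```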

This optimization is hard to do during walk, because b[:4] gets rewritten into an autotmp during order, and we don't rediscover the constant length until SSA.

josharian (Contributor) commented Apr 19, 2018

cc @randall77 for opinions in general about this

dotaheor commented Apr 19, 2018

please also do this for make: #23905

randall77 (Contributor) commented Apr 21, 2018

I'd like to see removal of zeroing if we know the object will be completely overwritten.
It could be implemented using an upgraded deadstore pass. We can check at each allocation whether all its fields are shadowed.
Pointer fields are a bit tricky because of the write barriers. I think we can only do it for completely scalar objects.

josharian (Contributor) commented Apr 21, 2018

Re: ptr-containing objects, do you think #24928 would do the trick?

randall77 (Contributor) commented Apr 21, 2018

I'm not sure. I think it's probably faster to zero a whole object than to selectively zero fields. Unless there are very large sections which don't have pointers, but that seems rare, and detecting subsequent complete overwriting of such objects also seems hard.
