Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

cmd/compile: add bounds information from non-control ops to prove pass #25086

Open
josharian opened this Issue Apr 25, 2018 · 9 comments

Comments

Projects
None yet
4 participants
@josharian
Copy link
Contributor

josharian commented Apr 25, 2018

Probably low priority; I'm not sure.

Some ops have strongly bounded output. For example, Ctz64 is always in [0,64], and Ctz64NonZero is in [0, 63]. Rsh by a constant also provides bounds.

The prove pass seems unaware of this. Does it / should it care? Where/how should it be added?

@rasky

@josharian josharian added this to the Unplanned milestone Apr 25, 2018

@rasky

This comment has been minimized.

Copy link
Member

rasky commented Apr 26, 2018

I would be surprised if it matters much right now.

If you want to try something, I'd try adding it at the end of (*factsTable).update: if v (or w) is OpCtz*, call ft.update recursively adding a relation between v and 64 (in the quick test, just materialize the constant there with parent.NewValue0I). I would do that on top of my CL serie that includes transitive relations, as I can see this information being useful as part of a more complex relation of vars rather than being used directly. That would increase the likeliness of this information being useful.

To check if the code is really helping, check if ft.unsat is true when ft.update returns. If it's true, you're learning something that is helping remove a condition and wasn't known before.

@randall77

This comment has been minimized.

Copy link
Contributor

randall77 commented Apr 26, 2018

Should do this for the And ops as well, they can provide strong bounds info, e.g. a[i & 0xff].
We handle a bunch of cases using rules, but doing it in prove would be more complete.

@rasky rasky self-assigned this Sep 6, 2018

@rasky

This comment has been minimized.

Copy link
Member

rasky commented Sep 6, 2018

I'll mail a CL that adds bounds derived from:

  • OpAnd*: z=x&y ⇒ z<=x && <=y (unsigned)
  • OpZeroExt*: z=zext(x) ⇒ z==x (unsigned)
  • OpSignExt*: z=sext(x) ⇒ z==x (signed)
  • OpRsh*: z=x>>y ⇒ z<=x (signed and unsigned)
  • OpRshU*: z=x>>y ⇒ z<=x && z <= (1<<(64-y))-1 (unsigned)

The result in terms of proved ops can be seen here:
https://gist.github.com/rasky/6f4931f25d32bbd78f2d35044f316599

It's easy to add Ctz following the lead, but it's not used in the compiler or standard library itself, so I'm not sure how to measure the impact.

@josharian

This comment has been minimized.

Copy link
Contributor Author

josharian commented Sep 9, 2018

If it is easy, I think it is worth adding the full math/bits intrisincs set, on pain of having to rediscover it later, even if usage is rare; when math/bits is in use people frequently care lots about performance.

I’ll note in passing that we can convert signed ext to zero ext if we can prove non-negative. Not sure if that is very useful in this context, but I can imagine on amd64 that it could eliminate some 32-to-64 extensions by folding them into MOVLs.

@josharian

This comment has been minimized.

Copy link
Contributor Author

josharian commented Feb 14, 2019

cc @zdjones

A couple of days ago, I was saddened to see that uint64(1)<<bits.Len8(x) generates masking code for the shift. The reason is that the compiler doesn't know that bits.Len8 is heavily bounded. It was hot enough code that I even bothered to hack a fix into my local compiler. So this does actually happen in real life. :)

To reiterate for Zach from my previous comment: I think it is worth adding the full math/bits intrinsics set. (Plus whatever else you are inspired by from the rest of this issue.)

@zdjones

This comment has been minimized.

Copy link

zdjones commented Feb 14, 2019

I've been tossing around this idea in my head a bit. My intuition is that there is some gains to be had by collecting range information from any number of non-control ops. But I'm also wary of the cost in terms of compute and maintenance complexity. I suppose it'd be a matter of determining which ops justified the cost.

My understanding is that prove, at a high level, currently does three passes of it's own:

  1. Static data range pass - currently only collects facts from StringLen, SliceLen, and SliceCap ops.
  2. Induction variable pass - identify loop induction variables
  3. Control range pass - A DFS walk of the dominator tree, asserting and accumulating live control-op conditions, removing branches when it finds a range contradiction.

Adding non control-ops should be a matter of determining whether the range information is static or not. If static, add it to the data range pass, if not, to the control range pass (which I suppose I'd then have to rename). Dynamic op ranges are also more complicated by SSA values being unordered within their block.

As to the costs of adding more ops, the partially ordered set that holds all the transitive inequality information (ie v1 <= v2), is effectively O(n) where n is the number of values in the poset. There are 2 posets, one each for the signed and unsigned domains. Each value we add to the poset makes querying later facts slower. Its only linear, so it may be negligible, but I get the feeling that everyone's quite sensitive to changes in compiler speed. This, at least, means we want to be sure that the facts being added are actually helping prove make better decisions. @rasky, please let me know if I'm wrong about this.

The small data range pass currently uses a switch to check for the three ops it uses. I assume we'd need to change this lookup in a map[Op]struct, like the one used by the control range pass here. I don't know memory allocation intimately, but I think the map would mean another heap allocation? Or maybe not since it'd never be written to at runtime?

Hope this is clear, let me know what you think.

@josharian

This comment has been minimized.

Copy link
Contributor Author

josharian commented Feb 14, 2019

A few thoughts:

(1) math/bits ops are relatively rare, and most likely to be used in performance-sensitive places, so I think they probably offer good bang-for-buck.

(2) It's relatively easy to add them (I hacked in mine quickly), so we can simply measure the toolspeed impact to make a more informed decision. Have you found compilecmp? Btw, you are right that we zealously guard compiler performance.

(3) Another option is to do a non-poset manual look-up. For example, we have a static isNonNegative that we can use to make local inferences on the fly. We could add an isBelow(v *Value, n int) bool that reports whether v is statically known to be in the range [0, n). The implementation of isBelow could check for math/bits ops, rhs by a constant, And with a small constant, mod with a small constant (maybe, have to think about signedness), etc. Then you could query that during shifts and slice/string/array accesses. (There's also a dynamic isNonNegative that can also use other gathered range info; again, we could mimic that structure.)

(4) A switch is likely to outperform a map until the number of ops gets pretty high. But: benchmark. :)

(5) Somewhat unrelated: I noticed while I was there that the code working with StringLen, SliceLen, and SliceCap goes to some pains to detect and reuse a zero value. But we have Func methods to get constant int values that handles re-use already. We could probably use that instead and simplify the code.

@zdjones

This comment has been minimized.

Copy link

zdjones commented Feb 14, 2019

(2) Agreed, not difficult to add these. toolspeed? Saw your mention of compilecmp on the Compiler Developer Guide issue from Daniel. I've been running and comparing compilebench manually. I'd love to hear how you do your compiler comparisons and benchmarking generally, as I'm doing it mostly ad-hoc still.

(3) Good thought. If the inferences are only required locally, there is no need to reach for the poset. I'm familiar with the two isNonNegative's, I just mailed a CL to make the dynamic one use the poset. I'll play around with the isBelow idea a bit. I know of at least one related open issue, BCE for _ = s[i%3].

(4) Didn't know this. Why is that?

(5) Aaah, ConstInt. Looks like an easy win. Maybe I'll just add that into my isNonNegative CL because it touches that code and relies on the zero constant. Do you think it'd be out of place, or require a standalone issue for any reason? Doesn't look like it should change behaviour at all.

@josharian

This comment has been minimized.

Copy link
Contributor Author

josharian commented Feb 14, 2019

(2) compilecmp: https://github.com/josharian/compilecmp. Sorry it doesn't have docs. One day, one day. It's pretty easy to use, I believe running with -h should be all you need. I find -obj to be particularly useful. I usually run once with -obj -n 5 to get a sense for memory usage and object file impact (which is generally very stable) and then with -cpu -n 50 to get a sense for CPU impact.

(3) There are probably cases where it would be used non-locally, too. I'd say if you're up for it to try it both ways and see how much the extra horsepower buys you and at what cost.

(4) switch does binary search, and each search is usually just a plain compare-and-jump, so it is very cheap. It takes a whole lot of mispredicted jumps to add up to the full map machinery (hashing, allocation, etc.).

(5) Either way. I personally prefer to separate refactoring-only changes from substantive changes. It makes reviewing easier and bisection/rollback more fine-grained. But I know other folks tend to like fewer CLs. So do what you think is best.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
You can’t perform that action at this time.