New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

proposal: spec: allow signed shift counts #19113

Open
griesemer opened this Issue Feb 15, 2017 · 27 comments

Comments

Projects
None yet
@griesemer
Contributor

griesemer commented Feb 15, 2017

Proposal: Permit signed integer values as shift count (language change)

Author: Robert Griesemer

Last updated: 2/15/2017

Abstract

We propose to change the language spec such that the shift count (the rhs operand in a << or >> operation) may be a signed or unsigned (non-constant) integer, or any non-negative constant value that can be represented as an integer.

Background

See Rationale section below.

Proposal

We change the language spec regarding shift operations as follows: In the section on Operators, the text:

The right operand in a shift expression must have unsigned integer type or be an untyped constant that can be converted to unsigned integer type.

to

The right operand in a shift expression must have integer type or be an untyped constant that can be converted to an integer type. If the right operand is constant, it must not be negative.

Furthermore, in the section on Integer operators, we change the text:

The shift operators shift the left operand by the shift count specified by the right operand.

to

The shift operators shift the left operand by the shift count specified by the right operand. A run-time panic occurs if a non-constant shift count is negative.

Rationale

Since Go's inception, shift counts had to be of unsigned integer type (or a non-negative constant representable as an unsigned integer). The idea behind this rule was that a) the spec didn't have to explain what happened for negative values, and b) the implementation didn't have to deal with negative values possibly occurring at run-time.

In retrospect, this may have been a mistake; a sentiment most recently expressed by @rsc here. It turns out that we could actually change the spec in a backward-compatible way in this regard, and this proposal is suggesting that we do that.

There are other language features where the result (len(x)), argument (n in make([]T, n)) or constant (n in [n]T) are known to be never negative or must not be negative, yet we return an int (for len, cap) or permit any integer type. Requiring an unsigned integer type for shift counts is frequently a non-issue because the shift count is constant (see below); but in some cases explicit uint conversions are needed, or the code around the shift is carefully crafted to use unsigned integers. In either case, readability is (slightly) infringed, and more decision making is required when crafting the code (should we use a conversion or type other variables as unsigned integers). Finally, and perhaps most importantly, there may be cases where we simply convert an integer to an unsigned integer and in the process inadvertently make an (invalid) negative value positive in the process, possibly hiding a bug that way (resulting in a shift by a very large number leading to 0).

If we permit any integer type, the existing code will continue to work. Places where we currently use a uint conversion won't need it anymore, and code that is crafted such that we have an unsigned shift count may not require unsigned integers elsewhere.

An investigation of shifts in the current std library and tests (excluding package-external tests) as of 2/15/2017 (this includes the proposed math/bits package) shows that we have:

  • 8081 shifts; or 5457 (68%) right shifts vs 2624 (32%) left shifts
  • 6151 (76%) of those are shifts by a (typed or untyped) constant
  • 1666 (21%) shifts are in tests (_test.go files)
  • 253 (3.1%) shifts use an explicit uint conversion for the shift count

If we only look at shifts outside of test files we have:

  • 6415 shifts; or 4548 (71%) right shifts vs 1867 (29%) left shifts
  • 5759 (90%) of those are shifts by a (typed or untyped) constant
  • 243 (3.8%) shifts use an explicit uint conversion for the shift count

The overwhelming majority (90%) of shifts outside of testing code is by constant values, and none of those turns out to require a conversion. This proposal won't affect that code.

From the remaining 10% of all shifts, 38% (i.e., 3.8% of all shifts) require a uint conversion. That's a significant number. In the remaining 62% of non-constant shifts, the shift count expression must be using a variable that's of unsigned integer type, and often a conversion is required there. A typical example is (archive/tar/strconv.go:77):

func fitsInBase256(n int, x int64) bool {
	var binBits = uint(n-1) * 8    // <<<< uint cast
	return n >= 9 || (x >= -1<<binBits && x < 1<<binBits)
}

In this case, n is an incoming argument and we can't be sure that n > 1 without further analysis of the callers, and thus there's a possibility that n - 1 is negative. The uint conversions hides that error silently.

Another one is (cmd/compile/internal/gc/esc.go:1417):

	shift := uint(bitsPerOutputInTag*(vargen-1) + EscReturnBits)    // <<<< uint cast
	old := (e >> shift) & bitsMaskForTag

Or this one (/Users/gri/go/src/fmt/scan.go:613):

	n := uint(bitSize)    // uint cast
	x := (r << (64 - n)) >> (64 - n)

Many (most?) of the non-constant shifts that don't use an explicit uint conversion in the shift expression itself appear to have a uint conversion before that expression. Most (all?) of these conversions wouldn't be necessary anymore.

The drawback of permitting signed integers where negative values are not permitted is that we need to check for them (negative values) at run-time and (most probably) panic, as we do elsewhere (e.g., for make). This requires a bit more code in the critical path (an estimated two instructions per non-constant shift: a test and a branch), and it probably will cost extra execution time.

However, none of the existing code will incur that cost because all shift counts are unsigned integers at this point, thus the compiler can omit the check. For new code using non-constant integer shift counts, often the compiler may be able to prove that the operand is non-negative. Furthermore, we can always introduce an explicit uint conversion, telling the compiler that we know the value is non-negative. This may be the policy of choice in performance-critical code using shifts (e.g., math/bits).

On the plus side, code that used a uint conversion before won't need it anymore, and will be safer for that since possibly negative values are not silently converted into positive ones.

Compatibility

This is a backward-compatible language change: Any valid program will continue to be valid, and will continue to run exactly the same, without any performance impact. New programs may be using non-constant integer shift counts as right operands in shift operations. Except for fairly small changes to the spec, the compiler, and go/types, (and possibly go/vet and go/lint if they look at shift operations), no other code needs to be changed.

There's a (remote) chance that some code makes use of negative shift count values:

var shift int = <some expression> // use negative value to indicate that we want a 0 result
result := x << uint(shift)

Here, uint(shift) will produce a very large positive value if shift is negative, resulting in x << uint(shift) becoming 0. Because such code required an explicit conversion and will continue to require an explicit conversion, it will continue to work.

Implementation

TBD

Open issues (if applicable)

TBD

@rasky

This comment has been minimized.

Member

rasky commented Feb 15, 2017

Is there any other case in Go where casting a type generates better/worse code? I think it would be surprising to users that a << b can generate better code if b is changed to an unsigned type.

Moreover, this also means that, eventually, there will be more code using int as shift count (as int is more common than uint, which is the basic motivating factor of this proposal) and thus paying a price in runtime cost just because most users won't be aware of this nit (in fact, I hope we don't really want to educate everybody on this; the whole idea is to make it more natural).

@griesemer

This comment has been minimized.

Contributor

griesemer commented Feb 15, 2017

@rasky There are other situations: For instance in x / y, where y is a power of 2 (y = 1<<k), we cannot in general replace the division by a simple shift x >> k because x may be negative and then the result is incorrect (see https://play.golang.org/p/NfP7MUU4Nt for an example).

On the other hand, if we know that x is non-negative (for instance because it's an unsigned int), then we can do the strength reduction and replace the division by a shift (or the remainder operation by an AND instruction) in this case.

Anybody dealing with performance critical code will already be aware of this (must be aware of the above, for instance, in math/big). For anybody else, this is a net-win because it may actually expose bugs that are now silently swept under the rug via an unchecked uint conversion. Finally, I believe that the compiler may be able to prove that a value is non-negative in many situations, a property that is useful for the optimization of other operations.

@minux

This comment has been minimized.

Member

minux commented Feb 15, 2017

@dongweigogo

This comment has been minimized.

dongweigogo commented Feb 16, 2017

uint is the right choice.

@griesemer

This comment has been minimized.

Contributor

griesemer commented Feb 16, 2017

@dongweigogo: Your comment is simply stating a personal preference that is not substantiated. Please provide arguments as to why you believe the status quo is "the right choice".

@robpike

This comment has been minimized.

Contributor

robpike commented Feb 16, 2017

I don't think the problem this solves has been explained well.

The current situation is very clear. Yes, it requires some conversions but in my experience they are not common. For that matter, shifts are not common.

Also, if you permit ints, you allow negative values. Now, you could just panic but many expect a negative shift to reverse the direction of the shift, which is not a good design in practice. People will be confused.

So requiring uint eliminates the consideration completely, which I remember being the main point in Go's decision.

@griesemer

This comment has been minimized.

Contributor

griesemer commented Feb 16, 2017

@robpike As you state, requiring uint eliminates the consideration completely, and which is why I also was a strong supporter of the original Go decision. But I am not so convinced anymore that was the best decision looking at actual data.

As it turns out, much of the code (at least in the std lib) that uses non-constants shifts simply uses an unchecked uint conversion of the shift count, either right at the shift, or on an expression before the shift that then flows into the shift. This may in fact hide errors. It seems better to have those shift counts checked if necessary and avoid the uint cast that's only there to satisfy the shift. Again, this only affects 10% of all shifts; the remaining 90% are constant shifts by untyped constant values.

@rasky

This comment has been minimized.

Member

rasky commented Feb 16, 2017

When you say unchecked conversion, you mean that there is no explicit check that the number is not negative? I don't fully buy this point. If there's some calculation computing a shift amount, your change would allow to catch the subset of bugs which cause the number to be negative (either by really ending up negative, or by becoming negative through overflow); you would still be missing all errors in which the number is just bigger than expected (but not big enough to overflow); so for instance you would still miss most of the off-by-ones, or wrong multiplications by 2/4/8 wrt to the width of the integer being shifted.

I wonder how many hidden bugs of this kind there really are. In fact, shifting by a wrong amount doesn't sound like something that can go unnoticed for a long time, but maybe I just can't picture enough scenarios right now.

I would say that the bug-catching benefit is the one that sounds less convincing to me. The most convincing is making the language simpler for the programmer in the average case (though the runtime cost is hard to swallow -- but I write a lot of performance sensitive code, so I'm sure I'm biased).

@griesemer

This comment has been minimized.

Contributor

griesemer commented Feb 16, 2017

@rasky: Answering your respective points:

  1. By "unchecked conversion" I mean that any uint(x) conversion will succeed no matter whether x actually fits into a uint or not (it may be negative).

  2. Of course this doesn't solve arbitrary program errors... - but there is plenty of code where we subtract a small number (1, k, etc.) from a shift count and then convert to uint. Those are all cases where the count possibly may end up negative before the conversion (see my examples in the proposal).

  3. We don't know how many such hidden bugs exist w/o trying it out, perhaps as an experiment.

  4. Regarding your performance worries: First of all, assuming the std lib is a representative body of code, 90% of all shifts are constant shifts which are not affected. The remaining 10% of all shifts are non-constant shifts. As long as you have a uint shift count (as is always the case now), there is zero performance impact. Even if you move to integer shift counts in your code, often the compiler will be able to prove that the value is always non-negative, in which case no extra check is needed and there will be no performance impact. Finally, in the few cases where there is an extra check, and which are performance-critical, choosing an unsigned shift count will get you back to where you are now at no performance or code cost. With very few exceptions (math/big core routines being an example), the performance difference when switching away from unsigned integer shift counts will be irrelevant to non-existent.

@rsc

This comment has been minimized.

Contributor

rsc commented Feb 16, 2017

I haven't thought a lot about this. It seems like there are three options:

  1. Require uint conversions (as now).
  2. Allow int; shift by negative is shift by >= word length (automatic uint conversion, aka unchecked).
  3. Allow int; shift by negative is panic (aka checked).

We have 1 now; the proposal above is for 3. I think we should evaluate 2 as well. The rationale would be that if the motivation for dropping the conversion is convenience, then make the change only about convenience, not also a semantic change. That would introduce no new overhead or behaviors.

Not advocating for 2, just saying that we should evaluate it along with 3.

@mdempsky

This comment has been minimized.

Member

mdempsky commented Feb 16, 2017

The drawback of permitting signed integers where negative values are not permitted is that we need to check for them (negative values) at run-time and (most probably) panic, as we do elsewhere (e.g., for make).

Another comparison is integer division and modulo, which are implemented with runtime checks for divisor==0. (Though I'm not sure why, since I see there's also code in runtime to handle SIGFPE.)

We also already have to generate extra code for variable shifts, because the machine instructions ignore the higher order bits of the shift register. (I.e., otherwise x << n might behave like x << (n % 64).)

@ianlancetaylor

This comment has been minimized.

Contributor

ianlancetaylor commented Feb 16, 2017

We used to rely on SIGFPE to catch division by zero. The explicit test in the generated code was introduced by https://golang.org/cl/16450.

@robpike

This comment has been minimized.

Contributor

robpike commented Feb 16, 2017

It seems to me you are proposing to complicate things substantially for a small unimportant simplicity in programs, coupled with the introduction of surprising semantics for some interpretations of negative shift. I just don't see the point.

@rsc

This comment has been minimized.

Contributor

rsc commented Feb 16, 2017

I think the main point would be to remove the argument that can be made for certain APIs returning uint instead of int in order to be more convenient to expected use in shifts. This happened in math/big and almost happened in math/bits.

It's also just a little less fiddly: if everyone is just blindly converting without thinking about it anyway, there's no point to making people do it. It becomes a tiny "debate with a compiler".

@griesemer

This comment has been minimized.

Contributor

griesemer commented Feb 16, 2017

@robpike: Shifts are not as uncommon as one might think. Again, using the std lib as reference (incl. _test.go files), I see:

  • 8081 shifts (<< or >>)
  • 12573 additions, incl. + signs
  • 1461 make calls with 2 or 3 arguments

As expected, there's more additions, but only about 56% more. In contrast, there are 5.5 times fewer make calls. Yet we permit an int argument for length and capacity there as well, even though we know full well that they always must be non-negative.

Of course, for make the arguments are often results of len and cap which are integers (even though we know they are never negative). They are integers exactly because it makes programming so much easier where code deals with indices and lengths.

The same argument (I believe now) can be made for shifts: In all the cases I've been looking at, it's either necessary to add a conversion, or APIs need to be carefully crafted just the right way using uint's such that we can avoid the conversion. And if the conversion is just written carelessly, why bother in the first place?

@rsc rsc added the Proposal-Hold label Feb 27, 2017

@rsc

This comment has been minimized.

Contributor

rsc commented Feb 27, 2017

Leaving on hold until Go 1.10. (Go 1.9 has type aliases.)

@gopherbot gopherbot added this to the Proposal milestone Mar 20, 2017

@rsc rsc added Go2 and removed Proposal-Hold labels Apr 10, 2017

@rsc rsc changed the title from Proposal: Permit integer values as shift count (language change) to proposal: spec: allow signed shift counts Jun 16, 2017

@fclaude

This comment has been minimized.

Contributor

fclaude commented Nov 30, 2018

I'm not a big fan of the proposed change as it will create this strange behavior that shifts are faster if you use uints, which seems obscure in my opinion. The potential effect I see on this is that people will probably write using ints to try things out, and move to uints once it's ready anyways (because of performance).

I think option 2 proposed by @rsc is best, as it allows you to use ints, but does not present the same problem.

@ianlancetaylor

This comment has been minimized.

Contributor

ianlancetaylor commented Nov 30, 2018

@fclaude Can you expand on that? Why would shifts using uint be faster?

@fclaude

This comment has been minimized.

Contributor

fclaude commented Nov 30, 2018

@ianlancetaylor in the case of uints you know the value is positive and therefore you don't need a runtime check for it. When the shift is a variable int, you need to check at runtime that the value is non-negative. This extra check would make it in general more desirable (from a performance point of view) to use the uint version.

@mvdan

This comment has been minimized.

Member

mvdan commented Nov 30, 2018

The compiler has a "prove" pass that track an integer's bounds. For example:

if i > 0 {
    n << i // 'i' can't be negative
}

I'm sure that the compiler could reuse this knowledge to avoid checks on most signed integer shifts. It already does that for index expressions, where a negative value is always a panic.

@rasky

This comment has been minimized.

Member

rasky commented Nov 30, 2018

I made the same point: #19113 (comment)

Prove can help but i would be surprised if this would not bite someone every once in a while.

@ianlancetaylor

This comment has been minimized.

Contributor

ianlancetaylor commented Nov 30, 2018

We already need to check at run time whether the shift count is larger than the size of the value being shifted. Checking for a negative value can be done in the same instruction using an unsigned comparison. If we need to distinguish negative and too-large, we can do that after the single comparison. The normal case of a valid value will run at the same speed as before.

@TuomLarsen

This comment has been minimized.

TuomLarsen commented Nov 30, 2018

@ianlancetaylor Sorry for a slight detour but it always interested me, why there is the need to check whether is the shift count larger than the size of value? At least on x86 the CX register (the count) is masked to 5 bits.

@mdempsky

This comment has been minimized.

Member

mdempsky commented Nov 30, 2018

@TuomLarsen Because x := 1; x <<= 64 needs to set x to 0. The masking naturally leaves it set to 1.

@TuomLarsen

This comment has been minimized.

TuomLarsen commented Nov 30, 2018

@mdempsky Stupid me, of course! Thanks for the explanation!

@josharian

This comment has been minimized.

Contributor

josharian commented Nov 30, 2018

Checking for a negative value can be done in the same instruction using an unsigned comparison.

Except that we currently use masking tricks instead of branches to handle large shift counts (at least on AMD64), so while the comparison will be free, there will still be a new conditional branch in the happy path.

That said, I think the performance/readability concern here already exists. For example, many existing shifts can be made faster by masking the shift amount, to aid the compiler in proving that the shift amount is smaller than the word size. And exactly what the compiler can and cannot prove w.r.t. the shift amount varies significantly already. So I'm not sure allowing ints here makes things worse in the normal case. And as always, premature micro-optimization will hurt readability.

@robpike

This comment has been minimized.

Contributor

robpike commented Dec 2, 2018

At least on x86 the CX register (the count) is masked to 5 bits.

That's why. A shift of 87 will be masked to 23, which is not 87 and therefore not what the programmer wrote. In Go, a shift of 87 shifts all the bits away - it's 87 bits of shift.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment