
proposal: spec: change all int types to panic on wraparound, overflow #19624

Open
bcmills opened this Issue Mar 20, 2017 · 67 comments

@bcmills
Member

bcmills commented Mar 20, 2017

I know this has been discussed before, but I didn't see a specific proposal filed for it yet and I think it's important.

Unexpected integer overflow can lead to serious bugs, including bugs in Go itself. Go's bounds-checking on slices and arrays mitigates some of the harmful effects of overflow, but not all of them. For example, programs that make system calls may pass data structures into the kernel, bypassing Go's usual bounds checks. Programs that marshal data structures to be sent over the wire (such as protocol buffers) may send silently corrupted data instead of returning errors as they ought to. And programs that use unsafe to access addresses with offsets are vulnerable to exactly the same overflow bugs as in C.

In my experience, Go programs and libraries are often written assuming "reasonable inputs" and no overflow. For such programs, it would be clearer for overflow to cause a run-time panic (similar to dividing by zero) rather than silently wrapping around. Even in the case where the unintended overflow is subsequently caught by a slice bounds check, reporting the error at the overflowing operation rather than the slice access would make the source of the bug easier to diagnose.

The potential performance impact of this proposal is similar to bounds-checking in general, and likely lower than using arbitrary-precision ints (#19623). The checks can be omitted when the compiler can prove the result is within bounds, any new branches will be trivially predictable (they'll occupy some CPU resources in the branch-predictor but otherwise add little overhead), and in some cases the checks might be able to use bounds-check instructions or other hardware traps.

For the subset of programs and libraries that intentionally make use of wraparound, we could provide one of several alternatives:

  1. "comma, ok" forms or "comma, carry" forms (#6815) that ignore overflow panics, analogous to how the "comma, ok" form of a type-assertion ignores the panic from a mismatched type.
  2. Separate "integer mod 2ⁿ" types (requiring explicit conversions from ordinary integer types), perhaps named along the lines of int32wrap or int32mod.
  3. Implicit wrapping only for unsigned types (uint32 and friends), since they're used for bit-manipulation code more often than the signed equivalents.

Those alternatives could also be used to optimize out the overflow checks in inner-loop code when the programmer has already validated the inputs by some other means.


[Edit: added this section in response to comments.]

Concretely, the proposed changes to the spec are:

Integer operators

For two integer values x and y, the integer quotient q = x / y and remainder r = x % y satisfy the following relationships:

[…]

As an exception to this rule, if the dividend x is the most negative value for the int type of x, the quotient q = x / -1 is equal to x (and r = 0).

[…]

The shift operators shift the left operand by the shift count specified by the right operand. They implement arithmetic shifts if the left operand is a signed integer and logical shifts if it is an unsigned integer. The result of a logical shift is truncated to the bit width of the type: a logical shift never results in overflow. Shifts behave as if the left operand is shifted n times by 1 for a shift count of n. As a result, x << 1 is the same as x*2 and x >> 1 is the same as x/2 but truncated towards negative infinity.

[…]

Integer overflow

If the result of any arithmetic operator or conversion to an integer type cannot be represented in the type, a run-time panic occurs.

An expression consisting of arithmetic operators and/or conversions between integer types used in an assignment or initialization of the special form

v, ok = expr
v, ok := expr
var v, ok = expr
var v, ok T1 = expr

yields an additional untyped boolean value. The value of ok is true if the results of all arithmetic operators and conversions could be represented in their respective types. Otherwise it is false and the value of v is computed as follows. No run-time panic occurs in this case.

For unsigned integer values, the operations +, -, *, and << are computed modulo 2ⁿ upon overflow, where n is the bit width of the unsigned integer's type. Loosely speaking, these unsigned integer operations discard high bits upon overflow, and programs may rely on "wrap around".

For signed integers, the operations +, -, *, and << are computed using two's complement arithmetic and truncated to the bit width of the signed integer's type upon overflow. No run-time panic occurs as a result of overflow. A compiler may not optimize code under the assumption that overflow does not occur. For instance, it may not assume that x < x + 1 is always true.

If the dividend x of a quotient or remainder operation is the most negative value for the int type of x, evaluation of x / -1 overflows and its result upon overflow is equal to x. In contrast, evaluation of x % -1 does not overflow and yields a result of 0.

[…]

Conversions between numeric types

For the conversion of non-constant numeric values, the following rules apply:

  1. When converting between integer types, if the value is a signed integer, it is sign extended to implicit infinite precision; otherwise it is zero extended. If the value cannot be represented in the destination type, an overflow occurs; see the section on integer overflow. Upon overflow, the result is truncated to fit in the result type's size. For example, if v := uint16(0x10F0), then w, _ := uint32(int8(v)) results in w == 0xFFFFFFF0.

This proposal is obviously not compatible with Go 1, but I think we should seriously consider it for Go 2.

@dr2chase

Contributor

dr2chase commented Mar 20, 2017

This simplifies bounds check elimination, since guarding-against/proving-impossibility-of overflow in indexing calculations can be tricky.

@randall77

Contributor

randall77 commented Mar 20, 2017

Just a datapoint, the code in https://golang.org/src/runtime/hash64.go takes advantage of overflow in almost every line of source (for the uintptr type).

Maybe we can't do this until Go 2, but we can do experiments to get some data about it now. We could hack panic on overflow into the compiler today. What happens performance-wise with the go1 benchmarks? How many false positives are there? How many true positives are there?

@griesemer

Contributor

griesemer commented Mar 20, 2017

From a programmer's point of view, proposal #19623 is a significant simplification (wrap around and overflow disappear, and the new int type is both simpler in semantics and more powerful in use), while this proposal is a significant complication (wrap around and overflow now panic, and the new int type is both more complex in semantics and more difficult to use).

One of Go's ideas is to reduce the intrinsic complexity of the programming language at hand while at the same time provide mechanisms that are general and powerful and thus enable programmer productivity and increased readability. We need to look at language changes not from a compiler writer's point of view, but from a programmer's productivity point of view. I think this proposal would be a step backward in the philosophy of Go.

I am also not convinced about the claim that overflow checking is "likely much lower than using arbitrary-precision ints": The cost of arbitrary precision ints is there when one actually uses them, otherwise their cost is similar to what needs to be done for bounds/overflow checking (it's the same test, essentially). There's a grey area for ints that use all 64 (or 32) bits as they will become "big ints" internally (at least one bit is usually reserved to implement "tagged ints" efficiently) - but in code that straddles this boundary and if it matters one might be better off using a sized intxx type anyway. Finally, there's a GC cost since the garbage collector will need to do extra work. But all that said, dealing with overflow panic will also require extra code, and that has to be written by each programmer by hand. It is also much harder to verify/read that code.

I'm not in favor of this proposal.

@dr2chase

Contributor

dr2chase commented Mar 20, 2017

I think either proposal is an improvement over the status quo. Programmers who assert the nonexistence of overflow will write the same code they do today, so no cognitive overhead there, and I won't have to worry about silent errors if they're wrong.

@bcmills

Member

bcmills commented Mar 20, 2017

@griesemer

proposal #19623 is a significant simplification […], while this proposal is a significant complication

I believe that the two proposals are compatible (and even complementary). We could make int and uint arbitrary-precision types (to make default behavior simpler), and also make the sized integer types panic on overflow (to make complex behavior safer).

But all that said, dealing with overflow panic will also require extra code, and that has to be written by each programmer by hand. It is also much harder to verify/read that code.

Could you elaborate on this point? I would expect that most code would either not handle the panic, or use a wrapping integer type explicitly. Even the latter option does not seem like a lot of extra code.

@bcmills

Member

bcmills commented Mar 20, 2017

@randall77

Just a datapoint, the code in https://golang.org/src/runtime/hash64.go takes advantage of overflow in almost every line of source (for the uintptr type).

That is a nice data point, and I think it nicely illustrates the three options I propose for intentional wraparound. Consider this snippet:

	h := uint64(seed + s*hashkey[0])
tail:
	switch {
	case s == 0:
	case s < 4:
		h ^= uint64(*(*byte)(p))
		h ^= uint64(*(*byte)(add(p, s>>1))) << 8
		h ^= uint64(*(*byte)(add(p, s-1))) << 16
		h = rotl_31(h*m1) * m2
	case s <= 8:
		h ^= uint64(readUnaligned32(p))
		h ^= uint64(readUnaligned32(add(p, s-4))) << 32
		h = rotl_31(h*m1) * m2
  1. With the "comma, ok" option it becomes unwieldy: there are many lines which combine the ^ with shifting or multiplication, and it is the latter which may overflow (so using the "comma, ok" form requires splitting lines):
	h := uint64(seed + s*hashkey[0])
tail:
	switch {
	case s == 0:
	case s < 4:
		h ^= uint64(*(*byte)(p))
		x, _ := uint64(*(*byte)(add(p, s>>1))) << 8
		h ^= x
		x, _ = uint64(*(*byte)(add(p, s-1))) << 16
		h ^= x
		x, _ = h * m1
		h, _ = rotl_31(x) * m2
	case s <= 8:
		h ^= uint64(readUnaligned32(p))
		x, _ := uint64(readUnaligned32(add(p, s-4))) << 32
		h ^= x
		x, _ = h * m1
		h = rotl_31(x) * m2
  2. With explicitly-wrapping types, only the conversions (which are mostly already present in the code) need to change:
	h := uint64mod(uintptrmod(seed) + uintptrmod(s)*hashkey[0])
tail:
	switch {
	case s == 0:
	case s < 4:
		h ^= uint64mod(*(*byte)(p))
		h ^= uint64mod(*(*byte)(add(p, s>>1))) << 8
		h ^= uint64mod(*(*byte)(add(p, s-1))) << 16
		h = rotl_31(h*m1) * m2
	case s <= 8:
		h ^= uint64mod(readUnaligned32(p))
		h ^= uint64mod(readUnaligned32(add(p, s-4))) << 32
		h = rotl_31(h*m1) * m2
  3. With implicit wrapping only for unsigned types, that function wouldn't change at all, although I think that also demonstrates that the safety advantages of detecting overflow diminish with that approach (since uintptr is a fairly common type to see in code using unsafe).
@griesemer

Contributor

griesemer commented Mar 20, 2017

@bcmills The point of the sized integer types is a) to be able to control actual space consumed when laid out in memory, and b) often enough that they do wrap around. There's tons of code that makes use of that. Almost any left-shift operation would become more complicated if there wasn't silent overflow. Thus, a lot of code would have to deal with overflow.

@bcmills

Member

bcmills commented Mar 20, 2017

The point of the sized integer types is a) to be able to control actual space consumed when laid out in memory, and b) often enough that they do wrap around.

That's part of my point? At the moment, the sized integer types conflate together (a) and (b). I'm proposing that we make them orthogonal, not eliminate (b).

Almost any left-shift operation would become more complicated if there wasn't silent overflow.

Perhaps that's a good argument for making the left-shift operator not panic? (None of the other bitwise operators can overflow, and that would make the shift operator somewhat less redundant with multiplication.)

@bcmills bcmills changed the title from proposal: Go 2: integer overflow should panic by default to proposal: Go 2: fixed-width integer overflow should panic by default Mar 20, 2017

@griesemer

Contributor

griesemer commented Mar 20, 2017

@bcmills Perhaps. My point is that this all adds extra complexity to the language where I am not convinced that we need more. We need less. Most people couldn't care less about overflow and simply want integers that "just work".

@bcmills

Member

bcmills commented Mar 20, 2017

Most people couldn't care less about overflow and simply want integers that "just work".

That's also part of my point? The fact that most people couldn't care less about overflow is what leads to the bugs in the first place. Most people couldn't care less about bounds-checking either, but Go has bounds checks nonetheless.

I agree that 'integers that "just work"' is a desirable goal, and ideally I would like to see this proposal combined with an arbitrary-precision int type. However, it's not obvious to me that that will be sufficient to make a dent in the incidence of overflow bugs in practice.

@gopherbot gopherbot added this to the Proposal milestone Mar 20, 2017

@gopherbot gopherbot added the Proposal label Mar 20, 2017

@bronze1man

bronze1man commented Mar 21, 2017

Is it possible to add this type in Go 1.9?
Adding a type called uint64OverflowPanic may be enough to start experimenting with this.
I think we may need both an overflow-panicking uint64 and a non-panicking uint64 in Go 2.

@bcmills

Member

bcmills commented Mar 21, 2017

@bronze1man
A type called uint64OverflowPanic would be counterproductive. Users who are spelling out verbose type-names are presumably already thinking about overflow, and at that level of verbosity they can just as easily make calls to some library to check the operations.

The point of this proposal is to make detection of overflow the default behavior. That's why it's a language proposal and not just a library.

@bcmills

Member

bcmills commented Mar 21, 2017

Regarding cost: overflow-checking is normally one instruction (a conditional branch based on the ALU's overflow flag), and for addition and subtraction my understanding is that modern Intel hardware will fuse the arithmetic instruction and the branch into one µop to be executed on the branch unit.

I don't see how we could implement arbitrary-precision integers with any fewer than two additional instructions per op. If we encode tags in the sign bit then every operation needs a shift in, shift out, and mask. If we encode tags in the least-significant bit then every operation needs at least a mask and a branch, and it's not obvious to me that the branch can be fused.

@griesemer

Contributor

griesemer commented Mar 21, 2017

@bcmills Regarding arbitrary-precision integers: It's probably 2-3 additional instructions in the general case. But there's usually no masking needed in the common case if the tag bits are at the bottom (see #19623 (comment)).

@bcmills

Member

bcmills commented Mar 21, 2017

How do you test the tag bits without masking them?

I would expect panic-on-overflow to look like:

    add %rax, %rdx
    jo $overflowPanic

with a single overflowPanic defined for the entire program.

Or perhaps, if we need to save the faulting instruction more precisely, something like:

    add %rax, %rdx
    cmovo $0, $overflowPanic

and let the SIGSEGV handler actually produce the panic (the same way we do for dereferencing nil).

If I'm understanding correctly, an arbitrary-precision operation would look like:

    add %rax, %rdx
    test %rax, $0x3
    jnz $addSlow

and it's not at all obvious to me that we could get by with a small number of variants of addSlow without either overconstraining the register allocator or adding even more instructions (perhaps a cmovnz, consuming an additional register?) to tell addSlow which registers (and widths) are involved.

@randall77


Contributor

randall77 commented Mar 21, 2017

@bcmills: on x86 at least, we can use the parity (low bit of op result) condition code.
We only need a single bit of tag. Integers have a low bit of 1, pointers 0. z = x+y translates to:

    SUB   x, $1, a  // remove x's tag bit
    JP    addSlow // x is a pointer
    ADD   a, y, z
    JNP   addSlow // y is a pointer
    JO    addSlow // z overflowed

(Those are 3-operand ADD and SUB, we'd need 2 moves also to do this with 2-operand instructions, I think.)

How to implement addSlow will be a problem. Because we'd have to spill all live registers around any runtime call, we'd need essentially infinite variations. We'd have to generate them as needed and it would probably be a lot of code. We could use faulting instructions, but those are slow and we'd still need a stack + register map for each.

@randall77

randall77 commented Mar 21, 2017

Contributor

Just a dumb experiment - I hacked the compiler to add the following sequence after every int and uint addition:

   TESTQ x, x
   JLT 3(PC)
   TESTQ x, x
   JLT 1(PC)

It doesn't do anything, just adds some cruft to simulate overflow checks.

It makes the go binary 1% larger. That is way undercounting what it would actually cost, as it doesn't include code needed for addSlow.
Here are the go1 benchmark results:

name                     old time/op    new time/op    delta
BinaryTree17-8              2.36s ± 3%     2.40s ± 3%     ~     (p=0.095 n=5+5)
Fannkuch11-8                2.96s ± 0%     3.57s ± 0%  +20.74%  (p=0.008 n=5+5)
FmtFprintfEmpty-8          43.7ns ± 2%    44.2ns ± 1%     ~     (p=0.119 n=5+5)
FmtFprintfString-8         68.0ns ± 0%    68.8ns ± 0%   +1.06%  (p=0.008 n=5+5)
FmtFprintfInt-8            75.7ns ± 0%    79.6ns ± 0%   +5.15%  (p=0.008 n=5+5)
FmtFprintfIntInt-8          118ns ± 1%     122ns ± 0%   +3.38%  (p=0.008 n=5+5)
FmtFprintfPrefixedInt-8     159ns ± 0%     195ns ± 1%  +22.77%  (p=0.016 n=4+5)
FmtFprintfFloat-8           206ns ± 1%     226ns ± 1%   +9.30%  (p=0.008 n=5+5)
FmtManyArgs-8               469ns ± 1%     505ns ± 1%   +7.54%  (p=0.008 n=5+5)
GobDecode-8                6.53ms ± 1%    6.53ms ± 1%     ~     (p=1.000 n=5+5)
GobEncode-8                5.05ms ± 1%    5.07ms ± 0%     ~     (p=0.690 n=5+5)
Gzip-8                      213ms ± 1%     259ms ± 0%  +21.60%  (p=0.008 n=5+5)
Gunzip-8                   37.3ms ± 2%    38.3ms ± 2%   +2.57%  (p=0.032 n=5+5)
HTTPClientServer-8         84.1µs ± 0%    85.3µs ± 3%   +1.44%  (p=0.016 n=4+5)
JSONEncode-8               14.5ms ± 1%    15.6ms ± 0%   +8.12%  (p=0.008 n=5+5)
JSONDecode-8               52.0ms ± 1%    56.7ms ± 0%   +9.11%  (p=0.008 n=5+5)
Mandelbrot200-8            3.81ms ± 1%    3.71ms ± 0%   -2.72%  (p=0.008 n=5+5)
GoParse-8                  2.93ms ± 1%    2.97ms ± 0%   +1.50%  (p=0.008 n=5+5)
RegexpMatchEasy0_32-8      69.9ns ± 2%    70.3ns ± 1%     ~     (p=0.460 n=5+5)
RegexpMatchEasy0_1K-8       223ns ± 1%     229ns ± 1%   +2.69%  (p=0.008 n=5+5)
RegexpMatchEasy1_32-8      66.3ns ± 1%    67.3ns ± 1%   +1.60%  (p=0.008 n=5+5)
RegexpMatchEasy1_1K-8       352ns ± 1%     360ns ± 1%   +2.04%  (p=0.008 n=5+5)
RegexpMatchMedium_32-8      104ns ± 1%     105ns ± 0%     ~     (p=0.167 n=5+5)
RegexpMatchMedium_1K-8     33.6µs ± 1%    34.8µs ± 1%   +3.52%  (p=0.008 n=5+5)
RegexpMatchHard_32-8       1.77µs ± 5%    1.90µs ± 4%   +7.41%  (p=0.032 n=5+5)
RegexpMatchHard_1K-8       54.3µs ± 5%    56.6µs ± 4%     ~     (p=0.310 n=5+5)
Revcomp-8                   433ms ± 1%     595ms ± 3%  +37.52%  (p=0.008 n=5+5)
Template-8                 64.9ms ± 1%    64.4ms ± 2%     ~     (p=0.222 n=5+5)
TimeParse-8                 305ns ± 0%     332ns ± 0%   +8.93%  (p=0.008 n=5+5)
TimeFormat-8                320ns ± 0%     347ns ± 1%   +8.57%  (p=0.008 n=5+5)

A few are hurt quite a bit, but a surprising number don't care so much.

Contributor

randall77 commented Mar 21, 2017

Just a dumb experiment - I hacked the compiler to add the following sequence after every int and uint addition:

   TESTQ x, x
   JLT 3(PC)
   TESTQ x, x
   JLT 1(PC)

It doesn't do anything, just adds some cruft to simulate overflow checks.

It makes the go binary 1% larger. That is way undercounting what it would actually cost, as it doesn't include the code needed for addSlow.
Here are the go1 benchmark results:

name                     old time/op    new time/op    delta
BinaryTree17-8              2.36s ± 3%     2.40s ± 3%     ~     (p=0.095 n=5+5)
Fannkuch11-8                2.96s ± 0%     3.57s ± 0%  +20.74%  (p=0.008 n=5+5)
FmtFprintfEmpty-8          43.7ns ± 2%    44.2ns ± 1%     ~     (p=0.119 n=5+5)
FmtFprintfString-8         68.0ns ± 0%    68.8ns ± 0%   +1.06%  (p=0.008 n=5+5)
FmtFprintfInt-8            75.7ns ± 0%    79.6ns ± 0%   +5.15%  (p=0.008 n=5+5)
FmtFprintfIntInt-8          118ns ± 1%     122ns ± 0%   +3.38%  (p=0.008 n=5+5)
FmtFprintfPrefixedInt-8     159ns ± 0%     195ns ± 1%  +22.77%  (p=0.016 n=4+5)
FmtFprintfFloat-8           206ns ± 1%     226ns ± 1%   +9.30%  (p=0.008 n=5+5)
FmtManyArgs-8               469ns ± 1%     505ns ± 1%   +7.54%  (p=0.008 n=5+5)
GobDecode-8                6.53ms ± 1%    6.53ms ± 1%     ~     (p=1.000 n=5+5)
GobEncode-8                5.05ms ± 1%    5.07ms ± 0%     ~     (p=0.690 n=5+5)
Gzip-8                      213ms ± 1%     259ms ± 0%  +21.60%  (p=0.008 n=5+5)
Gunzip-8                   37.3ms ± 2%    38.3ms ± 2%   +2.57%  (p=0.032 n=5+5)
HTTPClientServer-8         84.1µs ± 0%    85.3µs ± 3%   +1.44%  (p=0.016 n=4+5)
JSONEncode-8               14.5ms ± 1%    15.6ms ± 0%   +8.12%  (p=0.008 n=5+5)
JSONDecode-8               52.0ms ± 1%    56.7ms ± 0%   +9.11%  (p=0.008 n=5+5)
Mandelbrot200-8            3.81ms ± 1%    3.71ms ± 0%   -2.72%  (p=0.008 n=5+5)
GoParse-8                  2.93ms ± 1%    2.97ms ± 0%   +1.50%  (p=0.008 n=5+5)
RegexpMatchEasy0_32-8      69.9ns ± 2%    70.3ns ± 1%     ~     (p=0.460 n=5+5)
RegexpMatchEasy0_1K-8       223ns ± 1%     229ns ± 1%   +2.69%  (p=0.008 n=5+5)
RegexpMatchEasy1_32-8      66.3ns ± 1%    67.3ns ± 1%   +1.60%  (p=0.008 n=5+5)
RegexpMatchEasy1_1K-8       352ns ± 1%     360ns ± 1%   +2.04%  (p=0.008 n=5+5)
RegexpMatchMedium_32-8      104ns ± 1%     105ns ± 0%     ~     (p=0.167 n=5+5)
RegexpMatchMedium_1K-8     33.6µs ± 1%    34.8µs ± 1%   +3.52%  (p=0.008 n=5+5)
RegexpMatchHard_32-8       1.77µs ± 5%    1.90µs ± 4%   +7.41%  (p=0.032 n=5+5)
RegexpMatchHard_1K-8       54.3µs ± 5%    56.6µs ± 4%     ~     (p=0.310 n=5+5)
Revcomp-8                   433ms ± 1%     595ms ± 3%  +37.52%  (p=0.008 n=5+5)
Template-8                 64.9ms ± 1%    64.4ms ± 2%     ~     (p=0.222 n=5+5)
TimeParse-8                 305ns ± 0%     332ns ± 0%   +8.93%  (p=0.008 n=5+5)
TimeFormat-8                320ns ± 0%     347ns ± 1%   +8.57%  (p=0.008 n=5+5)

A few are hurt quite a bit, but a surprising number don't care so much.
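For comparison, the check that a panic-on-overflow compiler might emit around each signed addition can be sketched in user code. This `addChecked` helper is purely illustrative, not part of any proposal:

```go
package main

import "fmt"

// addChecked is a hypothetical sketch of the test a panic-on-overflow
// compiler might emit around each signed addition. Two's-complement
// overflow occurred iff both operands have the same sign and the sum
// has the opposite sign.
func addChecked(x, y int64) int64 {
	sum := x + y
	if (x >= 0) == (y >= 0) && (sum >= 0) != (x >= 0) {
		panic("integer overflow")
	}
	return sum
}

func main() {
	fmt.Println(addChecked(2, 3))
}
```

On amd64 the condition compiles down to roughly the JO-after-ADD pattern discussed above, which is why the per-operation cost is expected to be lower than the four-instruction simulation.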


Member

bcmills commented Mar 21, 2017

That still seems like a lot of extra instructions compared to what is needed for panic-on-overflow.

At any rate, my performance point on this issue is more "the cost of overflow checks is fairly low", with arbitrary-length integers as a reference point for an integer cost that a lot of folks believe to be reasonable.


Contributor

griesemer commented Mar 21, 2017

@randall77 The problem with using only one bit is that you cannot optimistically add two tagged ints. The problem with using a 1 (instead of a 0) as tag bit is that one has to correct for it each time. Having a 1-offset pointer is trivial to correct when accessing through pointer-indirection. Again, using the scheme I have outlined before, addition is (dst on the right):

ADD x, y, z
JO overflow
TEST $3, z
JNZ bigint

If both x and y are tagged ints, they have a 00 tag (least significant 2 bits). The result is already correct. If one or both of them have a 01 tag, the result tags are going to be 01 or 10 - either way its not 00 after masking. In that case we need to run the slow routine. This is 4 instructions per addition in the best case.
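The tag check can be mimicked in Go. This is a sketch only: the names `tag`, `untag`, and `addTagged` are illustrative, and the JO overflow branch of the real sequence is elided here.

```go
package main

import "fmt"

// Small integers carry a 00 tag in their two low bits; bigint pointers
// would carry 01. Tagging a small int is just a shift by two.
func tag(x int64) int64   { return x << 2 }
func untag(t int64) int64 { return t >> 2 }

// addTagged mirrors the TEST $3 / JNZ bigint sequence: if either operand
// was not a tagged small int, the low two bits of the sum are nonzero
// (01, 10) and the slow routine must run.
func addTagged(a, b int64) (sum int64, fast bool) {
	sum = a + b
	if sum&3 != 0 {
		return 0, false // fall back to the bigint routine
	}
	return sum, true
}

func main() {
	s, ok := addTagged(tag(3), tag(4))
	fmt.Println(untag(s), ok)
}
```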


Contributor

randall77 commented Mar 21, 2017

@griesemer , yes, I guess you're trading a bit in the representation for one less instruction.
Also to your benefit, your scheme doesn't use parity. It exists on x86 but not on ARM, for example.


Contributor

cherrymui commented Mar 21, 2017

With panic-on-overflow, the compiler even cannot fold (x + 1) - 1?


bronze1man commented Mar 21, 2017

@randall77
Does it mean that I have to buy 8.12% more CPUs for a useless overflow check in JSONEncode?

As I assume JSONEncode does not have an int overflow bug right now, and JSONEncode/JSONDecode uses 60% of my server's CPUs. 😁

I hope golang can do better than that.

JSONEncode-8 14.5ms ± 1% 15.6ms ± 0% +8.12% (p=0.008 n=5+5)


Contributor

ianlancetaylor commented Mar 21, 2017

Even without panic on overflow we can't fold x < x + 1, which is important because it means we can't determine the number of iterations for loops like for i := j; i < j + 10; i++, which means we can't unroll the loop without run time checks. I don't think the lack of folding opportunities is going to be significant, at least not compared to the run time overhead of overflow checks.
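The non-foldability is easy to demonstrate even today: with wraparound semantics, `j + 10` can wrap, so the loop below runs zero times rather than ten. The `iterations` helper is illustrative.

```go
package main

import (
	"fmt"
	"math"
)

// iterations counts how many times the loop body actually runs.
func iterations(j int64) int {
	count := 0
	// When j is within 10 of math.MaxInt64, j+10 wraps to a large
	// negative value, so i < j+10 is false immediately.
	for i := j; i < j+10; i++ {
		count++
	}
	return count
}

func main() {
	fmt.Println(iterations(0))                  // the "obvious" 10 iterations
	fmt.Println(iterations(math.MaxInt64 - 5))  // 0: the bound wrapped
}
```

A compiler that assumed `j < j+10` always holds would unroll this into ten iterations and miscompile the second call, which is why the run-time check is needed with or without panic-on-overflow.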


Contributor

ianlancetaylor commented Mar 21, 2017

@bronze1man Do you have reason to think that the cost of JSON encoding is dominated by integer arithmetic? On a modern CPU many algorithms are dominated by the time it takes to load values from memory, and that should not change under this proposal.

bronze1man commented Mar 21, 2017

@ianlancetaylor
Sorry for those careless comments.

I was alarmed by @randall77's test result. But I know that a simulation is not a release version; the cost of overflow checks may be reduced by compiler optimizations. Maybe I'm just being too subjective.

Do you have reason to think that the cost of JSON encoding is dominated by integer arithmetic?

The reason is @randall77's test result. It looked terrible to me when I saw it for the first time.

Thanks.

On a modern CPU many algorithms are dominated by the time it takes to load values from memory

That is interesting information.
From my own test results, I found that allocating memory on the heap is the main CPU cost: fewer allocations, less CPU usage.
But that may not be related to overflow checks.



Member

bcmills commented Mar 21, 2017

@cherrymui

With panic-on-overflow, the compiler even cannot fold (x + 1) - 1?

Sure it can: it just needs to prove by some other means that x+1 doesn't overflow.

Note that many of the "traditional" compiler constant-folding optimizations are for loop bounds, which would be mostly unaffected by this proposal: Go loops usually use range, and the compiler already knows that the range code it generates cannot overflow.


Member

bcmills commented Mar 21, 2017

@bronze1man
Note that the benchmark @randall77 ran more closely resembles the code for arbitrary-length integers rather than panic-on-overflow. I expect the latter to be significantly less expensive due to its amenability to macro-op fusion, though of course that would also need to be measured.

Also note that microbenchmarks are not macrobenchmarks. Large regressions in a single library may represent only small regressions in an overall binary, depending on what fraction of that binary is spent in the library. And individual libraries can be optimized by hoisting out the overflow checks and using unchecked variants of the types.


Contributor

randall77 commented Mar 21, 2017

Yes, the test I did was more of a test for bigint than panicking overflow. The overhead for panicking would be smaller (~1/4 as much?).
Also keep in mind that the go1 benchmark set is a pretty bad benchmark set - lots of the benchmarks in there have very small loops that account for 90+% of their runtime. That's not a pattern we typically see in larger programs.


nnemkin commented Mar 22, 2017

Wrap-around arithmetic is necessary for crypto and other algorithms, but panic on overflow is very useful for security and correctness (i.e. not continuing silently with invalid data).

The balance can be achieved if only signed arithmetic panics on overflow (option 3 in the proposal). App logic dealing with quantities should use the default type int or appropriately large type for the task (int64, int128). Binary math algorithms can happily continue to use uint, uint64, uintptr etc.

The cost of signed overflow checking is minor. It is also a lot easier to optimize statically and locally, compared to arbitrary precision math proposal. There's no fallback path and overflow branches can be inserted late, all targeting the same panic call (one per function, to distinguish them).
Single JO after every ADD + a dozen dead bytes per function should give a pretty realistic performance impact estimate.

FWIW both GCC and Clang have -ftrapv, and Clang also has -fsanitize=signed-integer-overflow.

bunsim commented Mar 25, 2017

Signed overflow panicking is even somewhat expected behavior for somebody coming from languages like C where it is literally undefined. I also don't expect any real code to rely on signed integers overflowing; it is almost always a bug.

Unsigned integers (at least ones like uint32; not sure about uint) should definitely not panic on overflow, as doing bit operations on them assuming wraparound is very common, and the way they wrap is much easier to understand.
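One concrete case of the "bit operations assuming wraparound" mentioned above: a 32-bit FNV-1a hash, where the multiply is expected to wrap on every step. A minimal sketch:

```go
package main

import "fmt"

// fnv1a32 deliberately relies on uint32 multiplication wrapping mod 2^32;
// the "overflowed" bits are exactly what get mixed into the hash.
func fnv1a32(data []byte) uint32 {
	h := uint32(2166136261) // FNV offset basis
	for _, b := range data {
		h ^= uint32(b)
		h *= 16777619 // FNV prime; wraps around by design
	}
	return h
}

func main() {
	fmt.Printf("%#x\n", fnv1a32([]byte("a")))
}
```

Under this proposal such code would need either an unchecked unsigned type or explicit wrapping operations, which is the trade-off being discussed.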


Contributor

cznic commented Mar 25, 2017

I also don't expect any real code to rely on signed integers overflowing; it is almost always a bug.

Quite possibly, but not necessarily. Last time I checked, signed and unsigned add, for example, are the very same CPU instruction on at least some architectures - if the sizes of the operands match, which in Go they always do. With C integer promotion rules it's a different story. (e.g. unsigned short promoted to signed int is a nice trap for some operations, like -x.)


Member

bcmills commented Mar 25, 2017

@cznic Well-defined behavior (from the perspective of the language or the platform) is not necessarily intentional behavior (from the perspective of the programmer).


Member

bcmills commented Mar 25, 2017

@griesemer

We need to look at language changes not from a compiler writer's point of view, but from a programmer's productivity point of view.

I want to come back to this point: I agree with it and I agree that it's important. In fact, it is essentially the whole motivation behind this proposal.

Implicit wraparound favors the compiler-writer over the programmer. Let me explain.


The things that implicit wraparound makes easier are:

Implementation of hash functions.

Hash functions ignore overflow because they're preserving the overflowed information in the other bits of the computed hash.

We've got lots of hash functions in the standard library (in hash/ and crypto/), but how many have you seen in user code? Even when programmers implement their own hash-based data structures (currently very difficult due to #15292), they should generally be using an existing "good" hash function rather than trying to "roll their own". Fast, low-collision hash functions are notoriously difficult to get right.

So "writing hash functions" is usually the compiler-writer's task, not the programmer's task.

Implementation of multi-word integer types

Multi-word integer implementations (such as big.Int) ignore overflow because they're preserving the overflowed bits in the next word.

I believe that you yourself are the author of much of math/big. So implicit overflow makes your job a bit easier — but how many application programmers do you think are writing that sort of code, especially given the existence of math/big?

Compiler code generation

Most CPUs running Go code today implement two's-complement integer arithmetic and don't trap on overflow. So if you define the language to implement two's-complement arithmetic that ignores overflow, it's easy to generate efficient code in the compiler: you don't have to generate or optimize the code to check for overflows, and you don't have to worry about writing optimizations to reduce the need for overflow checks.


The things that implicit wraparound makes harder are:

Programs that count things reliably

If you're writing programs that deal with 32-bit timestamps, you want to avoid Y2038 bugs, such as silently adding a future calendar entry at a point in the distant past. If you're writing programs that display view counts, you want to avoid treating popular items as negatively popular. If you're writing programs that tally votes or compute financial statements, you'd really rather detect when you've hit the limits of your program rather than, say, accidentally treat a $3B expenditure as a credit. If you're serving user data, you'd really rather detect invalid document IDs than accidentally serve unrelated documents with the same ID mod 2^32.

Programs that marshal and unmarshal large data structures

There have been multiple protobuf bugs due to overflows in the encode/decode paths. For example, see google/protobuf#1731 and google/protobuf#157 in the public repo — the latter being a potential DoS attack. Even in the cases in which Go's slice bounds checks protect against that sort of bug, it's not good for Go marshalers to generate payloads that might potentially corrupt or crash downstream clients or servers.


I don't know about you, but to me, "counting things" and "marshaling data" seem much more representative of "programmer" use-cases. So I think you've got it backwards: the status quo favors compiler writers, and detecting overflow by default would push the balance more toward general programmers.
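The "counting things" failure mode is trivially reproducible today: incrementing a saturated counter silently goes negative instead of failing. The `increment` helper below is purely illustrative.

```go
package main

import (
	"fmt"
	"math"
)

// increment silently wraps to math.MinInt32 when v is already MaxInt32 -
// a "popular item becomes negatively popular" bug waiting to happen.
func increment(v int32) int32 {
	return v + 1
}

func main() {
	views := increment(math.MaxInt32)
	fmt.Println(views) // a large negative view count, with no error anywhere
}
```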


Contributor

ianlancetaylor commented Mar 27, 2017

I note that for every case in Go that currently causes a run time panic there is a straightforward way to avoid the panic: write p != nil or i < len(s) or divisor != 0 and so forth. What is the straightforward way to avoid arithmetic overflow in the type int, for the six operations that can overflow (I'm not counting ++ or --)?

Separate question: should we treat x >> y as an overflow if the value of y is greater than the number of bits in the type of x?
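For reference on the shift question: Go already defines over-wide shifts rather than leaving them undefined - a count greater than or equal to the operand width shifts out all bits, with signed right shifts sign-filling. A quick demonstration:

```go
package main

import "fmt"

func main() {
	var x uint32 = 1
	fmt.Println(x << 40) // 0: every bit shifted out

	var y int32 = -8
	fmt.Println(y >> 40) // -1: arithmetic shift keeps the sign bit
}
```

So treating an over-wide shift as overflow would change currently well-defined behavior, unlike the add/sub/mul cases.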


Member

bcmills commented Apr 14, 2017

See also #13876.


Member

bcmills commented Jun 13, 2017

Another example: #20235

@rsc rsc changed the title from proposal: Go 2: fixed-width integer overflow should panic by default to proposal: spec: change all int types to panic on wraparound, overflow Jun 16, 2017

Contributor

nathany commented Jun 19, 2017

When I was researching type conversions and overflows for my Go book, the case I came up with was a rocket exploding due to an unhandled software exception (panic):

In 1996, the unmanned Ariane 5 rocket veered off its flight path, broke up, and exploded just 40 seconds after launch. The reported cause was a type conversion error from a float64 to an int16 with a value that exceeded 32,767 - the maximum value an int16 can hold. The unhandled failure left the flight control system without orientation data, causing it to veer off course, break apart, and ultimately self-destruct.
...
The Ada language used for the Ariane 5 behaves differently. The type conversion from float64 to int16 with an out-of-range value caused a software exception. According to the report, this particular calculation was only meaningful prior to lift-off, so Go’s approach may have been better in this instance, but usually incorrect data is best to avoid.

- From Lesson 10, Get Programming with Go

So I'm not sure if I like the idea of panic by default. 🤔

My take is:

  • Arbitrary precision ints (#19623) are simple to work with -- even for new programmers -- and they interoperate nicely with arbitrary precision constants.
  • For int8, etc.:
    • It is sometimes desirable to allow them to wrap around
    • There are exceptions to the exceptions, such as shifts << >> or intentionally truncating an int.
    • Exposing the overflow (#6815) of the previous operation could be useful, though verbose.
    • If there still is a natural "word" type (intn? present day int), a Min/MaxNaturalInt variable provided by the runtime might help in some situations, though "did the last operation overflow" seems like a more useful check.
Member

bcmills commented Jun 20, 2017

@nathany

The Ariane 5 explosion is a nice example, but note that it involved many layers of failures:

  1. The value overflowed.
  2. The overflow resulted in an exception.
  3. The exception was not handled.
  4. The unhandled exception in a non-flight-critical task terminated flight-critical tasks on the computer.
  5. …and the major part you're leaving out:
    The overflow did not occur during testing.

The Ariane 501 Failure Report has a lot of interesting analysis, but I want to call out the second recommendation in particular:

R2 […] [T]est […] as much real equipment as technically feasible, inject realistic input data, and perform complete, closed-loop, system testing. […]

With silent overflow, it is easy to miss overflow errors even with fairly comprehensive tests: comparisons of overflowed values may work out anyway during tests because they happen to wrap around to values that satisfy some of the same invariants. (For example, the overflow in #20687 was not caught in previous testing because the inputs happen to overflow to nonnegative values.)

On the other hand, a panic is hard to miss. It can be accidentally or intentionally recovered (e.g. by the fmt or http package), but in most cases a test that triggers a panic will fail, visibly and with an easy-to-debug stack trace. With visible test failures, overflow bugs are more likely to be caught before reaching production.

Member

bcmills commented Jun 20, 2017

@nathany

The Ariane 5 explosion is a nice example, but note that it involved many layers of failures:

  1. The value overflowed.
  2. The overflow resulted in an exception.
  3. The exception was not handled.
  4. The unhandled exception in a non-flight-critical task terminated flight-critical tasks on the computer.
  5. …and the major part you're leaving out:
    The overflow did not occur during testing.

The Ariane 501 Failure Report has a lot of interesting analysis, but I want to call out the second recommendation in particular:

R2 […] [T]est […] as much real equipment as technically feasible, inject realistic input data, and perform complete, closed-loop, system testing. […]

With silent overflow, it is easy to miss overflow errors even with fairly comprehensive tests: comparisons of overflowed values may work out anyway during tests because they happen to wrap around to values that satisfy some of the same invariants. (For example, the overflow in #20687 was not caught in previous testing because the inputs happen to overflow to nonnegative values.)

On the other hand, a panic is hard to miss. It can be accidentally or intentionally recovered (e.g. by the fmt or http package), but in most cases a test that triggers a panic will fail, visibly and with an easy-to-debug stack trace. With visible test failures, overflow bugs are more likely to be caught before reaching production.


Contributor

nathany commented Jun 22, 2017

@bcmills I don't disagree.

The Ariane 5 explosion is an example from long ago, in a different language and context. It had as much to do with moving old code from one rocket to the next without adequate review or testing as with Ada's exceptions. Also, if I recall, the exception recovery code was left out in the initial version due to constraints (performance or some other overhead).

Of the current language constructs, panic/recover seems like the cleanest way to catch overflows in a given block of code -- vs. if statements after each operation. Possibly with more overhead -- at least in the case of an overflow.

There is still the need to allow wrap-around behaviour when desired. Maybe the cleanest way to do that would be to actually have different types? A checked int8 and an unchecked int8 -- two types with different behaviour, that can be converted to each other?

That way the expressions in the code don't need any extra syntax.

(I see you have this option as bullet point 2 in your original post -- bad on me for skimming too quickly)

I am curious how other (modern) programming languages like Swift approach this.


Member

bcmills commented Jun 22, 2017

I am curious how other (modern) programming languages like Swift approach this.

An arbitrary sampling:

Swift traps overflows by default and provides a set of parallel overflow operators (for example, &+ means "+ ignoring overflow").

Rust traps overflows in debug mode, and provides explicit wrapping, saturating, checked, and overflowing variants. The overflowing variant is delightfully similar to what I'm proposing here. The Rust folks are keeping a handy list of overflow bugs they've found since adding the checks in debug mode.

C# has a checked keyword that enables overflow checks; overflows are silent by default.

Clojure throws an ArithmeticException by default; it can be disabled by setting *unchecked-math* or using explicit functions like unchecked-add.

The Kotlin reference doesn't talk about overflow at all. In this thread the suggestion to check overflow appears to be rejected on efficiency grounds.

D overflows silently, but provides a checkedint library that sets a boolean on overflow.

And of course Java "[does] not indicate overflow or underflow in any way", but Java 8 added Math.addExact and related functions, and the SEI CERT coding standard requires detection or prevention of overflow except for bitwise operations and "carefully documented" benign usage (such as hashing).

Contributor

nathany commented Jun 22, 2017

  1. Implicit wrapping only for unsigned types (uint32 and friends), since they're used for bit-manipulation code more often than the signed equivalents.

If proceeding down the path of checked integer types, I think it would be valuable to have checked signed and checked unsigned types. If a uint64 is needed to interact with a database or whatnot, it shouldn't be necessary to throw out the safety measures.

My current questions to ponder/research:

  • Are unchecked signed types useful, or are unsigned integers enough for those use cases?
  • Would it make sense to limit bitwise operations to the unchecked types? Not just math/bits, but shifts as well << >> and maybe even & |?
  • Even if this is a Go 2 feature, would it ease the transition to introduce new checked types, and emphasize their use in documentation, instead of changing the default behaviour?
  • What are some downsides to introducing more integer types -- vs. say an (un)checked keyword?

Member

bcmills commented Jun 22, 2017

  • Are unchecked signed types useful, or are unsigned integers enough for those use cases?
  • What are some downsides to introducing more integer types — vs. say an (un)checked keyword?

Adding more distinct types would not only expand the language spec, but also add substantial surface area to packages like reflect that need to enumerate types.

Per #19624 (comment), I believe that separate "unchecked" types are not worth the complexity: a _, ok assignment form that suppresses overflow panics turns out to be fairly readable even in the implementation of a hash function.

A checked or unchecked keyword would also avoid the type-bloat, but the _, ok form seems to harmonize better with the rest of the language: we already have the same construct for type-assertions, map lookups, and channel receives.

  • Would it make sense to limit bitwise operations to the unchecked types? Not just math/bits, but shifts as well << >> and maybe even & |?

The bitwise operators &, |, and >> by definition cannot overflow. In the concrete proposal (which can be found at the bottom of #19624 (comment)), I recommend that << should not overflow when it represents a "logical" shift (that is, when the shift operand is an unsigned quantity).

  • Even if this is a Go 2 feature, would it ease the transition to introduce new checked types, and emphasize [their] use in documentation, instead of changing the default behaviour?

Given that new types could not be retrofitted into the Go 1 standard library, I think the benefit would be marginal. However, the proposed _, ok assignment form is compatible with Go 1.


Contributor

nathany commented Jun 22, 2017

I believe that separate "unchecked" types are not worth the complexity

One difference with the extra types is that some functions may expose those unchecked types as parameters, which would self-document the behaviour.

Though I'm not sure if that's worth the extra code bloat of converting types.

I recommend that << should not overflow when it represents a "logical" shift (that is, when the shift operand is an unsigned quantity)

I saw that, but I initially found it a little odd to explain. Why can x * 2 cause a panic, but x << 1 is special?

That's why the idea popped into my head of separating the panicky checked types for ordinary math from the wrappy bit-oriented types. As in, no << operation on checked types.

Maybe not a good idea (<< is used surprisingly often), but maybe easier to explain.

If there are no separate types, then that idea dies with them. Explaining << as being intended to drop high bits -- and therefore not really "overflowing" -- is reasonable, I suppose.

the _, ok form seems to harmonize better with the rest of the language

Very true. It's a syntax that takes some getting used to for new gophers, but the commonality with type assertion panics vs. okay does make sense.

Thanks for taking the time to answer my questions. I'd suggest revising the original post with some strikethrough if the _, ok approach is what this proposal has concluded with. Cheers.

P.S. I'm still holding out for arbitrary precision ints (#19623) either way. I think the combination could strengthen Go's integer handling quite a lot, though neither is without some overhead.


Contributor

josharian commented Jun 22, 2017

Whatever else happens, this has been an interesting thread; thanks, @bcmills.

An off-the-cuff suggestion in the interests of exploring more of the design space: What if instead of panicking, we marked overflowed ints as tainted. Use of a tainted int in some settings (e.g. a slice index) would panic. And expose some way to check the taintedness of an int and some way to clear it. One obvious analogy here is with floats and NaNs.


Member

bcmills commented Jun 22, 2017

What if instead of panicking, we marked overflowed ints as tainted.

An interesting idea.

One implementation approach would be to steal a "tag bit" to indicate overflow, but that has the downside of requiring all the fixed-width integer types to be n-1 bits (int31 instead of int32). My experience with languages in the ML family is that that's really annoying: pretty much the whole point of fixed-width integer types is to interoperate with other languages and to make wire-encodings easy to work with, and those overwhelmingly tend to use power-of-two integer sizes.

Another implementation approach would be to use the arbitrary-length-integer trick of using the tag bit to indicate a pointer (or an offset in a table somewhere) that points to the real value, which would still be power-of-2-sized. My intuition is that that's likely to be strictly more overhead than trapping on overflow, but perhaps not fatal. But it would imply that tainted Go ints could not be used directly in cgo or kernel structs, which in turn implies that "Go integer" and "C integer" probably need to actually be separate types in the type system.

So, maybe not terrible in terms of performance, but a bit awkward.

My bigger concern is that it would make overflow checking substantially weaker: we would only detect overflows that result in a validity check somewhere, which means we could still end up serializing corrupted values (e.g. incorrect length stamps in framing messages) and triggering arbitrary bugs in the (non-Go) programs on the receiving end.

Contributor

nathany commented Jun 22, 2017

@josharian If you were debugging some code, wouldn't you prefer the stack trace point to where the overflow occurred rather than later on?


If we go with the idea that everyone should be sanitizing input and doing pre-checks to avoid overflows, then causing panics should light the fire to do it.

Will people? Well, how often do we see:

if b != 0 {
    a /= b
}

I'd guess not nearly as often as one might hope.

The _, ok syntax would provide one more tool -- a way to do the math without panicking -- and query if anything went wrong after-the-fact. That may be cleaner and easier to express for complicated equations.

A type assertion in the form a := b.(T) suggests "I know this will succeed". The same logic should apply to any mathematical operation. If it might fail, it would be preferable to use the v, ok syntax or guard with pre-checks than allow a panic.

The more I think about it, the more merit I see in this proposal, but not as an alternative to arbitrary precision integers, which remain simpler to work with and interoperate nicely with constants.

The combination of the two proposals would mean that any arbitrary precision int-only expression would never panic and always be ok for overflows -- but if this proposal was extended to cover divide by zero, _, ok could still be used there.

That feels somewhat expected to me. If using _, ok, why would I still get a panic sometimes?

The caveat is divide by zero being unintentionally swallowed in code that wants wrap around, resulting in the need to split out division:

a, _ += 2      // allow wrap, ignore overflows and don't panic
a /= b         // may panic
a, ok = a / b  // may not be ok

Member

bcmills commented Jun 22, 2017

if this proposal was extended to cover divide by zero, _, ok could still be used there.

That's an interesting suggestion. For divide-by-zero, what would the other value be when ok is false? For overflows there is an obvious choice on current hardware (two's-complement truncation), but for division that's much less clear to me.


Contributor

nathany commented Jun 22, 2017

Maybe just 0 -- the zero value for the type as with type assertions?
https://play.golang.org/p/CBYt8ISMAu

(I'm not thinking of implementation details though)

Member

bcmills commented Aug 16, 2017

The proposed , ok extension here would make the fix for #21481 much clearer. Instead of:

if c := cap(b.buf); c > maxInt-c-n {
  panic(ErrTooLarge)
}
newBuf = makeSlice(2*cap(b.buf) + n)

we could write

s, ok := 2*cap(b.buf) + n
if !ok {
  panic(ErrTooLarge)
}
newBuf = makeSlice(s)

Contributor

nathany commented Nov 17, 2017

A nice property of the s, ok syntax is that it might not be considered a breaking change in Go 1.x because it's opt-in. But I don't like the idea of tag bits and int63s and such.

From a blog post on Rust:

These should cover all bases of “don’t want overflow to panic in some modes”:
wrapping_* returns the straight two’s complement result,
saturating_* returns the largest/smallest value (as appropriate) of the type when overflow occurs,
overflowing_* returns the two’s complement result along with a boolean indicating if overflow occurred, and
checked_* returns an Option that’s None when overflow occurs.
All of these can be implemented in terms of overflowing_*, but the standard library is trying to make it easy for programmers to do the right thing in the most common cases.
http://huonw.github.io/blog/2016/04/myths-and-legends-about-integer-overflow-in-rust/

The s, ok syntax is effectively the overflowing option, allowing any of the other options to be implemented.

s, ok := 2*cap(b.buf) + n

Though it isn't necessarily the best default option. Panic may be better, just like divide by zero.

If Go were to trap the overflow, does that essentially mean a panic, which could optionally be recovered from? In the case of #21481, would you recover just to panic with bytes.ErrTooLarge instead?

Could an overflow panic be implemented in Go 1.x behind a compiler/build flag (opt-in)? Potentially switching the default behaviour in Go 2.0.

The following may be a pipe dream for Go:

var m int64 = math.MaxInt64

a := m + 1 // panic: runtime error: ... overflows int64
b, ok := m + 1 
if !ok {
    // saturate, panic, return an error, other custom behaviour
}
c, _ := m + 1 // wrapping (current behaviour)

And perhaps similar for divide by zero, more for consistency than anything else:

var z int64 = 0

a := 1 / z // panic: runtime error: integer divide by zero (current behaviour)
b, ok := 1 / z
if !ok {
    // custom behavior
}
c, _ := 1 / z // zero?

(but perhaps not)

While at the same time having arbitrary precision ints to avoid overflows in the common case?


nathany (Contributor) commented Nov 18, 2017

I'm wondering whether this comma, ok syntax should apply to divide by zero or not.

In more complex expressions, would it be better if ok is false for either overflow or divide by zero? Or is it better to only use this construct for overflow, and just panic for division by zero as Go currently does?

var m int64 = math.MaxInt64
var z int64 = 0

b, ok := (m + 1) / z
if !ok {
    // custom behavior
}

It's also not clear to me how feasible it is to support comma, ok on any given mathematical expression (including function calls?) with a standard int64 (no tainted bits and so on that would break compatibility with other languages).


bcmills (Member) commented Nov 18, 2017

> I'm wondering about whether this comma, ok syntax should apply to divide by zero or not.

It's not obvious to me either (see the comments starting at https://golang.org/issue/19624#issuecomment-310427675).

> It's also not clear to me how feasible it is to support comma, ok on any given mathematical expression

My current draft proposal is that it should only apply to arithmetic expressions, not map lookups (for which , ok would become ambiguous) or function calls (which can have side effects and could thus unintentionally allow an overflowed value to escape).

But I'm not strongly tied to that point: if you have interesting examples or use-cases either way, I'd be happy to see them.


bcmills (Member) commented Aug 11, 2018

@nathany, here's an interesting analysis of the x/0 = 0 option (for when the overflow is ignored). Seems like it's actually not too bad.
https://www.hillelwayne.com/post/divide-by-zero/
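(A minimal sketch of the x/0 = 0 convention the linked post analyses, written as a helper in current Go; `div0` is a hypothetical name, not a proposed API:)

```go
package main

import "fmt"

// div0 implements the x/0 = 0 convention: division by zero yields
// zero instead of panicking, as in the analysis linked above.
func div0(x, y int64) int64 {
	if y == 0 {
		return 0
	}
	return x / y
}

func main() {
	fmt.Println(div0(7, 2)) // 3
	fmt.Println(div0(7, 0)) // 0
}
```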

