proposal: Go 2: checked integer types #30613

Open
ianlancetaylor opened this issue Mar 6, 2019 · 32 comments


@ianlancetaylor
Contributor

commented Mar 6, 2019

As a simpler alternative to #30209, I propose that we add a new set of integer types that panic on overflow.

First a quick summary of some other languages, to the best of my knowledge:

  • C and C++: integer overflow is undefined.
  • Java and D: integer overflow wraps. This is also what Go does today.
  • Rust, Clojure, and Ada: integer overflow throws an exception.
  • Swift: integer overflow throws an exception but additional operators wrap: &+, &-, &*.
  • Haskell: integer overflow is implementation defined.
  • C#: integer overflow may be checked or wrapped by using the checked and unchecked keywords.
  • Python and Ruby: integers have unbounded size and cannot overflow.

Issues #19624 and #30209 argue that in Go integer overflow should panic. But today integer overflow wraps, and we cannot break existing code.

I propose that we add a new set of checked integer types. These will use a prefix o, for overflow. The types will be oint, oint8, oint16, oint32, oint64, ouint, ouint8, ouint16, ouint32, ouint64, ouintptr. We can also consider adding the type aliases obyte (= ouint8) and orune (= ouint32) although I'm not sure they are very important.

These new types act exactly like the corresponding types without the o prefix, except that if an addition, subtraction, multiplication, or division operation overflows, the result is a run-time panic.

Issue #30209 suggests adding a comma-ok form to detect integer overflow in an expression, as is also suggested in #6815. The main advantage of comma-ok for simple expressions is for a simpler implementation of multi-precision arithmetic, but for that use we have instead chosen to provide functions in the math/bits package. The main advantage of comma-ok for complex expressions is to permit switching between checked and wrapping arithmetic, but we already have wrapping arithmetic in the int type, and there is no realistic prospect of removing that from the language. So I do not think we need a comma-ok form. If there are other uses of comma-ok, this proposal still supports them, awkwardly, via a deferred function that calls recover.

There will as usual be no automatic conversion between existing types and the new types. This requirement of Go is the only way that this approach can work.

@beoran


commented Mar 6, 2019

While this is a simple solution it doesn't seem to be the best solution to me. With my range types proposal, #30428, and use of type aliases, this simply becomes:

type OInt8 = range int8[math.MinInt8:math.MaxInt8]
type OInt16 = range int16[math.MinInt16:math.MaxInt16]
type OInt32 = range int32[math.MinInt32:math.MaxInt32]
type OInt64 = range int64[math.MinInt64:math.MaxInt64]
type OUInt8 = range uint8[0:math.MaxUint8]
type OUInt16 = range uint16[0:math.MaxUint16]
type OUInt32 = range uint32[0:math.MaxUint32]
type OUInt64 = range uint64[0:math.MaxUint64]

As per my proposal, panics are emitted on overflow, but the , ok form will also be supported. I think this issue goes to show that controlling integer overflow is indeed useful in several people's minds. But I believe it's better to build a generally useful language feature than a specific special-case one. Therefore I think my proposal is the best approach: implementing range types gives us general-purpose range-checked integers.

@josharian

Contributor

commented Mar 6, 2019

The “o” prefix is a bit odd, in that these are types that don’t overflow. I don’t have a better suggestion, though. “c” for checked, perhaps, although cint looks a lot like C.int.

@ianlancetaylor

Contributor Author

commented Mar 7, 2019

Yes, I considered a c prefix but I think it could be confusing. But I'm not wedded to o.

@extemporalgenome

Contributor

commented Mar 7, 2019

uint32p ? (the 'p' is for "panic"). I presume that int8(oint16(x)) may wrap around into the final int8?

And should a Go 2 derivative of this concept permit byte(256) to wrap around at compile time (not get compilation errors) while obyte(256) would have the present compile-time behavior? byte(obyte(256)) could be the way to get a compile-time evaluated byte that is "temporarily checked."

@josharian

Contributor

commented Mar 7, 2019

Maybe I’ve been hanging out on the wrong side of the C tracks, but I read int32p as “pointer to some int32”.

And if you put it as a prefix...well, seems like you’d need cup and quart as well.

“b” for bounded?

@beoran


commented Mar 7, 2019

@extemporalgenome
I updated the spec of my proposal to add another check on overflow, during calculation based on the underlying type. The idea is that ranged types would not allow any overflow at all, not even during calculations based on the underlying type, but for performance the bounds check only happens just before the final assignment.

Your example int8(OInt16(x)) will panic if x is out of range for OInt16. If x is in range, the conversion is performed on the underlying types, so it becomes identical to int8(int16(x)), which wraps.
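For reference, this is how the narrowing conversion behaves in Go today. The helper name narrow is only for illustration.

```go
package main

import "fmt"

// narrow converts an int16 to int8; the conversion keeps only the low 8 bits.
func narrow(x int16) int8 { return int8(x) }

func main() {
	fmt.Println(narrow(100)) // prints 100: fits in int8
	fmt.Println(narrow(200)) // prints -56: 200 - 256
}
```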

@alanfo


commented Mar 8, 2019

I certainly prefer this proposal to #30209 which I felt was over-complicated.

My only reservation is whether this is a better way of solving the overflow problem than the wildly popular #19623 which (at the cost of some efficiency) gets rid of it altogether and also deals with requests for larger integer types such as int128. The only thing I didn't like about that particular proposal was changing the implementation-specific types int and uint to be arbitrary precision. Unless there's a plan for Go to stop supporting 32-bit platforms in the near future, I'd prefer a new signed arbitrary precision type (zint say) instead.

Anyway, to return to this proposal, I agree that the o prefix seems inappropriate and c would be too confusing. Whilst b confers the correct sense the word bint has unfortunate connotations (see here). How about l for limited instead?

@bradfitz

Member

commented Mar 12, 2019

Rust, Clojure, and Ada: integer overflow throws an exception.

Kinda, but not really for Rust: https://doc.rust-lang.org/book/ch03-02-data-types.html#integer-overflow

When compiling in debug mode, Rust checks for this kind of issue and will cause your program to panic, which is the term Rust uses when a program exits with an error. We’ll discuss panics more in Chapter 9.

In release builds, Rust does not check for overflow, and instead will do something called “two’s complement wrapping.” In short, 256 becomes 0, 257 becomes 1, etc. Relying on overflow is considered an error, even if this behavior happens. If you want this behavior explicitly, the standard library has a type, Wrapping, that provides it explicitly.
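The wrapping described in the quoted passage is the same thing Go's existing fixed-size types do at run time. A small illustration with uint8 (the helper name add is arbitrary):

```go
package main

import "fmt"

// add uses ordinary Go uint8 arithmetic, which wraps modulo 256.
func add(x, y uint8) uint8 { return x + y }

func main() {
	fmt.Println(add(255, 1)) // prints 0: 256 becomes 0
	fmt.Println(add(255, 2)) // prints 1: 257 becomes 1
}
```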

@networkimprov


commented Mar 12, 2019

@extemporalgenome suggested a suffix, which is logical, since bounding pertains to the bit count

intb
int32b
uint32b

@ianlancetaylor

Contributor Author

commented Mar 12, 2019

A suffix makes a certain amount of sense but I think it would be too easy to miss when reading the code.

@networkimprov


commented Mar 12, 2019

Human perception finds words, not character strings, so a prefix letter is easily confused for identifiers with that letter sequence.

Certain letters or an underscore would be hard to overlook

intq
int32q
uint32q

int_b
int32_b
uint32_b

@beoran


commented Mar 14, 2019

Rust has the idea that debug code will be checked and production code will not. While this may seem nice for performance, it has the downside that your debug version and production version now behave differently. That's unacceptable for safety and/or security critical software. The debug version and the production version should be semantically the same.

I think overflow should always cause a recoverable panic, and if the , ok form is too hard to implement, it may indeed be left out at first.

I also maintain that my ranged integer proposal is the far more generally useful approach to solving this problem. With it, naming of the ranged, non-overflowing types can also be left to the programmer. Also, I updated my proposal to use the minimal and maximal values of the underlying type by default so it becomes even easier to write bounds checked range types:

// Let us put the types in a package
package bounded

type Int8 = range int8[:]
type Int16 = range int16[:]
type Int32 = range int32[:]
type Int64 = range int64[:]
type UInt8 = range uint8[:]
type UInt16 = range uint16[:]
type UInt32 = range uint32[:]
type UInt64 = range uint64[:]

// Or just use the range type as is. 
var j bounded.Int8 = range int8[:](1 << 6) // OK: 64 fits in int8
var i range int8[:] = bounded.Int8(1 << 8) // panic
const MaskBit range int8[:] = range int8[:](1 << 8) // Compile error

@JavierZunzunegui


commented Apr 4, 2019

When compiling in debug mode, Rust checks for this kind of issue and will cause your program to panic, [...]

Interesting highlight in #30613 (comment). Could Go introduce a build tag to make overflows panic? It might be useful to set in testing, canaries, etc.

@ianlancetaylor

Contributor Author

commented Apr 4, 2019

@JavierZunzunegui In my opinion, that is not a feature. As others have said, it's like using a life vest when you are in a wading pool but not using one when you are in the ocean.

@bcmills

Member

commented May 13, 2019

There will as usual be no automatic conversion between existing types and the new types. This requirement of Go is the only way that this approach can work.

Could you give some more detail on why that is the case?

I think it's important that checked arithmetic be at least as concise as unchecked arithmetic, so that there is no pressure for developers to use unchecked arithmetic where it is not needed. However, since existing libraries (such as strings and regexp) necessarily use the existing int types in their APIs, a design that requires explicit conversions between a checked oint and an unchecked int seems like it would make the checked version of the code quite a bit more verbose.

That was why I included a relaxed assignability rule in #30209, so I'd like to understand why a similar approach is considered untenable for this proposal.

@ianlancetaylor

Contributor Author

commented May 13, 2019

What I mean when I say "no automatic conversion" is that it is otherwise ambiguous whether an operation should overflow or wrap. Consider

func PrintSum(a int, b oint) {
    fmt.Println(a + b)
}

I think it's essential that everyone reading Go code understand immediately whether a + b will overflow or wrap. If we permit implicit type conversion, then that is impossible. And if we permit implicit type conversion only in certain contexts, then the language is that much more confusing.
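The same rule already applies to any two distinct integer types in Go. In the sketch below, oint is a hypothetical stand-in declared as an ordinary defined type: it does not check for overflow, it only shows that the author has to pick the type of the sum explicitly.

```go
package main

import "fmt"

// oint is a hypothetical stand-in for the proposed type. It is a plain
// defined type and does NOT check for overflow; it only illustrates typing.
type oint int

// SumAsInt converts b, so the addition is an int addition (wrapping).
func SumAsInt(a int, b oint) int { return a + int(b) }

// SumAsOint converts a, so the addition is an oint addition (checked, under
// the proposal).
func SumAsOint(a int, b oint) oint { return oint(a) + b }

func main() {
	// Writing a + b directly, with a an int and b an oint, does not compile:
	// the operand types are mismatched.
	fmt.Println(SumAsInt(1, 2), SumAsOint(1, 2))
}
```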

For the record, see #31500 for an alternative proposal.

@bcmills

Member

commented May 14, 2019

if we permit implicit type conversion only in certain contexts, then the language is that much more confusing.

We already permit implicit type conversion only in certain contexts: for example, the conversion from an untyped numeric constant to a defined numeric type only occurs when the value is used as a function argument, variable, or arithmetic operand.
(Compare https://play.golang.org/p/LUqxYnZdHP_Y and https://play.golang.org/p/vKalIcIrpjM.)

I think the logical rule for this proposal in particular would be to make the checked and unchecked types mutually-assignable, but to require arithmetic expressions to involve only checked or only unchecked types.

func ident(a int) int { return a }
func oident(a oint) oint { return a }

func Example() {
	var a int
	var b oint
	c := ident(b)               // ok: int (no possibility of overflow)
	x := ident(a) + ident(b)    // ok: int (unambiguously unchecked)
	y := oident(a) + oident(b)  // ok: oint (unambiguously checked)
	z := oident(a + 1)  // ok: a+1 is performed unchecked, then passed as oint
	w := ident(b + 1)   // ok: b+1 is performed checked, then passed as int

	n := len(someSlice) + b      // error: expression combines checked and unchecked operands
	var k oint = len(someSlice)  // ok: no possibility of overflow
	m := k + b                   // ok: unambiguously checked

	bad1 := a + b                // error: expression combines checked and unchecked operands
	bad2 := ident(a + b)         // error: expression combines checked and unchecked operands
	bad3 := ident(a) + oident(b) // error: expression combines checked and unchecked operands
}

That doesn't prevent someone from unintentionally writing an unchecked operation where they meant to write a checked one (or vice-versa), but since the builtins (particularly len) already return the unchecked int type, that sort of bug is likely to occur regardless.

In particular, I would expect that many sites that should be written:

	foo(oint(n) + 1)

will instead be written

	foo(oint(n + 1))

regardless of any assignability rule (much like the floating-point-to-Duration class of bugs described in #20757).

@ianlancetaylor

Contributor Author

commented May 14, 2019

for example, the conversion from an untyped numeric constant to a defined numeric type only occurs when the value is used as a function argument, variable, or arithmetic operand.

That is not strictly accurate according to the spec, since int is itself a defined numeric type. Type determination of an untyped numeric constant always occurs, though the type is sometimes determined by the context.

That minor quibble aside, I think that experience shows that people find the untyped constant rules to be confusing, and there are frequent references to https://blog.golang.org/constants. I think we should be extremely cautious about adding anything similarly confusing.

@JavierZunzunegui


commented May 15, 2019

When compiling in debug mode, Rust checks for this kind of issue and will cause your program to panic, [...]

Interesting highlight in #30613 (comment). Could Go introduce a build tag to make overflows panic? It might be useful to set in testing, canaries, etc.

@JavierZunzunegui In my opinion, that is not a feature. As others have said, it's like using a life vest when you are in a wading pool but not using one when you are in the ocean.

@ianlancetaylor I see this as a similar property to the -race flag - nobody uses it in production (I believe), that doesn't mean it isn't helpful.

@DmitriyMV


commented May 15, 2019

-race flag - nobody uses it in production

On one of my previous jobs we did exactly that 😉

@ianlancetaylor

Contributor Author

commented May 15, 2019

I would absolutely turn on the -race flag in production if it didn't cost so much performance. Having a separate -race mode is not a philosophical point, it's a practical one.

Overflow checking is expected to have a very low additional cost (on amd64, one correctly predicted branch per operation); of course any approach would have to be reevaluated if it turns out to be more expensive than expected.

@ianlancetaylor

Contributor Author

commented Jun 4, 2019

A simpler approach would be to consider that people using sized types want values of exactly that size, and don't want to necessarily worry about overflowing at the edges. And people using unsigned types typically want wrapping semantics. So the only type that really needs a checked variant is int itself, along the lines of #31500. So rather than introducing 11 new types, we should only introduce one, a signed integer type the same size as int that panics on overflow.

The hard part is picking a name for that type. Here are some possibilities, based on comments above and in other issues, none of which seem perfect:

  • oint
  • num
  • whole
  • checked.Int
  • sint
  • bint
  • trueint

@cespare

Contributor

commented Jun 4, 2019

[...] people using sized types want values of exactly that size, and don't want to necessarily worry about overflowing at the edges.

I don't think this is true. I very often use int64 for counters or similar values that I do not want to overflow. In those cases I'm not choosing it because I need an 8-byte integer so much as I want the largest size of integer that the language/platform supports. (Something like the int64 return value of io.Copy is in this vein.)

And as a single anecdote, the only production issue I can recall tracking back to an integer overflow off the top of my head right now was a multiplication overflow on an int64 a few months back.

@carlmjohnson

Contributor

commented Jun 14, 2019

Add safeint for an int that panics on overflow, and bigint for an int that grows to arbitrary sizes.

@alanfo


commented Jun 14, 2019

Yep, if there's to be a single checked int type, then safeint looks the best name suggested so far.

In fact, I've seen this name used in C++ in a similar context though as a class library rather than a built-in language type.

@bcmills

Member

commented Jul 29, 2019

I've been thinking some more about this proposal, particularly in contrast to #30209.

The major difference between this proposal and part 1 of #30209 is the omission of the wrapped package and its corresponding types. I think that difference is significant even if we retain the current wrapping behavior for the existing integer types.

Separate “wrapping” types would allow tools — particularly linters and fuzzers — to distinguish between two very different cases: “I wrote this code without considering overflow behavior”, and “I've thought about it and this code really should wrap silently on overflow”.

The “without considering wrapping behavior” case can arise for various reasons: because it predates the ability to write checked code, or because it was copied or derived from earlier code, or because the user just didn't put much thought into overflow behavior. It seems important for both human readers and mechanical linters to be able to distinguish that from an intentional decision to use wrapping arithmetic: for example, someone auditing a package for bugs may want to focus much more on unexpected overflow than intentional overflow.

@nathany

Contributor

commented Jul 30, 2019

  • How about a "math/checked" package that can be aliased if desired? var i checked.Int. That leaves room for "math/saturating", etc. without introducing dozens of built-in types.

  • What about checked operations other than + - * / %, such as Abs or Pow? For example: checked.Abs(math.MinInt64) // panic

I think it's essential that everyone reading Go code understand immediately whether a + b will overflow or wrap. If we permit implicit type conversion, then that is impossible.

  • Swift and Rust approach this by having different operations instead of having different types. Maybe z := checked.Add(x, y) or z := checked.Int(x).Add(y). That would allow underlying types to be passed more freely, at the cost of verbosity in writing the operations. Has this already been considered?

  • If intentionally wrapping types/operations are introduced, might there be a go.mod edition to change the default behaviour (to checked) going forward? This could make the more verbose alternatives palatable, because the verbosity would be for special cases like intentional wrapping or saturating.

@nathany

Contributor

commented Jul 30, 2019

"RISC-V does not detect or flag most arithmetic errors, including overflow..." https://www.slideshare.net/YiHsiuHsu/riscv-introduction slide 13

This was provided as a motivation for Dart 2 using wrap-around integers. https://github.com/dart-lang/sdk/blob/master/docs/language/informal/int64.md#wrap-around

@carlmjohnson

Contributor

commented Jul 30, 2019

If intentionally wrapping types/operations are introduced, might there be a go.mod edition to change the default behaviour (to checked) going forward? This could make the more verbose alternatives palatable, because the verbosity would be for special cases like intentional wrapping or saturating.

I thought about this, but I worry that it would introduce too many bugs to switch programs over from one behavior to another.

Then again, probably the majority of programs with overflow issues already have the bugs; they just don't know it yet.

@ianlancetaylor

Contributor Author

commented Jul 30, 2019

@nathany Please also look at #30209.

@carlmjohnson See #31500, which was rejected.

While it's true that RISC-V doesn't have an overflow flag, it's still possible to detect overflow. I don't see that as a reason one way or another.

@ianlancetaylor

Contributor Author

commented Jul 30, 2019

@bcmills That's a good point but isn't it a temporary condition? Go code that exists today is a small fraction of Go code that will exist in the future.

@bcmills

Member

commented Jul 30, 2019

@ianlancetaylor, I don't think it is a temporary condition. Code that exists today — particularly as examples and tutorials in blog posts, books, and other formats unlikely to be kept up-to-date — will continue to exist for the foreseeable future, and even if we add a facility for checked arithmetic it will remain very difficult to determine whether the author of a given snippet of code was aware of it at the time the code was written (or copied).

That is: given the possibility of copying code from some other example or codebase, even code marked with a go version after the addition of checked types may very well reflect accidental, rather than intentional, overflow behavior.

Moreover, without some clear way to determine whether code has already been audited for unintended overflow, I don't see how we can expect users to reasonably migrate existing code. Some explicit marker seems necessary, whether it is a structured comment, a separate type or set of types, or some other mechanism (such as a “magic import” to annotate the auditing).
