
proposal: spec: change int to be arbitrary precision #19623

Open · robpike opened this issue Mar 20, 2017 · 107 comments

@robpike (Contributor) commented Mar 20, 2017

An idea that has been kicking around for years, but never written down:

The current definition of int (and correspondingly uint) is that it is either 32 or 64 bits. This causes a variety of problems that are small but annoying and add up:

  • overflow when constants like math.MaxUint64 are automatically promoted to int type
  • maximum size of a byte slice is only half the address space on a 32-bit machine
  • int values can overflow silently, yet no one depends on this working. (Those who want overflow use sized types.)
  • great care must be taken with conversion between potentially large values, as information can be lost silently
  • many more

I propose that for Go 2 we make a profound change to the language and have int and uint be arbitrary precision. It can be done efficiently - many other languages have done so - and with the new compiler it should be possible to avoid the overhead completely in many cases. (The usual solution is to represent an integer as a word with one bit reserved; for instance if clear, the word points to a big.Int or equivalent, while if set the bit is just cleared or shifted out.)
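As an illustration of the tagged-word scheme described above, here is a minimal sketch in today's Go. The names (taggedInt, fromSmall) are hypothetical, the low-bit-set tag is just one possible choice, and a 64-bit platform is assumed:

package main

import "fmt"

// taggedInt is one machine word. If the low bit is 1, the remaining
// bits hold a small integer; if it is 0, the word would instead be a
// pointer to a big.Int or equivalent (not implemented in this sketch).
type taggedInt uintptr

func fromSmall(v int64) taggedInt { return taggedInt(v)<<1 | 1 }

func (t taggedInt) isSmall() bool { return t&1 == 1 }

// small recovers the value with an arithmetic shift to keep the sign.
func (t taggedInt) small() int64 { return int64(t) >> 1 }

func main() {
	x := fromSmall(-42)
	fmt.Println(x.isSmall(), x.small()) // true -42
}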

The advantages are many:

  • int (and uint, but I'll stop mentioning it now) become very powerful types
  • overflow becomes impossible, simplifying and securing many calculations
  • the default type for len etc. can now capture any size without overflow
  • we could permit any integer type to be converted to int without ceremony, simplifying some arithmetical calculations

Most important, I think it makes Go a lot more interesting. No language in its domain has this feature, and the advantages of security and simplicity it would bring are significant.

@robpike robpike added the Go2 label Mar 20, 2017

@griesemer (Contributor) commented Mar 20, 2017

I'm a big fan of this, myself. It would elevate int (and uint) to "unrestricted" (for lack of a better word) types, and only the types with explicit sizes (int16, int32, etc.) would be subject to wrap around.

In many cases (loop iteration variables) the compiler may be able to prove that an int is in a certain range and could produce optimal code. In many other cases, the use of int is not speed-critical.

@randall77 (Contributor) commented Mar 20, 2017

Let's put this and #19624 in the Thunderdome! Two proposals enter, one proposal leaves...

@griesemer (Contributor) commented Mar 20, 2017

A minor related point about this: The int and uint types were intended to be of the "natural" size for a given platform, typically the register size. If we have true integers we lose that naturally sized type - yet we make use of it in a few places where we want integer algorithms to work well for a given platform (strconv.Itoa comes to mind). We may want to consider introducing an unsigned "word" type instead (in the past we have also used uintptr in that role, but that may not necessarily be guaranteed to be of register size on all platforms).

@randall77 (Contributor) commented Mar 20, 2017

Representing an int in a single machine word will be tricky. We run across the same problem we had with scalars being in direct interfaces - the GC sees a word which is sometimes a pointer and sometimes isn't.
The only easy solution I see is to reserve some top bit(s) to set for raw integers and disallow those top bits for any heap pointer.

@bcmills (Member) commented Mar 20, 2017

@randall77

Two proposals enter, one proposal leaves...

Actually, I think they're completely compatible. I recommend both!

@bcmills (Member) commented Mar 20, 2017

@randall77

the GC sees a word which is sometimes a pointer and sometimes isn't

Could we fix that with better stack or heap maps? Instead of each word being "either a pointer or a non-pointer", it would be "a pointer, a non-pointer, or an int". I suppose that would require two bits instead of one per address, though ― might bloat the maps a bit.

FWIW, the ML family of languages worked around that issue by making the native types int31 and int63 instead of int32 and int64. I do not recommend that approach.

@randall77 (Contributor) commented Mar 20, 2017

@bcmills Yes, two bits per word should work. That just buys us more flexibility to put the distinguishing bits somewhere other than the top bits of the word - not sure if that is worth it.

@rasky (Member) commented Mar 20, 2017

I love this proposal in the abstract, but I'm very concerned about the performance impact. I think it's not by chance that "no language in its domain has this feature". Bounds-checking elimination has a similar problem, and the Go compiler isn't very good at it even nowadays; it basically just handles obvious cases, and doesn't even have a proper VRP (value range propagation) pass (the one proposed was abandoned because of compile-time concerns). Something like a simple multiplication would become a call into the runtime in the general case, and I would be surprised if the Go compiler could avoid that in most cases, excluding obvious ones like clearly bounded for loops.

@griesemer (Contributor) commented Mar 20, 2017

@rasky Languages like Smalltalk and Lisp (and more recently, JavaScript) have pioneered the implementation of such integers - they can be implemented surprisingly efficiently in the common case. A typical implementation reserves the 2 least significant bits as "tags" - making an int on a 64-bit platform effectively 62-bit in the common (small) case - which is likely more than plenty in most cases (and where it isn't, it's either because we need very large integers, or we should be using int64).

One way of using the tag bits is as follows:

  • 00 means small integer (smi)
  • 01 means the value is a pointer p to an arbitrary precision integer (at p-1)
  • 10 typically the result of an operation (see below)
  • 11 reserved for other uses or unused

Given this encoding, if we have two int values x and y, we can optimistically assume they are smis. For instance x + y would be translated into a single ADD machine instruction, followed by a test of the tag bits. If they are not 00, one (or both) of the operands were large ints and then a runtime call is needed. Also, if there was an overflow, a runtime call is needed. This does add 3 instructions in the fast path, but not more (one conditional jump if overflow, one test of tag bits, one conditional jump if tag bits are not 0 - perhaps more depending on how the runtime call is achieved or if there's a conditional call instruction). It's not cheap, but it's much less expensive than a runtime call in the common case. If both operands were smis, the tag bits remain 00, and the result doesn't need further work. The same is true for subtraction. Multiplication requires a bit more work but is also more expensive. Finally, division is the most complicated one, but also the most expensive operation in general.

It might be worthwhile performing a little experiment where one generates this additional code for each integer addition, using dummy conditional branches that will never be taken (or just jump to the end of the instruction sequence) and see what the performance impact is.
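A rough Go rendering of that fast path might look as follows. The manual overflow test stands in for the CPU overflow flag a compiler would check, slowAdd is a hypothetical runtime routine, and 64-bit words are assumed:

package tagged // hypothetical sketch, not a real runtime

// addWord adds two tagged words using the encoding above: smi payloads
// live in the upper 62 bits with 00 tag bits, so adding two tagged smis
// adds their payloads directly and the result stays tagged 00.
func addWord(x, y uintptr) uintptr {
	sum := x + y
	// Signed overflow occurred iff both operands have the same sign
	// and the sum's sign differs; compiled code would test the
	// hardware overflow flag instead.
	if (x^sum)&(y^sum)&(1<<63) != 0 {
		return slowAdd(x, y) // runtime call: redo in arbitrary precision
	}
	if (x|y)&3 != 0 {
		return slowAdd(x, y) // at least one operand was not a smi
	}
	return sum
}

// slowAdd is the hypothetical runtime fallback: untag, promote to a
// big-integer representation, add, and re-tag or allocate.
func slowAdd(x, y uintptr) uintptr {
	panic("large-int path not implemented in this sketch")
}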

@DmitriyMV commented Mar 20, 2017

Not a fan of this proposal - currently it's quite simple to reason about the resulting code and its performance characteristics when doing simple arithmetic. Also - even if losing two bits on a 64-bit platform is not important, on a 32-bit one it is.

Maybe we could have arbitrary-precision ints implemented as a new built-in type (like we do with complex currently)?

@gopherbot gopherbot added this to the Proposal milestone Mar 20, 2017

@gopherbot gopherbot added the Proposal label Mar 20, 2017

@dmitshur referenced this issue Mar 21, 2017: int64 IDs #597 (closed)

@dmitshur (Member) commented Mar 21, 2017

Can you discuss how such int variables would be represented in mediums other than RAM, and marshaled/unmarshaled?

Encoding to JSON should be easy and map really well. As far as I know, the JSON spec does not place restrictions on the size of numbers, so a really large int would encode as a base-10 number (and vice versa for decoding). E.g.:

{"number": 12312323123131231312312312312312321312313123123123123123123}

Would map to an int with value 12312323123131231312312312312312321312313123123123123123123.
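Indeed, decoding through json.Number (to avoid float64 truncation) and math/big already makes this mapping work in today's Go; a minimal sketch:

package main

import (
	"encoding/json"
	"fmt"
	"math/big"
)

func main() {
	data := []byte(`{"number": 12312323123131231312312312312312321312313123123123123123123}`)
	var m map[string]json.Number // json.Number keeps the raw digits as a string
	if err := json.Unmarshal(data, &m); err != nil {
		panic(err)
	}
	n := new(big.Int)
	if _, ok := n.SetString(m["number"].String(), 10); !ok {
		panic("not a valid base-10 integer")
	}
	fmt.Println(n) // prints the full value, no precision lost
}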

What about something like encoding/gob or encoding/binary?

@griesemer (Contributor) commented Mar 21, 2017

@shurcooL

  1. printing is obvious (the usual human-readable forms) - applies to JSON
  2. encoding/gob could use a private internal representation
  3. encoding/binary is for fixed-width numbers at the moment, and the proposed ints wouldn't be - but a var-int encoding could work (though the current format would probably be inefficient).

Re: 3, note that the compiler's import/export format already encodes arbitrary-sized integer values because of precise constants.

@magical (Contributor) commented Mar 21, 2017

@shurcooL @griesemer I believe encoding/gob already uses a variable-length encoding for all integer types.
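That is easy to check: both encoding/gob and encoding/binary's varint functions already store small values compactly regardless of the declared width (a quick sketch):

package main

import (
	"bytes"
	"encoding/binary"
	"encoding/gob"
	"fmt"
)

func main() {
	// encoding/binary's explicit varint form: 300 fits in 2 bytes, not 8.
	buf := make([]byte, binary.MaxVarintLen64)
	n := binary.PutVarint(buf, 300)
	fmt.Println(buf[:n])

	// gob applies a variable-length encoding to every integer type.
	var b bytes.Buffer
	if err := gob.NewEncoder(&b).Encode(int64(300)); err != nil {
		panic(err)
	}
	fmt.Println(b.Len()) // a handful of bytes, including type info
}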

@jasonwatkinspdx commented Mar 21, 2017

Go should have an arbitrary precision number type that is more convenient than math/big. That type should not attempt to masquerade as int/uint, as these aren't just used semantically as "number" but more so as "number compatible with C/foolang code that uses the natural local word size".

The root problem here is that golang's design prevents a library from defining an arbitrary precision type that is as semantically and syntactically convenient as a type provided by the runtime/compiler with internal knowledge and kludges. Solve that problem with golang 2.0, or else we will find ourselves years from now with countless ad hoc accretions like this.

Edit: I'm a fan of this design/feature in scripting languages. I don't see how it works as the base int type in a systems language meant to replace C/C++/Java. I absolutely think we should have a great and convenient arbitrary precision number type, and I think the road to that is a golang where library-defined types are not second class to ad hoc, boundary-flouting extensions of the runtime- and compiler-provided types.

@bcmills (Member) commented Mar 21, 2017

@jasonwatkinspdx

It's true that one of the roles of int in Go 1 is as an analogue to C's ssize_t. But it doesn't actually matter whether int is exactly the size of the address space or merely sufficient to cover the address space.

Perhaps more importantly, C APIs don't actually use ssize_t all that often: they use size_t instead.
You have to range-check conversions from C.size_t to int, because the latter is potentially only half the range of the former. Under this proposal, you'd need the range check in the other direction instead, because C.size_t would now be smaller than int. The only way to avoid the need for a range check entirely is by making the Go "size" type have exactly the same range as C.size_t, and at that point you're looking at a fairly high risk of overflow bugs (especially from decrementing past zero).
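A sketch of today's direction of that range check, as cgo code; cLen is a hypothetical C function standing in for a real size_t-returning API:

package main

/*
#include <stddef.h>
static size_t cLen(void) { return 42; }
*/
import "C"

import "fmt"

const maxInt = int(^uint(0) >> 1) // largest int on this platform

func main() {
	n := C.cLen()
	// Go 1 direction: C.size_t may exceed int's range (notably on
	// 32-bit platforms), so the conversion needs a range check.
	if uint64(n) > uint64(maxInt) {
		panic("size_t value does not fit in int")
	}
	fmt.Println(int(n))
}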

@jasonwatkinspdx commented Mar 21, 2017

@bcmills

But it doesn't actually matter whether int is exactly the size of the address space or merely sufficient to cover the address space.

It does matter. People make design decisions concerning the size of the address space and how indexes into it can be represented in serializations or other transformed values. Making the type that spans the address space arbitrary precision offloads many complexities onto anyone who wants to ship data to other systems.

What does a C-ABI-compatible library consumer of golang look like in a world where int/uint is arbitrary precision? Is it really better if the golang side is blind to any issue at the type level and the C side has no choice but to panic?

I do see the value of these types, I just don't want them conflated with int/uint. I'm entirely in favor of a numeric tower in golang being the default numeric type; I just don't want it to pretend to have the same name as the 32-bit/64-bit machine/ISA-determined types.

@bcmills (Member) commented Mar 21, 2017

People make design decisions concerning the size of the address space and how indexes into it can be represented in serializations or other transformed values.

Can you give some examples? I'm having a hard time thinking of anything other than a syscall that spans a process boundary but should be sized to the address space, and programs using raw syscalls generally need to validate their arguments anyway.

What does a c abi compatible library consumer of golang look like in a world where int/uint is arb precision?

The same thing it looks like today: using C.size_t instead of int.

@bronze1man (Contributor) commented Mar 21, 2017

@robpike
It looks harder to reduce the CPU usage of a Go program when ints become arbitrary precision. 😝
I do not need arbitrary precision ints by default, but I do need Go programs to use less CPU, to save money on my GCE VM server.

@jasonwatkinspdx commented Mar 21, 2017

programs using raw syscalls generally need to validate their arguments anyway.

I want to capture that in the types, not in a dynamic value range check. I don't think this is an unreasonable expectation of a language that markets itself as a replacement for c/c++/java for systems programming.

@cznic (Contributor) commented Mar 21, 2017

I honestly thought the date was 11 days later than it really is (i.e., April 1).

I don't want to lose normal int performance. I don't believe there's a space/time-effective way for an arbitrary precision int to be, in many cases, comparable in performance to the fixed-width int.

I have nothing against adding an arbitrary precision mpint type (the name does not matter) which the compiler accepts mixed in expressions with normal ints, providing the conversions as needed. IOW, it would be really nice to be able to use standard operators with arbitrary precision integers, yes, but please only explicitly. Please leave int alone.

@ainar-g (Contributor) commented Mar 21, 2017

What about floats? JSON can have values like 12345678901234567890.12345678901234567890, with many digits before and after the dot, but IIRC no Go type can accurately represent those. That is, no Go type can do math with them (I know about json.Number).

Personally, I would keep int as it is and instead add a number type that could represent any number, with or without a fractional part.

@bronze1man (Contributor) commented Mar 21, 2017

Please do not make int operations slower by default.
I do not need arbitrary precision ints by default, but I do need Go programs to use less CPU, to save money on my GCE VM server.

I do not need something like mpint with infix expressions either; *big.Int is enough for me.

@mvdan (Member) commented Mar 21, 2017

I don't understand the performance concerns - even if the performance hit would be noticeable in some scenarios, could you not use sized types like int32 or the unsigned word (like the current uint) mentioned by @griesemer?

@CrimsonVoid commented Mar 21, 2017

I think this might be a good change, but I do have some concerns regarding bitwise operations and performance impact. In my experience, int tends to be the default type for numeric values, which makes me a little hesitant without seeing before-and-after numbers.

@Merovius commented Aug 15, 2018

@robaho IMO math.Log(x) is a natural operation to want for floating point numbers, but not one that can be done with arbitrary precision. I.e., you won't get around the need for a fixed precision "natural" floating point type.

@lpar commented Aug 16, 2018

No reason why you can't have math.Log(x, prec) to compute a log to specified precision using arbitrary precision values.

@lpar commented Aug 16, 2018

For that operation, yes. That's the point. A single operation being limited precision doesn't mean you have to cripple your data type to support it.

@lpar commented Aug 16, 2018

By that argument, since we can't represent pi exactly as a BigDecimal, we should remove BigDecimal from the language and only have floats.

That's a silly argument against BigDecimal.

@ianlancetaylor (Contributor) commented Aug 16, 2018

Let's please move the discussion about arbitrary size floating point types to a different issue. This issue is about changing int to be an arbitrary sized integer type. In particular note that the language has no float type. Thanks.

@lpar commented Aug 16, 2018

Since the language doesn't have an integer type either, how about making integer the type of arbitrary-precision ints rather than changing int? That would avoid breaking existing code.

@robaho commented Aug 16, 2018

@ianlancetaylor sorry, wasn't trying to hijack/confuse. I was only trying to point out: why not have double and double64 then, with double being arbitrary precision? It just seems wrong to special-case this for int (rather than just using a BigInteger class). I would think operator overloading would be cleaner and more useful (although I'm not a big fan of most uses of operator overloading...), but that is another can of worms.

@TuomLarsen commented Nov 29, 2018

What would happen to big.Int if this gets accepted?

@ianlancetaylor (Contributor) commented Nov 29, 2018

@TuomLarsen Its implementation would become trivial. If we ever do a math/big/v2, it might be removed.

@TuomLarsen commented Nov 29, 2018

@ianlancetaylor On the other hand, it seems to me that one of the advantages of big.Int is that it provides greater control over memory management. What would be the equivalent of Int.Set? Does = do a deep or shallow copy? I'm asking because perhaps the "dest ← op1, op2" style of big.Int is more flexible, but its functionality would overlap somewhat with this proposal.

@robaho commented Nov 30, 2018

Upon reflection, I think this change would break too far from the language's C/systems heritage. Better handled in libraries, IMO.

@faiface commented Nov 30, 2018

@TuomLarsen I'd expect it to work just like strings today. Both strings and ints are immutable, so = would do a shallow copy, but all operations (+, +=, etc.) would make a new instance by default. Of course, the compiler would optimize cases where it's unnecessary to make a new instance and instead rewrite an old one.

@nathany (Contributor) commented Dec 1, 2018

@TuomLarsen Its implementation would become trivial. If we ever do a math/big/v2, it might be removed. #19623 (comment)

What is the interplay between this proposal and #20654 math/big: support for constant-time arithmetic?

@griesemer (Contributor) commented Dec 1, 2018

@nathany #20654 is about specialized large integer arithmetic where operations (such as +, *, etc.) always take the "same" amount of time independent of the input values. For instance, if huge represents a large integer value, huge + huge must take the same time as huge + 0 (even though in the latter the algorithm might be able to tell right away that there's nothing to do because one argument is 0). Making such operations' runtime independent of the arguments makes it impossible to "guess" arguments based on runtime. When such guessing is possible, this "side channel" (the timing of operations) becomes a valuable data source for crypto hackers.

There's no need for such constant-time operations in the language implementation. Just the opposite is true: we don't want constant-time operations because this proposal assumes that we can do all kinds of tricks to avoid the cost of a large integer operation: Most integers will be small, and in those cases the costs of operations should be close to what they are now (and inlined in the generated code).

I see #20654 as a proposal for a specialized library, built from the ground up (and tested as such) for crypto. I think it would be very hard to retro-fit big/int to support both (optimized and constant-time operations) and be sure to get it right in all cases.
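For a feel of the principle at stake, the standard library's crypto/subtle already provides constant-time primitives; contrasting one with an early-exit comparison is the classic illustration (a sketch of the general side-channel idea, unrelated to math/big's internals):

package main

import (
	"bytes"
	"crypto/subtle"
	"fmt"
)

func main() {
	secret := []byte("s3cr3t-token")
	guess := []byte("a3cr3t-token")

	// bytes.Equal may return at the first mismatching byte, so its
	// running time depends on the data - a timing side channel.
	fmt.Println(bytes.Equal(secret, guess))

	// subtle.ConstantTimeCompare always scans both slices in full,
	// so its running time is independent of where they differ.
	fmt.Println(subtle.ConstantTimeCompare(secret, guess) == 1)
}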

@TuomLarsen commented Jan 5, 2019

I would like to add another data point which I just encountered. Provided there was an exponentiation operator, one could write

-3*x**5*y

instead of

new(big.Int).Mul(big.NewInt(-3), new(big.Int).Mul(new(big.Int).Exp(x, big.NewInt(5), nil), y))

@robaho commented Jan 5, 2019

@TuomLarsen I would like to point out that the latter example does not have to be that bad; it could easily be

NewInt(-3).Mul(Exp(x,NewInt(5))).Mul(y)

The required syntax you cite is partly a result of not using 'dot imports' for common core numerical classes (which I've discussed before). I think it makes sense; others argue against it, but anyway.

And with method overloading it could easily be (if x and y are already big.Int):

Int(-3).Mul(x.Exp(5)).Mul(y)

Still not great, but making code like this readable needs to be part of the API design from the start.

@bakul commented Jan 5, 2019

Not clear to me if this proposal is going anywhere, but on rereading I realized I forgot to mention one point from my earlier comments: a separate arbitrary precision integer type (henceforth called integer) would help the GC (at some cost to the programmer). With an explicit integer type and no automatic promotion of int variables to it, the GC knows there are no pointers in an []int. For []integer it would have to check each word if tagging is used to minimize allocation. But whether to use tagging or always allocate storage is up to an implementation, and only affects the integer type.

@nnemkin commented Jan 20, 2019

A relevant case: Dart switched from arbitrary precision integers in Dart 1 to fixed 64-bit integers in Dart 2. See their feature spec: https://github.com/dart-lang/sdk/blob/master/docs/language/informal/int64.md.

@carlmjohnson (Contributor) commented Jan 20, 2019

For precompilation it's not possible to generate the mint/bigint code lazily. Typically, the generated code only contains inlined SMI instructions and falls back to calls when the type is not a SMI. Before being able to use the SMI code, instructions have to be checked for their type (null, or Mint/BigInt) and after most operations they need to be checked for overflows. This blows up the executable and generally slows down the code. As a consequence, Dart 2.0 switches to fixed-size integers.

Seems like their issue was that the JIT could handle the overflow but precompilation resulted in overly bloated code. Anyone want to do a PoC for Go to see if it’s too slow/bloated or not?

@seebs (Contributor) commented May 12, 2019

i'd like to distinguish between two things that i think would both be useful:

  1. an integer type which can be of arbitrary size.
  2. the ability to declare arbitrarily-sized integer types.

i think these are both useful, and fill different needs. i have more than once wanted the ability to declare int128 or int256 objects and have them Just Work -- and that's much simpler than arbitrary-precision ints. and it might allow better performance, and work for many cases, but not solve some of the other cases.

both? both is good.
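For reference, hand-rolling the int128 case today with math/bits (available since Go 1.12) looks like the sketch below; uint128 and its add helper are hypothetical names, roughly what a built-in declared-width type would replace:

package main

import (
	"fmt"
	"math/bits"
)

// uint128 is a hypothetical two-word unsigned integer.
type uint128 struct{ hi, lo uint64 }

// add returns x+y, carrying from the low word into the high word
// (and wrapping silently on overflow of the high word).
func add(x, y uint128) uint128 {
	lo, carry := bits.Add64(x.lo, y.lo, 0)
	hi, _ := bits.Add64(x.hi, y.hi, carry)
	return uint128{hi, lo}
}

func main() {
	x := uint128{0, ^uint64(0)}        // 2^64 - 1
	fmt.Println(add(x, uint128{0, 1})) // {1 0}, i.e. 2^64
}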

@JAicewizard commented May 12, 2019

Maybe allow every intXX/uintXX where XX satisfies some constraints, so the compiler can still generate code for it without needing a per-type implementation.
