
Closed
opened this issue Oct 17, 2018 · 42 comments

# proposal: arbitrary-radix integer literals #28256

Labels
FrozenDueToAge Proposal v2 A language change or incompatible library change

### griesemer commented Oct 17, 2018

I've brought up this idea several times before informally. I'm filing this issue now for the formal documentation trail.

Currently, Go permits octal, decimal, and hexadecimal integer literals. There's a pending proposal for binary integer literals (#19308) which has wide support.

#### Proposal:

This is a fully backward-compatible proposal for arbitrary-radix integer literals. We change the integer literal syntax to the following:

int_lit = decimal_lit | octal_lit | radix_lit .
decimal_lit = ( "1" … "9" ) { decimal_digit } .
octal_lit = "0" { octal_digit } .

with

radix_digit = "0" … "9" | "A" … "Z" | "a" … "z" .

representing the digit values 0 to 35 (for a maximum radix of 36). The radix must be a decimal literal between 0 and 36, expressing the radix; with the radix value 0 having the same meaning as 16, and the value 1 being invalid.

Examples:

```
0x10   // same as 16x10 or 16
2x1001 // binary integer literal, same as 9
3x010  // ternary integer literal, same as 3
8x066  // octal integer literal, same as octal 066 or 54
36xz   // integer literal in base 36, value is 35
```

#### Discussion:

The beauty of this approach is that it permits arbitrary radix notation, thus removing any future need to expand this again; it removes the need for the extra notation for hexadecimal numbers because they are just part of this notation; and at the same time it's fully backward-compatible. The commonly accepted notation for binary integer literals and the respective notation here have the same length, and the proposed notation here seems just as intuitive (e.g., 0b1001100 == 2x1001100).

We could go a step further and remove octal literals from the language since they are also easily expressed with this notation, but that's a step that would not be backward-compatible. One way to make that happen w/o introducing bugs would be to disallow non-zero decimal numbers that start with a 0; octal numbers in existing code would then lead to a compiler error and could be fixed. It would also be trivial to have them fixed automatically with a simple tool. Finally, removing octals would eliminate another (albeit mostly academic) issue with them; see #28253. If octals were not supported anymore, one could condense the integer literal syntax to:

int_lit = decimal_digit { decimal_digit } [ ( "x" | "X" ) radix_digit { radix_digit } ] .

#### Implementation:

The implementation is straightforward. It would likely slightly simplify some of the scanning code for numeric literals because with this proposal all such literals simply start with a decimal_lit. If that value is zero, or between 2 and 36, a subsequent 'x' indicates the actual literal value in that radix. The respective number conversion routines are trivial and would need minimal adjustments.
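The scanning rule described above can be sketched outside the compiler with `strconv`. This is only an illustration under stated assumptions: `parseRadixLit` is a hypothetical helper, not anything in the Go toolchain, and it is looser than a real scanner would be (for instance, it accepts a sign, because `strconv` does).

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// parseRadixLit is a hypothetical helper that evaluates a literal written in
// the proposed "<radix>x<digits>" notation. It is a sketch, not compiler code.
func parseRadixLit(lit string) (int64, error) {
	// The radix prefix contains only decimal digits, so the first 'x' or 'X'
	// is always the separator, even when 'x' is also a digit (radix > 33).
	i := strings.IndexAny(lit, "xX")
	if i < 0 {
		// No separator: base 0 lets ParseInt apply Go's own prefix rules,
		// so a leading 0 still means octal.
		return strconv.ParseInt(lit, 0, 64)
	}
	radix, err := strconv.Atoi(lit[:i])
	if err != nil {
		return 0, fmt.Errorf("bad radix in %q: %w", lit, err)
	}
	switch {
	case radix == 0:
		radix = 16 // the special case that keeps 0x... meaning hexadecimal
	case radix < 2 || radix > 36:
		return 0, fmt.Errorf("radix %d out of range in %q", radix, lit)
	}
	return strconv.ParseInt(lit[i+1:], radix, 64)
}

func main() {
	for _, lit := range []string{"0x10", "16x10", "2x1001", "3x010", "8x066", "36xz", "1x0"} {
		v, err := parseRadixLit(lit)
		fmt.Println(lit, v, err)
	}
}
```

The examples from the proposal evaluate to 16, 16, 9, 3, 54 and 35, and `1x0` is rejected because radix 1 is invalid.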

#### Impact:

Hard to say. It may be sufficient to just add another notation for binary integer literals per #19308. Or we could do this and lay the issue to rest for good.

added this to the Proposal milestone Oct 17, 2018

 In Go, I have never wanted to write an integer literal with radix other than 2, 8, 10, or 16. I have also never read code that would have used such literals, had they existed. Therefore, the benefit seems extremely low. The fact that the existing hexadecimal syntax doesn't fit directly into the proposed syntax but requires a special case of 0 ≡ 16 significantly detracts from the appeal.

### dr2chase commented Oct 17, 2018

 I like the idea of removing the leading-zero octal notation. That notation is a source of annoying errors, and removing it simplifies explaining the language to new users ("don't do this, you'll be surprised" vs not mentioning alternate base notation till it is needed).

 @cespare I would have formulated your 2nd paragraph slightly differently: The fact that the existing hexadecimal syntax neatly fits directly into the proposed syntax significantly adds to the appeal. :-)

### beoran commented Oct 18, 2018

 While I see the appeal of having a consistent syntax, I fear this would become a very obscure feature. I never felt the need for anything else but binary, octal, decimal and hexadecimal integer constants. Binary integer literals are useful in many cases involving bit twiddling, octal is useful for file permissions, hexadecimal is useful for compact notation of bytes. But trinary or twentyone-ary seems to be useful for obfuscation only. I do like the idea of changing the notation for octals; right now it's still the confusing C notation. And I do like the uniform notation you propose. I would just disallow anything other than base 2, 8, 10 and 16 to avoid such obfuscation. Otherwise, could you please show us a few production open source code bases where the use of such arbitrary radix integer constants would have been beneficial?

### griesemer commented Oct 18, 2018

 I'd be ok with the restriction to 2, 8, 10, and 16, but why? It would make things (a tiny bit) more complicated; the only reason I'd see is that it might perhaps eliminate errors (somebody might write 9x066 rather than 8x066 for a file permission).

 I agree that most programmers may not care much about the flexibility here; they'll be just fine that they can write down numbers in all the commonly used radixes (2, 8, 10, 16) w/o extra cost (one extra char for octal) and use a single, uniform notation.

 Personally, I think that not having arbitrary radix notation is what prevents us from thinking it might be useful. Now usefulness alone is not a criterion for adding something to the language, but in this case it would address the desire for a binary notation, simplify what we already have, and remove restrictions. Seems like a win-win to me.

 Keep in mind that there's really strong support for adding binary integer literals, so no matter what, we'd have to make changes in all the same places. The difference is just whether we add one more special case, or whether we simplify all the code in favor of a uniform notation.

 Finally, there's also the educational aspect of Go: Having a simple, uniform mechanism here rather than an agglomeration of historical notations seems like a nice cleanup.

 Btw., Smalltalk supports arbitrary radix notation, too, using the same syntax but with an 'r' instead of an 'x'. Using the 'x' permits the most common other base notation to fit neatly into the system.

### randall77 commented Oct 18, 2018

 > I'd be ok with the restriction to 2, 8, 10, and 16, but why?

 Because that's 32 = 36-4 fewer bases you need to understand when reading code. `23xag56m`? It gets very confusing very quickly. I think I'd rather see `(((10*23+16)*23 + 5)*23 + 6)*23 + 22` or something (an exponent operator would help here). Hexadecimal is certainly useful. Binary and octal seem marginally useful. Other bases just don't seem useful at all. Certainly their value isn't worth burdening the reader with them.

 I am interested in the Smalltalk precedent you mentioned. Could you point out any open source Smalltalk project that uses arbitrary bases to good effect? My feeling is that the feature will be rarely used and more of a source of frustration due to typos, as you mentioned.
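As a rough check of the base-23 example in this comment, the nested multiplication can be compared with what `strconv.ParseInt` returns for the same digits. The helper names `horner` and `parse` are made up for this sketch.

```go
package main

import (
	"fmt"
	"strconv"
)

// horner evaluates digit values (most significant first) in the given radix,
// which is the nested multiply-and-add written out by hand in the comment.
func horner(radix int64, digits []int64) int64 {
	var v int64
	for _, d := range digits {
		v = v*radix + d
	}
	return v
}

// parse wraps strconv.ParseInt and panics on malformed input.
func parse(s string, radix int) int64 {
	v, err := strconv.ParseInt(s, radix, 64)
	if err != nil {
		panic(err)
	}
	return v
}

func main() {
	// Digit values of "ag56m": a=10, g=16, 5, 6, m=22.
	byHand := horner(23, []int64{10, 16, 5, 6, 22})
	parsed := parse("ag56m", 23)
	fmt.Println(byHand, parsed, byHand == parsed) // 2995887 2995887 true
}
```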

### cespare commented Oct 18, 2018

 > I'd be ok with the restriction to 2, 8, 10, and 16, but why?

 I don't think we should use this proposed syntax with such a restriction. I think that, if anything, we should just add the `0b` syntax for binary literals and be done with it (then Go will have all of base 2, 8, 10, and 16 literals).

 > a single, uniform notation

 I don't agree that this proposal is uniform; it introduces more ways of writing the same integer literals:

 - As you mention in your proposal, the existing octal syntax doesn't match, so there will be two different ways of writing octal integers unless we take the further, backward-incompatible step of removing the current octal syntax.
 - The current way of writing hex integers doesn't exactly fit into the scheme, so the proposal includes a special case for 0 to have the same meaning as 16. There will forever be two ways of writing hex integers: `0x2a` and `16x2a`.

### griesemer commented Oct 19, 2018

 @cespare Not to be facetious, but with the 0b notation there will also forever be two ways of writing a "hex" number: 0x2a and 0b00101010. I'd see that as a much bigger problem: there will be plenty of people arguing that one is better than the other. Realistically, with the radix notation, people will stick to the shorter 0x notation rather than 16x (but either way, the actual hex number looks the same). What you are saying really was one of the reasons for not including 0b from day one: There's already a suitable notation, namely 0x.

### josharian commented Oct 19, 2018

 For instance, I can see how I'd use a small-n (3, 5, etc.) radix to encode multiple values of n states in a single int, e.g. for some state on a game board.) There is also the suggestion to support intN for all N from @jimmyfrasche: another way to handle this would be to create a new class of parameterized integer types. This is bad syntax, but, for discussion, let's say it's I%N where I is an integer type and N is an integer constant. All arithmetic with a value of this type is implicitly mod N. When working on a RISC-V port, I wanted a uint12 type, since my instruction encoding components are 12 bits; that could have been uint % (1<<12). Lots of bit-manipulation, particularly protocols, could benefit from this. I can see game states similarly benefitting from intN. In contrast, I can't think of any real world use cases for arbitrary radix constants. Just another data point.
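To make the game-board idea concrete, here is a minimal sketch that packs nine three-state cells into one integer by treating them as base-3 digits. The board layout and the meaning of the cell values (0 empty, 1 and 2 for the two players) are assumptions made only for this illustration.

```go
package main

import "fmt"

// packBoard encodes nine cells (each 0, 1 or 2) as one base-3 number,
// with the first cell as the most significant digit.
func packBoard(cells [9]int) int {
	v := 0
	for _, c := range cells {
		v = v*3 + c
	}
	return v
}

// cellAt extracts cell i (0 = first cell) from a packed board.
func cellAt(board, i int) int {
	// Drop the digits to the right of cell i, then take the last digit.
	for k := 8; k > i; k-- {
		board /= 3
	}
	return board % 3
}

func main() {
	// Under the proposal this board could have been written 3x102010201.
	board := packBoard([9]int{1, 0, 2, 0, 1, 0, 2, 0, 1})
	fmt.Println(board, cellAt(board, 0), cellAt(board, 2)) // 8119 1 2
}
```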

### beoran commented Oct 19, 2018

 To answer your questions, I think, 1. yes we need binary constants because they are useful for bit masks and other bit twiddling. And 3. Dropping C style octals and replacing them is a good idea, because C style octals are a source of beginner bugs. Though I would probably go for 0o765 notation, although seeing the Smalltalk precedent 08x765 would also be ok. As for 2. Actually I don't care too much either way about the notation, as long as we limit it to bases 2, 8, 16 and maybe 10.

### RalphCorderoy commented Oct 19, 2018

 The choice of `x` is poor because it reads as multiply in `4x23` and I have to correct that, interrupting flow. We've trained ourselves to know `0x` is hex, partially by knowing multiplying by zero is pointless. Using `x` feels like a wheeze just to remove `0x` as one of the special cases. gri points out Smalltalk uses `r`; Ada uses `#`, as in `radix#digits#`, but it seems a shame to waste that unused character for this. The referenced issue calling for binary literals has quite a lot of voices saying they're not needed. I won't paste in arguments from there to here, but it's not clear to me that the case for needing them has been made. More literal bases is a step towards Perl's There's more than one way to do it and away from Go's `gofmt`. It widens an argument over how some literals should be written and we'll see calls for `gofmt`, `vet`, and `lint` to stand in judgement. Some programmers have never mastered hex, see that other issue, and will want 2x11111111110 because 0xffe is unreadable. Yes, that was a deliberate mistake to show few readers will want to count a run of the same digit so then there will be calls for underscores, a la Ada, as separators, with arguments over where to separate. It doesn't matter those programmers won't be coding on your project; you and I will still have to read their project. It's a shame octal nabbed 0755 instead of 0o755, no capital O allowed, but other than that things seem fine as they are. And deprecation of 0755 for a new octal format can be done, as gri outlined, without adding base 2 or base 2-36.
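The deliberate miscount in this comment can be checked mechanically. The sketch below assumes the binary literal in the comment has ten 1s followed by a 0, as it appears to; `ones` and `parseBinary` are helper names invented for the example.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// ones returns a run of n '1' characters.
func ones(n int) string {
	return strings.Repeat("1", n)
}

// parseBinary parses a string of 0s and 1s and panics on malformed input.
func parseBinary(s string) int64 {
	v, err := strconv.ParseInt(s, 2, 64)
	if err != nil {
		panic(err)
	}
	return v
}

func main() {
	asWritten := ones(10) + "0" // the run as it seems to appear in the comment
	needed := ones(11) + "0"    // the digits 0xffe actually requires
	fmt.Printf("%#x %#x\n", parseBinary(asWritten), parseBinary(needed)) // 0x7fe 0xffe
}
```

Building the strings with `strings.Repeat` sidesteps exactly the counting problem the comment is pointing at.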

### as commented Oct 21, 2018

 > Some programmers have never mastered hex, see that other issue, and will want 2x11111111110 because 0xffe is unreadable.

 It's hard to believe there are programmers that can mentally reason about 64 digit binary literals but not hex. The digits are not zero padded, so to even determine what bit is set, you need to determine the number of digits in the number. Easy with base16, but are there really any examples of binary integer literals serving a useful purpose other than tables of constants rendered by a monospaced font that are rigorously whitespace aligned or zero padded? The gofmt is not going to move these numbers to the right either. Small values will be difficult to see clearly. I suppose that could be solved by using 2b01 and 2b10 though.

 To make binary literals more readable some languages also allow the use of separators. For example, in C# you can write something like this `0b0010_0110_0000_0011` In fact, C# allows underscore to be used in any numeric literal, not only in binary ones. In my opinion, even for 64-bit literals binary representation would be much more readable if you need very specific bits to be set. Hex values always require a bit more thinking and conversion in your head even if you know hex perfectly well. It's simple for one byte values but gets harder as you go further (the argument about counting digits applies here even for hex) and add values with multiple bits set where simple pattern of `1, 2, 4, 8, 10...` no longer holds and you have to convert to binary in your head or just use a calculator.

### RalphCorderoy commented Oct 22, 2018

 Sorry, I didn't mean to drag this back to a rerun of #19308, but to point out that widening the choice of ways to do something, write 0xffe, ripples out into formatting and tools.

 What demand there is for base 2-36 could be lessened by two things touched on in earlier comments.

 Keith gave an example for it being easier to read the manual multiplication and addition for a base 23 number. Syntax for array multiplication, AKA Hadamard product, perhaps introduced for vector instructions, would give an alternative. As he said, an exponent operator would help.

```
base23 = [279841, 12167, 529, 23, 1]
ag56m = base23 * [10, 16, 5, 6, 22]
```

 That would also allow for mixed-radix numbers; units of time being a common example.

 Josh referred to intN for all N, e.g. int12. That might be too general, and uintN for 1 ≤ N ≤ 32 good enough for most cases. Verilog has something similar and combined with a bit-catenate operator allows `x<<18 | y<<5 | z` to be `x :: y :: z`, with `::` picked at random. This removes the overlap error where x should have been shifted by 19 to avoid y, and means typed constants with many bits overlaying fields of varying widths can be written more easily as their parts: `uint3(2) :: uint13(0x1f0a) :: uint5(0x0f)`. This is handy when the fields don't fall on nibble boundaries; compare 0xbe14f.

 I'm not strongly arguing for either of these, just pointing out that if there is any movement towards them then they overlap with the need for a base 2-36 notation.
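The bit-catenate idea can be approximated in today's Go with a small helper. In this sketch `field` and `concat` are hypothetical names standing in for `uintN(value)` and the imagined `::` operator; nothing like them exists in the language or standard library.

```go
package main

import "fmt"

// field is a hypothetical value/width pair, standing in for uintN(value).
type field struct {
	value uint64
	width uint
}

// concat joins fields left to right, like the imagined `::` operator, and
// rejects values that do not fit their declared width, so fields cannot overlap.
func concat(fields ...field) (uint64, error) {
	var out uint64
	var total uint
	for _, f := range fields {
		if f.width == 0 || f.width > 64 {
			return 0, fmt.Errorf("invalid width %d", f.width)
		}
		if f.width < 64 && f.value>>f.width != 0 {
			return 0, fmt.Errorf("value %#x does not fit in %d bits", f.value, f.width)
		}
		total += f.width
		if total > 64 {
			return 0, fmt.Errorf("fields need %d bits, more than 64", total)
		}
		out = out<<f.width | f.value
	}
	return out, nil
}

func main() {
	// uint3(2) :: uint13(0x1f0a) :: uint5(0x0f) from the comment above.
	v, err := concat(field{2, 3}, field{0x1f0a, 13}, field{0x0f, 5})
	fmt.Printf("%#x %v\n", v, err) // 0xbe14f <nil>
}
```

The width check is what removes the shift-by-18-instead-of-19 class of error mentioned in the comment: an oversized value is reported rather than silently spilling into its neighbour.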

added the v2 A language change or incompatible library change label Oct 24, 2018

### dgryski commented Oct 24, 2018

 > In fact, C# allows underscore to be used in any numeric literal, not only in binary ones.

 Also Perl.

```
$ perl -E 'say 123_456_789;'
123456789
```

 > > In fact, C# allows underscore to be used in any numeric literal, not only in binary ones.
 >
 > Also Perl.
 >
 > ```
 > $ perl -E 'say 123_456_789;'
 > 123456789
 > ```

 Also Python.

```
>>> print(123_456_789)
123456789
```

### hooluupog commented Oct 27, 2018

 Java,

```
jshell> System.out.println(123_456_789)
123456789
```

### RalphCorderoy commented Oct 27, 2018

 I expect there's quite a few languages that permit underscore in some numeric literals. Ada was just the first I encountered. Like ditching 0751 as the octal syntax, these underscores would seem to be orthogonal to whether base 2-36 is required. They can be an aid to readability on long literals, but also allow more formatting choice by the author, and disagreement with everyone else. (Perl accepts `3._1__4_` for 3.14, though warns if warnings are explicitly requested.) It's tempting to dictate the allowable formats, e.g. integers must either have no underscores, or have one every three digits from the right: `2_718_281`. Hex could be split on nibble boundaries, etc. But that rules out splitting based on the field boundaries underlying the literal, e.g. a 12-bit nibble-aligned field.
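The stricter formatting rule floated here (no underscores at all, or one every three digits from the right) is easy to state as a check. This is a sketch of that one rule for decimal integers only; `validGrouping` is a made-up name, and the rule itself is the commenter's thought experiment, not anything Go enforces.

```go
package main

import (
	"fmt"
	"regexp"
)

// groupedDecimal matches a decimal integer that has either no underscores or
// an underscore before every group of three digits, counted from the right.
var groupedDecimal = regexp.MustCompile(`^(\d+|\d{1,3}(_\d{3})+)$`)

// validGrouping reports whether lit follows the strict grouping rule.
func validGrouping(lit string) bool {
	return groupedDecimal.MatchString(lit)
}

func main() {
	for _, lit := range []string{"2_718_281", "2718281", "27_18281", "1000_000"} {
		fmt.Println(lit, validGrouping(lit))
	}
}
```

As the comment notes, a rule like this would reject groupings that follow the field boundaries of the underlying value, which is the cost of dictating one format.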

### alanfo commented Oct 27, 2018

 Although I'd normally welcome improvements to the numeric aspects of the language, I'm finding it very hard to get enthused about this proposal.

 The demand just doesn't seem to be there for bases other than 2, 8, 10 and 16 and, even if it was, I don't think the change could be made in isolation. People would then be asking for a simple way to print these numbers out. Currently the formatted print functions in the standard library support only the standard bases with their %b, %o, %d and %x verbs, so new verbs would need to be added to print out values for arbitrary bases. In other words, what are already very complicated functions would become even more so.

 Nor do I like the proposed syntax. The use of the letter 'x' as a divider seems inappropriate as the other radixes have nothing to do with hex, and for the highest radixes it's even a digit itself. I also dislike the discontinuity for hex itself when 16 suddenly becomes 0.

 It's worth remembering that we already have support for radixes from 2 to 36 in the `strconv` package with the `FormatInt` and `ParseInt` functions. Although string based and hence relatively inefficient, I'd have thought this should be enough for anyone who wants to play around with different radixes for educational purposes.

 Although on balance I'd support it, I'm not even sure that adding binary literals (with a 0b prefix) is such a great idea unless a digit separator (such as _) is introduced at the same time. The reality is that once you get past one or two bytes, binary literals become unreadable.

 As for octal, if one surveys the current state of C family languages, the traditional ones (C, C++, Java) all use the leading zero notation and the newer ones (Swift, Rust) use an 0o prefix. It seems to me that compatibility with the former is much more important for Go and that the leading zero notation should therefore be retained. As no one appears to be seriously complaining about this, it's just not worth the hassle of changing it.
Having said that, if binary literals are introduced, then for the sake of consistency I wouldn't necessarily be against adding an alternative 0o prefix for octal with people being advised to prefer that unless they were using `cgo`.
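For reference, the existing library support mentioned in this comment looks roughly like this: the `fmt` verbs cover bases 2, 8, 10 and 16, while `strconv.FormatInt` and `strconv.ParseInt` take any base from 2 to 36. The `roundTrip` helper is just a name chosen for this sketch.

```go
package main

import (
	"fmt"
	"strconv"
)

// roundTrip formats n in the given base and parses the result back.
func roundTrip(n int64, base int) (string, int64) {
	s := strconv.FormatInt(n, base)
	back, err := strconv.ParseInt(s, base, 64)
	if err != nil {
		panic(err)
	}
	return s, back
}

func main() {
	n := int64(54)

	// The formatted-print verbs only cover the four standard bases.
	fmt.Printf("%b %o %d %x\n", n, n, n, n) // 110110 66 54 36

	// strconv handles every base from 2 to 36, at the cost of going via strings.
	for _, base := range []int{3, 23, 36} {
		s, back := roundTrip(n, base)
		fmt.Println(base, s, back)
	}
}
```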

### creker commented Oct 27, 2018

 I don't think Go needs compatibility with any language, especially C/C++. Go is already so different from the C family of languages that there's no point in clinging to them. If we're going to look elsewhere we should really look at what modern languages are doing, not the ancient ones that are riddled with questionable design decisions and years of backwards compatibility. If we were to add the `0b` prefix it would be really preferable to also change `0` to `0o` just for the sake of consistency. gofixing it would be really easy.

### alanfo commented Oct 27, 2018

 I'm not denying that the leading zero syntax for octal was a questionable design decision for C in the first place. It's more a question of what people expect and anyone coming to Go from the traditional languages is going to expect it to deal with octal literals in the same way. Also it's not just a matter of `go fix` changing `0` to `0o`. You'd also need to change the language to prevent non-zero integer literals from beginning with `0` at all, otherwise the change could potentially be very confusing.


### alanfo commented Oct 31, 2018

 I'm still not keen on the 2x and 8x prefixes and would much prefer your alternative notation of 0b and (if we must change octal) 0o. That would be consistent with the verbs in the formatted print statements and also with what Swift and Rust do. If we are to have binary literals then, in the interests of readability, I think #28493 is a necessity and it would also help with other long numbers.

### RalphCorderoy commented Oct 31, 2018

 Hi @griesemer, I realise from your opening Discussion that reusing the `x` is to lessen the change, but 8x32 just looks wrong because the `x` unconsciously reads as times. If hex's syntax had been 0h instead of 0x then I guess the proposal would be 8h32. That reads no better and it's because x and h are both mnemonic for hex and trying to contort them to another purpose goes against that long-learned language-agnostic connection. 0b and 0o have fans because they continue this mnemonic use of the letter. If you want a syntax open to future radixes then adopting a new letter avoids thwarting what's already learnt, e.g. r for radix in 8r32. Rewriting octal is already being considered, partially to avoid the beginner error of leading zeroes on base10. If that just leaves 0x as an oddity, given a new 0r syntax, then, 16r0fc0 is at least consistent, but it's noisy compared to the leaner 0x0fc0 that we all love, and parse without thinking. :-)

### alanfo commented Oct 31, 2018

 That's a good point about `x` reading like a multiplication symbol. I find it difficult to imagine any base outside 2, 8, 10 or 16 becoming popular in the future but, if one did, then other languages would also be under pressure to support it. Perhaps a consensus might then emerge on the best notation to use which Go could follow rather than coming up with its own.

 Bases other than 2, 8, 10, and 16 are extensively used for example in the handling of Bitcoin, Ethereum, and IPFS (all of which have existing implementations in Go). Whilst it's true that all these projects exist and thrive without having base 32 and base 58 literals available, there is no good reason why programmers who frequently use those bases should make their code less readable or less expressive. I think Robert's proposal is perfect; he doesn't seem to have overlooked anything. If I were forced to complain about anything, it would be that I'd like to see this feature support up to base 58 for the reasons stated above, but I reckon that may be a little too much to ask, because there are various different base encodings for bases above 36. (For example, the alphabet for Bitcoin's base 58 encoding is crafted to remove ambiguity in numbers as read by humans; that's the reason there are no Bitcoin base 58 addresses containing the character l (lowercase L), to avoid confusion with the number 1.) So that's what makes me think the base 36 upper bound is good enough: it corrects the glaring omission of base 2, it's a consistent syntax for any integer literal, it promotes readability, and it makes for easy learning of one single rule for all bases. It's as perfect a solution as you can get. Good work, @griesemer.

### alanfo commented Oct 31, 2018

 @htrob ISTM that you're really arguing here for a base32 encoding to be added. base58 would be out of the question because, unless we distinguish between upper and lower case letters (which wouldn't fit in with hex), we simply don't have enough potential digits and, even if we did, some of them are omitted by base58 as you've pointed out yourself. With the exception of base32hex, base32 also suffers from having several different alphabets which are not consistent with the original proposal. My view is that it's best to process them as strings or byte slices as we do now.

 Or perhaps, for the more exotic bases, better compile-time evaluation of pure functions applied to constant strings. E.g.

```
x := EthB58("WQERDSFDEdjhjjdk11234567")
```

 That by itself wouldn't make the result eligible for use as a constant, however.

### haiitch commented Oct 31, 2018

 @alanfo This is the part where I said "but I reckon that may be a little too much to ask", which you may have missed. I am happy already if base2 to base36 are supported as per @griesemer 's design. I also said I think @griesemer's design is perfect, I'm not sure what is your concern about that, maybe you can illustrate how you think it can be made better.

### alanfo commented Oct 31, 2018

 @htrob Well, I detailed my concerns about the original proposal at some length in my first post to this thread. But, as @griesemer has since narrowed it to only allow 0, 2 and 8 as radix prefixes, there's not much point in going over the same ground again. The question now is whether 2x or 8x should be preferred to the more familiar 0b and 0o which he offered as an alternative. As I don't like the use of `x` for various reasons, I'm firmly in the latter camp.

### haiitch commented Nov 3, 2018

 It's always been clear what's your position, and I still honestly believe it's wrong. I already explained exactly why I think it's more practical, readable, and useful to the new Go programmer going forward to accept @griesemer's proposal, which I believe made a far more solid argument than "I don't like". No amount of "I don't like" is likely to convince me that I shouldn't state what I believe is a better way. It's agree to disagree territory I guess, so... yes, I agree there's no point in going over matters of your personal preference. Cheers.

### cznic commented Nov 3, 2018

 The status quo in integer literal is IMO more than sufficient wrt what's needed. Even plain decimal only would be perfectly enough, just use a comment. `const LaunchMask = 141836999991328 // 1000 0001 0000 0000 0000 0000 0000 0000 0010 0000 0010 0000` But I don't want to see such monstrosities, as the comment is, in source code. As a comment it's just fine.

### creker commented Nov 3, 2018

 @cznic with comment you introduced an even bigger problem that's common to comments in general - they could be out of date or plain wrong. Single bit error is enough to throw people off that will inevitably rely on these comments. And no amount of testing would catch that. Even code review may not always catch when at some point someone decides to format it like so `const LaunchMask = 141836999991328 // 10000001 00000000 00000000 00000000 0010000 00100000` Good luck catching an error. At least with binary literals you can write tests.

### cznic commented Nov 3, 2018

 > At least with binary literals you can write tests.

 Tests can be written without them as well.

```go
package foo

import (
	"strconv"
	"strings"
	"testing"
)

// TestFoo checks the decimal constant against its spelled-out binary form.
func TestFoo(t *testing.T) {
	n, err := strconv.ParseUint(strings.Replace("1000 0001 0000 0000 0000 0000 0000 0000 0010 0000 0010 0000", " ", "", -1), 2, 64)
	if err != nil {
		t.Fatal(err)
	}
	// The uint64 conversion keeps both sides of the comparison the same type.
	if g, e := n, uint64(141836999991328); g != e {
		t.Fatal(g, e)
	}
}
```

 Also, I have yet to see a test that tests for the equality of a constant against a literal value. I think `vet` would not be happy about that.

 Can we estimate the share of programs that would ever use, per this proposal, something like an int literal `2x100000010000000000000000000000000010000000100000`? My guess is it might be well under one per mille, and that's another reason why I'm not in favor of the proposal.

### creker commented Nov 3, 2018

 @cznic I fail to see the relevance of this test to my argument. My point is, your comment might be wrong. People will see it, rely on it and report bugs or simply waste time until discovering that the comment was wrong and they need to manually check the bits in the calculator. One can argue that no comment at all would be better. I'm not talking about testing the exact value of a constant. Its value could mean some feature flags that you could pass to your function during tests. Very common for libraries to have constants with default flags set. With wrong comment tests would be green. With an error in a binary literal tests could immediately catch it.

 > Can we estimate the share of programs that would ever use, per this proposal, something like an int literal

 Binary literals are useful even for much smaller literals. Share of programs would be meaningless as it very much depends on the nature of a program. Binary network protocols, stuff that deals with hardware, emulators - they all could benefit from this proposal. But if we take some REST API service - it doesn't need binary or even hex literals.

### cznic commented Nov 3, 2018

 > Binary literals are useful even for much smaller literals.

 But those are IMO way better readable when written in hex.

### griesemer commented Nov 7, 2018

 I am going to retract and close this proposal. With the reduction to 3 radixes at best (0x, 2x, 8x), it doesn't really bring enough "bang for the buck"; especially so if we keep the existing octal notation. Thanks to the initial supporters, but there doesn't seem to be enough community support for this idea at this stage of Go. If we are going to introduce binary integer literals, we should follow established practice in other languages and go with proposal #19308. If we want to introduce another octal notation, we may want to go with the 0o prefix (another more established convention). Closing.

### beoran commented Nov 7, 2018 via email

 After seeing what a bike shed this became, I think closing this is the best idea. Thanks for your continued efforts.


### nathany commented Nov 29, 2018

 Will #19308 be re-opened? Not much more to say, but it seems odd to request feedback in a blog post and then link to a locked issue.

 > we may want to go with the 0o prefix

 That may be more clear than present. It would also be a simple feature to sort out breaking language changes:

 - introduce the new syntax in 1.13
 - eventually remove the existing, slightly more confusing, syntax -- at least for new code

### ianlancetaylor commented Nov 30, 2018

 I unlocked #19308.

locked and limited conversation to collaborators Nov 30, 2019