proposal: arbitrary-radix integer literals #28256

@griesemer

I've brought up this idea several times before informally. I'm filing this issue now for the formal documentation trail.

Currently, Go permits octal, decimal, and hexadecimal integer literals. There's a pending proposal for binary integer literals (#19308) which has wide support.

Proposal:

This is a fully backward-compatible proposal for arbitrary-radix integer literals. We change the integer literal syntax to the following:

int_lit = decimal_lit | octal_lit | radix_lit .
decimal_lit = ( "1" … "9" ) { decimal_digit } .
octal_lit = "0" { octal_digit } .
radix_lit = radix ( "x" | "X" ) radix_digit { radix_digit } .
radix = "0" | decimal_lit .

with

radix_digit = "0" … "9" | "A" … "Z" | "a" … "z" .

representing the digit values 0 to 35 (for a maximum radix of 36). The radix is written as a decimal number between 0 and 36; the radix value 0 has the same meaning as 16, and the value 1 is invalid.

Examples:

0x10   // same as 16x10 or 16
2x1001 // binary integer literal, same as 9
3x010  // ternary integer literal, same as 3
8x066  // octal integer literal, same as octal 066 or 54
36xz   // integer literal in base 36, value is 35
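The examples above can be checked with a small evaluator. This is only a sketch of the proposed notation, not compiler code: parseRadixLit is a hypothetical helper, and it assumes that letters are case-insensitive, that every digit must be smaller than the radix, and that a multi-digit radix prefix has no leading zeros. The conversion itself is delegated to strconv.ParseUint, which handles bases 2 through 36.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// parseRadixLit evaluates a literal of the form radix ("x"|"X") digits.
// Hypothetical helper for illustration; not part of any Go package.
func parseRadixLit(lit string) (uint64, error) {
	// The radix prefix contains only decimal digits, so the first
	// 'x' or 'X' is always the separator (even in base 36, where
	// 'x' is also a digit).
	i := strings.IndexAny(lit, "xX")
	if i <= 0 || i == len(lit)-1 {
		return 0, fmt.Errorf("%q: want radix, 'x', and at least one digit", lit)
	}
	prefix, digits := lit[:i], lit[i+1:]
	for _, c := range prefix {
		if c < '0' || c > '9' {
			return 0, fmt.Errorf("%q: radix must be a decimal number", lit)
		}
	}
	// Assumption: a multi-digit radix must not start with '0'.
	if len(prefix) > 1 && prefix[0] == '0' {
		return 0, fmt.Errorf("%q: radix has a leading zero", lit)
	}
	radix, err := strconv.Atoi(prefix)
	if err != nil {
		return 0, err
	}
	switch {
	case radix == 0:
		radix = 16 // 0x... keeps its hexadecimal meaning
	case radix == 1 || radix > 36:
		return 0, fmt.Errorf("%q: invalid radix %d", lit, radix)
	}
	// ParseUint rejects signs and digits that are >= radix.
	return strconv.ParseUint(digits, radix, 64)
}

func main() {
	for _, lit := range []string{"0x10", "16x10", "2x1001", "3x010", "8x066", "36xz"} {
		v, err := parseRadixLit(lit)
		fmt.Println(lit, v, err) // values: 16, 16, 9, 3, 54, 35
	}
}
```

Using ParseUint rather than ParseInt is a deliberate choice here: an integer literal carries no sign, and ParseUint does not accept one.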

Discussion:

The beauty of this approach is threefold: it permits arbitrary-radix notation, removing any future need to extend the literal syntax again; it removes the need for a separate hexadecimal notation, because hexadecimal literals are simply part of this notation; and it is fully backward-compatible. The commonly accepted notation for binary integer literals and the corresponding notation here have the same length, and the proposed notation seems just as intuitive (e.g., 0b1001100 == 2x1001100).

We could go a step further and remove octal literals from the language, since they are also easily expressed with this notation, but that step would not be backward-compatible. One way to make it happen without introducing bugs would be to disallow non-zero decimal numbers that start with a 0; octal numbers in existing code would then lead to a compiler error and could be fixed. It would also be trivial to have them fixed automatically with a simple tool. Finally, removing octals would eliminate another (albeit mostly academic) issue with them; see #28253. If octals were no longer supported, one could condense the integer literal syntax to:

int_lit = decimal_digit { decimal_digit } [ ( "x" | "X" ) radix_digit { radix_digit } ] .
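The automatic fix mentioned above could look roughly like the following sketch. It uses the standard go/scanner and go/token packages to find integer tokens, so octal-looking text inside comments and string literals is left alone. The names rewriteOctals and isLegacyOctal are hypothetical, and the output is only valid under this proposal, since today's Go does not accept the 8x prefix.

```go
package main

import (
	"fmt"
	"go/scanner"
	"go/token"
	"strings"
)

// isLegacyOctal reports whether lit is a leading-zero octal literal
// such as 0644 (and not 0, 0x1F, 0b101, or 0o17).
func isLegacyOctal(lit string) bool {
	if len(lit) < 2 || lit[0] != '0' {
		return false
	}
	for i := 1; i < len(lit); i++ {
		if lit[i] < '0' || lit[i] > '7' {
			return false
		}
	}
	return true
}

// rewriteOctals replaces each legacy octal literal 0ddd in src with
// the proposed form 8xddd. Hypothetical helper for illustration.
func rewriteOctals(src string) string {
	fset := token.NewFileSet()
	file := fset.AddFile("example.go", fset.Base(), len(src))
	var s scanner.Scanner
	s.Init(file, []byte(src), nil, 0) // mode 0: comments are skipped

	var out strings.Builder
	last := 0 // end offset of the text copied so far
	for {
		pos, tok, lit := s.Scan()
		if tok == token.EOF {
			break
		}
		if tok == token.INT && isLegacyOctal(lit) {
			off := file.Offset(pos)
			out.WriteString(src[last:off])
			out.WriteString("8x" + lit[1:]) // drop the leading 0
			last = off + len(lit)
		}
	}
	out.WriteString(src[last:])
	return out.String()
}

func main() {
	fmt.Println(rewriteOctals("perm := 0755"))
}
```

Patching by byte offset, rather than re-printing the token stream, keeps the original formatting and comments untouched.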

Implementation:

The implementation is straightforward. It would likely simplify some of the scanning code for numeric literals slightly, because with this proposal all such literals start with a sequence of decimal digits. If the value of that sequence is zero, or between 2 and 36, a subsequent 'x' (or 'X') indicates that the actual literal value follows in that radix. The respective number conversion routines are trivial and would need minimal adjustments.
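The scanning logic described here might be sketched as follows. This is not the actual Go scanner: scanIntLit and digitVal are hypothetical names, the function only determines the extent and radix of a literal, and it assumes that digits must be smaller than the radix and that a radix prefix longer than one digit has no leading zero. Legacy octal literals are still recognized, matching the backward-compatible form of the proposal.

```go
package main

import "fmt"

// digitVal returns the value of a radix digit, or 36 for any other byte.
func digitVal(c byte) int {
	switch {
	case '0' <= c && c <= '9':
		return int(c - '0')
	case 'a' <= c && c <= 'z':
		return int(c-'a') + 10
	case 'A' <= c && c <= 'Z':
		return int(c-'A') + 10
	}
	return 36
}

// scanIntLit scans the integer literal at the start of src and returns
// its length in bytes and its radix. Hypothetical sketch only.
func scanIntLit(src string) (n, radix int, err error) {
	// Every literal starts with a run of decimal digits.
	prefix := 0
	for n < len(src) && digitVal(src[n]) < 10 {
		if prefix <= 36 { // stop accumulating once out of range
			prefix = prefix*10 + digitVal(src[n])
		}
		n++
	}
	if n == 0 {
		return 0, 0, fmt.Errorf("no integer literal")
	}
	if n == len(src) || (src[n] != 'x' && src[n] != 'X') {
		// No radix marker: legacy octal or plain decimal.
		if src[0] == '0' && n > 1 {
			for i := 1; i < n; i++ {
				if src[i] > '7' {
					return n, 8, fmt.Errorf("invalid octal digit %q", src[i])
				}
			}
			return n, 8, nil
		}
		return n, 10, nil
	}
	switch {
	case n > 1 && src[0] == '0':
		return n, 0, fmt.Errorf("radix has a leading zero")
	case prefix == 0:
		radix = 16
	case prefix >= 2 && prefix <= 36:
		radix = prefix
	default:
		return n, 0, fmt.Errorf("invalid radix")
	}
	n++ // consume 'x' or 'X'
	start := n
	for n < len(src) && digitVal(src[n]) < 36 {
		if digitVal(src[n]) >= radix {
			return n, radix, fmt.Errorf("digit %q out of range for radix %d", src[n], radix)
		}
		n++
	}
	if n == start {
		return n, radix, fmt.Errorf("radix literal has no digits")
	}
	return n, radix, nil
}

func main() {
	for _, src := range []string{"0x10", "2x1001", "066", "42", "36xz;", "1x0"} {
		n, radix, err := scanIntLit(src)
		fmt.Println(src, n, radix, err)
	}
}
```

Capping the accumulated prefix keeps an overlong digit run from overflowing while still classifying it as an invalid radix.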

Impact:

Hard to say. It may be sufficient to just add another notation for binary integer literals per #19308. Or we could do this and lay the issue to rest for good.
