-
Notifications
You must be signed in to change notification settings - Fork 17
/
P0907r1.bs
880 lines (685 loc) · 40.5 KB
/
P0907r1.bs
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
370
371
372
373
374
375
376
377
378
379
380
381
382
383
384
385
386
387
388
389
390
391
392
393
394
395
396
397
398
399
400
401
402
403
404
405
406
407
408
409
410
411
412
413
414
415
416
417
418
419
420
421
422
423
424
425
426
427
428
429
430
431
432
433
434
435
436
437
438
439
440
441
442
443
444
445
446
447
448
449
450
451
452
453
454
455
456
457
458
459
460
461
462
463
464
465
466
467
468
469
470
471
472
473
474
475
476
477
478
479
480
481
482
483
484
485
486
487
488
489
490
491
492
493
494
495
496
497
498
499
500
501
502
503
504
505
506
507
508
509
510
511
512
513
514
515
516
517
518
519
520
521
522
523
524
525
526
527
528
529
530
531
532
533
534
535
536
537
538
539
540
541
542
543
544
545
546
547
548
549
550
551
552
553
554
555
556
557
558
559
560
561
562
563
564
565
566
567
568
569
570
571
572
573
574
575
576
577
578
579
580
581
582
583
584
585
586
587
588
589
590
591
592
593
594
595
596
597
598
599
600
601
602
603
604
605
606
607
608
609
610
611
612
613
614
615
616
617
618
619
620
621
622
623
624
625
626
627
628
629
630
631
632
633
634
635
636
637
638
639
640
641
642
643
644
645
646
647
648
649
650
651
652
653
654
655
656
657
658
659
660
661
662
663
664
665
666
667
668
669
670
671
672
673
674
675
676
677
678
679
680
681
682
683
684
685
686
687
688
689
690
691
692
693
694
695
696
697
698
699
700
701
702
703
704
705
706
707
708
709
710
711
712
713
714
715
716
717
718
719
720
721
722
723
724
725
726
727
728
729
730
731
732
733
734
735
736
737
738
739
740
741
742
743
744
745
746
747
748
749
750
751
752
753
754
755
756
757
758
759
760
761
762
763
764
765
766
767
768
769
770
771
772
773
774
775
776
777
778
779
780
781
782
783
784
785
786
787
788
789
790
791
792
793
794
795
796
797
798
799
800
801
802
803
804
805
806
807
808
809
810
811
812
813
814
815
816
817
818
819
820
821
822
823
824
825
826
827
828
829
830
831
832
833
834
835
836
837
838
839
840
841
842
843
844
845
846
847
848
849
850
851
852
853
854
855
856
857
858
859
860
861
862
863
864
865
866
867
868
869
870
871
872
873
874
875
876
877
878
879
880
<pre class='metadata'>
Title: Signed Integers are Two's Complement
Shortname: P0907
Revision: 1
Audience: EWG
Status: P
Group: WG21
URL: http://wg21.link/P0907r1
!Source: <a href="https://github.com/jfbastien/papers/blob/master/source/P0907r1.bs">github.com/jfbastien/papers/blob/master/source/P0907r1.bs</a>
Editor: JF Bastien, Apple, jfbastien@apple.com
Abstract: There is One True Representation for signed integers, and that representation is two's complement.
Date: 2018-04-01
Markup Shorthands: markdown yes
Toggle Diffs: yes
</pre>
Introduction {#intro}
============
[[C11]] Integer types allows three representations for signed integral types:
* Signed magnitude
* Ones' complement
* Two's complement
See [[#c-sign]] for full wording.
C++ goes further than C and only requires that "the representations of integral
types shall define values by use of a pure binary numeration system". To the
author's knowledge no modern machine uses both C++ and a signed integer
representation other than two's complement (see [[#survey]]). None of [[MSVC]],
[[GCC]], and [[LLVM]] support other representations. This means that the C++
that is taught is effectively two's complement, and the C++ that is written is
two's complement. It is extremely unlikely that there exist any significant
codebase developed for two's complement machines that would actually work when
run on a non-two's complement machine.
C and C++ as specified, however, are not two's complement. Signed integers
currently allow the existence of an extraordinary value which traps, extra
padding bits, integral negative zero, and introduce undefined behavior and
implementation-defined behavior for the sake of this *extremely* abstract
machine.
Let's stop pretending that the abstract machine should represent integers as
signed magnitude or ones' complement. These theoretical implementations are a
different programming language, not our real-world C++. Developers who require
signed magnitude or ones' complement integers would be better served by a
pure-library solution, and so would the rest of us.
This paper proposes the following:
* *Status-quo* Signed integer arithmetic remains non-commutative in general
(though some implementations may guarantee that it is).
* *Change* `bool` is represented as `0` for `false` and `1` for `true`. All
other representations are undefined.
* *Change* `bool` only has value bits, no padding bits.
* *Change* Signed integers are two's complement.
* *Change* If there are *M* value bits in the signed type and *N* in the
unsigned type, then *M = N-1* (whereas C says *M ≤ N*).
* *Status-quo* If a signed operation would naturally produce a value that is
not within the range of the result type, the behavior is undefined. The
author had hoped to make this well-defined as wrapping (the operations
produce the same value bits as for the corresponding unsigned type), but
WG21 had strong resistance against this.
* *Change* None of the integral types have extraordinary values.
* *Change* C11 note 53 has wording aroung trap representations within padding
bits, e.g. for parity bits. C++ has no such wording.
* *Change* Conversion from signed to unsigned is always well-defined: the
result is the unique value of the destination type that is congruent to the
source integer modulo 2<sup>N</sup>.
* *Change* Conversion from enumeration type to integral is the same as that of
converting from the enumeration's underlying type and then to the
destination type, even if the original value cannot be represented by the
specified type.
* *Status-quo* Conversion from integral type to enumeration is unchanged: if
the original value is not within the range of the enumeration values the
behavior is undefined.
* *Change* Left-shift on signed integer types produces the same results as
left-shift on the corresponding unsigned integer type.
* *Change* Right-shift is an arithmetic right shift which performs
sign-extension.
* *Status-quo* shift by larger-than or equal-to bit-width remains undefined.
* *Status-quo* `constexpr` evaluation of signed integer arithmetic with
undefined result is not a *core constant expression*.
* *Change* the range of enumerations without a fixed underlying type is
simplified because of two's complement.
* *Status-quo* `is_modulo` type trait remains `false` for signed integer types
unless an implementation chooses to defined overflow as wrap.
* `has_unique_object_representations<T>` is `true` for `bool` and signed
integer types, in addition to unsigned integer types and others before.
* *Status-quo* atomic operations on signed integer types continues not to have
undefined behavior, and is still specified to wrap as two's complement (the
definition is clarified to act as-if casting to unsigned and back).
* *Change* address [[LWG3047]] to remove undefined behavior in pre-increment
atomic operations.
This proposal leaves C unchanged, it merely restricts further the subset of C
which applies to C++. Aaron Ballman volunteered to present this paper and the
corresponding [[N2218]] to WG14, in the hope that C will approve compatible
changes.
A final argument to move to two's complement is that few people spell "ones'
complement" correctly according to Knuth [[TAoCP]]. Reducing the nerd-snipe
potential inherent in C++ is a Good Thing™.
<blockquote>
Detail-oriented readers and copy editors should notice the position of the
apostrophe in terms like “two’s complement” and “ones’ complement”: A two’s
complement number is complemented with respect to a single power of 2, while a
ones’ complement number is complemented with respect to a long sequence of
1s. Indeed, there is also a “twos’ complement notation,” which has radix 3 and
complementation with respect to (2 . . . 22)<sub>3</sub>.
</blockquote>
Edit History {#edit}
============
r0 → r1 {#r0r1}
-------
This paper was presented in Jacksonville to:
* A joint SG6 numerics and SG12 undefined behavior session
* The Evolution Working Group
The following polls were taken, and corresponding modifications made to the
paper. The main change between [[P0907r0]] and the subsequent revision is to
maintain undefined behavior when signed integer overflow occurs, instead of
defining wrapping behavior. This direction was motivated by:
* Performance concerns, whereby defining the behavior prevents optimizers from
assuming that overflow never occurs;
* Implementation leeway for tools such as sanitizers;
* Data from Google suggesting that over 90% of all overflow is a bug, and
defining wrapping behavior would not have solved the bug.
It is expected that this paper have little to no effect on code generation from
current compilers. Known codegen changes should have no performance implication,
for example:
* Older x86 vector instructions don't support arithmetic right shift, and
therefore must remain scalar instead of vectorizing to non-arithmetic right
shift.
* An implementation which implements left shift using a rotate now need to
also mask.
### SG6 / SG12 ### {#sg6-sg12}
<table class="def">
<tr><th style="width: 70%;"></th><th>**SF**</th><th>**F**</th><th>**N**</th><th>**A**</th><th>**SA**</th></tr>
<tr><th><small>Change anything with respect to signed integers.</th>
<th>2</th><th>10</th><th>5</th><th>1</th><th>0</th></tr>
<tr><th><small>Moving any change from this paper to IS'20 is blocked on synchronizing with WG14.</th>
<th>4</th><th>9</th><th>4</th><th>0</th><th>1</th></tr>
<tr><th><small>Change the default unsigned integral types' behavior.</th>
<th>0</th><th>0</th><th>1</th><th>8</th><th>9</th></tr>
<tr><th><small>Changes we make constrain *extended integral types*.</th>
<th>10</th><th>4</th><th>2</th><th>0</th><th>1</th></tr>
<tr><th><small>Allow *extraordinary value* for signed integral types.</th>
<th>2</th><th>2</th><th>9</th><th>2</th><th>3</th></tr>
</table>
We decided to independently conside what the "privileged" syntax' behavior
should be when going out of range for:
* cast of enums
* cast from `unsigned` to `signed`
* `INT_MIN / -1`
* addition / subtraction / multiplication and `-INT_MIN` overflow
**Multiple answer poll** would you be OK with these being the standards-mandated
behavior for the "privileged" syntax behavior of signed integral types when
going out of range:
* 16—undefined behavior
* 0—saturation
* 0—trap (exception)
* 1—trap (abort)
* 1—generate extraordinary value
* 11—wrap
* 5—Rust-style wrap or trap, maybe contract violation?
* 9—intermediate values are mathematical integers (e.g. `(int)a+(int)b > INT_MAX` would be OK)
<table class="def">
<tr><th style="width: 70%;"></th><th>**SF**</th><th>**F**</th><th>**N**</th><th>**A**</th><th>**SA**</th></tr>
<tr><th><small>Cast to enums outside of the enum’s representable range should be defined instead of undefined behavior.</th>
<th>0</th><th>1</th><th>1</th><th>5</th><th>8</th></tr>
<tr><th><small>Cast from `unsigned` to `signed` which are out of range is currently implementation defined. It should instead be wrap.</th>
<th>6</th><th>6</th><th>3</th><th>0</th><th>1</th></tr>
<tr><th><small>We want two’s complement representation for signed types, regardless of what WG14 decides.</th>
<th>7</th><th>4</th><th>2</th><th>0</th><th>3</th></tr>
<tr><th><small>`INT_MIN / -1` should be defined.</th>
<th>0</th><th>0</th><th>4</th><th>3</th><th>9</th></tr>
</table>
**Multiple answer poll** addition / subtraction / multiplication and `-INT_MIN`
overflow is currently undefined behavior, it should instead be:
* 4—wrap
* 6—wrap or trap
* 5—intermediate values are mathematical integers
* 14—status quo (remain undefined behavior)
**Multiple answer poll** to opt-in to the other behaviors (both for `signed` and
`unsigned`) we can create library or language changes. What should we explore
separately from this paper?
* 12—undefined behavior
* 9—saturation
* 8—trap (exception)
* 6—trap (abort)
* 3—generate extraordinary value
* 16—wrap
* 10—Rust-style wrap or trap, maybe contract violation?
* 9—intermediate values are mathematical integers (e.g. `(int)a+(int)b > INT_MAX` would be OK)
* 13—bigint
* 14—operations with a bool set when overflow has occurred
<table class="def">
<tr><th style="width: 70%;"></th><th>**SF**</th><th>**F**</th><th>**N**</th><th>**A**</th><th>**SA**</th></tr>
<tr><th><small>Shifting out of range `1 << 31` as currently defined (if shift has
defined value when interpreted as unsigned, you get that value as signed, you
can shift into but not past the sign, shifting past sign bit is undefined
behavior). `1 << 31` was defined and still is (as of Howard’s paper), `2 << 31`
was undefined behavior and still is. WG14 is currently considering whether to
adopt Howard’s paper which made this. Should we take it back to undefined to do
`1 << 31`?</th>
<th>2</th><th>7</th><th>5</th><th>1</th><th>1</th></tr>
<tr><th><small>Left shift should be the same for `signed` and `unsigned` (*overrides previous poll*).</th>
<th>6</th><th>5</th><th>5</th><th>0</th><th>1</th></tr>
<tr><th><small>Right shift on a signed integral type should be an arithmetic shift (which sign-extends).</th>
<th>9</th><th>4</th><th>3</th><th>1</th><th>0</th></tr>
<tr><th><small>In C `intN_t` is optional: “These types are optional. However, if an
implementation provides integer types with widths of 8, 16, 32, or 64 bits, no
padding bits, and (for the signed types) that have a two’s complement
representation, it shall define the corresponding typedef names.” Change it to
make `int8_t`, `int16_t`, `int32_t`, and `int64_t` not optional. </th>
<th>1</th><th>0</th><th>2</th><th>3</th><th>12</th></tr>
</table>
### EWG ### {#ewg}
The Evolution Working Group took the following polls:
<table class="def">
<tr><th style="width: 70%;"></th><th>**SF**</th><th>**F**</th><th>**N**</th><th>**A**</th><th>**SA**</th></tr>
<tr><th><small>Disallow extraordinary value (trapping / NaN) for signed integral types.</th>
<th>16</th><th>12</th><th>7</th><th>1</th><th>2</th></tr>
<tr><th><small>Does EWG want to move signed integers to two’s complement, as presented
in the current paper (without extraordinary values)?</th>
<th>11</th><th>17</th><th>7</th><th>2</th><th>1</th></tr>
<tr><th><small>Move to Core.</th>
<th>7</th><th>8</th><th>12</th><th>8</th><th>4</th></tr>
</table>
The resolution on disallowing extraordinary values overrides the lack of
consensus for change from SG6 / SG12.
The decision to not forward to Core was mainly motivated on hearing back from
WG14, which should be done by Rappersvil. EWG will see the paper again in
Rappersvil, and will likely forward to Core at that point in time given the
outcome of the next-to-last poll.
Proposed Wording {#word}
================
Leave the note in Program execution [**intro.execution**] ❡8 as-is:
<blockquote>
[*Note:* Operators can be regrouped according to the usual mathematical
rules only where the operators really are associative or commutative.
For example, in the following fragment
<xmp>
int a, b;
/* ... */
a = a + 32760 + b + 5;
</xmp>
the expression statement behaves exactly the same as
<xmp>
a = (((a + 32760) + b) + 5);
</xmp>
due to the associativity and precedence of these operators. Thus, the result of
the sum `(a + 32760)` is next added to `b`, and that result is then added to 5
which results in the value assigned to `a`. On a machine in which overflows
produce an exception and in which the range of values representable by an `int`
is `[-32768, +32767]`, the implementation cannot rewrite this expression as
<xmp>
a = ((a + b) + 32765);
</xmp>
since if the values for `a` and `b` were, respectively, -32754 and -15, the sum
`a + b` would produce an exception while the original expression would not; nor
can the expression be rewritten either as
<xmp>
a = ((a + 32765) + b);
</xmp>
or
<xmp>
a = (a + (b + 32765));
</xmp>
since the values for `a` and `b` might have been, respectively, 4 and -8 or -17
and 12. However on a machine in which overflows do not produce an exception and
in which the results of overflows are reversible, the above expression statement
can be rewritten by the implementation in any of the above ways because the same
result will occur. —*end note*]
</blockquote>
Modify Fundamental types [**basic.fundamental**] ❡4 onwards:
<blockquote>
Unsigned integers shall obey the laws of arithmetic modulo 2<sup>n</sup> where
*n* is the number of bits in the value representation of that particular size of
integer.
<blockquote>
This implies that unsigned arithmetic does not overflow because a result that
cannot be represented by the resulting unsigned integer type is reduced modulo
the number that is one greater than the largest value that can be represented
by the resulting unsigned integer type.
</blockquote>
Type `wchar_t` is a distinct type whose values can represent distinct codes for
all members of the largest extended character set specified among the supported
locales. Type `wchar_t` shall have the same size, signedness, and alignment
requirements as one of the other integral types, called its *underlying type*.
Types `char16_t` and `char32_t` denote distinct types with the same size,
signedness, and alignment as `uint_least16_t` and `uint_least32_t`,
respectively, in `<cstdint>`, called the underlying types.
Values of type `bool` are either `true` or `false`<sup>†</sup>. [*Note:* There are no
`signed`, `unsigned`, `short`, or `long bool` types or values. —*end note*]
Values of type `bool` participate in integral promotions.<ins> Type `bool`'s
object representation bits shall only contain value bits. The bits of the value
representation of type `bool` shall all be zero for `false`. For `true`, all
bits shall be zero except for the bit corresponding to the least-significant bit
in the object representation of the unsigned type with the same size (or a
bit-field thereof) as `bool`. The program has undefined behavior if any other
object representation is formed.</ins>
<blockquote>
<sup>†</sup> Using a `bool` value in ways described by this International
Standard as “undefined”, such as by examining the value of an uninitialized
automatic object, might cause it to behave as if it is neither `true` nor
`false`.
</blockquote>
Issue: We need to define the storage for `bool` since we define signed and
unsigned below, and that excludes `bool`. The definition I propose is more
restrictive than what we had before because it only allows two values to be
represented, and doesn't allow padding bits. This guarantees that `bool` is
trivially-copyable, and gives it a unique object representation, which as far as
I know all compilers already guaranteed.
Types `bool`, `char`, `char16_t`, `char32_t`, `wchar_t`, and the signed and
unsigned integer types are collectively called *integral* types. A synonym for
integral type is *integer type*. <del>The representations of integral types
shall define values by use of a pure binary numeration system<sup>‡</sup>. [*Example:*
This document permits two's complement, ones' complement and signed magnitude
representations for integral types. —*end example*]</del>
<p><ins>For unsigned integer types, the bits of the object representation are
divided into two groups: value bits and padding bits. There need not be any
padding bits; `unsigned char` shall not have any padding bits. If there are *N*
value bits, each bit shall represent a different power of 2 between 1 and
2<sup>*N*−1</sup>, so that objects of that type shall be capable of representing
values from 0 to 2<sup>*N*</sup>−1 using a pure binary numeration
system<sup>‡</sup>; this shall be known as the value representation. The values
of any padding bits are unspecified.</ins>
<blockquote>
<sup>‡</sup> A positional representation for integers that uses the binary
digits 0 and 1, in which the values represented by successive bits are
additive, begin with 1, and are multiplied by successive integral power of 2,
except perhaps for the bit with the highest position. (Adapted from the
*American National Dictionary for Information Processing Systems*.)
</blockquote>
Issue: An intermediate revision of this paper stated "Value bits are store
contiguously in memory" in an attempt to preserve different endiannesses, and
otherwise restricts implementations to "sane" layout. However, that change
wasn't presented to EWG and received pushback on the SG6 reflector. Would this
be a desirable addition, should it be in a separate paper, or does it
overconstrain implementations?
<p><ins>Signed integer types shall be repesented as two's complement. The bits
of signed integer types' object representation are divided into three groups:
value bits, padding bits, and the sign bit. There need not be any padding bits;
`signed char` shall not have any padding bits. There shall be exactly one sign
bit. Each bit that is a value bit shall have the same value as the same bit in
the object representation of the corresponding unsigned type (if there are *M*
value bits in the signed type and *N* in the unsigned type, then *M = N-1*).
The value bit in the unsigned representation with value 2<sup>N-1</sup>
corresponds to the sign bit in the signed representation, where it has value
-2<sup>N-1</sup>. [*Note:* Unless otherwise specified, if a signed operation
would naturally produce a value that is not within the range of the result
type, the behavior is undefined (see [**expr.pre**] 8). —*end note*]</ins>
Issue: note *M = N-1*, whereas C says *M ≤ N*. I derive this from "For each of
the standard signed integer types, there exists a corresponding (but different)
standard unsigned integer type [...] <strong>each of which occupies the same
amount of storage</strong>".
<p><ins>All combinations of value bits and sign bit are valid and result in a
specific distinct integer value. No other bits factor into an integer's value.
[*Note:* Therefore, none of the integral types have extraordinary values as
defined in ISO C. —*end note*]</ins>
Issue: In C11 implementation requirements call it the "extraordinary value" and
refer to 6.2.6.2 which calls it "is a trap representation or a normal
value". Furthermore, in note 53 there's wording around extraordinary values
being held in padding bits (not just as a value stolen from the value
representation), and since C++ wording used to only talk about "binary
representation" it'd rather be very clear about the absence of extraordinary
values. It's unclear whether SG6 / SG12 guidance explicitly wanted to disallow
padding bits from being special and trapping. I propose disallowing it.
</blockquote>
Modify Integral conversions [**conv.integral**] ❡1 onwards:
<blockquote>
A prvalue of an integer type can be converted to a prvalue of another integer
type. A prvalue of an unscoped enumeration type can be converted to a prvalue of
an integer type.
<del>If the destination type is unsigned, the resulting value is the least unsigned
integer congruent to the source integer (modulo 2<sup>n</sup> where *n* is the
number of bits used to represent the unsigned type). [*Note:* In a two's
complement representation, this conversion is conceptual and there is no change
in the bit pattern (if there is no truncation). —*end note*]</del>
<del>If the destination type is signed, the value is unchanged if it can be
represented in the destination type; otherwise, the value is
implementation-defined.</del>
<p><ins>For integral types other than `bool`, the result is the unique value of
the destination type that is congruent to the source integer modulo
2<sup>N</sup>, where *N* is the number of value bits in the destination
type. [*Note:* This conversion is conceptual; there is no change in the bit
pattern other than that high-order bits may either be discarded or added. All
added high-order bits will have the same value. —*end note*]</ins>
</blockquote>
Modify Static cast [**expr.static.cast**] ❡9 onwards:
<blockquote>
A value of a scoped enumeration type can be explicitly converted to an integral
type<del>. When that type is *cv* `bool`, the resulting value is `false` if the
original value is zero and `true` for all other values. For the remaining
integral types, the value is unchanged if the original value can be represented
by the specified type. Otherwise, the resulting value is
unspecified.</del><ins>; the result is the same as that of converting from the
enumeration's underlying type and then to the destination type.</ins> A value of
a scoped enumeration type can also be explicitly converted to a floating-point
type; the result is the same as that of converting from the original value to
the floating-point type.
Issue: the above change is for enumeration *to* integer, which SG6 did not
object to changing as suggested. It states the conversion in terms of the
updated [**conv.integral**] rules.
A value of integral or enumeration type can be explicitly converted to a
complete enumeration type. If the enumeration type has a fixed underlying type,
the value is first converted to that type by integral conversion, if necessary,
and then to the enumeration type. If the enumeration type does not have a fixed
underlying type, the value is unchanged if the original value is within the
range of the enumeration values, and otherwise, the behavior is undefined. A
value of floating-point type can also be explicitly converted to an enumeration
type. The resulting value is the same as converting the original value to the
underlying type of the enumeration, and subsequently to the enumeration type.
Issue: the above unmodified paragraph is for integer *to* enumeration which SG6
voted to leave undefined in poll "Cast to enums outside of the enum’s
representable range should be defined instead of undefined behavior".
</blockquote>
Modify Shift operators [**expr.shift**] ❡1 onwards:
<blockquote>
The operands shall be of integral or unscoped enumeration type and integral
promotions are performed. The type of the result is that of the promoted left
operand. The behavior is undefined if the right operand is negative, or greater
than or equal to the length in bits of the promoted left operand.
The value of `E1 << E2` is `E1` left-shifted `E2` bit positions; vacated bits
are zero-filled. <del>If `E1` has an unsigned type, the</del><ins>The</ins>
value of the result is E1×2<sup>E2</sup>, reduced modulo <del>one more than the
maximum value representable in the result type. Otherwise, if `E1` has a signed
type and non-negative value, and E1×2<sup>E2</sup> is representable in the
corresponding unsigned type of the result type, then that value, converted to
the result type, is the resulting value; otherwise, the behavior is
undefined.</del><ins>2<sup>N</sup>, where *N* is the number of value
bits in the result type.</ins>
Issue: updated to reflect poll "Left shift should be the same for signed and
unsigned".
The value of `E1 >> E2` is `E1` right-shifted `E2` bit positions. <del>If `E1`
has an unsigned type or if `E1` has a signed type and a non-negative value,
the</del><ins>The</ins> value of the result is <del>the integral part of the
quotient of <code>E1/2<sup>E2</sup></code>. If `E1` has a signed type and a
negative value, the resulting value is implementation-defined.</del><ins>
<code>E1 / 2<sup>E2</sup></code>, rounded down. [*Note*: Right-shift on signed
integral types is an arithmetic right shift, which performs sign-extension.
—*end note*]</ins>
</blockquote>
Leave Constant expressions [**expr.const**] ❡2 as-is:
<blockquote>
An expression `e` is a *core constant expression* unless the evaluation of `e`,
following the rules of the abstract machine, would evaluate one of the following
expressions:
[…]
* an operation that would have undefined behavior as specified in Clause 4
through 19 of this document [*Note*: including, for example, signed
integer overflow, certain pointer arithmetic, division
by zero, or certain shift operations —*end note*]
</blockquote>
Modify Enumeration declarations [**dcl.enum**] ❡8:
<blockquote>
For an enumeration whose underlying type is fixed, the values of the enumeration
are the values of the underlying type. Otherwise, for an enumeration where
e<sub>min</sub> is the smallest enumerator and e<sub>max</sub> is the largest,
the values of the enumeration are the values in the range b<sub>min</sub> to
b<sub>max</sub>, defined as follows: <del>Let *K* be 1 for a two's complement
representation and 0 for a ones' complement or sign-magnitude representation.</del>
b<sub>max</sub> is the smallest value greater than or equal to
max(|e<sub>min</sub>| - <del>K</del><ins>1</ins>, |e<sub>max</sub>|) and equal to 2<sup>M</sup>-1,
where *M* is a non-negative integer. b<sub>min</sub> is zero if e<sub>min</sub>
is non-negative and -(b<sub>max</sub>+<del>K</del><ins>1</ins>) otherwise. The size of the smallest
bit-field large enough to hold all the values of the enumeration type is
max(M,1) if b<sub>min</sub> is zero and M+1 otherwise. It is possible to define
an enumeration that has values not defined by any of its enumerators. If the
*enumerator-list* is empty, the values of the enumeration are as if the
enumeration had a single enumerator with value 0.
</blockquote>
Leave `numeric_limits` members [**numeric.limits.members**] ❡61 onwards as-is:
<blockquote>
<xmp>static constexpr bool is_modulo;</xmp>
`true` if the type is modulo. A type is modulo if, for any operation involving
`+`, `-`, or `*` on values of that type whose result would fall outside the
range `[min(), max()]`, the value returned differs from the true value by
an integer multiple of `max() - min() + 1`.
[*Example:*`is_modulo` is `false` for signed integer types unless an
implementation, as an extension to this document, defines signed integer
overflow to wrap. —*end example*]
Meaningful for all specializations.
</blockquote>
Modify Type properties [**meta.unary.prop**] ❡9 as follows:
<blockquote>
The predicate condition for a template specialization `has_unique_object_representations<T>` shall be satisfied if and only if:
* `T` is trivially copyable, and
* any two objects of type `T` with the same value have the same object
representation, where two objects of array or non-union class type are
considered to have the same value if their respective sequences of direct
subobjects have the same values, and two objects of union type are
considered to have the same value if they have the same active member and
the corresponding members have the same value.
The set of scalar types for which this condition holds is implementation-defined.
[*Note:* If a type has padding bits, the condition does not hold; otherwise, the
condition holds true for <ins>`bool`, signed, and </ins>unsigned integral types.
—*end note*]
Issue: this was missing from P0907r0. Adding signed here seems like a
no-brainer. GCC and LLVM currently return `true` for `bool` (and it currently
isn't implemented in MSVC). [Try it out](https://godbolt.org/g/pxaidA).
</blockquote>
Modify Class template `ratio` [**ratio.ratio**] ❡1 as follows:
<blockquote>
If the template argument `D` is zero or the absolute values of either of the
template arguments `N` and `D` is not representable by type `intmax_t`, the
program is ill-formed. [*Note:* These rules ensure that infinite ratios are
avoided and that for any negative input, there exists a representable value of
its absolute value which is positive. <del>In a two's complement representation,
this</del><ins>This</ins> excludes the most negative value. —*end note*]
</blockquote>
Modify Specializations for integers [**atomics.types.int**] ❡7 and ❡8 as
follows:
<blockquote>
*Remarks:* For signed integer types, <del>arithmetic is defined to use two's
complement representation. </del><ins>the result is as if the object value and
parameters were converted to their corresponding unsigned types, the computation
performed on those types, and the result converted back to the signed type.
[*Note:* </ins> There are no undefined results<ins> arising from the computation.
—*end note*]</ins>
Issue: the operations this applies to are add, or, and, sub, xor, and is only
meaningful for add and sub.
<xmp>
T operator op=(T operand) volatile noexcept;
T operator op=(T operand) noexcept;
</xmp>
*Effects:* Equivalent to:
<del>`return fetch_key(operand) op operand;`</del>
<ins>`return static_cast<T>(static_cast<make_unsigned_t<T>>(fetch_key(operand)) op static_cast<make_unsigned_t<T>>(operand));`</ins>
Issue: there's an outstanding defect report for this [[LWG3047]], whose
resolution should be updated as above.
</blockquote>
Out of Scope {#scope}
============
This proposal focuses on the representation of signed integers, and on
tightening the specification when that representation is constrained to two's
complement. It is out of scope for this proposal to deal with related issues
which have more to them than simply the representation of signed integers.
A non-comprehensive list of items left purposefully out:
* Left and right shift with a right-hand-side equal to or wider than the
bit-width of the left-hand-side.
* Integral division or modulo by zero.
* Integral division or modulo of the signed minimum integral value for a
particular integral type by minus one.
* Overflow of pointer arithmetic.
* Library solution for ones' complement integers.
* Library solution for signed magnitude integers.
* Library solution for two's complement integers with trapping or undefined
overflow semantics.
* Language support for explicit signed overflow truncation such as Swift's
(`&+`, `&-`, and `&*`), or complementary trapping overflow operators.
* Library or language support for saturating arithmetic.
* Mechanism to let the compiler assume that integers, signed or unsigned, do
not experience signed or unsigned wrapping for:
* A specific integral variable.
* All integral variables (à la `-ftrapv`, `-fno-wrapv`, and `-fstrict-overflow`).
* A specific loop's induction variable.
* Mechanism to have the compiler list places where it could benefit from
knowing that overflow cannot occur (à la `-Wstrict-overflow`).
* Endianness of integral storage (or endianness in general).
* Bits per bytes, though we all know there are eight.
These items could be tackled in separate proposals, unless the committee wants
them tackled here. This paper expresses no preference in whether they should be
addressed or how.
C Signed Integer Wording {#c-sign}
========================
The following is the wording on integers from the C11 Standard.
<blockquote>
For unsigned integer types other than unsigned char, the bits of the object
representation shall be divided into two groups: value bits and padding bits
(there need not be any of the latter). If there are *N* value bits, each bit
shall represent a different power of 2 between 1 and 2<sup>N−1</sup>, so that
objects of that type shall be capable of representing values from 0 to
2<sup>N</sup> − 1 using a pure binary representation; this shall be known as
the value representation. The values of any padding bits are unspecified.
For signed integer types, the bits of the object representation shall be
divided into three groups: value bits, padding bits, and the sign bit. There
need not be any padding bits; `signed char` shall not have any padding bits.
There shall be exactly one sign bit. Each bit that is a value bit shall have
the same value as the same bit in the object representation of the
corresponding unsigned type (if there are *M* value bits in the signed type
and *N* in the unsigned type, then M ≤ N). If the sign bit is zero, it shall
not affect the resulting value. If the sign bit is one, the value shall be
modified in one of the following ways:
* the corresponding value with sign bit 0 is negated (*sign and magnitude*);
* the sign bit has the value −(2<sup>M</sup>) (*two’s complement*);
* the sign bit has the value −(2<sup>M</sup> − 1) (*ones’ complement*).
Which of these applies is implementation-defined, as is whether the value with
sign bit 1 and all value bits zero (for the first two), or with sign bit and
all value bits 1 (for ones’ complement), is a trap representation or a normal
value. In the case of sign and magnitude and ones’ complement, if this
representation is a normal value it is called a *negative zero*.
If the implementation supports negative zeros, they shall be generated only
by:
* the `&`, `|`, `^`, `~`, `<<`, and `>>` operators with operands that produce such a value;
* the `+`, `-`, `*`, `/`, and `%` operators where one operand is a negative zero and the result is zero;
* compound assignment operators based on the above cases.
It is unspecified whether these cases actually generate a negative zero or a
normal zero, and whether a negative zero becomes a normal zero when stored in
an object.
If the implementation does not support negative zeros, the behavior of the
`&`, `|`, `^`, `~`, `<<`, and `>>` operators with operands that would produce
such a value is undefined.
The values of any padding bits are unspecified. A valid (non-trap) object
representation of a signed integer type where the sign bit is zero is a valid
object representation of the corresponding unsigned type, and shall represent
the same value. For any integer type, the object representation where all the
bits are zero shall be a representation of the value zero in that type.
The *precision* of an integer type is the number of bits it uses to represent
values, excluding any sign and padding bits. The *width* of an integer type is
the same but including any sign bit; thus for unsigned integer types the two
values are the same, while for signed integer types the width is one greater
than the precision.
</blockquote>
Survey of Signed Integer Representations {#survey}
========================================
Here is a non-comprehensive history of signed integer representations:
* Two's complement
* John von Neumann suggested use of two's complement binary representation in his 1945 First Draft of a Report on the EDVAC proposal for an electronic stored-program digital computer.
* The 1949 EDSAC, which was inspired by the First Draft, used two's complement representation of binary numbers.
* Early commercial two's complement computers include the Digital Equipment Corporation PDP-5 and the 1963 PDP-6.
* The System/360, introduced in 1964 by IBM, then the dominant player in the computer industry, made two's complement the most widely used binary representation in the computer industry.
* The first minicomputer, the PDP-8 introduced in 1965, uses two's complement arithmetic as do the 1969 Data General Nova, the 1970 PDP-11.
* Ones' complement
* Many early computers, including the CDC 6600, the LINC, the PDP-1, and the UNIVAC 1107.
* Successors of the CDC 6600 continued to use ones' complement until the late 1980s.
* Descendants of the UNIVAC 1107, the UNIVAC 1100/2200 series, continue to do so, although ClearPath machines are a common platform that implement either the 1100/2200 architecture (the ClearPath IX series) or the Burroughs large systems architecture (the ClearPath NX series). Everything is common except the actual CPUs, which are implemented as ASICs. In addition to the IX (1100/2200) CPUs and the NX (Burroughs large systems) CPU, the architecture had Xeon (and briefly Itanium) CPUs. Unisys' goal was to provide an orderly transition for their 1100/2200 customers to a more modern architecture.
* Signed magnitude
* The IBM 700/7000 series scientific machines use sign/magnitude notation, except for the index registers which are two's complement.
<a href="https://en.wikipedia.org/wiki/Two%27s_complement">Wikipedia</a> offers
more details and has comprehensive sources for the above.
Thomas Rodgers surveyed popular DSPs and found the following:
* SHARC family ships
[a C++ compiler](http://www.analog.com/media/en/dsp-documentation/software-manuals/cces-SharcCompiler-manual.pdf)
which supports C++11, and where signed integers are two's complement.
* Freescale/NXP ships
[a C++ compiler](https://www.nxp.com/docs/en/reference-manual/SC_COMPILER_USERS_GUIDE.pdf)
which supports C++03, and where signed integers are two's complement
([reference](https://www.nxp.com/docs/en/reference-manual/MNSC140CORE.pdf)).
* Texas Instruments ships
[a C++ compiler](http://www.ti.com/lit/ug/slau132r/slau132r.pdf) which
supports C++14, and where signed integers are two's complement.
* CML Microcircuits has fixed ASIC for radio processing, and doesn't seem to
support C++.
* [Cirrus Logic](https://www.cirrus.com/products/cs4970xx/) (formerly Wolfson)
has
[a C compiler only](https://d3uzseaevmutz1.cloudfront.net/pubs/manual/cirrus_c_compiler_um15.pdf)
which is probably two's complement.
* Synaptics (formerly Connexant) makes audio input subsystem for voice
assistants. The DSP runs fixed far-field signal processing algorithms and
has programmable functions which run on standard ARM controller, using
Raspbian.
In short, the only machine the author could find using non-two's complement are
made by Unisys, and no counter-example was brought by any member of the C++
standards committee. Nowadays Unisys emulates their old architecture using x86
CPUs with attached FPGAs for customers who have legacy applications which
they've been unable to migrate. These applications are unlikely to be well
served by modern C++, signed integers are the least of their
problem. Post-modern C++ should focus on serving its existing users well, and
incoming users should be blissfully unaware of integer esoterica.
<pre class=biblio>
{
"C11": {
"href": "http://www.open-std.org/jtc1/sc22/wg14/www/docs/n1570.pdf",
"title": "Programming Languages — C",
"publisher": "ISO/IEC JTC1 SC22 WG14"
},
"MSVC": {
"href": "https://docs.microsoft.com/en-us/cpp/c-language/integers",
"title": "MSVC C Implementation-Defined Behavior: Integers"
},
"GCC": {
"href": "https://gcc.gnu.org/onlinedocs/gcc/Integers-implementation.html",
"title": "GCC C Implementation-Defined Behavior: Integers"
},
"LLVM": {
"href": "https://llvm.org/docs/LangRef.html",
"title": "LLVM Language Reference Manual"
},
"TAoCP": {
"authors": ["Donald Knuth"],
"title": "The Art of Computer Programming, Volume 2 (3rd Ed.): Seminumerical Algorithms",
"rawDate": "1997",
"isbn": "0-201-89684-2",
"publisher": "Addison-Wesley Longman Publishing Co., Inc."
},
"N2218": {
"authors": ["JF Bastien"],
"title": "Signed integers are two's complement",
"publisher": "ISO JTC1/SC22/WG14: Programming Language C",
"href": "http://www.open-std.org/jtc1/sc22/wg14/www/docs/n2218.html"
}
}
</pre>