<div style="text-align:center;">
    <img src="http://www.cs.wm.edu/~rml/images/wm_horizontal_single_line_full_color.png">
    <h1>CSCI 312, Fall 2025</h1>
    <h1>Effective C, Chapter 3</h1>
    <h1>Arithmetic types!</h1>
</div>

# Contents
* [Sizes of types](#Sizes-of-types)
* [Integers](#Integers)
    * [`char`](#char)
    * [`short int`](#short-int)
    * [`int`](#int)
    * [`long int`](#long-int)
    * [`long long int`](#long-long-int)
* [Specification of integers in the C standard](#Specification-of-integers-in-the-C-standard)
    * [What to expect in the wild](#What-to-expect-in-the-wild)
* [Yet more integers!](#Yet-more-integers!)
    * [Exact-width integers](#Exact-width-integers)
    * [Bit-precise integers](#Bit-precise-integers)
* [Integer overflow](#Integer-overflow)
* [Storage classes](#Storage-classes)
* [Arithmetic](#Arithmetic)
* [Type conversions 🐞🐞🐞🐞](#Type-conversions-🐞🐞🐞🐞)
* [Integer conversions 🐞😱🐞😱](#Integer-conversions-🐞😱🐞😱)
* [Floating point in C/C++](#Floating-point-in-C/C++)

# Python vs. C vs. C++ vs. Rust vs. Typescript

|           | Python          | C            | C++   | Rust  | Typescript  |
| :--------:| :-------------: | :----------: | :---: | :---: | :---------: |
| integer representation | ??? | two's complement<sup>0</sup> | same as C | same as C |
| addition, subtraction, multiplication | `+,-,*`    | same  | same | same |
| regular division | `/` | same<sup>1</sup> | same as C | same as C |
| integer division | `//` | `/`<sup>2</sup> | same as C | same as C |
| remainder | `%` | same | same | same |
| software integers | `int`   |    |
| hardware integers |        | `char` | same as C | |
|           |        | `unsigned char` | same as C | |
|           |        | `signed char` | same as C | `byte` |
|           |        | `short` or `short int` or `signed short int` | same as C | `short` |
|           |        | `unsigned short` or `unsigned short int` | same as C | |
|           |        | `int` or `signed int` | same as C | `int` |
|           |        | `unsigned int` | same as C | |
|           |        | `long` or `long int` or `signed long int` | same as C | `long` |
|           |        | `unsigned long` or `unsigned long int` | same as C | |
|           |        | `long long` or `signed long long` or `signed long long int` | same as C | |
|           |        | `unsigned long long` or `unsigned long long int` | same as C | |
| exact-width integers<sup>3</sup> |        | signed: `intN_t`, N = 8, 16, 32, 64 | same as C | `iN`, N = 8, 16, 32, 64 | |
|           |        | unsigned: `uintN_t`, N = 8, 16, 32, 64  | same as C | `uN`, N = 8, 16, 32, 64 |
| bit-precise integers | | `_BitInt(n)`, `n` &ge; 1 | | | 

<sup>0</sup> As of C23. <br/>
<sup>1</sup> When one or both operands is non-integer. <br/>
<sup>2</sup> When both operands are integers. <br/>
<sup>3</sup> These are implementation dependent and might not be present.

# Sizes of types

If you program in C you will find yourself greatly concerned with the number of bytes that objects occupy in memory. We will refer to this quantity as the **width** or **length** or **size** of the object or type.  Keep in mind that you can represent $2^{n}$ distinct bit patterns using $n$ bits.

You can determine the width of a variable or a type using <code class="kw">sizeof()</code>.  For example, here are the widths of various flavors of integers:

In [None]:
cat -n src/int_widths.c

In [None]:
gcc src/int_widths.c

In [None]:
./a.out

🐞🐞🐞🐞 Be careful &ndash; <code>sizeof()</code> is not like Python's <code>len()</code> function.  Instead, it is like Python's
[<tt>sys.getsizeof()</tt>](https://docs.python.org/3/library/sys.html#sys.getsizeof) function. 🐞🐞🐞🐞

# Integers

C/C++ has more flavors of integers than you can shake a stick at.  The various types of integers differ in how many bits they comprise in hardware. 

This is in contrast to Python, where integers are implemented in software.  In particular, each type of integer in C/C++ can only represent a bounded range of numbers, while Python's integer system is unbounded.

Because C/C++ integers depend on the underlying hardware, code may behave differently on different machines.  That said, as processors become more and more standardized the likelihood of being surprised decreases.

In [None]:
cat -n src/int_widths.c

In [None]:
./a.out

Integers can be either **signed** or **unsigned**:
* A signed integer can be either negative or non-negative. One of its bits is interpreted as a sign bit.  
* An unsigned integer is **always nonnegative** (i.e., `>= 0`).  The bit that would otherwise be used to indicate the sign is used to double the range of nonnegative integers we can represent.

The different widths of integers allow us to optimize the use of memory.  For instance, pixel intensities in computer images are often the 256 = 2<sup>8</sup> integers in the range from 0 to 255.  This means they will fit in a one-byte (8 bit) integer.

Some of the types are synonyms:
* `short` = `short int` = `signed short` = `signed short int`
* `unsigned short` = `unsigned short int`
* `int` = `signed int`
* `long` = `long int` = `signed long int`
* `unsigned long` = `unsigned long int`
* `long long` = `long long int` = `signed long long int`
* `unsigned long long` = `unsigned long long int`

## `char`

The C standard guarantees that `sizeof(char) == 1` byte.  On consumer CPUs (e.g., x86-64 or ARM) a byte is 8 bits.

<img src="https://www.cs.wm.edu/~rml/images/danger.svg" style="height: 30px;"/> However, keep in mind the C standard does not fix how many bits are in a byte; it only requires at least 8.

On some DSP chips and microcontrollers the number of bits may not be 8 (e.g., 16).

<img src="https://www.cs.wm.edu/~rml/images/danger.svg" style="height: 30px;"/> The C standard does not specify whether `char` is signed or unsigned, so you will need to declare variables `signed char` or `unsigned char` when you want to use them to store small integers.

## `short int`

A `short int` is intended to have fewer bits than an `int`. You can use `short` rather than `short int`. There is also an `unsigned short int`.

## `int`

The basic kind of integer is the signed integer `int`.  There is also an `unsigned int`.  An `unsigned int` literal is indicated with a terminal `u` or `U`:

```
2147483648u;
2147483648U;
```

Both <code class="kw">int</code> and <code class="kw">unsigned int</code> have the same width; the difference is how the bit pattern is interpreted.  Here we print the signed value -1 as a signed integer (format code ```%d```, for "decimal integer") and an unsigned integer (format code ```%u```, for "unsigned integer"):

In [None]:
cat -n src/int_format.c

In [None]:
clang -Wall src/int_format.c

In [None]:
./a.out

This illustrates once again that C views variables simply as strings of bits.  In the call to `printf()` in line 8 the `%d` code tells `printf()` to interpret the 32 bit input `n` as a signed integer, while in the call in line 11 the `%u` code tells `printf()` to interpret the 32 bit input `n` as an unsigned integer.

## `long int`

Both ```long int``` and ```long long int``` are intended to have more bits than the ```int``` type.  The ```long int``` and ```long long int``` may be referred to as ```long``` and ```long long``` for short, though I usually use `long int` simply for consistency with other types.

On 64-bit Linux and macOS a `long int` is usually 64 bits (8 bytes), while on 64-bit Windows it is 32 bits.  A `long long int` is usually 64 bits as well &ndash; it is not wider than a 64-bit `long int` because there are currently no processors that support 128 bit hardware integers (in the past there were).

A `long int` literal is indicated by a suffix of `l` or `L`:

```
2147483648l;  /* Don't use this since lower-case 'l' looks like a '1'. */
2147483648L;
```

An `unsigned long int` literal is indicated by a suffix of `ul` or `UL`:

```
2147483648ul;  /* Lower-case 'l' is tolerable since it's preceded by 'u'. */
2147483648UL;
```

## `long long int`

For `long long int` and `unsigned long long int` literals we use double ells:

```
2147483648ll;  /* long long */
2147483648LL;

2147483648ull; /* unsigned long long */
2147483648ULL;
```


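<img src="https://www.cs.wm.edu/~rml/images/danger.svg" style="height: 30px;"/>  By default, an unsuffixed decimal integer literal in C has type `int` if its value fits in an `int`; otherwise it takes the first of `long int` and `long long int` that can represent it.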

# Specification of integers in the C standard

The C standard specifies the following **minimal** requirements for the various types of integers.  The actual limits depend on the underlying hardware.  The implementation-defined values must be greater than or equal in magnitude (absolute value) to those shown in the table, and have the same sign.

| Limit | Value | Formula |
| :-------- | :-------------: | :----------: |
| number of bits for smallest object that is not a bit-field (byte) | 8 | |
| minimum value for a ```signed char``` | ```-127``` | ```−(2**7 − 1)``` |
| maximum value for a ```signed char``` | ```+127``` | ```+(2**7 − 1)``` |
| maximum value for a ```unsigned char``` | ```255``` | ```2**8 − 1``` |
| minimum value for a ```char``` | see below | |
| maximum value for a ```char``` | see below | |
| maximum number of bytes in a multibyte character | 1 | |
| minimum value for a ```short int``` | ```-32767``` | ```−(2**15 − 1)``` |
| maximum value for a ```short int``` | ```+32767``` | ```+(2**15 − 1)``` |
| maximum value for a ```unsigned short int``` | ```65535``` | ```2**16 − 1``` |
| minimum value for an ```int``` | ```-32767``` | ```−(2**15 − 1)``` |
| maximum value for an ```int``` | ```+32767``` | ```+(2**15 − 1)``` |
| maximum value for an ```unsigned int``` | ```65535``` | ```2**16 − 1``` |
| minimum value for a ```long int``` | ```-2147483647``` | ```−(2**31 − 1)``` |
| maximum value for a ```long int``` | ```+2147483647``` | ```+(2**31 − 1)``` |
| maximum value for an ```unsigned long int``` | ```4294967295``` | ```2**32 − 1``` |
| minimum value for a ```long long int``` | ```-9223372036854775807``` | ```−(2**63 − 1)``` |
| maximum value for a ```long long int``` | ```+9223372036854775807``` | ```+(2**63 − 1)``` |
| maximum value for an ```unsigned long long int``` | ```18446744073709551615``` | ```2**64 − 1``` |

This means that each type must be wide enough to represent the following ranges:

|           | minimum value | maximum value | minimum num. of bytes |
| :-------- | :-------------: | :----------: | :----------: |
| ```signed char``` | -127 | 127 | 1 |
| ```unsigned char``` | 0 | 255 | 1 | 
| ```char``` | see below | see below | 1 | 
| ```short int``` | -32,767 | 32,767 | 2 |
| ```unsigned short int``` | 0 | 65,535 | 2 |
| ```int``` | -32,767 | 32,767 | 2 | 
| ```unsigned int``` | 0 | 65,535 | 2 |
| ```long int``` | -2,147,483,647 | 2,147,483,647 | 4 |
| ```unsigned long int``` | 0 | 4,294,967,295 | 4 | 
| ```long long int``` | -9,223,372,036,854,775,807 | 9,223,372,036,854,775,807 | 8 |
| ```unsigned long long int``` | 0 | 18,446,744,073,709,551,615 | 8 |

If the value of a ```char``` is treated as signed then the minimum and maximum values of a ```char``` are the same as those of a ```signed char```. Otherwise, the minimum value of a ```char``` is ```0``` and the maximum value is that of an ```unsigned char```.

The header file [`limits.h`](https://en.wikipedia.org/wiki/C_data_types#limits.h) contains the ranges for integers for your specific C implementation.

## What to expect in the wild

Despite the latitude in the lengths of integers allowed by the C standard, on contemporary 64-bit consumer processors running Linux or macOS you will encounter the following (on 64-bit Windows, `long` is 4 bytes):

|   Type     |     bytes       |  bits        |
| :--------: | :-------------: | :----------: |
| ```char``` | 1 | 8 | 
| ```short``` | 2 |16 |
| ```int``` | 4 | 32 |
| ```long``` | 8 | 64 |
| ```long long``` | 8 | 64 |

# Yet more integers!

To add to the fun, the C standard specifies yet more types of integers that might or might not be present in your implementation as they are not mandatory.  There are also integer types whose behavior is purely up to the implementation (compiler).

## Exact-width integers

You can use exact-width integers to make sure, for instance, you've got an 8-bit integer (as opposed to a 1-byte integer, which could have more than 8 bits).

In [None]:
cat -n src/exact_width.c

In [None]:
gcc -std=c23 src/exact_width.c

In [None]:
./a.out

## Bit-precise integers

C23 introduced **bit-precise integers**.  These allow you to specify the exact number of bits to use.

In [None]:
cat -n src/bit_precise.c

In [None]:
gcc -std=c23 src/bit_precise.c

In [None]:
./a.out

# Integer overflow

If a value lies outside the range of representable integers, strange things occur:

In [None]:
cat -n src/int_overflow.c

In [None]:
gcc src/int_overflow.c

In [None]:
./a.out

**'Zounds!**

This phenomenon is called **integer overflow** and is a very real danger.  Integer variables that are used as counters that run a long time are particularly prone to overflow.  The results of integer overflow range 
from [the amusing](https://www.cbc.ca/news/entertainment/psy-s-gangnam-style-breaks-the-limit-of-youtube-s-video-counter-1.2860186)
to the [disruptive](https://www.bleepingcomputer.com/news/microsoft/microsoft-exchange-year-2022-bug-in-fip-fs-breaks-email-delivery/)
to the [potentially](http://www.cs.wm.edu/~rml/teaching/c/docs/787_overflow.pdf) [disastrous](https://en.wikipedia.org/wiki/Year_2038_problem).

Overflow can also occur if you assign an integer literal that lies outside the range of the variable's type, as in line 15 above.  In this situation the compiler is able to detect the potential problem and issues a warning.

On most contemporary processors $4294967295 = 2^{32} - 1$ is converted to $-1$.  The reason has to do with the way signed integers are represented in hardware.  Most current architectures use what is called the [two's complement](https://en.wikipedia.org/wiki/Two's_complement) representation, and in that system the bit pattern for $4294967295$,
```
11111111 11111111 11111111 11111111  /* All 32 bits on. */
```
is, when interpreted as a signed integer, $-1$.  (In case you were wondering, the competing system is called [one's complement](https://en.wikipedia.org/wiki/Ones'_complement).)

<div class="danger"></div><div class="danger"></div>
😱🐞 &nbsp; Watch out for integer overflow!! Using <code>long int</code> can help avoid overflow. &nbsp; 🐞😱

# Storage classes

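C has a number of keywords that describe the desired storage for a variable or how it may be accessed (strictly speaking, `const` and `volatile` are type qualifiers rather than storage classes).  They include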
* `const`: indicates the value of the variable cannot change;
* `constexpr`: indicates a value to be computed at compile time;
* `static`: indicates a persistent value or one with file scope;
* `extern`: indicates a global variable;
* `register`: hints that the variable should be stored in a register;
* `volatile`: hints that the variable may be modified in a non-obvious manner.

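We will not say much about <code class="kw">register</code> and <code class="kw">volatile</code>: compilers are free to ignore the <code class="kw">register</code> hint, whereas <code class="kw">volatile</code> must be honored, since it forbids the compiler from optimizing away accesses to the variable.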

The most useful and most commonly used of these is `const`.

In [None]:
cat -n src/const.c

In [None]:
clang -Wall -pedantic src/const.c

The `static` storage class can either indicate
* a variable with file scope (i.e., visible to all functions in a single file), or
* a variable local to a function whose value persists after the function returns.

In [None]:
cat -n src/static.c

In [None]:
gcc src/static.c

In [None]:
./a.out

# Arithmetic

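Arithmetic in C works much the same as in Python, with one exception: watch out for integer division in C/C++/Java.  When **both** the numerator and the denominator are integers, division discards the fractional part; for non-negative operands this matches floor division in Python, but when the quotient is negative C truncates toward zero whereas Python rounds down.  Otherwise, it behaves like regular division.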

Thus, the expression ```5 / 9``` in C is the same as ```5 // 9``` in Python:

In [None]:
cat -n src/floor_div.c

In [None]:
gcc src/floor_div.c

In [None]:
./a.out

# Type conversions 🐞🐞🐞🐞

One of the trickier matters in C is type conversion.  Types can be converted either explicitly, by you, or implicitly, by C.

A **cast** is an explicit conversion in which you specify the type to convert to.  The syntax is
```
(type) expression
```
to obtain the value of `expression` as if it were assigned to a variable of type `type`.  Thus,
```
int n;
double x;
x = (double) n;
```
explicitly converts the value of an `int` to a `double`.

C will also perform implicit conversions (which some call **type coercion**).  For instance, 
```
int n;
double x;
x = n;
```
will also result in a conversion of the value of `n` to a `double`.

Implicit conversions can also occur when performing arithmetic.  K&amp;R give the following informal conversion rules **provided no `unsigned` variables are involved**:
<blockquote>
    <ul>
        <li>If either operand is <code>long double</code> convert the other operand to <code>long double</code>.</li>
        <li>Otherwise, if either operand is <code>double</code> convert the other operand to <code>double</code>.</li>
        <li>Otherwise, if either operand is <code>float</code> convert the other operand to <code>float</code>.</li>
        <li>Otherwise, convert <code>char</code> and <code>short</code> to <code>int</code>.</li>
        <li>Then, if either operand is <code>long</code> convert the other to <code>long</code>.</li>
    </ul>
</blockquote>

If any of the variables are `unsigned` then the conversion rules get complicated.  I would advise against mixing signed and unsigned types unless you are an expert (and even then there may be problems).

A **widening conversion** or **promotion** is a conversion from one data type to another that can represent all possible values of the original type.  These conversions are safe.  The conversions described in the quote above are widening conversions (except for a few unlikely situations).

A **narrowing conversion** is a conversion from one data type to another that **cannot** represent all possible values of the original type.  If we try to convert a value that cannot be exactly represented in the target type, there is no telling what might happen!  For instance, what happens if we try to store 1024 in an 8-bit integer?  Or assign an `unsigned int` a negative value?

# Integer conversions 🐞😱🐞😱

**Unfortunately, type conversion can be dangerous.**

In [None]:
cat -n src/int_casts.c

The implicit conversion at line 17 is a narrowing conversion:
```
uc = ui;
```

In [None]:
gcc -Wall src/int_casts.c
./a.out

## Integer conversion rules 🐞🐞🐞🐞

The general rule for converting one type of integer into another is that the value of the result should be the same as the value of the original, **if that is possible**.  Here we will look at some cases where conversion behaves as we would expect, and some cases where it does not.

If it is **not possible** to preserve the original value, then the conversion follows the following rules.

**Rule 1**. If the result is signed, then the conversion is considered to have overflowed **and the result is implementation-defined** (the standard also permits an implementation-defined signal to be raised).  The actual behavior will depend on the C implementation and underlying hardware.

Here we show what can go wrong due to overflow.

In [None]:
cat -n src/conversion_1.c

In [None]:
gcc src/conversion_1.c
./a.out

**Rule 2**. If the result is unsigned, then the result will be the value of the target type that is equal modulo 2<sup>n</sup> to the original value, where `n` is the number of bits used for the target type.  Since `n` depends on the hardware, the actual behavior will depend on the C implementation and underlying hardware.

In [None]:
clang -Wall -pedantic src/conversion_2.c
./a.out

**Rule 3**. If an unsigned type is converted to a signed type of the same width, the conversion is considered to have overflowed if the original value lies outside the range of values represented by the target type.  Since the behavior of overflow depends on the C implementation and underlying hardware, so does the result of this conversion.

In [None]:
gcc src/conversion_3.c
./a.out

**Rule 4**. If a negative signed value is converted to an unsigned type of greater width, then the conversion must behave as if the original value was converted to a signed version of the target type and then converted to an unsigned version.

In [None]:
gcc integers/conversion_4.c
./a.out

Besides assignments, implicit conversions will also take place in boolean expressions such as `m < n`, with the potential for all manner of wacky surprises.

😱🐞😱🐞😱🐞😱🐞😱🐞😱🐞 

So watch out for:
<ul>
<li> conversions from wider types to narrower types;</li>
<li> conversions between signed and unsigned types.</li>
</ul>

Do not mix:
<ul>
    <li>wine and beer,</li>
    <li>bleach and ammonia,</li>
    <li>brown shoes and blue suits,</li>
    <li>handguns and tequila,</li>
    <li>signed and unsigned integers.</li>
</ul>

**You have been warned.**

🐞😱🐞😱🐞😱🐞😱🐞😱🐞😱 

# Floating point in C/C++

While the standard does not require it, most C/C++ implementations use the IEEE-754 standard for floating point arithmetic.

As with integers, floating point in C/C++ reflects the capability of the hardware.  C/C++ has three types of floating point numbers, `float`, `double`, and `long double`.  The `double` in C/C++ is comparable to Python's `float`.

The term `double` refers to **double precision**, the idea being that a `double` has twice the precision of a **single precision** `float`.  In practice a `double` will have twice as many bits as a `float`.  

The `long double` is implementation dependent.  Depending on the C implementation this type may be 
* more bits than a `double`, or
* the same number of bits as a `double`.

On x86-64 architectures, `long double` is almost surely 80 bits.

## Ranges of floating point numbers

| IEEE-754 type | typical type in C/C++ | length | sign | significand | exponent | smallest positive value | largest positive value |
| :-----------: | :----: | :------------: | :------------: | :-----------: | :------------: | :------------: | :-----------: |
| binary32     | ```float``` | 32 bits | 1 bit | 23 + 1 bits | 8 bits | 2<sup>-126</sup> &#8776; 1.2 &#215; 10<sup>-38</sup> | (2 - 2<sup>-23</sup>) &#215; 2<sup>127</sup> &#8776; 3.4 &#215; 10<sup>38</sup> |
| binary64     | ```double``` | 64 bits | 1 bit | 52 + 1 bits | 11 bits | 2<sup>-1022</sup> &#8776; 2.2 &#215; 10<sup>-308</sup> | (1 + (1 - 2<sup>-52</sup>)) &#215; 2<sup>1023</sup> &#8776; 1.8 &#215; 10<sup>308</sup> |
| binary128 | possibly ```long double```<sup>1</sup> | 128 bits | 1 bit | 112 + 1 bits | 15 bits | 2<sup>-16382</sup> &#8776; 3.4 &#215; 10<sup>-4932</sup> | (1 + (1 - 2<sup>-112</sup>)) &#215; 2<sup>16383</sup> &#8776; 1.2 &#215; 10<sup>4932</sup> |

<sup>1</sup>But probably not.



## Half precision floating point

IEEE-754 specifies a half precision 16 bit floating point:

| IEEE-754 type | length | sign | significand | exponent | smallest positive value | largest positive value |
| :-----------: | :------------: | :------------: | :-----------: | :------------: | :------------: | :-----------: |
| binary16     | 16 bits | 1 bit | 10 + 1 bits | 5 bits | 2<sup>-14</sup> &#8776; 6.1 &#215; 10<sup>-5</sup> | (2 - 2<sup>-10</sup>) &#215; 2<sup>15</sup> = 65504 |

An alternative to IEEE <tt>binary16</tt> half precision is <tt>bfloat16</tt>, Brain floating point, developed by Google.  It is <tt>binary32</tt> with 16 bits of the significand lopped off.  It is less precise than <tt>binary16</tt> but has range comparable to <tt>binary32</tt>:

| Non-IEEE-754 type | length | sign | significand | exponent | smallest positive value | largest positive value |
| :-----------: | :----------: | :----------: | :--------: |  :----------: | :----------: | :--------: |
| bfloat16     | 16 bits | 1 bit | 7 + 1 bits | 8 bits | 2<sup>-126</sup> &#8776; 1.2 &#215; 10<sup>-38</sup> | (2 - 2<sup>-7</sup>) &#215; 2<sup>127</sup> &#8776; 3.4 &#215; 10<sup>38</sup> |

C does not offer a half precision floating point (though C++23 and later do) but compilers frequently have extensions to support it, e.g.,
* [gcc](https://gcc.gnu.org/onlinedocs/gcc/Half-Precision.html),
* [clang](https://releases.llvm.org/16.0.0/tools/clang/docs/LanguageExtensions.html#half-precision-floating-point),
* [icx](https://builders.intel.com/docs/networkbuilders/intel-avx-512-fp16-instruction-set-for-intel-xeon-processor-based-products-technology-guide-1651874188.pdf).

In [None]:
cat -n src/float_widths.c

In [None]:
gcc src/float_widths.c

In [None]:
./a.out

## The special values ```inf``` and ```nan```

There are two special floating point values, ```inf``` and ```nan```.

The value ```inf``` is an infinite value $\infty$.  It is larger than all other numbers except itself.  A quick way to obtain ```inf``` is to divide a positive floating point number by 0.0.

A quick way to obtain a ```nan``` is to divide 0.0 by itself.

In [None]:
cat -n src/inf_nan.c

In [None]:
gcc src/inf_nan.c

In [None]:
./a.out

Because `nan` is not a number it is not equal to any other number, and it is also not equal to itself!

Any arithmetic involving a `nan` results in a `nan`, so once they creep into a calculation you will generally see them all over the place.

Floating point numbers will **underflow to zero** or **overflow to infinity** if they lie outside the range of representable numbers:

In [None]:
cat -n src/float_overflow.c

In [None]:
gcc src/float_overflow.c

In [None]:
./a.out