# Reading 1-4 - IEEE 754 Format

### What about decimals?

To this point, we have only represented signed integers and unsigned integers (with ASCII being represented as a signed integer). But what if we want to represent 1.6? Or how about -3.7 x 10<sup>-5</sup>?

Bear in mind that a computer still stores only binary values, so we need to develop a format that represents <i>fractions</i> (e.g. 16/10 as 1.6) with discrete, binary values.

### Floating Point binary representation 

Since we are trying to use a finite number of discrete binary values to represent real numbers, including irrational numbers such as pi, we must understand the difference between <i>precision</i> and <i>accuracy</i> in a computer.

Integers in a computer are <b>accurate</b> in that we are able to represent the <i>exact</i> value of the integer, provided we have enough bits to represent that value.

We must measure the representation of decimals and irrational numbers to a level of <b>precision</b>, meaning that we often cannot obtain the <i>exact</i> value, but we can obtain a <i>reasonably accurate</i> representation of that value.

### Single Precision

In the UNIX environment where we will write C/C++ in this course, we will be able to use 32-bit and 64-bit precision values. We call them <i>floating point</i> because the format records where the decimal point should be placed.

<b>32-Bit Single Precision</b>: We will call a 32-bit floating point value a <code>float</code> in this course.
<ul>
<li>Most Significant Bit is the sign bit</li>
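<li>The next 8 bits are the exponent, stored with an offset (bias) of 127 so that we can represent both positive and negative exponents. Actual exponent = stored exponent – 127 => the range is nominally 2<sup>0-127</sup> to 2<sup>255-127</sup>, although the stored values 0 and 255 are reserved for special cases (zero, subnormal numbers, infinity, and NaN)</li>
<li>The next 23 bits are the fraction, which holds the digits after the binary point</li>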
</ul>

<b>Equation</b>: <code>(-1)<sup>sign</sup> x (1+Fraction) x 2<sup>exponent – 127</sup></code>

![IEEE 754 Single Precision Format](https://raw.githubusercontent.com/mmorri22/su23-cse20332/main/readings/reading01/Reading%201-4-1%20-%20Single%20Precision.png)

### Single-Precision Floating Point Example

To combine concepts, let's attempt the following example: Convert the following IEEE 754 Single Precision Format value to decimal:<p></p>

<center><code>0 01111110 10110000000000000000000</code></center><p></p>

First, we see that the sign = 0, so our solution is <b>positive</b>.

Next, we calculate the exponent like an unsigned integer:<br>
<code>0x2<sup>7</sup> + 1x2<sup>6</sup> + 1x2<sup>5</sup> + 1x2<sup>4</sup> + 1x2<sup>3</sup> + 1x2<sup>2</sup> + 1x2<sup>1</sup> + 0x2<sup>0</sup> = 126<sub>10</sub></code>

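For the fraction, we read the bits from left to right and multiply each one by a descending negative power of two, starting at 2<sup>-1</sup>. The remaining 19 bits are all 0 and contribute nothing, so for the first four bits of the fraction <code>1011</code>, we perform the following:<p></p>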
<code>1x2<sup>-1</sup> + 0x2<sup>-2</sup> + 1x2<sup>-3</sup> + 1x2<sup>-4</sup></code>, which is equivalent to:<p></p>
<code>1x0.5 + 0x0.25 + 1x0.125 + 1x0.0625 = 0.6875</code><p></p>

Finally, we can plug these values into the original equation:

<code>(-1)<sup>sign</sup> x (1+Fraction) x 2<sup>exponent – 127</sup></code><p></p>

<code>(-1)<sup>0</sup> x (1+0.6875) x 2<sup>126 – 127</sup> = 0.84375</code>

<b>Visualize It!</b> The example of determining the value of <code>0 01111110 10110000000000000000000</code> in IEEE 754 Single Precision format:

<center><a href="http://www.youtube.com/watch?feature=player_embedded&v=I-AIfbAs3Us" target="_blank">
 <img src="http://img.youtube.com/vi/I-AIfbAs3Us/mqdefault.jpg" target="_blank" width="240" height="180" border="10" />
</a></center><p></p>

><b>Try it yourself</b>: Convert the following IEEE 754 Single Precision Format value to decimal<p></p>
><center><code>1 10000011 11010000000000000000000</code></center>

### Double Precision


<b>64-Bit Double Precision</b>: We will call a 64-bit floating point value a <code>double</code> in this course.
<ul>
<li>Most Significant Bit is the sign bit</li>
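<li>The next 11 bits are the exponent, stored with an offset (bias) of 1023 so that we can represent both positive and negative exponents. Actual exponent = stored exponent – 1023 => the range is nominally 2<sup>0-1023</sup> to 2<sup>2047-1023</sup>, although the stored values 0 and 2047 are reserved for special cases (zero, subnormal numbers, infinity, and NaN)</li>
<li>The next 52 bits are the fraction, which holds the digits after the binary point</li>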
</ul>

<b>Equation</b>: <code>(-1)<sup>sign</sup> x (1+Fraction) x 2<sup>exponent – 1023</sup></code>

![IEEE 754 Double Precision Format](https://raw.githubusercontent.com/mmorri22/su23-cse20332/main/readings/reading01/Reading%201-4-2%20-%20Double%20Precision.png)

### <font color = "red">Class Introduction Question #7 - How do we represent decimals and irrational numbers in binary?</font>