<a href="https://colab.research.google.com/github/jacob-jones23/MAT-422/blob/main/Module_A.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

#**Module A**
In this chapter, we will learn about different representations of numbers and how they are useful for computers. By the end of this chapter we should know and understand some representations of numbers that are used in computing, how to convert between them and decimal numbers, their primary advantages and disadvantages, and round-off errors.

##**9.1: Base-N and Binary**
The **decimal system** is a way of representing numbers that you are familiar with from elementary school. In the decimal system, a number is represented by a list of digits from 0 to 9, where each digit represents the coefficient for a power of 10.

EXAMPLE: Show the decimal expansion for 147.3.

147.3 = 1⋅10$^{2}$ + 4⋅10$^{1}$ + 7⋅10$^{0}$ + 3⋅10$^{-1}$.
  * Since each digit is associated with a power of 10, the decimal system is also known as **base10** because it is based on 10 digits (0 to 9).

However, there is nothing special about base10 numbers except that we are more accustomed to using them. For example, in base3 we have the digits 0, 1, and 2 and the number

121 (*base 3*) = 1⋅3$^{2}$+2⋅3$^{1}$ + 1⋅3$^{0}$ = 9 + 6 + 1 = 16(*base 10*)
  * It is useful to denote which representation a number is in. In this chapter every number will be followed by its representation in parentheses (e.g., 11(base10) means 11 in base10) unless the context is clear.
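The positional expansion above can be sketched in Python. The helper name `to_decimal` is hypothetical and introduced only for illustration; it handles the digits 0 to 9 only. The built-in `int(text, base)` performs the same conversion.

```python
def to_decimal(digits, base):
    """Evaluate a string of digits (0-9 only) written in the given base."""
    value = 0
    for ch in digits:
        d = int(ch)
        if d >= base:
            raise ValueError(f"digit {d} is not valid in base {base}")
        # Shift the digits read so far one place to the left, then add the new digit.
        value = value * base + d
    return value

print(to_decimal("121", 3))  # 16
print(int("121", 3))         # 16, using the built-in conversion
```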

A very important representation of numbers for computers is base2 or **binary** numbers. In binary, the only available digits are 0 and 1, and each digit is the coefficient of a power of 2. Each digit in a binary number is also known as a **bit**.




Convert to binary:

In [None]:
# 37(base 10) = 100101 (base 2)
37 == 32 + 4 + 1 == 1*(2**5) + 0*(2**4) + 0*(2**3) + 1*(2**2) + 0*(2**1) + 1*(2**0)


True

In [None]:
# 17(base 10) = 10001 (base 2)
17 == 16 + 1 == 1*(2**4) + 0*(2**3) + 0*(2**2) + 0*(2**1) + 1*(2**0)

True
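One way to find these binary digits is repeated division by 2, collecting the remainders. The sketch below assumes a non-negative integer input; `to_binary` is a hypothetical helper name, and the built-in `bin` gives the same digits with a `0b` prefix.

```python
def to_binary(n):
    """Return the binary digits of a non-negative integer as a string."""
    if n == 0:
        return "0"
    bits = []
    while n > 0:
        bits.append(str(n % 2))  # the remainder is the next least-significant bit
        n //= 2
    return "".join(reversed(bits))

print(to_binary(37))  # 100101
print(to_binary(17))  # 10001
print(bin(37))        # 0b100101
```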

Check the results of addition and multiplication in both decimal and binary:

In [None]:
# 37 + 17 = 54
# 100101 + 10001 = 110110
54 == 32 + 16 + 4 + 2 + 0 == 1*(2**5) + 1*(2**4) + 0*(2**3) + 1*(2**2) + 1*(2**1) + 0*(2**0)


True

In [None]:
# 37 * 17 = 629
# 100101 * 10001 = 1001110101
629 == 512 + 64 + 32 + 16 + 4 + 1 == 1*(2**9) + 0*(2**8) + 0*(2**7) + 1*(2**6) + 1*(2**5) + 1*(2**4) + 0*(2**3) + 1*(2**2) + 0*(2**1) + 1*(2**0)

True
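The same checks can be made with Python's built-in conversions, as a quick sketch: `int(text, 2)` parses a binary string and `bin` formats an integer in binary.

```python
a = int("100101", 2)  # 37
b = int("10001", 2)   # 17

print(a + b, bin(a + b))  # 54 0b110110
print(a * b, bin(a * b))  # 629 0b1001110101
```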

##**9.2: Floating Point Numbers**
The number of bits is usually fixed for any given computer. Using each bit directly as the coefficient of a power of 2 gives us an insufficient range and precision of numbers to do relevant engineering calculations. To achieve the range of values needed with the same number of bits, we use **floating point** numbers or **float** for short. Instead of utilizing each bit as the coefficient of a power of 2, floats allocate bits to three different parts: the **sign indicator**, *s*, which says whether a number is positive or negative; the **characteristic** or **exponent**, *e*, which is the power of 2; and the **fraction**, *f*, which is the coefficient of the exponent. Almost all platforms map Python floats to **IEEE754** double precision, which uses 64 bits in total: 1 bit is allocated to the sign indicator, 11 bits are allocated to the exponent, and 52 bits are allocated to the fraction.

With 11 bits allocated to the exponent, the exponent can take 2048 different values. Since we want to be able to represent very small numbers as well, we want some of these values to represent negative exponents. To accomplish this, 1023 is subtracted from the exponent to normalize it. The value subtracted from the exponent is commonly referred to as the **bias**. The fraction is a number between 1 and 2. In binary, this means that the leading term will always be 1, and, therefore, it is a waste of bits to store it. In Python, we can get the float information using the *sys* package as shown below:

In [None]:
#float information
import sys
sys.float_info

sys.float_info(max=1.7976931348623157e+308, max_exp=1024, max_10_exp=308, min=2.2250738585072014e-308, min_exp=-1021, min_10_exp=-307, dig=15, mant_dig=53, epsilon=2.220446049250313e-16, radix=2, rounds=1)

A float can then be represented as:
*n = (-1)$^{s}$2$^{e-1023}$(1 + f).*
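To make the formula concrete, the sketch below pulls *s*, *e*, and *f* out of the 64 bits of a float with the standard `struct` module and rebuilds the value. The helper `decode_double` is a hypothetical name, and the reconstruction as written assumes a normal number (1 ≤ e ≤ 2046).

```python
import struct

def decode_double(x):
    """Split a Python float into sign s, biased exponent e, and fraction f."""
    # Reinterpret the 8 bytes of the double as one unsigned 64-bit integer.
    bits = struct.unpack(">Q", struct.pack(">d", x))[0]
    s = bits >> 63                         # 1 sign bit
    e = (bits >> 52) & 0x7FF               # 11 exponent bits
    f = (bits & ((1 << 52) - 1)) / 2**52   # 52 fraction bits, scaled to [0, 1)
    return s, e, f

s, e, f = decode_double(-6.5)
print(s, e, f)                               # 1 1025 0.625
print((-1)**s * 2.0**(e - 1023) * (1 + f))   # -6.5
```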


We call the distance from one number to the next the **gap**. Because the fraction is multiplied by 2$^{e-1023}$, the gap grows as the number represented grows. The gap at a given number can be computed using the function *spacing* in *numpy*.


In [None]:
import numpy as np
# Use the spacing function to determine the gap at 1e9.
np.spacing(1e9)

1.1920928955078125e-07

In [None]:
1e9 == (1e9 + np.spacing(1e9)/3)

True
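The comparison above is true because the added amount is less than half the gap, so the sum rounds back to 1e9. As a small sketch of how the gap grows with magnitude, the standard-library function `math.ulp` (assuming Python 3.9 or later) reports the same quantity as `np.spacing` for positive floats:

```python
import math

# The gap doubles every time the number crosses a power of 2.
for x in (1.0, 1e9, 1e16):
    print(x, math.ulp(x))

# At 1e16 the gap is 2.0, so adding 1 is lost to rounding.
print(1e16 + 1 == 1e16)  # True
```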

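There are special cases for the value of a floating point number when e = 0 (i.e., e = 00000000000 (base2)) and when e = 2047 (i.e., e = 11111111111 (base2)), which are reserved. When the exponent is 0, the leading 1 in the fraction takes the value 0 instead. The result is a **subnormal number**, which is computed by *n = (-1)$^{s}$2$^{-1022}$(0 + f)*. When the exponent is 2047, the value is infinity if f = 0 and "not a number" (NaN) otherwise. The largest and smallest positive normal numbers therefore use e = 2046 and e = 1: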

In [None]:
largest = (2**(2046-1023))*((1 + sum(0.5**np.arange(1,53))))
largest

1.7976931348623157e+308

In [None]:
sys.float_info.max

1.7976931348623157e+308

In [None]:
smallest = (2**(1-1023))*(1+0)
smallest

2.2250738585072014e-308

In [None]:
sys.float_info.min

2.2250738585072014e-308
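Below the smallest normal number sit the subnormal numbers. In IEEE754 a subnormal has the value (-1)$^{s}$2$^{-1022}$(0 + f), so the smallest positive one uses the smallest nonzero fraction, f = 2$^{-52}$. A minimal sketch (the variable name is for illustration only):

```python
import sys

# Smallest positive subnormal: 2^-1022 * (0 + 2^-52) = 2^-1074
smallest_subnormal = 2.0**-1022 * 2.0**-52

print(smallest_subnormal)                           # 5e-324
print(0 < smallest_subnormal < sys.float_info.min)  # True
```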

Numbers that are larger than the largest representable floating point number result in **overflow**, and Python handles this case by assigning the result to *inf*. Numbers that are smaller than the smallest subnormal number result in **underflow**, and Python handles this case by assigning the result to 0.

In [None]:
2**(-1075)

0.0

In [None]:
2**(-1075) == 0

True

In [None]:
2**(-1074)

5e-324
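Overflow can be observed in a similar way. The sketch below shows that float multiplication past the largest value gives *inf*, while some operations, such as the float power operator, raise an `OverflowError` instead of returning *inf*.

```python
import math
import sys

big = sys.float_info.max
print(big * 2)              # inf
print(math.isinf(big * 2))  # True

# The float power operator raises instead of returning inf.
base = 2.0
try:
    base ** 1024
except OverflowError as err:
    print("OverflowError:", err)

tiny = 5e-324    # smallest positive subnormal
print(tiny / 2)  # 0.0 (underflow)
```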

##**9.3: Round-off Errors**
In the previous section, we talked about how floating point numbers are represented in computers as base 2 fractions. A side effect is that most numbers cannot be stored with perfect precision; instead they are approximated using a finite number of bits. The difference between the approximation of a number used in computation and its correct (true) value is called **round-off error**. It is one of the common errors in numerical calculations. The other one is **truncation error**.

###**Representation error**
The most common form of round-off error is the representation error in floating point numbers. A simple example is representing π. We know that π is an irrational number with infinitely many digits, but when we use it, we usually keep only a finite number of digits. For example, if you only use 3.14159265, there will be an error between this approximation and the true value.
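Representation error can be inspected directly, as a rough sketch. The first line measures the error of the truncated value against `math.pi` (itself only the nearest float to π), and `decimal.Decimal` prints the exact value that is actually stored for the literal 0.1.

```python
import math
from decimal import Decimal

# Error from keeping only 8 decimal places of pi (about 3.6e-09).
print(abs(math.pi - 3.14159265))

# The exact binary value stored for the literal 0.1.
print(Decimal(0.1))
# 0.1000000000000000055511151231257827021181583404541015625
```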

###**Round-off error by floating-point arithmetic**
The difference between 4.9 and 4.845 should be 0.055. But if you calculate it in Python, you will see that 4.9 - 4.845 is not equal to 0.055.


In [1]:
4.9 - 4.845 == 0.055
# we actually get 0.055000000000000604

False

This is because these decimal values cannot be represented exactly as floating point numbers. Each one is stored as an approximation, and arithmetic on approximations introduces a small error.

In [2]:
4.9 - 4.845

0.055000000000000604

In [3]:
4.8 - 4.845

-0.04499999999999993

Another example: 0.1 + 0.2 + 0.3 does not equal 0.6, for the same reason.

In [4]:
0.1 + 0.2 + 0.3 == 0.6

False

Though the numbers cannot be made closer to their intended exact values, the *round* function can be useful for post-rounding so that results with inexact values become comparable to one another:

In [8]:
round(0.1 + 0.2 + 0.3, 5) == round(0.6, 5)

True
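As an alternative to rounding both sides, the standard-library function `math.isclose` compares two floats within a tolerance. A small sketch using its default relative tolerance of 1e-09:

```python
import math

total = 0.1 + 0.2 + 0.3
print(total)                     # 0.6000000000000001
print(total == 0.6)              # False
print(math.isclose(total, 0.6))  # True
```

Note that the default absolute tolerance is 0, so when comparing against 0 an explicit `abs_tol` is needed.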

###**Accumulation of round-off error**
When we do a sequence of calculations on an initial input that carries round-off error due to inexact representation, the errors can be magnified or accumulated. For example, if we add 1/3 to the number 1 and then subtract 1/3, we get back the same number 1. But what if we add 1/3 many times and then subtract 1/3 the same number of times? Do we still get 1?

In [9]:
# If we only do it once
1 + 1/3 - 1/3

1.0

In [10]:
def add_and_subtract(iterations):
  result = 1

  for i in range(iterations):
    result += 1/3

  for i in range(iterations):
    result -= 1/3
  return result

In [11]:
# If we do this 100 times
add_and_subtract(100)

1.0000000000000002

In [12]:
# If we do this 1000 times
add_and_subtract(1000)

1.0000000000000064

In [13]:
# If we do this 10000 times
add_and_subtract(10000)

1.0000000000001166
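When exactness matters more than speed, the accumulation can be avoided. The sketch below shows two options from the standard library, assuming rational inputs: `fractions.Fraction` does exact rational arithmetic, and `math.fsum` tracks the partial sums so the final result is rounded only once. The name `add_and_subtract_exact` is hypothetical.

```python
import math
from fractions import Fraction

def add_and_subtract_exact(iterations):
    """Same experiment as above, using exact rational arithmetic."""
    result = Fraction(1)
    third = Fraction(1, 3)
    for _ in range(iterations):
        result += third
    for _ in range(iterations):
        result -= third
    return result

print(add_and_subtract_exact(1000))  # 1

# fsum adds the float terms without accumulating intermediate rounding error.
terms = [1.0] + [1/3] * 1000 + [-1/3] * 1000
print(math.fsum(terms))              # 1.0
```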