>![image](JMUlogo.png)
>
> # Math 248 Computers and Numerical Algorithms
> # Hala Nelson
> # Week 8: Machine Representation Of Numbers And Resulting Errors

# These notes are great

https://statmath.wu.ac.at/courses/data-analysis/itdtHTML/node55.html

# How many bits (binary digits) does a computer allocate to represent numbers?

* The answer depends on the machine (personal computers differ from high-performance computers).
* Computers treat integers and real numbers differently. They allocate a fixed number of bits to represent each integer. They represent real numbers as floating point numbers: 1 bit for the sign, a fixed number of bits for the mantissa, and a fixed number of bits for the exponent (a power of 2).
* This means that only finitely many numbers, whether integer or real, can be represented on a machine. Any number in our computations that cannot be represented exactly (because there are not enough allocated bits) introduces an error. These errors are called round-off errors, and they accumulate as we keep using the inexactly represented numbers in our iterations, etc.
* Because only finitely many bits are used to represent integers and real numbers, there is a largest and a smallest representable integer, and a largest and a smallest representable real number. A floating point result whose magnitude exceeds the largest representable value is returned as inf, and an undefined operation such as inf - inf is returned as NaN. Fixed-width integers that go out of range typically wrap around or raise an error, depending on the language.
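
The points above can be observed directly in Python. This is a minimal sketch, assuming CPython, where `float` is an IEEE 754 double on essentially all current platforms; the variable names are only illustrative.

```python
import sys
from decimal import Decimal

# The decimal number 0.1 has no finite binary expansion, so the machine
# stores the nearest representable double instead.
stored_tenth = Decimal(0.1)          # exact value of the stored double
print(stored_tenth)

# Round-off shows up immediately in simple arithmetic.
sum_is_exact = (0.1 + 0.2 == 0.3)
print(sum_is_exact)                  # False

# Floats have a largest finite value; going past it overflows to inf.
largest = sys.float_info.max
overflowed = largest * 2
print(largest, overflowed)

# Python's built-in int is arbitrary precision, unlike a fixed-width
# machine integer, so this does not overflow.
big_int = 2**100
print(big_int)
```

Note that Python's built-in `int` is not a fixed-width machine integer, so the integer limits discussed in these notes apply to languages and libraries that use fixed-width integers.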

# Mathematical theory vs. computational reality

**An example**: `x**0.5` versus `np.sqrt(x)`. On some computers, including the CRAY supercomputers of the 1990s, exponentiation with fractional real exponents (in this case 0.5) was exceedingly slow because it made use of table lookups from logarithm tables. Thus, on CRAY machines, the first variant ran 40 times slower than the second! Even though the two are mathematically the same, one approach requires far more computational effort than the other. In programming, be mindful of good, better, and best ways to accomplish a goal.
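
To see how the two variants compare on your own machine, one option is to time them. The sketch below substitutes `math.sqrt` for `np.sqrt` so that it needs only the standard library (an assumption made here, not part of the original comparison). The timings depend on the machine and the Python version, so no particular ratio is claimed.

```python
import math
import timeit

x = 2.0

# Both expressions compute the square root and should agree to within rounding.
via_power = x ** 0.5
via_sqrt = math.sqrt(x)
print(via_power, via_sqrt)

# Time each variant over many repetitions.
t_power = timeit.timeit("x ** 0.5", globals={"x": x}, number=200_000)
t_sqrt = timeit.timeit("sqrt(x)", globals={"x": x, "sqrt": math.sqrt}, number=200_000)
print(f"x ** 0.5 : {t_power:.4f} s")
print(f"sqrt(x)  : {t_sqrt:.4f} s")
```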

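## Addition on computers is not associative?! Why?! 

Write a program that adds $1+\frac{1}{2}+\frac{1}{3}+\dots+\frac{1}{10^6}$ and another one that adds $\frac{1}{10^6}+\frac{1}{10^6-1}+\dots+\frac{1}{3}+\frac{1}{2}+1$ and compare the answers. In theory, these two are the same. Computationally, they are not! Each individual floating point addition is still commutative ($a+b=b+a$), but every partial sum is rounded, so the order in which the terms are grouped changes the result: $(a+b)+c$ need not equal $a+(b+c)$.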

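```python
# Sum the harmonic series from the largest term to the smallest.
total = 0
for n in range(1, 10**6 + 1):
    total = total + 1/n
print(total)

# Sum the same terms from the smallest to the largest.
total = 0
for n in range(1, 10**6 + 1):
    total = total + 1/(10**6 - n + 1)
print(total)
```

```
14.392726722864989
14.392726722865772
```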
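
To judge which of the two orderings is closer to the truth, one possible check is to compare both against `math.fsum`, which tracks the lost low-order bits and returns an accurately rounded sum of the stored terms. The sketch below is an illustration only; the names `forward`, `backward`, and `reference` are chosen here.

```python
import math

N = 10**6
terms = [1 / n for n in range(1, N + 1)]

forward = 0.0
for t in terms:                 # largest terms first
    forward += t

backward = 0.0
for t in reversed(terms):       # smallest terms first
    backward += t

reference = math.fsum(terms)    # accurately rounded sum of the stored terms

print(abs(forward - reference))
print(abs(backward - reference))
```

Adding the small terms first lets them accumulate before they meet a large partial sum, so fewer of their low-order bits are rounded away.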


## These notes will be helpful for our computations this week

- Adding in base two: $(1+1=10)_2$. 
- Note that $(11111)_2=2^5-1$. This is similar to $999=10^3-1$ and $99999=10^5-1$. 
- So if we have $d$ binary bits which are all one, then they represent the integer $2^{d}-1$.
- $2^d-2^{d-1}=2^{d-1}(2-1)=2^{d-1}$.
- $(1000)_{10}=10^3$ in base 10. Similarly, $(1000)_2=2^3$ in base 2. So a one followed by 31 zeros in base two is equal to $2^{31}$. 
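
These identities can be checked with Python's built-in base conversion. A small sketch; the variable names are only illustrative.

```python
# Five ones in base 2 equal 2**5 - 1.
all_ones = int("11111", 2)
print(all_ones, 2**5 - 1)          # 31 31

# A one followed by 31 zeros in base 2 equals 2**31.
one_then_zeros = int("1" + "0" * 31, 2)
print(one_then_zeros == 2**31)     # True

# Binary addition: 1 + 1 = 10 in base 2.
one_plus_one = bin(1 + 1)
print(one_plus_one)                # 0b10
```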

# Storing integers on a machine 

- A machine allocates $d$ binary digits to store an integer. This is the word length for integers, measured in bits or bytes (each byte is 8 bits). Usually, an integer word is 4 bytes or 32 bits. 
- The largest integer that can be stored on such a machine has a zero in the first digit (since it is positive) and ones in all the remaining $d-1$ digits, which amounts to $2^{d-1}-1$.
- Negative integers are stored in two's complement form: a negative integer $x$ is stored as the binary representation of $2^d-|x|$. One advantage of this is that zero has a unique representation. Another advantage is the ability to store one extra negative number, namely $-2^{d-1}$, stored as $2^{d}-|-2^{d-1}|=2^d-2^{d-1}=2^{d-1}$, that is, a one in the first (sign) digit followed by $d-1$ zeros. 
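
The storage rule above can be sketched in a few lines of Python. The helper `twos_complement_bits` is hypothetical (written for these notes, not a library function), and $d=8$ is used only to keep the bit patterns short.

```python
def twos_complement_bits(value: int, d: int) -> str:
    """Return the d-bit two's complement pattern of value as a string."""
    if not -(2 ** (d - 1)) <= value <= 2 ** (d - 1) - 1:
        raise ValueError(f"{value} does not fit in {d} bits")
    # For negative values, value % 2**d equals 2**d - |value|.
    return format(value % (2 ** d), f"0{d}b")

d = 8
print(twos_complement_bits(2 ** (d - 1) - 1, d))   # 01111111 (largest integer)
print(twos_complement_bits(-1, d))                 # 11111111
print(twos_complement_bits(-(2 ** (d - 1)), d))    # 10000000 (extra negative number)
```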

# Storing real numbers on a machine: Floating point numbers (all in base 2) 
Number=$\pm.1b_2b_3\dots b_n\times 2^{c_1 c_2\dots c_m}$ where the $b_i$ and $c_j$ are zeros or ones. This is analogous to scientific notation in base $10$.
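
Python's `math.frexp` splits a float into a mantissa in $[0.5, 1)$ and a power of 2, which matches the normalization $.1b_2b_3\dots$ used above. A small sketch with the sample value $10$ (chosen here for illustration):

```python
import math

x = 10.0
mantissa, exponent = math.frexp(x)   # x == mantissa * 2**exponent, 0.5 <= |mantissa| < 1
print(mantissa, exponent)            # 0.625 4

# 0.625 = 1/2 + 1/8 = (0.101)_2, so 10 = (0.101)_2 x 2^4 = (1010)_2.
reconstructed = mantissa * 2 ** exponent
print(reconstructed)                 # 10.0
```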

# Precision (single precision, double precision)
This is the number of significant decimal digits accurately represented by the mantissa. So it is $p$ such that $2^n=10^p$, that is, $p=n\log_{10}2$, where $n$ is the number of bits assigned for the mantissa.
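
As a quick numerical illustration of $p=n\log_{10}2$, the sketch below uses a hypothetical helper `decimal_precision` and a 10-bit mantissa (a value chosen here because $2^{10}=1024\approx 10^3$):

```python
import math

def decimal_precision(mantissa_bits: int) -> float:
    """Solve 2**n == 10**p for p, given n mantissa bits."""
    return mantissa_bits * math.log10(2)

p = decimal_precision(10)
print(round(p, 4))   # 3.0103, so about 3 significant decimal digits
```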

# Underflow and overflow 
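Overflow occurs when the exponent of a number is too large to be represented, and underflow occurs when the exponent is too small (too negative) to be represented. In both cases more bits would be needed for the exponent.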
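
Both situations are easy to trigger with Python floats. This is a sketch assuming IEEE 754 doubles; the specific constants are illustrative.

```python
import math

big = 1e308
overflowed = big * 10          # exponent too large: the result is inf
print(overflowed)              # inf

tiny = 5e-324                  # smallest positive (subnormal) double
underflowed = tiny / 2         # exponent too small: the result rounds to 0.0
print(underflowed)             # 0.0

# Some operations raise an exception instead of returning inf.
try:
    math.exp(1000)
    raised = False
except OverflowError as err:
    raised = True
    print("OverflowError:", err)
```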

# Absolute error
The absolute error $\delta=|x-\bar{x}|$ is due to the approximation of a real number $x$ by a floating point number $\bar{x}$. This quantity is not dimensionless; it has the units of $x$.

# Relative error
The relative error $\epsilon=\frac{|x-\bar{x}|}{|x|}$ (for $x\neq 0$) is due to the approximation of a real number $x$ by a floating point number $\bar{x}$. This quantity is dimensionless.
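
A short sketch of both error measures, using $\pi$ and the crude approximation $3.14$ as sample values (chosen here for illustration; the function names are hypothetical):

```python
import math

def absolute_error(x: float, x_bar: float) -> float:
    """Absolute error |x - x_bar|, in the units of x."""
    return abs(x - x_bar)

def relative_error(x: float, x_bar: float) -> float:
    """Relative error |x - x_bar| / |x|, dimensionless; requires x != 0."""
    return abs(x - x_bar) / abs(x)

x = math.pi
x_bar = 3.14                     # a deliberately crude approximation
delta = absolute_error(x, x_bar)
epsilon = relative_error(x, x_bar)
print(delta)                     # about 1.59e-3
print(epsilon)                   # about 5.07e-4
```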

# Unit round off error or machine epsilon $u$ (depends on the machine)
This is the largest relative error that a machine can commit due to approximation of a real number by a floating point number (due to chopping or rounding). A machine that chops can commit twice as much error as a machine that rounds.
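
For reference, Python reports a related quantity as `sys.float_info.epsilon`: the gap between $1.0$ and the next larger double. Conventions differ on which quantity is called machine epsilon; under round-to-nearest, the largest relative rounding error is half of this gap. A small sketch, assuming IEEE 754 doubles:

```python
import sys

eps = sys.float_info.epsilon       # gap between 1.0 and the next double, 2**-52
print(eps)                         # 2.220446049250313e-16

# Adding the full gap to 1.0 is visible; adding half of it is rounded away.
full_gap_visible = (1.0 + eps > 1.0)
half_gap_lost = (1.0 + eps / 2 == 1.0)
print(full_gap_visible)            # True
print(half_gap_lost)               # True
```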

# Propagation of roundoff error due to arithmetic operations
This section calculates the error bounds associated with each arithmetic operation. 

![image](errorsubtract1.JPG)
![image](errorsubtract2.JPG)
![image](errorsubtract3.JPG)

# Catastrophic loss of significance 
Catastrophic loss of significance occurs when numerically subtracting two numbers that are very close to each other. How can we avoid such a situation? Use an equivalent mathematical formula that avoids the problematic subtraction.

![image](sigloss1.JPG)
![image](sigloss2.JPG)
![image](sigloss3.JPG)
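
As one illustration of this idea (an example chosen here, not taken from the images above), consider $1-\cos x$ for small $x$. The identity $1-\cos x = 2\sin^2(x/2)$ gives the same quantity without subtracting nearly equal numbers.

```python
import math

x = 1e-8

naive = 1.0 - math.cos(x)               # subtracts two nearly equal numbers
stable = 2.0 * math.sin(x / 2.0) ** 2   # same quantity, no cancellation

# For small x the true value is close to x**2 / 2 = 5e-17.
print(naive)
print(stable)
```

The naive form loses essentially all of its significant digits because $\cos(10^{-8})$ differs from $1$ by less than the spacing of doubles near $1$, while the rewritten form stays close to $x^2/2$.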

# Read Sochacki Book Pages 19-33.

# Explain Problem 9 in Sochacki's book (exercise 9 in Chapter 3, page 33).

Write a program that produces the results of problem $9$ in chapter 3. 

# Exercises (on Quiz for Week 9)

1. What is a supercomputer? What is a Cray supercomputer? What is the most powerful supercomputer currently in operation?

2. What is the word length (for both integers and floating point numbers) that a supercomputer usually assigns? 

3. Given a binary machine that reserves $4$ bytes for an integer (recall that 1 byte = 8 bits), find the range of integers that can be represented. How many numbers are those?

4. Write the floating point representation for the binary number $0.00011011$ using a $4$-bit mantissa then write the way it will be stored in the machine (use $5$ bit exponent).

5. Compute the largest floating point number that Python can represent. Do a Google search to check your answer.

6. Suppose you are employed to design a digital watch. How many bits will you need to store the time? Justify your answer.

7. Suppose $x = (13)_{10}$. What is $fl(x)$ on a binary computer with a 3-bit mantissa and 3-bit exponent?

8. Find the precision of real numbers on a computer with a 64-bit word, of which 15 bits are reserved for the exponent.

9. How does a single precision type represent the number -285.75?

10. Find the absolute and relative round-off errors incurred when $(23)_{10}$ is approximated by a floating point number with a 3-bit mantissa in base-2.

11. Express $0.25$ in floating point notation (use  $6$ bit mantissa, and decimal notation for the exponent).

12. Suppose you have a binary machine that allows a 3 bit mantissa, and exponents $m=-2, -1, 0, 1, 2$. Compute all the floating point numbers that this machine represents. How would this machine store the real number $e$? What is the relative error?

13. Suppose you have a binary machine that allows 4 bit mantissa, and exponents $m=-1, 0, 1$. Compute all the floating point numbers that this machine represents. How would this machine store the real number $e$? What is the relative error?

14. Define machine epsilon. 

15. Consider a machine in base-2 with a 4-bit mantissa and 4-bit exponent. What is the machine epsilon for this machine?

16. Calculate the relative round-off error due to approximating $\left(\frac{1}{3}\right)_{10}$ by a floating point number with a $5$ bit mantissa. What is the machine epsilon for such a machine?

17. Write a program that manually computes the machine epsilon of your machine in Python. Explain the logic behind this program.

18. Research: Provide an example where catastrophic loss of significance occurs.

19. Overflow: Give an example when overflow occurs.

20. Precision: IEEE specifications for double precision variables use one bit for the sign, 11 bits for the exponent, and 52 bits for the mantissa. What precision does it have? 

21. Compute the precision of a Cray supercomputer.

22. One way of calculating the inverse hyperbolic cosine function is to rewrite it as $\cosh^{-1}(x)=-\ln(x-\sqrt{x^2-1})$. Explain why this is a bad idea for large values of x, then rewrite it in a form that is worth using for large x.

23. Explain why numerically calculating $\ln(x-\sqrt{x^2-1})$ could result in catastrophic loss of significance, then explain how this problem can be rectified.

24. Explain why numerically calculating $\ln(b-\sqrt{b^2-10^{-8}})$ could result in catastrophic loss of significance, then explain how this problem can be rectified.

25. Compute the precision of a machine with $23$ bit mantissa. Explain the meaning of your answer.

26. The sequence $n\sin\left(\frac{1}{n}\right)$ converges to $1$ as $n\to\infty$ (analytical proof: use L'Hospital's rule). Write a program that verifies this result numerically, then discuss the limits on how large your numerical $n$ can get. What would happen to your limit numerically if $n=10^{17}$ (you don't have to run your program up to $n=10^{17}$ to answer this question)?

27. Prove that the relative error resulting from adding two numbers on the machine is less than or equal to $2u$, where $u$ is machine epsilon.

```python
# Check the largest floating point number in Python

print(1.7e308)
print(1.799e308)
```

```
1.7e+308
inf
```
