# GEOPHYS 257 (Winter 2023)
## Name: Joseph Stitt

## Data Types and Basic Machine Errors

In this lab we will be covering Numeric Data Types for both Python and Numpy, as well as Machine errors. Although the problems in this lab are not from *Python Numerical Methods*, please read [Chapter-9](https://pythonnumericalmethods.berkeley.edu/notebooks/chapter09.00-Representation-of-Numbers.html). Pay particular attention to the *Round-off Errors* section. The first two sections are mostly FYI, but they explain how most computational systems represent fractional numbers as binary numbers.
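The sketch below illustrates the binary representation described in Chapter 9 and uses only the Python standard library. `float.hex()` prints the exact binary value that a float stores. 0.5 is exactly $2^{-1}$. 0.1 is a repeating binary fraction and has to be rounded, which is why `0.1 + 0.2 == 0.3` comes out `False`.

```python
import math

# 0.5 = 2**-1 has an exact binary representation
print((0.5).hex())   # 0x1.0000000000000p-1

# 0.1 is a repeating binary fraction, so the stored value is rounded
print((0.1).hex())   # 0x1.999999999999ap-4

# the rounded inputs give a rounded sum that is not exactly 0.3
print(0.1 + 0.2 == 0.3)              # False
print(math.isclose(0.1 + 0.2, 0.3))  # True: compare floats with a tolerance instead
```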

[//]: <> (Notebook Author: Thomas Cullison, Stanford University, Jan. 2023)

## External Resources
If you have any questions regarding specific Python functionality, you can consult the official [Python documentation](http://docs.python.org/3/).

* [Python Numeric Types](https://docs.python.org/3/library/stdtypes.html#numeric-types-int-float-complex)
* [Numpy Data Types](https://numpy.org/doc/stable/user/basics.types.html)

## Python Built-in Numeric Types (not the same as Numpy types)

In Python it is not necessary to declare variables. Once a value is assigned to a variable the variable will have the type of that value. That behaviour is known as **dynamic typing**. A variable can also change its type at any point of the program execution. That has its advantages but there are also some pitfalls.

1. Int
1. Float
1. Complex (there is no separate imaginary type: a complex number stores its real and imaginary parts as two floats, a bit like a tuple)
1. Type Casting
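Here is a minimal sketch of dynamic typing and of the complex type, using only built-in Python. The same name can be rebound to values of different types. A `complex` value carries its real and imaginary parts as two floats, available through `.real` and `.imag`.

```python
# the name v is rebound to values of different types (dynamic typing)
v = 3
print(type(v))        # <class 'int'>
v = 3.0
print(type(v))        # <class 'float'>
v = 3 + 4j
print(type(v))        # <class 'complex'>

# a complex number holds two floats: the real part and the imaginary part
print(v.real, v.imag)              # 3.0 4.0
print(type(v.real), type(v.imag))  # <class 'float'> <class 'float'>
print(abs(v))                      # 5.0, the magnitude sqrt(3**2 + 4**2)
```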

## Exercise 0

#### First read [Chapter 9](https://pythonnumericalmethods.berkeley.edu/notebooks/chapter09.00-Representation-of-Numbers.html) in *Python Numerical Methods*.

Then begin this lab by reading the code below, running it, and finally examining the results. Continue to the next exercise afterwards.

In [1]:
# 1
x=2
print('x =',x)
print('x type:',type(x))
print()

# 2 manual casting int to float
cx=float(x)
print('cx =',cx)
print('cx type:',type(cx))
print()

# 3
y=1.2
print('y =',y)
print('y type:',type(y))
print()

# 4 manual casting float to int
cy=int(y)
print('cy =',cy) # !!! notice: int() truncates toward zero, it does not round !!!
print('cy type:',type(cy))
print()

# 5
z=1 + 2j
print('z =',z)
print('z type:',type(z))
print()

# 6 dynamic (implicit) casting of y to complex
print('z*y =',z*y)
print('z*y type:',type(z*y))
print()

x = 2
x type: <class 'int'>

cx = 2.0
cx type: <class 'float'>

y = 1.2
y type: <class 'float'>

cy = 1
cy type: <class 'int'>

z = (1+2j)
z type: <class 'complex'>

z*y = (1.2+2.4j)
z*y type: <class 'complex'>



## Exercise 1: Dynamic Type Casting

You might have noticed that $z \cdot y$ above returned a complex number. This is related to something called *type casting*. The value stored in $y$ was implicitly cast (think converted) into a complex number, and then the multiplication was computed. Thus, the two lines below will yield equivalent answers.

```python 
z*y
z*(y + 0.j)
```

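Note, the dot in '0.j' marks the imaginary part explicitly as a floating point number. Strictly speaking, the dot is optional here: Python always stores both parts of a complex number as floats, so '0j' and '0.j' are the same value. Writing the dot is how you explicitly define a value as a float type, and that matters for real numbers ('2' is an int, while '2.' is a float).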

To confirm that the above two lines are equivalent, try the line below for yourself.

```python 
z*y == z*(y + 0.j)
```

It will return a value of 'True'. The '==' is a comparison operator, which we will discuss later.
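Here is a quick sketch to check the points above. `0j` and `0.j` are the same complex value. Mixed arithmetic promotes the operands along int, then float, then complex before computing.

```python
y = 1.2
z = 1 + 2j

# with or without the dot, the literal is a complex whose parts are floats
print(0j == 0.j, type(0j), type((0j).imag))  # True <class 'complex'> <class 'float'>

# implicit promotion in mixed arithmetic: int -> float -> complex
print(type(2 + 1.0))    # <class 'float'>
print(type(1.0 + 0j))   # <class 'complex'>

# y is converted to complex before the multiplication, so both forms agree
print(z * y == z * complex(y, 0))  # True
```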

### For this exercise please do the following in the cell below.
1. Run the comparison code above and print the result. If you don't believe the answer, change '0.j' to '1j' and run the code again.
1. Type-cast 0.5 and 0.99 to integers and print the results
1. Similar to above, but now type-cast 100.5 and 100.99 to integers
1. Set $x$ equal to a floating point number with a non-zero value left of the decimal point. Then finish the line below such that $x$ is rounded to the nearest integer no matter what value is to the right of the decimal point. Print that result.
1. Store the results of the following statements into a list (*hint: use .append()*). After that, run the code for section 5 below, which maps each element in the list to its corresponding boolean value. Using comments (i.e. #comment), discuss the results. 
```python 
True
True + 1
True - 1
True*1
True*0
True*3.1425
True*-3.1425
False
False + 1
False - 1
False*1
False*0
False*3.1425
False*-3.1425
```
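For task 4, here is a sketch comparing ways to get a nearest integer. `int()` alone truncates toward zero. Adding 0.5 before casting rounds halves up, but only for non-negative values. The built-in `round()` rounds to the nearest integer, and exact .5 ties go to the even integer. The helper `round_half_up` is a hypothetical name used only for illustration.

```python
import math

def round_half_up(x):
    """Hypothetical helper: nearest integer, with .5 ties rounded away from zero."""
    return math.floor(x + 0.5) if x >= 0 else -math.floor(-x + 0.5)

print('x, int(x), int(x + 0.5), round(x), round_half_up(x)')
for x in [1.05, 1.5, 2.5, -1.7, -2.5]:
    # note int(x + 0.5) gives -1 for x = -1.7: the trick only works for x >= 0
    print(x, int(x), int(x + 0.5), round(x), round_half_up(x))
```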

In [20]:
# 1. comparison
print('#1')
is_same = z*y == z*(y + 0.j) # your code
print(f'Equivalent?: {is_same}')

# 2. type-cast
print('\n#2')
print('0.5 casted to int: ', int(0.5))
print('0.5 casted type: ', type(int(0.5)))
print('0.99 casted to int: ', int(0.99))
print('0.99 casted type: ', type(int(0.99)))

# 3. type-cast
print('\n#3')
print('100.5 casted to int: ', int(100.5))
print('100.5 casted type: ', type(int(100.5)))
print('100.99 casted to int: ', int(100.99))
print('100.99 casted type: ', type(int(100.99)))


# 4. round to nearest integer (you need to add something before casting)
print('\n#4')
x = 1.05
print('casted 1.05: ', round(x)) # round() returns the nearest integer (1 here); exact .5 ties go to the even integer

# 5. 'Boolean' is a number
print('\n#5')
mylist = []
mylist.append(True)
mylist.append(True + 1)
mylist.append(True - 1)
mylist.append(True*1)
mylist.append(True*0)
mylist.append(True*3.1425)
mylist.append(True*-3.1425)
mylist.append(False)
mylist.append(False + 1)
mylist.append(False - 1)
mylist.append(False*1)
mylist.append(False*0)
mylist.append(False*3.1425)
mylist.append(False*-3.1425)

## name your list: mylist

# your code

print(f'mylist: {mylist}')
print(f'bool(mylist): {list(map(bool,mylist))}')
# At first the values in mylist confused me. For example, True - 1 printed as 0, and False*1 also printed as 0.
# The explanation is that in Python, bool is a subclass of int: True is equivalent to 1 and False is equivalent to 0,
# so arithmetic operations (+, -, *, /) treat them as 1 and 0. The map(bool, mylist) call then explicitly converts
# each element back to a Boolean with bool().
# Going element by element:
# 1st: True stays True when converted to a Boolean.
# 2nd: True + 1 = 1 + 1 = 2. bool() maps any non-zero number (positive or negative int, or non-zero float) to True.
# 3rd: True - 1 = 1 - 1 = 0, and bool(0) is False.
# 4th: True*1 = 1*1 = 1, which maps to True.
# 5th: True*0 = 0, which maps to False.
# 6th and 7th: True*3.1425 and True*-3.1425 give the non-zero floats 3.1425 and -3.1425, so both map to True.
# 8th: False stays False when converted to a Boolean.
# 9th and 10th: False + 1 = 1 and False - 1 = -1 are non-zero, so both map to True.
# 11th to 14th: multiplying False (i.e., 0) by any number gives zero (0, 0, 0.0, -0.0), so all map to False.
#

#1
Equivalent?: True

#2
0.5 casted to int:  0
0.5 casted type:  <class 'int'>
0.99 casted to int:  0
0.99 casted type:  <class 'int'>

#3
100.5 casted to int:  100
100.5 casted type:  <class 'int'>
100.99 casted to int:  100
100.99 casted type:  <class 'int'>

#4
casted 1.05:  1

#5
mylist: [True, 2, 0, 1, 0, 3.1425, -3.1425, False, 1, -1, 0, 0, 0.0, -0.0]
bool(mylist): [True, True, False, True, False, True, True, False, True, True, False, False, False, False]
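These results follow from the fact that `bool` is a subclass of `int` in Python, with `True` equal to 1 and `False` equal to 0. Here is a short sketch that checks this directly:

```python
print(issubclass(bool, int))        # True
print(isinstance(True, int))        # True
print(True == 1, False == 0)        # True True
print(True + True, True * 3.1425)   # 2 3.1425
print(sum([True, False, True]))     # 2: counts the True values

# bool() maps zero (of any numeric type) to False and everything else to True
print([bool(v) for v in [0, 0.0, -0.0, 0j, 1, -1, 1e-300]])
```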


## Exercise 2: "Type-casting" Numpy arrays

Type casting Numpy arrays is a little different from type casting built-in types. Here, we can make use of the following function, [**astype()**](https://numpy.org/doc/stable/reference/generated/numpy.ndarray.astype.html).

### For this exercise please do the following in the cell below.
1. Run the code below to create an array of random 64-bit floats. Then type cast the array to float32 types, subtract the two arrays, and print the results. Discuss the result of the subtraction.
1. Now, using the same 64-bit array, type cast all the values to int32 values. How were the numbers rounded? Was this what you expected compared to rounding a Python built-in type?
1. Now type cast the float64 array to the nearest int32.
1. Dynamically cast the int32 array created above into an array with float64 types. Print the array and something that verifies the array's new numeric type.
1. For the last problem in this section, type cast the float64 array created in problem 1 to an array of unsigned integers ([Hint](https://numpy.org/doc/stable/reference/arrays.scalars.html#numpy.uint32)). What do you notice with the results compared to problem 2?
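Here is a minimal NumPy sketch, with made-up sample values, of the behaviors this exercise asks about:

* `astype(np.int32)` drops the fractional part (truncation toward zero).
* `np.round` rounds to the nearest integer, with exact .5 ties going to the even integer.
* float32 has a much coarser relative spacing than float64 (`eps` is $2^{-23}$ versus $2^{-52}$), so a float64 to float32 cast loses digits.

```python
import numpy as np

vals = np.array([-2.7, 2.7, 0.5, 1.5, 2.5])

print(vals.astype(np.int32))             # [-2  2  0  1  2]   truncation toward zero
print(np.round(vals).astype(np.int32))   # [-3  3  0  2  2]   nearest, ties to even

# relative spacing of floats near 1.0
print(np.finfo(np.float32).eps == 2.0**-23)  # True
print(np.finfo(np.float64).eps == 2.0**-52)  # True

# a float64 -> float32 -> float64 round trip loses digits
x64 = np.float64(0.1)
print(x64 - np.float64(np.float32(x64)))   # small nonzero difference
```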

In [1]:
import numpy as np

# 1 type cast to float32
print('#1')
# your code
np.random.seed(42) # leave seed alone until all 5 problems and discussion are finished, then feel free to tinker.
isc = np.random.randint(5,high=10)
rx64 = (np.random.rand(10)-0.5)*isc
rx32 = rx64.astype(np.float32)
diff_arr = rx64 - rx32
print('64 bit float array: ', rx64)
print('32 bit float array: ', rx32)
print('Difference: ', diff_arr)
# Looking at the above results, there is a small difference between the more precise 64-bit float values and
# the less precise 32-bit float values. The differences are at most about 1e-7, which is the precision lost
# when type casting from the 64-bit to the 32-bit representation (float32 keeps about 7 significant digits).

# 2 type cast to int32
print('#2')
# your code
rx64_int32 = rx64.astype(np.int32)
print('Type Casted to int32 :', rx64_int32)
# Looking at the results, the numbers are truncated toward zero: the fractional part is removed and the integer
# part is kept, rather than rounding up or down. By contrast, np.round rounds to the nearest integer, while
# astype(np.int32) simply truncates. The built-in int() behaves the same way on Python floats (see Exercise 1).

# 3 type cast to nearest int32
print('#3')
print('Type cast to nearest int32: ',  np.round(rx64).astype(np.int32))
# your code

# 4 dynamic type cast int32 array above to float64
print('#4')
rx64_float64 = rx64_int32.astype(np.float64)
print('Dynamic type cast int32 array to float 64: ', rx64_float64)
print('Type: ', rx64_float64.dtype)
# your code

# 5 type cast to unsigned-int32
print('#5')
# your code
rx_64_uint64 = rx64.astype(np.uint64)
print('Type Casted to unsigned int64: ', rx_64_uint64)
# Compared to problem 2, I notice that the negative values become very large numbers instead of being truncated toward zero.
# This is wrap-around: for example, -2.75185088 truncates to -2, which is stored as 2**64 - 2 = 18446744073709551614.
# (Casting negative floats to unsigned integers is not guaranteed to wrap like this, so the result can depend on the platform.)

#1
64 bit float array:  [ 3.60571445  1.85595153  0.78926787 -2.75185088 -2.75204384 -3.5353311
  2.92940917  0.80892009  1.66458062 -3.83532405]
32 bit float array:  [ 3.6057146  1.8559515  0.7892679 -2.7518508 -2.7520437 -3.535331
  2.9294093  0.8089201  1.6645806 -3.835324 ]
Difference:  [-1.08275724e-07 -1.31314399e-08 -2.40296032e-08 -3.30309424e-08
 -1.13250320e-07 -9.18359229e-08 -9.93187070e-08  8.51552517e-09
  3.87959762e-08  3.36239125e-09]
#2
Type Casted to int32 : [ 3  1  0 -2 -2 -3  2  0  1 -3]
#3
Type cast to nearest int32:  [ 4  2  1 -3 -3 -4  3  1  2 -4]
#4
Dynamic type cast int32 array to float 64:  [ 3.  1.  0. -2. -2. -3.  2.  0.  1. -3.]
Type:  float64
#5
Type Casted to unsigned int64:  [                   3                    1                    0
 18446744073709551614 18446744073709551614 18446744073709551613
                    2                    0                    1
 18446744073709551613]


## Python and Numpy Overflow Errors (integers)

The behavior you just saw in problem 5 above is related to a type of computational error called an Overflow, specifically a Binary Overflow. This is not related to a "Stack Overflow", which is a real thing, and not just a website for finding solutions to coding problems. I think the following is a reasonable description of a [Binary Overflow](https://www.geeksforgeeks.org/overflow-in-arithmetic-addition-in-binary-number-system/).
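To make the wrap-around concrete, this sketch emulates 32-bit two's-complement addition with plain Python integers and compares it against a NumPy int32 array. The helper `wrap_int32` is a hypothetical name, not a NumPy function. The range limits come from `np.iinfo`.

```python
import numpy as np

INT32_MIN = int(np.iinfo(np.int32).min)   # -2**31
INT32_MAX = int(np.iinfo(np.int32).max)   #  2**31 - 1

def wrap_int32(value):
    """Hypothetical helper: map any Python int onto the int32 range, modulo 2**32."""
    return (value - INT32_MIN) % 2**32 + INT32_MIN

print(INT32_MIN, INT32_MAX)             # -2147483648 2147483647
print(wrap_int32(INT32_MAX + 1))        # -2147483648: one past the max wraps to the min
print(wrap_int32(INT32_MIN - 1))        # 2147483647: one below the min wraps to the max

# a NumPy int32 array wraps the same way, without raising an error
arr = np.array([INT32_MAX], dtype=np.int32)
arr += 1
print(arr[0] == wrap_int32(INT32_MAX + 1))   # True
```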

### Exercise 3

For this exercise, extend the code below to int64 type integers (copy and modify including the comments). Then in the markdown cell for this exercise below, please explain:
1. Why does adding *1* to the numpy variables cause the values to become negative, while the same operation does not cause the Python variable values to become negative? ([Hint-1](https://docs.python.org/3.3/reference/lexical_analysis.html#numeric-literals), [Hint-2](https://numpy.org/doc/stable/reference/arrays.scalars.html#numpy.int32))
1. Suppose you are writing a specific code, and you know that none of the operations will result in negative values. What other integer type (dtype) could you use for 32-bit and 64-bit numpy arrays that would extend the maximum integer value that the arrays could correctly represent? ([Hint](https://programmercave0.github.io/blog/2019/10/19/Bit-Manipulation-in-C-and-C++))
1. Why is a Numpy array '+=' operator being used in Problem 1 instead of just doing something like the code shown below? Try this code yourself, and then modify it so that you can understand and explain what happens after the addition of $1$ to the Numpy variable vs. the addition to the Numpy array.
```python
np_x32 = np.array([2**31-1],dtype=np.int32)[0]
#                               look here --^
```
```python
np_x32 += 1 #Single variable += operator
print(f'val(np_x32+1), type(np_x32+1): {np_x32},{type(np_x32)}')
#                         what is not here --^  nor here --^
```


In [29]:
# Python int32 
print('Python int32')
x32 = int(2**31 - 1) # remember order of operations! max value for int32, computed with exponentiation (**)
print(f'val(x32), x32.bit_length: {x32},{x32.bit_length()}') # print the value and its bit length

x32 += 1 #increment the value of x32 by 1 
print(f'val(x32+1), (x32+1).bit_length: {x32},{x32.bit_length()}') # print the new value after the increment 

print()

# Numpy int32 
print('Numpy int32')
np_x32 = np.array([2**31-1],dtype=np.int32) # create numpy array of maximum value 

print(f'val(np_x32), type(np_x32): {np_x32[0]},{type(np_x32[0])}') # print value and type 

np_x32 += 1
print(f'val(np_x32+1), type(np_x32+1): {np_x32[0]},{type(np_x32[0])}') # print new value and type after increment 

print()

# Python int64 

# Your Code Here

print('Python int64')
x64 = int(2**63 - 1) # remember order of operations! max value for int64, computed with exponentiation (**)
print(f'val(x64), x64.bit_length: {x64},{x64.bit_length()}') # same operations as above, but now for the int64 maximum

x64 +=1 
print(f'val(x64+1), (x64+1).bit_length: {x64},{x64.bit_length()}')

print()

# Numpy int64 
print('Numpy int64')

#   Your Code Here

np_x64 = np.array([2**63 - 1], dtype=np.int64)
print(f'val(np_x64), type(np_x64): {np_x64[0]},{type(np_x64[0])}') # same operations as above, but now with a NumPy int64 array


np_x64 += 1
print(f'val(np_x64+1), type(np_x64+1): {np_x64[0]},{type(np_x64[0])}')

print()

#Problem 3: Variable Numpy int32

#   Your Code Here

np_x32 = np.array([2**31-1],dtype=np.int32)[0] # seeing what happens when we extract integer from numpy array 
#                               look here --^
np_x32 += 1 #Single variable += operator

print(f'val(np_x32+1), type(np_x32+1): {np_x32},{type(np_x32)}')
#                         what is not here --^  nor here --^

Python int32
val(x32), x32.bit_length: 2147483647,31
val(x32+1), (x32+1).bit_length: 2147483648,32

Numpy int32
val(np_x32), type(np_x32): 2147483647,<class 'numpy.int32'>
val(np_x32+1), type(np_x32+1): -2147483648,<class 'numpy.int32'>

Python int64
val(x64), x64.bit_length: 9223372036854775807,63
val(x64+1), (x64+1).bit_length: 9223372036854775808,64

Numpy int64
val(np_x64), type(np_x64): 9223372036854775807,<class 'numpy.int64'>
val(np_x64+1), type(np_x64+1): -9223372036854775808,<class 'numpy.int64'>

val(np_x32+1), type(np_x32+1): -2147483648,<class 'numpy.int32'>


  np_x32 += 1 #Single variable += operator


<br><br>
### Response to Exercise 3 in this *Markdown* cell
<br><br>

1. It seems that in Python the int type does not have a specific maximum representable value. As the hint says, "There is no limit for the length of integer literals apart from what can be stored in available memory." Thus, adding 1 to x32 and x64 does not cause them to wrap around to negative values. In NumPy, each integer dtype has a fixed width: int32 uses 32 bits and int64 uses 64 bits, like the fixed-width integers of C, which NumPy is built on. So each dtype has a maximum representable value, and adding 1 to that maximum wraps around to the most negative value. Fixed-width integers are fast and memory efficient, so this is a trade-off against the unlimited range of Python ints.
2. Given that one does not need negative values, you could use NumPy's unsigned types 'uint32' or 'uint64', which hold only non-negative values. Their maximum values are $2^{32}-1$ and $2^{64}-1$, roughly twice the signed maxima of $2^{31}-1$ and $2^{63}-1$. This is useful when working with large positive numbers.
3. Indexing with `[0]` extracts the element as a separate NumPy scalar, a numpy.int32 as the output shows. It is not a Python int, and it is not a view into the array. So `np_x32 += 1` on the scalar only rebinds the name np_x32 to a new scalar and leaves any array untouched. By contrast, `+=` on the array modifies the element stored inside the array in place and keeps its int32 dtype. Both versions wrap around to -2147483648, the expected result of overflowing a 32-bit signed integer. The difference is in how the overflow is reported:
    * NumPy array arithmetic wraps silently.
    * NumPy scalar arithmetic also wraps, but it emits a RuntimeWarning about the overflow. The stray `np_x32 += 1` line after the output above is the remnant of that warning.

    Neither case raises an error.
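The sketch below illustrates two points from the answers above, using standard NumPy calls.

* `np.iinfo` reports the range of each integer dtype. It shows that the unsigned types give up negative values in exchange for roughly twice the maximum.
* Indexing with `[0]` returns a separate scalar, so incrementing it leaves the array untouched. A small value is used here to avoid the overflow warning. The scalar's exact dtype after `+=` is not printed because it can depend on the NumPy version and platform.

```python
import numpy as np

# signed vs unsigned ranges for 32- and 64-bit integers
for dt in (np.int32, np.uint32, np.int64, np.uint64):
    info = np.iinfo(dt)
    print(f'{info.dtype}: min={info.min}, max={info.max}')

# unsigned arrays also wrap, from 2**32 - 1 back to 0
u = np.array([2**32 - 1], dtype=np.uint32)
u += 1
print(u[0])          # 0

# a[0] is a new numpy scalar, not a view into the array
a = np.array([10], dtype=np.int32)
s = a[0]
s += 1               # rebinds s to a new scalar; a is unchanged
print(s, a[0])       # 11 10
```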

<br><br>
#### Yo Thomas, I think Batman would beat Superman because he's a total bada** with a ton of cool gadgets and a genius-level IQ. He's also got mad combat skills and would be able to come up with a plan to take Supes down. He would use his vast resources and technology to analyze Superman's weaknesses and come up with a strategy to exploit them. Plus, Batman never gives up; he's like the ultimate ninja warrior detective. Superman may be super strong and all, but Batman's got the brains and brawn to outsmart and overpower him. It would be a crazy fight, but in the end, Batman would come out on top. P.S. They need to remake Batman vs. Superman without the Martha plot point "insert dead emoji"


[//]: <> (Delete one of the <br> tags from above the heading but leave the rest.)
[//]: <> (Then fill in your responses and number your responses via a numbered, markdown list.)
[//]: <> (Also, please give me your opinion on who would win in a battle between Superman and Batman.)
[//]: <> (Make the heading of your Opinion just one size smaller than the heading of this cell.)
[//]: <> (One sentence or even just one word about the battle will suffice)

## Numpy Underflow Errors (floating point)

Now that you have seen Overflow errors, let's look at what Underflow errors are. Do you have a guess? The following [Wiki](https://en.wikipedia.org/wiki/Arithmetic_underflow) provides a nice explanation of this type of error. There is more information on that page than you probably need to know, but be sure to read the first section. Then work on the exercise below.

### Exercise 4

For this exercise, be sure to comment the code and provide print statements between the problems. Make the output look nice. For the questions, please fill in your responses in the markdown section that follows.

* Part A
    0. Print the smallest possible numpy.float32 value. (This has been done for you.) [Extra-Info](https://en.wikipedia.org/wiki/Subnormal_number)
    1. Print the result of mf32$^2$. Do you believe that the value in A0 really is the smallest numpy.float32? Provide evidence that supports your answer. (Hint: this is possible with one line of code.)
    2. Now create a numpy array, np_mf32, with size one and the value of mf32 in it. Print both the array and the numpy.dtype of the array. (no discussion needed)
    3. Now square the np_mf32 array. Does this confirm your thoughts about A1?
    4. Create a list or numpy array of floats for $x$ such that $x \in \left[0.5,0.9\right]$ in increments of 0.1 (there should be 5 elements). Now loop over the values of $x$ and multiply the np_mf32 array by x. Comment on which value or values cause an Underflow error.
* Part B: do all of the above in Part A with a numpy.float64 dtype
    5. And answer this question related to A1 vs B1: why were the results different when we squared mf32 in A1 vs squaring mf64 in B1? (Hint: what is the dtype of both values after squaring? It takes just two lines of code to see what was different between the two.)
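Here is a minimal sketch of the float32 underflow boundary. It assumes NumPy 1.22 or newer for `smallest_subnormal`, which this notebook already uses. Every operand is float32, so no type promotion is involved. `tiny` is the smallest normal float32, $2^{-126}$, and the smallest subnormal is $2^{-149}$. Multiplying the smallest subnormal by 0.5 lands exactly halfway between that value and 0, and round-half-to-even then gives 0.

```python
import numpy as np

fi = np.finfo(np.float32)
mf32 = fi.smallest_subnormal

print(fi.tiny == np.float32(2.0**-126))     # True: smallest normal float32
print(mf32 == np.float32(2.0**-149))        # True: smallest subnormal float32

# the next float32 above zero is the smallest subnormal
print(np.nextafter(np.float32(0), np.float32(1)) == mf32)   # True

# float32 * float32 stays float32, so these products can underflow
print(mf32 * np.float32(0.5))   # 0.0   (exact tie, rounds to even, i.e. to 0)
print(mf32 * np.float32(0.6))   # 1e-45 (closer to mf32 than to 0)
```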


In [2]:
import sys
 
 
print("User Current Version:-", sys.version)

User Current Version:- 3.8.16 (default, Jan 17 2023, 22:25:28) [MSC v.1916 64 bit (AMD64)]


In [55]:
#Part A

#A0
print('Part A')
mf32 = np.finfo(np.float32).smallest_subnormal # Smallest subnormal value 
print(f'val(mf32): {mf32}')
print()

#A1 square mf32
print(f'mf32 squared without converting to float32: {mf32**2}')
next_smaller = np.nextafter(mf32, np.float32(0)) # step one float32 spacing toward zero (both arguments are float32)
if next_smaller == 0: # stepping down from mf32 lands on 0.0, so no positive float32 is smaller
    print("There is no number smaller than mf32 that is a valid np.float32.")
else:
    print("There is a number smaller than mf32 that is a valid np.float32.")
print()
#A2 mf32 in a numpy array (look at the Numpy int32 example in the previous exercise)
np_mf32 = np.array([mf32], dtype=np.float32)
print(np_mf32)
print(np_mf32.dtype)
print()

#A3 square the np_mf32 array 
print('Square result after converting to float32',np_mf32**2) # squaring the smallest float32 subnormal, kept in float32
print()

#A4 loop over x in [0.5, 0.9] in steps of 0.1 and show x*np_mf32
x = np.arange(0.5,1,0.1) # creates [0.5, 0.6, 0.7, 0.8, 0.9]
for i in x:
    np_mf32_x = np_mf32 * i # multiply the float32 array created above by each value of x
    print(np_mf32_x[0])


#Part B (same as above but for float64)


#B0 - B4: Your code
print()
print('Part B')
#B0
#This should be the smallest float64 possible, but ...
mf64 = np.finfo(np.float64).smallest_subnormal
print(f'val(mf64): {mf64}')
print()

#B1 square mf64
print(f'mf64 squared: {np.float64(mf64**2)}')
next_smaller = np.nextafter(mf64, np.float64(0)) # step one float64 spacing toward zero
if next_smaller == 0: # stepping down from mf64 lands on 0.0, so no positive float64 is smaller
    print("There is no number smaller than mf64 that is a valid np.float64.")
else:
    print("There is a number smaller than mf64 that is a valid np.float64.")
print()
#B2 mf64 in a numpy array
np_mf64 = np.array([mf64], dtype=np.float64)
print(np_mf64)
print(np_mf64.dtype)
print()

#B3 square the np_mf64 array 
print(np_mf64**2)
print()

#B4 loop over x in [0.5, 0.9] in steps of 0.1 and show x*np_mf64
x = np.arange(0.5,1,0.1)
for i in x:
    np_mf64_x = np_mf64 * i # same as A4, but in float64 precision
    print(np_mf64_x[0])


#B5 The extra question: A1 vs B1
print()
print('Extra Question')
print('Type of mf32 before squaring: ', type(mf32))
print(f'Type of mf64 before squaring {type(mf64)}')
print('Mf32 squared: ', mf32**2)
print('Mf64 squared:', mf64**2)
print('Type of mf32 after squaring: ', type(mf32**2))
print(f'Type of mf64 after squaring {type(mf64**2)}')

Part A
val(mf32): 1.401298464324817e-45

mf32 squared without converting to float32: 1.9636373861190906e-90
There is no number smaller than mf32 that is a valid np.float32.

[1.e-45]
float32

Square result after converting to float32 [0.]

0.0
1e-45
1e-45
1e-45
1e-45

Part B
val(mf64): 5e-324

mf64 squared: 0.0
There is no number smaller than mf64 that is a valid np.float64.

[5.e-324]
float64

[0.]

0.0
5e-324
5e-324
5e-324
5e-324

Extra Question
Type of mf32 before squaring:  <class 'numpy.float32'>
Type of mf64 before squaring <class 'numpy.float64'>
Mf32 squared:  1.9636373861190906e-90
Mf64 squared: 0.0
Type of mf32 after squaring:  <class 'numpy.float64'>
Type of mf64 after squaring <class 'numpy.float64'>


<br><br><br>
### Response to Exercise 4, parts A and B in this *Markdown* cell
<br><br>

#### Part A
1. Yes, I believe this is the smallest positive np.float32. The evidence is that `np.nextafter` shows that the next float32 below mf32 is 0.0, so no positive float32 lies between 0 and mf32. We can also see this from the bit layout. A 32-bit float has a fixed number of bits: 1 bit for the sign, 8 bits for the exponent (with a bias of 127), and 23 bits for the mantissa. Normal numbers follow $value = (-1)^{sign} \cdot 2^{(exponent - 127)} \cdot (1.mantissa)$, so the smallest normal number is $2^{-126}$. When the stored exponent bits are all zero, the number is subnormal and follows $value = (-1)^{sign} \cdot 2^{-126} \cdot (0.mantissa)$. The smallest subnormal has only the last of the 23 mantissa bits set, so $0.mantissa = 2^{-23}$ and $value = (-1)^{0} \cdot 2^{-126} \cdot 2^{-23} = 2^{-149}$, which is approximately 1.401298464324817e-45.
3. Yes, it does. Squaring the float32 array keeps the result in float32. The exact square, $(2^{-149})^2 = 2^{-298}$, is far smaller than the smallest float32 subnormal ($2^{-149}$). With only 32 bits available, the result cannot be represented, so it underflows and is rounded to 0, as shown in my code result above.
4. The value that causes an underflow to zero is 0.5. Multiplying the smallest float32 subnormal by 0.5 gives exactly $2^{-150}$, which lies halfway between 0 and $2^{-149}$. IEEE 754 round-half-to-even breaks that tie toward 0, so the result is 0.0. For 0.6 to 0.9 the exact products are closer to $2^{-149}$ than to 0, so they round to 1e-45 (the value of mf32). Those results are still inexact, but they do not drop all the way to zero.

#### Part B

5. The difference comes from NumPy's type promotion, not from the square needing more precision. Before squaring, mf32 is a numpy.float32 and mf64 is a numpy.float64. `mf32**2` combines a numpy.float32 scalar with a Python int (the exponent 2). In the NumPy version used here, that scalar operation promoted the result to numpy.float64, as the printed types show. In float64, $(2^{-149})^2 = 2^{-298} \approx 1.96 \times 10^{-90}$ is easily representable, so A1 printed a non-zero value. `mf64**2` stays float64, and $(2^{-1074})^2 = 2^{-2148}$ is far below the smallest float64 subnormal, $2^{-1074} \approx$ 5e-324, so it underflows to 0.0. NumPy 2.0 changed these scalar promotion rules (NEP 50). With NumPy 2.0 or newer, `mf32**2` should stay float32 and underflow to 0.0, just like the array in A3.
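One way to take version-dependent promotion rules out of the picture is to make the intended dtype explicit (this is a sketch, not the only approach). Multiplying two float32 scalars stays float32 and underflows. Converting to float64 first keeps the tiny square.

```python
import numpy as np

mf32 = np.finfo(np.float32).smallest_subnormal

sq32 = mf32 * mf32               # float32 * float32 -> float32
sq64 = np.float64(mf32) ** 2     # computed in float64

print(sq32.dtype, sq32)          # float32 0.0
print(sq64.dtype, sq64)          # float64 1.9636373861190906e-90
print(np.result_type(np.float32, np.float64))  # float64
```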


[//]: <> (Make this look decent. Use #### sized headings for the Parts, and a numbered markdown list for the responses.)
[//]: <> (Also, I think Batman would win, as long as he isn't taken by surprise.)

## Round-off/Truncation Errors (floating point)

Before starting this exercise, please read the [Round-off Errors](https://pythonnumericalmethods.berkeley.edu/notebooks/chapter09.03-Roundoff-Errors.html) section in Chapter 9 of *Python Numerical Methods*.

### Exercise 5

For this exercise, we are going to examine [Truncation](http://nifty.stanford.edu/2003/pests/2002/lectures/07.1_FloatingPoint/truncation.htm?CurrentSlide=6) errors, which are related to Round-off errors. One place you might encounter truncation errors is when evaluating integrals on a computer, because an integral becomes a sum of many terms. Given that we mostly work with computers that have finite-precision arithmetic, we must represent integrals as finite summations.
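Before starting, it helps to see where float32 stops representing integers exactly. A float32 has a 24-bit significand, so every integer up to $2^{24} = 16777216$ is exact, but above that some integers are skipped. The sketch below (NumPy only) shows a skipped integer. It also finds the smallest $N$ whose exact sum $\frac{N(N+1)}{2}$ exceeds $2^{24}$. Treat that $N$ as a rough indicator of where errors can start, not as $N_{inc}$ itself, which also depends on the order in which the summation is carried out.

```python
import numpy as np

big = np.float32(2**24)
print(big + np.float32(1) == big)    # True: 2**24 + 1 is not representable in float32
print(big + np.float32(2))           # 16777218.0 is representable

# smallest N whose exact sum 1 + 2 + ... + N exceeds 2**24
N = 1
while N * (N + 1) // 2 <= 2**24:
    N += 1
print(N, N * (N + 1) // 2)           # 5793 16782321
```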

To demonstrate truncation errors we are going to sum all the integer values from $1$ to $N$, but we are going to represent each integer as a numpy.float32 type. In the cell below, I've written a function that finds the analytical solution to $\sum\limits_{i=1}^{N}i = \frac{N(N+1)}{2}$ for a given $N$.

Your job (write responses in the markdown cell below):
* First, create a Numpy array of size $N$ containing numpy.float32 values representing the integers from $1$ to $N$, with the values stored in consecutive indexing order (i.e. arr[0] = 1.0, arr[1] = 2.0, ..., arr[N-1] = N.0).
* Next, create a function that calculates and returns the sum of this array (make it fast, and feel free to use numpy functions). Be sure to verify that your sum is the same as the analytical sum. I suggest starting with an array of just 10 elements, so $N=10$.
* Once you have verified that your summation function is returning the correct results, you have two tasks (plus a special case):
    1. Find the value of $N_{inc}$ at which your summation is no longer correct. Discuss in the markdown cell below how you found this value. 
    2. Write an improved summation that can correctly sum all the values to find the answer of $\sum\limits_{i=1}^{2\cdot N_{inc}} i$ (the sum up to $2\cdot N_{inc}$). Discuss your reasoning about how you chose to solve this task.
    3. Special Case: If you are taking this course for 4 units, then write a recursive function recur_sum() that can solve problem 2. All values must remain as numpy.float32 values (no type-casting). Then test your function for $4\cdot N_{inc}$ (it should be accurate without any changes, but if it's not, that's ok).
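One option for an improved sum that keeps every value in float32 is Kahan compensated summation. This is an alternative technique, not necessarily the intended solution. A second float32 variable carries the low-order bits lost in each addition and feeds them back into the next one. The functions `naive_sum_f32` and `kahan_sum_f32` are hypothetical names. Both loop in Python, so they are slow but easy to follow.

```python
import numpy as np

def naive_sum_f32(arr):
    """Sequential float32 running sum (hypothetical helper, for comparison)."""
    total = np.float32(0.0)
    for v in arr:
        total = total + v          # float32 + float32 stays float32
    return total

def kahan_sum_f32(arr):
    """Kahan compensated summation with every intermediate kept in float32."""
    total = np.float32(0.0)
    comp = np.float32(0.0)         # running compensation for lost low-order bits
    for v in arr:
        y = v - comp               # correct the next term by the previous error
        t = total + y              # low-order digits of y may be lost here
        comp = (t - total) - y     # recover (negated) what was just lost
        total = t
    return total

# deterministic illustration: 1000 ones added after 2**24
spike = np.concatenate([np.array([2**24], dtype=np.float32), np.ones(1000, dtype=np.float32)])
print(naive_sum_f32(spike))   # 16777216.0: every "+1" is rounded away
print(kahan_sum_f32(spike))   # 16778216.0: the compensation recovers them

# the sum of 1..N from this exercise, next to the exact integer answer
N = 20000
arr = np.arange(1, N + 1, dtype=np.float32)
print(naive_sum_f32(arr), kahan_sum_f32(arr), N * (N + 1) // 2)
```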



In [56]:
import numpy as np

# analytical sum of the integers 1 through N, rounded once to float32
def analytic_sum(N):
    return np.float32(0.5*N*(N+1))  # 0.5*N*(N+1) is exact in float64 at these sizes

# summation placeholder: despite its name, this returns the ordered float32
# array [1, 2, ..., len(myarr)] rather than a sum (the values of myarr are ignored)
def my_sum(myarr):
    return np.arange(1, len(myarr)+1).astype(np.float32)  # arange excludes its end point, so +1 includes N

# improved summation in float32: np.sum uses partial pairwise summation for
# float arrays, which accumulates less round-off than a left-to-right loop.
# Only the length of myarr is used: it sums the float32 values 1..len(myarr).
def my_impr_sum(myarr):
    temp = np.arange(1, len(myarr)+1).astype(np.float32)
    return np.sum(temp)

def create_float_array(N):
    return np.arange(1, N+1).astype(np.float32)  # float32 array [1, 2, ..., N]

# function to find the value of N at which the summation becomes incorrect
def find_incorrect(tolerance, my_func, multiplier):
    """
    Compare my_func([1, ..., N]) with the analytical sum for N = 0, 1, 2, ...
    and stop at the first N where np.isclose reports that they differ.

    Output: N (index of the first difference), or None if no difference is
    found within multiplier*100000 iterations.
    """
    diff = None
    for N in range(multiplier*100000):
        arr = create_float_array(N)
        expected = analytic_sum(N)
        result = my_func(arr)
        diff = abs(expected - result)
        if not np.isclose(expected, result, rtol=tolerance):
            print(f'My Summed Value with 32 bits: {result}')
            print(f'Analytical Value with 32 Bits: {expected}')
            print(f'Difference at N={N}: {diff}')
            return N
    print(f'No difference found; last difference: {diff}')
    return None

# improved summation in float64, used for the sum up to 2*N_inc
def my_impr_sum_64bit(myarr):
    temp = np.arange(1, len(myarr)+1).astype(np.float64)  # float64 stores every integer exactly up to 2**53
    return np.sum(temp)

def analytic_sum64bit(N):
    return np.float64(0.5*N*(N+1))  # analytical sum kept in 64 bits

def test_improved_summation_function(N):
    """Sum 1..2*N in float64 and assert that it matches the analytical value."""
    arr = create_float_array(2*N)
    calculated_sum = my_impr_sum_64bit(arr)
    analytical_sum_val = analytic_sum64bit(len(arr))
    print('My Calculated Sum After Changing precision to 64 bits to 2*N_inc: ', calculated_sum)
    print('Analytical Sum After Changing precision to 64 bits to 2*N_inc: ', analytical_sum_val)
    print('Difference: ', abs(calculated_sum - analytical_sum_val))
    assert np.isclose(calculated_sum, analytical_sum_val), f"{calculated_sum} != {analytical_sum_val}"


# Special Case: recursive sum function here

# tests
print()
print('Ordered Array:', my_sum(np.array([1,4,3,4])))  # check that my_sum returns the consecutive values 1..4
print()
N = 10  # array size of 10
random_arr = np.random.rand(N)  # 10 random values in [0, 1); my_impr_sum uses only the array's length
print(f'Analytical Sum at N=10: {analytic_sum(10)}')  # analytical sum
print(f'My Function Sum at N=10: {my_impr_sum(random_arr)}')  # improved float32 sum
print()
tolerance = 0  # relative tolerance for np.isclose; with its default atol=1e-8 this effectively requires exact equality
N_inc = find_incorrect(tolerance, my_impr_sum, multiplier=1)  # first N at which the float32 sum is incorrect
print(f'Value of N_inc at which the summation becomes incorrect: {N_inc}')
print()
test_improved_summation_function(N_inc)  # sums 1..2*N_inc in float64; raises an AssertionError if the result differs from the analytical value
print()


Ordered Array: [1. 2. 3. 4.]

Analytical Sum at N=10: 55.0
My Function Sum at N=10: 55.0

My Summed Value with 32 bits: 67169840.0
Analytical Value with 32 Bits: 67169848.0
Difference at N=11590: 8.0
Value of N_inc at which the summation becomes incorrect: 11590

My Calculated Sum After Changing precision to 64 bits to 2*N_inc:  268667790.0
Analytical Sum After Changing precision to 64 bits to 2*N_inc:  268667790.0
Difference:  0.0


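The `my_sum` placeholder in the cell above is labeled as a summation function but returns the ordered array instead. As a point of comparison, here is a minimal sketch of a plain left-to-right float32 loop; `naive_sum32` and `first_mismatch` are hypothetical helpers, not part of the lab. NumPy's documentation notes that `np.sum` often uses partial pairwise summation for floating-point data, so comparing the N returned by `first_mismatch` with N_inc = 11590 gives a rough sense of how much that helps. The exact N depends on the rounding pattern and is not stated here.

```python
import numpy as np


def naive_sum32(arr):
    """Left-to-right sum that keeps a numpy.float32 running total."""
    total = np.float32(0.0)
    for value in arr:          # each element of a float32 array is a numpy.float32
        total = total + value  # float32 + float32 stays float32
    return total


def first_mismatch(n_max):
    """Return the first N <= n_max at which a running float32 total of 1..N
    differs from N(N+1)/2 rounded to float32, or None if it never does."""
    total = np.float32(0.0)
    for n in range(1, n_max + 1):
        total = total + np.float32(n)              # one naive loop step
        if total != np.float32(n * (n + 1) // 2):  # exact integer, rounded once
            return n
    return None


print(naive_sum32(np.arange(1, 101).astype(np.float32)))  # 5050.0
print(first_mismatch(11590))  # compare with N_inc = 11590 found with np.sum
```

`first_mismatch` keeps a single running total instead of calling `naive_sum32` for every N, so the scan costs O(n_max) additions rather than O(n_max²).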

<br><br><br>
### Responses to Exercise 5
<br><br>

1. I found this value with a function, `find_incorrect`, that compares the analytical sum with my summation function for increasing values of N. Because the N at which the two first differ is not known in advance, a loop steps through N = 0, 1, 2, ... and stops at the first N where `np.isclose` reports that the two results differ. Both results should be whole numbers, so I set the relative tolerance to 0; together with the default absolute tolerance of `np.isclose` (1e-8), this effectively requires exact equality. The first difference occurs at N = 11590, where the two float32 values differ by 8. This is a single rounding step: float32 has a 24-bit significand, so it represents every integer exactly only up to $2^{24}$ = 16,777,216, and near the sum at N = 11590 (about $6.7 \times 10^7$, between $2^{26}$ and $2^{27}$) adjacent float32 values are 8 apart.
2. To sum all the values correctly up to $2\cdot N_{inc}$, I changed the dtype from np.float32 to np.float64. float64 has a 53-bit significand, so it represents every integer exactly up to $2^{53} \approx 9.0 \times 10^{15}$. Every partial sum up to $2\cdot N_{inc}$ is far below that limit, so no round-off error occurs and the computed sum matches the analytical value.
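
The precision limits used in the two responses above can be checked directly. This short sketch uses NumPy's `np.spacing`, which returns the distance from a value to the next representable value of the same dtype; the variable names are illustrative and not part of the lab code.

```python
import numpy as np

n_inc = 11590
exact = n_inc * (n_inc + 1) // 2  # 67169845, exact with Python integers

# float32: 24-bit significand, so 2**24 + 1 is not representable and rounds back to 2**24
print(np.float32(2**24) + np.float32(1) == np.float32(2**24))  # True

# gap between adjacent float32 values near the sum at N_inc (between 2**26 and 2**27)
print(np.spacing(np.float32(exact)))  # 8.0

# float64: 53-bit significand, so the sum up to 2*N_inc is computed exactly
m = 2 * n_inc
exact_2n = m * (m + 1) // 2
print(np.sum(np.arange(1, m + 1).astype(np.float64)) == exact_2n)  # True
```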

[//]: <> (Make this look decent. Use a numbered markdown list for the responses to each task.)