# GEOPHYS 257 (Winter 2023)

## Data Types and Basic Machine Errors

In this lab we cover the numeric data types of both Python and Numpy, as well as machine errors. Although the problems in this lab are not from *Python Numerical Methods*, please read [Chapter 9](https://pythonnumericalmethods.berkeley.edu/notebooks/chapter09.00-Representation-of-Numbers.html). Pay particular attention to the *Round-off Errors* section. The first two sections are mostly FYI, but they explain how most computational systems represent fractional numbers in binary.

[//]: <> (Notebook Author: Thomas Cullison, Stanford University, Jan. 2023)

## External Resources
If you have any questions about specific Python functionality, you can consult the official [Python documentation](http://docs.python.org/3/).

* [Python Numeric Types](https://docs.python.org/3/library/stdtypes.html#numeric-types-int-float-complex)
* [Numpy Data Types](https://numpy.org/doc/stable/user/basics.types.html)

## Python Built-in Numeric Types (not the same as Numpy types)

In Python it is not necessary to declare variables. Once a value is assigned to a variable the variable will have the type of that value. That behaviour is known as **dynamic typing**. A variable can also change its type at any point of the program execution. That has its advantages but there are also some pitfalls.

1. Int
1. Float
1. Complex (stored as a pair of floats: a real part and an imaginary part, written with the suffix *j*)
1. Type Casting
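
As a minimal sketch of dynamic typing (the variable names `value` and `text` are hypothetical), the snippet below rebinds one name to values of different types and records what `type()` reports each time. The last lines show one possible pitfall: the same `*` operator does something quite different once the variable holds a string.

```python
# Hypothetical variable names, used only for illustration.
value = 2
types_seen = [type(value).__name__]

value = 2.5              # the same name now refers to a float
types_seen.append(type(value).__name__)

value = 1 + 2j           # ... and now to a complex number
types_seen.append(type(value).__name__)

print(types_seen)

# Pitfall: if a variable unexpectedly holds a string,
# '*' repeats the string instead of multiplying a number.
text = "2"
repeated = text * 3
print(repeated)
```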

## Exercise 0

#### First read [Chapter 9](https://pythonnumericalmethods.berkeley.edu/notebooks/chapter09.00-Representation-of-Numbers.html) in *Python Numerical Methods*.

Then begin this lab by reading the code below, running it, and examining the results. Continue to the next exercise afterwards.

In [9]:
# 1
x=2
print('x =',x)
print('x type:',type(x))
print()

# 2 manual casting int to float
cx=float(x)
print('cx =',cx)
print('cx type:',type(cx))
print()

# 3
y=1.2
print('y =',y)
print('y type:',type(y))
print()

# 4 manual casting float to int
cy=int(y)
print('cy =',cy) # !!! notice that int() truncates toward zero (it does not round) !!!
print('cy type:',type(cy))
print()

# 5
z=1 + 2j
print('z =',z)
print('z type:',type(z))
print()

# 6 dynamic casting of y to complex
print('z*y =',z*y)
print('z*y type:',type(z*y))
print()

x = 2
x type: <class 'int'>

cx = 2.0
cx type: <class 'float'>

y = 1.2
y type: <class 'float'>

cy = 1
cy type: <class 'int'>

z = (1+2j)
z type: <class 'complex'>

z*y = (1.2+2.4j)
z*y type: <class 'complex'>



## Exercise 1: Dynamic Type Casting

You might have noticed that $z \cdot y$ above returned a complex number. This is related to something called *type casting*. The value stored in $y$ was cast (think transformed) into a complex number before the multiplication was computed. Thus, the two lines below will yield equivalent answers.

```python 
z*y
z*(y + 0.j)
```

Note, the dot in '0.j' writes the imaginary part explicitly as a floating point value. The dot is optional here, because Python stores both parts of a complex number as floats: '0j', '0.j' and '0.0j' are the same value. Use the dot when you wish to make it explicit that a value is a float type.

To confirm that the above two lines are equivalent, try the line below for yourself. 

```python 
z*y == z*(y + 0.j)
```

It will return a value of 'True'. The '==' is a comparison operator, which we will discuss later.
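
A short sketch of the implicit promotion described above, reusing the values of `z` and `y` from Exercise 0. It checks that mixed arithmetic promotes int to float and float to complex, that an explicit `complex(y)` cast gives the same product, and that the imaginary literal can be written with or without the dot.

```python
z = 1 + 2j
y = 1.2

# Mixed arithmetic promotes the "narrower" operand: int -> float -> complex.
int_plus_float = type(2 + y).__name__
complex_times_float = type(z * y).__name__
print(int_plus_float, complex_times_float)

# The explicit cast gives the same product as the implicit one.
same_product = (z * y == z * complex(y))

# The imaginary literal can be written with or without the dot.
literals_equal = (0j == 0.j == 0.0j)
print(same_product, literals_equal)
```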

### For this exercise please do the following in the cell below.
1. Run the comparison code above and print the result. If you don't believe the answer, change '0.j' to '1j' and run the code again.
1. Type-cast 0.5 and 0.99 to integers and print the results
1. Similar to above, but now type-cast 100.5 and 100.99 to integers
1. Set $x$ equal to a floating point number with a non-zero value left of the decimal point. Then finish the line below such that $x$ is rounded to the nearest integer no matter what value is to the right of the decimal point. Print that result.
1. Store the results of the following statements in a list (*hint: use .append()*). After that, run the code for section 5 below, which maps each element in the list to its corresponding boolean value. Using comments (i.e. #comment), discuss the results. 
```python 
True
True + 1
True - 1
True*1
True*0
True*3.1425
True*-3.1425
False
False + 1
False - 1
False*1
False*0
False*3.1425
False*-3.1425
```

In [10]:
# 1. comparison
print('#1')
is_same = z*y == z*(y + 0.j)
print(f'Equivalent?: {is_same}')

# 2. type-cast
print('\n#2')
m = 0.5
n = 0.99
cm = int(m)
cn = int(n)
print(cm,cn)

# 3. type-cast
print('\n#3')
m = 100.5
n = 100.99
cm = int(m)
cn = int(n)
print(cm,cn)

# 4. round to nearest integer (you need to add something before casting)
print('\n#4')
x = 1.5
cx = int(round(x))
print(cx)

# 5. 'Boolean' is a number
print('\n#5')
## name your list: mylist
mylist = []
# your code
mylist.append(True)
mylist.append(True + 1)
mylist.append(True - 1)
mylist.append(True*1)
mylist.append(True*0)
mylist.append(True*3.1425)
mylist.append(True*-3.1425)
mylist.append(False)
mylist.append(False + 1)
mylist.append(False - 1)
mylist.append(False*1)
mylist.append(False*0)
mylist.append(False*3.1425)
mylist.append(False*-3.1425)
print(f'mylist: {mylist}')
print(f'bool(mylist): {list(map(bool,mylist))}')

#1
Equivalent?: True

#2
0 0

#3
100 100

#4
2

#5
mylist: [True, 2, 0, 1, 0, 3.1425, -3.1425, False, 1, -1, 0, 0, 0.0, -0.0]
bool(mylist): [True, True, False, True, False, True, True, False, True, True, False, False, False, False]
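
The results above can be summarized in a small sketch. It assumes nothing beyond the standard library: `int()` truncates toward zero, the built-in `round()` sends exact ties to the nearest even integer, adding 0.5 before flooring is one way to round halves up for positive values, and `bool` is a subclass of `int`, which is why `True` and `False` take part in arithmetic as 1 and 0.

```python
import math

# int() drops the fractional part (truncation toward zero).
truncated = (int(0.99), int(-0.99), int(100.99))

# round() sends exact ties to the nearest even integer.
ties = (round(0.5), round(1.5), round(2.5))

# Adding 0.5 and flooring rounds halves up (for positive values).
half_up = math.floor(2.5 + 0.5)

# bool is a subclass of int: True behaves like 1, False like 0.
bool_is_int = isinstance(True, int)
bool_sum = True + True

print(truncated, ties, half_up, bool_is_int, bool_sum)
```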


## Exercise 2: "Type-casting" Numpy arrays

Type casting Numpy arrays is a little different from casting built-in types. Here, we can make use of the following function, [**astype()**](https://numpy.org/doc/stable/reference/generated/numpy.ndarray.astype.html).

### For this exercise please do the following in the cell below.
1. Run the code below for this problem to create an array of random 64-bit floats. Then type cast the array to float32, subtract the two arrays, and print the result. Discuss the result of the subtraction.
1. Now, using the same 64-bit array, type cast all the values to int32. How were the numbers rounded? Was this what you expected compared to rounding a Python built-in type?
1. Now type cast the float64 array to the nearest int32.
1. Dynamically cast the int32 array created above into an array with float64 types. Print the array and something that verifies the array's new numeric type.
1. For the last problem in this section. Type cast the float64 array created in problem 1, to an array of unsigned integers ([Hint](https://numpy.org/doc/stable/reference/arrays.scalars.html#numpy.uint32)). What do you notice with the results compared to problem 2?

In [11]:
import numpy as np

# 1 type cast to float32
print('#1')
np.random.seed(42) # leave seed alone until all 5 problems and discussion are finished, then feel free to tinker.
isc = np.random.randint(5,high=10)
rx64 = (np.random.rand(10)-0.5)*isc
# your code
rx32 = np.float32(rx64)
diff = rx64-rx32
print(diff)
# The differences are the float32 round-off errors (about 1e-7 or smaller here)

# 2 type cast to int32
print('#2')
# your code
diff32 = np.float32(diff)
print(diff32)
# The numbers were rounded to the closest precision numbers in 32 bit representation

# 3 type cast to nearest int32
print('#3')
# your code
diffint = np.int32(diff)
print(diffint)

# 4 dynamic type cast int32 array above to float64
print('#4')
# your code
diff64 = np.float64(diffint)
print(diff64)
print(diff64.dtype)

# 5 type cast to unsigned-int32
print('#5')
# your code
mylist2 = np.uint32(mylist)
print(mylist2)
# The negative values are not preserved: in this output they wrap around modulo 2**32 (e.g. -1 becomes 4294967295)

#1
[-1.08275724e-07 -1.31314399e-08 -2.40296032e-08 -3.30309424e-08
 -1.13250320e-07 -9.18359229e-08 -9.93187070e-08  8.51552517e-09
  3.87959762e-08  3.36239125e-09]
#2
[-1.0827573e-07 -1.3131440e-08 -2.4029603e-08 -3.3030943e-08
 -1.1325032e-07 -9.1835922e-08 -9.9318704e-08  8.5155252e-09
  3.8795974e-08  3.3623913e-09]
#3
[0 0 0 0 0 0 0 0 0 0]
#4
[0. 0. 0. 0. 0. 0. 0. 0. 0. 0.]
float64
#5
[         1          2          0          1          0          3
 4294967293          0          1 4294967295          0          0
          0          0]
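
For comparison, here is a hedged sketch of the casts this exercise asks for, applied with `astype()` to a small fixed array instead of the random one (the array `a64` and its values are assumptions chosen so the results are easy to check by hand). It shows the float32 round-off, truncation toward zero for a plain int32 cast, and `np.rint` as one way to round to the nearest integer before casting.

```python
import numpy as np

# Hypothetical test values; float64 is the default dtype.
a64 = np.array([-2.7, -0.5, 0.5, 1.5, 2.5, 3.9])

# float64 -> float32: values not exactly representable pick up round-off.
a32 = a64.astype(np.float32)
diff = a64 - a32                 # the subtraction is carried out in float64

# float64 -> int32: the fractional part is dropped (toward zero).
truncated = a64.astype(np.int32)

# Round first (ties go to even), then cast.
nearest = np.rint(a64).astype(np.int32)

# int32 -> float64, with the dtype as verification.
back = nearest.astype(np.float64)

print(diff)
print(truncated, nearest)
print(back, back.dtype)
```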


## Python and Numpy Overflow Errors (integers)

The behavior you just saw in problem 5 above is related to a type of computational error called an Overflow, specifically a Binary Overflow (not related to a "Stack Overflow", which is a real thing and not just a website for finding solutions to coding problems). I think the following is a reasonable description of a [Binary Overflow](https://www.geeksforgeeks.org/overflow-in-arithmetic-addition-in-binary-number-system/).
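
A minimal sketch of the wrap-around, assuming a small hand-picked int32 array: casting signed 32-bit integers to uint32 keeps the same bit pattern, so each negative value reappears as its value modulo $2^{32}$. The values -1 and -3 reproduce the large numbers seen in the output of problem 5.

```python
import numpy as np

# Hypothetical values: two negatives and one positive.
values = [-1, -3, 5]
signed = np.array(values, dtype=np.int32)

# Same 32 bits, reinterpreted as unsigned integers.
unsigned = signed.astype(np.uint32)

# Pure-Python reference: reduce each value modulo 2**32.
expected = [v % 2**32 for v in values]

print(unsigned)
print(expected)
```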

### Exercise 3

For this exercise, extend the code below to int64 type integers (copy and modify, including the comments). Then in the markdown cell for this exercise below, please explain:
1. Why does adding *1* to the numpy variables cause the values to become negative, while the same operation does not cause the Python variable values to become negative? ([Hint-1](https://docs.python.org/3.3/reference/lexical_analysis.html#numeric-literals), [Hint-2](https://numpy.org/doc/stable/reference/arrays.scalars.html#numpy.int32))
1. Suppose you are writing a specific code, and you know that none of the operations will result in negative values. What other integer type (dtype) could you use for 32-bit and 64-bit numpy arrays that would extend the maximum integer value that the arrays could correctly represent? ([Hint](https://programmercave0.github.io/blog/2019/10/19/Bit-Manipulation-in-C-and-C++))
1. Why is a Numpy array '+=' operator being used in Problem 1 instead of just doing something like the code shown below? Try this code yourself, and then modify it so that you can understand and explain what happens after the addition of $1$ to the Numpy variable vs. the addition to the Numpy array.
```python
np_x32 = np.array([2**31-1],dtype=np.int32)[0]
#                               look here --^
```
```python
np_x32 += 1 #Single variable += operator
print(f'val(np_x32+1), type(np_x32+1): {np_x32},{type(np_x32)}')
#                         what is not here --^  nor here --^
```


In [13]:
# Python int32 
print('Python int32')
x32 = int(2**31 - 1) #remember order of operations!
print(f'val(x32), x32.bit_length: {x32},{x32.bit_length()}')

x32 += 1
print(f'val(x32+1), (x32+1).bit_length: {x32},{x32.bit_length()}')

print()

# Numpy int32 
print('Numpy int32')
np_x32 = np.array([2**31-1],dtype=np.int32)
print(f'val(np_x32), type(np_x32): {np_x32[0]},{type(np_x32[0])}')

np_x32 += 1
print(f'val(np_x32+1), type(np_x32+1): {np_x32[0]},{type(np_x32[0])}')

# Python int64 
print('Python int64')

#   Your Code Here
x64 = (2**63 - 1)
print(f'val(x64), x64.bit_length: {x64},{x64.bit_length()}')

x64 += 1
print(f'val(x64+1), (x64+1).bit_length: {x64},{x64.bit_length()}')

# Numpy int64 
print('Numpy int64')

#   Your Code Here
np_x64 = np.array([2**63-1],dtype=np.int64)
print(f'val(np_x64), type(np_x64): {np_x64[0]},{type(np_x64[0])}')

np_x64 += 1
print(f'val(np_x64+1), type(np_x64+1): {np_x64[0]},{type(np_x64[0])}')



#Problem 3: Variable Numpy int32

#   Your Code Here
np_x32 = np.array([2**31-1],dtype=np.int32)
#                               look here --^
np_x32 += 1 
print(f'val(np_x32+1), type(np_x32+1): {np_x32[0]},{type(np_x32)[0]}')
#                         what is not here --^  nor here --^


Python int32
val(x32), x32.bit_length: 2147483647,31
val(x32+1), (x32+1).bit_length: 2147483648,32

Numpy int32
val(np_x32), type(np_x32): 2147483647,<class 'numpy.int32'>
val(np_x32+1), type(np_x32+1): -2147483648,<class 'numpy.int32'>
Python int64
val(x64), x64.bit_length: 9223372036854775807,63
val(x64+1), (x64+1).bit_length: 9223372036854775808,64
Numpy int64
val(np_x64), type(np_x64): 9223372036854775807,<class 'numpy.int64'>
val(np_x64+1), type(np_x64+1): -9223372036854775808,<class 'numpy.int64'>
val(np_x32+1), type(np_x32+1): -2147483648,numpy.ndarray[0]


<br><br><br>
### Response to Exercise 3 in this *Markdown* cell
<br>
1. The numpy variables are signed integers with a fixed number of bits (32 or 64). In two's-complement representation, adding 1 to the largest positive value wraps around to the most negative value. Python integers do not have a fixed width, so when a value outgrows its current number of bits, Python simply allocates more bits for it.

2. The unsigned Numpy types uint32 and uint64 give up the negative range, which roughly doubles the largest representable value (to $2^{32}-1$ and $2^{64}-1$).

3. When 1 is added to the single Numpy variable (a scalar), Numpy catches the overflow and can report it as a warning. When it is added to the array, the overflow slips by silently and the number becomes negative.
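
The ranges mentioned in these responses can be checked with `np.iinfo`, and the growth of Python integers with `int.bit_length()`. This is only a small sketch; the dictionary name `limits` is hypothetical.

```python
import numpy as np

# Smallest and largest value of each fixed-width integer type.
limits = {}
for dt in (np.int32, np.uint32, np.int64, np.uint64):
    info = np.iinfo(dt)
    limits[dt.__name__] = (info.min, info.max)
    print(dt.__name__, info.min, info.max)

# A Python int simply grows by one bit when it passes 2**63 - 1.
big = 2**63 - 1
bits_before = big.bit_length()
bits_after = (big + 1).bit_length()
print(bits_before, bits_after)
```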

#### Opinion
Probably Batman, depending on setting

[//]: <> (Delete one of the <br> tags from above the heading but leave the rest.)
[//]: <> (Then fill in your responses and number your responses via a numbered, markdown list.)
[//]: <> (Also, please give me your oppinion on who would win in a battle between Superman and Batman.)
[//]: <> (Make the heading of your Opinion just one-size smaller then the heading of this cell.)
[//]: <> (One sentence or even just one word about the battle will suffice)

## Numpy Underflow Errors (floating point)

Now that you have seen Overflow errors, let's look at what Underflow errors are. Do you have a guess? The following [Wiki](https://en.wikipedia.org/wiki/Arithmetic_underflow) page provides a nice explanation of this type of error. There is more information on that page than you probably need to know, but be sure to read the first section. Then work on the exercise below.
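
As a minimal illustration of underflow, the sketch below builds the smallest positive float32 with `np.nextafter` (an alternative to `np.finfo(...).smallest_subnormal`, chosen here only because it is available in older Numpy versions as well) and then halves it. The exact result, $2^{-150}$, lies below anything float32 can represent, so the stored result is zero.

```python
import numpy as np

# The next float32 after 0.0 in the direction of 1.0:
# the smallest positive (subnormal) float32, equal to 2**-149.
smallest = np.nextafter(np.float32(0.0), np.float32(1.0))

# Halving it cannot be represented in float32, so the result underflows.
halved = smallest / np.float32(2.0)

# For reference: the smallest *normal* float32 is much larger (2**-126).
smallest_normal = np.finfo(np.float32).tiny

print(smallest, halved, smallest_normal)
```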

### Exercise 4

For this exercise, be sure to comment the code and provide print statements between the problems. Make the output look nice. For the questions, please fill in your responses in the markdown section that follows.

* Part A
    0. Print the smallest possible numpy.float32 value. (This has been done for you.) [Extra-Info](https://en.wikipedia.org/wiki/Subnormal_number)
    1. Print the result of mf32$^2$. Do you believe that the value in A0 really is the smallest numpy.float32? Provide evidence that supports your answer. (Hint: this is possible with one line of code.)
    2. Now create a numpy array, np_mf32, of size one with the value of mf32 in it. Print both the array and the numpy.dtype of the array. (no discussion needed)
    3. Now square the np_mf32 array. Does this confirm your thoughts about A1?
    4. Create a list or numpy array of floats for $x$ such that $ x \in \left[0.5,0.9\right]$ in increments of 0.1 (there should be 5 elements). Now loop over the values of $x$ and multiply the np_mf32 array by $x$. Comment on which value or values cause an Underflow error.
* Part B: do all of the above in Part A with a numpy.float64 dtype
    5. And, answer this question related to A1 vs B1. Why were the results different when we squared mf32 in A1 vs squaring mf64 in B1? (Hint: what is the dtype of both values after squaring? It takes just two lines of code to see what was different between the two.)


In [37]:
#Part A

#A0
#This should be the smallest float32 possible, but ...
mf32 = np.finfo(np.float32).smallest_subnormal
print(f'#A0\nval(mf32): {mf32}')


#A1 square mf32
mf32sq = mf32**2
print(f'#A1\nval(mf32sq): {mf32sq}')
# Try to make a float32 with value smaller than mf32
mf31 = np.float32(1.4e-46)
print(f'val(mf31): {mf31}')

#A2 mf32 in a numpy array (look at the Numpy int32 example in the previous exercise)
np_mf32 = np.array([mf32])
print(f'#A2\narray(np_mf32): {np_mf32}')
print(f'type(np_mf32): {np_mf32.dtype}')

#A3 square the np_mf32 array 
np_mf32sq = np_mf32**2
print(f'#A3\narray(np_mf32sq): {np_mf32sq}')
print(f'type(np_mf32sq): {np_mf32sq.dtype}')

#A4 loop over x in [0.5, 0.9] in steps of 0.1 and show x*np_mf32
x = np.arange(0.5,1.0,0.1)
for i in range(0,4):
    x[i] *= np_mf32
    
print(f'#A4\narray(x*mf32): {x}')

#Part B (same as above but for float64)


#B0 - B4: Your code

#B0
#This should be the smallest float64 possible, but ...
mf64 = np.finfo(np.float64).smallest_subnormal
print(f'\n#B0\nval(mf64): {mf64}')


#B1 square mf64
mf64sq = mf64**2
print(f'#B1\nval(mf64sq): {mf64sq}')
# Try to make a float64 with value smaller than mf64
mf63 = np.float64(5e-325)
print(f'val(mf63): {mf63}')

#B2 mf64 in a numpy array 
np_mf64 = np.array([mf64])
print(f'#B2\narray(np_mf64): {np_mf64}')
print(f'type(np_mf64): {np_mf64.dtype}')

#B3 square the np_mf64 array 
np_mf64sq = np_mf64**2
print(f'#B3\narray(np_mf64sq): {np_mf64sq}')
print(f'type(np_mf64sq): {np_mf64sq.dtype}')

#B4 loop over x in [0.5, 0.9] in steps of 0.1 and show x*np_mf64
for i in range(0,4):
    x[i] *= np_mf64
    
print(f'#B4\narray(x*mf64): {x}')

#B5 The extra question: A1 vs B1
print(f'\n#5\ntype(mf32sq): {mf32sq.dtype}\ntype(mf64sq): {mf64sq.dtype}')


#A0
val(mf32): 1.401298464324817e-45
#A1
val(mf32sq): 1.9636373861190906e-90
val(mf31): 0.0
#A2
array(np_mf32): [1.e-45]
type(np_mf32): float32
#A3
array(np_mf32sq): [0.]
type(np_mf32sq): float32
#A4
array(x*mf32): [0.00000000e+00 1.40129846e-45 1.40129846e-45 1.40129846e-45
 9.00000000e-01]

#B0
val(mf64): 5e-324
#B1
val(mf64sq): 0.0
val(mf63): 0.0
#B2
array(np_mf64): [5.e-324]
type(np_mf64): float64
#B3
array(np_mf64sq): [0.]
type(np_mf64sq): float64
#B4
array(x*mf64): [0.  0.  0.  0.  0.9]

#5
type(mf32sq): float64
type(mf64sq): float64


<br><br><br>
### Response to Exercise 4, parts A and B in this *Markdown* cell
<br>

#### Part A
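1. A0 does show the smallest positive float32. Trying to make a slightly smaller float32 than mf32 (`np.float32(1.4e-46)`) results in a value of 0.0. The non-zero value printed for mf32 squared does not contradict this, because that result is a float64 (see #5 in the output).

3. Squaring the float32 array gives 0.0 while the dtype stays float32, which confirms my thoughts in A1.

4. Only $x = 0.5$ underflows to zero; the products for 0.6, 0.7 and 0.8 are rounded back up to mf32. The last entry (0.9) is unchanged in the output because the loop `range(0,4)` stops before reaching it, so it was never multiplied.

#### Part B
1. B0 does show the smallest positive float64. This is verified in the same way as in Part A: `np.float64(5e-325)` results in a value of 0.0.

3. Once again, squaring the array gives 0.0, which verifies my thoughts in B1.

4. All of the multiplied values underflow to zero with float64. Note that this run reuses the $x$ array that was already overwritten in A4, and the last entry (0.9) is again skipped by `range(0,4)`.

5. In A1, squaring mf32 produced a float64 (see #5 in the output), which is wide enough to represent a value of about 1.96e-90. In B1 the result was still a float64, and there is no standard 128-bit data type to promote to, so the square underflowed to 0.0.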
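
The A4 behavior can be reproduced in a way that does not depend on how a particular Numpy version promotes mixed float32/float64 operands. In this sketch (the names `factors`, `stored` and `is_zero` are hypothetical) each product is formed in double precision and then explicitly rounded to float32. Under round-to-nearest, ties-to-even, exactly half of the smallest subnormal rounds down to zero, while anything above half rounds back up to the smallest subnormal.

```python
import numpy as np

# Smallest positive float32 (2**-149).
smallest = np.nextafter(np.float32(0.0), np.float32(1.0))

factors = [0.5, 0.6, 0.7, 0.8, 0.9]

# Form each product in float64, then round it to float32.
stored = [np.float32(f * float(smallest)) for f in factors]

# Which products underflowed all the way to zero?
is_zero = [bool(s == 0) for s in stored]
# Which products were rounded back up to the smallest subnormal?
is_smallest = [bool(s == smallest) for s in stored]

print(is_zero)
print(is_smallest)
```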

[//]: <> (Make this look decent. Use #### sized headings for the Parts, and a numbered markdown list for the responses.)
[//]: <> (Also, I think Batman would win, as long as he isn't taken by surprise.)

## Round-off/Truncation Errors (floating point)

Before starting this exercise, please read the [Round-off Errors](https://pythonnumericalmethods.berkeley.edu/notebooks/chapter09.03-Roundoff-Errors.html) section in Chapter 9 of *Python Numerical Methods*.

### Exercise 5

For this exercise, we are going to examine [Truncation](http://nifty.stanford.edu/2003/pests/2002/lectures/07.1_FloatingPoint/truncation.htm?CurrentSlide=6) errors, which are related to Round-off errors. An example of when you might encounter truncation errors is when solving integrals (a sequence of summations) on a computer. Given that we mostly work with computers that have finite-precision arithmetic, we must represent integrals as finite summations.

To demonstrate truncation errors we are going to sum all the integer values from $1$ to $N$, but we are going to represent each integer as a numpy.float32 type. In the cell below, I've written a function that finds the analytical solution to $\sum\limits_{i=1}^{N}i$, for a given $N$.
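
To see why integer sums in float32 eventually go wrong, it helps to look at the largest run of consecutive integers that float32 can hold. The sketch below is only an illustration: float32 has a 24-bit significand, so above $2^{24}$ the gap between neighboring representable values is 2, and adding 1 is lost to rounding.

```python
import numpy as np

limit = np.float32(2**24)            # 16777216.0

# Below the limit, adding 1 is still exact.
exact_below = bool(np.float32(2**24 - 1) + np.float32(1.0) == limit)

# At the limit, limit + 1 is not representable and rounds back to limit.
lost_above = bool(limit + np.float32(1.0) == limit)

# Gap between limit and the next larger float32.
gap = np.spacing(limit)

print(exact_below, lost_above, gap)
```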

Your job (write responses in the markdown cell below):
* First, create a Numpy array of size $N$ containing numpy.float32 values representing the integers from $1$ to $N$, stored in consecutive index order (i.e. arr[0] = 1.0, arr[1] = 2.0, ..., arr[N-1] = N.0).
* Next, create a function that calculates and returns the sum of this array (make it fast, and feel free to use numpy functions). Be sure to verify that your sum is the same as the analytical sum. I suggest starting with an array of just 10 elements, so $N=10$.
* Once you have verified that your summation function is returning the correct results, you have two tasks:
    1. Find the value of $N_{inc}$ at which your summation is no longer correct. Discuss in the markdown cell below how you found this value. 
    2. Write an improved summation that can correctly sum all the values to find the answer of $\sum\limits_{i=1}^{2\cdot N_{inc}}i$ (the sum up to $2\cdot N_{inc}$). Discuss your reasoning about how you chose to solve this task.
    3. Special Case: If you are taking this course for 4 units, then write a recursive function recur_sum() that can solve problem 2. All values must remain as numpy.float32 values (no type-casting). Then test your function for $4\cdot N_{inc}$ (it should be accurate without any changes, but if it's not, that's ok).
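
One possible shape for the recursive function in the special case is a pairwise (divide-and-conquer) sum, sketched below. This is an assumption about what is intended, not the course's reference solution: it splits the array in half, sums each half recursively, and adds the two partial sums, so every intermediate value stays a numpy.float32. It is checked here only for $N=10$; whether it stays exact up to $4\cdot N_{inc}$ is not verified in this sketch.

```python
import numpy as np

def recur_sum(arr):
    """Pairwise recursive sum; all intermediate values stay float32."""
    n = arr.size
    if n == 0:
        return np.float32(0.0)
    if n == 1:
        return arr[0]                  # a numpy.float32 scalar
    mid = n // 2
    # float32 + float32 stays float32, so no type cast is needed.
    return recur_sum(arr[:mid]) + recur_sum(arr[mid:])

values = np.arange(1, 11, dtype=np.float32)   # 1.0, 2.0, ..., 10.0
total = recur_sum(values)
print(total, total.dtype)
```

Adding partial sums of similar size keeps the operands at comparable magnitudes, which is the usual motivation for pairwise summation.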



In [31]:
import numpy as np
import math

# a function that returns the analytical sum of the numbers 1 through N
def analytic_sum(N):
    return np.array([0.5*N*(N+1)]).astype(np.dtype(np.float64))[0]

# your summation function that takes numpy array of type numpy.float32
def my_sum(myarr):
    return np.sum(myarr)


# your improved summation function 
def my_impr_sum(myarr):
    return math.fsum(myarr)

# Special Case: recursive sum function here


# put your code to test the functions here. Use comments!

# starting with N = 10 and verifying results
N = np.arange(1.,11,dtype=np.float32)
print(my_sum(N),analytic_sum(N))

# now trying with N = 2**13-1
Ninc = np.arange(1.,2**13,dtype=np.float32)
print(my_sum(Ninc),analytic_sum(Ninc)[int(2**13-2)])

# now trying 2Ninc with improved summation
Ninc2 = np.arange(1.,2**14,dtype=np.float32)
print(my_impr_sum(Ninc2),analytic_sum(Ninc2)[int(2**14-2)])

55.0 [ 1.  3.  6. 10. 15. 21. 28. 36. 45. 55.]
33550336.0 33550336.0
134209536.0 134209536.0


<br><br><br>
### Responses to Exercise 5
<br>

1. I found $N_{inc}$ by experimenting with different $N$ arrays whose lengths were powers of 2 minus 1. I started with $2^{16}$ and went backwards, quickly finding $2^{13}$.
2. I used `math.fsum` from Python's standard `math` module, which I found mentioned in the numpy documentation. This method is slower than np.sum, but it keeps track of intermediate partial sums throughout the summation in order to avoid the loss of precision that comes from adding floating point numbers with very different orders of magnitude.
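
The effect described in these responses can be illustrated with a strictly left-to-right float32 accumulation, which is an assumption made for this sketch and is not how `np.sum` works internally (its documentation describes pairwise summation for many cases, so its breaking point differs). The helper names `exact_sum` and `sequential_sum_f32` are hypothetical. The value $n = 5793$ is chosen because it is the first $n$ whose total exceeds $2^{24}$, the largest run of consecutive integers float32 can hold.

```python
import math
import numpy as np

def exact_sum(n):
    """Exact value of 1 + 2 + ... + n as a Python integer."""
    return n * (n + 1) // 2

def sequential_sum_f32(arr):
    """Left-to-right running sum that stays in float32."""
    total = np.float32(0.0)
    for v in arr:
        total = total + v      # float32 + float32 stays float32
    return total

n = 5793
arr = np.arange(1, n + 1, dtype=np.float32)

seq = sequential_sum_f32(arr)            # accumulates round-off
fs = math.fsum(arr)                      # tracks exact partial sums
wide = np.sum(arr, dtype=np.float64)     # accumulates in float64

print(exact_sum(n), float(seq), fs, wide)
```

Both `math.fsum` and the float64 accumulator avoid the error here; the former returns a Python float, the latter a numpy.float64.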

[//]: <> (Make this look decent. Use a numbered markdown list for the responses to each task.)