# Week two: Classes and errors


The goal of filling in the requested pieces is twofold: you should be able to run the worksheet and get the requested answer with the given dataset, and you should also be able to pass with different datasets (not given). These will often check unusual inputs, etc., so try to make sure all possible input datasets are accounted for.

To be graded, your notebook must be runnable start to finish. If you can't make an in-notebook test pass, comment it out to attempt to get partial credit. You should replace the `...` markers with your code. Do not change the names of the pre-defined variables and functions.

In [None]:
# EID is your 6+2 UC Electronic ID
EID = 'sixplus2'
NAME = 'Joe Smith'

# Problem 1: A simple class

Let's design a simple 2D vector class. You have some freedom in how you design the internals, but the external behavior should match the tests below. Be sure that adding two vectors makes a new vector, rather than modifying one of the vectors in place. Also, it's usually a good idea to avoid mentioning the name of the class inside the class; if you are tempted to do so, use `self.__class__` or `type(self)` instead (see the `__repr__` method for an example).
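
As a rough illustration of those two design points (return a new object, and avoid hard-coding the class name), here is a minimal sketch on a hypothetical one-dimensional `Offset` class. The class names `Offset` and `NamedOffset` and the attribute `dx` are invented for this illustration and are not part of the assignment.

```python
class Offset:
    """Hypothetical 1D example, not part of the assignment."""

    def __init__(self, dx):
        self.dx = dx

    def __add__(self, other):
        # Build a brand-new object; neither self nor other is modified.
        return type(self)(self.dx + other.dx)

    def __repr__(self):
        return f'{type(self).__name__}({self.dx})'


class NamedOffset(Offset):
    """Subclass used only to show why type(self) is handy."""


a = Offset(1)
b = Offset(2)
c = a + b
print(a, b, c)                          # Offset(1) Offset(2) Offset(3)
print(NamedOffset(1) + NamedOffset(2))  # NamedOffset(3)
```

Because `__add__` calls `type(self)` rather than `Offset`, the subclass gets back an instance of its own type without overriding anything.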

In [None]:
class Vector2D:
    def __init__(self, x, y):
        ...
    
    def __add__(self, other):
        ...
        
    def dot(self, other):
        ...
    
    def __repr__(self):
        return f'{self.__class__.__name__}({self.x}, {self.y})'

### Tests

In [None]:
assert repr(Vector2D(2, 3)) == 'Vector2D(2, 3)'
assert repr(Vector2D(1,2) + Vector2D(3,4)) == 'Vector2D(4, 6)'
assert Vector2D(1,2).dot(Vector2D(3,4)) == 3 + 8

# Problem 2: Uncertainties

Design an uncertainties class. You'll want the usage to look like this:

```python
a = Uncertain(1.0, .01) # 1.00 ± 0.01
b = Uncertain(2.0, .01) # 2.00 ± 0.01

c = a + b # Should give correct uncertainty
c = a * b # Ditto
assert a == Uncertain(1.0, .01) # __eq__ should be defined
```

A simple definition of `__eq__` is fine using normal floating point comparisons. Don't worry about supporting non-uncertain numbers (that is, `a + 3` or `a * 2`).
Make sure that copying and pasting the result of `repr(a)` into a Python input cell exactly reproduces `a`; that is,
`eval(repr(a)) == a`. Also define `__str__`, but you can invent any pretty display you'd like. You can use Unicode symbols.
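
The round-trip requirement can be sketched on a hypothetical one-field class. `Celsius` and `degrees` are names made up for this example; the point is only the split between a `__repr__` that Python can read back and a `__str__` meant for people.

```python
class Celsius:
    """Hypothetical one-field class used only to illustrate the round trip."""

    def __init__(self, degrees):
        self.degrees = degrees

    def __eq__(self, other):
        # Plain comparison of the stored value is enough for this sketch.
        return self.degrees == other.degrees

    def __repr__(self):
        # !r inserts repr(self.degrees), which for a float is a literal
        # that Python can read back without losing digits.
        return f'{type(self).__name__}({self.degrees!r})'

    def __str__(self):
        # Free-form display for humans; it does not need to be evaluable.
        return f'{self.degrees:.1f} °C'


t = Celsius(21.5)
print(repr(t))  # Celsius(21.5)
print(t)        # 21.5 °C
assert eval(repr(t)) == t
```

Using `!r` instead of a fixed number of decimals in `__repr__` is what keeps the round trip exact; rounding belongs in `__str__`.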

(Note: usually, when designing a class, you should start with the usage.)

Don't worry about correlated errors, that is, `a + a` will not be special compared to `a + b` (some [uncertainty libraries](https://pythonhosted.org/uncertainties/user_guide.html) are smart enough to know the difference). Our uncertainty notation is $a=a_0 \pm \delta a$. For addition use $\delta c^2 = \delta a^2 + \delta b^2$ and for multiplication use $\frac{\delta c^2}{c_0^2} = \frac{\delta a^2}{a_0^2} + \frac{\delta b^2}{b_0^2}$.
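
To get a feel for the numbers before writing the class, the two formulas can be checked by hand with plain floats. This is only a sketch using the values from the usage example; the variable names are arbitrary, and the `abs()` is an added assumption that the reported uncertainty should be non-negative.

```python
import math

# Inputs taken from the usage example: a = 1.00 ± 0.01, b = 2.00 ± 0.01
a0, da = 1.0, 0.01
b0, db = 2.0, 0.01

# Addition: errors add in quadrature.
sum_value = a0 + b0
sum_err = math.hypot(da, db)

# Multiplication: relative errors add in quadrature.
# abs() is an assumption: keep the uncertainty non-negative.
prod_value = a0 * b0
prod_err = abs(prod_value) * math.hypot(da / a0, db / b0)

print(f"a + b = {sum_value:.3f} ± {sum_err:.3f}")    # a + b = 3.000 ± 0.014
print(f"a × b = {prod_value:.3f} ± {prod_err:.3f}")  # a × b = 2.000 ± 0.022
```

Note that this form divides by $a_0$ and $b_0$, so it breaks when either value is zero. Multiplying the product formula through by $c_0^2$ gives the equivalent $\delta c^2 = b_0^2\,\delta a^2 + a_0^2\,\delta b^2$, which has no division.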

In [None]:
import math

class Uncertain:
    def __init__(self, value, err):
        self.value = value
        self.err = err
        
    def __add__(self, other):
        ...
        
    def __mul__(self, other):
        ...
        
    def __eq__(self, other):
        ...
        
    def __repr__(self):
        ...
        
    def __str__(self):
        ...

In [None]:
# Tests
a = Uncertain(1.0, .01)
b = Uncertain(2.0, .01)
assert a  == Uncertain(1.0, .01)
c1 = a + b
assert c1.value == 3.0
assert 0.014 < c1.err < 0.015
c2 = a * b
assert c2.value == 2.0
assert 0.020 < c2.err < 0.025
# Note: due to rounding errors, it can be hard to use == on the uncertainty.

In [None]:
print(f"{a} + {b} = {c1}")
print(f"{a} × {b} = {c2}")

# Problem 3: Approximations

Summing is commonly used in science. Even something this simple can cause problems with numerical precision. If we are summing a series,
$$
\sum_{i=0}^n x_i,
$$
and we make the simple assumption that the terms $x_i$ are roughly similar in size, we can look at one step of the simple sum using the standard 0 to $n$ approach to summing (called the naive sum):

$$
\sum_{i=0}^{j-1} x_i + x_j + \cdots.
$$

Given our previous assumption, the first term of the above expression is much larger than $x_j$ when $j$ is large, and large + small is bad for precision, because the small item's precision is limited by the larger item's precision. This error occurs at every later step of the sum, and the errors add up to a large effect.
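
The "large + small" effect can be seen in isolation. The sketch below rounds to 32-bit precision with the standard library's `struct` module so that it runs without NumPy; `to_float32` is a hypothetical helper written for this illustration, standing in for a 32-bit float type.

```python
import struct


def to_float32(x):
    """Round a Python float to the nearest 32-bit float (hypothetical helper)."""
    return struct.unpack('f', struct.pack('f', x))[0]


big = to_float32(2.0 ** 24)  # 16777216.0
small = to_float32(1.0)

# The exact answer 16777217 has no 32-bit representation, so the small
# term is rounded away completely.
print(to_float32(big + small) == big)  # True

# A milder case: a running total of a few thousand plus one small term.
term = to_float32(0.12345678)
partial = to_float32(8000.0)
stored = to_float32(partial + term)

# What was actually added agrees with the term to only about 3 digits.
print(stored - partial)
print(term)
```

Every addition late in a naive sum looks like the second case, which is why the individual rounding errors pile up.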

A method to improve the precision of the sum without resorting to more digits is to rearrange the order of the sum to always add similar size items.

Let's start by using 32-bit floats to sum a series. You can get into the same problem with 64-bit floats, but it's a much smaller effect / takes much longer to show up, so we'll use 32-bit floats.

<font color="grey">

> We will avoid NumPy arrays for now, even though they would be a much better fit, since
> we have not really covered NumPy yet. But NumPy's 32-bit float type is too useful to pass up
> for this problem!

In [None]:
import numpy as np

values = [np.float32(.12345678) for _ in range(2**16)]

# Using an underscore indicates you won't use the loop value

# Note: We could also write this as:
# values = [np.float32(.12345678)] * 2**16

Let's prove to ourselves that the 32-bit floats can store our number correctly by looking at one:

In [None]:
values[0]

Let's calculate the true sum by using a multiplication instead. I'm going to convert the result to the 32-bit float type just to keep the total number of displayed significant figures consistent:

In [None]:
real_value = np.float32(.12345678 * len(values))
real_value

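Now, let's sum this up. We need to add a "start" value of 32-bit float 0.0 so that the sum remains 32-bit. Otherwise, the first sum in the series would be 64-bit 0 + 32-bit 0.12345678 = 64-bit 0.12345678, which would then cause the total to remain 64-bit for the rest of the sum. (NumPy 2 changed its promotion rules so that a plain Python `0` no longer forces 64-bit, but passing an explicit 32-bit start value works on every version.)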

In [None]:
simple_sum_value = sum(values, np.float32(0.0))
simple_sum_value

I don't know about you, but I think that's positively awful! Even though 32-bit floats have enough precision to store about 7-8 significant digits, we fail in the 4th place here! Let's see if we can be smarter and do better. If instead we sum up "pairs" of values, then sum pairs of those results again, we can do the sum without ever adding numbers of significantly different sizes.

Implement an algorithm that does a better job of summing 32-bit floats. One idea would be a pairwise sum, which could look like this in pseudocode:

```
new_values = values
while the length of new_values is more than 1:
    sum new_values by pairs into new list with 1/2 the length of the old list
    set new_values = the new, shorter list
    Optional: print the new sum() of pairs
```
```python
# This will "unpack" new_values into 1 value. It fails if new_values holds more or fewer than one value.
final_value, = new_values 
```

Once you are done, `new_values` will have only one value (the final answer). See how close that is. If you print as you go, you should be able to see the printed sum getting closer to the real value. It's okay to use NumPy to do the pair sum if you know how to use it; just remember to convert `new_values` to an array at the top. Or you can use a list comprehension with `zip` to do the body of the while loop in one line of pure Python. Either way, one idea for summing pairs is to use `new_values[::2]` and `new_values[1::2]`, which are every other item, and every other item offset by one.
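
If the slicing notation is unfamiliar, here is what the two slices look like on a toy list of integers. This sketch shows a single halving step only, with made-up data and variable names; repeating it until one value is left is the exercise.

```python
items = [1, 2, 3, 4, 5, 6, 7, 8]  # toy data, length is a power of two

evens = items[::2]   # [1, 3, 5, 7]
odds = items[1::2]   # [2, 4, 6, 8]

# zip walks both slices in step, yielding one neighbouring pair at a time.
pair_sums = [left + right for left, right in zip(evens, odds)]
print(pair_sums)  # [3, 7, 11, 15]
```

The list is half as long, and its total is unchanged, which is exactly the property the pseudocode relies on.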

There are other ways to do a pairwise sum; this is just one possible way to do it in a small number of lines of Python. Feel free to use any method you like, as long as you implement the algorithm (for example, `np.sum` already does pairwise summation, so you can't directly use that). A discussion and a different pseudocode example can be found [here](https://en.wikipedia.org/wiki/Pairwise_summation).

<font color="green">

> Note: you can assume that the `values` list will always have a length $2^n$ where $n$ is an integer to make the algorithm simple for now.

<font color="grey">
    
> Side note: Pair-wise sums are also easier to run in parallel, so when we talk about performance, you might see them again!

In [None]:
def better_sum(values):
    ...
    return final_value

In [None]:
better_sum_value = better_sum(values)

print()
print("Simple sum:", simple_sum_value)
print("Real value:", real_value)
print("Better sum:", better_sum_value)
print("Numpy sum:", np.sum(values))
print("Simple error:", abs(real_value - simple_sum_value))
print("Better error:", abs(real_value - better_sum_value))

I will assume that you can get somewhat similar errors. The exact numerical result I get is not quite the same as NumPy's, and I don't expect you to get the exact same numerical result either; just get to within a factor of 10 or so of the ideal error.