# Tutorial 4: Understanding Binary Representation

## PHYS 2600, Spring 2019

## T4.1 - Binary arithmetic and integers

### Part A

Let's start with some simple binary math.  __Compute the following sum in binary:__

```
  101001
+ 010001
--------
= ??????
```

Put your answer in the Markdown cell below.

__SOLUTION:__ Working entirely in binary, $101001 + 010001 = 111010$.  (This works just like adding in base-10: here 1+1 gives 10, so we carry the 1 to the next column.)
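
The carrying procedure can also be spelled out in code. The following is only a minimal sketch: the helper name `add_binary` is made up for this illustration, and it works on strings of `0`s and `1`s rather than on Python integers.

```python
def add_binary(x, y):
    """Add two binary strings column by column, like pencil-and-paper addition."""
    width = max(len(x), len(y))
    x, y = x.zfill(width), y.zfill(width)   # pad the shorter number with zeros
    carry = 0
    digits = []
    # Work from the rightmost column to the leftmost
    for bx, by in zip(reversed(x), reversed(y)):
        total = int(bx) + int(by) + carry
        digits.append(str(total % 2))   # digit written in this column
        carry = total // 2              # 1 if we need to carry, else 0
    if carry:
        digits.append('1')
    return ''.join(reversed(digits))

result = add_binary('101001', '010001')
print(result)
```

Only the rightmost column produces a carry in this particular sum.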

Check your answer: what are 101001 and 010001 in base 10, and then what is their sum?  Use the code cell below to answer, and remember that you can write binary integers with the `0b...` prefix.

In [1]:
a = 0b101001
b = 0b010001
print("a = %d, b = %d" % (a,b))
print("a + b = %d" % (a+b) )
print("Answer from adding in binary: ", 0b111010)

a = 41, b = 17
a + b = 58
Answer from adding in binary:  58


### Part B

What if we want to go the other way, i.e. find the binary representation of a base-10 number?  Python has a built-in function called `bin()` that does exactly that.  However, if we try to repeat our binary addition exercise from above using `bin()`, something goes wrong:

In [None]:
# Example cell - run me
print(bin(41) + bin(17))

__What is happening to give the output above?__  

_(Hint: if you remove the `print()` statement, the Jupyter automatic printing works a little differently, and its output might give you a clue...)_

__SOLUTION:__ The `bin()` function is giving us strings back, and not numbers!  When we use `+` on the result, it concatenates the two strings together.

If you removed `print()`, you saw that the result is printed out with quotes `'...'` around it, indicating that the expression is a string.  It has to be a string, because it doesn't make sense as a number - there's an extra `0b` in the middle of it!

As an aside, it is possible to convert the output of `bin()` back into a number by typecasting with `int()`:

```python
print(int(bin(41), 2))
>> 41
```

This requires the use of the _optional_ second argument to `int()`, which tells it to use base-2 for typecasting.  (Otherwise, it's ambiguous - is `int('10')` 10, or is it 2?)
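
To make the difference concrete, here is a short sketch contrasting string concatenation with an actual sum; the variable names are just for illustration.

```python
a_str = bin(41)   # a string, not a number
b_str = bin(17)

# "+" on two strings glues them together
concatenated = a_str + b_str
print(concatenated)

# Convert back to integers first, add, then ask for the binary form again
total = int(a_str, 2) + int(b_str, 2)
total_str = bin(total)
print(total, total_str)
```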

## T4.2 - Floating point arithmetic

For physics, most practical applications of computing use decimal numbers - integers are more of a curiosity.  Thus, we'll be seeing a lot of _floating point_ representation.  

### Part A

First, a quick crash course in using decimal (floating-point) numbers in Python.  Python supports __scientific notation for entering numbers__, with the notation `X.YYeZZ` or `X.YYEZZ` (there is no difference between using little-`e` or big-`E`.)  

Here are some examples - the `>>` shows the output from a given input statement.

```python
print(1.23e-4)
>> 0.000123
print(97E6)
>> 97000000.0
```

This notation always produces a floating-point number, even when the value is a whole number, which is why there's a `.0` at the end of the second example.
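
As a quick check, the sketch below confirms that a number entered in "e" notation is stored as a `float`, even when its value is a whole number (the variable name is arbitrary):

```python
big = 97E6

print(type(big))          # the type of a number written in "e" notation
print(big == 97000000)    # compare against the equivalent integer
print(int(big))           # convert explicitly if an integer is needed
```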

__Try it yourself:__ the molar mass of aluminum is about 0.027 kg/mol, and Avogadro's number is about $6.0 \times 10^{23}$.  How many atoms are there in two tons (2,000 kg) of aluminum?  (Use "e" notation to enter these numbers.)

In [2]:
Al_molar_mass = 27e-3  # kg/mol
N_A = 6.0e23
sample_mass = 2e3 # kg

N_atoms = sample_mass / Al_molar_mass * N_A
print(N_atoms)

4.444444444444445e+28


### Part B

Now let's have a deeper look at the implementation.  Here's the sketch from lecture again to remind you how floating point representation works:

<img src="https://physicscourses.colorado.edu/phys2600/phys2600_sp19/img/floating-point.png" />

Once again, we're physicists so we recognize that this is just scientific notation.

When we ask for a float in Python, the default on most systems is to use __64 bits__ (known as "double precision" for being twice the old default of 32 bits) for each number.  This is divided up into __52 bits__ for the significant digits, __11 bits__ for the exponent, and __1 bit__ for the sign.

Let's think about the implications of floating-point representation, starting with the exponent.  The exponent has its own sign, of course, since we want to be able to represent $0.001$ just as well as $1000$.  That leaves 10 bits for the value of the exponent itself, which is an integer.

Given the number of bits available, __what is the largest base-10 number that we can store in 64-bit floating point?__  (Remember that we're working natively in binary, so if the value of the exponent is $E$, the significand is multiplied by $2^E$.)

__SOLUTION:__ Leaving off the sign bit, we have 10 bits, which means the exponent can be at most about $2^{10} = 1024$.  Thus, 11 bits for the exponent allow us to store values from roughly $2^{-1024}$ to $2^{1024}$, corresponding to a maximum of about $1.8 \times 10^{+308}$.
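
You can ask Python directly for these limits through the standard `sys.float_info` object. This sketch assumes your system uses standard 64-bit floats, which is true on essentially all modern machines:

```python
import sys

largest = sys.float_info.max
print(largest)

# The same number built by hand: the largest significand (just under 2)
# times the largest power of two available
by_hand = (2 - 2.0**-52) * 2.0**1023
print(by_hand == largest)
```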

Try to enter a _larger_ number than the maximum allowed - make sure it's a decimal so that Python will use floating-point to store it.  You should get an error message about "overflow", which means the number is too big to store!

In [3]:
# Threshold is about 10^{308}, so if we go slightly larger...
slightly_too_big = 10.**309

## Important note: using scientific notation actually protects you from
## overflow error!  If you try "1e309", for example, you'll get "inf" instead of 
## an error message.

OverflowError: (34, 'Numerical result out of range')

We can also have "underflow", where a number is too _small_ to represent in floating point - some more advanced numerical modules will give you errors or warnings about this too, but by default Python will just replace an underflowed float with zero.
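
The three behaviors described above can be seen side by side in the following sketch, which assumes standard 64-bit floats; the variable names are arbitrary.

```python
import math

# A float literal that is too large quietly becomes infinity
too_big_literal = 1e309
print(too_big_literal, math.isinf(too_big_literal))

# The ** operator on floats raises an OverflowError instead
try:
    10.0 ** 309
    raised = False
except OverflowError as err:
    raised = True
    print("OverflowError:", err)

# The product of two tiny floats (10^-400) is too small to store,
# so it underflows to zero without any warning
tiny = 1e-200 * 1e-200
print(tiny)
```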

### Part B

Now let's worry about the _significand_, which has 52 bits available.  You can think of the significand as a fixed-precision binary number of the form `1.001001...`.  In fact, the standard convention is that the leading `1` is always there (we're free to change the exponent to accomplish that), which gives us a total of 53 bits of precision - for example, we can represent the number `1.111...11` with 52 ones after the binary point exactly.
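
If you're curious, you can pull the three bit fields out of a float yourself with the standard `struct` module. This is only a sketch: `float_fields` is a made-up helper name, and it assumes the usual IEEE 754 layout, in which the 11-bit exponent field is stored with an offset of 1023 rather than with a separate sign bit.

```python
import struct

def float_fields(x):
    """Split a 64-bit float into its sign, exponent, and fraction bit fields."""
    # Reinterpret the 8 bytes of the float as one 64-bit unsigned integer
    (bits,) = struct.unpack('>Q', struct.pack('>d', x))
    sign = bits >> 63                    # 1 bit
    exponent = (bits >> 52) & 0x7FF      # 11 bits, stored with an offset of 1023
    fraction = bits & ((1 << 52) - 1)    # 52 bits after the implicit leading 1
    return sign, exponent, fraction

# -6.5 = -1.625 * 2**2, and 1.625 is 1.101 in binary
sign, exponent, fraction = float_fields(-6.5)
print(sign, exponent - 1023, format(fraction, '052b'))
```

The leading `1` of `1.101` never appears in the stored fraction bits, which is where the extra bit of precision comes from.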

If we have 53 bits of precision available for the significand, __what is the approximate relative error__ of a decimal number stored in this format?  (I say approximate because we can store something like 1.5 _exactly_ in binary; but for numbers that we have to truncate after 53 bits, how big is the truncation error expected to be?)

__SOLUTION:__ If we truncate a binary fraction after 53 bits, we're basically setting the rest of its digits to zero, so the largest error we could make would be if they were all 1 instead.  A typical case is that the _next_ digit is 1 and we set it to zero; the contribution from the digits after that is negligible by comparison.

This gives us a truncation error of roughly 

$$
(1/2)^{53} \approx 1.11 \times 10^{-16}
$$

So, the rule of thumb is: 64-bit floating point numbers are completely trustworthy out to 15 digits of precision; beyond that, you should start to worry.
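
One way to see this limit in action is to add a very small number to 1.0 and check whether the result changes. The sketch below assumes standard 64-bit floats and uses `sys.float_info.epsilon`, which is the gap between 1.0 and the next float above it.

```python
import sys

eps = sys.float_info.epsilon
print(eps, eps == 2.0**-52)

# Adding the full gap gives a new number...
print(1.0 + eps == 1.0)

# ...but adding half the gap (our truncation estimate) is lost entirely
half_eps = 2.0**-53
print(half_eps)
print(1.0 + half_eps == 1.0)
```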

To test your results, _run the cell below_ to print out 0.6 to 20 digits of precision.  Since 0.6 isn't an exact decimal in binary, you will see deviations occur due to the truncation to 53 bits of significand: __what decimal place does the difference happen in?__  Does that match your results?

In [None]:
# Format code prints first 20 digits
print('%.20f' % 0.6)

__SOLUTION:__ The difference between the print-out above and 0.600... occurs in the __17th digit__, which is about what we expect given the rough truncation error above.  But if we round to the first 15 digits we get 0.600... exactly, as promised!
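
For a more precise check, the standard `fractions` and `decimal` modules can show exactly what is stored for 0.6. This is a sketch with arbitrary variable names; `Fraction(0.6)` converts the stored float to an exact ratio of integers, with no further rounding.

```python
from fractions import Fraction
from decimal import Decimal

stored = Fraction(0.6)      # exactly what the computer holds
intended = Fraction(3, 5)   # what we meant to write

# Every digit of the stored value
print(Decimal(0.6))

# Relative error, compared with the (1/2)^53 estimate
rel_err = abs(stored - intended) / intended
print(float(rel_err), 2.0**-53)

# Rounded to 15 decimal places, the difference disappears
print('%.15f' % 0.6)
```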

### Part C

As we've emphasized, the exact amount of truncation error depends on what number we're representing.  The number $0.6$ is exact in base-10 decimal notation, but infinitely repeating in base-2.

This situation is not symmetrical: _there is no such number_ which can be represented exactly in base-2, but not in base-10!  __Why?__

__SOLUTION:__ We can see a pattern if we just look at the base-10 decimal representations of powers of 1/2:

$$
\begin{aligned}
1/2 &= 0.5 \\
1/4 &= 0.25 \\
1/8 &= 0.125 \\
1/16 &= 0.0625 \\
&\;\;\vdots
\end{aligned}
$$

The ubiquitous 5 at the end is a clue: we can take any fraction $1/2^n$ and rewrite it as

$$
\frac{1}{2^n} = \frac{5}{2^{n-1} \times 10}.
$$

So we have a recursive relation: if $1/2^{n-1}$ is an exact decimal, then we just multiply by 5 and divide by 10 (shift to the right one place) to get a new exact decimal.

Basically, this happened because 2 is contained in our base $10 = 2 \times 5$, but not vice-versa: there's no finite way to write $1/5$ using powers of $1/2$.
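
Both halves of this argument can be checked with exact fractions. In the sketch below, `binary_digits` is a made-up helper that produces binary digits after the point by repeated doubling; it is an illustration, not a standard library function.

```python
from fractions import Fraction

def binary_digits(frac, n_digits):
    """Return the first n_digits binary digits after the point, for 0 <= frac < 1."""
    digits = []
    for _ in range(n_digits):
        frac *= 2
        bit = int(frac >= 1)   # the next binary digit
        digits.append(str(bit))
        frac -= bit
    return ''.join(digits)

# Every 1/2^n equals 5^n/10^n, so it is a finite decimal
for n in range(1, 5):
    print("1/2^%d = %d/10^%d = %s" % (n, 5**n, n, float(Fraction(5**n, 10**n))))

# 1/5 in binary never terminates: the same four digits repeat forever
one_fifth_bits = binary_digits(Fraction(1, 5), 16)
print(one_fifth_bits)
```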

