<img align=right src="images/inmas.png" width=130x />

# Notebook 05 - Data Representation and Arithmetics

A deeper look at data representation and memory sizes in Python

### Prerequisite
Notebook 04


### Boolean arithmetics
Boolean variables record logical True or False

They support logical operations and arithmetics

In [None]:
x = False
y = True
x and y

In [None]:
y = 1 < 5
y = y and (4 < 8)
y

Can you predict the content of list z?

In [None]:
z = [(1 < 0), (2 >= 4), (5 <= 5), (2 > 4)]
z

### Revisiting smart printing 

As we have seen before, the Python construct `string % (values)` builds a string by substituting data into it. The values are inserted where the `%` placeholders are located in the string.
- %d a decimal representation of an integer
- %f a float
- %s a string
- %r the default representation of the object, as returned by `repr()`

Let's look at a specific example:

In [None]:
print('This string includes a string %s, a decimal %d, a float %f, and a list %r.' % ('yes!', 1, 2.0, [1,2]))

Obviously, `(values)` can include literals or variables
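As a small illustration of substituting variables, the sketch below also uses a precision specifier (`%.3f`, `%.2e`) to control how many digits are printed. The variable names and values are made up for this example.

```python
# Hypothetical values, chosen only for illustration
name = 'pi'
approx = 3.14159265

# %.3f keeps three digits after the decimal point,
# %.2e prints scientific notation with two digits after the decimal point
line = 'The constant %s is roughly %.3f, or %.2e in scientific notation' % (name, approx, approx)
print(line)
```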

### Booleans
We use the `getsizeof()` function from the *sys* module to determine the size of an object in memory, and use the smart printing we just learned


In [None]:
import sys

x = True
print("x is %r" % x)
print("Type of x is %s" % type(x))
print("x takes %d bytes in memory" % sys.getsizeof(x))

In principle, a Boolean decision variable could occupy 1 bit, but Python represents it as an integer object. On a typical 64-bit CPython build, the value itself takes 4 bytes and the other 24 bytes form the object's header, which contains metadata such as a pointer to the object's type and the reference counter used by the memory manager (garbage collector)

In [None]:
print(bool.__bases__) # See that boolean is a subclass of integer

### Integers
Typically, a signed integer is represented by 4 bytes and referred to as an int32. It can take values from $-2^{31}$ to $+2^{31} - 1$, that is, from -2,147,483,648 to 2,147,483,647. One bit is used to store the sign, hence the exponent 31.
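The int32 limits can be checked directly. The sketch below also uses `ctypes.c_int32` from the standard library as a stand-in for a fixed 4-byte signed integer; this is only an illustration of wrap-around, not how Python stores its own integers.

```python
import ctypes

int32_min = -2**31
int32_max = 2**31 - 1
print("int32 range: %d to %d" % (int32_min, int32_max))

# ctypes.c_int32 does no overflow checking, so one past the maximum
# wraps around to the minimum, as a fixed-size integer would
wrapped = ctypes.c_int32(int32_max + 1).value
print("int32_max + 1 wraps to %d" % wrapped)
```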

Let's look at the size for a small integer:

In [None]:
x = 2
print("x is %r" % x)
print("Type of x is %s" % type(x))
print("x takes %d bytes in memory" % sys.getsizeof(x))

The total size including the header is 28 bytes, the same as what we found for the Boolean

Would the arithmetic overflow if we go beyond an int32's capacity? Can it store a number with 512 bits, i.e., 64 bytes?

In [None]:
x = 2**512
print("x is %r" % x)
print("Type of x is %s" % type(x))
print("x takes %d bytes in memory" % sys.getsizeof(x))

Now the size has increased, which makes it possible to store arbitrarily large integers. Therefore, large calculations with integers are safe from overflow in Python, as it adjusts the memory for integers to fit large numbers!
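One way to relate the memory footprint to the value is the `int.bit_length()` method, which reports how many bits are needed for the magnitude. A minimal sketch (the exact byte counts printed depend on the Python build):

```python
import sys

x = 2**512
small = 2
print("2**512 needs %d bits" % x.bit_length())      # a one followed by 512 zeros
print("2 needs %d bits" % small.bit_length())
print("Sizes: %d bytes vs %d bytes" % (sys.getsizeof(x), sys.getsizeof(small)))
```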

### Integers size
How does the memory use increase with integer size?

In [None]:
import sys
x = 2
for i in range(10):
    x**=2
    print('Memory("%d") = %d bytes' % (x, sys.getsizeof(x)))

### Arithmetics with integers

In [None]:
a = 155
b = 7
c = a + b
print("%d + %d = %r (%s)" % (a, b, c, type(c)))

In [None]:
c = a - 50*b
print("%d - 50*%d = %r (%s)" % (a, b, c, type(c)))

The *int* type is preserved as long as all variables are integers, and operations are well defined between integers

### Automatic casting

In [None]:
a = 155
b = 7
c = a**b
print("%d ^ %d = %r (%s)" % (a, b, c, type(c)))
c = a/b
print("%d / %d = %r (%s)" % (a, b, c, type(c)))

Python adjusts the *type* of the result to fit the arithmetic operation: `a**b` stays an *int*, while the division `a/b` returns a *float*!
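The short check below makes the rule explicit. Note that `/` returns a float even when the division is exact.

```python
a = 155
b = 7
print(type(a * b), type(a ** b))   # both remain int
print(type(a / b))                 # float
exact = 4 / 2
print(exact, type(exact))          # still a float, although 4 is divisible by 2
```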

### Integer floor and modulo divisions

In [None]:
a = 155
b = 7
c = a // b
print("%d // %d = %r (%s)" % (a, b, c, type(c)))
c = a % b
print("%d %% %d = %r (%s)" % (a, b, c, type(c)))

These two operators always leave the type unchanged

What if only `a` or `b` was a float? 

<small>(to find out, add a period at the end of the literal 7 or 155 and run the cell again)</small>
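The two operators are tied together by the identity `a == (a // b) * b + a % b`, and the built-in `divmod()` returns both results at once. The sketch below also tries a negative operand, where `//` rounds toward minus infinity and the remainder takes the sign of the divisor.

```python
a = 155
b = 7
q, r = divmod(a, b)                 # same as (a // b, a % b)
print("%d = %d*%d + %d" % (a, q, b, r))

# Negative dividend: the quotient is rounded down, not toward zero
q_neg, r_neg = divmod(-a, b)
print("%d = %d*%d + %d" % (-a, q_neg, b, r_neg))
```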

### Arithmetics mixing integers and booleans

In [None]:
z = (1 < 5) - 5
z

In [None]:
x = 15
z = x * (x > 10) + x**2 * (x < 10)
z

What if `x` is 10 instead of 15 in the last example?

### Floating point numbers (float)
Let's have a look at how floats are stored in memory

In [None]:
x = 183.0
print("x is %r" % x)
print("Type of x is %s" % type(x))
print("x takes %d bytes in memory" % sys.getsizeof(x))

Each float takes 24 bytes in memory. What are the limits? This can be found from the following command:

In [None]:
sys.float_info

### Maximum float
Exceeding the capacity of a floating point number results in an OverflowError exception being raised:

In [None]:
x = 2.0
for i in range(10):
    x **= 2
    print('Memory("%e") = %d bytes' % (x, sys.getsizeof(x)))

### Epsilon
Roughly speaking, epsilon is the smallest value such that

$$
1 + \epsilon \ne 1
$$

Per sys.float_info, epsilon=2.220446049250313e-16.

In [None]:
epsilon = 2.220446049250313e-16
1. + 0.5 * epsilon == 1.

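Epsilon is very important when verifying the equality of floating point numbers: because of rounding, two floats should be compared to within a tolerance rather than tested for exact equality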
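A minimal sketch of such a comparison, using `sys.float_info.epsilon` and the standard library function `math.isclose()`, which compares within a relative tolerance:

```python
import math
import sys

eps = sys.float_info.epsilon
print(1.0 + eps != 1.0)            # adding epsilon changes the value...
print(1.0 + 0.5 * eps == 1.0)      # ...adding half of it does not

a = 0.1 + 0.2
print(a == 0.3)                    # exact comparison fails because of rounding
print(math.isclose(a, 0.3))        # comparison within a tolerance succeeds
```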

### Three special values: inf, -inf and NaN
What if we exceed the limits listed in `sys.float_info`?

In [None]:
sys.float_info

Set x to a number slightly larger than max:

In [None]:
x = 1.79769314e+308
print("x is %r" % x)
print("Type of x is %s" % type(x))
print("x takes %d bytes in memory" % sys.getsizeof(x))
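Whether an overflow produces `inf` or an exception depends on the operation. The sketch below shows the behavior one would expect in CPython: multiplication overflows silently to `inf`, while the power operator raises `OverflowError`.

```python
import math

big = 1e308
y = big * 10                      # multiplication overflows silently to inf
print(y, math.isinf(y))

base = 2.0
try:
    base ** 1024                  # the power operator raises instead
    raised = False
except OverflowError:
    raised = True
print("OverflowError raised:", raised)
```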

### Not a Number (NaN)
In some cases, NaN is returned as a value if the computation does not make sense. One example is the following division:

$$
x = \infty/\infty
$$

The value NaN is also used by modules (such as pandas) when entries in data tables are missing

In [None]:
import math as m
x = m.inf/m.inf
print('The value of x is', x, 'and the function math.isnan() returns', m.isnan(x))
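NaN has an unusual property worth checking: it is not equal to anything, including itself, which is why `math.isnan()` is needed to detect it. A short sketch:

```python
import math

x = math.nan
print(x == x)              # False: NaN never compares equal, even to itself
print(x != x)              # True
print(math.isnan(x + 1))   # True: NaN propagates through arithmetic
```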

### Playing with infinity

In [None]:
import math as m
a = 12
b = m.inf
print('a/infinity = ', a/b)
print('-a/infinity = ', -a/b)
print('a + infinity = ', a + b)
print('a - infinity = ', a - b)
print('a * infinity = ', a * b)
print('infinity - infinity = ', b - b)
print('infinity/infinity = ', b/b)


### Expressing log(0)
The behavior of packages and libraries can vary from one to another

- Run the cell below once with the math module and **then again** after switching the import to the numpy module

In [None]:
import sys
import math as m
# import numpy as m
x = m.log(0)                         # Implementations may vary in different packages/libraries!
print("x is %r" % x)
print("Type of x is %s" % type(x))
print("x takes %d bytes in memory" % sys.getsizeof(x))
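When a library raises an exception for such a case, the caller can catch it and decide what value to use. The sketch below handles the `ValueError` raised by the standard `math` module; choosing `-inf` as the fallback is an assumption made for this example (it is the limit of the logarithm as its argument approaches 0 from above).

```python
import math

try:
    x = math.log(0)
except ValueError as err:
    print("math.log(0) raised ValueError:", err)
    x = -math.inf                 # assumed fallback value for this example
print("x is %r" % x)
```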

### Complex numbers

A complex number has a real and an imaginary part. The literal `f + gj`, where *f* and *g* are floats or integers, denotes a complex number.

In [None]:
x = 1j + 5
print("x is %r" % x)
print("Type of x is %s" % type(x))
print("x takes %d bytes in memory" % sys.getsizeof(x))

### Euler's formula
Let's verify Euler's formula using the complex math library `cmath`:
$$
e^{i \pi} + 1 = 0
$$

In [None]:
from cmath import exp, pi
x=1j
x = exp(x*pi)+1
print("x is %r" % x)
print("|x| is %1.20f" % abs(x))
print("Type of x is %s" % type(x))
print("x takes %d bytes in memory" % sys.getsizeof(x))
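Because of rounding, the result is a tiny nonzero number rather than exactly zero. A hedged way to state that the formula holds numerically is to compare within a tolerance, for instance with `cmath.isclose()` or against machine epsilon:

```python
import cmath
import sys

lhs = cmath.exp(1j * cmath.pi)
residual = abs(lhs + 1)
print(residual < sys.float_info.epsilon)   # the residual is below machine epsilon
print(cmath.isclose(lhs, -1))              # equal within the default relative tolerance
```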

### Strings
How are strings represented?

In [None]:
s = 'Hello world'
print("s is %r" % s)
print("Type of s is %s" % type(s))
print("Length of \"%s\" is %d" %(s,len(s)))
print("s takes %d bytes in memory" % sys.getsizeof(s))

### Size of string objects
How does the memory use increase with string size?

In [None]:
s = ''
for i in range(10):
    s += 'a'
    print('Memory("%s", %d symbols) = %d bytes' % (s, len(s), sys.getsizeof(s)))

### Each variable has an ID

Python implements a complex memory management system to avoid unnecessary memory allocation. The `id()` function returns a unique integer for the object that a variable refers to. Here is an example:

In [None]:
x = 10
print("Initial id(x) is %s" % id(x))

y = x
print("        id(y) is %s" % id(y))

y += x
print("    Now id(y) is %s" % id(y))
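The same experiment can be phrased with the `is` operator, which tests whether two names refer to the same object. The sketch below contrasts an immutable integer, where `+=` rebinds the name to a new object, with a mutable list, where `+=` changes the object in place. The list variables `a` and `b` are introduced only for this illustration.

```python
x = 10
y = x
print(y is x)          # True: both names refer to the same object

y += x                 # ints are immutable, so y is rebound to a new object
print(y is x)          # False

a = [1, 2]
b = a
b += [3]               # lists are mutable, so += modifies the object in place
print(b is a, a)       # a sees the change made through b
```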

### Composite variable types

- **List** Collection of variables of any types, can be sliced like strings  
- **Tuple** Same as a list but immutable (cannot be edited)
- **Dictionary** Pairs of keys and values  
- **Set** Unique elements of a collection (also has immutable counterpart)
- **Range** Sequence of integers, useful for loops in the code!
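For the types in this list that are not demonstrated elsewhere in this notebook, here is a brief sketch with made-up values:

```python
d = {'a': 1, 'b': 2}                 # dictionary: key -> value pairs
s = {3, 1, 3, 2, 1}                  # set: duplicates are dropped
f = frozenset(s)                     # immutable counterpart of a set
r = range(0, 10, 3)                  # range: start, stop (excluded), step

print(d['b'])
print(sorted(s))
print(f == s)
print(list(r))
```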

### Lists and tuples

In [None]:
x = [True, 5, 5.2, 'string',]  # trailing comma is ignored
y = (True, 5, 5.2, 'string',)

print("x is %r" % x)
print("Type of x is %s" % type(x))
print("x takes %d bytes in memory" % sys.getsizeof(x))
print()
print("y is (%r,%r,%r,%r)" % y)
print("Type of y is %s" % type(y))
print("y takes %d bytes in memory" % sys.getsizeof(y))

### Difference between lists and tuples
Lists are mutable while tuples are not:

In [None]:
x = [True, 5, 5.2, 'string',]  # last comma is ignored
y = (True, 5, 5.2, 'string',)

x[0] = 567  # lists are mutable
y[0] = 567  # tuples are immutable -> ERROR

In a nutshell, an element of an immutable object can never be on the left-hand side of an assignment
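The error can be caught to see exactly what Python reports. The sketch below also shows that the name `y` itself can still be rebound to a new tuple; only the existing tuple object cannot be modified.

```python
y = (True, 5, 5.2, 'string')
try:
    y[0] = 567
except TypeError as err:
    print("TypeError:", err)

# Rebinding the name y to a brand-new tuple is allowed
y = (567,) + y[1:]
print(y)
```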

### Key Points
- Integers have arbitrary size in Python
- Floating point numbers have a fixed representation
- Floats have three special values: +/- inf and NaN
- Integers get upcast to float when appropriate
- Python supports complex numbers
- Floating point operation behavior can vary depending on the library or module being used


### What's Next?
- Complete the exercises in the associated exercise notebook [X-05-Arithmetics.ipynb](X-05-Arithmetics.ipynb)
- Next notebook is [N-06-CreatingModules.ipynb](N-06-CreatingModules.ipynb)