# Core Python

This notebook is primarily example-based.

The examples cover issues most commonly encountered by someone new to Python and interested in Data Science.

## Numeric Equality

Numbers with the same value compare equal even when their numeric types differ (for example, int and float).

In general, objects of unrelated data types (for example, an int and a str) do not compare equal.

In [1]:
a = 2.0**3
print(f'type(a) is {type(a)}, a = {a}')

b = 2**3
print(f'type(b) is {type(b)}, b = {b}')

print(f'type(a) == type(b) is {type(a) == type(b)}')
print(f'a == b is {a==b}')

type(a) is <class 'float'>, a = 8.0
type(b) is <class 'int'>, b = 8
type(a) == type(b) is False
a == b is True
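
As a small illustration of both statements above, the following sketch compares a few values across types. The variable names are only for illustration.

```python
# Numeric types compare by value, regardless of the concrete type
int_vs_float = (1 == 1.0)               # int vs float
bool_vs_int = (True == 1)               # bool is a subclass of int
int_vs_complex = (2 == complex(2, 0))   # int vs complex

# Unrelated types do not compare equal, even if they "look" alike
int_vs_str = (1 == '1')                 # int vs str
list_vs_tuple = ([1, 2] == (1, 2))      # list vs tuple

print(int_vs_float, bool_vs_int, int_vs_complex)  # True True True
print(int_vs_str, list_vs_tuple)                  # False False
```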


## Floating Point Comparisons

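Floating point numbers have finite precision. Machine epsilon is the gap between 1.0 and the next larger representable float, so a single floating point result is only accurate to a relative error of roughly epsilon.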

In [2]:
import sys
sys.float_info.epsilon

2.220446049250313e-16
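
To make the meaning of epsilon concrete, here is a minimal sketch using only the standard library. It assumes the usual IEEE 754 double precision floats that CPython uses on common platforms.

```python
import sys

eps = sys.float_info.epsilon

# Adding epsilon to 1.0 produces the next representable float
one_plus_eps = 1.0 + eps
print(one_plus_eps > 1.0)        # True

# Adding half of epsilon is lost to rounding
one_plus_half_eps = 1.0 + eps / 2
print(one_plus_half_eps == 1.0)  # True

# The gap between neighboring floats grows with magnitude:
# near 1e16 the spacing is 2.0, so adding 1.0 changes nothing
big = 1e16
print(big + 1.0 == big)          # True
```

This is why epsilon is a relative measure: the absolute error of a float depends on how large the number is.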

In [3]:
1.1 + 2.2

3.3000000000000003

In [4]:
3.3

3.3

In [5]:
1.1 + 2.2 == 3.3

False

In [6]:
import numpy as np
x = 1.1 + 2.2
y = 3.3
print(np.isclose(x, y))
print(np.isclose(x, y+sys.float_info.epsilon))
print(np.isclose(x, y-sys.float_info.epsilon))

True
True
True


In [7]:
# relative tolerance
def rel_tol(x, y, tol):
    """similar to np.isclose(x, y, atol=0, rtol=tol)"""
    # multiply instead of dividing by y, so that y == 0 does not raise ZeroDivisionError
    return abs(x - y) <= tol * abs(y)

In [8]:
# absolute tolerance
def abs_tol(x, y, tol):
    """similar to np.isclose(x, y, atol=tol, rtol=0)"""
    return abs(x - y) <= tol

In [9]:
x = (1.1+2.2)**10
y = 3.3**10
x == y

False

In [10]:
# usually relative tolerance is most useful
print(f'{x}')
print(f'{y}', '\n')
tols = [1e-14, 1e-15]
for tol in tols:
    print(f'Relative Tolerance: {tol}')
    print(f'x == y {rel_tol(x, y, tol)}')   
    print(f'x == y {np.isclose(x, y, atol=0, rtol=tol)}', '\n')

153157.89852644503
153157.89852644483 

Relative Tolerance: 1e-14
x == y True
x == y True 

Relative Tolerance: 1e-15
x == y False
x == y False 



In [11]:
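# however, when comparing against a value at or near zero, an absolute tolerance is necessary
# (a relative tolerance scales with the magnitude of the values, so it shrinks toward zero)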
print(f'{x}')
print(f'{y}', '\n')
tols = [1e-9, 1e-10]
for tol in tols:
    print(f'Absolute Tolerance: {tol}')
    print(f'x == y {abs_tol(x, y, tol)}')   
    print(f'x == y {np.isclose(x, y, atol=tol, rtol=0)}', '\n')

153157.89852644503
153157.89852644483 

Absolute Tolerance: 1e-09
x == y True
x == y True 

Absolute Tolerance: 1e-10
x == y False
x == y False 



### math.isclose() vs numpy.isclose()

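When both the absolute and the relative tolerance are nonzero, math.isclose() and numpy.isclose() can produce different results!

math.isclose(a, b, rel_tol, abs_tol) tests abs(a - b) <= max(rel_tol * max(abs(a), abs(b)), abs_tol), which is symmetric in a and b. numpy.isclose(a, b, rtol, atol) tests abs(a - b) <= atol + rtol * abs(b), which adds the two tolerances and is not symmetric in a and b. The default tolerances also differ: rel_tol=1e-09 and abs_tol=0.0 for math.isclose(), rtol=1e-05 and atol=1e-08 for numpy.isclose().

When both tolerances are specified, the math.isclose() definition is the better behaved of the two.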

See for example: https://apassionatechie.wordpress.com/2018/01/09/isclose-function-in-numpy-is-different-from-math/
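
The difference can be sketched with the standard library alone. Here `numpy_style_isclose` is a hypothetical helper that reproduces the formula documented for numpy.isclose() in the scalar case; it is not NumPy's implementation.

```python
import math

def numpy_style_isclose(a, b, rtol, atol):
    """Hypothetical helper: the documented numpy.isclose formula for scalars."""
    return abs(a - b) <= atol + rtol * abs(b)

# Same inputs, same tolerances, different answers
a, b = 1.0, 1.15
math_result = math.isclose(a, b, rel_tol=0.1, abs_tol=0.1)     # limit is max(0.115, 0.1)
numpy_result = numpy_style_isclose(a, b, rtol=0.1, atol=0.1)   # limit is 0.1 + 0.115
print(math_result)   # False
print(numpy_result)  # True

# The numpy-style formula also depends on argument order
forward = numpy_style_isclose(10.0, 11.0, rtol=0.095, atol=0.0)
backward = numpy_style_isclose(11.0, 10.0, rtol=0.095, atol=0.0)
print(forward, backward)  # True False

# math.isclose gives the same answer in either order
sym_forward = math.isclose(10.0, 11.0, rel_tol=0.095)
sym_backward = math.isclose(11.0, 10.0, rel_tol=0.095)
print(sym_forward, sym_backward)  # True True
```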

### Money Calculation Note

Floating point numbers should not be used for financial applications.  Use Decimal (from the decimal module) instead.

Although the relative difference between using float and Decimal is usually very small, the absolute difference could be a penny or more.  Financial calculations must be exact, or your code will not be considered correct (to an accountant).
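
A minimal sketch of the difference, using the standard library decimal module. The amounts are made up for illustration. Note that Decimal values are built from strings, since building them from floats would carry the float's representation error along.

```python
from decimal import Decimal, ROUND_HALF_UP

# Summing amounts: float accumulates representation error, Decimal does not
float_total = 0.10 + 0.20
decimal_total = Decimal('0.10') + Decimal('0.20')
print(float_total == 0.30)               # False
print(decimal_total == Decimal('0.30'))  # True

# Rounding to cents: 2.675 cannot be stored exactly as a float,
# so round() sees a value slightly below the halfway point
float_rounded = round(2.675, 2)
decimal_rounded = Decimal('2.675').quantize(Decimal('0.01'), rounding=ROUND_HALF_UP)
print(float_rounded)    # 2.67
print(decimal_rounded)  # 2.68
```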

## Python's Copy Semantics

The following is a descriptive overview; the examples will make it clearer.  The topics below are all related.

### 'is' vs '=='

If two variables refer to the same object in memory, then 'a is b' returns True. 

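If 'a is b' is True, then 'a == b' is almost always True as well. The rare exceptions are objects that do not compare equal to themselves, such as a floating point NaN.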

If two variables refer to different objects in memory, and those objects are allowed to be compared with one another, and those objects have values which are equal, then a == b returns True.
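
A short sketch of identity versus equality, including the unusual case of NaN, which is the same object as itself but does not compare equal to itself.

```python
a = [1, 2, 3]
b = [1, 2, 3]   # a separate list with equal contents
c = a           # a second name for the same list object

print(a == b, a is b)   # True False
print(a == c, a is c)   # True True

# NaN is defined to be unequal to everything, including itself
nan = float('nan')
print(nan is nan, nan == nan)   # True False
```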

### Mutable vs Immutable

An immutable object is one whose contents cannot be changed.  Examples include:
1. numbers such as int and float (e.g., you cannot reassign the value of 2 with 2 = 3)
2. strings
3. tuples
4. namedtuples

A mutable object is one whose contents can be changed.  Examples include:
1. list
2. dictionary
3. set 

If a variable refers to an object in memory, and that object is immutable, it is not possible to change the contents of the memory referred to.

Understanding whether or not two variables refer to the same object in memory, and whether or not the object in memory is mutable or contains references to mutable objects, is key to understanding Python's copy semantics.
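
The phrase "contains references to mutable objects" can be illustrated with a tuple that holds a list. This is a minimal sketch: the tuple itself cannot be changed, but the list it refers to can.

```python
t = (1, [2, 3])

# The tuple is immutable: its slots cannot be rebound
try:
    t[0] = 99
except TypeError as err:
    print(f'TypeError: {err}')

# But the list referred to by t[1] is mutable and can change in-place
t[1].append(4)
print(t)   # (1, [2, 3, 4])
```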

In [16]:
# a refers to the immutable object in memory that represents the integer 3
a = 3
print(hex(id(a)))

# add 1 to a
# the contents of object in memory that a refers to cannot change
# instead a new object, representing the integer 4, is created in memory
# and 'a' is rebound to this new object
a = a + 1
print(hex(id(a)))
print(a)

0x55e5f5ec63c0
0x55e5f5ec63e0
4


In [17]:
# another example
a = "Hello"
print(hex(id(a)))

a = a + " World"
print(hex(id(a)))
print(a)

0x7fe684455148
0x7fe6751d6f30
Hello World


### Interpreter Memory Optimization Creates Some Unexpected Results

Small integers and some strings are likely to be used many times. To save memory, the interpreter may reuse the same immutable object. This is an implementation detail (CPython, for example, caches the integers from -5 to 256), so code should never rely on it.

In [None]:
# memory optimization uses the same memory location for the integer 4 for both a and b
a = 1+3
b = 2+2

# hex location in memory for a and b is the same!
print(hex(id(a)))
print(hex(id(b)))

# True!
print(a is b)
print(a == b)

In [18]:
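# no memory optimization applies here: in this session a and b refer to different locations in memory
# (this is an implementation detail; the same lines run as a script may print True)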
a = 987654321
b = 987654321

# False!
print(a is b)
print(a == b)

False
True
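
The same kind of reuse can apply to strings. The sketch below builds two equal strings at run time and then uses sys.intern() to explicitly request the shared copy. Whether the two original strings are the same object is implementation dependent, so no result is claimed for that line.

```python
import sys

# Build two equal strings at run time
s1 = ''.join(['data', ' science'])
s2 = ''.join(['data', ' science'])

print(s1 == s2)   # True: the values are equal
print(s1 is s2)   # implementation dependent, do not rely on it

# sys.intern returns one canonical object for equal strings
i1 = sys.intern(s1)
i2 = sys.intern(s2)
print(i1 is i2)   # True
```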


#### Conclusion: 'is' vs '=='

Never use 'is' if what you want to compare are the values of objects.

'a is b' implies 'a == b' (apart from unusual values such as NaN)  
'a is not b' implies nothing.  a and b may or may not have equal values as shown above.
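
One place where 'is' is the right tool is testing for None, because there is only one None object and '==' can be overridden by a class. The class below is a hypothetical example written only to show the difference.

```python
class AlwaysEqual:
    """Hypothetical class whose __eq__ claims equality with everything."""

    def __eq__(self, other):
        return True

obj = AlwaysEqual()

equal_to_none = (obj == None)   # calls AlwaysEqual.__eq__, which is misleading here
is_none = (obj is None)         # identity check, cannot be overridden

print(equal_to_none)   # True
print(is_none)         # False
```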

In [19]:
a = 987654321
b = a

# now b refers to the same integer object in memory that a refers to
print(hex(id(a)))
print(hex(id(b)))
print(a is b)
print(a == b)

0x7fe675807dd0
0x7fe675807dd0
True
True


In [20]:
# integers are immutable, so a + 1 cannot modify the existing object in-place;
# instead the name 'a' is rebound to a different object in memory
a = 987654321
print(hex(id(a)))
a = a+1
print(hex(id(a)))

0x7fe675807a90
0x7fe675807df0


### Awesome Python Visual Aid

To understand memory management of mutable object types such as lists and sets, I highly recommend the use of: http://www.pythontutor.com/visualize.html#mode=display

Copy and paste the code from the entire cell, click the "visualize execution" button, and step through it line by line by clicking "forward".

For learning purposes, this visualization is better than using a debugger and better than anything I can describe in text.

In [21]:
# copy this entire cell to http://www.pythontutor.com/visualize.html#mode=display
# click: visualize execution and step through code

# lists are mutable!
# this means the same memory location can hold different values
a = [1, 2, 3]
b = a
print(a)
print(b)
print(hex(id(a)))
print(hex(id(b)))
print(a is b)

# modify the contents of the list object in-place
a[1] = 99
print(hex(id(a)))
print(hex(id(b)))
print(a is b)

# a[1] was changed in-place; b refers to the same list, so b shows the change too
print(b)

# in Python, the convention (which most packages follow) is for a method that modifies an object in-place to return None
# a notable exception is pop(), which both modifies the list and returns the value it removed
z = a.append(-99)
print(z)
z = a.pop()
print(z)
print(a)
print(b)
print(a is b)

[1, 2, 3]
[1, 2, 3]
0x7fe6751d47c8
0x7fe6751d47c8
True
0x7fe6751d47c8
0x7fe6751d47c8
True
[1, 99, 3]
None
-99
[1, 99, 3]
[1, 99, 3]
True
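
A common consequence of the "in-place methods return None" convention is shown in this short sketch, which contrasts list.sort() with the built-in sorted().

```python
values = [3, 1, 2]
before_sort = values.copy()

# sorted() returns a new list and leaves the original alone
new_list = sorted(values)
unchanged = (values == before_sort)
print(new_list, unchanged)   # [1, 2, 3] True

# list.sort() sorts in-place and returns None
result = values.sort()
print(values, result)        # [1, 2, 3] None
```

Writing `values = values.sort()` is therefore a bug: it rebinds `values` to None.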


In [22]:
# copy this entire cell to http://www.pythontutor.com/visualize.html#mode=display
# click: visualize execution and step through code

# [:] is equivalent to a shallow copy
a = [1, 2, 3]
b = a[:]

# although the contents of a and b are the same, they refer to different objects in memory
print(a)
print(b)
print(a is b)
print(a == b)

# so modifying a has no effect on b
a[1] = 100
print(a)
print(b)

# alternative ways to create shallow copies of a list
a = [1, 2, 3]
b = a.copy()
print(a is b)

from copy import copy
b = copy(a)
print(a is b)

# if the contents are themselves references to mutable objects
# the situation is more complicated
a = [1, 2]
b = [3, 4]
c = [a, b]
d = c.copy()
print(c is d)

print(c)
print(d)

# even though c and d refer to different objects in memory
# c[0] refers to the SAME object in memory as d[0], which is also 'a'
print(hex(id(a)))
print(hex(id(c[0])))
print(hex(id(d[0])))

# therefore changing a in-place will change c and d
a[0] = 99
print(a)
print(c)
print(d)

# likewise changing c[0][0], which refers to the same object as a, will change a and d
c[0][0] = 22
print(a)
print(c)
print(d)

[1, 2, 3]
[1, 2, 3]
False
True
[1, 100, 3]
[1, 2, 3]
False
False
False
[[1, 2], [3, 4]]
[[1, 2], [3, 4]]
0x7fe6751e3a08
0x7fe6751e3a08
0x7fe6751e3a08
[99, 2]
[[99, 2], [3, 4]]
[[99, 2], [3, 4]]
[22, 2]
[[22, 2], [3, 4]]
[[22, 2], [3, 4]]


In [23]:
# copy this entire cell to http://www.pythontutor.com/visualize.html#mode=display
# click: visualize execution and step through code

from copy import deepcopy
# if the contents are themselves references to mutable objects
# the situation is more complicated
a = [1, 2]
b = [3, 4]
c = [a, b]

d = deepcopy(c)

# like a shallow copy, a deep copy ensures that c is not d
print(c is d)

# deep copy ensures that c[0] is not d[0]
print(c[0] is d[0])

# however, c[0] is not a copy of a; it is the same object as a
print(c[0] is a)

a[0] = 108
print(a)
print(c)
print(d)

# similar, but this time c is built from copies of a and b
a = [1, 2]
b = [3, 4]
c = [a.copy(), b.copy()]

d = deepcopy(c)
print(c is d)
print(c[0] is d[0])

# this time c[0] was a copy of a
print(c[0] is a)

# d is a deepcopy of c, so d[0] is not c[0]
# and c[0] is a shallow copy of a, so c[0] is not a
print(hex(id(a)))
print(hex(id(c[0])))
print(hex(id(d[0])))

False
False
True
[108, 2]
[[108, 2], [3, 4]]
[[1, 2], [3, 4]]
False
False
False
0x7fe6751e3908
0x7fe6751d05c8
0x7fe6844ba548
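
The same three behaviors (assignment, shallow copy, deep copy) apply to other containers. The following sketch uses a small nested dictionary; the keys and values are made up for illustration.

```python
from copy import copy, deepcopy

config = {'name': 'run1', 'layers': [64, 32]}

alias = config            # no copy: a second name for the same dict
shallow = copy(config)    # new dict, but the nested list is shared
deep = deepcopy(config)   # new dict and a new nested list

config['name'] = 'run2'        # rebinds a top-level key: seen only through alias
config['layers'].append(16)    # changes the nested list in-place: seen by alias and shallow

print(alias)     # {'name': 'run2', 'layers': [64, 32, 16]}
print(shallow)   # {'name': 'run1', 'layers': [64, 32, 16]}
print(deep)      # {'name': 'run1', 'layers': [64, 32]}
```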


In [24]:
# copy this entire cell to http://www.pythontutor.com/visualize.html#mode=display
# click: visualize execution and step through code

# similar to above examples, but with sets instead
s = set([1, 2, 3])
t = s

# in-place modification of s, changes t
s.add(4)
print(t)

# use shallow copy
t = s.copy()
s.add(5)
print(s)
print(t)

{1, 2, 3, 4}
{1, 2, 3, 4, 5}
{1, 2, 3, 4}


In [25]:
# copy this entire cell to http://www.pythontutor.com/visualize.html#mode=display
# click: visualize execution and step through code

# lists are mutable; adding to a list does not create a new one
my_list = [1,2,3]
my_list.append((4,5))
ret = my_list.extend((6,7,8,9))
print(ret)
print(my_list)

# Note: a list can be used as a stack with append and pop
ret = my_list.pop()
print(f'{ret} {my_list}')

None
[1, 2, 3, (4, 5), 6, 7, 8, 9]
9 [1, 2, 3, (4, 5), 6, 7, 8]
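
A related subtlety is the difference between `+=` and `+` on lists. This is a minimal sketch: `+=` extends the existing list in-place, while `+` builds a new list.

```python
a = [1, 2, 3]
b = a

# += modifies the list in-place, so b (the same object) sees the change
a += [4]
same_after_iadd = a is b
print(b, same_after_iadd)       # [1, 2, 3, 4] True

# + creates a new list; a is rebound and b keeps the old object
a = a + [5]
same_after_add = a is b
print(a, b, same_after_add)     # [1, 2, 3, 4, 5] [1, 2, 3, 4] False
```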


In [26]:
import collections.abc

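def list_flatten(my_iterable, a=None):
    """Flatten a list/tuple
    """
    
    # idiom to avoid a mutable default argument
    if a is None:
        a = []

    for item in my_iterable:
        # str and bytes are iterable too, but recursing into a string never
        # terminates (each character is itself a string), so treat them as atoms
        if isinstance(item, collections.abc.Iterable) and not isinstance(item, (str, bytes)):
            list_flatten(item, a)
        else:
            a.append(item)
    return a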

help(list_flatten)

print(f'{my_list}')
my_flat_list = list_flatten(my_list)
print(f'{my_list} -> {my_flat_list}')

Help on function list_flatten in module __main__:

list_flatten(my_iterable, a=None)
    Flatten a list/tuple

[1, 2, 3, (4, 5), 6, 7, 8]
[1, 2, 3, (4, 5), 6, 7, 8] -> [1, 2, 3, 4, 5, 6, 7, 8]
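
The `a=None` idiom used in list_flatten exists because default argument values are created once, when the function is defined, and not on each call. The two functions below are hypothetical examples that show what goes wrong with a list as the default.

```python
def append_bad(item, acc=[]):
    """Hypothetical example: the default list is shared across calls."""
    acc.append(item)
    return acc

def append_good(item, acc=None):
    """Hypothetical example: a fresh list is created on each call."""
    if acc is None:
        acc = []
    acc.append(item)
    return acc

bad_first = append_bad(1)
bad_second = append_bad(2)
print(bad_second)                # [1, 2], the first call leaked into the second
print(bad_first is bad_second)   # True, both calls returned the same list

good_first = append_good(1)
good_second = append_good(2)
print(good_first, good_second)   # [1] [2]
```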


## String Formatting

For an excellent discussion see: https://pyformat.info/

In [27]:
class Data:

    def __str__(self):
        return '__str__  was called'

    def __repr__(self):
        return '__repr__ was called'
    
# create instance
d = Data()

# display instance in both str and repr forms
print('{0!s}'.format(d))
print('{0!r}'.format(d))

# use Python 3.6+ f-strings to do the same
print(f'{d!s}')
print(f'{d!r}')

__str__  was called
__repr__ was called
__str__  was called
__repr__ was called


In [28]:
# string formatting
s = 'test'
print('{:10}'.format(s))
print('{:>10}'.format(s))
print('{:-^10}'.format(s))

test      
      test
---test---


In [29]:
# string formatting with f-strings
print(f'{s:10}')
print(f'{s:>10}')
print(f'{s:-^10}')

test      
      test
---test---


In [30]:
pi = 3.141592653589793
print('%7.5f' % pi) # old style, not recommended
print('{:7.5f}'.format(pi)) # recommended
print(f'{pi:7.5f}') # recommended for Python 3.6 and above

3.14159
3.14159
3.14159
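
A few more format specifications that come up often when reporting numbers are sketched below. The values are made up for illustration.

```python
value = 1234567.891

with_commas = f'{value:,.2f}'      # thousands separator, 2 decimal places
as_percent = f'{0.256:.1%}'        # multiply by 100 and append %
zero_padded = f'{42:05d}'          # pad an integer with zeros to width 5
scientific = f'{12345.678:.2e}'    # scientific notation, 2 decimal places

print(with_commas)   # 1,234,567.89
print(as_percent)    # 25.6%
print(zero_padded)   # 00042
print(scientific)    # 1.23e+04
```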
